- 14 6月, 2017 8 次提交
-
-
由 Yazen Ghannam 提交于
The bootlog option is only disabled by default on AMD Fam10h and older systems. Update bootlog description to say this. Change the family value to hex to avoid confusion. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-9-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yazen Ghannam 提交于
AMD systems have non-core, shared MCA banks within a die. These banks are controlled by a master CPU per die. If this CPU is offlined then all the shared banks are disabled in addition to the CPU's core banks. Also, Fam17h systems may have SMT enabled. The MCA_CTL register is shared between SMT thread siblings. If a CPU is offlined then all its sibling's MCA banks are also disabled. Extend the existing vendor check to AMD too. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> [ Fix up comment. ] Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-8-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Populate the MCE injection struct before doing initial injection so that values which don't change have sane defaults. Tested-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NYazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-7-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Not really needed. Tested-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NYazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-6-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Make the mcelog call a notifier which lands in the injector module and does the injection. This allows for mce-inject to be a normal kernel module now. Tested-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NYazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-5-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Reuse mce_amd_inj's debugfs interface so that mce-inject can benefit from it too. The old functionality is still preserved under CONFIG_X86_MCELOG_LEGACY. Tested-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NYazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-4-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yazen Ghannam 提交于
In the amd_threshold_interrupt() handler, we loop through every possible block in each bank and rediscover the block's address and if it's valid, e.g. valid, counter present and not locked. However, we already have the address saved in the threshold blocks list for each CPU and bank. The list only contains blocks that have passed all the valid checks. Besides the redundancy, there's also a smp_call_function* in get_block_address() which causes a warning when servicing the interrupt: WARNING: CPU: 0 PID: 0 at kernel/smp.c:281 smp_call_function_single+0xdd/0xf0 ... Call Trace: <IRQ> rdmsr_safe_on_cpu() get_block_address.isra.2() amd_threshold_interrupt() smp_threshold_interrupt() threshold_interrupt() because we do get called in an interrupt handler *with* interrupts disabled, which can result in a deadlock. Drop the redundant valid checks and move the overflow check, logging and block reset into a separate function. Check the first block then iterate over the rest. This procedure is needed since the first block is used as the head of the list. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-3-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yazen Ghannam 提交于
The value of MCA_STATUS is used as the MSR when clearing MCA_STATUS. This may cause the following warning: unchecked MSR access error: WRMSR to 0x11b (tried to write 0x0000000000000000) Call Trace: <IRQ> smp_threshold_interrupt() threshold_interrupt() Use msr_stat instead which has the MSR address. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 37d43acf ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers") Link: http://lkml.kernel.org/r/20170613162835.30750-2-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 6月, 2017 1 次提交
-
-
由 Dominik Brodowski 提交于
During early boot, load_ucode_intel_ap() uses __load_ucode_intel() to obtain a pointer to the relevant microcode patch (embedded in the initrd), and stores this value in 'intel_ucode_patch' to speed up the microcode patch application for subsequent CPUs. On resuming from suspend-to-RAM, however, load_ucode_ap() calls load_ucode_intel_ap() for each non-boot-CPU. By then the initramfs is long gone so the pointer stored in 'intel_ucode_patch' no longer points to a valid microcode patch. Clear that pointer so that we effectively fall back to the CPU hotplug notifier callbacks to update the microcode. Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net> [ Edit and massage commit message. ] Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.10.. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170607095819.9754-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 6月, 2017 1 次提交
-
-
由 Paolo Bonzini 提交于
native_safe_halt enables interrupts, and you just shouldn't call rcu_irq_enter() with interrupts enabled. Reorder the call with the following local_irq_disable() to respect the invariant. Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 05 6月, 2017 1 次提交
-
-
由 Christian Sünkenberg 提交于
A SoC variant of Geode GX1, notably NSC branded SC1100, seems to report an inverted Device ID in its DIR0 configuration register, specifically 0xb instead of the expected 0x4. Catch this presumably quirky version so it's properly recognized as GX1 and has its cache switched to write-back mode, which provides a significant performance boost in most workloads. SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip", states in section 1.1.7.1 "Device ID" that device identification values are specified in SC1100's device errata. These, however, seem to not have been publicly released. Wading through a number of boot logs and /proc/cpuinfo dumps found on pastebin and blogs, this patch should mostly be relevant for a number of now admittedly aging Soekris NET4801 and PC Engines WRAP devices, the latter being the platform this issue was discovered on. Performance impact was verified using "openssl speed", with write-back caching scaling throughput between -3% and +41%. Signed-off-by: NChristian Sünkenberg <christian.suenkenberg@student.kit.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 5月, 2017 2 次提交
-
-
由 Borislav Petkov 提交于
... to raw_smp_processor_id() to not trip the BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 check. The reasoning behind it is that __warn() already uses the raw_ variants but the show_regs() path on 32-bit doesn't. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170528092212.fiod7kygpjm23m3o@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
With CONFIG_DEBUG_PREEMPT enabled, I get: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is debug_smp_processor_id CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc2+ #2 Call Trace: dump_stack check_preemption_disabled debug_smp_processor_id save_microcode_in_initrd_amd ? microcode_init save_microcode_in_initrd ... because, well, it says it above, we're using smp_processor_id() in preemptible code. But passing the CPU number is not really needed. It is only used to determine whether we're on the BSP, and, if so, to save the microcode patch for early loading. [ We don't absolutely need to do it on the BSP but we do that customarily there. ] Instead, convert that function parameter to a boolean which denotes whether the patch should be saved or not, thereby avoiding the use of smp_processor_id() in preemptible code. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170528200414.31305-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 5月, 2017 2 次提交
-
-
由 Thomas Gleixner 提交于
ftrace use module_alloc() to allocate trampoline pages. The mapping of module_alloc() is RWX, which makes sense as the memory is written to right after allocation. But nothing makes these pages RO after writing to them. Add proper set_memory_rw/ro() calls to protect the trampolines after modification. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705251056410.1862@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Masami Hiramatsu 提交于
Fix kprobes to set(recover) RWX bits correctly on trampoline buffer before releasing it. Releasing readonly page to module_memfree() crash the kernel. Without this fix, if kprobes user register a bunch of kprobes in function body (since kprobes on function entry usually use ftrace) and unregister it, kernel hits a BUG and crash. Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devboxSigned-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Fixes: d0381c81 ("kprobes/x86: Set kprobes pages read-only") Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 26 5月, 2017 1 次提交
-
-
由 Jan Kiszka 提交于
This ensures that adjustments to x86_platform done by the hypervisor setup is already respected by this simple calibration. The current user of this, introduced by 1b5aeebf ("x86/earlyprintk: Add support for earlyprintk via USB3 debug port"), comes much later into play. Fixes: dd759d93 ("x86/timers: Add simple udelay calibration") Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NLu Baolu <baolu.lu@linux.intel.com> Link: http://lkml.kernel.org/r/5e89fe60-aab3-2c1c-aba8-32f8ad376189@siemens.com
-
- 24 5月, 2017 2 次提交
-
-
由 Mateusz Jurczyk 提交于
In the current form of the code, if a->replacementlen is 0, the reference to *insnbuf for comparison touches potentially garbage memory. While it doesn't affect the execution flow due to the subsequent a->replacementlen comparison, it is (rightly) detected as use of uninitialized memory by a runtime instrumentation currently under my development, and could be detected as such by other tools in the future, too (e.g. KMSAN). Fix the "false-positive" by reordering the conditions to first check the replacement instruction length before referencing specific opcode bytes. Signed-off-by: NMateusz Jurczyk <mjurczyk@google.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20170524135500.27223-1-mjurczyk@google.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Josh Poimboeuf 提交于
Dave Jones and Steven Rostedt reported unwinder warnings like the following: WARNING: kernel stack frame pointer at ffff8800bda0ff30 in sshd:1090 has bad value 000055b32abf1fa8 In both cases, the unwinder was attempting to unwind from an ftrace handler into entry code. The callchain was something like: syscall entry code C function ftrace handler save_stack_trace() The problem is that the unwinder's end-of-stack logic gets confused by the way ftrace lays out the stack frame (with fentry enabled). I was able to recreate this warning with: echo call_usermodehelper_exec_async:stacktrace > /sys/kernel/debug/tracing/set_ftrace_filter (exit login session) I considered fixing this by changing the ftrace code to rewrite the stack to make the unwinder happy. But that seemed too intrusive after I implemented it. Instead, just add another check to the unwinder's end-of-stack logic to detect this special case. Side note: We could probably get rid of these end-of-stack checks by encoding the frame pointer for syscall entry just like we do for interrupt entry. That would be simpler, but it would also be a lot more intrusive since it would slightly affect the performance of every syscall. Reported-by: NDave Jones <davej@codemonkey.org.uk> Reported-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: live-patching@vger.kernel.org Fixes: c32c47c6 ("x86/unwind: Warn on bad frame pointer") Link: http://lkml.kernel.org/r/671ba22fbc0156b8f7e0cfa5ab2a795e08bc37e1.1495553739.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 5月, 2017 4 次提交
-
-
由 Yazen Ghannam 提交于
Scalable MCA systems have a new MCA_CONFIG register that we use to configure each bank. We currently use this when we set up thresholding. However, this is logically separate. Group all SMCA-related initialization into a single function. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493147772-2721-2-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Yazen Ghannam 提交于
We have support for the new SMCA MCA_DE{STAT,ADDR} registers in Linux. So we've used these registers in place of MCA_{STATUS,ADDR} on SMCA systems. However, the guidance for current SMCA implementations of is to continue using MCA_{STATUS,ADDR} and to use MCA_DE{STAT,ADDR} only if a Deferred error was not found in the former registers. If we logged a Deferred error in MCA_STATUS then we should also clear MCA_DESTAT. This also means we shouldn't clear MCA_CONFIG[LogDeferredInMcaStat]. Rework __log_error() to only log an error and add helpers for the different error types being logged from the corresponding interrupt handlers. Boris: carve out common functionality into a _log_error_bank(). Cleanup comments, check MCi_STATUS bits before reading MSRs. Streamline flow. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493147772-2721-1-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Elena Reshetova 提交于
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Suggested-by: NKees Cook <keescook@chromium.org> Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com> Reviewed-by: NDavid Windsor <dwindsor@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1492695536-5947-1-git-send-email-elena.reshetova@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
Export the function which checks whether an MCE is a memory error to other users so that we can reuse the logic. Drop the boot_cpu_data use, while at it, as mce.cpuvendor already has the CPU vendor in there. Integrate a piece from a patch from Vishal Verma <vishal.l.verma@intel.com> to export it for modules (nfit). The main reason we're exporting it is that the nfit handler nfit_handle_mce() needs to detect a memory error properly before doing its recovery actions. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 15 5月, 2017 1 次提交
-
-
由 Wanpeng Li 提交于
Reported by syzkaller: BUG: unable to handle kernel paging request at ffffffffc07f6a2e IP: report_bug+0x94/0x120 PGD 348e12067 P4D 348e12067 PUD 348e14067 PMD 3cbd84067 PTE 80000003f7e87161 Oops: 0003 [#1] SMP CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G OE 4.11.0+ #8 task: ffff92fdfb525400 task.stack: ffffbda6c3d04000 RIP: 0010:report_bug+0x94/0x120 RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202 do_trap+0x156/0x170 do_error_trap+0xa3/0x170 ? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm] ? mark_held_locks+0x79/0xa0 ? retint_kernel+0x10/0x10 ? trace_hardirqs_off_thunk+0x1a/0x1c do_invalid_op+0x20/0x30 invalid_op+0x1e/0x30 RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm] ? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm] kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm] kvm_vcpu_ioctl+0x384/0x780 [kvm] ? kvm_vcpu_ioctl+0x384/0x780 [kvm] ? sched_clock+0x13/0x20 ? __do_page_fault+0x2a0/0x550 do_vfs_ioctl+0xa4/0x700 ? up_read+0x1f/0x40 ? __do_page_fault+0x2a0/0x550 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 SDM mentioned that "The MXCSR has several reserved bits, and attempting to write a 1 to any of these bits will cause a general-protection exception(#GP) to be generated". The syzkaller forks' testcase overrides xsave area w/ random values and steps on the reserved bits of MXCSR register. The damaged MXCSR register values of guest will be restored to SSEx MXCSR register before vmentry. This patch fixes it by catching userspace override MXCSR register reserved bits w/ random values and bails out immediately. Reported-by: NAndrey Konovalov <andreyknvl@google.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 13 5月, 2017 1 次提交
-
-
由 Andrew Morton 提交于
Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 5月, 2017 1 次提交
-
-
由 Juergen Gross 提交于
When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set on AMD cpus. This bug/feature bit is kind of special as it will be used very early when switching threads. Setting the bit and clearing it a little bit later leaves a critical window where things can go wrong. This time window has enlarged a little bit by using setup_clear_cpu_cap() instead of the hypervisor's set_cpu_features callback. It seems this larger window now makes it rather easy to hit the problem. The proper solution is to never set the bit in case of Xen. Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJuergen Gross <jgross@suse.com>
-
- 09 5月, 2017 4 次提交
-
-
由 Andy Lutomirski 提交于
This partially reverts commit: 23b2a4dd ("x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up") That commit had one definite bug and one potential bug. The definite bug is that setup_per_cpu_areas() uses a differnet generic implementation on UP kernels, so initial_page_table never got resynced. This was fine for access to percpu data (it's in the identity map on UP), but it breaks other users of initial_page_table. The potential bug is that helpers like efi_init() would be called before the tables were synced. Avoid both problems by just syncing the page tables in setup_arch() *and* setup_per_cpu_areas(). Reported-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Laura Abbott 提交于
Now that all call sites, completely decouple cacheflush.h and set_memory.h [sfr@canb.auug.org.au: kprobes/x86: merge fix for set_memory.h decoupling] Link: http://lkml.kernel.org/r/20170418180903.10300fd3@canb.auug.org.au Link: http://lkml.kernel.org/r/1488920133-27229-17-git-send-email-labbott@redhat.comSigned-off-by: NLaura Abbott <labbott@redhat.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Acked-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Laura Abbott 提交于
set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-6-git-send-email-labbott@redhat.comSigned-off-by: NLaura Abbott <labbott@redhat.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
__vmalloc* allows users to provide gfp flags for the underlying allocation. This API is quite popular $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l 77 The only problem is that many people are not aware that they really want to give __GFP_HIGHMEM along with other flags because there is really no reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages which are mapped to the kernel vmalloc space. About half of users don't use this flag, though. This signals that we make the API unnecessarily too complex. This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM are simplified and drop the flag. Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Cristopher Lameter <cl@linux.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 5月, 2017 2 次提交
-
-
由 Xunlei Pang 提交于
Kexec sets up all identity mappings before booting into the new kernel, and this will cause extra memory consumption for paging structures which is quite considerable on modern machines with huge memory sizes. E.g. on a 32TB machine that is kdumping, it could waste around 128MB (around 4MB/TB) from the reserved memory after kexec sets all the identity mappings using the current 2MB page. Add to that the memory needed for the loaded kdump kernel, initramfs, etc., and it causes a kexec syscall -NOMEM failure. As a result, we had to enlarge reserved memory via "crashkernel=X" to work around this problem. This causes some trouble for distributions that use policies to evaluate the proper "crashkernel=X" value for users. So enable gbpages for kexec mappings. Signed-off-by: NXunlei Pang <xlpang@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1493862171-8799-2-git-send-email-xlpang@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Xunlei Pang 提交于
Kernel identity mappings on x86-64 kernels are created in two ways: by the early x86 boot code, or by kernel_ident_mapping_init(). Native kernels (which is the dominant usecase) use the former, but the kexec and the hibernation code uses kernel_ident_mapping_init(). There's a subtle difference between these two ways of how identity mappings are created, the current kernel_ident_mapping_init() code creates identity mappings always using 2MB page(PMD level) - while the native kernel boot path also utilizes gbpages where available. This difference is suboptimal both for performance and for memory usage: kernel_ident_mapping_init() needs to allocate pages for the page tables when creating the new identity mappings. This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init() to address these concerns. The primary advantage would be better TLB coverage/performance, because we'd utilize 1GB TLBs instead of 2MB ones. It is also useful for machines with large number of memory to save paging structure allocations(around 4MB/TB using 2MB page) when setting identity mappings for all the memory, after using 1GB page it will consume only 8KB/TB. ( Note that this change alone does not activate gbpages in kexec, we are doing that in a separate patch. ) Signed-off-by: NXunlei Pang <xlpang@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 02 5月, 2017 4 次提交
-
-
由 Juergen Gross 提交于
There is no user of x86_hyper->set_cpu_features() any more. Remove it. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NJuergen Gross <jgross@suse.com>
-
由 Juergen Gross 提交于
There is no need to set the same capabilities for each cpu individually. This can be done for all cpus in platform initialization. Cc: Alok Kataria <akataria@vmware.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: NAlok Kataria <akataria@vmware.com> Signed-off-by: NJuergen Gross <jgross@suse.com>
-
由 Vitaly Kuznetsov 提交于
All code to support Xen PV will get under this new option. For the beginning, check for it in the common code. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NJuergen Gross <jgross@suse.com>
-
由 Vitaly Kuznetsov 提交于
As a preparation to splitting the code we need to untangle it: x86_hyper_xen -> x86_hyper_xen_hvm and x86_hyper_xen_pv xen_platform() -> xen_platform_hvm() and xen_platform_pv() xen_cpu_up_prepare() -> xen_cpu_up_prepare_pv() and xen_cpu_up_prepare_hvm() xen_cpu_dead() -> xen_cpu_dead_pv() and xen_cpu_dead_pv_hvm() Add two parameters to xen_cpuhp_setup() to pass proper cpu_up_prepare and cpu_dead hooks. xen_set_cpu_features() is now PV-only so the redundant xen_pv_domain() check can be dropped. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NJuergen Gross <jgross@suse.com>
-
- 27 4月, 2017 1 次提交
-
-
由 Shaohua Li 提交于
IOMMU harms performance signficantly when we run very fast networking workloads. It's 40GB networking doing XDP test. Software overhead is almost unaware, but it's the IOTLB miss (based on our analysis) which kills the performance. We observed the same performance issue even with software passthrough (identity mapping), only the hardware passthrough survives. The pps with iommu (with software passthrough) is only about ~30% of that without it. This is a limitation in hardware based on our observation, so we'd like to disable the IOMMU force on, but we do want to use TBOOT and we can sacrifice the DMA security bought by IOMMU. I must admit I know nothing about TBOOT, but TBOOT guys (cc-ed) think not eabling IOMMU is totally ok. So introduce a new boot option to disable the force on. It's kind of silly we need to run into intel_iommu_init even without force on, but we need to disable TBOOT PMR registers. For system without the boot option, nothing is changed. Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de>
-
- 26 4月, 2017 3 次提交
-
-
由 Andy Lutomirski 提交于
mark_screen_rdonly() is the last remaining caller of flush_tlb(). flush_tlb_mm_range() is potentially faster and isn't obsolete. Compile-tested only because I don't know whether software that uses this mechanism even exists. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
Currently unwind_dump() dumps only the most recently accessed stack. But it has a few issues. In some cases, 'first_sp' can get out of sync with 'stack_info', causing unwind_dump() to start from the wrong address, flood the printk buffer, and eventually read a bad address. In other cases, dumping only the most recently accessed stack doesn't give enough data to diagnose the error. Fix both issues by dumping *all* stacks involved in the trace, not just the last one. Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 8b5e99f0 ("x86/unwind: Dump stack data on warnings") Link: http://lkml.kernel.org/r/016d6a9810d7d1bfc87ef8c0e6ee041c6744c909.1493171120.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
Borislav Petkov reported the following unwinder warning: WARNING: kernel stack regs at ffffc9000024fea8 in udevadm:92 has bad 'bp' value 00007fffc4614d30 unwind stack type:0 next_sp: (null) mask:0x6 graph_idx:0 ffffc9000024fea8: 000055a6100e9b38 (0x55a6100e9b38) ffffc9000024feb0: 000055a6100e9b35 (0x55a6100e9b35) ffffc9000024feb8: 000055a6100e9f68 (0x55a6100e9f68) ffffc9000024fec0: 000055a6100e9f50 (0x55a6100e9f50) ffffc9000024fec8: 00007fffc4614d30 (0x7fffc4614d30) ffffc9000024fed0: 000055a6100eaf50 (0x55a6100eaf50) ffffc9000024fed8: 0000000000000000 ... ffffc9000024fee0: 0000000000000100 (0x100) ffffc9000024fee8: ffff8801187df488 (0xffff8801187df488) ffffc9000024fef0: 00007ffffffff000 (0x7ffffffff000) ffffc9000024fef8: 0000000000000000 ... ffffc9000024ff10: ffffc9000024fe98 (0xffffc9000024fe98) ffffc9000024ff18: 00007fffc4614d00 (0x7fffc4614d00) ffffc9000024ff20: ffffffffffffff10 (0xffffffffffffff10) ffffc9000024ff28: ffffffff811c6c1f (SyS_newlstat+0xf/0x10) ffffc9000024ff30: 0000000000000010 (0x10) ffffc9000024ff38: 0000000000000296 (0x296) ffffc9000024ff40: ffffc9000024ff50 (0xffffc9000024ff50) ffffc9000024ff48: 0000000000000018 (0x18) ffffc9000024ff50: ffffffff816b2e6a (entry_SYSCALL_64_fastpath+0x18/0xa8) ... It unwinded from an interrupt which came in right after entry code called into a C syscall handler, before it had a chance to set up the frame pointer, so regs->bp still had its user space value. Add a check to silence warnings in such a case, where an interrupt has occurred and regs->sp is almost at the end of the stack. Reported-by: NBorislav Petkov <bp@suse.de> Tested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: c32c47c6 ("x86/unwind: Warn on bad frame pointer") Link: http://lkml.kernel.org/r/c695f0d0d4c2cfe6542b90e2d0520e11eb901eb5.1493171120.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 21 4月, 2017 1 次提交
-
-
由 Steven Rostedt (VMware) 提交于
Fengguang Wu's zero day bot triggered a stack unwinder dump. This can be easily triggered when CONFIG_FRAME_POINTERS is enabled and -mfentry is in use on x86_32. ># cd /sys/kernel/debug/tracing ># echo 'p:schedule schedule' > kprobe_events ># echo stacktrace > events/kprobes/schedule/trigger This is because the code that implemented fentry in the ftrace_regs_caller tried to use the least amount of #ifdefs, and modified ebp when CC_USE_FENTRY was defined to point to the parent ip as it does when CC_USE_FENTRY is not defined. But when CONFIG_FRAME_POINTERS is set, it corrupts the ebp register for this frame while doing the tracing. NOTE, it does not corrupt ebp in any other way. It is just a bad frame pointer when calling into the tracing infrastructure. The original ebp is restored before returning from the fentry call. But if a stack trace is performed inside the tracing, the unwinder will notice the bad ebp. Instead of toying with ebp with CC_USING_FENTRY, just slap the parent ip into the second parameter (%edx), and have an #else that does it the original way. The unwinder will unfortunately miss the function being traced, as the stack frame is not set up yet for it, as it is for x86_64. But fixing that is a bit more complex and did not work before anyway. This has been tested with and without FRAME_POINTERS being set while using -mfentry, as well as using an older compiler that uses mcount. Analyzed-by: NJosh Poimboeuf <jpoimboe@redhat.com> Fixes: 644e0e8d ("x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set") Reported-by: Nkernel test robot <fengguang.wu@intel.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lists.01.org/pipermail/lkp/2017-April/006165.html Link: http://lkml.kernel.org/r/20170420172236.7af7f6e5@gandalf.local.homeSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-