- 19 May, 2018 5 commits
-
-
Committed by Kirill A. Shutemov
__pgtable_l5_enabled shouldn't be needed after the system has booted, so we can mark it as __initdata, but that requires preparation. KASAN initialization code is a user of USE_EARLY_PGTABLE_L5, so every pgtable_l5_enabled() there translates to __pgtable_l5_enabled, including the one in p4d_offset(). This may lead to a section mismatch if the compiler does not inline p4d_offset() but leaves it as a standalone function: p4d_offset() is not marked as __init. Marking p4d_offset() as __always_inline fixes the issue. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Kirill A. Shutemov
This kernel parameter forces the kernel to use 4-level paging even if both the hardware and the kernel support 5-level paging. The option may be useful to work around regressions related to 5-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Kirill A. Shutemov
pgtable_l5_enabled is defined using cpu_feature_enabled(), but we refer to it as a variable. This is misleading. Make pgtable_l5_enabled() a function. We cannot literally define it as a function due to circular dependencies between header files. A function-like macro is close enough. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Kirill A. Shutemov
Usually pgtable_l5_enabled is defined using cpu_feature_enabled(). cpu_feature_enabled() is not available in early boot code. We use several different preprocessor tricks to get around it. It's messy. Unify them all. If cpu_feature_enabled() is not yet available, USE_EARLY_PGTABLE_L5 can be defined before all includes. It makes pgtable_l5_enabled rely on the __pgtable_l5_enabled variable instead. This approach fits all early users. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
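The switch described above can be sketched in a few lines of C. This is only an illustration of the preprocessor pattern, not the kernel's headers: `demo_early_l5_flag` stands in for `__pgtable_l5_enabled`, `demo_feature_l5()` stands in for `cpu_feature_enabled()`, and `demo_paging_levels()` is a hypothetical caller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the __pgtable_l5_enabled variable. */
bool demo_early_l5_flag = true;

/* Hypothetical stand-in for cpu_feature_enabled(); assumed unusable early. */
bool demo_feature_l5(void)
{
    return false;
}

/* An early-boot file would define this before any of its includes. */
#define USE_EARLY_PGTABLE_L5

#ifdef USE_EARLY_PGTABLE_L5
/* Early users read the plain variable. */
#define pgtable_l5_enabled() (demo_early_l5_flag)
#else
/* Everyone else goes through the CPU feature check. */
#define pgtable_l5_enabled() (demo_feature_l5())
#endif

/* Hypothetical caller: it always uses the function-like spelling. */
int demo_paging_levels(void)
{
    return pgtable_l5_enabled() ? 5 : 4;
}
```

Because callers always write `pgtable_l5_enabled()`, the same source works in both early and late code, and only the `#define` placed before the includes decides which backing store is consulted.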
-
Committed by Kirill A. Shutemov
Hugh noticed that we calculate the address of the trampoline page table incorrectly in cleanup_trampoline(). TRAMPOLINE_32BIT_PGTABLE_OFFSET has to be divided by sizeof(unsigned long), since trampoline_32bit is an 'unsigned long' pointer. TRAMPOLINE_32BIT_PGTABLE_OFFSET is zero, so the bug doesn't have a visible effect. Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: e9d0e633 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline") Link: http://lkml.kernel.org/r/20180518103528.59260-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
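The pointer-arithmetic issue can be shown in isolation. The sketch below is not the code from cleanup_trampoline(); `DEMO_PGTABLE_OFFSET` is a hypothetical non-zero byte offset (the real offset is zero, which is why the bug was invisible), and both helper names are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical byte offset; the real TRAMPOLINE_32BIT_PGTABLE_OFFSET is 0. */
#define DEMO_PGTABLE_OFFSET 4096

/*
 * Buggy form: adding a byte offset to an 'unsigned long' pointer.
 * C scales the addend by sizeof(unsigned long), so the result lands
 * sizeof(unsigned long) times too far from the base.
 */
unsigned long *demo_pgtable_buggy(unsigned long *base)
{
    return base + DEMO_PGTABLE_OFFSET;
}

/* Fixed form: convert the byte offset into an element count first. */
unsigned long *demo_pgtable_fixed(unsigned long *base)
{
    return base + DEMO_PGTABLE_OFFSET / sizeof(unsigned long);
}
```

With an offset of zero both functions return `base`, which matches the commit's remark that the bug has no visible effect today.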
-
- 18 May, 2018 1 commit
-
-
Committed by Thomas Gleixner
Rick bisected a regression on large systems which use the x2apic cluster mode for interrupt delivery to the commit which reworked the cluster management. The problem is caused by a missing initialization of the clusterid field in the shared cluster data structures. So all structures end up with cluster ID 0, which only allows sharing between the CPUs which belong to cluster 0. All other CPUs with a cluster ID > 0 cannot share the data structure because they cannot find existing data with their cluster ID. This causes IPIs to malfunction because they are sent to the wrong cluster and the caller waits forever for the target CPU to handle the IPI. Add the missing initialization when an upcoming CPU is the first in a cluster, so that the later booting CPUs can find the data and share it for proper operation. Fixes: 023a6117 ("x86/apic/x2apic: Simplify cluster management") Reported-by: Rick Warner <rick@microway.com> Bisected-by: Rick Warner <rick@microway.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Rick Warner <rick@microway.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805171418210.1947@nanos.tec.linutronix.de
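A simplified model of the lookup makes the failure mode easier to see. Everything below is hypothetical (the structure, the fixed-size table, and `demo_cluster_get()`); it only mirrors the idea that a CPU searches for existing data by cluster ID and, if it is the first CPU of its cluster, must record that ID in the new entry.

```c
#include <stddef.h>

/* Hypothetical shared per-cluster data; not the kernel's structure. */
struct demo_cluster {
    unsigned int clusterid;
    unsigned int ncpus;
    int in_use;
};

#define DEMO_MAX_CLUSTERS 8
static struct demo_cluster demo_clusters[DEMO_MAX_CLUSTERS];

/* Find the data for 'clusterid', or claim a free slot for it. */
struct demo_cluster *demo_cluster_get(unsigned int clusterid)
{
    struct demo_cluster *free_slot = NULL;
    size_t i;

    for (i = 0; i < DEMO_MAX_CLUSTERS; i++) {
        struct demo_cluster *c = &demo_clusters[i];

        if (c->in_use && c->clusterid == clusterid) {
            c->ncpus++;          /* a later CPU shares existing data */
            return c;
        }
        if (!c->in_use && !free_slot)
            free_slot = c;
    }
    if (!free_slot)
        return NULL;

    free_slot->in_use = 1;
    /* The step the fix adds: without it every entry keeps ID 0. */
    free_slot->clusterid = clusterid;
    free_slot->ncpus = 1;
    return free_slot;
}
```

If the assignment to `clusterid` is dropped, a second CPU of cluster 3 never matches the entry the first one created, which is the sharing failure the commit describes.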
-
- 16 May, 2018 2 commits
-
-
Committed by Kirill A. Shutemov
cleanup_trampoline() relocates the top-level page table out of trampoline memory. We use 'top_pgtable' as our new top-level page table. But if 'top_pgtable' is referenced from C in the usual way, the address of the table is calculated relative to RIP. After the kernel gets relocated, the address will be in the middle of the decompression buffer and the page table may get overwritten. This leads to a crash. We calculate the address of the other page tables relative to the relocation address. That makes them safe. We should do the same for 'top_pgtable'. Calculate the address of 'top_pgtable' in assembly and pass it down to cleanup_trampoline(). Move the page table to the .pgtable section where the rest of the page tables are. The section is @nobits, so we save 4k in the kernel image. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: e9d0e633 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline") Link: http://lkml.kernel.org/r/20180516080131.27913-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Kirill A. Shutemov
Eric and Hugh have reported an instant reboot due to my recent changes in the decompression code. The root cause is that I didn't realize that we need to adjust the GOT to be able to run C code that early. The problem is only visible with an older toolchain. Binutils >= 2.24 is able to eliminate GOT references by replacing them with RIP-relative address loads: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=80d873266dec We need to adjust the GOT two times: (1) before calling paging_prepare(), using the initial load address; (2) before calling C code from the relocated kernel. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 194a9749 ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G") Link: http://lkml.kernel.org/r/20180516080131.27913-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 14 May, 2018 7 commits
-
-
Committed by Dave Hansen
mm_pkey_is_allocated() treats pkey 0 as unallocated. That is inconsistent with the manpages, and also inconsistent with mm->context.pkey_allocation_map. Stop special-casing it and only disallow values that are actually bad (< 0). The end-user visible effect of this is that you can now use mprotect_pkey() to set pkey=0. This is a bit nicer than what Ram proposed [1] because it is simpler and removes special-casing for pkey 0. On the other hand, it does allow applications to pkey_free() pkey-0, but that's just a silly thing to do, so we are not going to protect against it. The scenario that could happen is similar to what happens if you free any other pkey that is in use: it might get reallocated later and used to protect some other data. The most likely scenario is that pkey-0 comes back from pkey_alloc(), an access-disable or write-disable bit is set in PKRU for it, and the next stack access will SIGSEGV. It's not horribly different from if you mprotect()'d your stack or heap to be unreadable or unwritable, which is generally very foolish, but also not explicitly prevented by the kernel. [1] http://lkml.kernel.org/r/1522112702-27853-1-git-send-email-linuxram@us.ibm.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Fixes: 58ab9a08 ("x86/pkeys: Check against max pkey to avoid overflows") Link: http://lkml.kernel.org/r/20180509171358.47FD785E@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
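The behaviour after the change can be approximated with a small bitmap check. This is a sketch under assumptions, not the kernel function: `demo_pkey_is_allocated()` and `DEMO_MAX_PKEY` are hypothetical names, the allocation map is passed in as a plain integer, and the limit of 16 keys is assumed for illustration.

```c
#include <stdbool.h>

/* Assumed number of protection keys for this illustration. */
#define DEMO_MAX_PKEY 16

/*
 * Returns true if 'pkey' is marked in the allocation bitmap.
 * pkey 0 gets no special treatment: only negative or too-large
 * values are rejected up front.
 */
bool demo_pkey_is_allocated(unsigned int allocation_map, int pkey)
{
    if (pkey < 0 || pkey >= DEMO_MAX_PKEY)
        return false;
    return (allocation_map & (1u << pkey)) != 0;
}
```

With a map of `0x1`, key 0 now reports as allocated, which is what lets a caller pass pkey=0 as described in the commit message.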
-
Committed by Dave Hansen
I got a bug report that the following code (roughly) was causing a SIGSEGV:

  mprotect(ptr, size, PROT_EXEC);
  mprotect(ptr, size, PROT_NONE);
  mprotect(ptr, size, PROT_READ);
  *ptr = 100;

The problem is hit when mprotect(PROT_EXEC) implicitly assigns a protection key to the VMA and makes that key ACCESS_DENY|WRITE_DENY. The PROT_NONE mprotect() failed to remove the protection key, and the PROT_NONE->PROT_READ transition left the PTE usable, but with the pkey still in place, which left the memory inaccessible. To fix this, we ensure that we always "override" the pkey at mprotect() if the VMA does not have execute-only permissions, but the VMA has the execute-only pkey. We had a check for PROT_READ/WRITE, but it did not work for PROT_NONE. This entirely removes the PROT_* checks, which ensures that PROT_NONE now works. Reported-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Fixes: 62b5f7d0 ("mm/core, x86/mm/pkeys: Add execute-only protection keys support") Link: http://lkml.kernel.org/r/20180509171351.084C5A71@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alexander Potapenko
Clang builds with defconfig started crashing after the following commit: fb43d6cb ("x86/mm: Do not auto-massage page protections") This was caused by introducing a new global access in __startup_64(). Code in __startup_64() can be relocated during execution, but the compiler doesn't have to generate PC-relative relocations when accessing globals from that function. Clang actually does not generate them, which leads to boot-time crashes. To work around this problem, every global pointer must be adjusted using fixup_pointer(). Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dvyukov@google.com Cc: kirill.shutemov@linux.intel.com Cc: linux-mm@kvack.org Cc: md@google.com Cc: mka@chromium.org Fixes: fb43d6cb ("x86/mm: Do not auto-massage page protections") Link: http://lkml.kernel.org/r/20180509091822.191810-1-glider@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Alexei Starovoitov
Work around a build failure in BPF compilation, which uses kernel headers: clang does not support ASM GOTO and fails the build. Fixes: d0266046 ("x86: Remove FAST_FEATURE_TESTS") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: daniel@iogearbox.net Cc: peterz@infradead.org Cc: netdev@vger.kernel.org Cc: bp@alien8.de Cc: yhs@fb.com Cc: kernel-team@fb.com Cc: torvalds@linux-foundation.org Cc: davem@davemloft.net Link: https://lkml.kernel.org/r/20180513193222.1997938-1-ast@kernel.org
-
Committed by Masami Hiramatsu
Since the MOV SS and POP SS instructions delay exceptions until the next instruction is executed, single-stepping on them by uprobes must be prohibited. uprobe already rejects probing on POP SS (0x1f), but allows probing on MOV SS (0x8e and reg == 2). This checks the target instruction and, if it is MOV SS or POP SS, returns -ENOTSUPP to reject probing. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Francis Deslauriers <francis.deslauriers@efficios.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Yonghong Song <yhs@fb.com> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S . Miller" <davem@davemloft.net> Link: https://lkml.kernel.org/r/152587072544.17316.5950935243917346341.stgit@devbox
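The opcode test mentioned in the message can be sketched as a standalone predicate. This is an illustration only: `demo_modifies_ss()` is a hypothetical helper that takes an already-decoded primary opcode byte and ModRM byte, and it ignores prefixes and everything else a real instruction decoder handles.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true for the two instructions named in the commit:
 *   POP SS            : opcode 0x1f
 *   MOV Sreg, r/m16   : opcode 0x8e with ModRM.reg == 2 (SS)
 * The ModRM reg field occupies bits 5..3 of the ModRM byte.
 */
bool demo_modifies_ss(uint8_t opcode, uint8_t modrm)
{
    if (opcode == 0x1f)
        return true;
    if (opcode == 0x8e)
        return ((modrm >> 3) & 0x7) == 2;
    return false;
}
```

A caller modelled on the commit would reject the probe (the message names -ENOTSUPP) whenever this predicate is true for the target instruction.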
-
Committed by Masami Hiramatsu
Since the MOV SS and POP SS instructions delay exceptions until the next instruction is executed, single-stepping on them by kprobes must be prohibited. However, kprobes usually executes those instructions directly on the trampoline buffer (a.k.a. kprobe-booster), except for kprobes which have a post_handler. Thus, if a kprobe user probes MOV SS with a post_handler, it will single-step on the MOV SS. This means it is safe when used via ftrace or perf/bpf, since those don't use the post_handler. Anyway, since stack switching is a rare case, it is safer to just reject kprobes on such instructions. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Francis Deslauriers <francis.deslauriers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Yonghong Song <yhs@fb.com> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S . Miller" <davem@davemloft.net> Link: https://lkml.kernel.org/r/152587069574.17316.3311695234863248641.stgit@devbox
-
Committed by Tetsuo Handa
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Wed, 9 May 2018 12:12:39 +0900 Subject: [PATCH v3] x86/kexec: avoid double free_page() upon do_kexec_load() failure. syzbot is reporting crashes after a memory allocation failure inside do_kexec_load() [1]. This is because free_transition_pgtable() is called by both init_transition_pgtable() and machine_kexec_cleanup() when memory allocation fails inside init_transition_pgtable(). In the 32-bit code, machine_kexec_free_page_tables() is called by both machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory allocation fails inside machine_kexec_alloc_page_tables(). Fix this by leaving the error handling to machine_kexec_cleanup() (and optionally setting the pointer to NULL after free_page()). [1] https://syzkaller.appspot.com/bug?id=91e52396168cf2bdd572fe1e1bc0bc645c1c6b40 Fixes: f5deb796 ("x86: kexec: Use one page table in x86_64 machine_kexec") Fixes: 92be3d6b ("kexec/i386: allocate page table pages dynamically") Reported-by: syzbot <syzbot+d96f60296ef613fe1d69@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Baoquan He <bhe@redhat.com> Cc: thomas.lendacky@amd.com Cc: prudo@linux.vnet.ibm.com Cc: Huang Ying <ying.huang@intel.com> Cc: syzkaller-bugs@googlegroups.com Cc: takahiro.akashi@linaro.org Cc: H. Peter Anvin <hpa@zytor.com> Cc: akpm@linux-foundation.org Cc: dyoung@redhat.com Cc: kirill.shutemov@linux.intel.com Link: https://lkml.kernel.org/r/201805091942.DGG12448.tMFVFSJFQOOLHO@I-love.SAKURA.ne.jp
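The ownership rule the fix adopts (the init path does not free on failure; a single cleanup routine frees and clears each pointer) can be modelled in user space. All names below are hypothetical, `malloc()`/`free()` stand in for the kernel's page allocator, and the `fail_second` flag simply simulates an allocation failure.

```c
#include <stdlib.h>

/* Hypothetical image with two separately allocated tables. */
struct demo_image {
    void *pgd;
    void *pud;
};

/* Counts real frees so a double free would be observable. */
int demo_free_calls;

/* Free one page and clear the pointer so a repeat call is harmless. */
static void demo_free_page(void **page)
{
    if (*page) {
        free(*page);
        *page = NULL;
        demo_free_calls++;
    }
}

/* The single owner of error handling and teardown. */
void demo_cleanup(struct demo_image *img)
{
    demo_free_page(&img->pgd);
    demo_free_page(&img->pud);
}

/* On failure this returns without freeing; demo_cleanup() does that. */
int demo_init(struct demo_image *img, int fail_second)
{
    img->pgd = malloc(64);
    if (!img->pgd)
        return -1;
    img->pud = fail_second ? NULL : malloc(64);
    if (!img->pud)
        return -1;
    return 0;
}
```

Calling `demo_cleanup()` after a failed `demo_init()`, even twice, frees the first allocation exactly once, because the pointer is cleared after the free.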
-
- 13 May, 2018 5 commits
-
-
Committed by Thomas Gleixner
There is no point in having it at the call sites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by David Wang
Centaur CPUs enumerate the cache topology in the same way as Intel CPUs, but the function is unused so far. The Centaur init code also fails to initialize x86_info::max_cores, so the CPU topology can't be described correctly. Initialize x86_info::max_cores and invoke init_cacheinfo() to make CPU and cache topology information available and correct. Signed-off-by: David Wang <davidwang@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: lukelin@viacpu.com Cc: qiyuanwang@zhaoxin.com Cc: gregkh@linuxfoundation.org Cc: brucechang@via-alliance.com Cc: timguo@zhaoxin.com Cc: cooperyan@zhaoxin.com Cc: hpa@zytor.com Cc: benjaminpan@viatech.com Link: https://lkml.kernel.org/r/1525314766-18910-4-git-send-email-davidwang@zhaoxin.com
-
Committed by David Wang
There is no point in having the conditional cpu_detect_cache_sizes() call at the call site of init_intel_cacheinfo(). Move it into init_intel_cacheinfo() and make init_intel_cacheinfo() void. [ tglx: Made init_intel_cacheinfo() void as the return value was pointless. Adjusted the changelog accordingly ] Signed-off-by: David Wang <davidwang@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: lukelin@viacpu.com Cc: qiyuanwang@zhaoxin.com Cc: gregkh@linuxfoundation.org Cc: brucechang@via-alliance.com Cc: timguo@zhaoxin.com Cc: cooperyan@zhaoxin.com Cc: hpa@zytor.com Cc: benjaminpan@viatech.com Link: https://lkml.kernel.org/r/1525314766-18910-3-git-send-email-davidwang@zhaoxin.com
-
Committed by David Wang
intel_num_cpu_cores() is a static function in intel.c which can't be used by other files. Define another function called detect_num_cpu_cores() in common.c to replace it so that it can be reused. Signed-off-by: David Wang <davidwang@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: lukelin@viacpu.com Cc: qiyuanwang@zhaoxin.com Cc: gregkh@linuxfoundation.org Cc: brucechang@via-alliance.com Cc: timguo@zhaoxin.com Cc: cooperyan@zhaoxin.com Cc: hpa@zytor.com Cc: benjaminpan@viatech.com Link: https://lkml.kernel.org/r/1525314766-18910-2-git-send-email-davidwang@zhaoxin.com
-
Committed by Thomas Gleixner
There is no point in exposing all these functions globally, as they are strictly local to the CPU management code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 08 May, 2018 1 commit
-
-
Committed by van der Linden, Frank
This patch fixes crashes during boot for HVM guests on older (pre HVM vector callback) Xen versions. Without this, current kernels will always fail to boot on those Xen versions. Sample stack trace:

  BUG: unable to handle kernel paging request at ffffffffff200000
  IP: __xen_evtchn_do_upcall+0x1e/0x80
  PGD 1e0e067 P4D 1e0e067 PUD 1e10067 PMD 235c067 PTE 0
  Oops: 0002 [#1] SMP PTI
  Modules linked in:
  CPU: 0 PID: 512 Comm: kworker/u2:0 Not tainted 4.14.33-52.13.amzn1.x86_64 #1
  Hardware name: Xen HVM domU, BIOS 3.4.3.amazon 11/11/2016
  task: ffff88002531d700 task.stack: ffffc90000480000
  RIP: 0010:__xen_evtchn_do_upcall+0x1e/0x80
  RSP: 0000:ffff880025403ef0 EFLAGS: 00010046
  RAX: ffffffff813cc760 RBX: ffffffffff200000 RCX: ffffc90000483ef0
  RDX: ffff880020540a00 RSI: ffff880023c78000 RDI: 000000000000001c
  RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff880025403f5c R14: 0000000000000000 R15: 0000000000000000
  FS: 0000000000000000(0000) GS:ffff880025400000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffff200000 CR3: 0000000001e0a000 CR4: 00000000000006f0
  Call Trace:
  <IRQ>
  do_hvm_evtchn_intr+0xa/0x10
  __handle_irq_event_percpu+0x43/0x1a0
  handle_irq_event_percpu+0x20/0x50
  handle_irq_event+0x39/0x60
  handle_fasteoi_irq+0x80/0x140
  handle_irq+0xaf/0x120
  do_IRQ+0x41/0xd0
  common_interrupt+0x7d/0x7d
  </IRQ>

During boot, the HYPERVISOR_shared_info page gets remapped to make it work with KASLR. This means that any pointer derived from it needs to be adjusted. The only value that this applies to is the vcpu_info pointer for VCPU 0. For PV and HVM with the callback vector feature, this gets done via the smp_ops prepare_boot_cpu callback. Older Xen versions do not support the HVM callback vector, so there is no Xen-specific smp_ops set up in that scenario. So, the vcpu_info pointer for VCPU 0 never gets set to the proper value, and the first reference to it will be bad. Fix this by resetting it immediately after the remap. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Reviewed-by: Eduardo Valentin <eduval@amazon.com> Reviewed-by: Alakesh Haloi <alakeshh@amazon.com> Reviewed-by: Vallish Vaidyeshwara <vallish@amazon.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 06 May, 2018 7 commits
-
-
Committed by Suravee Suthikulpanit
Derive topology information from Extended Topology Enumeration (CPUID function 0xB) when the information is available. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1524865681-112110-3-git-send-email-suravee.suthikulpanit@amd.com
-
Committed by Suravee Suthikulpanit
The current implementation does not communicate whether it can successfully detect CPUID function 0xB information. Therefore, modify the function to return success or error codes. This will be used by subsequent patches. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1524865681-112110-2-git-send-email-suravee.suthikulpanit@amd.com
-
Committed by Suravee Suthikulpanit
Last Level Cache ID can be calculated from the number of threads sharing the cache, which is available from CPUID Fn0x8000001D (Cache Properties). This is used to left-shift the APIC ID to derive LLC ID. Therefore, default to this method unless the APIC ID enumeration does not follow the scheme. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1524864877-111962-5-git-send-email-suravee.suthikulpanit@amd.com
-
Committed by Borislav Petkov
Since this file contains general cache-related information for x86, rename the file to a more generic name. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1524864877-111962-4-git-send-email-suravee.suthikulpanit@amd.com
-
Committed by Suravee Suthikulpanit
The current logic iterates over the CPUID Fn8000001d leaves (Cache Properties) to detect the last level cache and derive the last-level cache ID. However, this information is already available in cpu_llc_id. Therefore, make use of it instead. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Link: http://lkml.kernel.org/r/1524864877-111962-3-git-send-email-suravee.suthikulpanit@amd.com
-
Committed by Borislav Petkov
Move smp_num_siblings and cpu_llc_id to cpu/common.c so that they're always present as symbols and not only in the CONFIG_SMP case. Then, other code using them doesn't need ugly ifdeffery anymore. Get rid of some ifdeffery. Signed-off-by: Borislav Petkov <bpetkov@suse.de> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1524864877-111962-2-git-send-email-suravee.suthikulpanit@amd.com
-
Committed by Anthoine Bourgeois
Since the commit "8003c9ae: add APIC Timer periodic/oneshot mode VMX preemption timer support", a Windows 10 guest has some erratic timer spikes. Here are the results for 150000 iterations of a 1ms timer without any load:

              Before 8003c9ae | After 8003c9ae
  Max              1834us     |   86000us
  Mean             1100us     |    1021us
  Deviation          59us     |     149us

Here are the results for 150000 iterations of a 1ms timer with a cpu-z stress test:

              Before 8003c9ae | After 8003c9ae
  Max             32000us     |  140000us
  Mean             1006us     |    1997us
  Deviation         140us     |   11095us

The root cause of the problem is that starting an hrtimer with an expiry time already in the past can take more than 20 milliseconds to trigger the timer function. It can be solved by forwarding such past timers immediately, rather than submitting them to hrtimer_start(). If the timer is periodic, update the target expiration and call hrtimer_start() with it. v2: Check if the tsc deadline is already expired. Thank you Mika. v3: Execute the past timers immediately rather than submitting them to hrtimer_start(). v4: Rearm the periodic timer with advance_periodic_target_expiration(), a simpler version of set_target_expiration(). Thank you Paolo. Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com> 8003c9ae ("KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
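The decision described in the message (fire an already-expired timer at once instead of arming it, and for a periodic timer advance the target before arming) can be sketched with a small state machine. This is a model under assumptions, not KVM's LAPIC code: the structure, its counters, and `demo_start_timer()` are hypothetical, and "arming" is only counted rather than handed to a real hrtimer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timer state; counters record what would have happened. */
struct demo_timer {
    uint64_t target_ns;   /* absolute expiry time */
    uint64_t period_ns;   /* used only when periodic */
    bool periodic;
    unsigned int fired;   /* times the timer function ran immediately */
    unsigned int armed;   /* times the timer was handed to the timer core */
};

/* Returns true if the timer was already expired and fired immediately. */
bool demo_start_timer(struct demo_timer *t, uint64_t now_ns)
{
    if (t->target_ns > now_ns) {
        t->armed++;                    /* still in the future: arm normally */
        return false;
    }

    t->fired++;                        /* expiry is in the past: run it now */
    if (t->periodic) {
        t->target_ns += t->period_ns;  /* advance to the next period */
        t->armed++;                    /* and arm with the new target */
    }
    return true;
}
```

A one-shot timer whose deadline has passed is therefore never submitted to the timer core at all, which is the path the commit identifies as the source of the multi-millisecond delay.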
-
- 03 May, 2018 2 commits
-
-
Committed by Daniel Borkmann
The JIT logic in jit_subprogs() is as follows: for all subprogs we allocate a bpf_prog_alloc(), populate it (prog->is_func = 1 here), and pass it to bpf_int_jit_compile(). If a failure occurred during JIT and prog->jited is not set, then we bail out from attempting to JIT the whole program, and punt to the interpreter instead. If JITing was successful, we fix up the BPF call offsets and do another pass to bpf_int_jit_compile() (extra_pass is true at that point) to complete JITing calls. Given that this requires passing the JIT context around, addrs and jit_data from the x86 JIT are freed in the extra_pass in bpf_int_jit_compile() when calls are involved (if not, they can be freed immediately). However, if the JIT image didn't converge in the original pass, then we leak addrs and jit_data, since the image itself is NULL, prog->is_func is set and extra_pass is false in that case, meaning both become unreachable and are never cleaned up. Therefore we need to free them on !image as well. Only the x64 JIT is affected. Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Committed by Daniel Borkmann
While reviewing the x64 JIT code, I noticed that we leak the previously allocated JIT image in the case where proglen != oldproglen during the JIT passes. Prior to commit e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") we would just break out of the loop and use the image as the JITed prog, since it could only shrink in size anyway. After e0ee9c12, we would bail out to the out_addrs label, where we free addrs and jit_data but not the image coming from bpf_jit_binary_alloc(). Fixes: e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 02 May, 2018 3 commits
-
-
Committed by Thomas Gleixner
The recent commit which addresses the x86_phys_bits corruption with encrypted memory on CPUID reload after a microcode update lost the reload of CPUID_8000_0008_EBX as well. As a consequence, IBRS and IBRS_FW are no longer detected. Restore the behaviour by bringing the reload of CPUID_8000_0008_EBX back. This restore has a twist due to the convoluted way the cpuid analysis works: CPUID_8000_0008_EBX is used by AMD to enumerate IBPB, IBRS, STIBP. On Intel, EBX is not used. But the speculation control code sets the AMD bits when running on Intel, depending on the Intel-specific speculation control bits. This was done to use the same bits for alternatives. The change which moved the 8000_0008 evaluation out of get_cpu_cap() broke this nasty scheme due to ordering, so that on Intel the store to CPUID_8000_0008_EBX clears the IBPB, IBRS, STIBP bits which had been set before by software. So the actual CPUID_8000_0008_EBX needs to go back to the place where it was, and the phys/virt address space calculation cannot touch it. In hindsight this should have used completely synthetic bits for IBPB, IBRS, STIBP instead of reusing the AMD bits, but that's for 4.18. /me needs to find time to cleanup that steaming pile of ... Fixes: d94a155c ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") Reported-by: Jörg Otte <jrg.otte@gmail.com> Reported-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jörg Otte <jrg.otte@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: kirill.shutemov@linux.intel.com Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805021043510.1668@nanos.tec.linutronix.de
-
Committed by Peter Zijlstra
mark_tsc_unstable() also needs to affect tsc_early. Now that clocksource_mark_unstable() can be used on a clocksource irrespective of its registration state, use it on both tsc_early and tsc. This does, however, require cs->list to be initialized empty, otherwise it cannot tell the registration state before registration. Fixes: aa83c457 ("x86/tsc: Introduce early tsc clocksource") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Diego Viola <diego.viola@gmail.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: rui.zhang@intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180430100344.533326547@infradead.org
-
Committed by Peter Zijlstra
Don't leave the tsc-early clocksource registered if it errors out early. This was reported by Diego, whose Core2-era machine got its TSC invalidated while it was running with tsc-early (due to C-states). This results in keeping tsc-early, with very bad effects. Reported-and-Tested-by: Diego Viola <diego.viola@gmail.com> Fixes: aa83c457 ("x86/tsc: Introduce early tsc clocksource") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: diego.viola@gmail.com Cc: rui.zhang@intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180430100344.350507853@infradead.org
-
- 28 April, 2018 1 commit
-
-
Committed by KarimAllah Ahmed
Move the DISABLE_EXITS KVM capability bits to the UAPI, just like the rest of the capabilities. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- 27 Apr, 2018 5 commits
-
-
Committed by Junaid Shahid
Currently, KVM flushes the TLB after a change to the APIC access page address or the APIC mode when EPT mode is enabled. However, even in shadow paging mode, a TLB flush is needed if VPIDs are being used, as specified in the Intel SDM Section 29.4.5.

So replace vmx_flush_tlb_ept_only() with vmx_flush_tlb(), which will flush if either EPT or VPIDs are in use.

Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
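The difference between the two flush conditions comes down to one extra term. The following sketch only models the decision described above. `struct vcpu_cfg` and both predicate names are hypothetical, and the real vmx_flush_tlb() does considerably more than return a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the two settings that matter for this decision. */
struct vcpu_cfg {
    bool ept_enabled;
    bool vpid_in_use;
};

/* Old condition: flush only when EPT is enabled. */
static bool needs_flush_ept_only(const struct vcpu_cfg *cfg)
{
    assert(cfg != NULL);
    return cfg->ept_enabled;
}

/* New condition: flush when either EPT or VPIDs are in use. */
static bool needs_flush_ept_or_vpid(const struct vcpu_cfg *cfg)
{
    assert(cfg != NULL);
    return cfg->ept_enabled || cfg->vpid_in_use;
}
```

The case the message is about is shadow paging with VPIDs (`ept_enabled` false, `vpid_in_use` true): the old predicate skips the flush there and the new one does not.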
-
Committed by Andy Lutomirski
32-bit user code that uses int $0x80 doesn't care about r8-r11. There is, however, some 64-bit user code that intentionally uses int $0x80 to invoke 32-bit system calls. From what I've seen, basically all such code assumes that r8-r15 are all preserved, but the kernel clobbers r8-r11. Since I doubt that there's any code that depends on int $0x80 zeroing r8-r11, change the kernel to preserve them.

I suspect that very little user code is broken by the old clobber, since r8-r11 are only rarely allocated by gcc, and they're clobbered by function calls, so the only way we'd see a problem is if the same function that invokes int $0x80 also spills something important to one of these registers.

The current behavior seems to date back to the historical commit "[PATCH] x86-64 merge for 2.6.4". Before that, all regs were preserved. I can't find any explanation of why this change was made.

Update the test_syscall_vdso_32 testcase as well to verify the new behavior, and strengthen the test to make sure that the kernel doesn't accidentally permute r8..r15.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/d4c4d9985fbe64f8c9e19291886453914b48caee.1523975710.git.luto@kernel.org
-
Committed by Arnd Bergmann
A bugfix broke the x32 shmid64_ds and msqid64_ds data structure layout (as seen from user space) a few years ago: Originally, __BITS_PER_LONG was defined as 64 on x32, so we did not have padding after the 64-bit __kernel_time_t fields. After __BITS_PER_LONG got changed to 32, applications would observe extra padding.

In other parts of the uapi headers we seem to have a mix of those expecting either 32 or 64 on x32 applications, so we can't easily revert the patch that broke these two structures.

Instead, this patch decouples x32 from the other architectures and moves it back into arch specific headers, partially reverting the even older commit 73a2d096 ("x86: remove all now-duplicate header files").

It's not clear whether this ever made any difference, since at least glibc carries its own (correct) copy of both of these header files, so possibly no application has ever observed the definitions here.

Based on a suggestion from H.J. Lu, I tried out the tool from https://github.com/hjl-tools/linux-header to find other such bugs, which pointed out the same bug in statfs(), which also has a separate (correct) copy in glibc.

Fixes: f4b4aae1 ("x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . J . Lu" <hjl.tools@gmail.com>
Cc: Jeffrey Walton <noloader@gmail.com>
Cc: stable@vger.kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180424212013.3967461-1-arnd@arndb.de
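The layout change can be illustrated with two simplified structures. This is a sketch under assumptions: the struct and field names are hypothetical, only the time fields are modelled, and fixed-width types stand in for __kernel_time_t (64 bits on x32) and for the 32-bit filler word that appears when __BITS_PER_LONG is taken as 32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(int64_t) == 8, "model assumes 64-bit time fields");

/* Layout with __BITS_PER_LONG taken as 64: no filler after the time fields. */
struct times_bpl64 {
    int64_t atime;
    int64_t dtime;
    int64_t ctime;
};

/* Layout with __BITS_PER_LONG taken as 32: a 32-bit filler follows each field. */
struct times_bpl32 {
    int64_t  atime;
    uint32_t unused1;
    int64_t  dtime;
    uint32_t unused2;
    int64_t  ctime;
    uint32_t unused3;
};

/* Offset of the second time field in each variant. */
static size_t dtime_offset_bpl64(void)
{
    return offsetof(struct times_bpl64, dtime);
}

static size_t dtime_offset_bpl32(void)
{
    return offsetof(struct times_bpl32, dtime);
}
```

Every field after the first one lands at a different offset in the two variants, so user space and kernel disagree about the structure as soon as they pick different values for __BITS_PER_LONG.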
-
Committed by Petr Tesarik
Xen PV domains cannot shut down and start a crash kernel. Instead, the crashing kernel makes a SCHEDOP_shutdown hypercall with the reason code SHUTDOWN_crash, cf. xen_crash_shutdown() machine op in arch/x86/xen/enlighten_pv.c.

A crash kernel reservation is merely a waste of RAM in this case. It may also confuse users of kexec_load(2) and/or kexec_file_load(2). When flags include KEXEC_ON_CRASH or KEXEC_FILE_ON_CRASH, respectively, these syscalls return success, which is technically correct, but the crash kexec image will never be actually used.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: xen-devel@lists.xenproject.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jean Delvare <jdelvare@suse.de>
Link: https://lkml.kernel.org/r/20180425120835.23cef60c@ezekiel.suse.cz
-
Committed by jacek.tomaka@poczta.fm
Make kernel print the correct number of TLB entries on Intel Xeon Phi 7210 (and others).

Before:
[ 0.320005] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0

After:
[ 0.320005] Last level dTLB entries: 4KB 256, 2MB 128, 4MB 128, 1GB 16

The entries do exist in the official Intel SDM. The type column there is incorrect (it states "Cache" where it should read "TLB"), but the entries for the values 0x6B, 0x6C and 0x6D are correctly described as 'Data TLB'.

Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180423161425.24366-1-jacekt@dugeo.com
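A descriptor table of the kind this change extends might look like the sketch below. The type and function names are hypothetical, not the kernel's. The entry counts are taken from the "After" log line, and the pairing of each descriptor with a page size is an assumption based on the SDM descriptor list that should be checked against the manual.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of data TLB descriptors by page size. */
enum tlb_kind { TLB_DATA_4K, TLB_DATA_2M_4M, TLB_DATA_1G };

struct tlb_desc {
    unsigned char descriptor;   /* CPUID leaf 2 descriptor byte */
    enum tlb_kind kind;
    unsigned int  entries;
};

/* Counts match the "After" line; the descriptor pairing is an assumption. */
static const struct tlb_desc tlb_table[] = {
    { 0x6b, TLB_DATA_4K,    256 },
    { 0x6c, TLB_DATA_2M_4M, 128 },
    { 0x6d, TLB_DATA_1G,     16 },
};

static_assert(sizeof(tlb_table) / sizeof(tlb_table[0]) == 3,
              "table models exactly the three descriptors named above");

/* Return the table entry for a descriptor byte, or NULL if it is unknown. */
static const struct tlb_desc *tlb_lookup(unsigned char descriptor)
{
    size_t i;

    for (i = 0; i < sizeof(tlb_table) / sizeof(tlb_table[0]); i++) {
        if (tlb_table[i].descriptor == descriptor)
            return &tlb_table[i];
    }
    return NULL;
}
```

An unknown descriptor yields NULL here, which corresponds to the "Before" situation where the counts stayed at 0 because the bytes were not in the table.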
-
- 26 Apr, 2018 1 commit
-
-
Committed by Yazen Ghannam
Recent AMD systems support using MWAIT for C1 state. However, MWAIT will not allow deeper C-states than C1 on current systems.

play_dead() expects to use the deepest state available. The deepest state available on AMD systems is reached through SystemIO or HALT. If MWAIT is available, it is preferred over the other methods, so the CPU never reaches the deepest possible state.

Don't try to use MWAIT to play_dead() on AMD systems. Instead, use CPUIDLE to enter the deepest state advertised by firmware. If CPUIDLE is not available then fall back to HALT.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Link: https://lkml.kernel.org/r/20180403140228.58540-1-Yazen.Ghannam@amd.com
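The selection order described in this message can be written as a small decision function. This is only a model: the enums and `pick_dead_method()` are hypothetical names, and the real play_dead() path enters the idle state directly instead of returning a choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vendor and method identifiers for this model only. */
enum cpu_vendor  { VENDOR_INTEL, VENDOR_AMD };
enum dead_method { DEAD_MWAIT, DEAD_CPUIDLE, DEAD_HALT };

/*
 * MWAIT is preferred when available, except on AMD where it cannot go
 * deeper than C1. Otherwise use cpuidle if present, and HALT as last resort.
 */
static enum dead_method pick_dead_method(enum cpu_vendor vendor,
                                         bool has_mwait,
                                         bool cpuidle_available)
{
    if (vendor != VENDOR_AMD && has_mwait)
        return DEAD_MWAIT;
    if (cpuidle_available)
        return DEAD_CPUIDLE;
    return DEAD_HALT;
}
```

For an AMD CPU that advertises MWAIT, the function skips MWAIT and picks cpuidle, or HALT when cpuidle is not available.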
-