- 17 May 2018, 4 commits
-
-
Committed by Thomas Gleixner
Add a ZEN feature bit so family-dependent static_cpu_has() optimizations can be built for ZEN.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
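A minimal sketch of how such a family bit is typically consumed; the called helper is a hypothetical name, not something added by this patch:

    /* Sketch: branch at compile/patch time on the synthetic ZEN bit. */
    if (static_cpu_has(X86_FEATURE_ZEN))
            setup_zen_specific_mitigations();   /* illustrative helper name */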
-
Committed by Thomas Gleixner
The SSBD enumeration is, similarly to the other bits, magically shared between Intel and AMD, though the mechanisms are different. Make X86_FEATURE_SSBD synthetic and set it depending on the vendor-specific features or family-dependent setup. Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is controlled via MSR_SPEC_CTRL and fix up the usage sites.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Thomas Gleixner
The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on Intel and implied by IBRS or STIBP support on AMD. That's just confusing, and in case an AMD CPU has IBRS not supported because the underlying problem has been fixed but has another bit valid in the SPEC_CTRL MSR, the thing falls apart. Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the availability on both Intel and AMD. While at it, replace the boot_cpu_has() checks with static_cpu_has() where possible. This prevents late microcode loading from exposing SPEC_CTRL, but late loading is already very limited as it does not reevaluate the mitigation options and other bits and pieces. Having static_cpu_has() is the simplest and least fragile solution.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Borislav Petkov
Intel and AMD have different CPUID bits, hence use synthetic bits for those, which get set for the respective vendor in init_speculation_control(). That way, debacles like the one the commit message of c65732e4 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") talks about don't happen anymore.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic
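An abridged sketch of the mapping done in init_speculation_control(); treat the exact set of bits as an approximation of the upstream code rather than a verbatim copy:

    /* Sketch: fold vendor-specific CPUID bits into the common synthetic bits. */
    static void init_speculation_control(struct cpuinfo_x86 *c)
    {
            if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {        /* Intel enumeration */
                    set_cpu_cap(c, X86_FEATURE_IBRS);
                    set_cpu_cap(c, X86_FEATURE_IBPB);
            }
            if (cpu_has(c, X86_FEATURE_AMD_IBRS))           /* AMD enumeration */
                    set_cpu_cap(c, X86_FEATURE_IBRS);
            if (cpu_has(c, X86_FEATURE_AMD_IBPB))
                    set_cpu_cap(c, X86_FEATURE_IBPB);
    }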
-
- 14 May 2018, 1 commit
-
-
Committed by Jim Mattson
Cast val and (val >> 32) to (u32), so that they fit in a general-purpose register in both 32-bit and 64-bit code.
[ tglx: Made it u32 instead of uintptr_t ]
Fixes: c65732e4 ("x86/cpu: Restore CPUID_8000_0008_EBX reload")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 May 2018, 1 commit
-
-
Committed by Konrad Rzeszutek Wilk
Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2] as SSBD (Speculative Store Bypass Disable). Hence change it. It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name is going to be. Following the rename it would be SSBD_NO, but that rolls out to Speculative Store Bypass Disable No. Also fixed the missing space in X86_FEATURE_AMD_SSBD.
[ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 05 May 2018, 1 commit
-
-
Committed by Kees Cook
Unless explicitly opted out of, anything running under seccomp will have SSB mitigations enabled. Choosing the "prctl" mode will disable this.
[ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 03 May 2018, 11 commits
-
-
Committed by Thomas Gleixner
Add prctl-based control for Speculative Store Bypass mitigation and make it the default mitigation for Intel and AMD. Andi Kleen provided the following rationale (slightly redacted):

There are multiple levels of impact of Speculative Store Bypass:
1) JITed sandbox. It cannot invoke system calls, but can do PRIME+PROBE and may have call interfaces to other code.
2) Native code process. No protection inside the process at this level.
3) Kernel.
4) Between processes.

The prctl tries to protect against case (1) doing attacks. If the untrusted code can do random system calls then control is already lost in a much worse way. So there needs to be system call protection in some way (using a JIT not allowing them or seccomp). Or rather, if the process can subvert its environment somehow to do the prctl it can already execute arbitrary code, which is much worse than SSB. To put it differently, the point of the prctl is to not allow JITed code to read data it shouldn't read from its JITed sandbox. If it already has escaped its sandbox then it can already read everything it wants in its address space, and do much worse. The ability to control Speculative Store Bypass allows enabling the protection selectively without affecting overall system performance.

Based on an initial patch from Tim Chen. Completely rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
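For reference, a minimal user-space sketch of the control interface this series introduces. The PR_* names and values match the series' prctl interface, but the fallback defines and the error handling here are deliberately simplistic:

    #include <stdio.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_SPECULATION_CTRL          /* older libc headers may lack these */
    #define PR_GET_SPECULATION_CTRL 52
    #define PR_SET_SPECULATION_CTRL 53
    #define PR_SPEC_STORE_BYPASS    0
    #define PR_SPEC_DISABLE         (1UL << 2)
    #endif

    int main(void)
    {
            /* Opt this task into the SSB mitigation (disable speculative store bypass). */
            if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0))
                    perror("prctl(PR_SET_SPECULATION_CTRL)");

            /* Read back the current per-task state. */
            printf("state: 0x%x\n",
                   prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0));
            return 0;
    }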
-
Committed by Thomas Gleixner
The Speculative Store Bypass vulnerability can be mitigated with the Reduced Data Speculation (RDS) feature. To allow finer-grained control of this eventually expensive mitigation, a per-task mitigation control is required. Add a new TIF_RDS flag and put it into the group of TIF flags which are evaluated for mismatch in switch_to(). If these bits differ in the previous and the next task, then the slow-path function __switch_to_xtra() is invoked. Implement the TIF_RDS-dependent mitigation control in the slow path. If the prctl for controlling Speculative Store Bypass is disabled or no task uses the prctl then there is no overhead in the switch_to() fast path. Update the KVM-related speculation control functions to take TIF_RDS into account as well. Based on a patch from Tim Chen. Completely rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
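A simplified sketch of the slow-path logic described above. The real __switch_to_xtra() takes more arguments and handles many more flags, and the helper/constant names here follow this series' RDS naming (later renamed to SSBD), so treat this as illustrative only:

    /* Sketch: touch the SSB control MSR only when TIF_RDS actually changed. */
    static void spec_store_bypass_switch(unsigned long tifp, unsigned long tifn)
    {
            if ((tifp ^ tifn) & _TIF_RDS)
                    wrmsrl(MSR_IA32_SPEC_CTRL,
                           x86_spec_ctrl_base | ((tifn & _TIF_RDS) ? SPEC_CTRL_RDS : 0));
    }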
-
Committed by Thomas Gleixner
Having everything in nospec-branch.h creates a hell of dependencies when adding the prctl-based switching mechanism. Move everything which is not required in nospec-branch.h to spec-ctrl.h and fix up the includes in the relevant files.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Konrad Rzeszutek Wilk
AMD does not need the Speculative Store Bypass mitigation to be enabled. The parameters for this are already available and can be done via MSR C001_1020. Each family uses a different bit in that MSR for this.
[ tglx: Expose the bit mask via a variable and move the actual MSR fiddling into the bugs code as that's the right thing to do and also required to prepare for dynamic enable/disable ]
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Konrad Rzeszutek Wilk
Intel CPUs expose methods to:
- detect whether the RDS capability is available via CPUID.7.0.EDX[31],
- enable RDS by setting bit 2 of the SPEC_CTRL MSR (0x48),
- indicate via MSR_IA32_ARCH_CAPABILITIES Bit(4) that there is no need to enable RDS.

With that in mind, if spec_store_bypass_disable=[auto,on] is selected, set the SPEC_CTRL MSR at boot time to enable RDS if the platform requires it. Note that this does not fix the KVM case where SPEC_CTRL is exposed to guests, which can muck with it; see the patch titled "KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS". And for the firmware (IBRS to be set), see the patch titled "x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits".
[ tglx: Disentangled it from the Intel implementation and kept the call order ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Konrad Rzeszutek Wilk
Contemporary high performance processors use a common industry-wide optimization known as "Speculative Store Bypass" in which loads from addresses to which a recent store has occurred may (speculatively) see an older value. Intel refers to this feature as "Memory Disambiguation", which is part of their "Smart Memory Access" capability. Memory Disambiguation can expose a cache side-channel attack against such speculatively read values. An attacker can create exploit code that allows them to read memory outside of a sandbox environment (for example, malicious JavaScript in a web page), or to perform more complex attacks against code running within the same privilege level, e.g. via the stack.

As a first step to mitigate against such attacks, provide two boot command line control knobs:

    nospec_store_bypass_disable
    spec_store_bypass_disable=[off,auto,on]

By default affected x86 processors will power on with Speculative Store Bypass enabled. Hence the provided kernel parameters are written from the point of view of whether to enable a mitigation or not. The parameters are as follows:
- auto - Kernel detects whether your CPU model contains an implementation of Speculative Store Bypass and picks the most appropriate mitigation.
- on   - disable Speculative Store Bypass
- off  - enable Speculative Store Bypass

[ tglx: Reordered the checks so that the whole evaluation is not done when the CPU does not support RDS ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Konrad Rzeszutek Wilk
Add the CPU feature bit CPUID.7.0.EDX[31] which indicates whether the CPU supports Reduced Data Speculation.
[ tglx: Split it out from a later patch ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Konrad Rzeszutek Wilk
Add the sysfs file for the new vulnerability. It does not do much except show the word 'Vulnerable' for recent x86 cores. Intel cores prior to family 6 are known not to be vulnerable, and so are some Atoms and some Xeon Phi. It assumes that older Cyrix, Centaur, etc. cores are immune.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Konrad Rzeszutek Wilk
A guest may modify the SPEC_CTRL MSR from the value used by the kernel. Since the kernel doesn't use IBRS, this means a value of zero is what is needed in the host. But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to the other bits as reserved, so the kernel should respect the boot-time SPEC_CTRL value and use that. This allows dealing with future extensions to the SPEC_CTRL interface, if any at all.
Note: This uses wrmsrl() instead of native_wrmsrl(). It does not make any difference as paravirt will overwrite the callq *0xfff.. with the wrmsrl assembler code.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Konrad Rzeszutek Wilk
The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all the other bits as reserved. The Intel SDM glossary defines reserved as implementation specific - aka unknown. As such, at bootup this must be taken into account and proper masking for the bits in use applied. A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511
[ tglx: Made x86_spec_ctrl_base __ro_after_init ]
Suggested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
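A sketch of the boot-time read this implies; the feature check and the surrounding function are written from the series' description and may not match the final code byte-for-byte:

    /* Sketch: capture whatever firmware left in SPEC_CTRL so reserved bits are preserved. */
    static u64 x86_spec_ctrl_base __ro_after_init;

    static void __init read_spec_ctrl_base(void)
    {
            if (boot_cpu_has(X86_FEATURE_IBRS))
                    rdmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
    }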
-
Committed by Linus Torvalds
The macro is not type safe and I did look for why that "g" constraint for the asm doesn't work: it's because the asm is more fundamentally wrong. It does

    movl %[val], %%eax

but "val" isn't a 32-bit value, so then gcc will pass it in a register, and generate code like

    movl %rsi, %eax

and gas will complain about a nonsensical 'mov' instruction (it's moving a 64-bit register to a 32-bit one). Passing it through memory will just hide the real bug - gcc still thinks the memory location is 64-bit, but the "movl" will only load the first 32 bits and it all happens to work because x86 is little-endian. Convert it to a type safe inline function with a little trick which hands the feature into the ALTERNATIVE macro.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
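The resulting helper looks roughly like the sketch below, reconstructed from the description here (together with the later u32 cast fix), so details such as the clobber list are indicative rather than authoritative:

    /* Sketch: type-safe wrapper that feeds the CPU feature into ALTERNATIVE. */
    static inline void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
    {
            asm volatile(ALTERNATIVE("", "wrmsr", %c[feature])
                         : : "c" (msr),
                             "a" ((u32)val),            /* low 32 bits  */
                             "d" ((u32)(val >> 32)),    /* high 32 bits */
                             [feature] "i" (feature)
                         : "memory");
    }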
-
- 27 April 2018, 1 commit
-
-
Committed by Arnd Bergmann
A bugfix broke the x32 shmid64_ds and msqid64_ds data structure layout (as seen from user space) a few years ago: Originally, __BITS_PER_LONG was defined as 64 on x32, so we did not have padding after the 64-bit __kernel_time_t fields. After __BITS_PER_LONG got changed to 32, applications would observe extra padding. In other parts of the uapi headers we seem to have a mix of those expecting either 32 or 64 on x32 applications, so we can't easily revert the patch that broke these two structures. Instead, this patch decouples x32 from the other architectures and moves it back into arch specific headers, partially reverting the even older commit 73a2d096 ("x86: remove all now-duplicate header files"). It's not clear whether this ever made any difference, since at least glibc carries its own (correct) copy of both of these header files, so possibly no application has ever observed the definitions here. Based on a suggestion from H.J. Lu, I tried out the tool from https://github.com/hjl-tools/linux-header to find other such bugs, which pointed out the same bug in statfs(), which also has a separate (correct) copy in glibc.
Fixes: f4b4aae1 ("x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . J . Lu" <hjl.tools@gmail.com>
Cc: Jeffrey Walton <noloader@gmail.com>
Cc: stable@vger.kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180424212013.3967461-1-arnd@arndb.de
-
- 26 April 2018, 4 commits
-
-
Committed by Jiri Kosina
Commits 9b46a051 ("x86/mm: Initialize vmemmap_base at boot-time") and a7412546 ("x86/mm: Adjust vmalloc base and size at boot-time") lost the type information for __VMALLOC_BASE_L4, __VMALLOC_BASE_L5, __VMEMMAP_BASE_L4 and __VMEMMAP_BASE_L5 constants. Declare them explicitly unsigned long again.
Fixes: 9b46a051 ("x86/mm: Initialize vmemmap_base at boot-time")
Fixes: a7412546 ("x86/mm: Adjust vmalloc base and size at boot-time")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1804121437350.28129@cbobk.fhfr.pm
-
Committed by Dou Liyang
The macro FPU_IRQ has not been used since v3.10, so remove it.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180426060832.27312-1-douly.fnst@cn.fujitsu.com
-
Committed by Dou Liyang
Now that Linux uses the matrix allocator for vector assignment, the original assignment code which used VECTOR_OFFSET_START has been removed. So remove the stale macro as well.
Fixes: 69cde000 ("x86/vector: Use matrix allocator for vector assignment")
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180425020553.17210-1-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Fenghua Yu
cldemote is a new instruction in future x86 processors. It hints to hardware that a specified cache line should be moved ("demoted") from the cache(s) closest to the processor core to a level more distant from the processor core. This instruction is faster than snooping to make the cache line available for other cores. The cldemote instruction is indicated by the presence of the CPUID feature flag CLDEMOTE (CPUID.(EAX=0x7, ECX=0):ECX[bit 25]). More details on the cldemote instruction can be found in the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: "Ashok Raj" <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1524508162-192587-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
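A hedged sketch of how code would gate use of the instruction on the new feature bit; the patch itself only adds the X86_FEATURE_CLDEMOTE flag, and the mnemonic below requires an assembler recent enough to know cldemote:

    /* Sketch: only issue cldemote when CPUID.7.0:ECX[25] is set. */
    static inline void demote_cache_line(const void *p)
    {
            if (static_cpu_has(X86_FEATURE_CLDEMOTE))
                    asm volatile("cldemote %0" : : "m" (*(const char *)p));
    }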
-
- 25 April 2018, 2 commits
-
-
Committed by Steven Rostedt (VMware)
Arnaldo noticed that the latest kernel is missing the syscall event system directory on x86. I bisected it down to d5a00528 ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()"). The system call trace events are special, as there is only one trace event for all system calls (the raw_syscalls). But a macro that wraps the system calls creates meta data for them that copies the name to find the system call that maps to the system call table (the number). At boot up, it does a kallsyms lookup of the system call table to find the function that maps to the meta data of the system call. If it does not find a function, then that system call is ignored. Because the x86 system calls had "__x64_" or "__ia32_" prefixed to the "sys" in the names, they do not match the default compare algorithm. As this was a problem for powerpc, the algorithm can be overridden by the architecture. The solution is to have x86 provide its own algorithm to do the compare, and this brings back the system call trace events.
Link: http://lkml.kernel.org/r/20180417174128.0f3457f0@gandalf.local.home
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: d5a00528 ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
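The arch override ends up looking something like the sketch below (reconstructed from the description; the prefix-length arithmetic in mainline may differ):

    /* Sketch: let "__x64_sys_foo" / "__ia32_sys_foo" match metadata named "sys_foo". */
    static inline bool arch_syscall_match_sym_name(const char *sym, const char *name)
    {
            /* Skip a "__x64_" or "__ia32_" prefix on the kallsyms symbol, if present. */
            if (!strncmp(sym, "__x64_", 6))
                    sym += 6;
            else if (!strncmp(sym, "__ia32_", 7))
                    sym += 7;

            /* Both strings now start with "sys"; compare past that common prefix. */
            return !strcmp(sym + 3, name + 3);
    }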
-
Committed by Dave Hansen
commit ce9962bf7e22bb3891655c349faff618922d4a73

0day reported warnings at boot on 32-bit systems without NX support:

    attempted to set unsupported pgprot: 8000000000000025 bits: 8000000000000000 supported: 7fffffffffffffff
    WARNING: CPU: 0 PID: 1 at arch/x86/include/asm/pgtable.h:540 handle_mm_fault+0xfc1/0xfe0:
      check_pgprot at arch/x86/include/asm/pgtable.h:535
      (inlined by) pfn_pte at arch/x86/include/asm/pgtable.h:549
      (inlined by) do_anonymous_page at mm/memory.c:3169
      (inlined by) handle_pte_fault at mm/memory.c:3961
      (inlined by) __handle_mm_fault at mm/memory.c:4087
      (inlined by) handle_mm_fault at mm/memory.c:4124

The problem is that, due to the recent commit which removed auto-massaging of page protections, filtering page permissions at PTE creation time is no longer done, so vma->vm_page_prot is passed unfiltered to PTE creation. Filter the page protections before they are installed in vma->vm_page_prot.
Fixes: fb43d6cb ("x86/mm: Do not auto-massage page protections")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: https://lkml.kernel.org/r/20180420222028.99D72858@viggo.jf.intel.com
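A hedged sketch of the filtering idea: mask the protection value against what the CPU actually supports before it lands in vma->vm_page_prot. The hook name matches the shape of the fix as described, but treat the exact form as illustrative:

    /* Sketch: drop PTE bits the CPU cannot set (e.g. NX on non-NX hardware). */
    static inline pgprot_t arch_filter_pgprot(pgprot_t prot)
    {
            return __pgprot(pgprot_val(prot) & __supported_pte_mask);
    }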
-
- 23 April 2018, 1 commit
-
-
Committed by Thomas Gleixner
GPL2.0 is not a valid SPDX identifier. Replace it with GPL-2.0.
Fixes: 4a362601 ("x86/jailhouse: Add infrastructure for running in non-root cell")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Link: https://lkml.kernel.org/r/20180422220832.815346488@linutronix.de
-
- 17 April 2018, 1 commit
-
-
Committed by Dou Liyang
early_trap_init() and cpu_set_gdt() have been removed, so remove the stale declarations as well.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Cc: luto@kernel.org
Cc: hpa@zytor.com
Cc: bp@suse.de
Cc: kirill.shutemov@linux.intel.com
Link: https://lkml.kernel.org/r/20180404064527.10562-1-douly.fnst@cn.fujitsu.com
-
- 16 April 2018, 1 commit
-
-
Committed by KarimAllah Ahmed
Update 'tsc_offset' on vmentry/vmexit of L2 guests to ensure that it always captures the TSC_OFFSET of the running guest, whether it is the L1 or the L2 guest.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
[AMD changes, fix update_ia32_tsc_adjust_msr. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 April 2018, 1 commit
-
-
Committed by AKASHI Takahiro
As arch_kexec_kernel_image_{probe,load}(), arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig() are almost duplicated among architectures, they can be commonalized with an architecture-defined kexec_file_ops array. So let's factor them out.
Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 12 April 2018, 4 commits
-
-
Committed by Arnaldo Carvalho de Melo
This reverts commit ca26cffa. Newer clang versions accept that asm(_ASM_SP) construct, and now that the bpf-script-test-kbuild.c script, used in one of the 'perf test LLVM' subtests, doesn't include ptrace.h, which ended up including arch/x86/include/asm/asm.h, we can revert this patch.
Suggested-by: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/r/613f0a0d-c433-8f4d-dcc1-c9889deae39e@fb.com
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-nqozcv8loq40tkqpfw997993@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Committed by Dave Hansen
Global pages are bad for hardening because they potentially let an exploit read the kernel image via a Meltdown-style attack, which makes it easier to find gadgets. But global pages are good for performance because they reduce TLB misses when making user/kernel transitions, especially when PCIDs are not available, such as on older hardware, or where a hypervisor has disabled them for some reason.

This patch implements a basic, sane policy: if you have PCIDs, you only map a minimal amount of kernel text global. If you do not have PCIDs, you map all kernel text global. This policy effectively makes PCIDs something that not only adds performance but a little bit of hardening as well.

I ran a simple "lseek" microbenchmark[1] to test the benefit on a modern Atom microserver. Most of the benefit comes from applying the series before this patch ("entry only"), but there is still a significant benefit from this patch.

    No Global Lines (baseline  ): 6077741 lseeks/sec
    88 Global Lines (entry only): 7528609 lseeks/sec (+23.9%)
    94 Global Lines (this patch): 8433111 lseeks/sec (+38.8%)

[1] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205518.E3D989EB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Dave Hansen
A PTE is constructed from a physical address and a pgprotval_t. __PAGE_KERNEL, for instance, is a pgprot_t and must be converted into a pgprotval_t before it can be used to create a PTE. This is done implicitly within functions like pfn_pte() by massage_pgprot(). However, this makes it very challenging to set bits (and keep them set) if your bit is being filtered out by massage_pgprot(). This moves the bit filtering out of pfn_pte() and friends. For users of PAGE_KERNEL*, filtering will be done automatically inside those macros, but users of __PAGE_KERNEL* need to do their own filtering now. Note that we also just move pfn_pte/pmd/pud() over to check_pgprot() instead of massage_pgprot(). This way, we still *look* for unsupported bits and properly warn about them if we find them. This might happen if an unfiltered __PAGE_KERNEL* value was passed in, for instance.
- printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
- boot crash fix from: Tom Lendacky <thomas.lendacky@amd.com>
- crash bisected by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
Bisected-by: Mike Galbraith <efault@gmx.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
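A sketch of the check_pgprot() behaviour described above: a warn-but-still-mask variant of massage_pgprot(). The warning text is paraphrased and the exact reported fields may differ from mainline:

    /* Sketch: mask out unsupported bits like massage_pgprot(), but complain loudly. */
    static inline pgprotval_t check_pgprot(pgprot_t pgprot)
    {
            pgprotval_t massaged = massage_pgprot(pgprot);

            WARN_ONCE(pgprot_val(pgprot) != massaged,
                      "attempted to set unsupported pgprot: %016llx bits: %016llx\n",
                      (u64)pgprot_val(pgprot),
                      (u64)(pgprot_val(pgprot) & ~__supported_pte_mask));
            return massaged;
    }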
-
Committed by Pavel Tatashin
Juergen Gross noticed that commit f7f99100 ("mm: stop zeroing memory during allocation in vmemmap") broke XEN PV domains when deferred struct page initialization is enabled. This is because Xen's PagePinned() flag is getting erased from struct pages when they are initialized later in boot. Juergen fixed this problem by disabling deferred pages on Xen PV domains. It is desirable, however, to have this feature available as it reduces boot time. This fix re-enables the feature for PV domains and fixes the problem the following way: delay setting the PagePinned flag until struct pages for all allocated memory are initialized, i.e. until after free_all_bootmem(). A new x86_init.hyper op init_after_bootmem() is called to let Xen know that the boot allocator is done, and hence struct pages for all the allocated memory are now initialized. If deferred page initialization is enabled, the rest of the struct pages are going to be initialized later in boot once page_alloc_init_late() is called. xen_after_bootmem() walks the page table's pages and marks them pinned.
Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jia Zhang <zhang.jia@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 April 2018, 2 commits
-
-
Committed by Li RongQing
The APIC ID as parsed from ACPI MADT is validity checked with the apic->apic_id_valid() callback, which depends on the selected APIC type. For non X2APIC types APIC IDs >= 0xFF are invalid, but values > 0x7FFFFFFF are detected as valid. This happens because the 'apicid' argument of the apic_id_valid() callback is type 'int'. So the resulting comparison apicid < 0xFF evaluates to true for all unsigned int values > 0x7FFFFFFF which are handed to default_apic_id_valid(). As a consequence, invalid APIC IDs in !X2APIC mode are considered valid and accounted as possible CPUs. Change the apicid argument type of the apic_id_valid() callback to u32 so the evaluation is unsigned and returns the correct result.
[ tglx: Massaged changelog ]
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: jgross@suse.com
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1523322966-10296-1-git-send-email-lirongqing@baidu.com
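A small self-contained illustration of the signedness trap being fixed; this is plain C, not the kernel's actual callback:

    #include <stdio.h>

    /* Buggy shape: a "valid" check with a signed parameter.   */
    static int id_valid_signed(int apicid)            { return apicid < 0xFF; }
    /* Fixed shape: the same check with an unsigned parameter. */
    static int id_valid_unsigned(unsigned int apicid) { return apicid < 0xFF; }

    int main(void)
    {
            unsigned int bogus = 0xFFFFFFF0u;   /* > 0x7FFFFFFF, clearly not a valid xAPIC ID */

            /* The signed version sees a negative number, so "< 0xFF" is true. */
            printf("signed:   %d\n", id_valid_signed(bogus));    /* 1: wrongly accepted   */
            printf("unsigned: %d\n", id_valid_unsigned(bogus));  /* 0: correctly rejected */
            return 0;
    }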
-
Committed by Dave Hansen
The __PAGE_KERNEL_* page permissions are "raw". They contain bits that may or may not be supported on the current processor. They need to be filtered by a mask (currently __supported_pte_mask) to turn them into a value that we can actually set in a PTE. These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL. But, with PTI, we want to be able to support _PAGE_GLOBAL (have the bit set in __supported_pte_mask) but not have it appear in any of these masks by default. This patch creates a new mask, __default_kernel_pte_mask, and applies it when creating all of the PAGE_KERNEL_* masks. This makes PAGE_KERNEL_* safe to use anywhere (they only contain supported bits). It also ensures that PAGE_KERNEL_* contains _PAGE_GLOBAL on PTI=n kernels but clears _PAGE_GLOBAL when PTI=y. We also make __default_kernel_pte_mask a non-GPL exported symbol because there are plenty of driver-available interfaces that take PAGE_KERNEL_* permissions.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205506.030DB6B6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 09 April 2018, 4 commits
-
-
Committed by Dominik Brodowski
Make the code in syscall_wrapper.h more readable by naming the stub macros similarly to the stubs they provide. While at it, fix a stray newline at the end of the __IA32_COMPAT_SYS_STUBx macro.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-5-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Dominik Brodowski
This rename allows us to have a coherent syscall stub naming convention on 64-bit x86 (0xffffffff prefix removed):

    810f0af0 t kernel_waitid              # common (32/64) kernel helper
    <inline>   __do_sys_waitid            # inlined helper doing actual work
    810f0be0 t __se_sys_waitid            # C func calling inlined helper
    <inline>   __do_compat_sys_waitid     # inlined helper doing actual work
    810f0d80 t __se_compat_sys_waitid     # compat C func calling inlined helper
    810f2080 T __x64_sys_waitid           # x64 64-bit-ptregs -> C stub
    810f20b0 T __ia32_sys_waitid          # ia32 32-bit-ptregs -> C stub [*]
    810f2470 T __ia32_compat_sys_waitid   # ia32 32-bit-ptregs -> compat C stub
    810f2490 T __x32_compat_sys_waitid    # x32 64-bit-ptregs -> compat C stub

[*] This stub is unused, as the syscall table links __ia32_compat_sys_waitid instead of __ia32_sys_waitid, as we need a compat variant here.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Dominik Brodowski
Tidy the naming convention for compat syscall stubs. Hints which describe the purpose of the stub go in front and receive a double underscore to denote that they are generated on-the-fly by the COMPAT_SYSCALL_DEFINEx() macro.

For the generic case, this means:

    t kernel_waitid            # common C function (see kernel/exit.c)
      __do_compat_sys_waitid   # inlined helper doing the actual work (takes original parameters as declared)
    T __se_compat_sys_waitid   # sign-extending C function calling inlined helper (takes parameters of type long, casts them to unsigned long and then to the declared type)
    T compat_sys_waitid        # alias to __se_compat_sys_waitid() (taking parameters as declared), to be included in syscall table

For x86, the naming is as follows:

    t kernel_waitid              # common C function (see kernel/exit.c)
      __do_compat_sys_waitid     # inlined helper doing the actual work (takes original parameters as declared)
    t __se_compat_sys_waitid     # sign-extending C function calling inlined helper (takes parameters of type long, casts them to unsigned long and then to the declared type)
    T __ia32_compat_sys_waitid   # IA32_EMULATION 32-bit-ptregs -> C stub, calls __se_compat_sys_waitid(); to be included in syscall table
    T __x32_compat_sys_waitid    # x32 64-bit-ptregs -> C stub, calls __se_compat_sys_waitid(); to be included in syscall table

If only one of IA32_EMULATION and x32 is enabled, __se_compat_sys_waitid() may be inlined into the stub __{ia32,x32}_compat_sys_waitid().

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-3-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Dominik Brodowski
Tidy the naming convention for syscall stubs. Hints which describe the purpose of the stub go in front and receive a double underscore to denote that they are generated on-the-fly by the SYSCALL_DEFINEx() macro.

For the generic case, this means (0xffffffff prefix removed):

    810f08d0 t kernel_waitid     # common C function (see kernel/exit.c)
    <inline>   __do_sys_waitid   # inlined helper doing the actual work (takes original parameters as declared)
    810f1aa0 T __se_sys_waitid   # sign-extending C function calling inlined helper (takes parameters of type long; casts them to the declared type)
    810f1aa0 T sys_waitid        # alias to __se_sys_waitid() (taking parameters as declared), to be included in syscall table

For x86, the naming is as follows:

    810efc70 t kernel_waitid      # common C function (see kernel/exit.c)
    <inline>   __do_sys_waitid    # inlined helper doing the actual work (takes original parameters as declared)
    810efd60 t __se_sys_waitid    # sign-extending C function calling inlined helper (takes parameters of type long; casts them to the declared type)
    810f1140 T __ia32_sys_waitid  # IA32_EMULATION 32-bit-ptregs -> C stub, calls __se_sys_waitid(); to be included in syscall table
    810f1110 T sys_waitid         # x86 64-bit-ptregs -> C stub, calls __se_sys_waitid(); to be included in syscall table

For x86, sys_waitid() will be renamed to __x64_sys_waitid in a follow-up patch.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-2-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
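To make the layering concrete, here is a runnable toy model of the __x64_ -> __se_ -> __do_ chain. It is not the kernel's SYSCALL_DEFINEx() machinery; the macro, struct and names below are simplified stand-ins:

    #include <stdio.h>

    /* Toy model of the layering: __x64_ stub -> __se_ sign-extender -> __do_ body. */
    struct pt_regs { long di, si; };

    #define TOY_SYSCALL_DEFINE2(name, t1, a1, t2, a2)                   \
            static inline long __do_sys_##name(t1 a1, t2 a2);           \
            static long __se_sys_##name(long a1, long a2)               \
            { return __do_sys_##name((t1)a1, (t2)a2); }                 \
            long __x64_sys_##name(const struct pt_regs *regs)           \
            { return __se_sys_##name(regs->di, regs->si); }             \
            static inline long __do_sys_##name(t1 a1, t2 a2)

    TOY_SYSCALL_DEFINE2(add_demo, int, a, int, b)
    {
            return a + b;   /* the "actual work", analogous to __do_sys_waitid() */
    }

    int main(void)
    {
            struct pt_regs regs = { .di = 2, .si = 40 };
            printf("%ld\n", __x64_sys_add_demo(&regs));   /* prints 42 */
            return 0;
    }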
-
- 05 April 2018, 1 commit
-
-
Committed by Dominik Brodowski
Removing CONFIG_SYSCALL_PTREGS from arch/x86/Kconfig and simply selecting ARCH_HAS_SYSCALL_WRAPPER unconditionally on x86-64 allows us to simplify several codepaths.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-7-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-