- 03 5月, 2018 12 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
Expose the CPUID.7.EDX[31] bit to the guest, and also guard against various combinations of SPEC_CTRL MSR values. The handling of the MSR (to take into account the host value of SPEC_CTRL Bit(2)) is taken care of in patch: KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
AMD does not need the Speculative Store Bypass mitigation to be enabled. The parameters for this are already available and can be done via MSR C001_1020. Each family uses a different bit in that MSR for this. [ tglx: Expose the bit mask via a variable and move the actual MSR fiddling into the bugs code as that's the right thing to do and also required to prepare for dynamic enable/disable ] Suggested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
Intel and AMD SPEC_CTRL (0x48) MSR semantics may differ in the future (or in fact use different MSRs for the same functionality). As such a run-time mechanism is required to whitelist the appropriate MSR values. [ tglx: Made the variable __ro_after_init ] Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
Intel CPUs expose methods to: - Detect whether RDS capability is available via CPUID.7.0.EDX[31], - The SPEC_CTRL MSR(0x48), bit 2 set to enable RDS. - MSR_IA32_ARCH_CAPABILITIES, Bit(4) no need to enable RRS. With that in mind if spec_store_bypass_disable=[auto,on] is selected set at boot-time the SPEC_CTRL MSR to enable RDS if the platform requires it. Note that this does not fix the KVM case where the SPEC_CTRL is exposed to guests which can muck with it, see patch titled : KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS. And for the firmware (IBRS to be set), see patch titled: x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits [ tglx: Distangled it from the intel implementation and kept the call order ] Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
Contemporary high performance processors use a common industry-wide optimization known as "Speculative Store Bypass" in which loads from addresses to which a recent store has occurred may (speculatively) see an older value. Intel refers to this feature as "Memory Disambiguation" which is part of their "Smart Memory Access" capability. Memory Disambiguation can expose a cache side-channel attack against such speculatively read values. An attacker can create exploit code that allows them to read memory outside of a sandbox environment (for example, malicious JavaScript in a web page), or to perform more complex attacks against code running within the same privilege level, e.g. via the stack. As a first step to mitigate against such attacks, provide two boot command line control knobs: nospec_store_bypass_disable spec_store_bypass_disable=[off,auto,on] By default affected x86 processors will power on with Speculative Store Bypass enabled. Hence the provided kernel parameters are written from the point of view of whether to enable a mitigation or not. The parameters are as follows: - auto - Kernel detects whether your CPU model contains an implementation of Speculative Store Bypass and picks the most appropriate mitigation. - on - disable Speculative Store Bypass - off - enable Speculative Store Bypass [ tglx: Reordered the checks so that the whole evaluation is not done when the CPU does not support RDS ] Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
Add the CPU feature bit CPUID.7.0.EDX[31] which indicates whether the CPU supports Reduced Data Speculation. [ tglx: Split it out from a later patch ] Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
Add the sysfs file for the new vulerability. It does not do much except show the words 'Vulnerable' for recent x86 cores. Intel cores prior to family 6 are known not to be vulnerable, and so are some Atoms and some Xeon Phi. It assumes that older Cyrix, Centaur, etc. cores are immune. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
A guest may modify the SPEC_CTRL MSR from the value used by the kernel. Since the kernel doesn't use IBRS, this means a value of zero is what is needed in the host. But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to the other bits as reserved so the kernel should respect the boot time SPEC_CTRL value and use that. This allows to deal with future extensions to the SPEC_CTRL interface if any at all. Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any difference as paravirt will over-write the callq *0xfff.. with the wrmsrl assembler code. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all the other bits as reserved. The Intel SDM glossary defines reserved as implementation specific - aka unknown. As such at bootup this must be taken it into account and proper masking for the bits in use applied. A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511 [ tglx: Made x86_spec_ctrl_base __ro_after_init ] Suggested-by: NJon Masters <jcm@redhat.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
Those SysFS functions have a similar preamble, as such make common code to handle them. Suggested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Konrad Rzeszutek Wilk 提交于
Combine the various logic which goes through all those x86_cpu_id matching structures in one function. Suggested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
由 Linus Torvalds 提交于
The macro is not type safe and I did look for why that "g" constraint for the asm doesn't work: it's because the asm is more fundamentally wrong. It does movl %[val], %%eax but "val" isn't a 32-bit value, so then gcc will pass it in a register, and generate code like movl %rsi, %eax and gas will complain about a nonsensical 'mov' instruction (it's moving a 64-bit register to a 32-bit one). Passing it through memory will just hide the real bug - gcc still thinks the memory location is 64-bit, but the "movl" will only load the first 32 bits and it all happens to work because x86 is little-endian. Convert it to a type safe inline function with a little trick which hands the feature into the ALTERNATIVE macro. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
- 02 5月, 2018 1 次提交
-
-
由 Thomas Gleixner 提交于
The recent commt which addresses the x86_phys_bits corruption with encrypted memory on CPUID reload after a microcode update lost the reload of CPUID_8000_0008_EBX as well. As a consequence IBRS and IBRS_FW are not longer detected Restore the behaviour by bringing the reload of CPUID_8000_0008_EBX back. This restore has a twist due to the convoluted way the cpuid analysis works: CPUID_8000_0008_EBX is used by AMD to enumerate IBRB, IBRS, STIBP. On Intel EBX is not used. But the speculation control code sets the AMD bits when running on Intel depending on the Intel specific speculation control bits. This was done to use the same bits for alternatives. The change which moved the 8000_0008 evaluation out of get_cpu_cap() broke this nasty scheme due to ordering. So that on Intel the store to CPUID_8000_0008_EBX clears the IBRB, IBRS, STIBP bits which had been set before by software. So the actual CPUID_8000_0008_EBX needs to go back to the place where it was and the phys/virt address space calculation cannot touch it. In hindsight this should have used completely synthetic bits for IBRB, IBRS, STIBP instead of reusing the AMD bits, but that's for 4.18. /me needs to find time to cleanup that steaming pile of ... Fixes: d94a155c ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") Reported-by: NJörg Otte <jrg.otte@gmail.com> Reported-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NJörg Otte <jrg.otte@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: kirill.shutemov@linux.intel.com Cc: Borislav Petkov <bp@alien8.de Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805021043510.1668@nanos.tec.linutronix.de
-
- 28 4月, 2018 1 次提交
-
-
由 KarimAllah Ahmed 提交于
Move DISABLE_EXITS KVM capability bits to the UAPI just like the rest of capabilities. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 27 4月, 2018 5 次提交
-
-
由 Junaid Shahid 提交于
Currently, KVM flushes the TLB after a change to the APIC access page address or the APIC mode when EPT mode is enabled. However, even in shadow paging mode, a TLB flush is needed if VPIDs are being used, as specified in the Intel SDM Section 29.4.5. So replace vmx_flush_tlb_ept_only() with vmx_flush_tlb(), which will flush if either EPT or VPIDs are in use. Signed-off-by: NJunaid Shahid <junaids@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Andy Lutomirski 提交于
32-bit user code that uses int $80 doesn't care about r8-r11. There is, however, some 64-bit user code that intentionally uses int $0x80 to invoke 32-bit system calls. From what I've seen, basically all such code assumes that r8-r15 are all preserved, but the kernel clobbers r8-r11. Since I doubt that there's any code that depends on int $0x80 zeroing r8-r11, change the kernel to preserve them. I suspect that very little user code is broken by the old clobber, since r8-r11 are only rarely allocated by gcc, and they're clobbered by function calls, so they only way we'd see a problem is if the same function that invokes int $0x80 also spills something important to one of these registers. The current behavior seems to date back to the historical commit "[PATCH] x86-64 merge for 2.6.4". Before that, all regs were preserved. I can't find any explanation of why this change was made. Update the test_syscall_vdso_32 testcase as well to verify the new behavior, and it strengthens the test to make sure that the kernel doesn't accidentally permute r8..r15. Suggested-by: NDenys Vlasenko <dvlasenk@redhat.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Link: https://lkml.kernel.org/r/d4c4d9985fbe64f8c9e19291886453914b48caee.1523975710.git.luto@kernel.org
-
由 Arnd Bergmann 提交于
A bugfix broke the x32 shmid64_ds and msqid64_ds data structure layout (as seen from user space) a few years ago: Originally, __BITS_PER_LONG was defined as 64 on x32, so we did not have padding after the 64-bit __kernel_time_t fields, After __BITS_PER_LONG got changed to 32, applications would observe extra padding. In other parts of the uapi headers we seem to have a mix of those expecting either 32 or 64 on x32 applications, so we can't easily revert the path that broke these two structures. Instead, this patch decouples x32 from the other architectures and moves it back into arch specific headers, partially reverting the even older commit 73a2d096 ("x86: remove all now-duplicate header files"). It's not clear whether this ever made any difference, since at least glibc carries its own (correct) copy of both of these header files, so possibly no application has ever observed the definitions here. Based on a suggestion from H.J. Lu, I tried out the tool from https://github.com/hjl-tools/linux-header to find other such bugs, which pointed out the same bug in statfs(), which also has a separate (correct) copy in glibc. Fixes: f4b4aae1 ("x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H . J . Lu" <hjl.tools@gmail.com> Cc: Jeffrey Walton <noloader@gmail.com> Cc: stable@vger.kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180424212013.3967461-1-arnd@arndb.de
-
由 Petr Tesarik 提交于
Xen PV domains cannot shut down and start a crash kernel. Instead, the crashing kernel makes a SCHEDOP_shutdown hypercall with the reason code SHUTDOWN_crash, cf. xen_crash_shutdown() machine op in arch/x86/xen/enlighten_pv.c. A crash kernel reservation is merely a waste of RAM in this case. It may also confuse users of kexec_load(2) and/or kexec_file_load(2). When flags include KEXEC_ON_CRASH or KEXEC_FILE_ON_CRASH, respectively, these syscalls return success, which is technically correct, but the crash kexec image will never be actually used. Signed-off-by: NPetr Tesarik <ptesarik@suse.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NJuergen Gross <jgross@suse.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: xen-devel@lists.xenproject.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jean Delvare <jdelvare@suse.de> Link: https://lkml.kernel.org/r/20180425120835.23cef60c@ezekiel.suse.cz
-
由 jacek.tomaka@poczta.fm 提交于
Make kernel print the correct number of TLB entries on Intel Xeon Phi 7210 (and others) Before: [ 0.320005] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 After: [ 0.320005] Last level dTLB entries: 4KB 256, 2MB 128, 4MB 128, 1GB 16 The entries do exist in the official Intel SMD but the type column there is incorrect (states "Cache" where it should read "TLB"), but the entries for the values 0x6B, 0x6C and 0x6D are correctly described as 'Data TLB'. Signed-off-by: NJacek Tomaka <jacek.tomaka@poczta.fm> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180423161425.24366-1-jacekt@dugeo.com
-
- 26 4月, 2018 6 次提交
-
-
由 Yazen Ghannam 提交于
Recent AMD systems support using MWAIT for C1 state. However, MWAIT will not allow deeper cstates than C1 on current systems. play_dead() expects to use the deepest state available. The deepest state available on AMD systems is reached through SystemIO or HALT. If MWAIT is available, it is preferred over the other methods, so the CPU never reaches the deepest possible state. Don't try to use MWAIT to play_dead() on AMD systems. Instead, use CPUIDLE to enter the deepest state advertised by firmware. If CPUIDLE is not available then fallback to HALT. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: https://lkml.kernel.org/r/20180403140228.58540-1-Yazen.Ghannam@amd.com
-
由 Jiri Kosina 提交于
Commits 9b46a051 ("x86/mm: Initialize vmemmap_base at boot-time") and a7412546 ("x86/mm: Adjust vmalloc base and size at boot-time") lost the type information for __VMALLOC_BASE_L4, __VMALLOC_BASE_L5, __VMEMMAP_BASE_L4 and __VMEMMAP_BASE_L5 constants. Declare them explicitly unsigned long again. Fixes: 9b46a051 ("x86/mm: Initialize vmemmap_base at boot-time") Fixes: a7412546 ("x86/mm: Adjust vmalloc base and size at boot-time") Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: N"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1804121437350.28129@cbobk.fhfr.pm
-
由 Dou Liyang 提交于
The macro FPU_IRQ has never been used since v3.10, So remove it. Signed-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180426060832.27312-1-douly.fnst@cn.fujitsu.com
-
由 Dou Liyang 提交于
Now, Linux uses matrix allocator for vector assignment, the original assignment code which used VECTOR_OFFSET_START has been removed. So remove the stale macro as well. Fixes: commit 69cde000 ("x86/vector: Use matrix allocator for vector assignment") Signed-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180425020553.17210-1-douly.fnst@cn.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Fenghua Yu 提交于
cldemote is a new instruction in future x86 processors. It hints to hardware that a specified cache line should be moved ("demoted") from the cache(s) closest to the processor core to a level more distant from the processor core. This instruction is faster than snooping to make the cache line available for other cores. cldemote instruction is indicated by the presence of the CPUID feature flag CLDEMOTE (CPUID.(EAX=0x7, ECX=0):ECX[bit25]). More details on cldemote instruction can be found in the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference. Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: "Ashok Raj" <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1524508162-192587-1-git-send-email-fenghua.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
The SMM freeze feature was introduced since PerfMon V2. But the current code unconditionally enables the feature for all platforms. It can generate #GP exception, if the related FREEZE_WHILE_SMM bit is set for the machine with PerfMon V1. To disable the feature for PerfMon V1, perf needs to - Remove the freeze_on_smi sysfs entry by moving intel_pmu_attrs to intel_pmu, which is only applied to PerfMon V2 and later. - Check the PerfMon version before flipping the SMM bit when starting CPU Fixes: 6089327f ("perf/x86: Add sysfs entry to freeze counters on SMI") Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: ak@linux.intel.com Cc: eranian@google.com Cc: acme@redhat.com Link: https://lkml.kernel.org/r/1524682637-63219-1-git-send-email-kan.liang@linux.intel.com
-
- 25 4月, 2018 6 次提交
-
-
由 Steven Rostedt (VMware) 提交于
Arnaldo noticed that the latest kernel is missing the syscall event system directory in x86. I bisected it down to d5a00528 ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()"). The system call trace events are special, as there is only one trace event for all system calls (the raw_syscalls). But a macro that wraps the system calls creates meta data for them that copies the name to find the system call that maps to the system call table (the number). At boot up, it does a kallsyms lookup of the system call table to find the function that maps to the meta data of the system call. If it does not find a function, then that system call is ignored. Because the x86 system calls had "__x64_", or "__ia32_" prefixed to the "sys" for the names, they do not match the default compare algorithm. As this was a problem for power pc, the algorithm can be overwritten by the architecture. The solution is to have x86 have its own algorithm to do the compare and this brings back the system call trace events. Link: http://lkml.kernel.org/r/20180417174128.0f3457f0@gandalf.local.homeReported-by: NArnaldo Carvalho de Melo <acme@redhat.com> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Acked-by: NDominik Brodowski <linux@dominikbrodowski.net> Acked-by: NThomas Gleixner <tglx@linutronix.de> Fixes: d5a00528 ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()") Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Dave Hansen 提交于
commit ce9962bf7e22bb3891655c349faff618922d4a73 0day reported warnings at boot on 32-bit systems without NX support: attempted to set unsupported pgprot: 8000000000000025 bits: 8000000000000000 supported: 7fffffffffffffff WARNING: CPU: 0 PID: 1 at arch/x86/include/asm/pgtable.h:540 handle_mm_fault+0xfc1/0xfe0: check_pgprot at arch/x86/include/asm/pgtable.h:535 (inlined by) pfn_pte at arch/x86/include/asm/pgtable.h:549 (inlined by) do_anonymous_page at mm/memory.c:3169 (inlined by) handle_pte_fault at mm/memory.c:3961 (inlined by) __handle_mm_fault at mm/memory.c:4087 (inlined by) handle_mm_fault at mm/memory.c:4124 The problem is that due to the recent commit which removed auto-massaging of page protections, filtering page permissions at PTE creation time is not longer done, so vma->vm_page_prot is passed unfiltered to PTE creation. Filter the page protections before they are installed in vma->vm_page_prot. Fixes: fb43d6cb ("x86/mm: Do not auto-massage page protections") Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: https://lkml.kernel.org/r/20180420222028.99D72858@viggo.jf.intel.com
-
由 Dave Hansen 提交于
commit 26d35ca6c3776784f8156e1d6f80cc60d9a2a915 RANDSTRUCT derives its hardening benefits from the attacker's lack of knowledge about the layout of kernel data structures. Keep the kernel image non-global in cases where RANDSTRUCT is in use to help keep the layout a secret. Fixes: 8c06c774 (x86/pti: Leave kernel text global for !PCID) Reported-by: NKees Cook <keescook@google.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Nadav Amit <namit@vmware.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lkml.kernel.org/r/20180420222026.D0B4AAC9@viggo.jf.intel.com
-
由 Dave Hansen 提交于
commit abb67605203687c8b7943d760638d0301787f8d9 Kees reported to me that I made too much of the kernel image global. It was far more than just text: I think this is too much set global: _end is after data, bss, and brk, and all kinds of other stuff that could hold secrets. I think this should match what mark_rodata_ro() is doing. This does exactly that. We use __end_rodata_hpage_align as our marker both because it is huge-page-aligned and it does not contain any sections we expect to hold secrets. Kees's logic was that r/o data is in the kernel image anyway and, in the case of traditional distributions, can be freely downloaded from the web, so there's no reason to hide it. Fixes: 8c06c774 (x86/pti: Leave kernel text global for !PCID) Reported-by: NKees Cook <keescook@google.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Nadav Amit <namit@vmware.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: https://lkml.kernel.org/r/20180420222023.1C8B2B20@viggo.jf.intel.com
-
由 Dave Hansen 提交于
commit 231df823c4f04176f607afc4576c989895cff40e The pageattr.c code attempts to process "faults" when it goes looking for PTEs to change and finds non-present entries. It allows these faults in the linear map which is "expected to have holes", but WARN()s about them elsewhere, like when called on the kernel image. However, change_page_attr_clear() is now called on the kernel image in the process of trying to clear the Global bit. This trips the warning in __cpa_process_fault() if a non-present PTE is encountered in the kernel image. The "holes" in the kernel image result from free_init_pages()'s use of set_memory_np(). These holes are totally fine, and result from normal operation, just as they would be in the kernel linear map. Just silence the warning when holes in the kernel image are encountered. Fixes: 39114b7a (x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image) Reported-by: NMariusz Ceier <mceier@gmail.com> Reported-by: NAaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NAaro Koskinen <aaro.koskinen@nokia.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Kees Cook <keescook@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: https://lkml.kernel.org/r/20180420222021.1C7D2B3F@viggo.jf.intel.com
-
由 Dave Hansen 提交于
commit 16dce603adc9de4237b7bf2ff5c5290f34373e7b Part of the global bit _setting_ patches also includes clearing the Global bit when it should not be enabled. That is done with set_memory_nonglobal(), which uses change_page_attr_clear() in pageattr.c under the covers. The TLB flushing code inside pageattr.c has has checks like BUG_ON(irqs_disabled()), looking for interrupt disabling that might cause deadlocks. But, these also trip in early boot on certain preempt configurations. Just copy the existing BUG_ON() sequence from cpa_flush_range() to the other two sites and check for early boot. Fixes: 39114b7a (x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image) Reported-by: NMariusz Ceier <mceier@gmail.com> Reported-by: NAaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NAaro Koskinen <aaro.koskinen@nokia.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Kees Cook <keescook@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: https://lkml.kernel.org/r/20180420222019.20C4A410@viggo.jf.intel.com
-
- 24 4月, 2018 2 次提交
-
-
由 Borislav Petkov 提交于
Vitezslav reported a case where the "Timeout during microcode update!" panic would hit. After a deeper look, it turned out that his .config had CONFIG_HOTPLUG_CPU disabled which practically made save_mc_for_early() a no-op. When that happened, the discovered microcode patch wasn't saved into the cache and the late loading path wouldn't find any. This, then, lead to early exit from __reload_late() and thus CPUs waiting until the timeout is reached, leading to the panic. In hindsight, that function should have been written so it does not return before the post-synchronization. Oh well, I know better now... Fixes: bb8c13d6 ("x86/microcode: Fix CPU synchronization routine") Reported-by: NVitezslav Samel <vitezslav@samel.cz> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NVitezslav Samel <vitezslav@samel.cz> Tested-by: NAshok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz Link: https://lkml.kernel.org/r/20180421081930.15741-2-bp@alien8.de
-
由 Borislav Petkov 提交于
save_mc_for_early() was a no-op on !CONFIG_HOTPLUG_CPU but the generic_load_microcode() path saves the microcode patches it has found into the cache of patches which is used for late loading too. Regardless of whether CPU hotplug is used or not. Make the saving unconditional so that late loading can find the proper patch. Reported-by: NVitezslav Samel <vitezslav@samel.cz> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NVitezslav Samel <vitezslav@samel.cz> Tested-by: NAshok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz Link: https://lkml.kernel.org/r/20180421081930.15741-1-bp@alien8.de
-
- 23 4月, 2018 1 次提交
-
-
由 Thomas Gleixner 提交于
GPL2.0 is not a valid SPDX identiier. Replace it with GPL-2.0. Fixes: 4a362601 ("x86/jailhouse: Add infrastructure for running in non-root cell") Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NJan Kiszka <jan.kiszka@siemens.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Link: https://lkml.kernel.org/r/20180422220832.815346488@linutronix.de
-
- 21 4月, 2018 1 次提交
-
-
由 Dave Young 提交于
Chun-Yi reported a kernel warning message below: WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 early_iounmap+0x4f/0x12c() early_iounmap(ffffffffff200180, 00000118) [0] size not consistent 00000120 The problem is x86 kexec_file_load adds extra alignment to the efi memmap: in bzImage64_load(): efi_map_sz = efi_get_runtime_map_size(); efi_map_sz = ALIGN(efi_map_sz, 16); And __efi_memmap_init maps with the size including the alignment bytes but efi_memmap_unmap use nr_maps * desc_size which does not include the extra bytes. The alignment in kexec code is only needed for the kexec buffer internal use Actually kexec should pass exact size of the efi memmap to 2nd kernel. Link: http://lkml.kernel.org/r/20180417083600.GA1972@dhcp-128-65.nay.redhat.comSigned-off-by: NDave Young <dyoung@redhat.com> Reported-by: Njoeyli <jlee@suse.com> Tested-by: NRandy Wright <rwright@hpe.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 4月, 2018 3 次提交
-
-
由 Oskar Senft 提交于
SBOX on some Broadwell CPUs is broken because it's enabled unconditionally despite the fact that there are no SBOXes available. Check the Power Control Unit CAPID4 register to determine the number of available SBOXes on the particular CPU before trying to enable them. If there are none, nullify the SBOX descriptor so it isn't tried to be initialized. Signed-off-by: NOskar Senft <osk@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NMark van Dijk <mark@voidzero.net> Reviewed-by: NKan Liang <kan.liang@intel.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: ak@linux.intel.com Cc: peterz@infradead.org Cc: eranian@google.com Link: https://lkml.kernel.org/r/1521810690-2576-2-git-send-email-kan.liang@linux.intel.com
-
由 Stephane Eranian 提交于
This reverts commit 3b94a891 ("perf/x86/intel/uncore: Remove SBOX support for Broadwell server") Revert because there exists a proper workaround for Broadwell-EP servers without SBOX now. Note that BDX-DE does not have a SBOX. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKan Liang <kan.liang@intel.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: ak@linux.intel.com Cc: osk@google.com Cc: mark@voidzero.net Link: https://lkml.kernel.org/r/1521810690-2576-1-git-send-email-kan.liang@linux.intel.com
-
由 Joerg Roedel 提交于
On a system with 4-level page-tables there is no p4d, so the pud in the pgd should be mapped. The old code before commit fb43d6cb already did that. The change from above commit causes an invalid page-table which causes undefined behavior. In one report it caused triple faults. Fix it by changing the p4d back to pud. Fixes: fb43d6cb ('x86/mm: Do not auto-massage page protections') Reported-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NMichal Kubecek <mkubecek@suse.cz> Tested-by: NBorislav Petkov <bp@suse.de> Cc: linux-pm@vger.kernel.org Cc: rjw@rjwysocki.net Cc: pavel@ucw.cz Cc: hpa@zytor.com Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/1524162360-26179-1-git-send-email-joro@8bytes.org
-
- 17 4月, 2018 2 次提交
-
-
由 Joerg Roedel 提交于
The walk_pte_level() function just uses __va to get the virtual address of the PTE page, but that breaks when the PTE page is not in the direct mapping with HIGHPTE=y. The result is an unhandled kernel paging request at some random address when accessing the current_kernel or current_user file. Use the correct API to access PTE pages. Fixes: fe770bf0 ('x86: clean up the page table dumper and add 32-bit support') Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: jgross@suse.com Cc: JBeulich@suse.com Cc: hpa@zytor.com Cc: aryabinin@virtuozzo.com Cc: kirill.shutemov@linux.intel.com Link: https://lkml.kernel.org/r/1523971636-4137-1-git-send-email-joro@8bytes.org
-
由 Alison Schofield 提交于
Intel's Skylake Server CPUs have a different LLC topology than previous generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided into two "slices", each containing half the cores, half the LLC, and one memory controller and each slice is enumerated to Linux as a NUMA node. This is similar to how the cores and LLC were arranged for the Cluster-On-Die (CoD) feature. CoD allowed the same cache line to be present in each half of the LLC. But, with SNC, each line is only ever present in *one* slice. This means that the portion of the LLC *available* to a CPU depends on the data being accessed: Remote socket: entire package LLC is shared Local socket->local slice: data goes into local slice LLC Local socket->remote slice: data goes into remote-slice LLC. Slightly higher latency than local slice LLC. The biggest implication from this is that a process accessing all NUMA-local memory only sees half the LLC capacity. The CPU describes its cache hierarchy with the CPUID instruction. One of the CPUID leaves enumerates the "logical processors sharing this cache". This information is used for scheduling decisions so that tasks move more freely between CPUs sharing the cache. But, the CPUID for the SNC configuration discussed above enumerates the LLC as being shared by the entire package. This is not 100% precise because the entire cache is not usable by all accesses. But, it *is* the way the hardware enumerates itself, and this is not likely to change. The userspace visible impact of all the above is that the sysfs info reports the entire LLC as being available to the entire package. As noted above, this is not true for local socket accesses. This patch does not correct the sysfs info. It is the same, pre and post patch. The current code emits the following warning: sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. The warning is coming from the topology_sane() check in smpboot.c because the topology is not matching the expectations of the model for obvious reasons. To fix this, add a vendor and model specific check to never call topology_sane() for these systems. Also, just like "Cluster-on-Die" disable the "coregroup" sched_domain_topology_level and use NUMA information from the SRAT alone. This is OK at least on the hardware we are immediately concerned about because the LLC sharing happens at both the slice and at the package level, which are also NUMA boundaries. Signed-off-by: NAlison Schofield <alison.schofield@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: brice.goglin@gmail.com Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com
-