- 08 9月, 2018 2 次提交
-
-
由 Nadav Amit 提交于
When page-table entries are set, the compiler might optimize their assignment by using multiple instructions to set the PTE. This might turn into a security hazard if the user somehow manages to use the interim PTE. L1TF does not make our lives easier, making even an interim non-present PTE a security hazard. Using WRITE_ONCE() to set PTEs and friends should prevent this potential security hazard. I skimmed the differences in the binary with and without this patch. The differences are (obviously) greater when CONFIG_PARAVIRT=n as more code optimizations are possible. For better and worse, the impact on the binary with this patch is pretty small. Skimming the code did not cause anything to jump out as a security hazard, but it seems that at least move_soft_dirty_pte() caused set_pte_at() to use multiple writes. Signed-off-by: NNadav Amit <namit@vmware.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
-
由 Thomas Gleixner 提交于
activate_managed() returns EINVAL instead of -EINVAL in case of error. While this is unlikely to happen, the positive return value would cause further malfunction at the call site. Fixes: 2db1f959 ("x86/vector: Handle managed interrupts proper") Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
-
- 06 9月, 2018 2 次提交
-
-
由 Jann Horn 提交于
When the kernel.print-fatal-signals sysctl has been enabled, a simple userspace crash will cause the kernel to write a crash dump that contains, among other things, the kernel gsbase into dmesg. As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE in this case. This also moves the bitness-specific logic from show_regs() into process_{32,64}.c. Fixes: 45807a1d ("vdso: print fatal signals") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180831194151.123586-1-jannh@google.com
-
由 Chuanhua Lei 提交于
Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then dividing it by HZ. Both tsc_khz and the temporary variable holding the multiplication result are of type unsigned long, so on 32bit the result is truncated to the lower 32bit. Use u64 as type for the temporary variable and cast tsc_khz to it before multiplying. [ tglx: Massaged changelog and removed pointless braces ] Fixes: cf7a63ef ("x86/tsc: Calibrate tsc only once") Signed-off-by: NChuanhua Lei <chuanhua.lei@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: yixin.zhu@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Tatashin <pasha.tatashin@microsoft.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/1536228203-18701-1-git-send-email-chuanhua.lei@linux.intel.com
-
- 03 9月, 2018 1 次提交
-
-
由 Randy Dunlap 提交于
Fix kernel-doc warnings in arch/x86/include/asm/atomic.h that are caused by having a #define macro between the kernel-doc notation and the function name. Fixed by moving the #define macro to after the function implementation. Make the same change for atomic64_{32,64}.h for consistency even though there were no kernel-doc warnings found in these header files, but there would be if they were used in generation of documentation. Fixes these kernel-doc warnings: ../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'i' description in 'arch_atomic_sub_and_test' ../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'v' description in 'arch_atomic_sub_and_test' ../arch/x86/include/asm/atomic.h:96: warning: Excess function parameter 'v' description in 'arch_atomic_inc' ../arch/x86/include/asm/atomic.h:109: warning: Excess function parameter 'v' description in 'arch_atomic_dec' ../arch/x86/include/asm/atomic.h:124: warning: Excess function parameter 'v' description in 'arch_atomic_dec_and_test' ../arch/x86/include/asm/atomic.h:138: warning: Excess function parameter 'v' description in 'arch_atomic_inc_and_test' ../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'i' description in 'arch_atomic_add_negative' ../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'v' description in 'arch_atomic_add_negative' Fixes: 18cc1814 ("atomics/treewide: Make test ops optional") Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/0a1e678d-c8c5-b32c-2640-ed4e94d399d2@infradead.org
-
- 02 9月, 2018 4 次提交
-
-
由 Filippo Sironi 提交于
Handle the case where microcode gets loaded on the BSP's hyperthread sibling first and the boot_cpu_data's microcode revision doesn't get updated because of early exit due to the siblings sharing a microcode engine. For that, simply write the updated revision on all CPUs unconditionally. Signed-off-by: NFilippo Sironi <sironi@amazon.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: prarit@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1533050970-14385-1-git-send-email-sironi@amazon.de
-
由 Prarit Bhargava 提交于
When preparing an MCE record for logging, boot_cpu_data.microcode is used to read out the microcode revision on the box. However, on systems where late microcode update has happened, the microcode revision output in a MCE log record is wrong because boot_cpu_data.microcode is not updated when the microcode gets updated. But, the microcode revision saved in boot_cpu_data's microcode member should be kept up-to-date, regardless, for consistency. Make it so. Fixes: fa94d0c6 ("x86/MCE: Save microcode revision in machine check records") Signed-off-by: NPrarit Bhargava <prarit@redhat.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: sironi@amazon.de Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180731112739.32338-1-prarit@redhat.com
-
由 Randy Dunlap 提交于
Fix the section mismatch warning in arch/x86/mm/pti.c: WARNING: vmlinux.o(.text+0x6972a): Section mismatch in reference from the function pti_clone_pgtable() to the function .init.text:pti_user_pagetable_walk_pte() The function pti_clone_pgtable() references the function __init pti_user_pagetable_walk_pte(). This is often because pti_clone_pgtable lacks a __init annotation or the annotation of pti_user_pagetable_walk_pte is wrong. FATAL: modpost: Section mismatches detected. Fixes: 85900ea5 ("x86/pti: Map the vsyscall page if needed") Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/43a6d6a3-d69d-5eda-da09-0b1c88215a2a@infradead.org
-
由 Samuel Neves 提交于
In the __getcpu function, lsl is using the wrong target and destination registers. Luckily, the compiler tends to choose %eax for both variables, so it has been working so far. Fixes: a582c540 ("x86/vdso: Use RDPID in preference to LSL when available") Signed-off-by: NSamuel Neves <sneves@dei.uc.pt> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt
-
- 01 9月, 2018 1 次提交
-
-
由 LuckTony 提交于
The trick with flipping bit 63 to avoid loading the address of the 1:1 mapping of the poisoned page while the 1:1 map is updated used to work when unmapping the page. But it falls down horribly when attempting to directly set the page as uncacheable. The problem is that when the cache mode is changed to uncachable, the pages needs to be flushed from the cache first. But the decoy address is non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a #GP fault. Add code to change_page_attr_set_clr() to fix the address before calling flush. Fixes: 284ce401 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk
-
- 31 8月, 2018 4 次提交
-
-
由 Joerg Roedel 提交于
When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap for the simple reason that this address is also mapped for user-space. The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT to call EFI runtime services and switch back to the kernel GDT when they return. But the switch-back uses the writable GDT, not the fixmap GDT. When that happened and and the CPU returns to user-space it switches to the user %cr3 and tries to restore user segment registers. This fails because the writable GDT is not mapped in the user page-table, and without a GDT the fault handlers also can't be launched. The result is a triple fault and reboot of the machine. Fix that by restoring the GDT back to the fixmap GDT which is also mapped in the user page-table. Fixes: 7757d607 x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32') Reported-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NGuenter Roeck <linux@roeck-us.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: hpa@zytor.com Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org
-
由 Andy Lutomirski 提交于
A NMI can hit in the middle of context switching or in the middle of switch_mm_irqs_off(). In either case, CR3 might not match current->mm, which could cause copy_from_user_nmi() and friends to read the wrong memory. Fix it by adding a new nmi_uaccess_okay() helper and checking it in copy_from_user_nmi() and in __copy_from_user_nmi()'s callers. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NRik van Riel <riel@surriel.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
-
由 Ben Hutchings 提交于
When bootstrapping an architecture, it's usual to generate the kernel's user-space headers (make headers_install) before building a compiler. Move the compiler check (for asm goto support) to the archprepare target so that it is only done when building code for the target. Fixes: e501ce95 ("x86: Force asm-goto") Reported-by: NHelmut Grohne <helmutg@debian.org> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk
-
由 Jann Horn 提交于
show_opcodes() is used both for dumping kernel instructions and for dumping user instructions. If userspace causes #PF by jumping to a kernel address, show_opcodes() can be reached with regs->ip controlled by the user, pointing to kernel code. Make sure that userspace can't trick us into dumping kernel memory into dmesg. Fixes: 7cccf072 ("x86/dumpstack: Add a show_ip() function") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: security@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
-
- 30 8月, 2018 5 次提交
-
-
由 Uros Bizjak 提交于
Replace open-coded set instructions with CC_SET()/CC_OUT(). Signed-off-by: NUros Bizjak <ubizjak@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180814165951.13538-1-ubizjak@gmail.com
-
由 Jiri Kosina 提交于
text_poke() and text_poke_bp() must be called with text_mutex held. Put proper lockdep anotation in place instead of just mentioning the requirement in a comment. Reported-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1808280853520.25787@cbobk.fhfr.pm
-
由 Jann Horn 提交于
Reset the KASAN shadow state of the task stack before rewinding RSP. Without this, a kernel oops will leave parts of the stack poisoned, and code running under do_exit() can trip over such poisoned regions and cause nonsensical false-positive KASAN reports about stack-out-of-bounds bugs. This does not wipe the exception stacks; if an oops happens on an exception stack, it might result in random KASAN false-positives from other tasks afterwards. This is probably relatively uninteresting, since if the kernel oopses on an exception stack, there are most likely bigger things to worry about. It'd be more interesting if vmapped stacks and KASAN were compatible, since then handle_stack_overflow() would oops from exception stack context. Fixes: 2deb4be2 ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: kasan-dev@googlegroups.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com
-
由 Nick Desaulniers 提交于
This should have been marked extern inline in order to pick up the out of line definition in arch/x86/kernel/irqflags.S. Fixes: 208cbb32 ("x86/irqflags: Provide a declaration for native_save_fl") Reported-by: NBen Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NJuergen Gross <jgross@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180827214011.55428-1-ndesaulniers@google.com
-
由 Masahiro Yamada 提交于
Commit cafa0010 ("Raise the minimum required gcc version to 4.6") bumped the minimum GCC version to 4.6 for all architectures. Remove the workaround code. It was the only user of cc-if-fullversion. Remove the macro as well. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/1535348714-25457-1-git-send-email-yamada.masahiro@socionext.com
-
- 27 8月, 2018 3 次提交
-
-
由 Nikolas Nyby 提交于
Fix a typo in the Kconfig help text: adverticed -> advertised. Signed-off-by: NNikolas Nyby <nikolas@gnu.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Cc: tglx@linutronix.de Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20180825231054.23813-1-nikolas@gnu.org
-
由 Andi Kleen 提交于
On Nehalem and newer core CPUs the CPU cache internally uses 44 bits physical address space. The L1TF workaround is limited by this internal cache address width, and needs to have one bit free there for the mitigation to work. Older client systems report only 36bit physical address space so the range check decides that L1TF is not mitigated for a 36bit phys/32GB system with some memory holes. But since these actually have the larger internal cache width this warning is bogus because it would only really be needed if the system had more than 43bits of memory. Add a new internal x86_cache_bits field. Normally it is the same as the physical bits field reported by CPUID, but for Nehalem and newerforce it to be at least 44bits. Change the L1TF memory size warning to use the new cache_bits field to avoid bogus warnings and remove the bogus comment about memory size. Fixes: 17dbca11 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Reported-by: NGeorge Anchev <studio@anchev.net> Reported-by: NChristopher Snowhill <kode54@gmail.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: Michael Hocko <mhocko@suse.com> Cc: vbabka@suse.cz Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org
-
由 Andi Kleen 提交于
The check for Spectre microcodes does not check for family 6, only the model numbers. Add a family 6 check to avoid ambiguity with other families. Fixes: a5b29663 ("x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes") Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180824170351.34874-2-andi@firstfloor.org
-
- 24 8月, 2018 5 次提交
-
-
由 Vlastimil Babka 提交于
Two users have reported [1] that they have an "extremely unlikely" system with more than MAX_PA/2 memory and L1TF mitigation is not effective. Make the warning more helpful by suggesting the proper mem=X kernel boot parameter to make it effective and a link to the L1TF document to help decide if the mitigation is worth the unusable RAM. [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536Suggested-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1254@suse.cz
-
由 Vlastimil Babka 提交于
Two users have reported [1] that they have an "extremely unlikely" system with more than MAX_PA/2 memory and L1TF mitigation is not effective. In fact it's a CPU with 36bits phys limit (64GB) and 32GB memory, but due to holes in the e820 map, the main region is almost 500MB over the 32GB limit: [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000081effffff] usable Suggestions to use 'mem=32G' to enable the L1TF mitigation while losing the 500MB revealed, that there's an off-by-one error in the check in l1tf_select_mitigation(). l1tf_pfn_limit() returns the last usable pfn (inclusive) and the range check in the mitigation path does not take this into account. Instead of amending the range check, make l1tf_pfn_limit() return the first PFN which is over the limit which is less error prone. Adjust the other users accordingly. [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536 Fixes: 17dbca11 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Reported-by: NGeorge Anchev <studio@anchev.net> Reported-by: NChristopher Snowhill <kode54@gmail.com> Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180823134418.17008-1-vbabka@suse.cz
-
由 Masahiro Yamada 提交于
Commit a0f97e06 ("kbuild: enable 'make CFLAGS=...' to add additional options to CC") renamed CFLAGS to KBUILD_CFLAGS. Commit 222d394d ("kbuild: enable 'make AFLAGS=...' to add additional options to AS") renamed AFLAGS to KBUILD_AFLAGS. Commit 06c5040c ("kbuild: enable 'make CPPFLAGS=...' to add additional options to CPP") renamed CPPFLAGS to KBUILD_CPPFLAGS. For some reason, LDFLAGS was not renamed. Using a well-known variable like LDFLAGS may result in accidental override of the variable. Kbuild generally uses KBUILD_ prefixed variables for the internally appended options, so here is one more conversion to sanitize the naming convention. I did not touch Makefiles under tools/ since the tools build system is a different world. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: NPalmer Dabbelt <palmer@sifive.com>
-
由 Peter Zijlstra 提交于
If we don't use paravirt; don't play unnecessary and complicated games to free page-tables. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NRik van Riel <riel@surriel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Jann reported that x86 was missing required TLB invalidates when he hit the !*batch slow path in tlb_remove_table(). This is indeed the case; RCU_TABLE_FREE does not provide TLB (cache) invalidates, the PowerPC-hash where this code originated and the Sparc-hash where this was subsequently used did not need that. ARM which later used this put an explicit TLB invalidate in their __p*_free_tlb() functions, and PowerPC-radix followed that example. But when we hooked up x86 we failed to consider this. Fix this by (optionally) hooking tlb_remove_table() into the TLB invalidate code. NOTE: s390 was also needing something like this and might now be able to use the generic code again. [ Modified to be on top of Nick's cleanups, which simplified this patch now that tlb_flush_mmu_tlbonly() really only flushes the TLB - Linus ] Fixes: 9e52fc2b ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)") Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NRik van Riel <riel@surriel.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 8月, 2018 5 次提交
-
-
由 Peter Zijlstra 提交于
Revert commits: 95b0e635 x86/mm/tlb: Always use lazy TLB mode 64482aaf x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs ac031589 x86/mm/tlb: Make lazy TLB mode lazier 61d0beb5 x86/mm/tlb: Restructure switch_mm_irqs_off() 2ff6ddf1 x86/mm/tlb: Leave lazy TLB mode at page table free time In order to simplify the TLB invalidate fixes for x86 and unify the parts that need backporting. We'll try again later. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NRik van Riel <riel@surriel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ard Biesheuvel 提交于
An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab entries, each consisting of two 64-bit fields containing absolute references, to the symbol itself and to a char array containing its name, respectively. When we build the same configuration with KASLR enabled, we end up with an additional ~192 KB of relocations in the .init section, i.e., one 24 byte entry for each absolute reference, which all need to be processed at boot time. Given how the struct kernel_symbol that describes each entry is completely local to module.c (except for the references emitted by EXPORT_SYMBOL() itself), we can easily modify it to contain two 32-bit relative references instead. This reduces the size of the __ksymtab section by 50% for all 64-bit architectures, and gets rid of the runtime relocations entirely for architectures implementing KASLR, either via standard PIE linking (arm64) or using custom host tools (x86). Note that the binary search involving __ksymtab contents relies on each section being sorted by symbol name. This is implemented based on the input section names, not the names in the ksymtab entries, so this patch does not interfere with that. Given that the use of place-relative relocations requires support both in the toolchain and in the module loader, we cannot enable this feature for all architectures. So make it dependent on whether CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined. Link: http://lkml.kernel.org/r/20180704083651.24360-4-ard.biesheuvel@linaro.orgSigned-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NJessica Yu <jeyu@kernel.org> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NWill Deacon <will.deacon@arm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morris <james.morris@microsoft.com> Cc: James Morris <jmorris@namei.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nico@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ard Biesheuvel 提交于
To allow existing C code to be incorporated into the decompressor or the UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx declarations into nops, and #define it in places where such exports are undesirable. Note that this gets rid of a rather dodgy redefine of linux/export.h's header guard. Link: http://lkml.kernel.org/r/20180704083651.24360-3-ard.biesheuvel@linaro.orgSigned-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NNicolas Pitre <nico@linaro.org> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NWill Deacon <will.deacon@arm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morris <james.morris@microsoft.com> Cc: James Morris <jmorris@namei.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ard Biesheuvel 提交于
Patch series "add support for relative references in special sections", v10. This adds support for emitting special sections such as initcall arrays, PCI fixups and tracepoints as relative references rather than absolute references. This reduces the size by 50% on 64-bit architectures, but more importantly, it removes the need for carrying relocation metadata for these sections in relocatable kernels (e.g., for KASLR) that needs to be fixed up at boot time. On arm64, this reduces the vmlinux footprint of such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs 4 byte relative reference) Patch #3 was sent out before as a single patch. This series supersedes the previous submission. This version makes relative ksymtab entries dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather than trying to infer from kbuild test robot replies for which architectures it should be blacklisted. Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS, and sets it for the main architectures that are expected to benefit the most from this feature, i.e., 64-bit architectures or ones that use runtime relocations. Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of ksymtab/kcrctab sections in decompressor and EFI stub objects when rebuilding existing C files to run in a different context. Patches #4 - #6 implement relative references for initcalls, PCI fixups and tracepoints, respectively, all of which produce sections with order ~1000 entries on an arm64 defconfig kernel with tracing enabled. This means we save about 28 KB of vmlinux space for each of these patches. [From the v7 series blurb, which included the jump_label patches as well]: For the arm64 kernel, all patches combined reduce the memory footprint of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has KASLR enabled), of which ~1 MB is the size reduction of the RELA section in .init, and the remaining 300 KB is reduction of .text/.data. This patch (of 6): Before updating certain subsystems to use place relative 32-bit relocations in special sections, to save space and reduce the number of absolute relocations that need to be processed at runtime by relocatable kernels, introduce the Kconfig symbol and define it for some architectures that should be able to support and benefit from it. Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.orgSigned-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NWill Deacon <will.deacon@arm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: James Morris <jmorris@namei.org> Cc: Nicolas Pitre <nico@linaro.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>, Cc: James Morris <james.morris@microsoft.com> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start and that is a problem for the oom_reaper because it needs to guarantee a forward progress so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet. We can do much better though. Even if mmu notifiers use sleepable locks there is no reason to automatically assume those locks are held. Moreover majority of notifiers only care about a portion of the address space and there is absolutely zero reason to fail when we are unmapping an unrelated range. Many notifiers do really block and wait for HW which is harder to handle and we have to bail out though. This patch handles the low hanging fruit. __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continue as long as we do not block down the call chain. I think we can improve that even further because there is a common pattern to do a range lookup first and then do something about that. The first part can be done without a sleeping lock in most cases AFAICS. The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem and this is basically the same thing. The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu notifier managed memory and set the hard limit to something really small. Then we are looking for a proper process tear down. [akpm@linux-foundation.org: coding style fixes] [akpm@linux-foundation.org: minor code simplification] Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp Reported-by: NDavid Rientjes <rientjes@google.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 8月, 2018 6 次提交
-
-
由 Paolo Bonzini 提交于
Two bug fixes: 1) missing entries in the l1d_param array; this can cause a host crash if an access attempts to reach the missing entry. Future-proof the get function against any overflows as well. However, the two entries VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must not be accepted by the parse function, so disable them there. 2) invalid values must be rejected even if the CPU does not have the bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF) ... and a small refactoring, since the .cmd field is redundant with the index in the array. Reported-by: NBandan Das <bsd@redhat.com> Cc: stable@vger.kernel.org Fixes: a7b9020bSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Thomas Gleixner 提交于
Mikhail reported the following lockdep splat: WARNING: possible irq lock inversion dependency detected CPU 0/KVM/10284 just changed the state of lock: 000000000d538a88 (&st->lock){+...}, at: speculative_store_bypass_update+0x10b/0x170 but this lock was taken by another, HARDIRQ-safe lock in the past: (&(&sighand->siglock)->rlock){-.-.} and interrupts could create inverse lock ordering between them. Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&st->lock); local_irq_disable(); lock(&(&sighand->siglock)->rlock); lock(&st->lock); <Interrupt> lock(&(&sighand->siglock)->rlock); *** DEADLOCK *** The code path which connects those locks is: speculative_store_bypass_update() ssb_prctl_set() do_seccomp() do_syscall_64() In svm_vcpu_run() speculative_store_bypass_update() is called with interupts enabled via x86_virt_spec_ctrl_set_guest/host(). This is actually a false positive, because GIF=0 so interrupts are disabled even if IF=1; however, we can easily move the invocations of x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to cure it, and it's a good idea to keep the GIF=0/IF=1 area as small and self-contained as possible. Fixes: 1f50ddb4 ("x86/speculation: Handle HT correctly on AMD") Reported-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: kvm@vger.kernel.org Cc: x86@kernel.org Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Virtualization of Intel SGX depends on Enclave Page Cache (EPC) management that is not yet available in the kernel, i.e. KVM support for exposing SGX to a guest cannot be added until basic support for SGX is upstreamed, which is a WIP[1]. Until SGX is properly supported in KVM, ensure a guest sees expected behavior for ENCLS, i.e. all ENCLS #UD. Because SGX does not have a true software enable bit, e.g. there is no CR4.SGXE bit, the ENCLS instruction can be executed[1] by the guest if SGX is supported by the system. Intercept all ENCLS leafs (via the ENCLS- exiting control and field) and unconditionally inject #UD. [1] https://www.spinics.net/lists/kvm/msg171333.html or https://lkml.org/lkml/2018/7/3/879 [2] A guest can execute ENCLS in the sense that ENCLS will not take an immediate #UD, but no ENCLS will ever succeed in a guest without explicit support from KVM (map EPC memory into the guest), unless KVM has a *very* egregious bug, e.g. accidentally mapped EPC memory into the guest SPTEs. In other words this patch is needed only to prevent the guest from seeing inconsistent behavior, e.g. #GP (SGX not enabled in Feature Control MSR) or #PF (leaf operand(s) does not point at EPC memory) instead of #UD on ENCLS. Intercepting ENCLS is not required to prevent the guest from truly utilizing SGX. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20180814163334.25724-3-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Hardware support for basic SGX virtualization adds a new execution control (ENCLS_EXITING), VMCS field (ENCLS_EXITING_BITMAP) and exit reason (ENCLS), that enables a VMM to intercept specific ENCLS leaf functions, e.g. to inject faults when the VMM isn't exposing SGX to a VM. When ENCLS_EXITING is enabled, the VMM can set/clear bits in the bitmap to intercept/allow ENCLS leaf functions in non-root, e.g. setting bit 2 in the ENCLS_EXITING_BITMAP will cause ENCLS[EINIT] to VMExit(ENCLS). Note: EXIT_REASON_ENCLS was previously added by commit 1f519992 ("KVM: VMX: add missing exit reasons"). Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20180814163334.25724-2-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yi Wang 提交于
Substitute spaces with tab. No functional changes. Signed-off-by: NYi Wang <wang.yi59@zte.com.cn> Reviewed-by: NJiang Biao <jiang.biao2@zte.com.cn> Message-Id: <1534398159-48509-1-git-send-email-wang.yi59@zte.com.cn> Cc: stable@vger.kernel.org # L1TF Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Arnd Bergmann 提交于
Removing one of the two accesses of the maxphyaddr variable led to a harmless warning: arch/x86/kvm/x86.c: In function 'kvm_set_mmio_spte_mask': arch/x86/kvm/x86.c:6563:6: error: unused variable 'maxphyaddr' [-Werror=unused-variable] Removing the #ifdef seems to be the nicest workaround, as it makes the code look cleaner than adding another #ifdef. Fixes: 28a1f3ac ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org # L1TF Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 8月, 2018 2 次提交
-
-
由 Juergen Gross 提交于
Commit 7b25b9cb ("x86/xen/time: Initialize pv xen time in init_hypervisor_platform()") moved the mapping of the shared info area before pagetable_init(). This breaks booting as 32-bit PV guest as the use of set_fixmap isn't possible at this time on 32-bit. This can be worked around by populating the needed PMD on 32-bit kernel earlier. In order not to reimplement populate_extra_pte() using extend_brk() for allocating new page tables extend alloc_low_pages() to do that in case the early page table pool is not yet available. Fixes: 7b25b9cb ("x86/xen/time: Initialize pv xen time in init_hypervisor_platform()") Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Juergen Gross 提交于
Remove Xen hypercall functions which are used nowhere in the kernel. Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-