- 13 2月, 2018 4 次提交
-
-
由 Progyan Bhattacharya 提交于
The file was generated by make command and should not be in the source tree. Signed-off-by: NProgyan Bhattacharya <progyanb@acm.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Masayoshi Mizuma 提交于
When a physical CPU is hot-removed, the following warning messages are shown while the uncore device is removed in uncore_pci_remove(): WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988 uncore_pci_remove+0xf1/0x110 ... CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1 Workqueue: kacpi_hotplug acpi_hotplug_work_fn ... Call Trace: pci_device_remove+0x36/0xb0 device_release_driver_internal+0x145/0x210 pci_stop_bus_device+0x76/0xa0 pci_stop_root_bus+0x44/0x60 acpi_pci_root_remove+0x1f/0x80 acpi_bus_trim+0x54/0x90 acpi_bus_trim+0x2e/0x90 acpi_device_hotplug+0x2bc/0x4b0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x141/0x340 worker_thread+0x47/0x3e0 kthread+0xf5/0x130 When uncore_pci_remove() runs, it tries to get the package ID to clear the value of uncore_extra_pci_dev[].dev[] by using topology_phys_to_logical_pkg(). The warning messesages are shown because topology_phys_to_logical_pkg() returns -1. arch/x86/events/intel/uncore.c: static void uncore_pci_remove(struct pci_dev *pdev) { ... phys_id = uncore_pcibus_to_physid(pdev->bus); ... pkg = topology_phys_to_logical_pkg(phys_id); // returns -1 for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) { if (uncore_extra_pci_dev[pkg].dev[i] == pdev) { uncore_extra_pci_dev[pkg].dev[i] = NULL; break; } } WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!! topology_phys_to_logical_pkg() tries to find cpuinfo_x86->phys_proc_id that matches the phys_pkg argument. arch/x86/kernel/smpboot.c: int topology_phys_to_logical_pkg(unsigned int phys_pkg) { int cpu; for_each_possible_cpu(cpu) { struct cpuinfo_x86 *c = &cpu_data(cpu); if (c->initialized && c->phys_proc_id == phys_pkg) return c->logical_proc_id; } return -1; } However, the phys_proc_id was already set to 0 by remove_siblinginfo() when the CPU was offlined. So, topology_phys_to_logical_pkg() cannot find the correct logical_proc_id and always returns -1. As the result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning messages are shown. What is worse is that the bogus 'pkg' index results in two bugs: - We dereference uncore_extra_pci_dev[] with a negative index - We fail to clean up a stale pointer in uncore_extra_pci_dev[][] To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo(). This should not cause any problems, because ->phys_proc_id is not used after it is hot-removed and it is re-set while hot-adding. Signed-off-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: yasu.isimatu@gmail.com Cc: <stable@vger.kernel.org> Fixes: 30bb9811 ("x86/topology: Avoid wasting 128k for package id array") Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 jia zhang 提交于
The vsyscall page should be visible only if vsyscall=emulate/native when dumping /proc/kcore. Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com> Reviewed-by: NJiri Olsa <jolsa@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1518446694-21124-3-git-send-email-zhang.jia@linux.alibaba.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 jia zhang 提交于
Commit: df04abfd ("fs/proc/kcore.c: Add bounce buffer for ktext data") ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y. However, accessing the vsyscall user page will cause an SMAP fault. Replace memcpy() with copy_from_user() to fix this bug works, but adding a common way to handle this sort of user page may be useful for future. Currently, only vsyscall page requires KCORE_USER. Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com> Reviewed-by: NJiri Olsa <jolsa@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 2月, 2018 3 次提交
-
-
由 Ingo Molnar 提交于
Clean up various aspects of the x86 CONFIG_NR_CPUS configuration switches: - Rename the three CONFIG_NR_CPUS related variables to create a common namespace for them: RANGE_BEGIN_CPUS => NR_CPUS_RANGE_BEGIN RANGE_END_CPUS => NR_CPUS_RANGE_END DEF_CONFIG_CPUS => NR_CPUS_DEFAULT - Align them vertically, such as: config NR_CPUS_RANGE_END int depends on X86_64 default 8192 if SMP && ( MAXSMP || CPUMASK_OFFSTACK) default 512 if SMP && (!MAXSMP && !CPUMASK_OFFSTACK) default 1 if !SMP - Update help text, add more comments. Test results: # i386 allnoconfig: CONFIG_NR_CPUS_RANGE_BEGIN=1 CONFIG_NR_CPUS_RANGE_END=1 CONFIG_NR_CPUS_DEFAULT=1 CONFIG_NR_CPUS=1 # i386 defconfig: CONFIG_NR_CPUS_RANGE_BEGIN=2 CONFIG_NR_CPUS_RANGE_END=8 CONFIG_NR_CPUS_DEFAULT=8 CONFIG_NR_CPUS=8 # i386 allyesconfig: CONFIG_NR_CPUS_RANGE_BEGIN=2 CONFIG_NR_CPUS_RANGE_END=64 CONFIG_NR_CPUS_DEFAULT=32 CONFIG_NR_CPUS=32 # x86_64 allnoconfig: CONFIG_NR_CPUS_RANGE_BEGIN=1 CONFIG_NR_CPUS_RANGE_END=1 CONFIG_NR_CPUS_DEFAULT=1 CONFIG_NR_CPUS=1 # x86_64 defconfig: CONFIG_NR_CPUS_RANGE_BEGIN=2 CONFIG_NR_CPUS_RANGE_END=512 CONFIG_NR_CPUS_DEFAULT=64 CONFIG_NR_CPUS=64 # x86_64 allyesconfig: CONFIG_NR_CPUS_RANGE_BEGIN=8192 CONFIG_NR_CPUS_RANGE_END=8192 CONFIG_NR_CPUS_DEFAULT=8192 CONFIG_NR_CPUS=8192 Acked-by: NRandy Dunlap <rdunlap@infradead.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180210113629.jcv6su3r4suuno63@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Randy Dunlap 提交于
Clean up and simplify the X86 NR_CPUS Kconfig symbol/option by introducing RANGE_BEGIN_CPUS, RANGE_END_CPUS, and DEF_CONFIG_CPUS. Then combine some default values when their conditionals can be reduced. Also move the X86_BIGSMP kconfig option inside an "if X86_32"/"endif" config block and drop its explicit "depends on X86_32". Combine the max. 8192 cases of RANGE_END_CPUS (X86_64 only). Split RANGE_END_CPUS and DEF_CONFIG_CPUS into separate cases for X86_32 and X86_64. Suggested-by: NLinus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0b833246-ed4b-e451-c426-c4464725be92@infradead.org Link: lkml.kernel.org/r/CA+55aFzOd3j6ZUSkEwTdk85qtt1JywOtm3ZAb-qAvt8_hJ6D4A@mail.gmail.com Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
The following commit: 7b606162 ("x86: do not use print_symbol()") ... introduced a new build warning on 32-bit x86: arch/x86/kernel/cpu/mcheck/mce.c:237:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] pr_cont("{%pS}", (void *)m->ip); ^ Fix the type mismatch between the 'void *' expected by %pS and the mce->ip field which is u64 by casting to long. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-kernel@vger.kernel.org Fixes: 7b606162 ("x86: do not use print_symbol()") Link: http://lkml.kernel.org/r/20180210145314.22174-1-bp@alien8.de [ Cleaned up the changelog. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 2月, 2018 2 次提交
-
-
由 Kees Cook 提交于
Various portions of the kernel, especially per-architecture pieces, need to know if the compiler is building with the stack protector. This was done in the arch/Kconfig with 'select', but this doesn't allow a way to do auto-detected compiler support. In preparation for creating an on-if-available default, move the logic for the definition of CONFIG_CC_STACKPROTECTOR into the Makefile. Link: http://lkml.kernel.org/r/1510076320-69931-3-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org> Tested-by: NLaura Abbott <labbott@redhat.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Josh Triplett <josh@joshtriplett.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Konovalov 提交于
Right now the fact that KASAN uses a single shadow byte for 8 bytes of memory is scattered all over the code. This change defines KASAN_SHADOW_SCALE_SHIFT early in asm include files and makes use of this constant where necessary. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/34937ca3b90736eaad91b568edf5684091f662e3.1515775666.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com> Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 2月, 2018 6 次提交
-
-
由 Peter Zijlstra 提交于
Because more readable.. Requested-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Because its daft.. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathieu Desnoyers 提交于
There are two places where core serialization is needed by membarrier: 1) When returning from the membarrier IPI, 2) After scheduler updates curr to a thread with a different mm, before going back to user-space, since the curr->mm is used by membarrier to check whether it needs to send an IPI to that CPU. x86-32 uses IRET as return from interrupt, and both IRET and SYSEXIT to go back to user-space. The IRET instruction is core serializing, but not SYSEXIT. x86-64 uses IRET as return from interrupt, which takes care of the IPI. However, it can return to user-space through either SYSRETL (compat code), SYSRETQ, or IRET. Given that SYSRET{L,Q} is not core serializing, we rely instead on write_cr3() performed by switch_mm() to provide core serialization after changing the current mm, and deal with the special case of kthread -> uthread (temporarily keeping current mm into active_mm) by adding a sync_core() in that specific case. Use the new sync_core_before_usermode() to guarantee this. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-10-mathieu.desnoyers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathieu Desnoyers 提交于
Ensure that a core serializing instruction is issued before returning to user-mode. x86 implements return to user-space through sysexit, sysrel, and sysretq, which are not core serializing. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-8-mathieu.desnoyers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathieu Desnoyers 提交于
Document the membarrier requirement on having a full memory barrier in __schedule() after coming from user-space, before storing to rq->curr. It is provided by smp_mb__after_spinlock() in __schedule(). Document that membarrier requires a full barrier on transition from kernel thread to userspace thread. We currently have an implicit barrier from atomic_dec_and_test() in mmdrop() that ensures this. The x86 switch_mm_irqs_off() full barrier is currently provided by many cpumask update operations as well as write_cr3(). Document that write_cr3() provides this barrier. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-4-mathieu.desnoyers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Olsa 提交于
Stephane reported that we don't support period for enabling large PEBS data, which there's no reason for. Adding PERF_SAMPLE_PERIOD into freerunning flags. Tested it with: # perf record -e cycles:P -c 100 --no-timestamp -C 0 --period Reported-by: NStephane Eranian <eranian@google.com> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Tested-by: NKan Liang <kan.liang@intel.com> Tested-by: NStephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180201083812.11359-4-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 04 2月, 2018 6 次提交
-
-
由 Rajneesh Bhardwaj 提交于
Add CPUID of Cannonlake (CNL) processors to Intel family list. Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Suggested-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NMegha Dey <megha.dey@linux.intel.com> Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
-
由 KarimAllah Ahmed 提交于
[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ] ... basically doing exactly what we do for VMX: - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID) - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest actually used it. Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NDarren Kenny <darren.kenny@oracle.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: kvm@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de
-
由 KarimAllah Ahmed 提交于
[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only start saving and restoring when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NDarren Kenny <darren.kenny@oracle.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: kvm@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de
-
由 KarimAllah Ahmed 提交于
Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NDarren Kenny <darren.kenny@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: kvm@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de
-
由 Ashok Raj 提交于
The Indirect Branch Predictor Barrier (IBPB) is an indirect branch control mechanism. It keeps earlier branches from influencing later ones. Unlike IBRS and STIBP, IBPB does not define a new mode of operation. It's a command that ensures predicted branch targets aren't used after the barrier. Although IBRS and IBPB are enumerated by the same CPUID enumeration, IBPB is very different. IBPB helps mitigate against three potential attacks: * Mitigate guests from being attacked by other guests. - This is addressed by issing IBPB when we do a guest switch. * Mitigate attacks from guest/ring3->host/ring3. These would require a IBPB during context switch in host, or after VMEXIT. The host process has two ways to mitigate - Either it can be compiled with retpoline - If its going through context switch, and has set !dumpable then there is a IBPB in that path. (Tim's patch: https://patchwork.kernel.org/patch/10192871) - The case where after a VMEXIT you return back to Qemu might make Qemu attackable from guest when Qemu isn't compiled with retpoline. There are issues reported when doing IBPB on every VMEXIT that resulted in some tsc calibration woes in guest. * Mitigate guest/ring0->host/ring0 attacks. When host kernel is using retpoline it is safe against these attacks. If host kernel isn't using retpoline we might need to do a IBPB flush on every VMEXIT. Even when using retpoline for indirect calls, in certain conditions 'ret' can use the BTB on Skylake-era CPUs. There are other mitigations available like RSB stuffing/clearing. * IBPB is issued only for SVM during svm_free_vcpu(). VMX has a vmclear and SVM doesn't. Follow discussion here: https://lkml.org/lkml/2018/1/15/146 Please refer to the following spec for more details on the enumeration and control. Refer here to get documentation about mitigations. https://software.intel.com/en-us/side-channel-security-support [peterz: rebase and changelog rewrite] [karahmed: - rebase - vmx: expose PRED_CMD if guest has it in CPUID - svm: only pass through IBPB if guest has it in CPUID - vmx: support !cpu_has_vmx_msr_bitmap()] - vmx: support nested] [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS) PRED_CMD is a write-only MSR] Signed-off-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: kvm@vger.kernel.org Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de
-
由 KarimAllah Ahmed 提交于
[dwmw2: Stop using KF() for bits in it, too] Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karahmed@amazon.de
-
- 03 2月, 2018 4 次提交
-
-
由 Arnd Bergmann 提交于
The declaration for swsusp_arch_resume marks it as 'asmlinkage', but the definition in x86-32 does not, and it fails to include the header with the declaration. This leads to a warning when building with link-time-optimizations: kernel/power/power.h:108:23: error: type of 'swsusp_arch_resume' does not match original declaration [-Werror=lto-type-mismatch] extern asmlinkage int swsusp_arch_resume(void); ^ arch/x86/power/hibernate_32.c:148:0: note: 'swsusp_arch_resume' was previously declared here int swsusp_arch_resume(void) This moves the declaration into a globally visible header file and fixes up both x86 definitions to match it. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Len Brown <len.brown@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: linux-pm@vger.kernel.org Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Bart Van Assche <bart.vanassche@wdc.com> Link: https://lkml.kernel.org/r/20180202145634.200291-2-arnd@arndb.de
-
由 Arnd Bergmann 提交于
In some configurations, 'partial' does not get initialized, as shown by this gcc-8 warning: arch/x86/kernel/dumpstack.c: In function 'show_trace_log_lvl': arch/x86/kernel/dumpstack.c:156:4: error: 'partial' may be used uninitialized in this function [-Werror=maybe-uninitialized] show_regs_if_on_stack(&stack_info, regs, partial); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This initializes it to false, to get the previous behavior in this case. Fixes: a9cdbe72 ("x86/dumpstack: Fix partial register dumps") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lkml.kernel.org/r/20180202145634.200291-1-arnd@arndb.de
-
由 Darren Kenny 提交于
Fixes: 117cc7a9 ("x86/retpoline: Fill return stack buffer on vmexit") Signed-off-by: NDarren Kenny <darren.kenny@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180202191220.blvgkgutojecxr3b@starbug-vm.ie.oracle.com
-
由 Arnd Bergmann 提交于
I'm seeing build failures from the two newly introduced arrays that are marked 'const' and '__initdata', which are mutually exclusive: arch/x86/kernel/cpu/common.c:882:43: error: 'cpu_no_speculation' causes a section type conflict with 'e820_table_firmware_init' arch/x86/kernel/cpu/common.c:895:43: error: 'cpu_no_meltdown' causes a section type conflict with 'e820_table_firmware_init' The correct annotation is __initconst. Fixes: fec9434a ("x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Thomas Garnier <thgarnie@google.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180202213959.611210-1-arnd@arndb.de
-
- 02 2月, 2018 1 次提交
-
-
由 KarimAllah Ahmed 提交于
[dwmw2: Use ARRAY_SIZE] Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1517484441-1420-3-git-send-email-dwmw@amazon.co.uk
-
- 01 2月, 2018 5 次提交
-
-
由 Rajneesh Bhardwaj 提交于
Recently sent patch 'platform/x86: intel_pmc_core: Remove unused EXPORTED API' missed to remove the header file 'arch/x86/include/asm/pmc_core.h' which was solely used to declare the EXPORTED API 'intel_pmc_slp_s0_counter_read'. This patch provides the errata fix for the same. Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
-
由 Dan Williams 提交于
Commit 75f139aa "KVM: x86: Add memory barrier on vmcs field lookup" added a raw 'asm("lfence");' to prevent a bounds check bypass of 'vmcs_field_to_offset_table'. The lfence can be avoided in this path by using the array_index_nospec() helper designed for these types of fixes. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Andrew Honig <ahonig@google.com> Cc: kvm@vger.kernel.org Cc: Jim Mattson <jmattson@google.com> Link: https://lkml.kernel.org/r/151744959670.6342.3001723920950249067.stgit@dwillia2-desk3.amr.corp.intel.com
-
由 Kirill A. Shutemov 提交于
We need an atomic way to setup pmd page table entry, avoiding races with CPU setting dirty/accessed bits. This is required to implement pmdp_invalidate() that doesn't lose these bits. On PAE we can avoid expensive cmpxchg8b for cases when new page table entry is not present. If it's present, fallback to cpmxchg loop. [akpm@linux-foundation.org: add missing `do' to do-while loop] Link: http://lkml.kernel.org/r/20171213105756.69879-10-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Tatashin 提交于
There is no need to have ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT, as all the page initialization code is in common code. Also, there is no need to depend on MEMORY_HOTPLUG, as initialization code does not really use hotplug memory functionality. So, we can remove this requirement as well. This patch allows to use deferred struct page initialization on all platforms with memblock allocator. Tested on x86, arm64, and sparc. Also, verified that code compiles on PPC with CONFIG_MEMORY_HOTPLUG disabled. Link: http://lkml.kernel.org/r/20171117014601.31606-1-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Reviewed-by: NKhalid Aziz <khalid.aziz@oracle.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paolo Bonzini 提交于
Place the MSR bitmap in struct loaded_vmcs, and update it in place every time the x2apic or APICv state can change. This is rare and the loop can handle 64 MSRs per iteration, in a similar fashion as nested_vmx_prepare_msr_bitmap. This prepares for choosing, on a per-VM basis, whether to intercept the SPEC_CTRL and PRED_CMD MSRs. Cc: stable@vger.kernel.org # prereq for Spectre mitigation Suggested-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 31 1月, 2018 9 次提交
-
-
由 Josh Poimboeuf 提交于
The 'noreplace-paravirt' option disables paravirt patching, leaving the original pv indirect calls in place. That's highly incompatible with retpolines, unless we want to uglify paravirt even further and convert the paravirt calls to retpolines. As far as I can tell, the option doesn't seem to be useful for much other than introducing surprising corner cases and making the kernel vulnerable to Spectre v2. It was probably a debug option from the early paravirt days. So just remove it. Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NJuergen Gross <jgross@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Link: https://lkml.kernel.org/r/20180131041333.2x6blhxirc2kclrq@treble
-
由 Kirill A. Shutemov 提交于
Currently kexec() will crash when switching into a 5-level paging enabled kernel. I missed that we need to change relocate_kernel() to set CR4.LA57 flag if the kernel has 5-level paging enabled. I avoided using #ifdef CONFIG_X86_5LEVEL here and inferred if we need to enable 5-level paging from previous CR4 value. This way the code is ready for boot-time switching between paging modes. With this patch applied, in addition to kexec 4-to-4 which always worked, we can kexec 4-to-5 and 5-to-5 - while 5-to-4 will need more work. Reported-by: NBaoquan He <bhe@redhat.com> Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NBaoquan He <bhe@redhat.com> Cc: <stable@vger.kernel.org> # v4.14+ Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 77ef56e4 ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y") Link: http://lkml.kernel.org/r/20180129110845.26633-1-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Tim Chen 提交于
Flush indirect branches when switching into a process that marked itself non dumpable. This protects high value processes like gpg better, without having too high performance overhead. If done naïvely, we could switch to a kernel idle thread and then back to the original process, such as: process A -> idle -> process A In such scenario, we do not have to do IBPB here even though the process is non-dumpable, as we are switching back to the same process after a hiatus. To avoid the redundant IBPB, which is expensive, we track the last mm user context ID. The cost is to have an extra u64 mm context id to track the last mm we were using before switching to the init_mm used by idle. Avoiding the extra IBPB is probably worth the extra memory for this common scenario. For those cases where tlb_defer_switch_to_init_mm() returns true (non PCID), lazy tlb will defer switch to init_mm, so we will not be changing the mm for the process A -> idle -> process A switch. So IBPB will be skipped for this case. Thanks to the reviewers and Andy Lutomirski for the suggestion of using ctx_id which got rid of the problem of mm pointer recycling. Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: ak@linux.intel.com Cc: karahmed@amazon.de Cc: arjan@linux.intel.com Cc: torvalds@linux-foundation.org Cc: linux@dominikbrodowski.net Cc: peterz@infradead.org Cc: bp@alien8.de Cc: luto@kernel.org Cc: pbonzini@redhat.com Cc: gregkh@linux-foundation.org Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk
-
由 David Woodhouse 提交于
Despite the fact that all the other code there seems to be doing it, just using set_cpu_cap() in early_intel_init() doesn't actually work. For CPUs with PKU support, setup_pku() calls get_cpu_cap() after c->c_init() has set those feature bits. That resets those bits back to what was queried from the hardware. Turning the bits off for bad microcode is easy to fix. That can just use setup_clear_cpu_cap() to force them off for all CPUs. I was less keen on forcing the feature bits *on* that way, just in case of inconsistencies. I appreciate that the kernel is going to get this utterly wrong if CPU features are not consistent, because it has already applied alternatives by the time secondary CPUs are brought up. But at least if setup_force_cpu_cap() isn't being used, we might have a chance of *detecting* the lack of the corresponding bit and either panicking or refusing to bring the offending CPU online. So ensure that the appropriate feature bits are set within get_cpu_cap() regardless of how many extra times it's called. Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags") Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: karahmed@amazon.de Cc: peterz@infradead.org Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.uk
-
由 Colin Ian King 提交于
Trivial fix to spelling mistake in pr_err error message text. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: kernel-janitors@vger.kernel.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180130193218.9271-1-colin.king@canonical.com
-
由 Dan Williams 提交于
Reflect the presence of get_user(), __get_user(), and 'syscall' protections in sysfs. The expectation is that new and better tooling will allow the kernel to grow more usages of array_index_nospec(), for now, only claim mitigation for __user pointer de-references. Reported-by: NJiri Slaby <jslaby@suse.cz> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727420158.33451.11658324346540434635.stgit@dwillia2-desk3.amr.corp.intel.com
-
由 Dan Williams 提交于
The syscall table base is a user controlled function pointer in kernel space. Use array_index_nospec() to prevent any out of bounds speculation. While retpoline prevents speculating into a userspace directed target it does not stop the pointer de-reference, the concern is leaking memory relative to the syscall table base, by observing instruction cache behavior. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Andy Lutomirski <luto@kernel.org> Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
-
由 Dan Williams 提交于
Quoting Linus: I do think that it would be a good idea to very expressly document the fact that it's not that the user access itself is unsafe. I do agree that things like "get_user()" want to be protected, but not because of any direct bugs or problems with get_user() and friends, but simply because get_user() is an excellent source of a pointer that is obviously controlled from a potentially attacking user space. So it's a prime candidate for then finding _subsequent_ accesses that can then be used to perturb the cache. Unlike the __get_user() case get_user() includes the address limit check near the pointer de-reference. With that locality the speculation can be mitigated with pointer narrowing rather than a barrier, i.e. array_index_nospec(). Where the narrowing is performed by: cmp %limit, %ptr sbb %mask, %mask and %mask, %ptr With respect to speculation the value of %ptr is either less than %limit or NULL. Co-developed-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727417469.33451.11804043010080838495.stgit@dwillia2-desk3.amr.corp.intel.com
-
由 Dan Williams 提交于
Quoting Linus: I do think that it would be a good idea to very expressly document the fact that it's not that the user access itself is unsafe. I do agree that things like "get_user()" want to be protected, but not because of any direct bugs or problems with get_user() and friends, but simply because get_user() is an excellent source of a pointer that is obviously controlled from a potentially attacking user space. So it's a prime candidate for then finding _subsequent_ accesses that can then be used to perturb the cache. __uaccess_begin_nospec() covers __get_user() and copy_from_iter() where the limit check is far away from the user pointer de-reference. In those cases a barrier_nospec() prevents speculation with a potential pointer to privileged memory. uaccess_try_nospec covers get_user_try. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Suggested-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com
-