1. 16 3月, 2018 1 次提交
  2. 14 3月, 2018 1 次提交
  3. 09 3月, 2018 1 次提交
    • F
      x86/kprobes: Fix kernel crash when probing .entry_trampoline code · c07a8f8b
      Francis Deslauriers 提交于
      Disable the kprobe probing of the entry trampoline:
      
      .entry_trampoline is a code area that is used to ensure page table
      isolation between userspace and kernelspace.
      
      At the beginning of the execution of the trampoline, we load the
      kernel's CR3 register. This has the effect of enabling the translation
      of the kernel virtual addresses to physical addresses. Before this
      happens most kernel addresses can not be translated because the running
      process' CR3 is still used.
      
      If a kprobe is placed on the trampoline code before that change of the
      CR3 register happens the kernel crashes because int3 handling pages are
      not accessible.
      
      To fix this, add the .entry_trampoline section to the kprobe blacklist
      to prohibit the probing of code before all the kernel pages are
      accessible.
      Signed-off-by: NFrancis Deslauriers <francis.deslauriers@efficios.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: mathieu.desnoyers@efficios.com
      Cc: mhiramat@kernel.org
      Link: http://lkml.kernel.org/r/1520565492-4637-2-git-send-email-francis.deslauriers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c07a8f8b
  4. 08 3月, 2018 8 次提交
  5. 07 3月, 2018 1 次提交
  6. 01 3月, 2018 1 次提交
    • T
      x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table · 945fd17a
      Thomas Gleixner 提交于
      The separation of the cpu_entry_area from the fixmap missed the fact that
      on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
      initial_page_table by the previous synchronizations.
      
      This results in suspend/resume failures because 32bit utilizes initial page
      table for resume. The absence of the cpu_entry_area mapping results in a
      triple fault, aka. insta reboot.
      
      With PAE enabled this works by chance because the PGD entry which covers
      the fixmap and other parts incindentally provides the cpu_entry_area
      mapping as well.
      
      Synchronize the initial page table after setting up the cpu entry
      area. Instead of adding yet another copy of the same code, move it to a
      function and invoke it from the various places.
      
      It needs to be investigated if the existing calls in setup_arch() and
      setup_per_cpu_areas() can be replaced by the later invocation from
      setup_cpu_entry_areas(), but that's beyond the scope of this fix.
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Reported-by: NWoody Suwalski <terraluna977@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NWoody Suwalski <terraluna977@gmail.com>
      Cc: William Grant <william.grant@canonical.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802282137290.1392@nanos.tec.linutronix.de
      945fd17a
  7. 21 2月, 2018 2 次提交
  8. 20 2月, 2018 2 次提交
  9. 17 2月, 2018 3 次提交
  10. 15 2月, 2018 5 次提交
  11. 13 2月, 2018 3 次提交
    • I
      x86/speculation: Clean up various Spectre related details · 21e433bd
      Ingo Molnar 提交于
      Harmonize all the Spectre messages so that a:
      
          dmesg | grep -i spectre
      
      ... gives us most Spectre related kernel boot messages.
      
      Also fix a few other details:
      
       - clarify a comment about firmware speculation control
      
       - s/KPTI/PTI
      
       - remove various line-breaks that made the code uglier
      Acked-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      21e433bd
    • D
      Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" · f208820a
      David Woodhouse 提交于
      This reverts commit 64e16720.
      
      We cannot call C functions like that, without marking all the
      call-clobbered registers as, well, clobbered. We might have got away
      with it for now because the __ibp_barrier() function was *fairly*
      unlikely to actually use any other registers. But no. Just no.
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: jmattson@google.com
      Cc: karahmed@amazon.de
      Cc: kvm@vger.kernel.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: sironi@amazon.de
      Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f208820a
    • D
      x86/speculation: Correct Speculation Control microcode blacklist again · d37fc6d3
      David Woodhouse 提交于
      Arjan points out that the Intel document only clears the 0xc2 microcode
      on *some* parts with CPUID 506E3 (INTEL_FAM6_SKYLAKE_DESKTOP stepping 3).
      For the Skylake H/S platform it's OK but for Skylake E3 which has the
      same CPUID it isn't (yet) cleared.
      
      So removing it from the blacklist was premature. Put it back for now.
      
      Also, Arjan assures me that the 0x84 microcode for Kaby Lake which was
      featured in one of the early revisions of the Intel document was never
      released to the public, and won't be until/unless it is also validated
      as safe. So those can change to 0x80 which is what all *other* versions
      of the doc have identified.
      
      Once the retrospective testing of existing public microcodes is done, we
      should be back into a mode where new microcodes are only released in
      batches and we shouldn't even need to update the blacklist for those
      anyway, so this tweaking of the list isn't expected to be a thing which
      keeps happening.
      Requested-by: NArjan van de Ven <arjan.van.de.ven@intel.com>
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: kvm@vger.kernel.org
      Cc: pbonzini@redhat.com
      Link: http://lkml.kernel.org/r/1518449255-2182-1-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d37fc6d3
  12. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  13. 11 2月, 2018 1 次提交
    • D
      x86/speculation: Update Speculation Control microcode blacklist · 17513420
      David Woodhouse 提交于
      Intel have retroactively blessed the 0xc2 microcode on Skylake mobile
      and desktop parts, and the Gemini Lake 0x22 microcode is apparently fine
      too. We blacklisted the latter purely because it was present with all
      the other problematic ones in the 2018-01-08 release, but now it's
      explicitly listed as OK.
      
      We still list 0x84 for the various Kaby Lake / Coffee Lake parts, as
      that appeared in one version of the blacklist and then reverted to
      0x80 again. We can change it if 0x84 is actually announced to be safe.
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: jmattson@google.com
      Cc: karahmed@amazon.de
      Cc: kvm@vger.kernel.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: sironi@amazon.de
      Link: http://lkml.kernel.org/r/1518305967-31356-2-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      17513420
  14. 07 2月, 2018 2 次提交
  15. 03 2月, 2018 2 次提交
    • A
      x86/dumpstack: Avoid uninitlized variable · ebfc1501
      Arnd Bergmann 提交于
      In some configurations, 'partial' does not get initialized, as shown by
      this gcc-8 warning:
      
      arch/x86/kernel/dumpstack.c: In function 'show_trace_log_lvl':
      arch/x86/kernel/dumpstack.c:156:4: error: 'partial' may be used uninitialized in this function [-Werror=maybe-uninitialized]
          show_regs_if_on_stack(&stack_info, regs, partial);
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      This initializes it to false, to get the previous behavior in this case.
      
      Fixes: a9cdbe72 ("x86/dumpstack: Fix partial register dumps")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lkml.kernel.org/r/20180202145634.200291-1-arnd@arndb.de
      ebfc1501
    • A
      x86/pti: Mark constant arrays as __initconst · 4bf5d56d
      Arnd Bergmann 提交于
      I'm seeing build failures from the two newly introduced arrays that
      are marked 'const' and '__initdata', which are mutually exclusive:
      
      arch/x86/kernel/cpu/common.c:882:43: error: 'cpu_no_speculation' causes a section type conflict with 'e820_table_firmware_init'
      arch/x86/kernel/cpu/common.c:895:43: error: 'cpu_no_meltdown' causes a section type conflict with 'e820_table_firmware_init'
      
      The correct annotation is __initconst.
      
      Fixes: fec9434a ("x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Link: https://lkml.kernel.org/r/20180202213959.611210-1-arnd@arndb.de
      4bf5d56d
  16. 02 2月, 2018 1 次提交
  17. 31 1月, 2018 5 次提交
    • J
      x86/paravirt: Remove 'noreplace-paravirt' cmdline option · 12c69f1e
      Josh Poimboeuf 提交于
      The 'noreplace-paravirt' option disables paravirt patching, leaving the
      original pv indirect calls in place.
      
      That's highly incompatible with retpolines, unless we want to uglify
      paravirt even further and convert the paravirt calls to retpolines.
      
      As far as I can tell, the option doesn't seem to be useful for much
      other than introducing surprising corner cases and making the kernel
      vulnerable to Spectre v2.  It was probably a debug option from the early
      paravirt days.  So just remove it.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Link: https://lkml.kernel.org/r/20180131041333.2x6blhxirc2kclrq@treble
      12c69f1e
    • K
      x86/kexec: Make kexec (mostly) work in 5-level paging mode · 5bf30316
      Kirill A. Shutemov 提交于
      Currently kexec() will crash when switching into a 5-level paging
      enabled kernel.
      
      I missed that we need to change relocate_kernel() to set CR4.LA57
      flag if the kernel has 5-level paging enabled.
      
      I avoided using #ifdef CONFIG_X86_5LEVEL here and inferred if we need to
      enable 5-level paging from previous CR4 value. This way the code is
      ready for boot-time switching between paging modes.
      
      With this patch applied, in addition to kexec 4-to-4 which always worked,
      we can kexec 4-to-5 and 5-to-5 - while 5-to-4 will need more work.
      Reported-by: NBaoquan He <bhe@redhat.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NBaoquan He <bhe@redhat.com>
      Cc: <stable@vger.kernel.org> # v4.14+
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: 77ef56e4 ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
      Link: http://lkml.kernel.org/r/20180129110845.26633-1-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5bf30316
    • V
      x86/irq: Count Hyper-V reenlightenment interrupts · 51d4e5da
      Vitaly Kuznetsov 提交于
      Hyper-V reenlightenment interrupts arrive when the VM is migrated, While
      they are not interesting in general it's important when L2 nested guests
      are running.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-6-vkuznets@redhat.com
      51d4e5da
    • V
      x86/hyperv: Reenlightenment notifications support · 93286261
      Vitaly Kuznetsov 提交于
      Hyper-V supports Live Migration notification. This is supposed to be used
      in conjunction with TSC emulation: when a VM is migrated to a host with
      different TSC frequency for some short period the host emulates the
      accesses to TSC and sends an interrupt to notify about the event. When the
      guest is done updating everything it can disable TSC emulation and
      everything will start working fast again.
      
      These notifications weren't required until now as Hyper-V guests are not
      supposed to use TSC as a clocksource: in Linux the TSC is even marked as
      unstable on boot. Guests normally use 'tsc page' clocksource and host
      updates its values on migrations automatically.
      
      Things change when with nested virtualization: even when the PV
      clocksources (kvm-clock or tsc page) are passed through to the nested
      guests the TSC frequency and frequency changes need to be know..
      
      Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies
      EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit. The
      right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured that
      the fix in on the way.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com
      93286261
    • D
      x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel · 7fcae111
      David Woodhouse 提交于
      Despite the fact that all the other code there seems to be doing it, just
      using set_cpu_cap() in early_intel_init() doesn't actually work.
      
      For CPUs with PKU support, setup_pku() calls get_cpu_cap() after
      c->c_init() has set those feature bits. That resets those bits back to what
      was queried from the hardware.
      
      Turning the bits off for bad microcode is easy to fix. That can just use
      setup_clear_cpu_cap() to force them off for all CPUs.
      
      I was less keen on forcing the feature bits *on* that way, just in case
      of inconsistencies. I appreciate that the kernel is going to get this
      utterly wrong if CPU features are not consistent, because it has already
      applied alternatives by the time secondary CPUs are brought up.
      
      But at least if setup_force_cpu_cap() isn't being used, we might have a
      chance of *detecting* the lack of the corresponding bit and either
      panicking or refusing to bring the offending CPU online.
      
      So ensure that the appropriate feature bits are set within get_cpu_cap()
      regardless of how many extra times it's called.
      
      Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: karahmed@amazon.de
      Cc: peterz@infradead.org
      Cc: bp@alien8.de
      Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.uk
      7fcae111