1. 28 3月, 2018 1 次提交
  2. 21 3月, 2018 1 次提交
    • L
      kvm/x86: fix icebp instruction handling · 32d43cd3
      Linus Torvalds 提交于
      The undocumented 'icebp' instruction (aka 'int1') works pretty much like
      'int3' in the absense of in-circuit probing equipment (except,
      obviously, that it raises #DB instead of raising #BP), and is used by
      some validation test-suites as such.
      
      But Andy Lutomirski noticed that his test suite acted differently in kvm
      than on bare hardware.
      
      The reason is that kvm used an inexact test for the icebp instruction:
      it just assumed that an all-zero VM exit qualification value meant that
      the VM exit was due to icebp.
      
      That is not unlike the guess that do_debug() does for the actual
      exception handling case, but it's purely a heuristic, not an absolute
      rule.  do_debug() does it because it wants to ascribe _some_ reasons to
      the #DB that happened, and an empty %dr6 value means that 'icebp' is the
      most likely casue and we have no better information.
      
      But kvm can just do it right, because unlike the do_debug() case, kvm
      actually sees the real reason for the #DB in the VM-exit interruption
      information field.
      
      So instead of relying on an inexact heuristic, just use the actual VM
      exit information that says "it was 'icebp'".
      
      Right now the 'icebp' instruction isn't technically documented by Intel,
      but that will hopefully change.  The special "privileged software
      exception" information _is_ actually mentioned in the Intel SDM, even
      though the cause of it isn't enumerated.
      Reported-by: NAndy Lutomirski <luto@kernel.org>
      Tested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32d43cd3
  3. 17 3月, 2018 1 次提交
  4. 14 3月, 2018 1 次提交
  5. 12 3月, 2018 2 次提交
  6. 09 3月, 2018 1 次提交
    • F
      x86/kprobes: Fix kernel crash when probing .entry_trampoline code · c07a8f8b
      Francis Deslauriers 提交于
      Disable the kprobe probing of the entry trampoline:
      
      .entry_trampoline is a code area that is used to ensure page table
      isolation between userspace and kernelspace.
      
      At the beginning of the execution of the trampoline, we load the
      kernel's CR3 register. This has the effect of enabling the translation
      of the kernel virtual addresses to physical addresses. Before this
      happens most kernel addresses can not be translated because the running
      process' CR3 is still used.
      
      If a kprobe is placed on the trampoline code before that change of the
      CR3 register happens the kernel crashes because int3 handling pages are
      not accessible.
      
      To fix this, add the .entry_trampoline section to the kprobe blacklist
      to prohibit the probing of code before all the kernel pages are
      accessible.
      Signed-off-by: NFrancis Deslauriers <francis.deslauriers@efficios.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: mathieu.desnoyers@efficios.com
      Cc: mhiramat@kernel.org
      Link: http://lkml.kernel.org/r/1520565492-4637-2-git-send-email-francis.deslauriers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c07a8f8b
  7. 08 3月, 2018 1 次提交
  8. 07 3月, 2018 3 次提交
  9. 02 3月, 2018 2 次提交
  10. 01 3月, 2018 1 次提交
    • T
      x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table · 945fd17a
      Thomas Gleixner 提交于
      The separation of the cpu_entry_area from the fixmap missed the fact that
      on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
      initial_page_table by the previous synchronizations.
      
      This results in suspend/resume failures because 32bit utilizes initial page
      table for resume. The absence of the cpu_entry_area mapping results in a
      triple fault, aka. insta reboot.
      
      With PAE enabled this works by chance because the PGD entry which covers
      the fixmap and other parts incindentally provides the cpu_entry_area
      mapping as well.
      
      Synchronize the initial page table after setting up the cpu entry
      area. Instead of adding yet another copy of the same code, move it to a
      function and invoke it from the various places.
      
      It needs to be investigated if the existing calls in setup_arch() and
      setup_per_cpu_areas() can be replaced by the later invocation from
      setup_cpu_entry_areas(), but that's beyond the scope of this fix.
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Reported-by: NWoody Suwalski <terraluna977@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NWoody Suwalski <terraluna977@gmail.com>
      Cc: William Grant <william.grant@canonical.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802282137290.1392@nanos.tec.linutronix.de
      945fd17a
  11. 28 2月, 2018 2 次提交
  12. 24 2月, 2018 1 次提交
  13. 23 2月, 2018 1 次提交
    • D
      bpf, x64: implement retpoline for tail call · a493a87f
      Daniel Borkmann 提交于
      Implement a retpoline [0] for the BPF tail call JIT'ing that converts
      the indirect jump via jmp %rax that is used to make the long jump into
      another JITed BPF image. Since this is subject to speculative execution,
      we need to control the transient instruction sequence here as well
      when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop.
      The latter aligns also with what gcc / clang emits (e.g. [1]).
      
      JIT dump after patch:
      
        # bpftool p d x i 1
         0: (18) r2 = map[id:1]
         2: (b7) r3 = 0
         3: (85) call bpf_tail_call#12
         4: (b7) r0 = 2
         5: (95) exit
      
      With CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000072  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000072  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000072  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	callq  0x000000000000006d  |+
        66:	pause                      |
        68:	lfence                     |
        6b:	jmp    0x0000000000000066  |
        6d:	mov    %rax,(%rsp)         |
        71:	retq                       |
        72:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        + retpoline for indirect jump
      
      Without CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000063  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000063  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000063  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	jmpq   *%rax               |-
        63:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        - plain indirect jump as before
      
        [0] https://support.google.com/faqs/answer/7625886
        [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2bSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      a493a87f
  14. 21 2月, 2018 3 次提交
  15. 20 2月, 2018 5 次提交
  16. 17 2月, 2018 3 次提交
  17. 15 2月, 2018 7 次提交
  18. 13 2月, 2018 2 次提交
    • T
      x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages · fd0e786d
      Tony Luck 提交于
      In the following commit:
      
        ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      
      ... we added code to memory_failure() to unmap the page from the
      kernel 1:1 virtual address space to avoid speculative access to the
      page logging additional errors.
      
      But memory_failure() may not always succeed in taking the page offline,
      especially if the page belongs to the kernel.  This can happen if
      there are too many corrected errors on a page and either mcelog(8)
      or drivers/ras/cec.c asks to take a page offline.
      
      Since we remove the 1:1 mapping early in memory_failure(), we can
      end up with the page unmapped, but still in use. On the next access
      the kernel crashes :-(
      
      There are also various debug paths that call memory_failure() to simulate
      occurrence of an error. Since there is no actual error in memory, we
      don't need to map out the page for those cases.
      
      Revert most of the previous attempt and keep the solution local to
      arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
      
      	1) there is a real error
      	2) memory_failure() succeeds.
      
      All of this only applies to 64-bit systems. 32-bit kernel doesn't map
      all of memory into kernel space. It isn't worth adding the code to unmap
      the piece that is mapped because nobody would run a 32-bit kernel on a
      machine that has recoverable machine checks.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert (Persistent Memory) <elliott@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org #v4.14
      Fixes: ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      fd0e786d
    • D
      Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" · f208820a
      David Woodhouse 提交于
      This reverts commit 64e16720.
      
      We cannot call C functions like that, without marking all the
      call-clobbered registers as, well, clobbered. We might have got away
      with it for now because the __ibp_barrier() function was *fairly*
      unlikely to actually use any other registers. But no. Just no.
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: jmattson@google.com
      Cc: karahmed@amazon.de
      Cc: kvm@vger.kernel.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: sironi@amazon.de
      Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f208820a
  19. 07 2月, 2018 1 次提交
  20. 06 2月, 2018 1 次提交