1. 06 May, 2022 2 commits
  2. 12 May, 2021 2 commits
  3. 04 Sep, 2020 1 commit
  4. 21 Aug, 2020 1 commit
    • S
      x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM · 6a3ea3e6
      Sean Christopherson committed
      KVM has an optimization to avoid expensive MSR reads/writes on
      VMENTER/EXIT. It caches the MSR values and restores them either when
      leaving the run loop, on preemption or when going out to user space.
      
      The affected MSRs are not required for kernel context operations. This
      changed with the recently introduced mechanism to handle FSGSBASE in the
      paranoid entry code which has to retrieve the kernel GSBASE value by
      accessing per CPU memory. The mechanism needs to retrieve the CPU number
      and uses either LSL or RDPID if the processor supports it.
      
      Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and
      lazily restored MSRs, which means between the point where the guest value
      is written and the point of restore, MSR_TSC_AUX contains a random number.
      
      If an NMI or any other exception which uses the paranoid entry path happens
      in such a context, then RDPID returns the random guest MSR_TSC_AUX value.
      
      As a consequence this reads from the wrong memory location to retrieve the
      kernel GSBASE value. Kernel GS is used for all regular this_cpu_*()
      operations. If the GSBASE in the exception handler points to the per CPU
      memory of a different CPU then this has the obvious consequences of data
      corruption and crashes.
      
      As the paranoid entry path is the only place which accesses MSR_TSC_AUX
      (via RDPID) and the fallback via LSL is not significantly slower, remove
      the RDPID alternative from the entry path and always use LSL.
      
      The alternative would be to write MSR_TSC_AUX on every VMENTER and VMEXIT
      which would be inflicting massive overhead on that code path.
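
The failure mode described above can be modeled in a few lines of C. This is only a toy sketch, not kernel code: `struct cpu_model`, `percpu_base`, `enter_guest()` and both lookup helpers are hypothetical names, and the real MSR_TSC_AUX encoding (which also carries the node number) is simplified to a bare CPU number.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4 /* illustrative size, not a kernel constant */

/* Toy model of one physical CPU (hypothetical structure). */
struct cpu_model {
    uint32_t tsc_aux;       /* value RDPID would return (MSR_TSC_AUX) */
    uint32_t gdt_cpu_limit; /* value LSL would return from the per-CPU GDT entry */
};

/* Made-up per-CPU kernel GSBASE values, indexed by CPU number. */
static const uint64_t percpu_base[NR_CPUS] = {
    0x1000, 0x2000, 0x3000, 0x4000
};

/* KVM loads the guest MSR_TSC_AUX; the host value is restored lazily. */
static void enter_guest(struct cpu_model *cpu, uint32_t guest_tsc_aux)
{
    cpu->tsc_aux = guest_tsc_aux;
}

/* RDPID-style lookup: trusts MSR_TSC_AUX, which may hold a guest value. */
static uint64_t find_percpu_base_rdpid(const struct cpu_model *cpu)
{
    /* The modulo only keeps the toy array access in bounds. */
    return percpu_base[cpu->tsc_aux % NR_CPUS];
}

/* LSL-style lookup: the GDT entry is never touched by the guest switch. */
static uint64_t find_percpu_base_lsl(const struct cpu_model *cpu)
{
    assert(cpu->gdt_cpu_limit < NR_CPUS);
    return percpu_base[cpu->gdt_cpu_limit];
}
```

In this model, once `enter_guest()` has run on CPU 1 with a guest value of 3, the RDPID-style lookup returns the base of CPU 3 while the LSL-style lookup still returns the base of CPU 1, which is the mismatch the changelog attributes to an NMI arriving before the lazy restore.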
      
      [ tglx: Rewrote changelog ]
      
      Fixes: eaad9812 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
      Debugged-by: Tom Lendacky <thomas.lendacky@amd.com>
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20200821105229.18938-1-pbonzini@redhat.com
      6a3ea3e6
  5. 18 Jun, 2020 2 commits
  6. 11 Jun, 2020 2 commits
  7. 25 Apr, 2020 1 commit
  8. 29 Oct, 2019 1 commit
  9. 18 Jul, 2019 1 commit
  10. 09 Jul, 2019 1 commit
    • J
      x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations · 18ec54fd
      Josh Poimboeuf committed
      
      Spectre v1 isn't only about array bounds checks.  It can affect any
      conditional checks.  The kernel entry code interrupt, exception, and NMI
      handlers all have conditional swapgs checks.  Those may be problematic in
      the context of Spectre v1, as kernel code can speculatively run with a user
      GS.
      
      For example:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg
      	mov (%reg), %reg1
      
      When coming from user space, the CPU can speculatively skip the swapgs, and
      then do a speculative percpu load using the user GS value.  So the user can
      speculatively force a read of any kernel value.  If a gadget exists which
      uses the percpu value as an address in another load/store, then the
      contents of the kernel value may become visible via an L1 side channel
      attack.
      
      A similar attack exists when coming from kernel space.  The CPU can
      speculatively do the swapgs, causing the user GS to get used for the rest
      of the speculative window.
      
      The mitigation is similar to a traditional Spectre v1 mitigation, except:
      
        a) index masking isn't possible, because the index (percpu offset)
           isn't user-controlled; and
      
        b) an lfence is needed in both the "from user" swapgs path and the
           "from kernel" non-swapgs path (because of the two attacks described
           above).
      
      The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a
      CR3 write when PTI is enabled.  Since CR3 writes are serializing, the
      lfences can be skipped in those cases.
      
      On the other hand, the kernel entry swapgs paths don't depend on PTI.
      
      To avoid unnecessary lfences for the user entry case, create two separate
      features for alternative patching:
      
        X86_FEATURE_FENCE_SWAPGS_USER
        X86_FEATURE_FENCE_SWAPGS_KERNEL
      
      Use these features in entry code to patch in lfences where needed.
      
      The features aren't enabled yet, so there's no functional change.
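
The reasoning above suggests how the two features could eventually be selected. The following is a hedged sketch of that decision only, not what this commit does (it leaves both features disabled): `struct swapgs_fences` and `select_swapgs_fences()` are hypothetical names, and the policy is an assumption derived from the statement that a CR3 write under PTI is already serializing.

```c
#include <assert.h>
#include <stdbool.h>

/* Which swapgs paths get an lfence patched in (hypothetical structure). */
struct swapgs_fences {
    bool user;   /* stands in for X86_FEATURE_FENCE_SWAPGS_USER */
    bool kernel; /* stands in for X86_FEATURE_FENCE_SWAPGS_KERNEL */
};

/*
 * Assumed policy: with PTI, the user entry path already performs a
 * serializing CR3 write, so only the kernel entry path needs an lfence.
 * Without PTI, both paths need one.
 */
static struct swapgs_fences select_swapgs_fences(bool mitigate, bool pti_enabled)
{
    struct swapgs_fences f = { false, false };

    if (mitigate) {
        f.user = !pti_enabled;
        f.kernel = true;
    }

    /* In this sketch a user fence never appears without a kernel fence. */
    assert(!f.user || f.kernel);
    return f;
}
```

Splitting the decision into two flags mirrors the two alternative-patching features named above, so the user entry path can skip the lfence when PTI makes it redundant.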
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      18ec54fd
  11. 03 Jul, 2019 1 commit
    • T
      x86/fsgsbase: Revert FSGSBASE support · 049331f2
      Thomas Gleixner committed
      The FSGSBASE series turned out to have serious bugs and there is still an
      open issue which is not fully understood yet.
      
      The confidence in those changes has become close to zero especially as the
      test cases which have been shipped with that series were obviously never
      run before sending the final series out to LKML.
      
        ./fsgsbase_64 >/dev/null
        Segmentation fault
      
      As the merge window is close, the only sane decision is to revert FSGSBASE
      support. The revert is necessary as this branch has been merged into
      perf/core already and rebasing all of that a few days before the merge
      window is not the most brilliant idea.
      
      I could definitely slap myself for not noticing the test case fail when
      merging that series, but TBH my expectations weren't that low back
      then. Won't happen again.
      
      Revert the following commits:
      539bca53 ("x86/entry/64: Fix and clean up paranoid_exit")
      2c7b5ac5 ("Documentation/x86/64: Add documentation for GS/FS addressing mode")
      f987c955 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2")
      2032f1f9 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
      5bf0cab6 ("x86/entry/64: Document GSBASE handling in the paranoid path")
      708078f6 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
      79e1932f ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      1d07316b ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
      f60a83df ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace")
      1ab5f3f7 ("x86/process/64: Use FSBSBASE in switch_to() if available")
      a86b4625 ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")
      8b71340d ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions")
      b64ed19b ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      049331f2
  12. 25 Jun, 2019 1 commit
  13. 22 Jun, 2019 2 commits
  14. 06 Jan, 2019 1 commit
    • M
      jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Masahiro Yamada committed
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig, then
      make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      The ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match the real kernel capability.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      e9666d10
  15. 19 Dec, 2018 1 commit
    • I
      Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs" · e769742d
      Ingo Molnar committed
      This reverts commit 5bdcd510.
      
      The macro based workarounds for GCC's inlining bugs caused regressions: distcc
      and other distro build setups broke, and the fixes are not easy nor will they
      solve regressions on already existing installations.
      
      So we are reverting this patch and the 8 followup patches.
      
      What makes this revert easier is that GCC9 will likely include the new 'asm inline'
      syntax that makes inlining of assembly blocks a lot more robust.
      
      This is a superior method to any macro based hackeries - and might even be
      backported to GCC8, which would make all modern distros get the inlining
      fixes as well.
      
      Many thanks to Masahiro Yamada and others for helping sort out these problems.
      Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Richard Biener <rguenther@suse.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e769742d
  16. 06 Oct, 2018 1 commit
  17. 05 Sep, 2018 1 commit
    • A
      x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls · afaef01c
      Alexander Popov committed
      The STACKLEAK feature (initially developed by PaX Team) has the following
      benefits:
      
      1. Reduces the information that can be revealed through kernel stack leak
         bugs. The idea of erasing the thread stack at the end of syscalls is
         similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel
         crypto, which all comply with FDP_RIP.2 (Full Residual Information
         Protection) of the Common Criteria standard.
      
      2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712,
         CVE-2010-2963). Bugs of that kind should be eliminated by improving C
         compilers in the future, which might take a long time.
      
      This commit introduces the code filling the used part of the kernel
      stack with a poison value before returning to userspace. Full
      STACKLEAK feature also contains the gcc plugin which comes in a
      separate commit.
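
The erasing step described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: the stack is an ordinary array, `POISON` is a made-up value, and `stack_init()`, `stack_use()` and `stack_erase()` are hypothetical helpers. In the real feature the lowest stack pointer is tracked with help from the gcc plugin mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64                /* illustrative stack size */
#define POISON 0xDEADBEEFDEADBEEFULL  /* made-up poison value */

/* The stack grows down: index STACK_WORDS - 1 is the top of the stack. */
struct task_stack {
    uint64_t words[STACK_WORDS];
    size_t lowest_sp; /* lowest index written since the last erase */
};

static void stack_init(struct task_stack *s)
{
    memset(s->words, 0, sizeof(s->words));
    s->lowest_sp = STACK_WORDS; /* nothing used yet */
}

/* Simulate a syscall that dirties `depth` words with `value`. */
static void stack_use(struct task_stack *s, size_t depth, uint64_t value)
{
    assert(depth <= STACK_WORDS);
    size_t sp = STACK_WORDS - depth;

    for (size_t i = sp; i < STACK_WORDS; i++)
        s->words[i] = value;
    if (sp < s->lowest_sp)
        s->lowest_sp = sp;
}

/* Before returning to user space: poison only the part that was used. */
static void stack_erase(struct task_stack *s)
{
    for (size_t i = s->lowest_sp; i < STACK_WORDS; i++)
        s->words[i] = POISON;
    s->lowest_sp = STACK_WORDS;
}
```

Tracking the lowest used position keeps the cost proportional to the stack depth the syscall actually reached rather than to the full stack size, which is one plausible reason the measured slowdown depends on the workload.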
      
      The STACKLEAK feature is ported from grsecurity/PaX. More information at:
        https://grsecurity.net/
        https://pax.grsecurity.net/
      
      This code is modified from Brad Spengler/PaX Team's code in the last
      public patch of grsecurity/PaX based on our understanding of the code.
      Changes or omissions from the original code are ours and don't reflect
      the original grsecurity/PaX code.
      
      Performance impact:
      
      Hardware: Intel Core i7-4770, 16 GB RAM
      
      Test #1: building the Linux kernel on a single core
              0.91% slowdown
      
      Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
              4.2% slowdown
      
      So the STACKLEAK description in Kconfig includes: "The tradeoff is the
      performance impact: on a single CPU system kernel compilation sees a 1%
      slowdown, other systems and workloads may vary and you are advised to
      test this feature on your expected workload before deploying it".
      Signed-off-by: Alexander Popov <alex.popov@linux.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      afaef01c
  18. 05 Apr, 2018 1 commit
    • D
      syscalls/x86: Extend register clearing on syscall entry to lower registers · 6dc936f1
      Dominik Brodowski committed
      To reduce the chance that random user space content leaks down the call
      chain in registers, also clear lower registers on syscall entry:
      
      For 64-bit syscalls, extend the register clearing in PUSH_AND_CLEAR_REGS
      to %dx and %cx. This should not hurt at all, also on the other callers
      of that macro. We do not need to clear %rdi and %rsi for syscall entry,
      as those registers are used to pass the parameters to do_syscall_64().
      
      For the 32-bit compat syscalls, do_int80_syscall_32() and
      do_fast_syscall_32() each only take one parameter. Therefore, extend the
      register clearing to %dx, %cx, and %si in entry_SYSCALL_compat and
      entry_INT80_compat.
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-8-linux@dominikbrodowski.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6dc936f1
  19. 21 Feb, 2018 1 commit
  20. 17 Feb, 2018 2 commits
    • D
      x86/entry/64: Use 'xorl' for faster register clearing · ced5d0bf
      Dominik Brodowski committed
      On some x86 CPU microarchitectures using 'xorq' to clear general-purpose
      registers is slower than 'xorl'. As 'xorl' is sufficient to clear all
      64 bits of these registers due to zero-extension [*], switch the x86
      64-bit entry code to use 'xorl'.
      
      No change in functionality and no change in code size.
      
      [*] According to the Intel 64 and IA-32 Architectures Software
          Developer's Manual, section 3.4.1.1, the results of 32-bit
          operations are "zero-extended to a 64-bit result in the destination
          general-purpose register." The AMD64 Architecture Programmer's
          Manual Volume 3, Appendix B.1, describes the same behaviour.
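
The zero-extension behaviour can be checked from user space with a short C helper. This is a minimal sketch, assuming a GCC or Clang compiler on x86-64 for the inline assembly path; `clear_with_xorl()` is a hypothetical name, and on other targets the function falls back to a plain C model of the same semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Clear a 64-bit value using only a 32-bit XOR of its low half. */
static uint64_t clear_with_xorl(uint64_t reg)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* %k0 selects the 32-bit name of the register holding reg. */
    __asm__("xorl %k0, %k0" : "+r"(reg) : : "cc");
#else
    /* Portable model: a 32-bit result widened to 64 bits is zero-extended. */
    reg = (uint32_t)((uint32_t)reg ^ (uint32_t)reg);
#endif
    /* The upper 32 bits must be zero as well. */
    assert(reg == 0);
    return reg;
}
```

Calling `clear_with_xorl(0xFFFFFFFFFFFFFFFFULL)` returns 0 on either path, since the upper half is cleared by the zero-extension rather than by the XOR itself.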
      Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180214175924.23065-3-linux@dominikbrodowski.net
      [ Improved on the changelog a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ced5d0bf
    • D
      x86/entry: Reduce the code footprint of the 'idtentry' macro · 9e809d15
      Dominik Brodowski committed
      Play a little trick in the generic PUSH_AND_CLEAR_REGS macro
      to insert the GP registers "above" the original return address.
      
      This allows us to (re-)insert the macro in error_entry() and
      paranoid_entry() and to remove it from the idtentry macro. This
      reduces the static footprint significantly:
      
         text	   data	    bss	    dec	    hex	filename
        24307	      0	      0	  24307	   5ef3	entry_64.o-orig
        20987	      0	      0	  20987	   51fb	entry_64.o
      Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180214175924.23065-2-linux@dominikbrodowski.net
      [ Small tweaks to comments. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9e809d15
  21. 13 Feb, 2018 8 commits
  22. 06 Feb, 2018 1 commit
  23. 14 Jan, 2018 1 commit
    • T
      x86/pti: Fix !PCID and sanitize defines · f10ee3dc
      Thomas Gleixner committed
      The switch to the user space page tables in the low level ASM code
      unconditionally sets bit 12 and bit 11 of CR3. Bit 12 is switching the base
      address of the page directory to the user part, bit 11 is switching the
      PCID to the PCID associated with the user page tables.
      
      This fails on a machine which lacks PCID support because bit 11 is set in
      CR3. Bit 11 is reserved when PCID is inactive.
      
      While the Intel SDM claims that the reserved bits are ignored when PCID is
      disabled, the AMD APM states that they should be cleared.
      
      This went unnoticed as the AMD APM was not checked when the code was
      developed and reviewed, and test systems with Intel CPUs never failed to
      boot. The report is against a CentOS 6 host where the guest fails to boot,
      so it's not yet clear whether this is a virt issue or can happen on real
      hardware too, but that's irrelevant as the AMD APM clearly asks for clearing
      the reserved bits.
      
      Make sure that on non PCID machines bit 11 is not set by the page table
      switching code.
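
The fix amounts to making the PCID bit conditional when deriving the user CR3 value. The sketch below models that in C rather than in the entry assembly; the macro names and `user_cr3()` are illustrative stand-ins chosen for this example, and the `have_pcid` flag stands for whatever CPU feature check the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names for the two CR3 bits discussed above. */
#define USER_PGTABLE_MASK (1ULL << 12) /* selects the user half of the PGD pair */
#define USER_PCID_MASK    (1ULL << 11) /* selects the user PCID */

/* Derive the user CR3 from the kernel CR3 (hypothetical helper). */
static uint64_t user_cr3(uint64_t kernel_cr3, bool have_pcid)
{
    /* The kernel value is expected to have both bits clear. */
    assert((kernel_cr3 & (USER_PGTABLE_MASK | USER_PCID_MASK)) == 0);

    uint64_t cr3 = kernel_cr3 | USER_PGTABLE_MASK;

    /* Bit 11 is reserved without PCID, so set it only when PCID is active. */
    if (have_pcid)
        cr3 |= USER_PCID_MASK;
    return cr3;
}
```

For a kernel CR3 of 0x2000, this yields 0x3000 without PCID and 0x3800 with PCID, so the reserved bit stays clear on machines that lack the feature.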
      
      Andy suggested renaming the related bits and masks so they clearly
      describe what they should be used for, which is done as well for clarity.
      
      That split could have been done with alternatives but the macro hell is
      horrible and ugly. This can be done on top if someone cares to remove the
      extra orq. For now it's a straightforward fix.
      
      Fixes: 6fd166aa ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
      Reported-by: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
      f10ee3dc
  24. 24 Dec, 2017 4 commits
    • P
      x86/mm: Optimize RESTORE_CR3 · 21e94459
      Peter Zijlstra committed
      Most NMI/paranoid exceptions will not in fact change pagetables and would
      thus not require TLB flushing, however RESTORE_CR3 uses flushing CR3
      writes.
      
      Restores to kernel PCIDs can be NOFLUSH, because we explicitly flush the
      kernel mappings and now that we track which user PCIDs need flushing we can
      avoid those too when possible.
      
      This does mean RESTORE_CR3 needs an additional scratch_reg, luckily both
      sites have plenty available.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      21e94459
    • P
      x86/mm: Use/Fix PCID to optimize user/kernel switches · 6fd166aa
      Peter Zijlstra committed
      We can use PCID to retain the TLBs across CR3 switches; including those now
      part of the user/kernel switch. This increases performance of kernel
      entry/exit at the cost of more expensive/complicated TLB flushing.
      
      Now that we have two address spaces, one for kernel and one for user space,
      we need two PCIDs per mm. We use the top PCID bit to indicate a user PCID
      (just like we use the PFN LSB for the PGD). Since we do TLB invalidation
      from kernel space, the existing code will only invalidate the kernel PCID;
      we augment that by marking the corresponding user PCID invalid, and upon
      switching back to userspace, use a flushing CR3 write for the switch.
      
      In order to access the user_pcid_flush_mask we use PER_CPU storage, which
      means the previously established SWAPGS vs CR3 ordering is now mandatory
      and required.
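
The flush-mask bookkeeping described above can be sketched as follows. This is a simplified model, not the kernel code: `struct cpu_tlb_state`, `invalidate_user_asid()` and `switch_to_user_cr3()` are hypothetical names, the mask would live in per-CPU storage in the real implementation, and the ASID is used directly as the PCID for brevity. Bit 63 of the CR3 source operand is the architectural no-flush bit when PCID is enabled.

```c
#include <assert.h>
#include <stdint.h>

#define CR3_NOFLUSH   (1ULL << 63) /* keep TLB entries for the loaded PCID */
#define USER_PCID_BIT 11           /* top PCID bit marks a user PCID */
#define MAX_ASIDS     16           /* illustrative limit, fits a 16-bit mask */

/* Stand-in for the per-CPU state holding user_pcid_flush_mask. */
struct cpu_tlb_state {
    uint16_t user_pcid_flush_mask;
};

/* Kernel-side invalidation: mark the matching user PCID as stale. */
static void invalidate_user_asid(struct cpu_tlb_state *s, unsigned int asid)
{
    assert(asid < MAX_ASIDS);
    s->user_pcid_flush_mask |= (uint16_t)(1u << asid);
}

/* Build the CR3 value for the return to user space. */
static uint64_t switch_to_user_cr3(struct cpu_tlb_state *s,
                                   uint64_t user_pgd, unsigned int asid)
{
    assert(asid < MAX_ASIDS);
    uint64_t cr3 = user_pgd | asid | (1ULL << USER_PCID_BIT);

    if (s->user_pcid_flush_mask & (1u << asid)) {
        /* Stale user PCID: clear the mark and use a flushing write. */
        s->user_pcid_flush_mask &= (uint16_t)~(1u << asid);
        return cr3;
    }
    /* Nothing pending: the TLB entries for this PCID can be retained. */
    return cr3 | CR3_NOFLUSH;
}
```

Because the mask is read on the way out to user space, the per-CPU area has to be reachable at that point, which is why the SWAPGS versus CR3 ordering becomes mandatory.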
      
      Having to do this memory access does require additional registers. Most
      sites have a functioning stack and can spill one (RAX); sites without a
      functional stack must otherwise provide the second scratch register.
      
      Note: PCID is generally available on Intel Sandy Bridge and later CPUs.
      Note: Up until this point TLB flushing was broken in this series.
      
      Based-on-code-from: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6fd166aa
    • T
      x86/mm/pti: Add infrastructure for page table isolation · aa8c6248
      Thomas Gleixner committed
      Add the initial files for kernel page table isolation, with a minimal init
      function and the boot time detection for this misfeature.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      aa8c6248
    • D
      x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching · 8a09317b
      Dave Hansen committed
      PAGE_TABLE_ISOLATION needs to switch to a different CR3 value when it
      enters the kernel and switch back when it exits.  This essentially needs to
      be done before leaving assembly code.
      
      This is extra challenging because the switching context is tricky: the
      registers that can be clobbered can vary.  It is also hard to store things
      on the stack because there is an established ABI (ptregs) or the stack is
      entirely unsafe to use.
      
      Establish a set of macros that allow changing to the user and kernel CR3
      values.
      
      Interactions with SWAPGS:
      
        Previous versions of the PAGE_TABLE_ISOLATION code relied on having
        per-CPU scratch space to save/restore a register that can be used for the
        CR3 MOV.  The %GS register is used to index into our per-CPU space, so
        SWAPGS *had* to be done before the CR3 switch.  That scratch space is gone
        now, but the semantic that SWAPGS must be done before the CR3 MOV is
        retained.  This is good to keep because it is not that hard to do and it
        allows things like adding per-CPU debugging information.
      
      What this does in the NMI code is worth pointing out.  NMIs can interrupt
      *any* context and they can also be nested with NMIs interrupting other
      NMIs.  The comments below ".Lnmi_from_kernel" explain the format of the
      stack during this situation.  Changing the format of this stack is hard.
      Instead of storing the old CR3 value on the stack, this depends on the
      *regular* register save/restore mechanism and then uses %r14 to keep CR3
      during the NMI.  It is callee-saved and will not be clobbered by the C NMI
      handlers that get called.
      
      [ PeterZ: ESPFIX optimization ]
      
      Based-on-code-from: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8a09317b