1. 28 Feb 2018, 1 commit
  2. 21 Feb 2018, 7 commits
  3. 20 Feb 2018, 1 commit
  4. 17 Feb 2018, 2 commits
    • x86/entry/64: Use 'xorl' for faster register clearing · ced5d0bf
      Authored by Dominik Brodowski
      On some x86 CPU microarchitectures using 'xorq' to clear general-purpose
      registers is slower than 'xorl'. As 'xorl' is sufficient to clear all
      64 bits of these registers due to zero-extension [*], switch the x86
      64-bit entry code to use 'xorl'.
      
      No change in functionality and no change in code size.
      
      [*] According to Intel 64 and IA-32 Architecture Software Developer's
          Manual, section 3.4.1.1, the results of 32-bit operands are "zero-
          extended to a 64-bit result in the destination general-purpose
          register." The AMD64 Architecture Programmer’s Manual Volume 3,
          Appendix B.1, describes the same behaviour.
      Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180214175924.23065-3-linux@dominikbrodowski.net
      [ Improved the changelog a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ced5d0bf
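      A minimal user-space sketch of the zero-extension behaviour described
      above (illustrative only, not the kernel change itself; assumes
      GCC/Clang on x86-64, and the "%k0" operand modifier forces the
      32-bit register name):

          #include <stdint.h>
          #include <stdio.h>

          int main(void)
          {
                  uint64_t val = 0xdeadbeefcafebabeULL;

                  /* 'xorl' on the 32-bit subregister zero-extends the
                   * result into the full 64-bit register, so this clears
                   * all 64 bits of 'val', just as 'xorq' would. */
                  asm volatile ("xorl %k0, %k0" : "+r" (val));

                  printf("%#llx\n", (unsigned long long)val); /* prints 0 */
                  return 0;
          }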
    • x86/entry: Reduce the code footprint of the 'idtentry' macro · 9e809d15
      Authored by Dominik Brodowski
      Play a little trick in the generic PUSH_AND_CLEAR_REGS macro
      to insert the GP registers "above" the original return address.
      
      This allows us to (re-)insert the macro in error_entry() and
      paranoid_entry() and to remove it from the idtentry macro. This
      reduces the static footprint significantly:
      
         text	   data	    bss	    dec	    hex	filename
        24307	      0	      0	  24307	   5ef3	entry_64.o-orig
        20987	      0	      0	  20987	   51fb	entry_64.o
      Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180214175924.23065-2-linux@dominikbrodowski.net
      [ Small tweaks to comments. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9e809d15
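      A hedged user-space model of the stack shuffle described above (the
      slot values are made up and the real macro lives in
      arch/x86/entry/calling.h; this just shows how the GP registers end
      up "under" a return address that stays on top):

          #include <stdint.h>
          #include <stdio.h>

          #define SLOTS 8

          int main(void)
          {
                  uint64_t stack[SLOTS];
                  int sp = SLOTS;            /* stack grows downward */

                  /* illustrative values */
                  uint64_t ret = 0x1111, rdi = 0x2222, rsi = 0x3333;

                  stack[--sp] = ret;         /* on entry: 'call' pushed a return address */

                  stack[--sp] = rsi;         /* pushq %rsi: placeholder, becomes pt_regs->si */
                  uint64_t saved_ret = stack[sp + 1]; /* movq 8(%rsp), %rsi: stash return address */
                  stack[sp + 1] = rdi;       /* movq %rdi, 8(%rsp): old slot becomes pt_regs->di */
                  /* ... the remaining GP registers get pushed here ... */
                  stack[--sp] = saved_ret;   /* pushq %rsi: return address back on top */

                  for (int i = sp; i < SLOTS; i++)
                          printf("stack[%d] = %#llx\n", i,
                                 (unsigned long long)stack[i]);
                  return 0;
          }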
  5. 15 Feb 2018, 1 commit
    • x86/entry/64: Fix CR3 restore in paranoid_exit() · e4865757
      Authored by Ingo Molnar
      Josh Poimboeuf noticed the following bug:
      
       "The paranoid exit code only restores the saved CR3 when it switches back
        to the user GS.  However, even in the kernel GS case, it's possible that
        it needs to restore a user CR3, if for example, the paranoid exception
        occurred in the syscall exit path between SWITCH_TO_USER_CR3_STACK and
        SWAPGS."
      
      Josh also confirmed via targeted testing that it's possible to hit this bug.
      
      Fix the bug by also restoring CR3 in the paranoid_exit_no_swapgs branch.
      
      The reason we haven't seen this bug reported by users yet is probably because
      "paranoid" entry points are limited to the following cases:
      
       idtentry double_fault       do_double_fault  has_error_code=1  paranoid=2
       idtentry debug              do_debug         has_error_code=0  paranoid=1 shift_ist=DEBUG_STACK
       idtentry int3               do_int3          has_error_code=0  paranoid=1 shift_ist=DEBUG_STACK
       idtentry machine_check      do_mce           has_error_code=0  paranoid=1
      
      Amongst those entry points only machine_check is one that will interrupt an
      IRQS-off critical section asynchronously - and machine check events are rare.
      
      The other main asynchronous entries are NMI entries, which can be very
      high-frequency with perf profiling, but they are special: they don't use
      the 'idtentry' macro, are open coded, and restore the user CR3
      unconditionally, so they don't have this bug.
      Reported-and-tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180214073910.boevmg65upbk3vqb@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e4865757
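      A C-shaped sketch of the control flow after the fix (the function
      names are stand-ins for the real assembly labels and macros; the
      point is that both branches now restore CR3):

          #include <stdbool.h>
          #include <stdio.h>

          /* stand-ins for the real asm primitives */
          static void restore_cr3(void) { puts("RESTORE_CR3"); }
          static void swapgs(void)      { puts("SWAPGS"); }
          static void restore_regs_and_return(void) { puts("iret"); }

          static void paranoid_exit(bool swapped_gs_on_entry)
          {
                  if (swapped_gs_on_entry) {
                          restore_cr3();  /* this branch always restored CR3 */
                          swapgs();
                  } else {
                          /* paranoid_exit_no_swapgs: before the fix this
                           * branch skipped the CR3 restore, leaving a user
                           * CR3 live if the exception hit between
                           * SWITCH_TO_USER_CR3_STACK and SWAPGS. */
                          restore_cr3();
                  }
                  restore_regs_and_return();
          }

          int main(void)
          {
                  paranoid_exit(true);
                  paranoid_exit(false);
                  return 0;
          }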
  6. 13 Feb 2018, 9 commits
  7. 10 Feb 2018, 1 commit
    • x86/mm/pti: Fix PTI comment in entry_SYSCALL_64() · 14b1fcc6
      Authored by Nadav Amit
      The comment is confusing since the path is taken when
      CONFIG_PAGE_TABLE_ISOLATION is disabled (while the comment says it is
      not taken).
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: nadav.amit@gmail.com
      Link: http://lkml.kernel.org/r/20180209170638.15161-1-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      14b1fcc6
  8. 06 Feb 2018, 4 commits
  9. 31 Jan 2018, 2 commits
    • x86/hyperv: Reenlightenment notifications support · 93286261
      Authored by Vitaly Kuznetsov
      Hyper-V supports Live Migration notification. This is supposed to be used
      in conjunction with TSC emulation: when a VM is migrated to a host with a
      different TSC frequency, the host emulates accesses to the TSC for a
      short period and sends an interrupt to notify the guest about the event.
      When the guest is done updating everything, it can disable TSC emulation
      and everything will start working fast again.
      
      These notifications weren't required until now, as Hyper-V guests are not
      supposed to use the TSC as a clocksource: in Linux, the TSC is even
      marked as unstable on boot. Guests normally use the 'tsc page'
      clocksource, and the host updates its values automatically on migration.
      
      Things change with nested virtualization: even when the PV clocksources
      (kvm-clock or tsc page) are passed through to the nested guests, the TSC
      frequency and frequency changes need to be known.
      
      The Hyper-V Top Level Functional Specification (as of v5.0b) wrongly
      specifies EAX:BIT(12) of CPUID:0x40000009 as the feature identification
      bit. The right one to check is EAX:BIT(13) of CPUID:0x40000003. I was
      assured that the fix is on the way.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com
      93286261
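      A small user-space probe of the corrected feature bit (assumes GCC's
      <cpuid.h>; only meaningful on a Hyper-V guest, and a production
      version would validate the hypervisor CPUID leaf range first):

          #include <cpuid.h>
          #include <stdio.h>

          int main(void)
          {
                  unsigned int eax, ebx, ecx, edx;

                  /* Per the commit: check EAX bit 13 of CPUID 0x40000003,
                   * not EAX bit 12 of 0x40000009 as TLFS v5.0b said. */
                  __cpuid(0x40000003, eax, ebx, ecx, edx);

                  printf("reenlightenment notifications: %s\n",
                         (eax & (1u << 13)) ? "supported" : "not supported");
                  return 0;
          }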
    • x86/syscall: Sanitize syscall table de-references under speculation · 2fbd7af5
      Authored by Dan Williams
      The syscall table base is a user controlled function pointer in kernel
      space. Use array_index_nospec() to prevent any out of bounds speculation.
      
      While retpoline prevents speculating into a userspace-directed target, it
      does not stop the pointer de-reference; the concern is leaking memory
      relative to the syscall table base by observing instruction cache
      behavior.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: kernel-hardening@lists.openwall.com
      Cc: gregkh@linuxfoundation.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: alan@linux.intel.com
      Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
      2fbd7af5
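      A hedged user-space sketch of the pattern (array_index_nospec() is
      the real kernel helper from <linux/nospec.h>; the mask helper and the
      toy dispatch table below are illustrative stand-ins and assume a
      64-bit long):

          #include <stdio.h>

          /* Branchless clamp mirroring the kernel's generic
           * array_index_mask_nospec(): all-ones if index < size, else 0,
           * so a masked index cannot speculatively point past the table. */
          static inline unsigned long index_mask_nospec(unsigned long index,
                                                        unsigned long size)
          {
                  return ~(unsigned long)((long)(index | (size - 1UL - index)) >> 63);
          }

          typedef long (*sys_call_ptr_t)(void);
          static long sys_a(void) { return 1; }
          static long sys_b(void) { return 2; }
          static const sys_call_ptr_t table[] = { sys_a, sys_b };
          #define NR_CALLS (sizeof(table) / sizeof(table[0]))

          static long dispatch(unsigned long nr)
          {
                  if (nr < NR_CALLS) {
                          /* in the kernel: nr = array_index_nospec(nr, NR_syscalls); */
                          nr &= index_mask_nospec(nr, NR_CALLS);
                          return table[nr]();
                  }
                  return -1;
          }

          int main(void)
          {
                  printf("%ld %ld %ld\n", dispatch(0), dispatch(1), dispatch(99));
                  return 0;
          }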
  10. 30 Jan 2018, 3 commits
  11. 28 Jan 2018, 1 commit
  12. 19 Jan 2018, 1 commit
  13. 15 Jan 2018, 1 commit
    • x86/retpoline: Fill RSB on context switch for affected CPUs · c995efd5
      Authored by David Woodhouse
      On context switch from a shallow call stack to a deeper one, as the CPU
      does 'ret' up the deeper side it may encounter RSB entries (predictions for
      where the 'ret' goes to) which were populated in userspace.
      
      This is problematic if neither SMEP nor KPTI (the latter of which marks
      userspace pages as NX for the kernel) are active, as malicious code in
      userspace may then be executed speculatively.
      
      Overwrite the CPU's return prediction stack with calls which are predicted
      to return to an infinite loop, to "capture" speculation if this
      happens. This is required both for retpoline, and also in conjunction with
      IBRS for !SMEP && !KPTI.
      
      On Skylake+ the problem is slightly different, and an *underflow* of the
      RSB may cause errant branch predictions to occur. So there it's not so much
      overwrite, as *filling* the RSB to attempt to prevent it getting
      empty. This is only a partial solution for Skylake+ since there are many
      other conditions which may result in the RSB becoming empty. The full
      solution on Skylake+ is to use IBRS, which will prevent the problem even
      when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
      required on context switch.
      
      [ tglx: Added missing vendor check and slightly massaged comments and
        changelog ]
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
      c995efd5
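      A minimal sketch of one RSB-stuffing step (the kernel's real
      __FILL_RETURN_BUFFER macro in asm/nospec-branch.h unrolls a longer
      version of this; x86-64 inline asm, illustrative only):

          /* Each 'call' pushes an RSB prediction pointing at the trap;
           * execution then falls through and discards the architectural
           * return address, leaving only the "captured" prediction. */
          static inline void stuff_rsb_once(void)
          {
                  asm volatile(
                          "call   1f\n\t"
                          "2:\n\t"
                          "pause\n\t"             /* speculation trap: a 'ret' */
                          "lfence\n\t"            /* that consumes this entry  */
                          "jmp    2b\n"           /* spins here harmlessly     */
                          "1:\n\t"
                          "add    $8, %%rsp\n\t"  /* drop the real return address */
                          ::: "memory", "cc");
          }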
  14. 14 Jan 2018, 1 commit
    • x86/pti: Fix !PCID and sanitize defines · f10ee3dc
      Authored by Thomas Gleixner
      The switch to the user space page tables in the low level ASM code
      unconditionally sets bit 12 and bit 11 of CR3. Bit 12 switches the base
      address of the page directory to the user part, bit 11 switches the PCID
      to the PCID associated with the user page tables.
      
      This fails on a machine which lacks PCID support because bit 11 is set in
      CR3. Bit 11 is reserved when PCID is inactive.
      
      While the Intel SDM claims that the reserved bits are ignored when PCID is
      disabled, the AMD APM states that they should be cleared.
      
      This went unnoticed as the AMD APM was not checked when the code was
      developed and reviewed and test systems with Intel CPUs never failed to
      boot. The report is against a CentOS 6 host where the guest fails to
      boot, so it's not yet clear whether this is a virt issue or can happen on
      real hardware too, but that's irrelevant as the AMD APM clearly asks for
      clearing the reserved bits.
      
      Make sure that on non-PCID machines bit 11 is not set by the page table
      switching code.
      
      Andy suggested renaming the related bits and masks so they clearly
      describe what they should be used for, which is done as well for clarity.
      
      That split could have been done with alternatives, but the macro hell is
      horrible and ugly. This can be done on top if someone cares to remove the
      extra orq. For now it's a straightforward fix.
      
      Fixes: 6fd166aa ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
      Reported-by: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
      f10ee3dc
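      A hedged sketch of the rule the fix enforces (the bit positions match
      the kernel's PTI_USER_PGTABLE_BIT/PTI_USER_PCID_BIT; the helper
      itself is illustrative):

          #include <stdint.h>
          #include <stdio.h>

          #define PTI_USER_PGTABLE_BIT  12  /* selects the user half of the PGD */
          #define PTI_USER_PCID_BIT     11  /* selects the user PCID */

          static uint64_t user_cr3(uint64_t kernel_cr3, int cpu_has_pcid)
          {
                  uint64_t cr3 = kernel_cr3 | (1ULL << PTI_USER_PGTABLE_BIT);

                  /* Bit 11 lies in the PCID field, which is reserved when
                   * CR4.PCIDE=0, so it may only be set with PCID enabled. */
                  if (cpu_has_pcid)
                          cr3 |= 1ULL << PTI_USER_PCID_BIT;
                  return cr3;
          }

          int main(void)
          {
                  printf("%#llx\n", (unsigned long long)user_cr3(0x1000, 1));
                  printf("%#llx\n", (unsigned long long)user_cr3(0x1000, 0));
                  return 0;
          }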
  15. 12 Jan 2018, 1 commit
    • x86/retpoline/entry: Convert entry assembler indirect jumps · 2641f08b
      Authored by David Woodhouse
      Convert indirect jumps in core 32/64bit entry assembler code to use
      non-speculative sequences when CONFIG_RETPOLINE is enabled.
      
      Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return
      address after the 'call' instruction must be *precisely* at the
      .Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work,
      and the use of alternatives will mess that up unless we play horrid
      games to prepend with NOPs and make the variants the same length. It's
      not worth it; in the case where we ALTERNATIVE out the retpoline, the
      first instruction at __x86.indirect_thunk.rax is going to be a bare
      jmp *%rax anyway.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515707194-20531-7-git-send-email-dwmw@amazon.co.uk
      2641f08b
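      A user-space sketch of the retpoline sequence that CALL_NOSPEC
      expands to (labels and register constraints are illustrative; in the
      kernel the thunk body lives in __x86.indirect_thunk.rax):

          #include <stdio.h>

          static void target(void) { puts("reached via retpoline"); }

          static void call_nospec(void (*fn)(void))
          {
                  asm volatile(
                          "jmp    3f\n"
                          "4:\n\t"                 /* the thunk body          */
                          "call   2f\n"            /* RSB now predicts 1f     */
                          "1:\n\t"
                          "pause\n\t"              /* speculation trap        */
                          "lfence\n\t"
                          "jmp    1b\n"
                          "2:\n\t"
                          "mov    %0, (%%rsp)\n\t" /* replace return address with target */
                          "ret\n"                  /* architecturally jumps to fn */
                          "3:\n\t"
                          "call   4b\n"            /* CALL_NOSPEC: call the thunk */
                          : : "r" (fn)
                          : "memory", "cc", "rax", "rcx", "rdx", "rsi",
                            "rdi", "r8", "r9", "r10", "r11");
          }

          int main(void)
          {
                  call_nospec(target);
                  return 0;
          }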
  16. 04 Jan 2018, 1 commit
  17. 24 Dec 2017, 3 commits
    • x86/mm: Optimize RESTORE_CR3 · 21e94459
      Authored by Peter Zijlstra
      Most NMI/paranoid exceptions will not in fact change pagetables and would
      thus not require TLB flushing; however, RESTORE_CR3 uses flushing CR3
      writes.
      
      Restores to kernel PCIDs can be NOFLUSH, because we explicitly flush the
      kernel mappings; and now that we track which user PCIDs need flushing, we
      can avoid those flushes too when possible.
      
      This does mean RESTORE_CR3 needs an additional scratch_reg, luckily both
      sites have plenty available.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      21e94459
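      A hedged sketch of the decision the optimized RESTORE_CR3 makes
      (bit 63 is the real X86_CR3_PCID_NOFLUSH bit; the helper and its
      arguments are illustrative):

          #include <stdint.h>
          #include <stdio.h>

          /* Bit 63 set on a CR3 write tells a PCID-capable CPU to keep
           * the TLB entries tagged with the new PCID (no flush). */
          #define CR3_NOFLUSH   (1ULL << 63)

          static uint64_t cr3_for_restore(uint64_t saved_cr3, int cpu_has_pcid,
                                          int pcid_needs_flush)
          {
                  /* Most NMI/paranoid exits changed no page tables, so if
                   * the target PCID is known clean, skip the flush. */
                  if (cpu_has_pcid && !pcid_needs_flush)
                          return saved_cr3 | CR3_NOFLUSH;
                  return saved_cr3;
          }

          int main(void)
          {
                  printf("%#llx\n", (unsigned long long)cr3_for_restore(0x1000, 1, 0));
                  printf("%#llx\n", (unsigned long long)cr3_for_restore(0x1000, 1, 1));
                  return 0;
          }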
    • x86/mm: Use/Fix PCID to optimize user/kernel switches · 6fd166aa
      Authored by Peter Zijlstra
      We can use PCID to retain the TLBs across CR3 switches; including those now
      part of the user/kernel switch. This increases performance of kernel
      entry/exit at the cost of more expensive/complicated TLB flushing.
      
      Now that we have two address spaces, one for kernel and one for user
      space, we need two PCIDs per mm. We use the top PCID bit to indicate a
      user PCID (just like we use the PFN LSB for the PGD). Since we do TLB
      invalidation from kernel space, the existing code will only invalidate
      the kernel PCID; we augment that by marking the corresponding user PCID
      invalid and, upon switching back to userspace, using a flushing CR3
      write for the switch.
      
      In order to access the user_pcid_flush_mask we use PER_CPU storage,
      which means the previously established SWAPGS vs CR3 ordering is now
      mandatory.
      
      Having to do this memory access does require additional registers; most
      sites have a functioning stack and we can spill one (RAX), while sites
      without a functional stack need to provide the second scratch register
      themselves.
      
      Note: PCID is generally available on Intel Sandybridge and later CPUs.
      Note: Up until this point TLB flushing was broken in this series.
      
      Based-on-code-from: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6fd166aa
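      A hedged sketch of the bookkeeping described above: the top PCID bit
      marks the user variant, and a mask (per-CPU in the real kernel)
      records which user PCIDs must be flushed on the next return to
      userspace. The names echo the kernel's invalidate_user_asid() and
      user_pcid_flush_mask, but this is a simplification:

          #include <stdint.h>
          #include <stdio.h>

          #define PTI_USER_PCID_BIT  11  /* top bit of the 12-bit PCID field */

          static uint16_t kern_pcid(uint16_t asid) { return asid; }
          static uint16_t user_pcid(uint16_t asid)
          {
                  return asid | (1u << PTI_USER_PCID_BIT);
          }

          static uint64_t user_pcid_flush_mask;  /* one bit per ASID */

          /* TLB invalidation runs in kernel space and only reaches the
           * kernel PCID, so the user PCID is merely marked stale here... */
          static void invalidate_user_asid(unsigned int asid)
          {
                  user_pcid_flush_mask |= 1ULL << asid;
          }

          /* ...and the flush happens lazily on the next switch back to
           * userspace, via a CR3 write without the NOFLUSH bit. */
          static int user_asid_needs_flush(unsigned int asid)
          {
                  if (user_pcid_flush_mask & (1ULL << asid)) {
                          user_pcid_flush_mask &= ~(1ULL << asid);
                          return 1;
                  }
                  return 0;
          }

          int main(void)
          {
                  invalidate_user_asid(3);
                  printf("user PCID %#x, flush=%d\n", user_pcid(3),
                         user_asid_needs_flush(3));
                  printf("kern PCID %#x, flush=%d\n", kern_pcid(3),
                         user_asid_needs_flush(3));
                  return 0;
          }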
    • x86/pti: Map the vsyscall page if needed · 85900ea5
      Authored by Andy Lutomirski
      Make VSYSCALLs work fully in PTI mode by mapping them properly to the user
      space visible page tables.
      
      [ tglx: Hide unused functions (Patch by Arnd Bergmann) ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      85900ea5