1. 02 11月, 2017 1 次提交
    • R
      x86/mm: Relocate page fault error codes to traps.h · 1067f030
      Ricardo Neri 提交于
      Up to this point, only fault.c used the definitions of the page fault error
      codes. Thus, it made sense to keep them within such file. Other portions of
      code might be interested in those definitions too. For instance, the User-
      Mode Instruction Prevention emulation code will use such definitions to
      emulate a page fault when it is unable to successfully copy the results
      of the emulated instructions to user space.
      
      While relocating the error code enumeration, the prefix X86_ is used to
      make it consistent with the rest of the definitions in traps.h. Of course,
      code using the enumeration had to be updated as well. No functional changes
      were performed.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/1509135945-13762-2-git-send-email-ricardo.neri-calderon@linux.intel.com
      1067f030
  2. 27 10月, 2017 1 次提交
    • I
      Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses" · 90edaac6
      Ingo Molnar 提交于
      This reverts commit ce56a86e.
      
      There's unanticipated interaction with some boot parameters like 'mem=',
      which now cause the new checks via valid_mmap_phys_addr_range() to be too
      restrictive, crashing a Qemu bootup in fact, as reported by Fengguang Wu.
      
      So while the motivation of the change is still entirely valid, we
      need a few more rounds of testing to get it right - it's way too late
      after -rc6, so revert it for now.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      90edaac6
  3. 20 10月, 2017 1 次提交
    • C
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · ce56a86e
      Craig Bergstrom 提交于
      Currently, it is possible to mmap() any offset from /dev/mem.  If a
      program mmaps() /dev/mem offsets outside of the addressable limits
      of a system, the page table can be corrupted by setting reserved bits.
      
      For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
      x86_64 system with a 48-bit bus, the page fault handler will be called
      with error_code set to RSVD.  The kernel then crashes with a page table
      corruption error.
      
      This change prevents this page table corruption on x86 by refusing
      to mmap offsets higher than the highest valid address in the system.
      Signed-off-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce56a86e
  4. 18 10月, 2017 2 次提交
  5. 14 10月, 2017 1 次提交
    • A
      x86/mm: Flush more aggressively in lazy TLB mode · b956575b
      Andy Lutomirski 提交于
      Since commit:
      
        94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      
      x86's lazy TLB mode has been all the way lazy: when running a kernel thread
      (including the idle thread), the kernel keeps using the last user mm's
      page tables without attempting to maintain user TLB coherence at all.
      
      From a pure semantic perspective, this is fine -- kernel threads won't
      attempt to access user pages, so having stale TLB entries doesn't matter.
      
      Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
      we also allow any paging-structure caches that may exist on the CPU
      to become incoherent.  This means that we can have a
      paging-structure cache entry that references a freed page table, and
      the CPU is within its rights to do a speculative page walk starting
      at the freed page table.
      
      I can imagine this causing two different problems:
      
       - A speculative page walk starting from a bogus page table could read
         IO addresses.  I haven't seen any reports of this causing problems.
      
       - A speculative page walk that involves a bogus page table can install
         garbage in the TLB.  Such garbage would always be at a user VA, but
         some AMD CPUs have logic that triggers a machine check when it notices
         these bogus entries.  I've seen a couple reports of this.
      
      Boris further explains the failure mode:
      
      > It is actually more of an optimization which assumes that paging-structure
      > entries are in WB DRAM:
      >
      > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
      > performance optimization that assumes PML4, PDP, PDE, and PTE entries
      > are in cacheable WB-DRAM; memory type checks may be bypassed, and
      > addresses outside of WB-DRAM may result in undefined behavior or NB
      > protocol errors. 1=Disables performance optimization and allows PML4,
      > PDP, PDE and PTE entries to be in any memory type. Operating systems
      > that maintain page tables in memory types other than WB- DRAM must set
      > TlbCacheDis to insure proper operation."
      >
      > The MCE generated is an NB protocol error to signal that
      >
      > "Link: A specific coherent-only packet from a CPU was issued to an
      > IO link. This may be caused by software which addresses page table
      > structures in a memory type other than cacheable WB-DRAM without
      > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
      > example, when page table structure addresses are above top of memory. In
      > such cases, the NB will generate an MCE if it sees a mismatch between
      > the memory operation generated by the core and the link type."
      >
      > I'm assuming coherent-only packets don't go out on IO links, thus the
      > error.
      
      To fix this, reinstate TLB coherence in lazy mode.  With this patch
      applied, we do it in one of two ways:
      
       - If we have PCID, we simply switch back to init_mm's page tables
         when we enter a kernel thread -- this seems to be quite cheap
         except for the cost of serializing the CPU.
      
       - If we don't have PCID, then we set a flag and switch to init_mm
         the first time we would otherwise need to flush the TLB.
      
      The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
      to override the default mode for benchmarking.
      
      In theory, we could optimize this better by only flushing the TLB in
      lazy CPUs when a page table is freed.  Doing that would require
      auditing the mm code to make sure that all page table freeing goes
      through tlb_remove_page() as well as reworking some data structures
      to implement the improved flush logic.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: NAdam Borowski <kilobyte@angband.pl>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b956575b
  6. 10 10月, 2017 1 次提交
  7. 09 10月, 2017 1 次提交
  8. 05 10月, 2017 2 次提交
    • B
      x86/mce: Hide mca_cfg · 262e6811
      Borislav Petkov 提交于
      Now that lguest is gone, put it in the internal header which should be
      used only by MCA/RAS code.
      
      Add missing header guards while at it.
      
      No functional change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20171002092836.22971-3-bp@alien8.de
      262e6811
    • B
      kvm/x86: Avoid async PF preempting the kernel incorrectly · a2b7861b
      Boqun Feng 提交于
      Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
      schedule() to reschedule in some cases.  This could result in
      accidentally ending the current RCU read-side critical section early,
      causing random memory corruption in the guest, or otherwise preempting
      the currently running task inside between preempt_disable and
      preempt_enable.
      
      The difficulty to handle this well is because we don't know whether an
      async PF delivered in a preemptible section or RCU read-side critical section
      for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock()
      are both no-ops in that case.
      
      To cure this, we treat any async PF interrupting a kernel context as one
      that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
      the schedule() path in that case.
      
      To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
      so that we know whether it's called from a context interrupting the
      kernel, and the parameter is set properly in all the callsites.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NBoqun Feng <boqun.feng@gmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      a2b7861b
  9. 30 9月, 2017 1 次提交
  10. 29 9月, 2017 1 次提交
    • J
      x86/asm: Fix inline asm call constraints for GCC 4.4 · 520a13c5
      Josh Poimboeuf 提交于
      The kernel test bot (run by Xiaolong Ye) reported that the following commit:
      
        f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      
      is causing double faults in a kernel compiled with GCC 4.4.
      
      Linus subsequently diagnosed the crash pattern and the buggy commit and found that
      the issue is with this code:
      
        register unsigned int __asm_call_sp asm("esp");
        #define ASM_CALL_CONSTRAINT "+r" (__asm_call_sp)
      
      Even on a 64-bit kernel, it's using ESP instead of RSP.  That causes GCC
      to produce the following bogus code:
      
        ffffffff8147461d:       89 e0                   mov    %esp,%eax
        ffffffff8147461f:       4c 89 f7                mov    %r14,%rdi
        ffffffff81474622:       4c 89 fe                mov    %r15,%rsi
        ffffffff81474625:       ba 20 00 00 00          mov    $0x20,%edx
        ffffffff8147462a:       89 c4                   mov    %eax,%esp
        ffffffff8147462c:       e8 bf 52 05 00          callq  ffffffff814c98f0 <copy_user_generic_unrolled>
      
      Despite the absurdity of it backing up and restoring the stack pointer
      for no reason, the bug is actually the fact that it's only backing up
      and restoring the lower 32 bits of the stack pointer.  The upper 32 bits
      are getting cleared out, corrupting the stack pointer.
      
      So change the '__asm_call_sp' register variable to be associated with
      the actual full-size stack pointer.
      
      This also requires changing the __ASM_SEL() macro to be based on the
      actual compiled arch size, rather than the CONFIG value, because
      CONFIG_X86_64 compiles some files with '-m32' (e.g., realmode and vdso).
      Otherwise Clang fails to build the kernel because it complains about the
      use of a 64-bit register (RSP) in a 32-bit file.
      Reported-and-Bisected-and-Tested-by: Nkernel test robot <xiaolong.ye@intel.com>
      Diagnosed-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      Link: http://lkml.kernel.org/r/20170928215826.6sdpmwtkiydiytim@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>
      520a13c5
  11. 26 9月, 2017 5 次提交
    • E
      x86/fpu: Introduce validate_xstate_header() · e63e5d5c
      Eric Biggers 提交于
      Move validation of user-supplied xstate_header into a helper function,
      in preparation of calling it from both the ptrace and sigreturn syscall
      paths.
      
      The new function also considers it to be an error if *any* reserved bits
      are set, whereas before we were just clearing most of them silently.
      
      This should reduce the chance of bugs that fail to correctly validate
      user-supplied XSAVE areas.  It also will expose any broken userspace
      programs that set the other reserved bits; this is desirable because
      such programs will lose compatibility with future CPUs and kernels if
      those bits are ever used for anything.  (There shouldn't be any such
      programs, and in fact in the case where the compacted format is in use
      we were already validating xfeatures.  But you never know...)
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-2-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e63e5d5c
    • I
      x86/fpu: Rename fpu__activate_fpstate_read/write() to fpu__prepare_[read|write]() · 369a036d
      Ingo Molnar 提交于
      As per the new nomenclature we don't 'activate' the FPU state
      anymore, we initialize it. So drop the _activate_fpstate name
      from these functions, which were a bit of a mouthful anyway,
      and name them:
      
      	fpu__prepare_read()
      	fpu__prepare_write()
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      369a036d
    • I
      x86/fpu: Rename fpu__activate_curr() to fpu__initialize() · 2ce03d85
      Ingo Molnar 提交于
      Rename this function to better express that it's all about
      initializing the FPU state of a task which goes hand in hand
      with the fpu::initialized field.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2ce03d85
    • I
      x86/fpu: Rename fpu::fpstate_active to fpu::initialized · e4a81bfc
      Ingo Molnar 提交于
      The x86 FPU code used to have a complex state machine where both the FPU
      registers and the FPU state context could be 'active' (or inactive)
      independently of each other - which enabled features like lazy FPU restore.
      
      Much of this complexity is gone in the current code: now we basically can
      have FPU-less tasks (kernel threads) that don't use (and save/restore) FPU
      state at all, plus full FPU users that save/restore directly with no laziness
      whatsoever.
      
      But the fpu::fpstate_active still carries bits of the old complexity - meanwhile
      this flag has become a simple flag that shows whether the FPU context saving
      area in the thread struct is initialized and used, or not.
      
      Rename it to fpu::initialized to express this simplicity in the name as well.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-30-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e4a81bfc
    • I
      x86/fpu: Remove fpu__current_fpstate_write_begin/end() · 685c930d
      Ingo Molnar 提交于
      These functions are not used anymore, so remove them.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-29-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      685c930d
  12. 25 9月, 2017 2 次提交
    • V
      x86: Don't cast away the __user in __get_user_asm_u64() · 5ac751d9
      Ville Syrjälä 提交于
      Don't cast away the __user in __get_user_asm_u64() on x86-32.
      Prevents sparse getting upset.
      Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20170912164000.13745-1-ville.syrjala@linux.intel.com
      5ac751d9
    • E
      x86/fpu: Reinitialize FPU registers if restoring FPU state fails · d5c8028b
      Eric Biggers 提交于
      Userspace can change the FPU state of a task using the ptrace() or
      rt_sigreturn() system calls.  Because reserved bits in the FPU state can
      cause the XRSTOR instruction to fail, the kernel has to carefully
      validate that no reserved bits or other invalid values are being set.
      
      Unfortunately, there have been bugs in this validation code.  For
      example, we were not checking that the 'xcomp_bv' field in the
      xstate_header was 0.  As-is, such bugs are exploitable to read the FPU
      registers of other processes on the system.  To do so, an attacker can
      create a task, assign to it an invalid FPU state, then spin in a loop
      and monitor the values of the FPU registers.  Because the task's FPU
      registers are not being restored, sometimes the FPU registers will have
      the values from another process.
      
      This is likely to continue to be a problem in the future because the
      validation done by the CPU instructions like XRSTOR is not immediately
      visible to kernel developers.  Nor will invalid FPU states ever be
      encountered during ordinary use --- they will only be seen during
      fuzzing or exploits.  There can even be reserved bits outside the
      xstate_header which are easy to forget about.  For example, the MXCSR
      register contains reserved bits, which were not validated by the
      KVM_SET_XSAVE ioctl until commit a575813b ("KVM: x86: Fix load
      damaged SSEx MXCSR register").
      
      Therefore, mitigate this class of vulnerability by restoring the FPU
      registers from init_fpstate if restoring from the task's state fails.
      
      We actually used to do this, but it was (perhaps unwisely) removed by
      commit 9ccc27a5 ("x86/fpu: Remove error return values from
      copy_kernel_to_*regs() functions").  This new patch is also a bit
      different.  First, it only clears the registers, not also the bad
      in-memory state; this is simpler and makes it easier to make the
      mitigation cover all callers of __copy_kernel_to_fpregs().  Second, it
      does the register clearing in an exception handler so that no extra
      instructions are added to context switches.  In fact, we *remove*
      instructions, since previously we were always zeroing the register
      containing 'err' even if CONFIG_X86_DEBUG_FPU was disabled.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170922174156.16780-4-ebiggers3@gmail.com
      Link: http://lkml.kernel.org/r/20170923130016.21448-27-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d5c8028b
  13. 24 9月, 2017 16 次提交
    • A
      x86/fpu: Turn WARN_ON() in context switch into WARN_ON_FPU() · 03eaec81
      Andi Kleen 提交于
      copy_xregs_to_kernel checks if the alternatives have been already
      patched.
      
      This WARN_ON() is always executed in every context switch.
      
      All the other checks in fpu internal.h are WARN_ON_FPU(), but
      this one is plain WARN_ON(). I assume it was forgotten to switch it.
      
      So switch it to WARN_ON_FPU() too to avoid some unnecessary code
      in the context switch, and a potentially expensive cache line miss for the
      global variable.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170329062605.4970-1-andi@firstfloor.org
      Link: http://lkml.kernel.org/r/20170923130016.21448-24-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      03eaec81
    • R
      x86/fpu: Add FPU state copying quirk to handle XRSTOR failure on Intel Skylake CPUs · 0852b374
      Rik van Riel 提交于
      On Skylake CPUs I noticed that XRSTOR is unable to deal with states
      created by copyout_from_xsaves() if the xstate has only SSE/YMM state, and
      no FP state. That is, xfeatures had XFEATURE_MASK_SSE set, but not
      XFEATURE_MASK_FP.
      
      The reason is that part of the SSE/YMM state lives in the MXCSR and
      MXCSR_FLAGS fields of the FP state.
      
      Ensure that whenever we copy SSE or YMM state around, the MXCSR and
      MXCSR_FLAGS fields are also copied around.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170210085445.0f1cc708@annuminas.surriel.com
      Link: http://lkml.kernel.org/r/20170923130016.21448-22-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0852b374
    • I
      x86/fpu: Remove struct fpu::fpregs_active · 99dc26bd
      Ingo Molnar 提交于
      The previous changes paved the way for the removal of the
      fpu::fpregs_active state flag - we now only have the
      fpu::fpstate_active and fpu::last_cpu fields left.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-21-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      99dc26bd
    • I
      x86/fpu: Decouple fpregs_activate()/fpregs_deactivate() from fpu->fpregs_active · 6cf4edbe
      Ingo Molnar 提交于
      The fpregs_activate()/fpregs_deactivate() are currently called in such a pattern:
      
      	if (!fpu->fpregs_active)
      		fpregs_activate(fpu);
      
      	...
      
      	if (fpu->fpregs_active)
      		fpregs_deactivate(fpu);
      
      But note that it's actually safe to call them without checking the flag first.
      
      This further decouples the fpu->fpregs_active flag from actual FPU logic.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-20-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6cf4edbe
    • I
      x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active · f1c8cd01
      Ingo Molnar 提交于
      We want to simplify the FPU state machine by eliminating fpu->fpregs_active,
      and we can do that because the two state flags (::fpregs_active and
      ::fpstate_active) are set essentially together.
      
      The old lazy FPU switching code used to make a distinction - but there's
      no lazy switching code anymore, we always switch in an 'eager' fashion.
      
      Do this by first changing all substantial uses of fpu->fpregs_active
      to fpu->fpstate_active and adding a few debug checks to double check
      our assumption is correct.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-19-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f1c8cd01
    • I
      x86/fpu: Simplify fpu->fpregs_active use · b3a16308
      Ingo Molnar 提交于
      The fpregs_active() inline function is pretty pointless - in almost
      all the callsites it can be replaced with a direct fpu->fpregs_active
      access.
      
      Do so and eliminate the extra layer of obfuscation.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-16-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b3a16308
    • I
      x86/fpu: Flip the parameter order in copy_*_to_xstate() · 6d7f7da5
      Ingo Molnar 提交于
      Make it more consistent with regular memcpy() semantics, where the destination
      argument comes first.
      
      No change in functionality.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-15-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6d7f7da5
    • I
      x86/fpu: Remove 'kbuf' parameter from the copy_user_to_xstate() API · 7b9094c6
      Ingo Molnar 提交于
      No change in functionality.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-14-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7b9094c6
    • I
      x86/fpu: Remove 'ubuf' parameter from the copy_kernel_to_xstate() API · 59dffa4e
      Ingo Molnar 提交于
      No change in functionality.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-13-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      59dffa4e
    • I
      x86/fpu: Split copy_user_to_xstate() into copy_kernel_to_xstate() & copy_user_to_xstate() · 79fecc2b
      Ingo Molnar 提交于
      Similar to:
      
        x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & copy_xstate_to_user()
      
      No change in functionality.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-12-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      79fecc2b
    • I
      x86/fpu: Clarify parameter names in the copy_xstate_to_*() methods · 56583c9a
      Ingo Molnar 提交于
      Right now there's a confusing mixture of 'offset' and 'size' parameters:
      
       - __copy_xstate_to_*() input parameter 'end_pos' not not really an offset,
         but the full size of the copy to be performed.
      
       - input parameter 'count' to copy_xstate_to_*() shadows that of
         __copy_xstate_to_*()'s 'count' parameter name - but the roles
         are different: the first one is the total number of bytes to
         be copied, while the second one is a partial copy size.
      
      To unconfuse all this, use a consistent set of parameter names:
      
       - 'size' is the partial copy size within a single xstate component
       - 'size_total' is the total copy requested
       - 'offset_start' is the requested starting offset.
       - 'offset' is the offset within an xstate component.
      
      No change in functionality.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-9-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      56583c9a
    • I
      x86/fpu: Clean up parameter order in the copy_xstate_to_*() APIs · d7eda6c9
      Ingo Molnar 提交于
      Parameter ordering is weird:
      
        int copy_xstate_to_kernel(unsigned int pos, unsigned int count, void *kbuf, struct xregs_state *xsave);
        int copy_xstate_to_user(unsigned int pos, unsigned int count, void __user *ubuf, struct xregs_state *xsave);
      
      'pos' and 'count', which are attributes of the destination buffer, are listed before the destination
      buffer itself ...
      
      List them after the primary arguments instead.
      
      This makes the code more similar to regular memcpy() variant APIs.
      
      No change in functionality.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-6-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d7eda6c9
    • I
      x86/fpu: Remove 'kbuf' parameter from the copy_xstate_to_user() APIs · a69c158f
      Ingo Molnar 提交于
      The 'kbuf' parameter is unused in the _user() side of the API, remove it.
      
      This simplifies the code and makes it easier to think about.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-5-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a69c158f
    • I
      x86/fpu: Remove 'ubuf' parameter from the copy_xstate_to_kernel() APIs · 4d981cf2
      Ingo Molnar 提交于
      The 'ubuf' parameter is unused in the _kernel() side of the API, remove it.
      
      This simplifies the code and makes it easier to think about.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-4-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4d981cf2
    • I
      x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & copy_xstate_to_user() · f0d4f30a
      Ingo Molnar 提交于
      copy_xstate_to_user() is a weird API - in part due to a bad API inherited
      from the regset APIs.
      
      But don't propagate that bad API choice into the FPU code - so as a first
      step split the API into kernel and user buffer handling routines.
      
      (Also split the xstate_copyout() internal helper.)
      
      The split API is a dumb duplication that should be obviously correct, the
      real splitting will be done in the next patch.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-3-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f0d4f30a
    • I
      x86/fpu: Rename copyin_to_xsaves()/copyout_from_xsaves() to... · 656f0831
      Ingo Molnar 提交于
      x86/fpu: Rename copyin_to_xsaves()/copyout_from_xsaves() to copy_user_to_xstate()/copy_xstate_to_user()
      
      The 'copyin/copyout' nomenclature needlessly departs from what the modern FPU code
      uses, which is:
      
       copy_fpregs_to_fpstate()
       copy_fpstate_to_sigframe()
       copy_fregs_to_user()
       copy_fxregs_to_kernel()
       copy_fxregs_to_user()
       copy_kernel_to_fpregs()
       copy_kernel_to_fregs()
       copy_kernel_to_fxregs()
       copy_kernel_to_xregs()
       copy_user_to_fregs()
       copy_user_to_fxregs()
       copy_user_to_xregs()
       copy_xregs_to_kernel()
       copy_xregs_to_user()
      
      I.e. according to this pattern, the following rename should be done:
      
        copyin_to_xsaves()    -> copy_user_to_xstate()
        copyout_from_xsaves() -> copy_xstate_to_user()
      
      or, if we want to be pedantic, denote that that the user-space format is ptrace:
      
        copyin_to_xsaves()    -> copy_user_ptrace_to_xstate()
        copyout_from_xsaves() -> copy_xstate_to_user_ptrace()
      
      But I'd suggest the shorter, non-pedantic name.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-2-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      656f0831
  14. 23 9月, 2017 1 次提交
    • J
      x86/asm: Fix inline asm call constraints for Clang · f5caf621
      Josh Poimboeuf 提交于
      For inline asm statements which have a CALL instruction, we list the
      stack pointer as a constraint to convince GCC to ensure the frame
      pointer is set up first:
      
        static inline void foo()
        {
      	register void *__sp asm(_ASM_SP);
      	asm("call bar" : "+r" (__sp))
        }
      
      Unfortunately, that pattern causes Clang to corrupt the stack pointer.
      
      The fix is easy: convert the stack pointer register variable to a global
      variable.
      
      It should be noted that the end result is different based on the GCC
      version.  With GCC 6.4, this patch has exactly the same result as
      before:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9820389		9491555		8816046		8516940
       after	9820389		9491555		8816046		8516940
      
      With GCC 7.2, however, GCC's behavior has changed.  It now changes its
      behavior based on the conversion of the register variable to a global.
      That somehow convinces it to *always* set up the frame pointer before
      inserting *any* inline asm.  (Therefore, listing the variable as an
      output constraint is a no-op and is no longer necessary.)  It's a bit
      overkill, but the performance impact should be negligible.  And in fact,
      there's a nice improvement with frame pointers disabled:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9796316		9468236		9076191		8790305
       after	9796957		9464267		9076381		8785949
      
      So in summary, while listing the stack pointer as an output constraint
      is no longer necessary for newer versions of GCC, it's still needed for
      older versions.
      Suggested-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: NMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f5caf621
  15. 22 9月, 2017 1 次提交
  16. 18 9月, 2017 2 次提交
    • A
      x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code · 52a2af40
      Andy Lutomirski 提交于
      Putting the logical ASID into CR3's PCID bits directly means that we
      have two cases to consider separately: ASID == 0 and ASID != 0.
      This means that bugs that only hit in one of these cases trigger
      nondeterministically.
      
      There were some bugs like this in the past, and I think there's
      still one in current kernels.  In particular, we have a number of
      ASID-unware code paths that save CR3, write some special value, and
      then restore CR3.  This includes suspend/resume, hibernate, kexec,
      EFI, and maybe other things I've missed.  This is currently
      dangerous: if ASID != 0, then this code sequence will leave garbage
      in the TLB tagged for ASID 0.  We could potentially see corruption
      when switching back to ASID 0.  In principle, an
      initialize_tlbstate_and_flush() call after these sequences would
      solve the problem, but EFI, at least, does not call this.  (And it
      probably shouldn't -- initialize_tlbstate_and_flush() is rather
      expensive.)
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/cdc14bbe5d3c3ef2a562be09a6368ffe9bd947a6.1505663533.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      52a2af40
    • A
      x86/mm: Factor out CR3-building code · 47061a24
      Andy Lutomirski 提交于
      Current, the code that assembles a value to load into CR3 is
      open-coded everywhere.  Factor it out into helpers build_cr3() and
      build_cr3_noflush().
      
      This makes one semantic change: __get_current_cr3_fast() was wrong
      on SME systems.  No one noticed because the only caller is in the
      VMX code, and there are no CPUs with both SME and VMX.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
      Link: http://lkml.kernel.org/r/ce350cf11e93e2842d14d0b95b0199c7d881f527.1505663533.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      47061a24
  17. 14 9月, 2017 1 次提交