1. 10 Mar 2016 (11 commits)
    • x86/entry: Improve system call entry comments · fda57b22
      Committed by Andy Lutomirski
      Ingo suggested that the comments should explain when the various
      entries are used.  This adds these explanations and improves other
      parts of the comments.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/9524ecef7a295347294300045d08354d6a57c6e7.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry: Remove TIF_SINGLESTEP entry work · 392a6254
      Committed by Andy Lutomirski
      Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into
      regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall
      entry code.  Remove it.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/2d15f24da52dafc9d2f0b8d76f55544f4779c517.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/32: Add and check a stack canary for the SYSENTER stack · 2a41aa4f
      Committed by Andy Lutomirski
      The first instruction of the SYSENTER entry runs on its own tiny
      stack.  That stack can be used if a #DB or NMI is delivered before
      the SYSENTER prologue switches to a real stack.
      
      We have code in place to prevent us from overflowing the tiny stack.
      For added paranoia, add a canary to the stack and check it in
      do_debug() -- that way, if something goes wrong with the #DB logic,
      we'll eventually notice.
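      Below is a minimal user-space sketch of the canary idea. The struct, the magic value and the 64-word size are hypothetical stand-ins for the per-CPU SYSENTER stack, not the kernel's actual definitions. assert() plays the role of the kernel's BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical magic value: any pattern unlikely to appear by accident. */
#define STACK_CANARY_MAGIC 0x57ac6e9dUL

/* Hypothetical model of a tiny, downward-growing stack. The canary sits
 * below the lowest usable word, so an overflow clobbers it first. */
struct sysenter_stack_sketch {
	unsigned long canary;
	unsigned long words[64];
};

static void stack_init(struct sysenter_stack_sketch *s)
{
	memset(s->words, 0, sizeof(s->words));
	s->canary = STACK_CANARY_MAGIC;
}

static bool stack_canary_ok(const struct sysenter_stack_sketch *s)
{
	return s->canary == STACK_CANARY_MAGIC;
}

/* What a debug handler could run on entry. It does not prevent an
 * overflow; it only guarantees that one is eventually noticed. */
static void check_stack(const struct sysenter_stack_sketch *s)
{
	assert(stack_canary_ok(s));  /* stand-in for BUG_ON(!ok) */
}
```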
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/6ff9a806f39098b166dc2c41c1db744df5272f29.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup · 7536656f
      Committed by Andy Lutomirski
      Right after SYSENTER, we can get a #DB or NMI.  On x86_32, there's no IST,
      so the exception handler is invoked on the temporary SYSENTER stack.
      
      Because the SYSENTER stack is very small, we have a fixup to switch
      off the stack quickly when this happens.  The old fixup had several issues:
      
       1. It checked the interrupt frame's CS and EIP.  This wasn't
          obviously correct on Xen or if vm86 mode was in use [1].
      
       2. In the NMI handler, it did some frightening digging into the
          stack frame.  I'm not convinced this digging was correct.
      
       3. The fixup didn't switch stacks and then switch back.  Instead, it
          synthesized a brand new stack frame that would redirect the IRET
          back to the SYSENTER code.  That frame was highly questionable.
          For one thing, if NMI nested inside #DB, we would effectively
          abort the #DB prologue, which was probably safe but was
          frightening.  For another, the code used PUSHFL to write the
          FLAGS portion of the frame, which was simply bogus -- by the time
          PUSHFL was called, at least TF, NT, VM, and all of the arithmetic
          flags were clobbered.
      
      Simplify this considerably.  Instead of looking at the saved frame
      to see where we came from, check the hardware ESP register against
      the SYSENTER stack directly.  Malicious user code cannot spoof the
      kernel ESP register, and by moving the check after SAVE_ALL, we can
      use normal PER_CPU accesses to find all the relevant addresses.
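      The new check can be modeled in C as a range test on the live stack pointer. The struct and function names are hypothetical. The real check is written in assembly against per-CPU data; this sketch only shows the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one CPU's SYSENTER stack: [base, base + size). */
struct stack_range {
	uint32_t base;
	uint32_t size;
};

/* Ask where the CPU is running now (the live ESP) instead of trusting
 * the CS/EIP saved in the exception frame. Unsigned wraparound folds
 * "esp >= base && esp < base + size" into a single comparison. */
static bool on_sysenter_stack(uint32_t esp, const struct stack_range *r)
{
	assert(r->size != 0);
	return (uint32_t)(esp - r->base) < r->size;
}
```

      An ESP just below the base wraps to a huge unsigned offset and is rejected by the same single comparison that rejects one past the top.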
      
      With this patch applied, the improved syscall_nt_32 test finally
      passes on 32-bit kernels.
      
      [1] It isn't obviously correct, but it is nonetheless safe from vm86
          shenanigans as far as I can tell.  A user can't point EIP at
          entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32,
          like all kernel addresses, is greater than 0xffff and would thus
          violate the CS segment limit.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/b2cdbc037031c07ecf2c40a96069318aec0e7971.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed · 6dcc9414
      Committed by Andy Lutomirski
      The SYSENTER stack is only used on 32-bit kernels.  Remove it on 64-bit kernels.
      
      ( We may end up using it down the road on 64-bit kernels. If so,
        we'll re-enable it for CONFIG_IA32_EMULATION. )
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/9dbd18429f9ff61a76b6eda97a9ea20510b9f6ba.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry: Vastly simplify SYSENTER TF (single-step) handling · f2b37575
      Committed by Andy Lutomirski
      Due to a blatant design error, SYSENTER doesn't clear TF (single-step).
      
      As a result, if a user does SYSENTER with TF set, we will single-step
      through the kernel until something clears TF.  There is absolutely
      nothing we can do to prevent this short of turning off SYSENTER [1].
      
      Simplify the handling considerably with two changes:
      
        1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
           add TF to that list of flags to sanitize with no overhead whatsoever.
      
        2. Teach do_debug() to ignore single-step traps in the SYSENTER prologue.
      
      That's all we need to do.
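      Item 2 can be sketched as a predicate over DR6 and the trapping instruction pointer. DR6 bit 14 (BS) is the architectural single-step status bit. The struct describing the prologue's address range and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DR6_BS 0x4000u  /* DR6 bit 14: the #DB was a single-step trap */

/* Hypothetical address range [start, end) of the SYSENTER prologue, i.e.
 * the instructions that run before TF has been cleared. */
struct code_region {
	uintptr_t start;
	uintptr_t end;
};

/* A single-step trap inside the prologue is a leftover of the user's TF
 * and can be dropped, because TF is cleared a few instructions later.
 * Any other #DB is handled normally. */
static bool ignore_prologue_singlestep(uint32_t dr6, uintptr_t ip,
				       const struct code_region *prologue)
{
	assert(prologue->start <= prologue->end);
	return (dr6 & DR6_BS) != 0 &&
	       ip >= prologue->start && ip < prologue->end;
}
```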
      
      Don't get too excited -- our handling is still buggy on 32-bit
      kernels.  There's nothing wrong with the SYSENTER code itself, but
      the #DB prologue has a clever fixup for traps on the very first
      instruction of entry_SYSENTER_32, and the fixup doesn't work quite
      correctly.  The next two patches will fix that.
      
      [1] We could probably prevent it by forcing BTF on at all times and
          making sure we clear TF before any branches in the SYSENTER
          code.  Needless to say, this is a bad idea.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/a30d2ea06fe4b621fe6a9ef911b02c0f38feb6f2.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/traps: Clear DR6 early in do_debug() and improve the comment · 8bb56436
      Committed by Andy Lutomirski
      Leaving any bits set in DR6 on return from a debug exception is
      asking for trouble.  Prevent it by writing zero right away and
      clarify the comment.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3857676e1be8fb27db4b89bbb1e2052b7f435ff4.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions · 81edd9f6
      Committed by Andy Lutomirski
      The SDM says that debug exceptions clear BTF, and we need to keep
      TIF_BLOCKSTEP in sync with BTF.  Clear it unconditionally and improve
      the comment.
      
      I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not
      to be cleared was just an oversight.
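      Together with 8bb56436 above, this commit fixes the order of the first steps in do_debug(). Below is a minimal model. A global variable stands in for the DR6 register and a struct stands in for the task's thread flags; both are hypothetical. The kernel itself uses get_debugreg()/set_debugreg() and the TIF_BLOCKSTEP thread flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DR6 register and per-task thread flags. */
static uint64_t hw_dr6;

struct task_sketch {
	bool tif_blockstep;  /* software mirror of the BTF bit in DEBUGCTL */
};

/* First steps of a #DB handler. Snapshot DR6 and write zero right away,
 * so no path out of the handler can leave stale bits behind. Then clear
 * the TIF_BLOCKSTEP mirror unconditionally, because delivering the debug
 * exception has already cleared BTF in hardware. */
static uint64_t debug_entry(struct task_sketch *t)
{
	uint64_t dr6;

	assert(t);
	dr6 = hw_dr6;
	hw_dr6 = 0;
	t->tif_blockstep = false;
	return dr6;
}
```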
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/fa86e55d196e6dde5b38839595bde2a292c52fdc.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/32: Restore FLAGS on SYSEXIT · c2c9b52f
      Committed by Andy Lutomirski
      We weren't restoring FLAGS at all on SYSEXIT.  Apparently no one cared.
      
      With this patch applied, native kernels should always honor
      task_pt_regs()->flags, which opens the door for some sys_iopl()
      cleanups.  I'll do those as a separate series, though, since getting
      it right will involve tweaking some paravirt ops.
      
      ( The short version is that, before this patch, sys_iopl(), invoked via
        SYSENTER, wasn't guaranteed to ever transfer the updated
        regs->flags, so sys_iopl() had to change the hardware flags register
        as well. )
      Reported-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3f98b207472dc9784838eb5ca2b89dcc845ce269.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/32: Filter NT and speed up AC filtering in SYSENTER · 67f590e8
      Committed by Andy Lutomirski
      This makes the 32-bit code work just like the 64-bit code.  It should
      speed up syscalls on 32-bit kernels on Skylake by something like 20
      cycles (by analogy to the 64-bit compat case).
      
      It also cleans up NT just like we do for the 64-bit case.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/07daef3d44bd1ed62a2c866e143e8df64edb40ee.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test · e7860411
      Committed by Andy Lutomirski
      CLAC is slow, and the SYSENTER code already has an unlikely path
      that runs if unusual flags are set.  Drop the CLAC and instead rely
      on the unlikely path to clear AC.
      
      This seems to save ~24 cycles on my Skylake laptop.  (Hey, Intel,
      make this faster please!)
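      The fast-path/slow-path split can be modeled as follows. The EFLAGS bit values are architectural. The function name and counter are hypothetical. __builtin_expect (GCC/Clang) stands in for the kernel's unlikely(). The slow path returns a value with only bit 1 set, on the assumption that it simply reloads a known-clean FLAGS value. Only NT and AC are tested here; f2b37575 above later adds TF to the same test.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural EFLAGS bits. */
#define EFLAGS_FIXED 0x00000002u  /* bit 1: reserved, always reads as 1 */
#define EFLAGS_NT    0x00004000u  /* bit 14: nested task */
#define EFLAGS_AC    0x00040000u  /* bit 18: alignment check (SMAP override) */

#define EFLAGS_UNUSUAL (EFLAGS_NT | EFLAGS_AC)

/* Hypothetical counter that shows which path was taken. */
static unsigned int slow_path_hits;

/* Returns the FLAGS value the kernel continues with. Common case: one
 * test-and-branch and no CLAC. Rare case: NT or AC is set, so reload a
 * clean value, which clears AC without a separate CLAC. */
static uint32_t entry_flags(uint32_t flags)
{
	assert(flags & EFLAGS_FIXED);  /* bit 1 is always set in real EFLAGS */
	if (__builtin_expect((flags & EFLAGS_UNUSUAL) != 0, 0)) {
		slow_path_hits++;
		return EFLAGS_FIXED;
	}
	return flags;
}
```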
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/90d6db2189f9add83bc7bddd75a0c19ebbd676b2.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 08 Mar 2016 (2 commits)
    • x86/asm-offsets: Remove PARAVIRT_enabled · 0dd0036f
      Committed by Andy Lutomirski
      It no longer has any users.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: konrad.wilk@oracle.com
      Cc: lguest@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled · 58a5aac5
      Committed by Andy Lutomirski
      x86_64 has very clean espfix handling on paravirt: espfix64 is set
      up in native_iret, so paravirt systems that override iret bypass
      espfix64 automatically.  This is robust and straightforward.
      
      x86_32 is messier.  espfix is set up before the IRET paravirt patch
      point, so it can't be directly conditionalized on whether we use
      native_iret.  We also can't easily move it into native_iret without
      regressing performance due to a bizarre consideration.  Specifically,
      on 64-bit kernels, the logic is:
      
        if (regs->ss & 0x4)
                setup_espfix;
      
      On 32-bit kernels, the logic is:
      
        if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 &&
            (regs->flags & X86_EFLAGS_VM) == 0)
                setup_espfix;
      
      The performance of setup_espfix itself is essentially irrelevant, but
      the comparison happens on every IRET so its performance matters.  On
      x86_64, there's no need for any registers except flags to implement
      the comparison, so we fold the whole thing into native_iret.  On
      x86_32, we don't do that because we need a free register to
      implement the comparison efficiently.  We therefore do espfix setup
      before restoring registers on x86_32.
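      The two conditions above can be written as C predicates. SS bit 2 is the selector's table indicator, which is set for LDT selectors. The low two bits of CS are the requested privilege level, and EFLAGS bit 17 is VM. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEL_TI_LDT   0x4u        /* selector bit 2: selector refers to the LDT */
#define SEL_RPL_MASK 0x3u        /* selector bits 0-1: privilege level */
#define EFLAGS_VM    0x00020000u /* EFLAGS bit 17: virtual-8086 mode */

/* x86_64: any return to an LDT stack segment needs espfix. */
static bool needs_espfix64(uint16_t ss)
{
	return (ss & SEL_TI_LDT) != 0;
}

/* x86_32: additionally require a return to ring 3 that is not into
 * vm86 mode. */
static bool needs_espfix32(uint16_t ss, uint16_t cs, uint32_t flags)
{
	return (ss & SEL_TI_LDT) != 0 &&
	       (cs & SEL_RPL_MASK) == 3 &&
	       (flags & EFLAGS_VM) == 0;
}
```

      For example, SS = 0x7 (an LDT selector with RPL 3) returning to CS = 0x73 with VM clear needs espfix on 32-bit, while the same frame with VM set does not.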
      
      This patch gets rid of the explicit paravirt_enabled check by
      introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE
      to skip espfix on paravirt systems where iret != native_iret.  This is
      also messy, but it's at least in line with other things we do.
      
      This improves espfix performance by removing a branch, but no one
      cares.  More importantly, it removes a paravirt_enabled user, which is
      good because paravirt_enabled is ill-defined and is going away.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: konrad.wilk@oracle.com
      Cc: lguest@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 06 Mar 2016 (1 commit)
  4. 03 Mar 2016 (1 commit)
  5. 02 Mar 2016 (1 commit)
  6. 28 Feb 2016 (1 commit)
    • mm: ASLR: use get_random_long() · 5ef11c35
      Committed by Daniel Cashman
      Replace calls to get_random_int() followed by a cast to (unsigned long)
      with calls to get_random_long().  Also address a shifting bug which, on
      x86, dropped bits from the entropy mask for mmap_rnd_bits values > 31 bits.
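      The shifting bug is an instance of a general C pitfall. A mask built from the int constant 1 cannot be wider than 32 bits, and the shift is undefined once the count reaches 31. The commit message does not show the original x86 code. The helpers below therefore only illustrate a correct way to build the mask; their names are hypothetical, and uint64_t stands in for a 64-bit unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of the low `bits` bits. Shifting a 64-bit constant keeps every
 * requested bit; `1 << bits` would shift a 32-bit int instead. */
static uint64_t rnd_mask(unsigned int bits)
{
	assert(bits < 64);
	return (UINT64_C(1) << bits) - 1;
}

/* Keep `bits` bits of entropy from a 64-bit random value. */
static uint64_t random_offset(uint64_t random_long, unsigned int bits)
{
	return random_long & rnd_mask(bits);
}
```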
      Signed-off-by: Daniel Cashman <dcashman@android.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 27 Feb 2016 (2 commits)
  8. 26 Feb 2016 (2 commits)
  9. 25 Feb 2016 (2 commits)
    • KVM: x86: MMU: fix ubsan index-out-of-range warning · 17e4bce0
      Committed by Mike Krinkin
      UBSAN reports the following warning due to a typo in the
      update_accessed_dirty_bits template; this patch fixes
      the typo:
      
      [  168.791851] ================================================================================
      [  168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15
      [  168.791866] index 4 is out of range for type 'u64 [4]'
      [  168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G           O L  4.5.0-rc5-next-20160222 #7
      [  168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013
      [  168.791876]  0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3
      [  168.791882]  ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0
      [  168.791886]  0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600
      [  168.791891] Call Trace:
      [  168.791899]  [<ffffffff81c9f780>] dump_stack+0xcc/0x12c
      [  168.791904]  [<ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4
      [  168.791910]  [<ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a
      [  168.791914]  [<ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3
      [  168.791918]  [<ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd
      [  168.791922]  [<ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360
      [  168.791954]  [<ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm]
      [  168.791958]  [<ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360
      [  168.791987]  [<ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm]
      [  168.792014]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
      [  168.792019]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
      [  168.792044]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
      [  168.792076]  [<ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm]
      [  168.792121]  [<ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm]
      [  168.792130]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792178]  [<ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm]
      [  168.792208]  [<ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm]
      [  168.792234]  [<ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm]
      [  168.792238]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792263]  [<ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm]
      [  168.792290]  [<ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm]
      [  168.792314]  [<ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm]
      [  168.792340]  [<ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm]
      [  168.792367]  [<ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm]
      [  168.792374]  [<ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel]
      [  168.792400]  [<ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm]
      [  168.792424]  [<ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm]
      [  168.792449]  [<ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm]
      [  168.792474]  [<ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm]
      [  168.792499]  [<ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm]
      [  168.792524]  [<ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm]
      [  168.792532]  [<ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel]
      [  168.792539]  [<ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel]
      [  168.792546]  [<ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel]
      [  168.792572]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
      [  168.792597]  [<ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm]
      [  168.792621]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
      [  168.792627]  [<ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630
      [  168.792651]  [<ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm]
      [  168.792656]  [<ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190
      [  168.792681]  [<ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm]
      [  168.792704]  [<ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm]
      [  168.792727]  [<ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm]
      [  168.792732]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
      [  168.792735]  [<ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40
      [  168.792740]  [<ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40
      [  168.792744]  [<ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0
      [  168.792747]  [<ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10
      [  168.792751]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792756]  [<ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0
      [  168.792759]  [<ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210
      [  168.792763]  [<ffffffff8174aef3>] ? __fget+0x273/0x4a0
      [  168.792766]  [<ffffffff8174acd0>] ? __fget+0x50/0x4a0
      [  168.792770]  [<ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0
      [  168.792773]  [<ffffffff81726bf9>] SyS_ioctl+0x79/0x90
      [  168.792777]  [<ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1
      [  168.792780] ================================================================================
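      The message does not say which index was mistyped. As a hedged illustration of how such a typo produces exactly this report: a walker that saves one entry per paging level in a u64[4] array must index it with level - 1, because the levels are numbered 1 to 4. The struct and helper below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PT_MAX_LEVELS 4  /* hypothetical: 4-level paging */

/* Hypothetical walker state: one saved entry per level, levels 1..4. */
struct walker_sketch {
	uint64_t ptes[PT_MAX_LEVELS];
};

/* Levels are 1-based but the array is 0-based. Indexing with `level`
 * alone reads ptes[4] at the top level, which UBSAN reports as
 * "index 4 is out of range for type 'u64 [4]'". */
static uint64_t pte_at_level(const struct walker_sketch *w, int level)
{
	assert(level >= 1 && level <= PT_MAX_LEVELS);
	return w->ptes[level - 1];
}
```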
      Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/entry/compat: Add missing CLAC to entry_INT80_32 · 3d44d51b
      Committed by Andy Lutomirski
      This doesn't seem to fix a regression -- I don't think the CLAC was
      ever there.
      
      I double-checked in a debugger: entries through the int80 gate do
      not automatically clear AC.
      
      Stable maintainers: I can provide a backport to 4.3 and earlier if
      needed.  This needs to be backported all the way to 3.10.
      Reported-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org> # v3.10 and later
      Fixes: 63bcff2a ("x86, smap: Add STAC and CLAC instructions to control user space access")
      Link: http://lkml.kernel.org/r/b02b7e71ae54074be01fc171cbd4b72517055c0e.1456345086.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 24 Feb 2016 (4 commits)
    • KVM: x86: fix conversion of addresses to linear in 32-bit protected mode · 0c1d77f4
      Committed by Paolo Bonzini
      Commit e8dd2d2d ("Silence compiler warning in arch/x86/kvm/emulate.c",
      2015-09-06) broke boot of the Hurd.  The bug is that the "default:"
      case actually could modify "la", but after the patch this change is
      not reflected in *linear.
      
      The bug is visible whenever a non-zero segment base causes the linear
      address to wrap around the 4GB mark.
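      Below is a minimal model of the address arithmetic involved. The names are hypothetical. In 32-bit protected mode the linear address is the segment base plus the effective address, truncated to 32 bits, and it is the truncated value that must reach *linear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: linear = (segment base + effective address) mod 2^32.
 * The wrapped value must be the one stored through `linear`; storing a
 * value computed before the adjustment reproduces the kind of bug the
 * commit describes. */
static void linearize32(uint32_t seg_base, uint32_t ea, uint64_t *linear)
{
	uint64_t la = (uint64_t)seg_base + ea;  /* may exceed 4 GiB */

	assert(linear);
	la &= 0xffffffffu;                      /* wrap at the 4 GiB mark */
	*linear = la;                           /* publish the adjusted value */
}
```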
      
      Fixes: e8dd2d2d
      Cc: stable@vger.kernel.org
      Reported-by: Aurelien Jarno <aurelien@aurel32.net>
      Tested-by: Aurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix missed hardware breakpoints · 172b2386
      Committed by Paolo Bonzini
      Sometimes when setting a breakpoint a process doesn't stop on it.
      This is because the debug registers are not loaded correctly on
      VCPU load.
      
      The following simple reproducer from Oleg Nesterov tries using debug
      registers in two threads.  To see the bug, run a 2-VCPU guest with
      "taskset -c 0" and run "./bp 0 1" inside the guest.
      
          #include <unistd.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <stdio.h>
          #include <sys/wait.h>
          #include <sys/ptrace.h>
          #include <sys/user.h>
          #include <asm/debugreg.h>
          #include <assert.h>
      
          #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
      
          unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
          {
              unsigned long dr7;
      
              dr7 = ((len | type) & 0xf)
                  << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
              if (enable)
                  dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
              return dr7;
          }
      
          int write_dr(int pid, int dr, unsigned long val)
          {
              return ptrace(PTRACE_POKEUSER, pid,
                      offsetof (struct user, u_debugreg[dr]),
                      val);
          }
      
          void set_bp(pid_t pid, void *addr)
          {
              unsigned long dr7;
              assert(write_dr(pid, 0, (long)addr) == 0);
              dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
              assert(write_dr(pid, 7, dr7) == 0);
          }
      
          void *get_rip(int pid)
          {
              return (void*)ptrace(PTRACE_PEEKUSER, pid,
                      offsetof(struct user, regs.rip), 0);
          }
      
          void test(int nr)
          {
              void *bp_addr = &&label + nr, *bp_hit;
              int pid;
      
              printf("test bp %d\n", nr);
              assert(nr < 16); // see 16 asm nops below
      
              pid = fork();
              if (!pid) {
                  assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                  kill(getpid(), SIGSTOP);
                  for (;;) {
                      label: asm (
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                      );
                  }
              }
      
              assert(pid == wait(NULL));
              set_bp(pid, bp_addr);
      
              for (;;) {
                  assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
                  assert(pid == wait(NULL));
      
                  bp_hit = get_rip(pid);
                  if (bp_hit != bp_addr)
                      fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                          bp_hit - &&label, nr);
              }
          }
      
          int main(int argc, const char *argv[])
          {
              while (--argc) {
                  int nr = atoi(*++argv);
                  if (!fork())
                      test(nr);
              }
      
              while (wait(NULL) > 0)
                  ;
              return 0;
          }
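      For reference, the DR7 values that the reproducer programs can be checked by hand. The constants below are copied from the x86 <asm/debugreg.h> UAPI header so that the sketch builds without it; verify them against your own headers. encode_dr7() performs the same computation as in the reproducer, with a range check added. An execute breakpoint of length 1 on DR0 yields 0x2, which is only the global-enable bit G0. A 4-byte write watchpoint on DR1 yields 0xd00008.

```c
#include <assert.h>

/* Values as defined in the x86 <asm/debugreg.h> UAPI header. */
#define DR_CONTROL_SHIFT 16
#define DR_CONTROL_SIZE  4
#define DR_ENABLE_SIZE   2
#define DR_GLOBAL_ENABLE 0x2
#define DR_RW_EXECUTE    0x0
#define DR_RW_WRITE      0x1
#define DR_LEN_1         0x0
#define DR_LEN_4         0xC

/* Same computation as encode_dr7() in the reproducer, plus a range check. */
static unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
{
	unsigned long dr7;

	assert(drnum >= 0 && drnum < 4);
	/* 4-bit RW/LEN field for this breakpoint, starting at bit 16 */
	dr7 = ((len | type) & 0xf) << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
	if (enable)  /* G<n> bit: bit 1 for DR0, bit 3 for DR1, ... */
		dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
	return dr7;
}
```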
      
      Cc: stable@vger.kernel.org
      Suggested-by: Nadav Amit <namit@cs.technion.ac.il>
      Reported-by: Andrey Wagin <avagin@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/entry/32: Add an ASM_CLAC to entry_SYSENTER_32 · 04d1d281
      Committed by Andy Lutomirski
      Both before and after 5f310f73 ("x86/entry/32: Re-implement
      SYSENTER using the new C path"), we relied on a uaccess very early
      in the SYSENTER path to clear AC.  After that change, though, we can
      potentially make it all the way into C code with AC set, which
      enlarges the attack surface for SMAP bypass by doing SYSENTER with
      AC set.
      
      Strengthen the SMAP protection by adding the missing ASM_CLAC right
      at the beginning.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3e36be110724896e32a4a1fe73bacb349d3cba94.1456262295.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86: fix SMAP in 32-bit environments · de9e478b
      Linus Torvalds committed
      In commit 11f1a4b9 ("x86: reorganize SMAP handling in user space
      accesses") I changed how the stac/clac instructions were generated
      around the user space accesses, which then made it possible to do
      batched accesses efficiently for user string copies etc.
      
      However, in doing so, I completely spaced out, and didn't even think
      about the 32-bit case.  And nobody really even seemed to notice, because
      SMAP doesn't even exist until modern Skylake processors, and you'd have
      to be crazy to run 32-bit kernels on a modern CPU.
      
      Which brings us to Andy Lutomirski.
      
      He actually tested the 32-bit kernel on new hardware, and noticed that
      it doesn't work.  My bad.  The trivial fix is to add the required
      uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.
      
      I feel a bit bad about this patch, just because that header file really
      should be cleaned up to avoid all the duplicated code in it, and this
      commit just expands on the problem.  But this just fixes the bug without
      any bigger cleanup surgery.
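      A minimal user-space model of the bug and the fix, assuming SMAP is
      enabled: a raw user access only succeeds between STAC and CLAC, which
      is what the uaccess begin/end markers provide. All toy_* names and the
      simulated "fault" return value are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

static bool smap_on = true;     /* models CR4.SMAP */
static bool eflags_ac = false;  /* models EFLAGS.AC */

/* Stand-ins for the STAC/CLAC instructions. */
static void toy_stac(void) { eflags_ac = true; }
static void toy_clac(void) { eflags_ac = false; }

/* A "raw" user access: returns -1 (simulated SMAP fault) when blocked. */
static int toy_raw_get_u32(uint32_t *dst, const uint32_t *user_src)
{
    if (smap_on && !eflags_ac)
        return -1;
    memcpy(dst, user_src, sizeof(*dst));
    return 0;
}

/* The 32-bit bug: the raw access is issued without begin/end markers. */
static int get_u32_unbracketed(uint32_t *dst, const uint32_t *user_src)
{
    return toy_raw_get_u32(dst, user_src);
}

/* The fix: bracket the raw access with begin (STAC) and end (CLAC). */
static int get_u32_bracketed(uint32_t *dst, const uint32_t *user_src)
{
    toy_stac();
    int ret = toy_raw_get_u32(dst, user_src);
    toy_clac();  /* AC is cleared again whether or not the access worked */
    return ret;
}
```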
      Reported-and-tested-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 23 Feb, 2016: 1 commit
  12. 19 Feb, 2016: 1 commit
  13. 18 Feb, 2016: 4 commits
    • x86/cpufeature: Create a new synthetic cpu capability for machine check recovery · 0f68c088
      Tony Luck committed
      The Intel Software Developer Manual describes bit 24 in the MCG_CAP
      MSR:
      
         MCG_SER_P (software error recovery support present) flag,
         bit 24 — Indicates (when set) that the processor supports
         software error recovery
      
      But only some models with this capability bit set will actually
      generate recoverable machine checks.
      
      Check the model name and set a synthetic capability bit. Provide
      a command line option to set this bit anyway in case the kernel
      doesn't recognise the model name.
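      A sketch of that decision in C. TOY_MCG_SER_P mirrors bit 24 from
      the SDM text above; the model-name prefix and the override flag are
      illustrative assumptions and not necessarily what the kernel checks.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Bit 24 of the MCG_CAP MSR: software error recovery support present. */
#define TOY_MCG_SER_P (1ULL << 24)

/* Hypothetical allow-listed model-name prefix (assumption). */
static const char RECOVERY_MODEL_PREFIX[] = "Intel(R) Xeon(R) CPU E7-";

/* Decide whether to set the synthetic "recoverable machine checks" bit. */
static bool want_recovery_cap(uint64_t mcg_cap, const char *model_name,
                              bool cmdline_override)
{
    if (cmdline_override)
        return true;   /* command line says: trust this CPU anyway */
    if (!(mcg_cap & TOY_MCG_SER_P))
        return false;  /* no software error recovery support at all */
    /* SER_P alone is not enough; only some models really recover. */
    return strncmp(model_name, RECOVERY_MODEL_PREFIX,
                   sizeof(RECOVERY_MODEL_PREFIX) - 1) == 0;
}
```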
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/2e5bfb23c89800a036fb8a45fa97a74bb16bc362.1455732970.git.tony.luck@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Fix vmalloc_fault() to handle large pages properly · f4eafd8b
      Toshi Kani committed
      A kernel page fault oops with the callstack below was observed
      when a read syscall was made to a pmem device after a huge amount
      (>512GB) of vmalloc ranges was allocated by ioremap() on an x86_64
      system:
      
           BUG: unable to handle kernel paging request at ffff880840000ff8
           IP: vmalloc_fault+0x1be/0x300
           PGD c7f03a067 PUD 0
           Oops: 0000 [#1] SM
           Call Trace:
              __do_page_fault+0x285/0x3e0
              do_page_fault+0x2f/0x80
              ? put_prev_entity+0x35/0x7a0
              page_fault+0x28/0x30
              ? memcpy_erms+0x6/0x10
              ? schedule+0x35/0x80
              ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
              ? schedule_timeout+0x183/0x240
              btt_log_read+0x63/0x140 [nd_btt]
               :
              ? __symbol_put+0x60/0x60
              ? kernel_read+0x50/0x80
              SyS_finit_module+0xb9/0xf0
              entry_SYSCALL_64_fastpath+0x1a/0xa4
      
      Since v4.1, ioremap() supports large page (pud/pmd) mappings in
      x86_64 and PAE.  vmalloc_fault() however assumes that the vmalloc
      range is limited to pte mappings.
      
      vmalloc faults do not normally happen in ioremap'd ranges since
      ioremap() sets up the kernel page tables, which are shared by
      user processes.  pgd_ctor() sets the kernel's PGD entries to
      user's during fork().  When allocation of the vmalloc ranges
      crosses a 512GB boundary, ioremap() allocates a new pud table
      and updates the kernel PGD entry to point to it.  If a user process's
      PGD entry does not have this update yet, a read/write syscall
      to the range will cause a vmalloc fault, which hits the Oops
      above as it does not handle a large page properly.
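      The sync step can be modelled with two toy top-level tables. Each
      x86_64 PGD entry covers 512GB (1 << 39 bytes), so a vmalloc range that
      crosses a 512GB boundary needs a new entry that an older per-process
      copy may still lack. This is a hypothetical sketch, not vmalloc_fault().

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

#define PTRS_PER_PGD 512
#define PGDIR_SHIFT  39  /* 4-level paging: one PGD entry maps 512GB */

/* Toy tables: the kernel's reference PGD and one process's copy. */
static uint64_t kernel_pgd[PTRS_PER_PGD];
static uint64_t user_pgd[PTRS_PER_PGD];

static unsigned pgd_index(uint64_t addr)
{
    return (unsigned)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* On a fault, copy a missing top-level entry from the reference PGD.
 * Returns false if the kernel has no entry either (a genuine fault). */
static bool sync_pgd_on_fault(uint64_t fault_addr)
{
    unsigned i = pgd_index(fault_addr);
    if (kernel_pgd[i] == 0)
        return false;
    if (user_pgd[i] == 0)
        user_pgd[i] = kernel_pgd[i];
    return true;
}
```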
      
      The following changes are made to vmalloc_fault().
      
      64-bit:
      
       - No change for the PGD sync operation as it handles large
         pages already.
       - Add pud_huge() and pmd_huge() to the validation code to
         handle large pages.
       - Change pud_page_vaddr() to pud_pfn() since an ioremap range
         is not directly mapped (while the if-statement still works
         with a bogus addr).
       - Change pmd_page() to pmd_pfn() since an ioremap range is not
         backed by struct page (while the if-statement still works
         with a bogus addr).
      
      32-bit:
       - No change for the sync operation since the index3 PGD entry
         covers the entire vmalloc range, which is always valid.
         (A separate change to sync PGD entry is necessary if this
          memory layout is changed regardless of the page size.)
       - Add pmd_huge() to the validation code to handle large pages.
         This is for completeness since vmalloc_fault() won't happen
         in ioremap'd ranges as its PGD entry is always valid.
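      A toy walk showing why the pud_huge()/pmd_huge() checks matter: once
      a PUD or PMD entry has the PSE bit (bit 7) set it maps memory
      directly, so the walk must stop there instead of treating its frame
      number as the next table. Entry values are passed in directly rather
      than read from memory; the enum and function names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

#define PTE_PRESENT (1ULL << 0)  /* x86 present bit */
#define PTE_PSE     (1ULL << 7)  /* page-size bit at PUD/PMD level */

enum walk_result { WALK_FAULT, WALK_1G, WALK_2M, WALK_4K };

/* levels[0] = PUD entry, levels[1] = PMD entry, levels[2] = PTE entry.
 * The PSE bit is not checked at PTE level, where bit 7 means PAT. */
static enum walk_result walk(const uint64_t levels[3])
{
    if (!(levels[0] & PTE_PRESENT))
        return WALK_FAULT;
    if (levels[0] & PTE_PSE)
        return WALK_1G;  /* huge PUD: levels[1] and [2] are meaningless */
    if (!(levels[1] & PTE_PRESENT))
        return WALK_FAULT;
    if (levels[1] & PTE_PSE)
        return WALK_2M;  /* huge PMD: stop before the PTE level */
    if (!(levels[2] & PTE_PRESENT))
        return WALK_FAULT;
    return WALK_4K;
}
```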
      Reported-by: Henning Schild <henning.schild@siemens.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # 4.1+
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed" · 67b4eab9
      Bjorn Helgaas committed
      Revert 811a4e6f ("PCI: Add helpers to manage pci_dev->irq and
      pci_dev->irq_managed").
      
      This is part of reverting 991de2e5 ("PCI, x86: Implement
      pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
      introduced.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
      Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Rafael J. Wysocki <rafael@kernel.org>
      CC: Jiang Liu <jiang.liu@linux.intel.com>
    • Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled" · fe25d078
      Bjorn Helgaas committed
      Revert 8affb487 ("x86/PCI: Don't alloc pcibios-irq when MSI is
      enabled").
      
      This is part of reverting 991de2e5 ("PCI, x86: Implement
      pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
      introduced.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
      Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Rafael J. Wysocki <rafael@kernel.org>
      CC: Jiang Liu <jiang.liu@linux.intel.com>
      CC: Joerg Roedel <jroedel@suse.de>
  14. 17 Feb, 2016: 7 commits
    • x86/entry/compat: Keep TS_COMPAT set during signal delivery · 4e79e182
      Andy Lutomirski committed
      Signal delivery needs to know the sign of an interrupted syscall's
      return value in order to detect -ERESTART variants.  Normally this
      works independently of bitness because syscalls internally return
      long.  Under ptrace, however, this can break, and syscall_get_error
      is supposed to sign-extend regs->ax if needed.
      
      We were clearing TS_COMPAT too early, though, and this prevented
      sign extension, which subtly broke syscall restart under ptrace.
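      A simplified model of the sign-extension issue, assuming an LP64
      host: a compat syscall leaves a 32-bit -ERESTARTSYS (-512) in the low
      half of the 64-bit regs->ax, and only the TS_COMPAT path reinterprets
      it as negative. The helper loosely follows the idea of
      syscall_get_error() but is a hypothetical stand-in.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

#define TOY_MAX_ERRNO   4095
#define TOY_ERESTARTSYS 512  /* kernel-internal restart code */

/* regs_ax is 64 bits wide; a compat syscall only produced the low 32. */
static long toy_syscall_get_error(uint64_t regs_ax, bool ts_compat)
{
    long error = (long)regs_ax;
    if (ts_compat)
        error = (long)(int32_t)(uint32_t)regs_ax;  /* sign-extend eax */
    /* Only -4095..-1 count as errors, as with IS_ERR_VALUE(). */
    return (error < 0 && error >= -TOY_MAX_ERRNO) ? error : 0;
}
```

      With ts_compat already cleared, the same register value reads as a
      large positive number, so no -ERESTART variant is detected.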
      Reported-by: Robert O'Callahan <robert@ocallahan.org>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org # 4.3.x-
      Fixes: c5c46f59 ("x86/entry: Add new, comprehensible entry and exit handlers written in C")
      Link: http://lkml.kernel.org/r/cbce3cf545522f64eb37f5478cb59746230db3b5.1455142412.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hpet: Drop stale URLs · 4e7f9df2
      Michael S. Tsirkin committed
      It looks like the HPET spec at intel.com has moved.
      It isn't hard to find, so drop the link and just mention
      the revision assumed.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Clemens Ladisch <clemens@ladisch.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1455145462-3877-1-git-send-email-mst@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() · a82eee74
      Toshi Kani committed
      Data corruption issues were observed in tests which initiated
      a system crash/reset while accessing BTT devices.  This problem
      is reproducible.
      
      The BTT driver calls pmem_rw_bytes() to update data in pmem
      devices.  This interface calls __copy_user_nocache(), which
      uses non-temporal stores so that the stores to pmem are
      persistent.
      
      __copy_user_nocache() uses non-temporal stores when a request
      size is 8 bytes or larger (and is aligned to 8 bytes).  The
      BTT driver updates the BTT map table, whose entry size is
      4 bytes.  Therefore, updates to the map table entries remain
      cached, and are not written to pmem after a crash.
      
      Change __copy_user_nocache() to use a non-temporal store when
      a request size is 4 bytes.  The change extends the current
      byte-copy path for a less-than-8-bytes request, and does not
      add any overhead to the regular path.
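      A simplified model of which bytes take which store path, before and
      after the fix. It ignores source alignment and the unrolled 64-byte
      loop of the real assembly, and does not claim to reproduce its exact
      branch structure; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Byte counts routed to each kind of store. */
struct store_plan {
    size_t nt8;     /* 8-byte non-temporal stores (MOVNTI on x86-64) */
    size_t nt4;     /* 4-byte non-temporal stores */
    size_t cached;  /* ordinary cached stores, may be lost on a crash */
};

static struct store_plan plan_nocache_copy(uintptr_t dst, size_t len,
                                           bool with_4byte_fix)
{
    struct store_plan p = { 0, 0, 0 };

    if (len >= 8) {
        /* Cached byte copies until dst is 8-byte aligned. */
        while (dst % 8 != 0) {
            p.cached++; dst++; len--;
        }
        /* Bulk of the copy: 8-byte non-temporal stores. */
        while (len >= 8) {
            p.nt8 += 8; dst += 8; len -= 8;
        }
    }
    /* The fix: a 4-byte aligned 4-byte chunk also bypasses the cache. */
    if (with_4byte_fix && len >= 4 && dst % 4 == 0) {
        p.nt4 += 4; dst += 4; len -= 4;
    }
    /* Whatever remains goes through the cached byte-copy path. */
    p.cached += len;
    return p;
}
```

      For a 4-byte aligned, 4-byte update such as a BTT map entry, the
      model counts all bytes as cached without the fix and as nt4 with it.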
      Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
      Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
      [ Small readability edits. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable · ee9737c9
      Toshi Kani committed
      Add comments to __copy_user_nocache() to clarify its procedures
      and alignment requirements.
      
      Also change numeric branch target labels to named local labels.
      
      No code changed:
      
       arch/x86/lib/copy_user_64.o:
      
          text    data     bss     dec     hex filename
          1239       0       0    1239     4d7 copy_user_64.o.before
          1239       0       0    1239     4d7 copy_user_64.o.after
      
       md5:
          58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.before.asm
          58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.after.asm
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: brian.boylston@hpe.com
      Cc: dan.j.williams@intel.com
      Cc: linux-nvdimm@lists.01.org
      Cc: micah.parrish@hpe.com
      Cc: ross.zwisler@linux.intel.com
      Cc: vishal.l.verma@intel.com
      Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
      [ Small readability edits and added object file comparison. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/msr: Document msr-index.h rule for addition · 053080a9
      Borislav Petkov committed
      In order to keep this file's size sensible and not cause too much
      unnecessary churn, make the rule explicit - similar to pci_ids.h - that
      only MSRs which are used in multiple compilation units should get added
      to it.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: alex.williamson@redhat.com
      Cc: gleb@kernel.org
      Cc: joro@8bytes.org
      Cc: kvm@vger.kernel.org
      Cc: sherry.hurwitz@amd.com
      Cc: wei@redhat.com
      Link: http://lkml.kernel.org/r/1455612202-14414-5-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/ftrace, x86/asm: Kill ftrace_caller_end label · f1b92bb6
      Borislav Petkov committed
      One of ftrace_caller_end and ftrace_return is redundant so unify them.
      Rename ftrace_return to ftrace_epilogue to mean that everything after
      that label represents, like an afterword, work which happens *after* the
      ftrace call, e.g., the function graph tracer for one.
      
      Steve wants this to rather mean "[a]n event which reflects meaningfully
      on a recently ended conflict or struggle." I can imagine that ftrace can
      be a struggle sometimes.
      
      Anyway, beef up the comment about the code contents and layout before
      ftrace_epilogue label.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1455612202-14414-4-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/amd/uncore: Plug reference leak · 8bc9162c
      Thomas Gleixner committed
      In the error path of amd_uncore_cpu_up_prepare() the newly allocated uncore
      struct is freed, but the percpu pointer still references it. Set it to NULL.
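      A toy version of that error path, with plain arrays standing in for
      the per-cpu pointers. The names and the fail_l2 failure-injection
      flag are hypothetical; the point is only the ordering in the fail
      label: clear the published pointer, then free the object.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <assert.h>

#define TOY_NR_CPUS 4

struct toy_uncore { int cpu; };

/* Stand-ins for the two per-cpu pointers. */
static struct toy_uncore *toy_uncore_nb[TOY_NR_CPUS];
static struct toy_uncore *toy_uncore_l2[TOY_NR_CPUS];

/* Allocation helper; fail forces a NULL return to simulate -ENOMEM. */
static struct toy_uncore *toy_alloc(int cpu, bool fail)
{
    struct toy_uncore *u = fail ? NULL : calloc(1, sizeof(*u));
    if (u)
        u->cpu = cpu;
    return u;
}

static int toy_cpu_up_prepare(int cpu, bool fail_l2)
{
    struct toy_uncore *nb, *l2;

    nb = toy_alloc(cpu, false);
    if (!nb)
        goto fail;
    toy_uncore_nb[cpu] = nb;  /* published before the second allocation */

    l2 = toy_alloc(cpu, fail_l2);
    if (!l2)
        goto fail;
    toy_uncore_l2[cpu] = l2;
    return 0;

fail:
    toy_uncore_nb[cpu] = NULL;  /* the fix: drop the stale reference */
    free(nb);                   /* free(NULL) is a no-op */
    return -1;
}
```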
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1602162302170.19512@nanos
      Signed-off-by: Ingo Molnar <mingo@kernel.org>