1. 24 May 2016, 1 commit
  2. 19 May 2016, 1 commit
      x86/entry/64: Fix stack return address retrieval in thunk · d4bf7078
      Committed by Josh Poimboeuf
      With CONFIG_FRAME_POINTER enabled, a thunk can pass a bad return address
      value to the called function.  '9*8(%rsp)' actually gets the frame
      pointer, not the return address.
      
      The only users of the 'put_ret_addr_in_rdi' option are two functions
      which trace the enabling and disabling of interrupts, so this bug can
      result in bad debug or tracing information with CONFIG_IRQSOFF_TRACER or
      CONFIG_PROVE_LOCKING.
      
      Fix this by implementing the suggestion of Linus: explicitly push
      the frame pointer all the time and constify the stack offsets that
      way. This is both correct and easier to read.
      Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      [ Extended the changelog a bit. ]
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 058fb732 ("x86/asm/entry: Create stack frames in thunk functions")
      Link: http://lkml.kernel.org/r/20160517180606.v5o7wcgdni7443ol@treble
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 12 May 2016, 1 commit
  4. 05 May 2016, 2 commits
  5. 03 May 2016, 1 commit
      x86/asm/entry/32: Simplify pushes of zeroed pt_regs->REGs · 778843f9
      Committed by Denys Vlasenko
      Use of a temporary R8 register here seems to be unnecessary.
      
      "push %r8" is a two-byte instruction (it needs a REX prefix to
      specify R8), and "push $0" is two bytes too, so using the latter
      is no worse.
      
      Thus the code carried an unnecessary "xorq %r8,%r8" instruction.
      It probably costs nothing in execution time here, since we are
      likely limited by store bandwidth at this point, but still.
      
      Run-tested under QEMU: 32-bit calls still work:
      
       / # ./test_syscall_vdso32
       [RUN]	Executing 6-argument 32-bit syscall via VDSO
       [OK]	Arguments are preserved across syscall
       [NOTE]	R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
       [OK]	R8..R15 did not leak kernel data
       [RUN]	Executing 6-argument 32-bit syscall via INT 80
       [OK]	Arguments are preserved across syscall
       [OK]	R8..R15 did not leak kernel data
       [RUN]	Running tests under ptrace
       [RUN]	Executing 6-argument 32-bit syscall via VDSO
       [OK]	Arguments are preserved across syscall
       [NOTE]	R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
       [OK]	R8..R15 did not leak kernel data
       [RUN]	Executing 6-argument 32-bit syscall via INT 80
       [OK]	Arguments are preserved across syscall
       [OK]	R8..R15 did not leak kernel data
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462201010-16846-1-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 29 Apr 2016, 1 commit
  7. 20 Apr 2016, 1 commit
  8. 19 Apr 2016, 1 commit
  9. 13 Apr 2016, 4 commits
  10. 23 Mar 2016, 1 commit
      kernel: add kcov code coverage · 5c9a8750
      Committed by Dmitry Vyukov
      kcov provides code coverage collection for coverage-guided fuzzing
      (randomized testing).  Coverage-guided fuzzing is a testing technique
      that uses coverage feedback to determine new interesting inputs to a
      system.  A notable user-space example is AFL
      (http://lcamtuf.coredump.cx/afl/).  However, this technique is not
      widely used for kernel testing due to missing compiler and kernel
      support.
      
      kcov does not aim to collect as much coverage as possible.  It aims to
      collect more or less stable coverage that is a function of syscall
      inputs.  To achieve this goal it does not collect coverage in soft/hard
      interrupts, and instrumentation of some inherently non-deterministic or
      uninteresting parts of the kernel (e.g. the scheduler, locking) is
      disabled.
      
      Currently there is a single coverage collection mode (tracing), but the
      API anticipates additional collection modes.  Initially I also
      implemented a second mode which exposes coverage in a fixed-size hash
      table of counters (what Quentin used in his original patch).  I've
      dropped the second mode for simplicity.
      
      This patch adds the necessary support on the kernel side.  The
      complementary compiler support was added in gcc revision 231296.
      
      We've used this support to build syzkaller system call fuzzer, which has
      found 90 kernel bugs in just 2 months:
      
        https://github.com/google/syzkaller/wiki/Found-Bugs
      
      We've also found 30+ bugs in our internal systems with syzkaller.
      Another (yet unexplored) direction where kcov coverage would greatly
      help is more traditional "blob mutation".  For example, mounting a
      random blob as a filesystem, or receiving a random blob over wire.
      
      Why not gcov?  A typical fuzzing loop looks as follows: (1) reset
      coverage, (2) execute a bit of code, (3) collect coverage, repeat.  The
      coverage of one iteration can be just a dozen basic blocks (e.g. an
      invalid input).  In that context gcov becomes prohibitively expensive,
      as the reset/collect steps depend on the total number of basic
      blocks/edges in the program (about 2M in the kernel's case).  The cost
      of kcov depends only on the number of executed basic blocks/edges.  On
      top of that, the kernel requires per-thread coverage, because there are
      always background threads and unrelated processes that also produce
      coverage.  With inlined gcov instrumentation, per-thread coverage is
      not possible.
      
      kcov exposes kernel PCs and control flow to user-space which is
      insecure.  But debugfs should not be mapped as user accessible.
      
      Based on a patch by Quentin Casasnovas.
      
      [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
      [akpm@linux-foundation.org: unbreak allmodconfig]
      [akpm@linux-foundation.org: follow x86 Makefile layout standards]
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Drysdale <drysdale@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 10 Mar 2016, 9 commits
      x86/entry: Call enter_from_user_mode() with IRQs off · 9999c8c0
      Committed by Andy Lutomirski
      Now that slow-path syscalls always enter C before enabling
      interrupts, it's straightforward to call enter_from_user_mode() before
      enabling interrupts rather than doing it as part of entry tracing.
      
      With this change, we should finally be able to retire exception_enter().
      
      This will also enable optimizations based on knowing that we never
      change context tracking state with interrupts on.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/bc376ecf87921a495e874ff98139b1ca2f5c5dd7.1457558566.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/entry/32: Change INT80 to be an interrupt gate · a798f091
      Committed by Andy Lutomirski
      We want all of the syscall entries to run with interrupts off so that
      we can efficiently run context tracking before enabling interrupts.
      
      This will regress int $0x80 performance on 32-bit kernels by a
      couple of cycles.  This shouldn't matter much -- int $0x80 is not a
      fast path.
      
      This effectively reverts:
      
        657c1eea ("x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on")
      
      ... and fixes the same issue differently.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/59b4f90c9ebfccd8c937305dbbbca680bc74b905.1457558566.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/entry: Improve system call entry comments · fda57b22
      Committed by Andy Lutomirski
      Ingo suggested that the comments should explain when the various
      entries are used.  This adds these explanations and improves other
      parts of the comments.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/9524ecef7a295347294300045d08354d6a57c6e7.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/entry: Remove TIF_SINGLESTEP entry work · 392a6254
      Committed by Andy Lutomirski
      Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into
      regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall
      entry code.  Remove it.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/2d15f24da52dafc9d2f0b8d76f55544f4779c517.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup · 7536656f
      Committed by Andy Lutomirski
      Right after SYSENTER, we can get a #DB or NMI.  On x86_32, there's no IST,
      so the exception handler is invoked on the temporary SYSENTER stack.
      
      Because the SYSENTER stack is very small, we have a fixup to switch
      off the stack quickly when this happens.  The old fixup had several issues:
      
       1. It checked the interrupt frame's CS and EIP.  This wasn't
          obviously correct on Xen or if vm86 mode was in use [1].
      
       2. In the NMI handler, it did some frightening digging into the
          stack frame.  I'm not convinced this digging was correct.
      
       3. The fixup didn't switch stacks and then switch back.  Instead, it
          synthesized a brand new stack frame that would redirect the IRET
          back to the SYSENTER code.  That frame was highly questionable.
          For one thing, if NMI nested inside #DB, we would effectively
          abort the #DB prologue, which was probably safe but was
          frightening.  For another, the code used PUSHFL to write the
          FLAGS portion of the frame, which was simply bogus -- by the time
          PUSHFL was called, at least TF, NT, VM, and all of the arithmetic
          flags were clobbered.
      
      Simplify this considerably.  Instead of looking at the saved frame
      to see where we came from, check the hardware ESP register against
      the SYSENTER stack directly.  Malicious user code cannot spoof the
      kernel ESP register, and by moving the check after SAVE_ALL, we can
      use normal PER_CPU accesses to find all the relevant addresses.
      
      With this patch applied, the improved syscall_nt_32 test finally
      passes on 32-bit kernels.
      
      [1] It isn't obviously correct, but it is nonetheless safe from vm86
          shenanigans as far as I can tell.  A user can't point EIP at
          entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32,
          like all kernel addresses, is greater than 0xffff and would thus
          violate the CS segment limit.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/b2cdbc037031c07ecf2c40a96069318aec0e7971.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/entry: Vastly simplify SYSENTER TF (single-step) handling · f2b37575
      Committed by Andy Lutomirski
      Due to a blatant design error, SYSENTER doesn't clear TF (single-step).
      
      As a result, if a user does SYSENTER with TF set, we will single-step
      through the kernel until something clears TF.  There is absolutely
      nothing we can do to prevent this short of turning off SYSENTER [1].
      
      Simplify the handling considerably with two changes:
      
        1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
           add TF to that list of flags to sanitize with no overhead whatsoever.
      
        2. Teach do_debug() to ignore single-step traps in the SYSENTER prologue.
      
      That's all we need to do.
      
      Don't get too excited -- our handling is still buggy on 32-bit
      kernels.  There's nothing wrong with the SYSENTER code itself, but
      the #DB prologue has a clever fixup for traps on the very first
      instruction of entry_SYSENTER_32, and the fixup doesn't work quite
      correctly.  The next two patches will fix that.
      
      [1] We could probably prevent it by forcing BTF on at all times and
          making sure we clear TF before any branches in the SYSENTER
          code.  Needless to say, this is a bad idea.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/a30d2ea06fe4b621fe6a9ef911b02c0f38feb6f2.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/entry/32: Restore FLAGS on SYSEXIT · c2c9b52f
      Committed by Andy Lutomirski
      We weren't restoring FLAGS at all on SYSEXIT.  Apparently no one cared.
      
      With this patch applied, native kernels should always honor
      task_pt_regs()->flags, which opens the door for some sys_iopl()
      cleanups.  I'll do those as a separate series, though, since getting
      it right will involve tweaking some paravirt ops.
      
      ( The short version is that, before this patch, sys_iopl(), invoked via
        SYSENTER, wasn't guaranteed to ever transfer the updated
        regs->flags, so sys_iopl() had to change the hardware flags register
        as well. )
      Reported-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3f98b207472dc9784838eb5ca2b89dcc845ce269.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/entry/32: Filter NT and speed up AC filtering in SYSENTER · 67f590e8
      Committed by Andy Lutomirski
      This makes the 32-bit code work just like the 64-bit code.  It should
      speed up syscalls on 32-bit kernels on Skylake by something like 20
      cycles (by analogy to the 64-bit compat case).
      
      It also cleans up NT just like we do for the 64-bit case.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/07daef3d44bd1ed62a2c866e143e8df64edb40ee.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test · e7860411
      Committed by Andy Lutomirski
      CLAC is slow, and the SYSENTER code already has an unlikely path
      that runs if unusual flags are set.  Drop the CLAC and instead rely
      on the unlikely path to clear AC.
      
      This seems to save ~24 cycles on my Skylake laptop.  (Hey, Intel,
      make this faster please!)
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/90d6db2189f9add83bc7bddd75a0c19ebbd676b2.1457578375.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 08 Mar 2016, 1 commit
      x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled · 58a5aac5
      Committed by Andy Lutomirski
      x86_64 has very clean espfix handling on paravirt: espfix64 is set
      up in native_iret, so paravirt systems that override iret bypass
      espfix64 automatically.  This is robust and straightforward.
      
      x86_32 is messier.  espfix is set up before the IRET paravirt patch
      point, so it can't be directly conditionalized on whether we use
      native_iret.  We also can't easily move it into native_iret without
      regressing performance due to a bizarre consideration.  Specifically,
      on 64-bit kernels, the logic is:
      
        if (regs->ss & 0x4)
                setup_espfix;
      
      On 32-bit kernels, the logic is:
      
        if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 &&
            (regs->flags & X86_EFLAGS_VM) == 0)
                setup_espfix;
      
      The performance of setup_espfix itself is essentially irrelevant, but
      the comparison happens on every IRET so its performance matters.  On
      x86_64, there's no need for any registers except flags to implement
      the comparison, so we fold the whole thing into native_iret.  On
      x86_32, we don't do that because we need a free register to
      implement the comparison efficiently.  We therefore do espfix setup
      before restoring registers on x86_32.
      
      This patch gets rid of the explicit paravirt_enabled check by
      introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE
      to skip espfix on paravirt systems where iret != native_iret.  This is
      also messy, but it's at least in line with other things we do.
      
      This improves espfix performance by removing a branch, but no one
      cares.  More importantly, it removes a paravirt_enabled user, which is
      good because paravirt_enabled is ill-defined and is going away.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: konrad.wilk@oracle.com
      Cc: lguest@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 05 Mar 2016, 1 commit
  14. 29 Feb 2016, 1 commit
      objtool: Mark non-standard object files and directories · c0dd6716
      Committed by Josh Poimboeuf
      Code which runs outside the kernel's normal mode of operation often does
      unusual things which can cause a static analysis tool like objtool to
      emit false positive warnings:
      
       - boot image
       - vdso image
       - relocation
       - realmode
       - efi
       - head
       - purgatory
       - modpost
      
      Set OBJECT_FILES_NON_STANDARD for their related files and directories,
      which will tell objtool to skip checking them.  It's ok to skip them
      because they don't affect runtime stack traces.
      
      Also skip the following code which does the right thing with respect to
      frame pointers, but is too "special" to be validated by a tool:
      
       - entry
       - mcount
      
      Also skip the test_nx module because it modifies its exception handling
      table at runtime, which objtool can't understand.  Fortunately it's
      just a test module so it doesn't matter much.
      
      Currently objtool is the only user of OBJECT_FILES_NON_STANDARD, but it
      might eventually be useful for other tools.
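In kbuild Makefiles the flag can be set per directory or per object file; a sketch of both forms (the object-file name below is illustrative):

```make
# Skip objtool checking for every object built in this directory:
OBJECT_FILES_NON_STANDARD := y

# ...or only for a single object file:
OBJECT_FILES_NON_STANDARD_vsyscall_emu_64.o := y
```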
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/366c080e3844e8a5b6a0327dc7e8c2b90ca3baeb.1456719558.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. 25 Feb 2016, 1 commit
  16. 24 Feb 2016, 3 commits
  17. 22 Feb 2016, 1 commit
      x86/vdso: Mark the vDSO code read-only after init · 018ef8dc
      Committed by Kees Cook
      The vDSO does not need to be writable after __init, so mark it as
      __ro_after_init.  This kills the exploit method of writing to the vDSO
      from kernel space to make userspace execute the modified code, as shown
      here bypassing SMEP restrictions: http://itszn.com/blog/?p=21
      
      The memory map (with added vDSO address reporting) shows the vDSO moving
      into read-only memory:
      
      Before:
      	[    0.143067] vDSO @ ffffffff82004000
      	[    0.143551] vDSO @ ffffffff82006000
      	---[ High Kernel Mapping ]---
      	0xffffffff80000000-0xffffffff81000000      16M                         pmd
      	0xffffffff81000000-0xffffffff81800000       8M   ro     PSE     GLB x  pmd
      	0xffffffff81800000-0xffffffff819f3000    1996K   ro             GLB x  pte
      	0xffffffff819f3000-0xffffffff81a00000      52K   ro                 NX pte
      	0xffffffff81a00000-0xffffffff81e00000       4M   ro     PSE     GLB NX pmd
      	0xffffffff81e00000-0xffffffff81e05000      20K   ro             GLB NX pte
      	0xffffffff81e05000-0xffffffff82000000    2028K   ro                 NX pte
      	0xffffffff82000000-0xffffffff8214f000    1340K   RW             GLB NX pte
      	0xffffffff8214f000-0xffffffff82281000    1224K   RW                 NX pte
      	0xffffffff82281000-0xffffffff82400000    1532K   RW             GLB NX pte
      	0xffffffff82400000-0xffffffff83200000      14M   RW     PSE     GLB NX pmd
      	0xffffffff83200000-0xffffffffc0000000     974M                         pmd
      
      After:
      	[    0.145062] vDSO @ ffffffff81da1000
      	[    0.146057] vDSO @ ffffffff81da4000
      	---[ High Kernel Mapping ]---
      	0xffffffff80000000-0xffffffff81000000      16M                         pmd
      	0xffffffff81000000-0xffffffff81800000       8M   ro     PSE     GLB x  pmd
      	0xffffffff81800000-0xffffffff819f3000    1996K   ro             GLB x  pte
      	0xffffffff819f3000-0xffffffff81a00000      52K   ro                 NX pte
      	0xffffffff81a00000-0xffffffff81e00000       4M   ro     PSE     GLB NX pmd
      	0xffffffff81e00000-0xffffffff81e0b000      44K   ro             GLB NX pte
      	0xffffffff81e0b000-0xffffffff82000000    2004K   ro                 NX pte
      	0xffffffff82000000-0xffffffff8214c000    1328K   RW             GLB NX pte
      	0xffffffff8214c000-0xffffffff8227e000    1224K   RW                 NX pte
      	0xffffffff8227e000-0xffffffff82400000    1544K   RW             GLB NX pte
      	0xffffffff82400000-0xffffffff83200000      14M   RW     PSE     GLB NX pmd
      	0xffffffff83200000-0xffffffffc0000000     974M                         pmd
      
      Based on work by PaX Team and Brad Spengler.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Brown <david.brown@linaro.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1455748879-21872-7-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  18. 17 Feb 2016, 1 commit
  19. 01 Feb 2016, 3 commits
  20. 30 Jan 2016, 2 commits
  21. 29 Jan 2016, 3 commits