1. 07 10月, 2015 2 次提交
  2. 23 9月, 2015 2 次提交
  3. 21 9月, 2015 1 次提交
    • B
      x86/entry/vsyscall: Fix undefined symbol warning · 93f13a9f
      Borislav Petkov 提交于
      Commit:
      
        3dc33bd3 ("x86/entry/vsyscall: Add CONFIG to control default")
      
      did the ifdef/elif thing but GCC doesn't like that:
      
        arch/x86/entry/vsyscall/vsyscall_64.c:44:7: warning: "CONFIG_LEGACY_VSYSCALL_NONE" is not defined [-Wundef]
         #elif CONFIG_LEGACY_VSYSCALL_NONE
               ^
      
      Use defined() instead.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150921074829.GA3550@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      93f13a9f
  4. 20 9月, 2015 1 次提交
  5. 12 9月, 2015 1 次提交
    • M
      sys_membarrier(): system-wide memory barrier (generic, x86) · 5b25b13a
      Mathieu Desnoyers 提交于
      Here is an implementation of a new system call, sys_membarrier(), which
      executes a memory barrier on all threads running on the system.  It is
      implemented by calling synchronize_sched().  It can be used to
      distribute the cost of user-space memory barriers asymmetrically by
      transforming pairs of memory barriers into pairs consisting of
      sys_membarrier() and a compiler barrier.  For synchronization primitives
      that distinguish between read-side and write-side (e.g.  userspace RCU
      [1], rwlocks), the read-side can be accelerated significantly by moving
      the bulk of the memory barrier overhead to the write-side.
      
      The existing applications of which I am aware that would be improved by
      this system call are as follows:
      
      * Through Userspace RCU library (http://urcu.so)
        - DNS server (Knot DNS) https://www.knot-dns.cz/
        - Network sniffer (http://netsniff-ng.org/)
        - Distributed object storage (https://sheepdog.github.io/sheepdog/)
        - User-space tracing (http://lttng.org)
        - Network storage system (https://www.gluster.org/)
        - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf)
        - Financial software (https://lkml.org/lkml/2015/3/23/189)
      
      Those projects use RCU in userspace to increase read-side speed and
      scalability compared to locking.  Especially in the case of RCU used by
      libraries, sys_membarrier can speed up the read-side by moving the bulk of
      the memory barrier cost to synchronize_rcu().
      
      * Direct users of sys_membarrier
        - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)
      
      Microsoft core dotnet GC developers are planning to use the mprotect()
      side-effect of issuing memory barriers through IPIs as a way to implement
      Windows FlushProcessWriteBuffers() on Linux.  They are referring to
      sys_membarrier in their github thread, specifically stating that
      sys_membarrier() is what they are looking for.
      
      To explain the benefit of this scheme, let's introduce two example threads:
      
      Thread A (non-frequent, e.g. executing liburcu synchronize_rcu())
      Thread B (frequent, e.g. executing liburcu
      rcu_read_lock()/rcu_read_unlock())
      
      In a scheme where all smp_mb() in thread A are ordering memory accesses
      with respect to smp_mb() present in Thread B, we can change each
      smp_mb() within Thread A into calls to sys_membarrier() and each
      smp_mb() within Thread B into compiler barriers "barrier()".
      
      Before the change, we had, for each smp_mb() pairs:
      
      Thread A                    Thread B
      previous mem accesses       previous mem accesses
      smp_mb()                    smp_mb()
      following mem accesses      following mem accesses
      
      After the change, these pairs become:
      
      Thread A                    Thread B
      prev mem accesses           prev mem accesses
      sys_membarrier()            barrier()
      follow mem accesses         follow mem accesses
      
      As we can see, there are two possible scenarios: either Thread B memory
      accesses do not happen concurrently with Thread A accesses (1), or they
      do (2).
      
      1) Non-concurrent Thread A vs Thread B accesses:
      
      Thread A                    Thread B
      prev mem accesses
      sys_membarrier()
      follow mem accesses
                                  prev mem accesses
                                  barrier()
                                  follow mem accesses
      
      In this case, thread B accesses will be weakly ordered. This is OK,
      because at that point, thread A is not particularly interested in
      ordering them with respect to its own accesses.
      
      2) Concurrent Thread A vs Thread B accesses
      
      Thread A                    Thread B
      prev mem accesses           prev mem accesses
      sys_membarrier()            barrier()
      follow mem accesses         follow mem accesses
      
      In this case, thread B accesses, which are ensured to be in program
      order thanks to the compiler barrier, will be "upgraded" to full
      smp_mb() by synchronize_sched().
      
      * Benchmarks
      
      On Intel Xeon E5405 (8 cores)
      (one thread is calling sys_membarrier, the other 7 threads are busy
      looping)
      
      1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call.
      
      * User-space user of this system call: Userspace RCU library
      
      Both the signal-based and the sys_membarrier userspace RCU schemes
      permit us to remove the memory barrier from the userspace RCU
      rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
      accelerating them. These memory barriers are replaced by compiler
      barriers on the read-side, and all matching memory barriers on the
      write-side are turned into an invocation of a memory barrier on all
      active threads in the process. By letting the kernel perform this
      synchronization rather than dumbly sending a signal to every process
      threads (as we currently do), we diminish the number of unnecessary wake
      ups and only issue the memory barriers on active threads. Non-running
      threads do not need to execute such barrier anyway, because these are
      implied by the scheduler context switches.
      
      Results in liburcu:
      
      Operations in 10s, 6 readers, 2 writers:
      
      memory barriers in reader:    1701557485 reads, 2202847 writes
      signal-based scheme:          9830061167 reads,    6700 writes
      sys_membarrier:               9952759104 reads,     425 writes
      sys_membarrier (dyn. check):  7970328887 reads,     425 writes
      
      The dynamic sys_membarrier availability check adds some overhead to
      the read-side compared to the signal-based scheme, but besides that,
      sys_membarrier slightly outperforms the signal-based scheme. However,
      this non-expedited sys_membarrier implementation has a much slower grace
      period than signal and memory barrier schemes.
      
      Besides diminishing the number of wake-ups, one major advantage of the
      membarrier system call over the signal-based scheme is that it does not
      need to reserve a signal. This plays much more nicely with libraries,
      and with processes injected into for tracing purposes, for which we
      cannot expect that signals will be unused by the application.
      
      An expedited version of this system call can be added later on to speed
      up the grace period. Its implementation will likely depend on reading
      the cpu_curr()->mm without holding each CPU's rq lock.
      
      This patch adds the system call to x86 and to asm-generic.
      
      [1] http://urcu.so
      
      membarrier(2) man page:
      
      MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)
      
      NAME
             membarrier - issue memory barriers on a set of threads
      
      SYNOPSIS
             #include <linux/membarrier.h>
      
             int membarrier(int cmd, int flags);
      
      DESCRIPTION
             The cmd argument is one of the following:
      
             MEMBARRIER_CMD_QUERY
                    Query  the  set  of  supported commands. It returns a bitmask of
                    supported commands.
      
             MEMBARRIER_CMD_SHARED
                    Execute a memory barrier on all threads running on  the  system.
                    Upon  return from system call, the caller thread is ensured that
                    all running threads have passed through a state where all memory
                    accesses  to  user-space  addresses  match program order between
                    entry to and return from the system  call  (non-running  threads
                    are de facto in such a state). This covers threads from all pro=E2=80=90
                    cesses running on the system.  This command returns 0.
      
             The flags argument needs to be 0. For future extensions.
      
             All memory accesses performed  in  program  order  from  each  targeted
             thread is guaranteed to be ordered with respect to sys_membarrier(). If
             we use the semantic "barrier()" to represent a compiler barrier forcing
             memory  accesses  to  be performed in program order across the barrier,
             and smp_mb() to represent explicit memory barriers forcing full  memory
             ordering  across  the barrier, we have the following ordering table for
             each pair of barrier(), sys_membarrier() and smp_mb():
      
             The pair ordering is detailed as (O: ordered, X: not ordered):
      
                                    barrier()   smp_mb() sys_membarrier()
                    barrier()          X           X            O
                    smp_mb()           X           O            O
                    sys_membarrier()   O           O            O
      
      RETURN VALUE
             On success, these system calls return zero.  On error, -1 is  returned,
             and errno is set appropriately. For a given command, with flags
             argument set to 0, this system call is guaranteed to always return the
             same value until reboot.
      
      ERRORS
             ENOSYS System call is not implemented.
      
             EINVAL Invalid arguments.
      
      Linux                             2015-04-15                     MEMBARRIER(2)
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Nicholas Miell <nmiell@comcast.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b25b13a
  6. 11 9月, 2015 1 次提交
  7. 05 9月, 2015 1 次提交
  8. 14 8月, 2015 1 次提交
  9. 08 8月, 2015 1 次提交
    • A
      x86/vdso: Emit a GNU hash · 6b7e2654
      Andy Lutomirski 提交于
      Some dynamic loaders may be slightly faster if a GNU hash is
      available.  Strangely, this seems to have no effect at all on
      the vdso size.
      
      This is unlikely to have any measurable effect on the time it
      takes to resolve vdso symbols (since there are so few of them).
      In some contexts, it can be a win for a different reason: if
      every DSO has a GNU hash section, then libc can avoid
      calculating SysV hashes at all.  Both musl and glibc appear to
      have this optimization.
      
      It's plausible that this breaks some ancient glibc version.  If
      so, then, depending on what glibc versions break, we could
      either require COMPAT_VDSO for them or consider reverting.
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Isaac Dunham <ibid.ag@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nathan Lynch <nathan_lynch@mentor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: musl@lists.openwall.com <musl@lists.openwall.com>
      Link: http://lkml.kernel.org/r/fd56cc057a2d62ab31c56a48d04fccb435b3fd4f.1438897382.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6b7e2654
  10. 05 8月, 2015 3 次提交
  11. 31 7月, 2015 1 次提交
    • B
      x86/vm86: Use the normal pt_regs area for vm86 · 5ed92a8a
      Brian Gerst 提交于
      Change to use the normal pt_regs area to enter and exit vm86
      mode.  This is done by increasing the padding at the top of the
      stack to make room for the extra vm86 segment slots in the IRET
      frame.  It then saves the 32-bit regs in the off-stack vm86
      data, and copies in the vm86 regs.  Exiting back to 32-bit mode
      does the reverse.  This allows removing the hacks to jump
      directly into the exit asm code due to having to change the
      stack pointer.  Returning normally from the vm86 syscall and the
      exception handlers allows things like ptrace and auditing to work properly.
      Signed-off-by: NBrian Gerst <brgerst@gmail.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1438148483-11932-5-git-send-email-brgerst@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5ed92a8a
  12. 24 7月, 2015 1 次提交
  13. 21 7月, 2015 1 次提交
    • A
      x86/entry/syscalls: Wire up 32-bit direct socket calls · 9dea5dc9
      Andy Lutomirski 提交于
      On x86_64, there's no socketcall syscall; instead all of the
      socket calls are real syscalls.  For 32-bit programs, we're
      stuck offering the socketcall syscall, but it would be nice to
      expose the direct calls as well.  This will enable seccomp to
      filter socket calls (for new userspace only, but that's fine for
      some applications) and it will provide a tiny performance boost.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexander Larsson <alexl@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Cosimo Cecchi <cosimo@endlessm.com>
      Cc: Dan Nicholson <nicholson@endlessm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
      Cc: libc-alpha <libc-alpha@sourceware.org>
      Link: http://lkml.kernel.org/r/cb5138299d37d5800e2d135b01a7667fa6115854.1436912629.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9dea5dc9
  14. 17 7月, 2015 9 次提交
    • A
      x86/entry: Fix _TIF_USER_RETURN_NOTIFY check in prepare_exit_to_usermode · d132803e
      Andy Lutomirski 提交于
      Linus noticed that the early return check was missing
      _TIF_USER_RETURN_NOTIFY.  If the only work flag was
      _TIF_USER_RETURN_NOTIFY, we'd skip user return notifiers.  Fix
      it. (This is the only missing bit.)
      
      This fixes double faults on a KVM host.  It's the same issue as
      last time, except that this time it's very easy to trigger.
      Apparently no one uses -next as a KVM host.
      
      ( I'm still not quite sure what it is that KVM does that blows up
        so badly if we miss a user return notifier.  My best guess is that KVM
        lets KERNEL_GS_BASE (i.e. the user's gs base) be negative and fixes
        it up in a user return notifier.  If we actually end up in user mode
        with a negative gs base, we blow up pretty badly. )
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: c5c46f59 ("x86/entry: Add new, comprehensible entry and exit handlers written in C")
      Link: http://lkml.kernel.org/r/3f801104d24ee7a6bb1446408d9950777aa63277.1436995419.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d132803e
    • A
      x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code · a97439aa
      Andy Lutomirski 提交于
      It turns out to be rather tedious to test the NMI nesting code.
      Make it easier: add a new CONFIG_DEBUG_ENTRY option that causes
      the NMI handler to pre-emptively unmask NMIs.
      
      With this option set, errors in the repeat_nmi logic or failures
      to detect that we're in a nested NMI will result in quick panics
      under perf (especially if multiple counters are running at high
      frequency) instead of requiring an unusual workload that
      generates page faults or breakpoints inside NMIs.
      
      I called it CONFIG_DEBUG_ENTRY instead of CONFIG_DEBUG_NMI_ENTRY
      because I want to add new non-NMI checks elsewhere in the entry
      code in the future, and I'd rather not add too many new config
      options or add this option and then immediately rename it.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a97439aa
    • A
      x86/nmi/64: Make the "NMI executing" variable more consistent · 36f1a77b
      Andy Lutomirski 提交于
      Currently, "NMI executing" is one the first time an outermost
      NMI hits repeat_nmi and zero thereafter.  Change it to be zero
      each time for consistency.
      
      This is intended to help NMI handling fail harder if it's buggy.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      36f1a77b
    • A
      x86/nmi/64: Minor asm simplification · 23a781e9
      Andy Lutomirski 提交于
      Replace LEA; MOV with an equivalent SUB.  This saves one
      instruction.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      23a781e9
    • A
      x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection · 810bc075
      Andy Lutomirski 提交于
      We have a tricky bug in the nested NMI code: if we see RSP
      pointing to the NMI stack on NMI entry from kernel mode, we
      assume that we are executing a nested NMI.
      
      This isn't quite true.  A malicious userspace program can point
      RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
      happen while RSP is still pointing at the NMI stack.
      
      Fix it with a sneaky trick.  Set DF in the region of code that
      the RSP check is intended to detect.  IRET will clear DF
      atomically.
      
      ( Note: other than paravirt, there's little need for all this
        complexity. We could check RIP instead of RSP. )
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      810bc075
    • A
      x86/nmi/64: Reorder nested NMI checks · a27507ca
      Andy Lutomirski 提交于
      Check the repeat_nmi .. end_repeat_nmi special case first.  The
      next patch will rework the RSP check and, as a side effect, the
      RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
      we'll need this ordering of the checks.
      
      Note: this is more subtle than it appears.  The check for
      repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
      instead of adjusting the "iret" frame to force a repeat.  This
      is necessary, because the code between repeat_nmi and
      end_repeat_nmi sets "NMI executing" and then writes to the
      "iret" frame itself.  If a nested NMI comes in and modifies the
      "iret" frame while repeat_nmi is also modifying it, we'll end up
      with garbage.  The old code got this right, as does the new
      code, but the new code is a bit more explicit.
      
      If we were to move the check right after the "NMI executing"
      check, then we'd get it wrong and have random crashes.
      
      ( Because the "NMI executing" check would jump to the code that would
        modify the "iret" frame without checking if the interrupted NMI was
        currently modifying it. )
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a27507ca
    • A
      x86/nmi/64: Improve nested NMI comments · 0b22930e
      Andy Lutomirski 提交于
      I found the nested NMI documentation to be difficult to follow.
      Improve the comments.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0b22930e
    • A
      x86/nmi/64: Switch stacks on userspace NMI entry · 9b6e6a83
      Andy Lutomirski 提交于
      Returning to userspace is tricky: IRET can fail, and ESPFIX can
      rearrange the stack prior to IRET.
      
      The NMI nesting fixup relies on a precise stack layout and
      atomic IRET.  Rather than trying to teach the NMI nesting fixup
      to handle ESPFIX and failed IRET, punt: run NMIs that came from
      user mode on the normal kernel stack.
      
      This will make some nested NMIs visible to C code, but the C
      code is okay with that.
      
      As a side effect, this should speed up perf: it eliminates an
      RDMSR when NMIs come from user mode.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9b6e6a83
    • A
      x86/nmi/64: Remove asm code that saves CR2 · 0e181bb5
      Andy Lutomirski 提交于
      Now that do_nmi saves CR2, we don't need to save it in asm.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0e181bb5
  15. 09 7月, 2015 1 次提交
  16. 07 7月, 2015 10 次提交
    • A
      x86/entry: Remove SCHEDULE_USER and asm/context-tracking.h · 06a7b36c
      Andy Lutomirski 提交于
      SCHEDULE_USER is no longer used, and asm/context-tracking.h
      contained nothing else.  Remove the header entirely.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/854e9b45f69af20e26c47099eb236321563ebcee.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      06a7b36c
    • A
      x86/asm/entry/64: Migrate error and IRQ exit work to C and remove old assembly code · 02bc7768
      Andy Lutomirski 提交于
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/60e90901eee611e59e958bfdbbe39969b4f88fe5.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      02bc7768
    • A
      x86/asm/entry/64: Simplify IRQ stack pt_regs handling · a586f98e
      Andy Lutomirski 提交于
      There's no need for both RSI and RDI to point to the original stack.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/3a0481f809dd340c7d3f54ce3fd6d66ef2a578cd.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a586f98e
    • A
      x86/asm/entry/64: Save all regs on interrupt entry · ff467594
      Andy Lutomirski 提交于
      To prepare for the big rewrite of the error and interrupt exit
      paths, we will need pt_regs completely filled in.
      
      It's already completely filled in when error_exit runs, so rearrange
      interrupt handling to match it.  This will slow down interrupt
      handling very slightly (eight instructions), but the
      simplification it enables will be more than worth it.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/d8a766a7f558b30e6e01352854628a2d9943460c.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ff467594
    • A
      x86/entry/64: Migrate 64-bit and compat syscalls to the new exit handlers and... · 29ea1b25
      Andy Lutomirski 提交于
      x86/entry/64: Migrate 64-bit and compat syscalls to the new exit handlers and remove old assembly code
      
      These need to be migrated together, as the compat case used to
      jump into the middle of the 64-bit exit code.
      
      Remove the old assembly code.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/d4d1d70de08ac3640badf50048a9e8f18fe2497f.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      29ea1b25
    • A
      x86/entry/64: Really create an error-entry-from-usermode code path · cb6f64ed
      Andy Lutomirski 提交于
      In 539f5113 ("x86/asm/entry/64: Disentangle error_entry/exit
      gsbase/ebx/usermode code"), I arranged the code slightly wrong
      -- IRET faults would skip the code path that was intended to
      execute on all error entries from user mode.  Fix it up.
      
      While we're at it, make all the labels in error_entry local.
      
      This does not fix a bug, but we'll need it, and it slightly
      shrinks the code.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/91e17891e49fa3d61357eadc451529ad48143ee1.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb6f64ed
    • A
      x86/entry: Add new, comprehensible entry and exit handlers written in C · c5c46f59
      Andy Lutomirski 提交于
      The current x86 entry and exit code, written in a mixture of assembly and
      C code, is incomprehensible due to being open-coded in a lot of places
      without coherent documentation.
      
      It appears to work primary by luck and duct tape: i.e. obvious runtime
      failures were fixed on-demand, without re-thinking the design.
      
      Due to those reasons our confidence level in that code is low, and it is
      very difficult to incrementally improve.
      
      Add new code written in C, in preparation for simply deleting the old
      entry code.
      
      prepare_exit_to_usermode() is a new function that will handle all
      slow path exits to user mode.  It is called with IRQs disabled
      and it leaves us in a state in which it is safe to immediately
      return to user mode.  IRQs must not be re-enabled at any point
      after prepare_exit_to_usermode() returns and user mode is actually
      entered. (We can, of course, fail to enter user mode and treat
      that failure as a fresh entry to kernel mode.)
      
      All callers of do_notify_resume() will be migrated to call
      prepare_exit_to_usermode() instead; prepare_exit_to_usermode() needs
      to do everything that do_notify_resume() does today, but it also
      takes care of scheduling and context tracking.  Unlike
      do_notify_resume(), it does not need to be called in a loop.
      
      syscall_return_slowpath() is exactly what it sounds like: it will
      be called on any syscall exit slow path. It will replace
      syscall_trace_leave() and it calls prepare_exit_to_usermode() on the
      way out.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/c57c8b87661a4152801d7d3786eac2d1a2f209dd.1435952415.git.luto@kernel.org
      [ Improved the changelog a bit. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c5c46f59
    • A
      x86/entry: Add enter_from_user_mode() and use it in syscalls · feed36cd
      Andy Lutomirski 提交于
      Changing the x86 context tracking hooks is dangerous because
      there are no good checks that we track our context correctly.
      Add a helper to check that we're actually in CONTEXT_USER when
      we enter from user mode and wire it up for syscall entries.
      
      Subsequent patches will wire this up for all non-NMI entries as
      well.  NMIs are their own special beast and cannot currently
      switch overall context tracking state.  Instead, they have their
      own special RCU hooks.
      
      This is a tiny speedup if !CONFIG_CONTEXT_TRACKING (removes a
      branch) and a tiny slowdown if CONFIG_CONTEXT_TRACING (adds a
      layer of indirection).  Eventually, we should fix up the core
      context tracking code to supply a function that does what we
      want (and can be much simpler than user_exit), which will enable
      us to get rid of the extra call.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/853b42420066ec3fb856779cdc223a6dcb5d355b.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      feed36cd
    • A
      x86/entry: Move C entry and exit code to arch/x86/entry/common.c · 1f484aa6
      Andy Lutomirski 提交于
      The entry and exit C helpers were confusingly scattered between
      ptrace.c and signal.c, even though they aren't specific to
      ptrace or signal handling.  Move them together in a new file.
      
      This change just moves code around.  It doesn't change anything.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/324d686821266544d8572423cc281f961da445f4.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1f484aa6
    • A
      x86/entry/64/compat: Fix bad fast syscall arg failure path · 5e99cb7c
      Andy Lutomirski 提交于
      If user code does SYSCALL32 or SYSENTER without a valid stack,
      then our attempt to determine the syscall args will result in a
      failed uaccess fault.  Previously, we would try to recover by
      jumping to the syscall exit code, but we'd run the syscall exit
      work even though we never made it to the syscall entry work.
      
      Clean it up by treating the failure path as a non-syscall entry
      and exit pair.
      
      This fixes strace's output when running the syscall_arg_fault
      test. Without this fix, strace would get out of sync and would
      fail to associate syscall entries with syscall exits.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/903010762c07a3d67df914fea2da84b52b0f8f1d.1435952415.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5e99cb7c
  17. 06 7月, 2015 3 次提交