1. 17 Feb 2012 (1 commit)
    • i387: fix x86-64 preemption-unsafe user stack save/restore · 15d8791c
      Committed by Linus Torvalds
      Commit 5b1cbac3 ("i387: make irq_fpu_usable() tests more robust")
      added a sanity check to the #NM handler to verify that we never cause
      the "Device Not Available" exception in kernel mode.
      
      However, that check actually pinpointed a (fundamental) race where we do
      cause that exception as part of the signal stack FPU state save/restore
      code.
      
      Because we use the floating point instructions themselves to save and
      restore state directly from user mode, we cannot do that atomically with
      testing the TS_USEDFPU bit: the user mode access itself may cause a page
      fault, which causes a task switch, which saves and restores the FP/MMX
      state from the kernel buffers.
      
      This kind of "recursive" FP state save is fine per se, but it means that
      when the signal stack save/restore gets restarted, it will now take the
      '#NM' exception we originally tried to avoid.  With preemption this can
      happen even without the page fault - but because of the user access, we
      cannot just disable preemption around the save/restore instruction.
      
      There are various ways to solve this, including using the
      "enable/disable_page_fault()" helpers to not allow page faults at all
      during the sequence, and fall back to copying things by hand without the
      use of the native FP state save/restore instructions.
      
      However, the simplest thing to do is to just allow the #NM from kernel
      space, but fix the race in setting and clearing CR0.TS that this all
      exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
      atomic wrt scheduling, so while the actual state save/restore can be
      interrupted and restarted, the act of actually clearing/setting CR0.TS
      and the TS_USEDFPU bit together must not.
      
      Instead of just adding random "preempt_disable/enable()" calls to what
      is already excessively ugly code, this introduces some helper functions
      that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
      the user state instead.
      
      Those helper functions should probably eventually replace the other
      ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
      some more: the task switching functionality in particular needs to
      expose the difference between the 'prev' and 'next' threads, while the
      new helper functions intentionally were written to only work with
      'current'.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
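
      A minimal sketch of the helper pair described above, assuming the
      names user_fpu_begin()/user_fpu_end() and the era's primitives
      clts()/stts() and the TS_USEDFPU status flag; it illustrates the
      "atomic wrt scheduling" rule, it is not the verbatim patch:

          /* Flip CR0.TS and TS_USEDFPU together, under preempt_disable(),
           * so a task switch can never observe them out of sync.  The FP
           * save/restore instructions themselves run outside the helpers
           * and may page-fault and get restarted. */
          static inline void user_fpu_begin(void)
          {
                  preempt_disable();
                  if (!(current_thread_info()->status & TS_USEDFPU)) {
                          current_thread_info()->status |= TS_USEDFPU;
                          clts();         /* CR0.TS = 0: FP ops won't #NM */
                  }
                  preempt_enable();
          }

          static inline void user_fpu_end(void)
          {
                  preempt_disable();
                  current_thread_info()->status &= ~TS_USEDFPU;
                  stts();                 /* CR0.TS = 1: next FP op traps */
                  preempt_enable();
          }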
  2. 14 Feb 2012 (2 commits)
    • i387: make irq_fpu_usable() tests more robust · 5b1cbac3
      Committed by Linus Torvalds
      Some code - especially the crypto layer - wants to use the x86
      FP/MMX/AVX register set in what may be interrupt (typically softirq)
      context.
      
      That *can* be ok, but the tests for when it was ok were somewhat
      suspect.  We cannot touch the thread-specific status bits either, so
      we'd better check that we're not going to try to save FP state or
      anything like that.
      
      Now, it may be that the TS bit is always cleared *before* we set the
      USEDFPU bit (and only set when we had already cleared the USEDFPU bit
      before), so the TS bit test may actually have been sufficient, but it
      certainly was not obviously so.
      
      So this explicitly verifies that we will not touch the TS_USEDFPU bit,
      and adds a few related sanity-checks.  Because it seems that somehow
      AES-NI is corrupting user FP state.  The cause is not clear, and this
      patch doesn't fix it, but while debugging it I really wanted the code to
      be more obviously correct and robust.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
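
      A hedged sketch of the kind of predicate this describes (the exact
      checks in the patch may differ): FP use from interrupt context is
      only safe when the interrupted code could not have been mid-way
      through FP state handling, which the TS bit and TS_USEDFPU make
      explicit:

          static inline bool irq_fpu_usable(void)
          {
                  /* Not in interrupt context: the normal rules apply. */
                  if (!in_interrupt())
                          return true;
                  /* CR0.TS set means any FP use would have trapped, so
                   * nobody was using the FPU; requiring TS_USEDFPU to be
                   * clear as well makes the "won't touch it" guarantee
                   * explicit rather than implied. */
                  return (read_cr0() & X86_CR0_TS) &&
                         !(current_thread_info()->status & TS_USEDFPU);
          }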
    • i387: math_state_restore() isn't called from asm · be98c2cd
      Committed by Linus Torvalds
      It was marked asmlinkage for some really old and stale legacy reasons.
      Fix that and the equally stale comment.
      
      Noticed when debugging the irq_fpu_usable() bugs.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
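
      The shape of the change, for illustration (not the verbatim diff):

          /* Before: declared as if entered from assembly with a special
           * calling convention. */
          asmlinkage void math_state_restore(void);

          /* After: a plain C function, since every caller is C code. */
          void math_state_restore(void);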
  3. 22 Dec 2011 (2 commits)
    • x86: Add counter when debug stack is used with interrupts enabled · 42181186
      Committed by Steven Rostedt
      Mathieu Desnoyers pointed out a case that can cause issues with
      NMIs running on the debug stack:
      
        int3 -> interrupt -> NMI -> int3
      
      Because the interrupt changes the stack, the NMI will not see that
      it preempted the debug stack. Looking deeper at this case,
      interrupts only happen when the int3 is from userspace or in
      a location in the exception table (fixup).
      
        userspace -> int3 -> interrupt -> NMI -> int3
      
      All other int3s that happen in the kernel should be processed
      without ever enabling interrupts, as the do_trap() call will
      panic the kernel if it is called to process any other location
      within the kernel.
      
      Adding a counter around the sections that enable interrupts while
      using the debug stack allows the NMI to also check that case.
      If the NMI sees that it either interrupted a task using the debug
      stack or the debug counter is non-zero, then it will have to
      change the IDT table to make the int3 not change stacks (which will
      corrupt the stack if it does).
      
      Note, I had to move the debug_usage functions out of processor.h
      and into debugreg.h because of the static inline functions that
      inc and dec the debug_usage counter: __get_cpu_var() requires
      smp.h, which includes processor.h, so it would fail to build.
      
      Link: http://lkml.kernel.org/r/1323976535.23971.112.camel@gandalf.stny.rr.com
      Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul Turner <pjt@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
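
      A sketch of the counter, following the commit's own description
      (per-CPU integer, static inline inc/dec via __get_cpu_var(), living
      in debugreg.h); details may differ from the actual patch:

          DECLARE_PER_CPU(int, debug_stack_usage);

          /* Bracket any window where interrupts get enabled while still
           * on the debug stack; the NMI handler tests this counter. */
          static inline void debug_stack_usage_inc(void)
          {
                  __get_cpu_var(debug_stack_usage)++;
          }

          static inline void debug_stack_usage_dec(void)
          {
                  __get_cpu_var(debug_stack_usage)--;
          }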
    • x86: Keep current stack in NMI breakpoints · 228bdaa9
      Committed by Steven Rostedt
      We want to allow NMI handlers to have breakpoints to be able to
      remove stop_machine from ftrace, kprobes and jump_labels. But if
      an NMI interrupts a current breakpoint, and then it triggers a
      breakpoint itself, it will switch to the breakpoint stack and
      corrupt the data on it for the breakpoint processing that it
      interrupted.
      
      Instead, have the NMI check if it interrupted breakpoint processing
      by checking if the stack that is currently used is a breakpoint
      stack. If it is, then load a special IDT that changes the IST
      for the debug exception to keep the same stack in kernel context.
      When the NMI is done, it puts the original IDT back.
      
      This way, if the NMI does trigger a breakpoint, it will keep
      using the same stack and not stomp on the breakpoint data for
      the breakpoint it interrupted.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
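
      Illustrative control flow for the NMI side (the helper and
      descriptor names here are assumptions, not the verbatim patch):

          static void nmi_handler_sketch(struct pt_regs *regs)
          {
                  bool on_debug_stack = is_debug_stack(regs->sp);

                  if (on_debug_stack)
                          load_idt(&nmi_idt_descr); /* IDT copy: #DB/#BP use IST = 0 */

                  /* ... normal NMI processing; an int3 hit here now stays
                   * on the current stack instead of clobbering the IST
                   * breakpoint stack it interrupted ... */

                  if (on_debug_stack)
                          load_idt(&idt_descr);     /* put the normal IDT back */
          }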
  4. 06 Dec 2011 (1 commit)
  5. 10 Oct 2011 (1 commit)
  6. 11 Aug 2011 (1 commit)
  7. 27 Jul 2011 (1 commit)
  8. 07 Jun 2011 (1 commit)
    • x86-64: Emulate legacy vsyscalls · 5cec93c2
      Committed by Andy Lutomirski
      There's a fair amount of code in the vsyscall page.  It contains
      a syscall instruction (in the gettimeofday fallback) and who
      knows what will happen if an exploit jumps into the middle of
      some other code.
      
      Reduce the risk by replacing the vsyscalls with short magic
      incantations that cause the kernel to emulate the real
      vsyscalls. These incantations are useless if entered in the
      middle.
      
      This causes vsyscalls to be a little more expensive than real
      syscalls.  Fortunately sensible programs don't use them.
      The only exception is time() which is still called by glibc
      through the vsyscall - but calling time() millions of times
      per second is not sensible. glibc has this fixed in the
      development tree.
      
      This patch is not perfect: the vread_tsc and vread_hpet
      functions are still at a fixed address.  Fixing that might
      involve making alternative patching work in the vDSO.
      Signed-off-by: Andy Lutomirski <luto@mit.edu>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Juhl <jj@chaosbits.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
      Cc: Mikael Pettersson <mikpe@it.uu.se>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
      Cc: Valdis.Kletnieks@vt.edu
      Cc: pageexec@freemail.hu
      Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
      [ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
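
      A hedged sketch of the emulation dispatch (the emu_* helpers are
      hypothetical; the three vsyscall entries sit at fixed 1024-byte
      offsets in the vsyscall page, which is what makes a mid-stub jump
      detectable):

          /* Returns true iff regs->ip was a legitimate vsyscall entry. */
          static bool emulate_vsyscall_sketch(struct pt_regs *regs)
          {
                  switch (regs->ip & 0xfff) { /* offset in the vsyscall page */
                  case 0x000: regs->ax = emu_gettimeofday(regs); break;
                  case 0x400: regs->ax = emu_time(regs);         break;
                  case 0x800: regs->ax = emu_getcpu(regs);       break;
                  default:    return false;   /* entered mid-stub: refuse */
                  }
                  /* Emulate the RET the real vsyscall would have executed
                   * (real code must read the user stack with get_user()). */
                  regs->ip = *(unsigned long *)regs->sp;
                  regs->sp += 8;
                  return true;
          }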
  9. 07 Jan 2011 (4 commits)
  10. 05 Jan 2011 (1 commit)
  11. 10 Dec 2010 (1 commit)
  12. 18 Nov 2010 (2 commits)
    • x86, nmi_watchdog: Remove all stub function calls from old nmi_watchdog · 072b198a
      Committed by Don Zickus
      Now that the bulk of the old nmi_watchdog is gone, remove all
      the stub variables and hooks associated with it.
      
      This touches lots of files mainly because of how the io_apic
      nmi_watchdog was implemented.  Now that the io_apic nmi_watchdog
      is forever gone, remove all its fingers.
      
      Most of this code was not being exercised by virtue of
      nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything too
      risky here.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: fweisbec@gmail.com
      Cc: gorcunov@openvz.org
      LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, nmi_watchdog: Remove the old nmi_watchdog · 5f2b0ba4
      Committed by Don Zickus
      Now that we have a new nmi_watchdog that is more generic and
      sits on top of the perf subsystem, we really do not need the old
      nmi_watchdog any more.
      
      In addition, the old nmi_watchdog doesn't really work if you are
      using the default clocksource, hpet.  The old nmi_watchdog code
      relied on local apic interrupts to determine if the cpu is still
      alive.  With hpet as the clocksource, these interrupts don't
      increment any more and the old nmi_watchdog triggers false
      positives.
      
      This piece removes the old nmi_watchdog code and stubs out the
      variables and function calls.  The stubs are the same ones used
      by the new nmi_watchdog code, so it should be well tested.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: fweisbec@gmail.com
      Cc: gorcunov@openvz.org
      LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
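
      A hedged illustration of the stubbing pattern the commit describes
      (treat names and details here as assumptions):

          /* The old knobs survive as inert defaults so the rest of the
           * tree and the /proc interface keep compiling unchanged. */
          int nmi_watchdog = NMI_NONE;

          void touch_nmi_watchdog(void)
          {
                  /* Hard lockup detection is perf-driven now; only the
                   * softlockup detector still wants an explicit kick. */
                  touch_softlockup_watchdog();
          }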
  13. 24 Sep 2010 (1 commit)
    • x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers. · 6554287b
      Committed by Bart Oldeman
      Impact: fix a kernel bug such as:
      BUG: scheduling while atomic: dosemu.bin/19680/0x00000004
      See also Ubuntu bug 455067 at
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/455067
      
      Commits 4915a35e
      ("Use preempt_conditional_sti/cli in do_int3, like on x86_64.")
      and 3d2a71a5
      ("x86, traps: converge do_debug handlers")
      started disabling preemption in int1 and int3 handlers on i386.
      The problem with vm86 is that the call to handle_vm86_trap() may jump
      straight to entry_32.S and never return, so preemption is never enabled
      again, and there is an imbalance in the preempt count.
      
      Commit be716615 ("x86, vm86:
      fix preemption bug"), which was later (accidentally?) reverted by commit
      08d68323 ("hw-breakpoints: modifying
      generic debug exception to use thread-specific debug registers"),
      fixed the problem for debug exceptions but not for breakpoints.
      
      There are three solutions to this problem.
      
      1. Reenable preemption before calling handle_vm86_trap(). This
      was the approach that was later reverted.
      
      2. Do not disable preemption for i386 in breakpoint and debug handlers.
      This was the situation before October 2008. As far as I understand,
      preemption only needs to be disabled on x86_64 because a separate stack is
      used, but it's nice to have things work the same way on
      i386 and x86_64.
      
      3. Let handle_vm86_trap() return instead of jumping to assembly code.
      By setting a flag in _TIF_WORK_MASK, either TIF_IRET or TIF_NOTIFY_RESUME,
      the code in entry_32.S is instructed to return to 32 bit mode from
      V86 mode. The logic in entry_32.S was already present to handle signals.
      (I chose TIF_IRET because it's slightly more efficient in
      do_notify_resume() in signal.c, but in fact TIF_IRET can probably be
      replaced by TIF_NOTIFY_RESUME everywhere.)
      
      I'm submitting approach 3, because I believe it is the most elegant
      and prevents future confusion. Still, an obvious
      preempt_conditional_cli(regs); is necessary in traps.c to correct the
      bug.
      
      [ hpa: This is technically a regression, but because:
        1. the regression is so old,
        2. the patch seems relatively high risk, justifying more testing, and
        3. we're late in the 2.6.36-rc cycle,
      
        I'm queuing it up for the 2.6.37 merge window.  It might, however,
        justify a -stable backport at a later time, hence Cc: stable. ]
      Signed-off-by: Bart Oldeman <bartoldeman@users.sourceforge.net>
      LKML-Reference: <alpine.DEB.2.00.1009231312330.4732@localhost.localdomain>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: <stable@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
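
      A sketch of approach 3 (simplified; the actual patch touches both
      handle_vm86_trap() and traps.c):

          static void handle_vm86_trap_sketch(struct kernel_vm86_regs *regs,
                                              long error_code, int trapno)
          {
                  if (trapno == 1 || trapno == 3) {
                          /* Queue the return to 32-bit mode through the
                           * existing _TIF_WORK_MASK machinery instead of
                           * jumping into entry_32.S and never returning. */
                          set_thread_flag(TIF_IRET); /* or TIF_NOTIFY_RESUME */
                          return;  /* do_int3()/do_debug() can now run
                                    * preempt_conditional_cli(regs) */
                  }
                  /* ... other vm86 traps handled as before ... */
          }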
  14. 10 Sep 2010 (2 commits)
  15. 30 Jun 2010 (1 commit)
    • x86: Send a SIGTRAP for user icebp traps · a1e80faf
      Committed by Frederic Weisbecker
      Before we had a generic breakpoint layer, x86 used to send a
      sigtrap for any debug event that happened in userspace,
      except if it was caused by lazy dr7 switches.
      
      Currently we only send such a signal for single-step or breakpoint
      events.
      
      However, there are three other kinds of debug exceptions:
      
      - debug register access detected: trigger an exception if the
        next instruction touches the debug registers. We don't use
        it.
      - task switch, but we don't use the TSS.
      - icebp/int01 trap. This instruction (0xf1) is undocumented and
        generates an int 1 exception. Unlike single step through TF
        flag, it doesn't set the single step origin of the exception
        in dr6.
      
      icebp then used to be reported to userspace using trap signals,
      but this had been incidentally broken by the new breakpoint
      code. Re-enable it. Since this is the only debug event that
      doesn't set anything in dr6, this is all we have to check.
      
      This fixes a regression in Wine where World of Warcraft got broken,
      as it uses this instruction for software protection checks. Other
      apps probably do as well.
      Reported-and-tested-by: Alexandre Julliard <julliard@winehq.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: 2.6.33.x 2.6.34.x <stable@kernel.org>
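
      The check, sketched (condensed from what do_debug() needs; the
      DR_STEP/DR_TRAP_BITS masks and send_sigtrap() are the era's real
      names, the surrounding function is illustrative):

          static void do_debug_sketch(struct pt_regs *regs, long error_code)
          {
                  struct task_struct *tsk = current;
                  unsigned long dr6;
                  int user_icebp = 0;

                  get_debugreg(dr6, 6);
                  if (!dr6 && user_mode(regs))
                          user_icebp = 1; /* icebp: the only #DB with dr6 == 0 */

                  /* ... hw-breakpoint and single-step handling elided ... */

                  if ((dr6 & (DR_STEP | DR_TRAP_BITS)) || user_icebp)
                          send_sigtrap(tsk, regs, error_code, TRAP_BRKPT);
          }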
  16. 21 May 2010 (2 commits)
  17. 13 May 2010 (1 commit)
    • lockup_detector: Combine nmi_watchdog and softlockup detector · 58687acb
      Committed by Don Zickus
      The new nmi_watchdog (which uses the perf event subsystem) is very
      similar in structure to the softlockup detector.  Using Ingo's
      suggestion, I combined the two functionalities into one file:
      kernel/watchdog.c.
      
      Now both the nmi_watchdog (or hardlockup detector) and softlockup
      detector sit on top of the perf event subsystem, which is run every
      60 seconds or so to see if there are any lockups.
      
      To detect hardlockups (cpus not responding to interrupts), I
      implemented an hrtimer that runs 5 times for every perf event
      overflow event.  If that stops counting on a cpu, then the cpu is
      most likely in trouble.
      
      To detect softlockups (tasks not yielding to the scheduler), I used the
      previous kthread idea that now gets kicked every time the hrtimer fires.
      If the kthread isn't being scheduled, neither is anyone else, and the
      warning is printed to the console.
      
      I tested this on x86_64 and both the softlockup and hardlockup paths
      work.
      
      V2:
      - cleaned up the Kconfig and softlockup combination
      - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
      - separated out the softlockup case from the perf event subsystem
      - re-arranged the enabling/disabling nmi watchdog from proc space
      - added cpumasks for hardlockup failure cases
      - removed fallback to soft events if no PMU exists for hard events
      
      V3:
      - comment cleanups
      - drop support for older softlockup code
      - per_cpu cleanups
      - completely remove software clock base hardlockup detector
      - use per_cpu masking on hard/soft lockup detection
      - #ifdef cleanups
      - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
      - documentation additions
      
      V4:
      - documentation fixes
      - convert per_cpu to __get_cpu_var
      - powerpc compile fixes
      
      V5:
      - split apart warn flags for hard and soft lockups
      
      TODO:
      - figure out how to make an arch-agnostic clock2cycles call
        (if possible) to feed into perf events as a sample period
      
      [fweisbec: merged conflict patch]
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
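
      A structural sketch of the two checks (simplified, with assumed
      per-CPU and helper names following the description above):

          static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
          static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);

          /* Perf NMI callback: if the hrtimer tick count hasn't moved
           * since the last overflow event, interrupts are dead on this
           * cpu -> hard lockup. */
          static void watchdog_overflow_sketch(void)
          {
                  unsigned long hrint = __get_cpu_var(hrtimer_interrupts);

                  if (hrint == __get_cpu_var(hrtimer_interrupts_saved))
                          WARN(1, "hard LOCKUP on cpu %d", smp_processor_id());
                  __get_cpu_var(hrtimer_interrupts_saved) = hrint;
          }

          /* hrtimer callback: count the tick, kick the watchdog kthread,
           * and warn if the kthread's touch timestamp has gone stale ->
           * soft lockup.  watchdog_task, softlockup_stale() and
           * sample_period_ns are assumed names. */
          static enum hrtimer_restart watchdog_timer_sketch(struct hrtimer *t)
          {
                  __get_cpu_var(hrtimer_interrupts)++;
                  wake_up_process(__get_cpu_var(watchdog_task));

                  if (softlockup_stale())
                          WARN(1, "soft lockup on cpu %d", smp_processor_id());

                  hrtimer_forward_now(t, ns_to_ktime(sample_period_ns));
                  return HRTIMER_RESTART;
          }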
  18. 04 May 2010 (4 commits)
  19. 26 Mar 2010 (2 commits)
    • x86, ptrace: Fix block-step · ea8e61b7
      Committed by Peter Zijlstra
      Implement ptrace-block-step using TIF_BLOCKSTEP, which will set
      DEBUGCTLMSR_BTF when set for a task while preserving any other
      DEBUGCTLMSR bits.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100325135414.017536066@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
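
      A sketch of the toggle (TIF_BLOCKSTEP and DEBUGCTLMSR_BTF are the
      names from the patch; the helper itself is illustrative):

          static void set_task_blockstep_sketch(struct task_struct *task, bool on)
          {
                  unsigned long debugctl = get_debugctlmsr();

                  if (on) {
                          debugctl |= DEBUGCTLMSR_BTF;  /* trap on branches */
                          set_tsk_thread_flag(task, TIF_BLOCKSTEP);
                  } else {
                          debugctl &= ~DEBUGCTLMSR_BTF;
                          clear_tsk_thread_flag(task, TIF_BLOCKSTEP);
                  }
                  if (task == current)
                          update_debugctlmsr(debugctl); /* other bits kept */
          }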
    • x86, perf, bts, mm: Delete the never used BTS-ptrace code · faa4602e
      Committed by Peter Zijlstra
      Support for the PMU's BTS features has been upstreamed in
      v2.6.32, but we still have the old and disabled ptrace-BTS,
      as Linus noticed not so long ago.
      
      It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
      regard for other uses (perf) and doesn't provide the flexibility
      needed for perf either.
      
      Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
      was never used and ptrace-block-step can be implemented using a
      much simpler approach.
      
      So axe all 3000 lines of it. That includes the *locked_memory*()
      APIs in mm/mlock.c as well.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20100325135413.938004390@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 25 Feb 2010 (1 commit)
    • nmi_watchdog: Clean up various small details · 47195d57
      Committed by Don Zickus
      Mostly copy/paste whitespace damage, with a couple of nitpicks from
      the checkpatch script. Also fixes the struct definition as requested by Ingo.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: peterz@infradead.org
      Cc: gorcunov@gmail.com
      Cc: aris@redhat.com
      LKML-Reference: <1266880143-24943-1-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      --
       arch/x86/kernel/apic/hw_nmi.c |   14 +++++------
       arch/x86/kernel/traps.c       |    6 ++--
       include/linux/nmi.h           |    2 -
       kernel/nmi_watchdog.c         |   51 ++++++++++++++++++++----------------------
       4 files changed, 36 insertions(+), 37 deletions(-)
  21. 08 Feb 2010 (2 commits)
    • nmi_watchdog: Config option to enable new nmi_watchdog · 84e478c6
      Committed by Don Zickus
      These are the bits that enable the new nmi_watchdog and safely
      isolate the old nmi_watchdog.  Only one or the other can run,
      not both at the same time.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: gorcunov@gmail.com
      Cc: aris@redhat.com
      Cc: peterz@infradead.org
      LKML-Reference: <1265424425-31562-4-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Move notify_die from nmi.c to traps.c · e40b1720
      Committed by Don Zickus
      In order to handle a new nmi_watchdog approach, I need to move
      the notify_die() routine out of nmi_watchdog_tick() and into
      default_do_nmi(). This lets me easily swap out the old
      nmi_watchdog with the new one with just a config change.
      
      The change probably makes sense from a high level perspective
      because the nmi_watchdog shouldn't be handling notify_die
      routines anyway.  However, this move does change the semantics a
      little bit.  Instead of checking on every nmi interrupt whether the
      cpus are stuck, we now only check them on nmi_watchdog interrupts.
      
       v2: Move notify_die call into #ifdef block
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: gorcunov@gmail.com
      Cc: aris@redhat.com
      Cc: peterz@infradead.org
      LKML-Reference: <1265424425-31562-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
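
      The move, sketched (the #ifdef guard and reason handling are
      assumptions; notify_die() and DIE_NMI are the real interfaces):

          static void default_do_nmi_sketch(struct pt_regs *regs, long reason)
          {
          #ifdef CONFIG_X86_LOCAL_APIC  /* per the v2 note: in an #ifdef */
                  /* Previously buried inside nmi_watchdog_tick(). */
                  if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
                              == NOTIFY_STOP)
                          return;
          #endif
                  /* ... parity-error / unknown-NMI handling as before ... */
          }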
  22. 29 Jan 2010 (1 commit)
  23. 24 Sep 2009 (1 commit)
  24. 20 Sep 2009 (1 commit)
  25. 19 Sep 2009 (1 commit)
  26. 31 Aug 2009 (1 commit)
  27. 20 Jul 2009 (1 commit)