1. 06 3月, 2015 1 次提交
  2. 04 2月, 2015 1 次提交
  3. 03 9月, 2014 2 次提交
  4. 07 3月, 2014 1 次提交
    • S
      x86: Keep thread_info on thread stack in x86_32 · 198d208d
      Steven Rostedt 提交于
      x86_64 uses a per_cpu variable kernel_stack to always point to
      the thread stack of current. This is where the thread_info is stored
      and is accessed from this location even when the irq or exception stack
      is in use. This removes the complexity of having to maintain the
      thread info on the stack when interrupts are running and having to
      copy the preempt_count and other fields to the interrupt stack.
      
      x86_32 uses the old method of copying the thread_info from the thread
      stack to the exception stack just before executing the exception.
      
      Having the two different requires #ifdefs and also the x86_32 way
      is a bit of a pain to maintain. By converting x86_32 to the same
      method of x86_64, we can remove #ifdefs, clean up the x86_32 code
      a little, and remove the overhead of the copy.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
      Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      198d208d
  5. 07 1月, 2014 1 次提交
  6. 13 11月, 2013 1 次提交
  7. 25 9月, 2013 1 次提交
  8. 07 8月, 2013 1 次提交
  9. 26 6月, 2013 1 次提交
  10. 19 6月, 2013 1 次提交
  11. 01 5月, 2013 1 次提交
    • T
      dump_stack: unify debug information printed by show_regs() · a43cb95d
      Tejun Heo 提交于
      show_regs() is inherently arch-dependent but it does make sense to print
      generic debug information and some archs already do albeit in slightly
      different forms.  This patch introduces a generic function to print debug
      information from show_regs() so that different archs print out the same
      information and it's much easier to modify what's printed.
      
      show_regs_print_info() prints out the same debug info as dump_stack()
      does plus task and thread_info pointers.
      
      * Archs which didn't print debug info now do.
      
        alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
        metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
        um, xtensa
      
      * Already prints debug info.  Replaced with show_regs_print_info().
        The printed information is superset of what used to be there.
      
        arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86
      
      * s390 is special in that it used to print arch-specific information
        along with generic debug info.  Heiko and Martin think that the
        arch-specific extra isn't worth keeping s390 specfic implementation.
        Converted to use the generic version.
      
      Note that now all archs print the debug info before actual register
      dumps.
      
      An example BUG() dump follows.
      
       kernel BUG at /work/os/work/kernel/workqueue.c:4841!
       invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
       Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
       task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
       RIP: 0010:[<ffffffff8234a07e>]  [<ffffffff8234a07e>] init_workqueues+0x4/0x6
       RSP: 0000:ffff88007c861ec8  EFLAGS: 00010246
       RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
       RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
       RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Stack:
        ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
        0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
        ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
       Call Trace:
        [<ffffffff81000312>] do_one_initcall+0x122/0x170
        [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
        [<ffffffff81c47760>] ? rest_init+0x140/0x140
        [<ffffffff81c4776e>] kernel_init+0xe/0xf0
        [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
        [<ffffffff81c47760>] ? rest_init+0x140/0x140
        ...
      
      v2: Typo fix in x86-32.
      
      v3: CPU number dropped from show_regs_print_info() as
          dump_stack_print_info() has been updated to print it.  s390
          specific implementation dropped as requested by s390 maintainers.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>		[tile bits]
      Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a43cb95d
  12. 29 11月, 2012 2 次提交
  13. 01 10月, 2012 2 次提交
  14. 20 9月, 2012 1 次提交
    • A
      x86: get rid of TIF_IRET hackery · e76623d6
      Al Viro 提交于
      TIF_NOTIFY_RESUME will work in precisely the same way; all that
      is achieved by TIF_IRET is appearing that there's some work to be
      done, so we end up on the iret exit path.  Just use NOTIFY_RESUME.
      And for execve() do that in 32bit start_thread(), not sys_execve()
      itself.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e76623d6
  15. 19 9月, 2012 1 次提交
    • S
      x86, fpu: use non-lazy fpu restore for processors supporting xsave · 304bceda
      Suresh Siddha 提交于
      Fundamental model of the current Linux kernel is to lazily init and
      restore FPU instead of restoring the task state during context switch.
      This changes that fundamental lazy model to the non-lazy model for
      the processors supporting xsave feature.
      
      Reasons driving this model change are:
      
      i. Newer processors support optimized state save/restore using xsaveopt and
      xrstor by tracking the INIT state and MODIFIED state during context-switch.
      This is faster than modifying the cr0.TS bit which has serializing semantics.
      
      ii. Newer glibc versions use SSE for some of the optimized copy/clear routines.
      With certain workloads (like boot, kernel-compilation etc), application
      completes its work with in the first 5 task switches, thus taking upto 5 #DNA
      traps with the kernel not getting a chance to apply the above mentioned
      pre-load heuristic.
      
      iii. Some xstate features (like AMD's LWP feature) don't honor the cr0.TS bit
      and thus will not work correctly in the presence of lazy restore. Non-lazy
      state restore is needed for enabling such features.
      
      Some data on a two socket SNB system:
       * Saved 20K DNA exceptions during boot on a two socket SNB system.
       * Saved 50K DNA exceptions during kernel-compilation workload.
       * Improved throughput of the AVX based checksumming function inside the
         kernel by ~15% as xsave/xrstor is faster than the serializing clts/stts
         pair.
      
      Also now kernel_fpu_begin/end() relies on the patched
      alternative instructions. So move check_fpu() which uses the
      kernel_fpu_begin/end() after alternative_instructions().
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1345842782-24175-7-git-send-email-suresh.b.siddha@intel.com
      Merge 32-bit boot fix from,
      Link: http://lkml.kernel.org/r/1347300665-6209-4-git-send-email-suresh.b.siddha@intel.com
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      304bceda
  16. 17 5月, 2012 1 次提交
    • S
      fork: move the real prepare_to_copy() users to arch_dup_task_struct() · 55ccf3fe
      Suresh Siddha 提交于
      Historical prepare_to_copy() is mostly a no-op, duplicated for majority of
      the architectures and the rest following the x86 model of flushing the extended
      register state like fpu there.
      
      Remove it and use the arch_dup_task_struct() instead.
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.comAcked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      55ccf3fe
  17. 15 5月, 2012 1 次提交
  18. 29 3月, 2012 1 次提交
  19. 26 3月, 2012 1 次提交
  20. 01 3月, 2012 1 次提交
  21. 22 2月, 2012 1 次提交
    • L
      i387: Split up <asm/i387.h> into exported and internal interfaces · 1361b83a
      Linus Torvalds 提交于
      While various modules include <asm/i387.h> to get access to things we
      actually *intend* for them to use, most of that header file was really
      pretty low-level internal stuff that we really don't want to expose to
      others.
      
      So split the header file into two: the small exported interfaces remain
      in <asm/i387.h>, while the internal definitions that are only used by
      core architecture code are now in <asm/fpu-internal.h>.
      
      The guiding principle for this was to expose functions that we export to
      modules, and leave them in <asm/i387.h>, while stuff that is used by
      task switching or was marked GPL-only is in <asm/fpu-internal.h>.
      
      The fpu-internal.h file could be further split up too, especially since
      arch/x86/kvm/ uses some of the remaining stuff for its module.  But that
      kvm usage should probably be abstracted out a bit, and at least now the
      internal FPU accessor functions are much more contained.  Even if it
      isn't perhaps as contained as it _could_ be.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      1361b83a
  22. 21 2月, 2012 2 次提交
    • L
      i387: support lazy restore of FPU state · 7e16838d
      Linus Torvalds 提交于
      This makes us recognize when we try to restore FPU state that matches
      what we already have in the FPU on this CPU, and avoids the restore
      entirely if so.
      
      To do this, we add two new data fields:
      
       - a percpu 'fpu_owner_task' variable that gets written any time we
         update the "has_fpu" field, and thus acts as a kind of back-pointer
         to the task that owns the CPU.  The exception is when we save the FPU
         state as part of a context switch - if the save can keep the FPU
         state around, we leave the 'fpu_owner_task' variable pointing at the
         task whose FP state still remains on the CPU.
      
       - a per-thread 'last_cpu' field, that indicates which CPU that thread
         used its FPU on last.  We update this on every context switch
         (writing an invalid CPU number if the last context switch didn't
         leave the FPU in a lazily usable state), so we know that *that*
         thread has done nothing else with the FPU since.
      
      These two fields together can be used when next switching back to the
      task to see if the CPU still matches: if 'fpu_owner_task' matches the
      task we are switching to, we know that no other task (or kernel FPU
      usage) touched the FPU on this CPU in the meantime, and if the current
      CPU number matches the 'last_cpu' field, we know that this thread did no
      other FP work on any other CPU, so the FPU state on the CPU must match
      what was saved on last context switch.
      
      In that case, we can avoid the 'f[x]rstor' entirely, and just clear the
      CR0.TS bit.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e16838d
    • L
      i387: fix up some fpu_counter confusion · cea20ca3
      Linus Torvalds 提交于
      This makes sure we clear the FPU usage counter for newly created tasks,
      just so that we start off in a known state (for example, don't try to
      preload the FPU state on the first task switch etc).
      
      It also fixes a thinko in when we increment the fpu_counter at task
      switch time, introduced by commit 34ddc81a ("i387: re-introduce FPU
      state preloading at context switch time").  We should increment the
      *new* task fpu_counter, not the old task, and only if we decide to use
      that state (whether lazily or preloaded).
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cea20ca3
  23. 19 2月, 2012 1 次提交
    • L
      i387: re-introduce FPU state preloading at context switch time · 34ddc81a
      Linus Torvalds 提交于
      After all the FPU state cleanups and finally finding the problem that
      caused all our FPU save/restore problems, this re-introduces the
      preloading of FPU state that was removed in commit b3b0870e ("i387:
      do not preload FPU state at task switch time").
      
      However, instead of simply reverting the removal, this reimplements
      preloading with several fixes, most notably
      
       - properly abstracted as a true FPU state switch, rather than as
         open-coded save and restore with various hacks.
      
         In particular, implementing it as a proper FPU state switch allows us
         to optimize the CR0.TS flag accesses: there is no reason to set the
         TS bit only to then almost immediately clear it again.  CR0 accesses
         are quite slow and expensive, don't flip the bit back and forth for
         no good reason.
      
       - Make sure that the same model works for both x86-32 and x86-64, so
         that there are no gratuitous differences between the two due to the
         way they save and restore segment state differently due to
         architectural differences that really don't matter to the FPU state.
      
       - Avoid exposing the "preload" state to the context switch routines,
         and in particular allow the concept of lazy state restore: if nothing
         else has used the FPU in the meantime, and the process is still on
         the same CPU, we can avoid restoring state from memory entirely, just
         re-expose the state that is still in the FPU unit.
      
         That optimized lazy restore isn't actually implemented here, but the
         infrastructure is set up for it.  Of course, older CPU's that use
         'fnsave' to save the state cannot take advantage of this, since the
         state saving also trashes the state.
      
      In other words, there is now an actual _design_ to the FPU state saving,
      rather than just random historical baggage.  Hopefully it's easier to
      follow as a result.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      34ddc81a
  24. 17 2月, 2012 1 次提交
    • L
      i387: do not preload FPU state at task switch time · b3b0870e
      Linus Torvalds 提交于
      Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
      is spending several days looking for a bug in the state save/restore
      code.  And the preload code has some rather subtle interactions with
      both paravirtualization support and segment state restore, so it's not
      nearly as simple as it should be.
      
      Also, now that we no longer necessarily depend on a single bit (ie
      TS_USEDFPU) for keeping track of the state of the FPU, we migth be able
      to do better.  If we are really switching between two processes that
      keep touching the FP state, save/restore is inevitable, but in the case
      of having one process that does most of the FPU usage, we may actually
      be able to do much better than the preloading.
      
      In particular, we may be able to keep track of which CPU the process ran
      on last, and also per CPU keep track of which process' FP state that CPU
      has.  For modern CPU's that don't destroy the FPU contents on save time,
      that would allow us to do a lazy restore by just re-enabling the
      existing FPU state - with no restore cost at all!
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3b0870e
  25. 12 12月, 2011 3 次提交
    • F
      nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() · 1268fbc7
      Frederic Weisbecker 提交于
      Those two APIs were provided to optimize the calls of
      tick_nohz_idle_enter() and rcu_idle_enter() into a single
      irq disabled section. This way no interrupt happening in-between would
      needlessly process any RCU job.
      
      Now we are talking about an optimization for which benefits
      have yet to be measured. Let's start simple and completely decouple
      idle rcu and dyntick idle logics to simplify.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1268fbc7
    • F
      nohz: Allow rcu extended quiescent state handling seperately from tick stop · 2bbb6817
      Frederic Weisbecker 提交于
      It is assumed that rcu won't be used once we switch to tickless
      mode and until we restart the tick. However this is not always
      true, as in x86-64 where we dereference the idle notifiers after
      the tick is stopped.
      
      To prepare for fixing this, add two new APIs:
      tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
      
      If no use of RCU is made in the idle loop between
      tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
      must instead call the new *_norcu() version such that the arch doesn't
      need to call rcu_idle_enter() and rcu_idle_exit().
      
      Otherwise the arch must call tick_nohz_enter_idle() and
      tick_nohz_exit_idle() and also call explicitly:
      
      - rcu_idle_enter() after its last use of RCU before the CPU is put
      to sleep.
      - rcu_idle_exit() before the first use of RCU after the CPU is woken
      up.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      2bbb6817
    • F
      nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Frederic Weisbecker 提交于
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      places:
      
      - From the idle loop to start the dytick idle mode
      - From interrupt exit if we have interrupted the dyntick
      idle mode, so that we reprogram the next tick event in
      case the irq changed some internal state that requires this
      action.
      
      There are only few minor differences between both that
      are handled by that function, driven by the ts->inidle
      cpu variable and the inidle parameter. The whole guarantees
      that we only update the dyntick mode on irq exit if we actually
      interrupted the dyntick idle mode, and that we enter in RCU extended
      quiescent state from idle loop entry only.
      
      Split this function into:
      
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters into RCU
      extended quiescent state.
      
      - tick_nohz_irq_exit() which only updates the dynticks idle mode
      when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).
      
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      into tick_nohz_idle_exit().
      
      This simplifies the code and micro-optimize the irq exit path (no need
      for local_irq_save there). This also prepares for the split between
      dynticks and rcu extended quiescent state logics. We'll need this split to
      further fix illegal uses of RCU in extended quiescent states in the idle
      loop.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      280f0677
  26. 10 10月, 2011 1 次提交
    • D
      x86, nmi: Add in logic to handle multiple events and unknown NMIs · b227e233
      Don Zickus 提交于
      Previous patches allow the NMI subsystem to process multipe NMI events
      in one NMI.  As previously discussed this can cause issues when an event
      triggered another NMI but is processed in the current NMI.  This causes the
      next NMI to go unprocessed and become an 'unknown' NMI.
      
      To handle this, we first have to flag whether or not the NMI handler handled
      more than one event or not.  If it did, then there exists a chance that
      the next NMI might be already processed.  Once the NMI is flagged as a
      candidate to be swallowed, we next look for a back-to-back NMI condition.
      
      This is determined by looking at the %rip from pt_regs.  If it is the same
      as the previous NMI, it is assumed the cpu did not have a chance to jump
      back into a non-NMI context and execute code and instead handled another NMI.
      
      If both of those conditions are true then we will swallow any unknown NMI.
      
      There still exists a chance that we accidentally swallow a real unknown NMI,
      but for now things seem better.
      
      An optimization has also been added to the nmi notifier rountine.  Because x86
      can latch up to one NMI while currently processing an NMI, we don't have to
      worry about executing _all_ the handlers in a standalone NMI.  The idea is
      if multiple NMIs come in, the second NMI will represent them.  For those
      back-to-back NMI cases, we have the potentail to drop NMIs.  Therefore only
      execute all the handlers in the second half of a detected back-to-back NMI.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      b227e233
  27. 15 9月, 2011 1 次提交
  28. 04 8月, 2011 1 次提交
    • L
      cpuidle: stop depending on pm_idle · a0bfa137
      Len Brown 提交于
      cpuidle users should call cpuidle_call_idle() directly
      rather than via (pm_idle)() function pointer.
      
      Architecture may choose to continue using (pm_idle)(),
      but cpuidle need not depend on it:
      
        my_arch_cpu_idle()
      	...
      	if(cpuidle_call_idle())
      		pm_idle();
      
      cc: Kevin Hilman <khilman@deeprootsystems.com>
      cc: Paul Mundt <lethal@linux-sh.org>
      cc: x86@kernel.org
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      a0bfa137
  29. 10 6月, 2011 1 次提交
    • M
      exec: delay address limit change until point of no return · dac853ae
      Mathias Krause 提交于
      Unconditionally changing the address limit to USER_DS and not restoring
      it to its old value in the error path is wrong because it prevents us
      using kernel memory on repeated calls to this function.  This, in fact,
      breaks the fallback of hard coded paths to the init program from being
      ever successful if the first candidate fails to load.
      
      With this patch applied switching to USER_DS is delayed until the point
      of no return is reached which makes it possible to have a multi-arch
      rootfs with one arch specific init binary for each of the (hard coded)
      probed paths.
      
      Since the address limit is already set to USER_DS when start_thread()
      will be invoked, this redundancy can be safely removed.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dac853ae
  30. 13 1月, 2011 1 次提交
    • T
      cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle... · f77cfe4e
      Thomas Renninger 提交于
      cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
      
      Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
      events -> this patch fixes it and makes cpu_idle events throwing less complex.
      
      It also introduces cpu_idle events for all architectures which use
      the cpuidle subsystem, namely:
        - arch/arm/mach-at91/cpuidle.c
        - arch/arm/mach-davinci/cpuidle.c
        - arch/arm/mach-kirkwood/cpuidle.c
        - arch/arm/mach-omap2/cpuidle34xx.c
        - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
        - arch/x86/kernel/process.c (did throw events before, but was a mess)
        - drivers/idle/intel_idle.c (did throw events before)
      
      Convention should be:
      Fire cpu_idle events inside the current pm_idle function (not somewhere
      down the the callee tree) to keep things easy.
      
      Current possible pm_idle functions in X86:
      c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
      -> this is really easy is now.
      
      This affects userspace:
      The type field of the cpu_idle power event can now direclty get
      mapped to:
      /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
      instead of throwing very CPU/mwait specific values.
      This change is not visible for the intel_idle driver.
      For the acpi_idle driver it should only be visible if the vendor
      misses out C-states in his BIOS.
      Another (perf timechart) patch reads out cpuidle info of cpu_idle
      events from:
      /sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
      to the correct C-/cpuidle state again, even if e.g. vendors miss
      out C-states in their BIOS and for example only export C1 and C3.
      -> everything is fine.
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      CC: Robert Schoene <robert.schoene@tu-dresden.de>
      CC: Jean Pihet <j-pihet@ti.com>
      CC: Arjan van de Ven <arjan@linux.intel.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: linux-pm@lists.linux-foundation.org
      CC: linux-acpi@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      CC: linux-perf-users@vger.kernel.org
      CC: linux-omap@vger.kernel.org
      Signed-off-by: NLen Brown <len.brown@intel.com>
      f77cfe4e
  31. 04 1月, 2011 1 次提交
    • T
      perf: Clean up power events by introducing new, more generic ones · 25e41933
      Thomas Renninger 提交于
      Add these new power trace events:
      
       power:cpu_idle
       power:cpu_frequency
       power:machine_suspend
      
      The old C-state/idle accounting events:
        power:power_start
        power:power_end
      
      Have now a replacement (but we are still keeping the old
      tracepoints for compatibility):
      
        power:cpu_idle
      
      and
        power:power_frequency
      
      is replaced with:
        power:cpu_frequency
      
      power:machine_suspend is newly introduced.
      
      Jean Pihet has a patch integrated into the generic layer
      (kernel/power/suspend.c) which will make use of it.
      
      the type= field got removed from both, it was never
      used and the type is differed by the event type itself.
      
      perf timechart userspace tool gets adjusted in a separate patch.
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NJean Pihet <jean.pihet@newoldbits.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: rjw@sisk.pl
      LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
      25e41933
  32. 18 6月, 2010 1 次提交
  33. 11 5月, 2010 1 次提交
  34. 26 3月, 2010 1 次提交
    • P
      x86, perf, bts, mm: Delete the never used BTS-ptrace code · faa4602e
      Peter Zijlstra 提交于
      Support for the PMU's BTS features has been upstreamed in
      v2.6.32, but we still have the old and disabled ptrace-BTS,
      as Linus noticed it not so long ago.
      
      It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
      regard for other uses (perf) and doesn't provide the flexibility
      needed for perf either.
      
      Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
      was never used and ptrace-block-step can be implemented using a
      much simpler approach.
      
      So axe all 3000 lines of it. That includes the *locked_memory*()
      APIs in mm/mlock.c as well.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20100325135413.938004390@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      faa4602e