1. 21 Feb 2012 (1 commit)
    • i387: fix up some fpu_counter confusion · cea20ca3
      Committed by Linus Torvalds
      This makes sure we clear the FPU usage counter for newly created tasks,
      just so that we start off in a known state (for example, don't try to
      preload the FPU state on the first task switch etc).
      
      It also fixes a thinko in when we increment the fpu_counter at task
      switch time, introduced by commit 34ddc81a ("i387: re-introduce FPU
      state preloading at context switch time").  We should increment the
      *new* task fpu_counter, not the old task, and only if we decide to use
      that state (whether lazily or preloaded).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
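      As an aside for readers, a minimal standalone sketch of the corrected
      rule (stub struct and the hypothetical helper switch_fpu(); the real
      code lives in the kernel's FPU switch helpers):

        /* Credit the *incoming* task's usage counter, and only when we
         * actually decide to use its state (preloaded or lazy). */
        struct task {
                unsigned char fpu_counter;  /* cleared for new tasks */
                int has_fpu_state;
        };

        static void switch_fpu(struct task *old, struct task *new)
        {
                /* heuristic: preload if the task used the FPU recently */
                int use_state = new->has_fpu_state && new->fpu_counter > 5;

                if (use_state)
                        new->fpu_counter++;  /* was wrongly old->fpu_counter++ */
        }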
  2. 19 Feb 2012 (1 commit)
    • i387: re-introduce FPU state preloading at context switch time · 34ddc81a
      Committed by Linus Torvalds
      After all the FPU state cleanups and finally finding the problem that
      caused all our FPU save/restore problems, this re-introduces the
      preloading of FPU state that was removed in commit b3b0870e ("i387:
      do not preload FPU state at task switch time").
      
      However, instead of simply reverting the removal, this reimplements
      preloading with several fixes, most notably
      
       - properly abstracted as a true FPU state switch, rather than as
         open-coded save and restore with various hacks.
      
         In particular, implementing it as a proper FPU state switch allows us
         to optimize the CR0.TS flag accesses: there is no reason to set the
         TS bit only to then almost immediately clear it again.  CR0 accesses
         are quite slow and expensive, don't flip the bit back and forth for
         no good reason.
      
       - Make sure that the same model works for both x86-32 and x86-64, so
         that there are no gratuitous differences between the two arising
         from the way they save and restore segment state differently, an
         architectural difference that really doesn't matter to the FPU state.
      
       - Avoid exposing the "preload" state to the context switch routines,
         and in particular allow the concept of lazy state restore: if nothing
         else has used the FPU in the meantime, and the process is still on
         the same CPU, we can avoid restoring state from memory entirely, just
         re-expose the state that is still in the FPU unit.
      
          That optimized lazy restore isn't actually implemented here, but the
          infrastructure is set up for it.  Of course, older CPUs that use
          'fnsave' to save the state cannot take advantage of this, since the
          state saving also trashes the state.
      
      In other words, there is now an actual _design_ to the FPU state saving,
      rather than just random historical baggage.  Hopefully it's easier to
      follow as a result.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
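      For readers, a sketch of the "true FPU state switch" shape this commit
      introduces. The real helpers are switch_fpu_prepare()/switch_fpu_finish();
      struct task, save_fpu_state(), clts() and stts() below stand in for the
      kernel's versions, and the body is simplified:

        typedef struct { int preload; } fpu_switch_t;

        static fpu_switch_t switch_fpu_prepare(struct task *old,
                                               struct task *new)
        {
                fpu_switch_t fpu;

                /* decide once, so CR0.TS is written at most once rather
                 * than set and then almost immediately cleared again */
                fpu.preload = new->has_fpu_state && new->fpu_counter > 5;

                save_fpu_state(old);            /* fxsave or fnsave     */
                if (fpu.preload)
                        clts();                 /* FPU stays usable     */
                else if (old->has_fpu_state)
                        stts();                 /* trap on next FPU use */
                return fpu;
        }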
  3. 17 Feb 2012 (2 commits)
    • i387: move AMD K7/K8 fpu fxsave/fxrstor workaround from save to restore · 4903062b
      Committed by Linus Torvalds
      The AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
      pending.  In order to not leak FIP state from one process to another, we
      need to do a floating point load after the fxsave of the old process,
      and before the fxrstor of the new FPU state.  That resets the state to
      the (uninteresting) kernel load, rather than some potentially sensitive
      user information.
      
      We used to do this directly after the FPU state save, but that is
      actually very inconvenient, since it
      
        (a) corrupts what is potentially perfectly good FPU state that we might
            want to lazily avoid restoring later and
      
       (b) on x86-64 it resulted in a very annoying ordering constraint, where
           "__unlazy_fpu()" in the task switch needs to be delayed until after
           the DS segment has been reloaded just to get the new DS value.
      
      Coupling it to the fxrstor instead of the fxsave automatically avoids
      both of these issues, and also ensures that we only do it when actually
      necessary (the FP state after a save may never actually get used).  It's
      simply a much more natural place for the leaked state cleanup.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
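      A simplified sketch of the workaround in its new home. The kernel's
      actual version uses alternative_input() keyed on X86_FEATURE_FXSAVE_LEAK;
      the cpu_has_fxsave_leak check and function name here are illustrative:

        static void fpu_restore_checking(const void *fxsave_area)
        {
                static const unsigned int dummy;

                if (cpu_has_fxsave_leak)        /* AMD K7/K8 erratum */
                        asm volatile("emms\n\t"        /* clear tag word    */
                                     "fildl %0"        /* dummy load resets */
                                     : : "m" (dummy)); /* FDP/FIP/FOP       */

                asm volatile("fxrstor %0"
                             : : "m" (*(const char *)fxsave_area));
        }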
    • i387: do not preload FPU state at task switch time · b3b0870e
      Committed by Linus Torvalds
      Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
      is spending several days looking for a bug in the state save/restore
      code.  And the preload code has some rather subtle interactions with
      both paravirtualization support and segment state restore, so it's not
      nearly as simple as it should be.
      
      Also, now that we no longer necessarily depend on a single bit (i.e.
      TS_USEDFPU) for keeping track of the state of the FPU, we might be able
      to do better.  If we are really switching between two processes that
      keep touching the FP state, save/restore is inevitable, but in the case
      of having one process that does most of the FPU usage, we may actually
      be able to do much better than the preloading.
      
      In particular, we may be able to keep track of which CPU the process ran
      on last, and also per CPU keep track of which process' FP state that CPU
      has.  For modern CPUs that don't destroy the FPU contents at save time,
      that would allow us to do a lazy restore by just re-enabling the
      existing FPU state - with no restore cost at all!
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
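      A sketch of the bookkeeping the last paragraph proposes (field and
      variable names are hypothetical; the kernel later grew equivalents):

        struct task {
                int last_cpu;           /* CPU this task last ran on */
                /* ... FPU state ... */
        };

        /* per CPU: whose FPU state do this CPU's registers hold?
         * (array size is illustrative) */
        static struct task *fpu_owner[64];

        static int can_lazy_restore(struct task *t, int cpu)
        {
                /* If nothing else used the FPU here since t last ran,
                 * just re-enable the registers: no restore cost at all. */
                return t->last_cpu == cpu && fpu_owner[cpu] == t;
        }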
  4. 12 Dec 2011 (3 commits)
    • x86: Enter rcu extended qs after idle notifier call · e37e112d
      Committed by Frederic Weisbecker
      The idle notifier, called by enter_idle(), enters an RCU read-side
      critical section, but at that time we have already switched into the
      RCU-idle window (rcu_idle_enter() has been called), and it's illegal
      to use rcu_read_lock() in that state.
      
      This results in rcu reporting its bad mood:
      
      [    1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
      [    1.275635] Hardware name: AMD690VM-FMH
      [    1.275635] Modules linked in:
      [    1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
      [    1.275635] Call Trace:
      [    1.275635]  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
      [    1.275635]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
      [    1.275635]  [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
      [    1.275635]  [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
      [    1.275635]  [<ffffffff810018a0>] enter_idle+0x20/0x30
      [    1.275635]  [<ffffffff81001995>] cpu_idle+0xa5/0x110
      [    1.275635]  [<ffffffff817a7465>] rest_init+0xe5/0x140
      [    1.275635]  [<ffffffff817a73c8>] ? rest_init+0x48/0x140
      [    1.275635]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
      [    1.275635]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
      [    1.275635]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
      [    1.275635] ---[ end trace a22d306b065d4a66 ]---
      
      Fix this by entering rcu extended quiescent state later, just before
      the CPU goes to sleep.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
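      The corrected ordering, as a simplified sketch of the x86-64 idle loop
      (the function names are the real APIs of that era; the loop body is
      heavily abbreviated):

        while (1) {
                tick_nohz_idle_enter();  /* stop the tick, no RCU QS yet    */
                enter_idle();            /* notifier may use rcu_read_lock  */
                rcu_idle_enter();        /* last step before sleeping       */
                safe_halt();             /* CPU sleeps                      */
                rcu_idle_exit();         /* first step after waking         */
                exit_idle();
                tick_nohz_idle_exit();
        }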
    • nohz: Allow rcu extended quiescent state handling separately from tick stop · 2bbb6817
      Committed by Frederic Weisbecker
      It is assumed that RCU won't be used once we switch to tickless mode
      and until we restart the tick. However, this is not always true, as
      on x86-64, where we dereference the idle notifiers after the tick is
      stopped.
      
      To prepare for fixing this, add two new APIs:
      tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
      
      If no use of RCU is made in the idle loop between the
      tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the arch
      must instead call the new *_norcu() versions, so that it doesn't
      need to call rcu_idle_enter() and rcu_idle_exit() itself.
      
      Otherwise the arch must call tick_nohz_idle_enter() and
      tick_nohz_idle_exit() and also explicitly call:
      
      - rcu_idle_enter() after its last use of RCU before the CPU is put
        to sleep.
      - rcu_idle_exit() before the first use of RCU after the CPU is woken
        up.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
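      The two resulting usage patterns, sketched; cpu_sleep() and
      idle_notifier_call() stand in for arch-specific code:

        /* (a) no RCU use while idle: the _norcu variants also enter and
         * exit the RCU extended quiescent state */
        tick_nohz_idle_enter_norcu();
        cpu_sleep();
        tick_nohz_idle_exit_norcu();

        /* (b) RCU used after the tick is stopped (e.g. x86-64): */
        tick_nohz_idle_enter();
        idle_notifier_call();           /* may use RCU */
        rcu_idle_enter();
        cpu_sleep();
        rcu_idle_exit();
        tick_nohz_idle_exit();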
    • nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Committed by Frederic Weisbecker
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      places:
      
      - From the idle loop, to start the dyntick idle mode
      - From interrupt exit, if we have interrupted the dyntick idle mode,
        so that we reprogram the next tick event in case the irq changed
        some internal state that requires this action.
      
      There are only a few minor differences between the two callers, all
      handled by that function and driven by the ts->inidle per-CPU variable
      and the inidle parameter. Together these guarantee that we only update
      the dyntick mode on irq exit if we actually interrupted dyntick idle
      mode, and that we enter the RCU extended quiescent state only from
      idle loop entry.
      
      Split this function into:
      
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters into RCU
      extended quiescent state.
      
      - tick_nohz_irq_exit() which only updates the dynticks idle mode
      when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).
      
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      to tick_nohz_idle_exit().
      
      This simplifies the code and micro-optimizes the irq exit path (no need
      for local_irq_save() there). It also prepares for the split between the
      dyntick and RCU extended quiescent state logic. We'll need this split to
      further fix illegal uses of RCU in extended quiescent states in the idle
      loop.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
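      The shape of the split, sketched (ts stands for the per-CPU tick_sched
      state; the bodies are simplified):

        void tick_nohz_idle_enter(void)
        {
                local_irq_disable();
                ts->inidle = 1;
                tick_nohz_stop_sched_tick(ts); /* enter dynticks if possible */
                rcu_idle_enter();              /* idle-loop entry only       */
                local_irq_enable();
        }

        void tick_nohz_irq_exit(void)
        {
                /* only if the idle loop was actually interrupted */
                if (ts->inidle)
                        tick_nohz_stop_sched_tick(ts);
        }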
  5. 05 Dec 2011 (1 commit)
  6. 10 Oct 2011 (1 commit)
    • x86, nmi: Add in logic to handle multiple events and unknown NMIs · b227e233
      Committed by Don Zickus
      Previous patches allow the NMI subsystem to process multiple NMI events
      in one NMI.  As previously discussed, this can cause issues when an event
      triggers another NMI but is processed in the current NMI.  This causes the
      next NMI to go unprocessed and become an 'unknown' NMI.
      
      To handle this, we first have to flag whether the NMI handler handled
      more than one event.  If it did, then there exists a chance that
      the next NMI might already have been processed.  Once the NMI is flagged
      as a candidate to be swallowed, we next look for a back-to-back NMI
      condition.
      
      This is determined by looking at the %rip from pt_regs.  If it is the
      same as in the previous NMI, it is assumed the CPU did not have a chance
      to jump back into a non-NMI context and execute code, and instead
      handled another NMI.
      
      If both of those conditions are true then we will swallow any unknown NMI.
      
      There still exists a chance that we accidentally swallow a real unknown NMI,
      but for now things seem better.
      
      An optimization has also been added to the NMI notifier routine.  Because
      x86 can latch at most one NMI while currently processing an NMI, we don't
      have to worry about executing _all_ the handlers in a standalone NMI.  The
      idea is that if multiple NMIs come in, the second NMI will represent them.
      For those back-to-back NMI cases, we have the potential to drop NMIs.
      Therefore, only execute all the handlers in the second half of a detected
      back-to-back NMI.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
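      A sketch of the swallow heuristic described above (variable and function
      names are illustrative, not the exact kernel ones):

        static unsigned long last_nmi_rip; /* %rip seen by the last NMI       */
        static int swallow_nmi;            /* did last NMI handle >1 event?   */

        static int unknown_nmi_check(struct pt_regs *regs)
        {
                /* Same %rip as the previous NMI means the CPU never left
                 * NMI context to execute code, so this "unknown" NMI was
                 * most likely already handled by the previous pass. */
                if (swallow_nmi && regs->ip == last_nmi_rip)
                        return 1;          /* swallow it                      */
                return 0;                  /* report a real unknown NMI       */
        }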
  7. 04 Aug 2011 (1 commit)
    • cpuidle: stop depending on pm_idle · a0bfa137
      Committed by Len Brown
      cpuidle users should call cpuidle_idle_call() directly
      rather than via the (pm_idle)() function pointer.
      
      An architecture may choose to continue using (pm_idle)(),
      but cpuidle need not depend on it:
      
        my_arch_cpu_idle()
        {
        	...
        	if (cpuidle_idle_call())
        		pm_idle();
        }
      
      cc: Kevin Hilman <khilman@deeprootsystems.com>
      cc: Paul Mundt <lethal@linux-sh.org>
      cc: x86@kernel.org
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
  8. 10 Jun 2011 (1 commit)
    • exec: delay address limit change until point of no return · dac853ae
      Committed by Mathias Krause
      Unconditionally changing the address limit to USER_DS, and not restoring
      it to its old value in the error path, is wrong because it prevents us
      from using kernel memory on repeated calls to this function.  This, in
      fact, breaks the fallback through the hard-coded paths to the init
      program: if the first candidate fails to load, none of the later ones
      can ever succeed.
      
      With this patch applied, switching to USER_DS is delayed until the point
      of no return is reached, which makes it possible to have a multi-arch
      rootfs with one arch-specific init binary for each of the (hard-coded)
      probed paths.
      
      Since the address limit is already set to USER_DS by the time
      start_thread() is invoked, this redundancy can be safely removed.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
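      The shape of the fix, sketched: the set_fs() moves to the exec path's
      point of no return (the real patch puts it in flush_old_exec() and
      drops the now-redundant call near start_thread(); everything else is
      elided here):

        int flush_old_exec(struct linux_binprm *bprm)
        {
                /* ... everything that can still fail runs first,
                 *     with the caller's address limit left intact ... */

                /* point of no return: the old mm is gone */
                set_fs(USER_DS);

                /* ... */
                return 0;
        }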
  9. 24 Mar 2011 (1 commit)
  10. 13 Jan 2011 (1 commit)
    • cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer · f77cfe4e
      Committed by Thomas Renninger
      
      Currently the intel_idle and acpi_idle drivers show double cpu_idle
      "exit idle" events. This patch fixes that and makes the firing of
      cpu_idle events less complex.
      
      It also introduces cpu_idle events for all architectures which use
      the cpuidle subsystem, namely:
        - arch/arm/mach-at91/cpuidle.c
        - arch/arm/mach-davinci/cpuidle.c
        - arch/arm/mach-kirkwood/cpuidle.c
        - arch/arm/mach-omap2/cpuidle34xx.c
        - drivers/acpi/processor_idle.c (for all cases, not only mwait)
        - arch/x86/kernel/process.c (did throw events before, but was a mess)
        - drivers/idle/intel_idle.c (did throw events before)
      
      The convention should be:
      Fire cpu_idle events inside the current pm_idle function (not somewhere
      down the callee tree) to keep things easy.
      
      Current possible pm_idle functions on x86 are:
      c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle,
      so this is really easy now.
      
      This affects userspace:
      The type field of the cpu_idle power event can now be directly mapped
      to:
      /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
      instead of carrying very CPU/mwait-specific values.
      This change is not visible for the intel_idle driver.
      For the acpi_idle driver it should only be visible if the vendor
      omits C-states in the BIOS.
      Another (perf timechart) patch reads out cpuidle info for cpu_idle
      events from:
      /sys/.../cpuidle/stateX/*, so the cpuidle events are mapped back
      to the correct C-/cpuidle state even if, for example, a vendor omits
      C-states in the BIOS and only exports C1 and C3. Everything then
      lines up.
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      CC: Robert Schoene <robert.schoene@tu-dresden.de>
      CC: Jean Pihet <j-pihet@ti.com>
      CC: Arjan van de Ven <arjan@linux.intel.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: linux-pm@lists.linux-foundation.org
      CC: linux-acpi@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      CC: linux-perf-users@vger.kernel.org
      CC: linux-omap@vger.kernel.org
      Signed-off-by: Len Brown <len.brown@intel.com>
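      A sketch of the convention, using the real trace_cpu_idle() tracepoint;
      the idle routine, the state variable, and enter_low_power_state() are
      hypothetical stand-ins for arch-specific code:

        static void my_pm_idle(void)
        {
                /* fire "enter" with the target cpuidle state index */
                trace_cpu_idle(state, smp_processor_id());

                enter_low_power_state();       /* arch-specific sleep */

                /* fire "exit"; PWR_EVENT_EXIT marks the end of idle */
                trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
        }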
  11. 04 Jan 2011 (1 commit)
    • perf: Clean up power events by introducing new, more generic ones · 25e41933
      Committed by Thomas Renninger
      Add these new power trace events:
      
       power:cpu_idle
       power:cpu_frequency
       power:machine_suspend
      
      The old C-state/idle accounting events:
        power:power_start
        power:power_end
      
      now have a replacement (but we are still keeping the old
      tracepoints for compatibility):
      
        power:cpu_idle
      
      and
        power:power_frequency
      
      is replaced with:
        power:cpu_frequency
      
      power:machine_suspend is newly introduced.
      
      Jean Pihet has a patch integrated into the generic layer
      (kernel/power/suspend.c) which will make use of it.
      
      The type= field got removed from both; it was never
      used, and the distinction is carried by the event type itself.
      
      perf timechart userspace tool gets adjusted in a separate patch.
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: rjw@sisk.pl
      LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
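      For reference, the shape of the new tracepoints as seen by callers,
      simplified from include/trace/events/power.h of that era (the actual
      definitions use the TRACE_EVENT() machinery):

        void trace_cpu_idle(unsigned int state, unsigned int cpu_id);
        void trace_cpu_frequency(unsigned int frequency, unsigned int cpu_id);
        void trace_machine_suspend(unsigned int state);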
  12. 10 Sep 2010 (1 commit)
    • x86-64, fpu: Disable preemption when using TS_USEDFPU · a4d4fbc7
      Committed by Brian Gerst
      Consolidates code and fixes the below race for 64-bit.
      
      commit 9fa2f37bfeb798728241cc4a19578ce6e4258f25
      Author: torvalds <torvalds>
      Date:   Tue Sep 2 07:37:25 2003 +0000
      
          Be a lot more careful about TS_USEDFPU and preemption
      
          We had some races where we tested (or set) TS_USEDFPU together
          with sequences that depended on the setting (like clearing or
          setting the TS flag in %cr0) and we could be preempted in between,
          which screws up the FPU state, since preemption will itself change
          USEDFPU and the TS flag.
      
          This makes it a lot more explicit: the "internal" low-level FPU
          functions ("__xxxx_fpu()") all require preemption to be disabled,
          and the exported "real" functions will make sure that is the case.
      
          One case - in __switch_to() - was switched to the non-preempt-safe
          internal version, since the scheduler itself has already disabled
          preemption.
      
          BKrev: 3f5448b5WRiQuyzAlbajs3qoQjSobw
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1283563039-3466-6-git-send-email-brgerst@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
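      The rule the quoted commit establishes, as a sketch (the helper names
      follow the kernel convention of that era; the body is simplified):

        /* Exported helpers disable preemption around the low-level
         * __xxx_fpu() primitives, which themselves assume it is off:
         * TS_USEDFPU and CR0.TS must change atomically w.r.t. preemption. */
        static void unlazy_fpu(struct task_struct *tsk)
        {
                preempt_disable();
                __unlazy_fpu(tsk);      /* may fxsave and set CR0.TS */
                preempt_enable();
        }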
  13. 18 Jun 2010 (1 commit)
  14. 11 May 2010 (1 commit)
  15. 24 Apr 2010 (1 commit)
  16. 26 Mar 2010 (1 commit)
    • x86, perf, bts, mm: Delete the never used BTS-ptrace code · faa4602e
      Committed by Peter Zijlstra
      Support for the PMU's BTS features has been upstreamed in
      v2.6.32, but we still have the old and disabled ptrace-BTS code,
      as Linus noticed not so long ago.
      
      It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
      regard for other uses (perf) and doesn't provide the flexibility
      needed for perf either.
      
      Its only users would be ptrace-block-step and ptrace-bts, but ptrace-bts
      was never used and ptrace-block-step can be implemented using a
      much simpler approach.
      
      So axe all 3000 lines of it. That includes the *locked_memory*()
      APIs in mm/mlock.c as well.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20100325135413.938004390@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 17 Feb 2010 (1 commit)
  18. 30 Jan 2010 (1 commit)
  19. 14 Jan 2010 (1 commit)
  20. 28 Dec 2009 (1 commit)
    • x86: Use KERN_DEFAULT log-level in __show_regs() · d015a092
      Committed by Pekka Enberg
      Andrew Morton reported a strange looking kmemcheck warning:
      
        WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88004fba6c20)
        0000000000000000310000000000000000000000000000002413000000c9ffff
         u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
      
         [<ffffffff810af3aa>] kmemleak_scan+0x25a/0x540
         [<ffffffff810afbcb>] kmemleak_scan_thread+0x5b/0xe0
         [<ffffffff8104d0fe>] kthread+0x9e/0xb0
         [<ffffffff81003074>] kernel_thread_helper+0x4/0x10
         [<ffffffffffffffff>] 0xffffffffffffffff
      
      The above printout is missing the register dump completely.  The
      problem is that the output comes from syslog, which doesn't show
      KERN_INFO log-level messages.  We didn't see this before because
      we were both testing on 32-bit kernels, which use the _default_
      log-level.
      
      Fix that up by explicitly using KERN_DEFAULT log-level for
      __show_regs() printks.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1261988819.4641.2.camel@penberg-laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
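      The shape of the fix, sketched with one representative line (the actual
      patch converts every printk in __show_regs(); the format string here is
      illustrative):

        /* before: KERN_INFO level, filtered out by syslog */
        printk(KERN_INFO "RIP: %016lx\n", regs->ip);

        /* after: KERN_DEFAULT, shown at the default console level */
        printk(KERN_DEFAULT "RIP: %016lx\n", regs->ip);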
  21. 11 Dec 2009 (4 commits)
  22. 10 Dec 2009 (2 commits)
  23. 09 Dec 2009 (1 commit)
  24. 02 Dec 2009 (1 commit)
  25. 24 Nov 2009 (1 commit)
    • sched, x86: Optimize branch hint in __switch_to() · a3a1de0c
      Committed by Tim Blechmann
      Branch hint profiling on my Nehalem machine showed 96%
      incorrect branch hints:
      
        6548732 174664120  96 __switch_to    process_64.c    406
        6548745 174565593  96 __switch_to    process_64.c    410
      Signed-off-by: Tim Blechmann <tim@klingt.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4B0BBB93.3080307@klingt.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
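      For illustration, the kind of change this implies (a sketch with
      hypothetical condition/do_work(); the real lines are the two branches
      flagged above in __switch_to()):

        /* before: the hint claims this path is rare, but profiling shows
         * it is taken ~96% of the time */
        if (unlikely(condition))
                do_work();

        /* after: drop (or invert) the mispredicted hint */
        if (condition)
                do_work();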
  26. 08 Nov 2009 (1 commit)
    • hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Committed by Frederic Weisbecker
      This patch rebases the implementation of the breakpoints API on top of
      perf event instances.
      
      Each breakpoint is now a perf event that handles the
      register scheduling, thread/cpu attachment, etc.
      
      The new layering is now made as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons of this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoints counters
      - Ptrace breakpoints setting remains tricky and still needs some per
        thread breakpoints references.
      
      Todo (in the order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event" rename
      - The ptrace regression has been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle off case: when breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD doesn't cover every case of running
        breakpoints anymore, and vcpu->arch.switch_db_regs might not always
        be set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
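      What "breakpoints are perf events" looks like to an in-kernel user, as
      a sketch (the attr fields and registration helpers come from this
      series; watched_variable and the error handling are elided/hypothetical):

        struct perf_event_attr attr = {
                .type          = PERF_TYPE_BREAKPOINT,
                .bp_addr       = (unsigned long)&watched_variable,
                .bp_type       = HW_BREAKPOINT_W,      /* break on write */
                .bp_len        = HW_BREAKPOINT_LEN_4,  /* watch 4 bytes  */
                .sample_period = 1,
        };
        /* register_user_hw_breakpoint() / register_wide_hw_breakpoint()
         * then attach this as a per-task or per-CPU perf event. */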
  27. 04 Nov 2009 (1 commit)
    • x86, fs: Fix x86 procfs stack information for threads on 64-bit · 89240ba0
      Committed by Stefani Seibold
      This patch fixes two issues in the procfs stack information on
      x86-64 Linux.
      
      The 32-bit loader compat_do_execve did not store the stack
      start (this was figured out by Alexey Dobriyan).
      
      The stack information on an x86_64 kernel always showed 0 kbytes of
      stack usage, because of a missing implementation of the KSTK_ESP
      macro, which always returned -1.
      
      The new implementation now returns the right value.
      Signed-off-by: Stefani Seibold <stefani@seibold.net>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1257240160.4889.24.camel@wall-e>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
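      The gist of the macro fix, sketched (the exact definition in the patch
      may differ):

        /* before (x86-64): stack usage could never be computed */
        #define KSTK_ESP(tsk)   -1

        /* after: report the task's saved user stack pointer */
        #define KSTK_ESP(tsk)   (task_pt_regs(tsk)->sp)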
  28. 03 Nov 2009 (1 commit)
  29. 10 Oct 2009 (2 commits)
  30. 04 Aug 2009 (1 commit)
    • x86, percpu: Collect hot percpu variables into one cacheline · bdf977b3
      Committed by Tejun Heo
      On x86_64, percpu variables current_task and kernel_stack are used for
      get_current() and current_thread_info() respectively and thus are
      often used close to each other.  Move the definition of current_task to
      kernel/cpu/common.c, right above the kernel_stack definition, and align
      it to a cacheline so that they always fall into the same cacheline.  Two
      percpu variables defined there together - irq_stack_ptr and irq_count
      - are also pretty hot and will benefit from sharing the cacheline.
      
      For consistency, current_task definition for x86_32 is also moved to
      kernel/cpu/common.c.
      
      Putting current_task and kernel_stack into the same cacheline was
      suggested by Linus Torvalds.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
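      The resulting definitions, approximately (a sketch of the kernel code of
      that era; the kernel_stack initializer is simplified):

        /* current_task is cacheline-aligned and defined right next to
         * kernel_stack, so get_current() and current_thread_info()
         * touch the same cacheline. */
        DEFINE_PER_CPU(struct task_struct *, current_task)
                ____cacheline_aligned = &init_task;
        DEFINE_PER_CPU(unsigned long, kernel_stack) =
                (unsigned long)&init_thread_union + THREAD_SIZE;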
  31. 18 Jun 2009 (2 commits)