1. 24 Sep 2009, 1 commit
  2. 20 Sep 2009, 1 commit
    • tracing, x86, cpuidle: Move the end point of a C state in the power tracer · 288f023e
      Authored by Arjan van de Ven
      The "end of a C state" trace point currently happens before
      the code runs that corrects the TSC for having stopped during idle.
      
      The result of this is that the timestamp of the end-of-C-state event
      is garbage on cpus where the TSC stops during idle.
      
      This patch moves the end point of the C state to after the point where
      the kernel's timekeeping engine has been corrected.
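      
      A rough sketch of the intended ordering in the x86 idle loop; the
      tracepoint call and its argument are illustrative assumptions, not
      the literal diff:
      
       /* simplified cpu_idle()-style loop */
       while (1) {
               while (!need_resched())
                       pm_idle();                      /* TSC may stop in a deep C state here */
               tick_nohz_restart_sched_tick();         /* timekeeping/TSC corrected on idle exit */
               trace_power_end(smp_processor_id());    /* end-of-C-state event now gets a sane timestamp */
               schedule();
       }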
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: fweisbec@gmail.com
      Cc: peterz@infradead.org
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090919133533.139c2a46@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      288f023e
  3. 19 Sep 2009, 1 commit
  4. 20 Aug 2009, 1 commit
    • clockevent: Prevent dead lock on clockevents_lock · f833bab8
      Authored by Suresh Siddha
      Currently clockevents_notify() is called with interrupts enabled in
      some places and with interrupts disabled in others.
      
      This results in a deadlock in this scenario.
      
      cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
      cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
      cpu C is doing set_mtrr(), which tries to rendezvous all the cpus.
      
      As a result, C and A reach the rendezvous point and wait
      for B. B is stuck forever waiting for the spinlock and thus never
      reaches the rendezvous point.
      
      Fix the clockevents code so that clockevents_lock is taken with
      interrupts disabled and thus avoid the above deadlock.
      
      Also call lapic_timer_propagate_broadcast() on the destination cpu so
      that we avoid calling smp_call_function() in the clockevents notifier
      chain.
      
      This issue left us wondering whether we need to change the MTRR rendezvous
      logic to use stop machine logic (instead of smp_call_function) or add
      a check in the spinlock debug code to detect other spinlocks which
      get taken under both interrupts-enabled and interrupts-disabled conditions.
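      
      A minimal sketch of the resulting locking rule (not the full diff;
      clockevents_do_notify() stands in for the notifier dispatch):
      
       void clockevents_notify(unsigned long reason, void *arg)
       {
               unsigned long flags;
      
               /* With interrupts off, the lock holder can no longer be pulled
                * into an smp_call_function()/set_mtrr() rendezvous while it
                * still holds clockevents_lock. */
               spin_lock_irqsave(&clockevents_lock, flags);
               clockevents_do_notify(reason, arg);
               spin_unlock_irqrestore(&clockevents_lock, flags);
       }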
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: "Brown Len" <len.brown@intel.com>
      LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      f833bab8
  5. 15 Jun 2009, 1 commit
    • kmemcheck: add mm functions · 2dff4405
      Authored by Vegard Nossum
      With kmemcheck enabled, the slab allocator needs to do this:
      
      1. Tell kmemcheck to allocate the shadow memory which stores the status of
         each byte in the allocation proper, e.g. whether it is initialized or
         uninitialized.
      2. Tell kmemcheck which parts of memory should be marked uninitialized.
         There are actually a few more states, such as "not yet allocated" and
         "recently freed".
      
      If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
      memory that can take page faults because of kmemcheck.
      
      If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
      request memory with the __GFP_NOTRACK flag. This does not prevent the page
      faults from occurring; instead, it marks the object in question as
      initialized so that no warnings will ever be produced for this object.
      
      In addition to (and in contrast to) __GFP_NOTRACK, the
      __GFP_NOTRACK_FALSE_POSITIVE flag indicates that the allocation should
      not be tracked _because_ it would produce a false positive. Their values
      are identical, but need not be so in the future (for example, we could now
      enable/disable false positives with a config option).
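      
      A hedged sketch of how a caller uses these flags (cache name and sizes
      are made up; SLAB_NOTRACK and __GFP_NOTRACK are the flags as they
      existed in kernels of this era):
      
       #include <linux/slab.h>
       #include <linux/gfp.h>
      
       static struct kmem_cache *dma_desc_cache;      /* hypothetical cache */
      
       static int __init example_init(void)
       {
               /* Objects from this cache never take kmemcheck page faults */
               dma_desc_cache = kmem_cache_create("dma_desc", 128, 0,
                                                  SLAB_NOTRACK, NULL);
               if (!dma_desc_cache)
                       return -ENOMEM;
      
               /* One-off allocation: still backed by tracked pages, but the
                * object is marked initialized, so it never triggers warnings */
               kfree(kmalloc(256, GFP_KERNEL | __GFP_NOTRACK));
               return 0;
       }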
      
      Parts of this patch were contributed by Pekka Enberg but merged for
      atomicity.
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      [rebased for mainline inclusion]
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      2dff4405
  6. 12 May 2009, 1 commit
  7. 12 Apr 2009, 1 commit
    • x86: clean up declarations and variables · 2c1b284e
      Authored by Jaswinder Singh Rajput
      Impact: cleanup, no code changed
      
       - syscalls.h       update declarations due to unifications
       - irq.c            declare smp_generic_interrupt() before it gets used
       - process.c        declare sys_fork() and sys_vfork() before they get used
       - tsc.c            rename tsc_khz shadowed variable
       - apic/probe_32.c  declare apic_default before it gets used
       - apic/nmi.c       prev_nmi_count should be unsigned
       - apic/io_apic.c   declare smp_irq_move_cleanup_interrupt() before it gets used
       - mm/init.c        declare direct_gbpages and free_initrd_mem before they get used
      Signed-off-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2c1b284e
  8. 07 Apr 2009, 1 commit
    • x86, ds: add leakage warning · 2311f0de
      Authored by Markus Metzger
      Add a warning in case a debug store context is not removed before
      the task it is attached to is freed.
      
      Remove the old warning at thread exit. It is too early.
      
      Declare the debug store context field in thread_struct unconditionally.
      
      Remove ds_copy_thread() and ds_exit_thread() and do the work directly
      in process*.c.
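      
      A sketch of the kind of check this adds; the call site and field name
      follow the description above and should be read as an approximation,
      not the literal patch:
      
       /* arch/x86/kernel/process.c - run when the task's state is freed */
       void free_thread_xstate(struct task_struct *tsk)
       {
               /* existing xstate cleanup elided */
      
               /* By now any debug store (BTS/PEBS) context must be gone;
                * a tracer that forgot to detach leaks it. */
               WARN(tsk->thread.ds_ctx, "leaking DS context\n");
       }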
      Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
      Cc: roland@redhat.com
      Cc: eranian@googlemail.com
      Cc: oleg@redhat.com
      Cc: juan.villacis@intel.com
      Cc: ak@linux.jf.intel.com
      LKML-Reference: <20090403144601.254472000@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2311f0de
  9. 18 Mar 2009, 1 commit
    • cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash · 30e1e6d1
      Authored by Rusty Russell
      Impact: Fix cpu offline when CONFIG_MAXSMP=y
      
      Changeset bc9b83dd "cpumask: convert
      c1e_mask in arch/x86/kernel/process.c to cpumask_var_t" contained a
      bug: c1e_mask is manipulated even if C1E isn't detected (and hence
      not allocated).
      
      This is simply fixed by checking for NULL (which gcc optimizes out
      anyway if CONFIG_CPUMASK_OFFSTACK=n, since it knows c1e_mask can never
      be NULL).
      
      In addition, fix a leak where select_idle_routine re-allocates
      (and re-clears) c1e_mask on every cpu init.
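      
      A minimal sketch of the NULL check, assuming c1e_remove_cpu() in
      arch/x86/kernel/process.c is the place that manipulates the mask:
      
       void c1e_remove_cpu(int cpu)
       {
               /* With CONFIG_CPUMASK_OFFSTACK=y, c1e_mask is a real pointer and
                * stays NULL unless C1E was detected and the mask allocated. */
               if (c1e_mask != NULL)
                       cpumask_clear_cpu(cpu, c1e_mask);
       }
      
      The leak is handled separately by allocating c1e_mask only once instead
      of on every call to select_idle_routine().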
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Mike Travis <travis@sgi.com>
      LKML-Reference: <200903171450.34549.rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      30e1e6d1
  10. 16 Mar 2009, 1 commit
  11. 13 Mar 2009, 2 commits
  12. 02 Mar 2009, 1 commit
  13. 13 Feb 2009, 1 commit
  14. 09 Feb 2009, 2 commits
  15. 29 Jan 2009, 1 commit
    • x86: replace CONFIG_X86_SMP with CONFIG_SMP · 3e5095d1
      Authored by Ingo Molnar
      The x86/Voyager subarch used to have this distinction between
       'x86 SMP support' and 'Voyager SMP support':
      
       config X86_SMP
      	bool
      	depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64)
      
      This is a pointless distinction - Voyager can (and already does) use
      smp_ops to implement its various SMP quirks - and smp_ops can be extended
      further to cover all of Voyager's special cases.
      
      So remove this complication in the Kconfig space.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3e5095d1
  16. 19 Dec 2008, 1 commit
  17. 17 Dec 2008, 1 commit
    • x86: support always running TSC on Intel CPUs · 40fb1715
      Authored by Venki Pallipadi
      Impact: reward non-stop TSCs with good TSC-based clocksources, etc.
      
      Add support for CPUID_0x80000007_Bit8 on Intel CPUs as well. This bit means
      that the TSC is invariant with C/P/T states and always runs at constant
      frequency.
      
      With Intel CPUs, we have 3 classes:
      * CPUs where the TSC runs at a constant rate and does not stop in C-states
      * CPUs where the TSC runs at a constant rate, but will stop in deep C-states
      * CPUs where the TSC rate varies based on P/T-states and the TSC will stop in deep
        C-states.
      
      To cover these 3, one feature bit (CONSTANT_TSC) is not enough. So, add a
      second bit (NONSTOP_TSC). CONSTANT_TSC indicates that the TSC runs at
      constant frequency irrespective of P/T-states, and NONSTOP_TSC indicates
      that TSC does not stop in deep C-states.
      
      CPUID_0x80000007_Bit8 indicates that both of these feature bits can be set.
      We still have CONSTANT_TSC _set_ and NONSTOP_TSC _not_set_ on some older Intel
      CPUs, based on model checks. We can use TSC on such CPUs for time, as long as
      those CPUs do not support/enter deep C-states.
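      
      A sketch of the detection, assuming the bit is read from CPUID leaf
      0x80000007 EDX, which the kernel caches in c->x86_power:
      
       /* in the CPU init path, with c the struct cpuinfo_x86 being set up */
       if (c->x86_power & (1 << 8)) {
               set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);  /* rate unaffected by P/T states */
               set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);   /* keeps counting in deep C states */
       }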
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      40fb1715
  18. 26 Nov 2008, 1 commit
    • tracing: add "power-tracer": C/P state tracer to help power optimization · f3f47a67
      Authored by Arjan van de Ven
      Impact: new "power-tracer" ftrace plugin
      
      This patch adds a C/P-state ftrace plugin that will generate
      detailed statistics about the C/P-states that are being used,
      so that we can look at detailed decisions that the C/P-state
      code is making, rather than the overly high-level "average" view
      that we have today.
      
      An example way of using this is:
      
       mount -t debugfs none /sys/kernel/debug
       echo cstate > /sys/kernel/debug/tracing/current_tracer
       echo 1 > /sys/kernel/debug/tracing/tracing_enabled
       sleep 1
       echo 0 > /sys/kernel/debug/tracing/tracing_enabled
       cat /sys/kernel/debug/tracing/trace | perl scripts/trace/cstate.pl > out.svg
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f3f47a67
  19. 11 Nov 2008, 1 commit
    • x86: call machine_shutdown and stop all CPUs in native_machine_halt · d3ec5cae
      Authored by Ivan Vecera
      Impact: really halt all CPUs on halt
      
      The function machine_halt (resp. native_machine_halt) is empty on x86.
      When the command 'halt -f' is invoked, the message "System
      halted." is displayed, but this is not really true because all CPUs are
      still running.
      
      There are similar inconsistencies on other arches (some use
      power-off for halt, or loop forever with IRQs enabled/disabled).
      
      IMO the same approach should be used for all architectures - otherwise,
      what does the message "System halted." really mean?
      
      This patch fixes it for x86.
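      
      A sketch of the fixed halt path; the helper names follow the patch
      title and description, so treat the details as approximate:
      
       static void native_machine_halt(void)
       {
               /* Stop the other CPUs and shut down APICs etc. */
               machine_shutdown();
      
               /* Park this CPU too, so "System halted." is actually true */
               stop_this_cpu(NULL);
       }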
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d3ec5cae
  20. 23 Sep 2008, 3 commits
  21. 08 Sep 2008, 1 commit
  22. 28 Aug 2008, 1 commit
    • x86: make poll_idle behave more like the other idle methods · 2c7e9fd4
      Authored by Joe Korty
      Make poll_idle() behave more like the other idle methods.
      
      Currently, poll_idle() returns immediately.  The other
      idle methods all wait indefinitely for some condition
      to come true before returning.  poll_idle should emulate
      these other methods and also wait for a return condition,
      in this case for need_resched() to become true.
      
      Without this delay the idle loop spends all of its time
      in the outer loop that calls poll_idle.  This outer loop,
      these days, does real work, some of it under rcu locks.
      That work should only be done when idle is entered and
      when idle exits, not continuously while idle is spinning.
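      
      A sketch of poll_idle() after the change: like the other idle methods,
      it now spins (with interrupts on) until need_resched() becomes true:
      
       static void poll_idle(void)
       {
               local_irq_enable();
               while (!need_resched())
                       cpu_relax();    /* cheap busy-wait until there is work */
       }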
      Signed-off-by: Joe Korty <joe.korty@ccur.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2c7e9fd4
  23. 19 Jul 2008, 2 commits
  24. 17 Jul 2008, 2 commits
  25. 09 Jul 2008, 1 commit
  26. 08 Jul 2008, 2 commits
    • x86: add C1E aware idle function, fix · 0beefa20
      Authored by Thomas Gleixner
      On Tue, 17 Jun 2008, Rafael J. Wysocki wrote:
      >
      > BTW, with the C1E patches reverted I don't get the
      > WARNING: at /home/rafael/src/linux-next/kernel/smp.c:215 smp_call_function_single+0x3d/0xa2
      > in the log.  Thomas?
      
      The BROADCAST_FORCE notification uses smp_call_function() and therefore
      must be run with interrupts enabled.
      
      While at it, add a comment for the BROADCAST_EXIT notifier as well.
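      
      A sketch of the fix in the C1E idle path (cpumask calls shown in the
      older cpu_set()/cpu_isset() style of that kernel; details illustrative):
      
       if (!cpu_isset(cpu, c1e_mask)) {
               cpu_set(cpu, c1e_mask);
               /*
                * BROADCAST_FORCE goes through smp_call_function(), so it
                * must run with interrupts enabled.
                */
               local_irq_enable();
               clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_FORCE, &cpu);
               local_irq_disable();
       }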
      Reported-and-bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0beefa20
    • x86, clockevents: add C1E aware idle function · aa276e1c
      Authored by Thomas Gleixner
      C1E on AMD machines is like C3 but without control from the OS. Up to
      now we disabled the local apic timer for those machines as it stops
      when the CPU goes into C1E. This excludes those machines from high
      resolution timers / dynamic ticks, which especially hurts X2-based
      laptops.
      
      The current boot time C1E detection has another, more serious flaw
      as well: some BIOSes do not enable C1E until the ACPI processor module
      is loaded. This causes systems to stop working after that point.
      
      To work nicely with C1E enabled machines we use a separate idle
      function, which checks on idle entry whether C1E was enabled in the
      Interrupt Pending Message MSR. This allows us to do timer broadcasting
      for C1E and covers the late enablement of C1E as well.
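      
      A sketch of the idle-entry check; MSR_K8_INT_PENDING_MSG is AMD's
      interrupt pending message register, and the C1E mask define is assumed
      to come with the patch:
      
       static void c1e_idle(void)
       {
               if (need_resched())
                       return;
      
               if (!c1e_detected) {
                       u32 lo, hi;
      
                       rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
                       if (lo & K8_INTP_C1E_ACTIVE_MASK) {
                               c1e_detected = 1;
                               mark_tsc_unstable("TSC halt in AMD C1E");
                               printk(KERN_INFO "System has AMD C1E enabled\n");
                       }
               }
      
               /* When C1E is active, enter/exit the broadcast timer around the
                * halt (elided here); otherwise fall through to default_idle(). */
               default_idle();
       }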
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      aa276e1c
  27. 27 Jun 2008, 1 commit
  28. 10 Jun 2008, 3 commits
  29. 18 May 2008, 2 commits
    • x86: disable mwait for AMD family 10H/11H CPUs · e9623b35
      Authored by Thomas Gleixner
      The previous revert of 0c07ee38 left
      out the mwait disable condition for AMD family 10H/11H CPUs.
      
      Andreas Herrman said:
      
      It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
      Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
      depend on a clock divisor and current Pstate of the core.
      
      If all cores of a processor are in halt state (C1) the processor can
      enter the C1E (C1 enhanced) state. If mwait is used this will never
      happen.
      
      Thus HLT saves more power than MWAIT here.
      
      It might be best to switch off the mwait flag for these AMD CPU
      families like it was introduced with commit
      f039b754 (x86: Don't use MWAIT on AMD
      Family 10)
      
      Re-add the AMD families 10H/11H check and disable the mwait usage for
      those.
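      
      A sketch of the intent of the re-added check (the exact location and
      helper differ in the real patch):
      
       /* During CPU setup: HLT reaches C1/C1E on these parts, MWAIT does not,
        * so prefer HLT by hiding the MWAIT capability. */
       if (c->x86_vendor == X86_VENDOR_AMD && (c->x86 == 0x10 || c->x86 == 0x11))
               clear_cpu_cap(c, X86_FEATURE_MWAIT);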
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e9623b35
    • x86: remove mwait capability C-state check · a738d897
      Authored by Ingo Molnar
      Vegard Nossum reports:
      
      | powertop shows between 200-400 wakeups/second with the description
      | "<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g.
      | I need to run two busy-loops on my 2-CPU system for this to show up).
      |
      | The bisect resulted in this commit:
      |
      | commit 0c07ee38
      | Date:   Wed Jan 30 13:33:16 2008 +0100
      |
      |     x86: use the correct cpuid method to detect MWAIT support for C states
      
      Remove the functional effects of this patch and make mwait unconditional.
      
      A future patch will turn off mwait on specific CPUs where that causes
      power to be wasted.
      Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a738d897
  30. 27 Apr 2008, 1 commit
    • fix idle (arch, acpi and apm) and lockdep · 7f424a8b
      Authored by Peter Zijlstra
      OK, so 25-mm1 gave a lockdep error which made me look into this.
      
      The first thing that I noticed was the horrible mess; the second thing I
      saw was hacks like: 71e93d15
      
      The problem is that arch idle routines are somewhat inconsistent with
      their IRQ state handling, and instead of fixing _that_, we just paper over
      the problem.
      
      So the thing I've tried to do is set a standard for idle routines and
      fix them all up to adhere to that. So the rules are:
      
        idle routines are entered with IRQs disabled
        idle routines will exit with IRQs enabled
      
      Nearly all already did this in one form or another.
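      
      A sketch of the convention applied to the default routine - entered
      with IRQs disabled, and every exit path ends with IRQs enabled:
      
       void default_idle(void)
       {
               if (!need_resched())
                       safe_halt();            /* sti; hlt - enables IRQs and waits */
               else
                       local_irq_enable();     /* honour the "exit with IRQs on" rule */
       }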
      
      Merge the 32-bit and 64-bit code so they no longer have different bugs.
      
      As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
      irq-enable.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Bob Copeland <me@bobcopeland.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7f424a8b