1. 03 1月, 2013 1 次提交
  2. 27 11月, 2012 1 次提交
    • J
      cpuidle: Measure idle state durations with monotonic clock · a474a515
      Julius Werner 提交于
      Many cpuidle drivers measure their time spent in an idle state by
      reading the wallclock time before and after idling and calculating the
      difference. This leads to erroneous results when the wallclock time gets
      updated by another processor in the meantime, adding that clock
      adjustment to the idle state's time counter.
      
      If the clock adjustment was negative, the result is even worse due to an
      erroneous cast from int to unsigned long long of the last_residency
      variable. The negative 32 bit integer will zero-extend and result in a
      forward time jump of roughly four billion milliseconds or 1.3 hours on
      the idle state residency counter.
      
      This patch changes all affected cpuidle drivers to either use the
      monotonic clock for their measurements or make use of the generic time
      measurement wrapper in cpuidle.c, which was already working correctly.
      Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
      should always already be disabled before entering the idle function, and
      not get reenabled until the generic wrapper has performed its second
      measurement). It also removes the erroneous cast, making sure that
      negative residency values are applied correctly even though they should
      not appear anymore.
      Signed-off-by: NJulius Werner <jwerner@chromium.org>
      Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Tested-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a474a515
  3. 15 11月, 2012 4 次提交
  4. 09 10月, 2012 1 次提交
    • S
      ACPI idle, CPU hotplug: Fix NULL pointer dereference during hotplug · cf31cd1a
      Srivatsa S. Bhat 提交于
      On a KVM guest, when a CPU is taken offline and brought back online, we hit
      the following NULL pointer dereference:
      
      [   45.400843] Unregister pv shared memory for cpu 1
      [   45.412331] smpboot: CPU 1 is now offline
      [   45.529894] SMP alternatives: lockdep: fixing up alternatives
      [   45.533472] smpboot: Booting Node 0 Processor 1 APIC 0x1
      [   45.411526] kvm-clock: cpu 1, msr 0:7d14601, secondary cpu clock
      [   45.571370] KVM setup async PF for cpu 1
      [   45.572331] kvm-stealtime: cpu 1, msr 7d0e040
      [   45.575031] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [   45.576017] IP: [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
      [   45.576017] PGD 5dfb067 PUD 5da8067 PMD 0
      [   45.576017] Oops: 0000 [#1] SMP
      [   45.576017] Modules linked in:
      [   45.576017] CPU 0
      [   45.576017] Pid: 607, comm: stress_cpu_hotp Not tainted 3.6.0-padata-tp-debug #3 Bochs Bochs
      [   45.576017] RIP: 0010:[<ffffffff81519f98>]  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
      [   45.576017] RSP: 0018:ffff880005d93ce8  EFLAGS: 00010286
      [   45.576017] RAX: ffff880005d93fd8 RBX: 0000000000000000 RCX: 0000000000000006
      [   45.576017] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 0000000000000000
      [   45.576017] RBP: ffff880005d93cf8 R08: 2222222222222222 R09: 2222222222222222
      [   45.576017] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [   45.576017] R13: 0000000000000000 R14: ffffffff81c8cca0 R15: 0000000000000001
      [   45.576017] FS:  00007f91936ae700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
      [   45.576017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   45.576017] CR2: 0000000000000000 CR3: 0000000005db3000 CR4: 00000000000006f0
      [   45.576017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   45.576017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   45.576017] Process stress_cpu_hotp (pid: 607, threadinfo ffff880005d92000, task ffff8800066bbf40)
      [   45.576017] Stack:
      [   45.576017]  ffff880007a96400 0000000000000000 ffff880005d93d28 ffffffff813ac689
      [   45.576017]  ffff880007a96400 ffff880007a96400 0000000000000002 ffffffff81cd8d01
      [   45.576017]  ffff880005d93d58 ffffffff813aa498 0000000000000001 00000000ffffffdd
      [   45.576017] Call Trace:
      [   45.576017]  [<ffffffff813ac689>] acpi_processor_hotplug+0x55/0x97
      [   45.576017]  [<ffffffff813aa498>] acpi_cpu_soft_notify+0x93/0xce
      [   45.576017]  [<ffffffff816ae47d>] notifier_call_chain+0x5d/0x110
      [   45.576017]  [<ffffffff8109730e>] __raw_notifier_call_chain+0xe/0x10
      [   45.576017]  [<ffffffff81069050>] __cpu_notify+0x20/0x40
      [   45.576017]  [<ffffffff81069085>] cpu_notify+0x15/0x20
      [   45.576017]  [<ffffffff816978f1>] _cpu_up+0xee/0x137
      [   45.576017]  [<ffffffff81697983>] cpu_up+0x49/0x59
      [   45.576017]  [<ffffffff8168758d>] store_online+0x9d/0xe0
      [   45.576017]  [<ffffffff8140a9f8>] dev_attr_store+0x18/0x30
      [   45.576017]  [<ffffffff812322c0>] sysfs_write_file+0xe0/0x150
      [   45.576017]  [<ffffffff811b389c>] vfs_write+0xac/0x180
      [   45.576017]  [<ffffffff811b3be2>] sys_write+0x52/0xa0
      [   45.576017]  [<ffffffff816b31e9>] system_call_fastpath+0x16/0x1b
      [   45.576017] Code: 48 c7 c7 40 e5 ca 81 e8 07 d0 18 00 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 48 89 fb <f6> 07 02 75 13 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 84 00 00
      [   45.576017] RIP  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
      [   45.576017]  RSP <ffff880005d93ce8>
      [   45.576017] CR2: 0000000000000000
      [   45.656079] ---[ end trace 433d6c9ac0b02cef ]---
      
      Analysis:
      Commit 3d339dcb (cpuidle / ACPI : move cpuidle_device field out of the
      acpi_processor_power structure()) made the allocation of the dev structure
      (struct cpuidle) of a CPU dynamic, whereas previously it was statically
      allocated. And this dynamic allocation occurs in acpi_processor_power_init()
      if pr->flags.power evaluates to non-zero.
      
      On KVM guests, pr->flags.power evaluates to zero, hence dev is never
      allocated. This causes the NULL pointer (dev) dereference in
      cpuidle_disable_device() during a subsequent CPU online operation. Fix this
      by ensuring that dev is non-NULL before dereferencing.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      cf31cd1a
  5. 11 7月, 2012 1 次提交
    • P
      PM / cpuidle: System resume hang fix with cpuidle · 8651f97b
      Preeti U Murthy 提交于
      On certain bios, resume hangs if cpus are allowed to enter idle states
      during suspend [1].
      
      This was fixed in apci idle driver [2].But intel_idle driver does not
      have this fix. Thus instead of replicating the fix in both the idle
      drivers, or in more platform specific idle drivers if needed, the
      more general cpuidle infrastructure could handle this.
      
      A suspend callback in cpuidle_driver could handle this fix. But
      a cpuidle_driver provides only basic functionalities like platform idle
      state detection capability and mechanisms to support entry and exit
      into CPU idle states. All other cpuidle functions are found in the
      cpuidle generic infrastructure for good reason that all cpuidle
      drivers, irrepective of their platforms will support these functions.
      
      One option therefore would be to register a suspend callback in cpuidle
      which handles this fix. This could be called through a PM_SUSPEND_PREPARE
      notifier. But this is too generic a notfier for a driver to handle.
      
      Also, ideally the job of cpuidle is not to handle side effects of suspend.
      It should expose the interfaces which "handle cpuidle 'during' suspend"
      or any other operation, which the subsystems call during that respective
      operation.
      
      The fix demands that during suspend, no cpus should be allowed to enter
      deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle
      ensures that. Not just that it also kicks all the cpus which are already
      in idle out of their idle states which was being done during cpu hotplug
      through a CPU_DYING_FROZEN callbacks.
      
      Now the question arises about when during suspend should
      cpuidle_uninstall_idle_handler() be called. Since we are dealing with
      drivers it seems best to call this function during dpm_suspend().
      Delaying the call till dpm_suspend_noirq() does no harm, as long as it is
      before cpu_hotplug_begin() to avoid race conditions with cpu hotpulg
      operations. In dpm_suspend_noirq(), it would be wise to place this call
      before suspend_device_irqs() to avoid ugly interactions with the same.
      
      Ananlogously, during resume.
      
      References:
      [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075.
      [2] http://marc.info/?l=linux-pm&m=133958534231884&w=2Reported-and-tested-by: NDave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      8651f97b
  6. 04 7月, 2012 2 次提交
    • R
      PM / Domains: Add preliminary support for cpuidle, v2 · cbc9ef02
      Rafael J. Wysocki 提交于
      On some systems there are CPU cores located in the same power
      domains as I/O devices.  Then, power can only be removed from the
      domain if all I/O devices in it are not in use and the CPU core
      is idle.  Add preliminary support for that to the generic PM domains
      framework.
      
      First, the platform is expected to provide a cpuidle driver with one
      extra state designated for use with the generic PM domains code.
      This state should be initially disabled and its exit_latency value
      should be set to whatever time is needed to bring up the CPU core
      itself after restoring power to it, not including the domain's
      power on latency.  Its .enter() callback should point to a procedure
      that will remove power from the domain containing the CPU core at
      the end of the CPU power transition.
      
      The remaining characteristics of the extra cpuidle state, referred to
      as the "domain" cpuidle state below, (e.g. power usage, target
      residency) should be populated in accordance with the properties of
      the hardware.
      
      Next, the platform should execute genpd_attach_cpuidle() on the PM
      domain containing the CPU core.  That will cause the generic PM
      domains framework to treat that domain in a special way such that:
      
       * When all devices in the domain have been suspended and it is about
         to be turned off, the states of the devices will be saved, but
         power will not be removed from the domain.  Instead, the "domain"
         cpuidle state will be enabled so that power can be removed from
         the domain when the CPU core is idle and the state has been chosen
         as the target by the cpuidle governor.
      
       * When the first I/O device in the domain is resumed and
         __pm_genpd_poweron(() is called for the first time after
         power has been removed from the domain, the "domain" cpuidle
         state will be disabled to avoid subsequent surprise power removals
         via cpuidle.
      
      The effective exit_latency value of the "domain" cpuidle state
      depends on the time needed to bring up the CPU core itself after
      restoring power to it as well as on the power on latency of the
      domain containing the CPU core.  Thus the "domain" cpuidle state's
      exit_latency has to be recomputed every time the domain's power on
      latency is updated, which may happen every time power is restored
      to the domain, if the measured power on latency is greater than
      the latency stored in the corresponding generic_pm_domain structure.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Reviewed-by: NKevin Hilman <khilman@ti.com>
      cbc9ef02
    • S
      cpuidle: move field disable from per-driver to per-cpu · dc7fd275
      ShuoX Liu 提交于
      Andrew J.Schorr raises a question.  When he changes the disable setting on
      a single CPU, it affects all the other CPUs.  Basically, currently, the
      disable field is per-driver instead of per-cpu.  All the C states of the
      same driver are shared by all CPU in the same machine.
      
      The patch changes the `disable' field to per-cpu, so we could set this
      separately for each cpu.
      Signed-off-by: NShuoX Liu <shuox.liu@intel.com>
      Reported-by: NAndrew J.Schorr <aschorr@telemetry-investments.com>
      Reviewed-by: NYanmin Zhang <yanmin_zhang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      dc7fd275
  7. 02 6月, 2012 5 次提交
    • C
      cpuidle: add support for states that affect multiple cpus · 4126c019
      Colin Cross 提交于
      On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
      cpus cannot be independently powered down, either due to
      sequencing restrictions (on Tegra 2, cpu 0 must be the last to
      power down), or due to HW bugs (on OMAP4460, a cpu powering up
      will corrupt the gic state unless the other cpu runs a work
      around).  Each cpu has a power state that it can enter without
      coordinating with the other cpu (usually Wait For Interrupt, or
      WFI), and one or more "coupled" power states that affect blocks
      shared between the cpus (L2 cache, interrupt controller, and
      sometimes the whole SoC).  Entering a coupled power state must
      be tightly controlled on both cpus.
      
      The easiest solution to implementing coupled cpu power states is
      to hotplug all but one cpu whenever possible, usually using a
      cpufreq governor that looks at cpu load to determine when to
      enable the secondary cpus.  This causes problems, as hotplug is an
      expensive operation, so the number of hotplug transitions must be
      minimized, leading to very slow response to loads, often on the
      order of seconds.
      
      This file implements an alternative solution, where each cpu will
      wait in the WFI state until all cpus are ready to enter a coupled
      state, at which point the coupled state function will be called
      on all cpus at approximately the same time.
      
      Once all cpus are ready to enter idle, they are woken by an smp
      cross call.  At this point, there is a chance that one of the
      cpus will find work to do, and choose not to enter idle.  A
      final pass is needed to guarantee that all cpus will call the
      power state enter function at the same time.  During this pass,
      each cpu will increment the ready counter, and continue once the
      ready counter matches the number of online coupled cpus.  If any
      cpu exits idle, the other cpus will decrement their counter and
      retry.
      
      To use coupled cpuidle states, a cpuidle driver must:
      
         Set struct cpuidle_device.coupled_cpus to the mask of all
         coupled cpus, usually the same as cpu_possible_mask if all cpus
         are part of the same cluster.  The coupled_cpus mask must be
         set in the struct cpuidle_device for each cpu.
      
         Set struct cpuidle_device.safe_state to a state that is not a
         coupled state.  This is usually WFI.
      
         Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
         state that affects multiple cpus.
      
         Provide a struct cpuidle_state.enter function for each state
         that affects multiple cpus.  This function is guaranteed to be
         called on all cpus at approximately the same time.  The driver
         should ensure that the cpus all abort together if any cpu tries
         to abort once the function is called.
      
      update1:
      
      cpuidle: coupled: fix count of online cpus
      
      online_count was never incremented on boot, and was also counting
      cpus that were not part of the coupled set.  Fix both issues by
      introducting a new function that counts online coupled cpus, and
      call it from register as well as the hotplug notifier.
      
      update2:
      
      cpuidle: coupled: fix decrementing ready count
      
      cpuidle_coupled_set_not_ready sometimes refuses to decrement the
      ready count in order to prevent a race condition.  This makes it
      unsuitable for use when finished with idle.  Add a new function
      cpuidle_coupled_set_done that decrements both the ready count and
      waiting count, and call it after idle is complete.
      
      Cc: Amit Kucheria <amit.kucheria@linaro.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Trinabh Gupta <g.trinabh@gmail.com>
      Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: NKevin Hilman <khilman@ti.com>
      Tested-by: NKevin Hilman <khilman@ti.com>
      Signed-off-by: NColin Cross <ccross@android.com>
      Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      4126c019
    • C
      cpuidle: fix error handling in __cpuidle_register_device · 3af272ab
      Colin Cross 提交于
      Fix the error handling in __cpuidle_register_device to include
      the missing list_del.  Move it to a label, which will simplify
      the error handling when coupled states are added.
      Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: NKevin Hilman <khilman@ti.com>
      Tested-by: NKevin Hilman <khilman@ti.com>
      Signed-off-by: NColin Cross <ccross@android.com>
      Reviewed-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      3af272ab
    • C
      cpuidle: refactor out cpuidle_enter_state · 56cfbf74
      Colin Cross 提交于
      Split the code to enter a state and update the stats into a helper
      function, cpuidle_enter_state, and export it.  This function will
      be called by the coupled state code to handle entering the safe
      state and the final coupled state.
      Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: NKevin Hilman <khilman@ti.com>
      Tested-by: NKevin Hilman <khilman@ti.com>
      Signed-off-by: NColin Cross <ccross@android.com>
      Reviewed-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      56cfbf74
    • S
      cpuidle: add checks to avoid NULL pointer dereference · 1b0a0e9a
      Srivatsa S. Bhat 提交于
      The existing check for dev == NULL in __cpuidle_register_device() is
      rendered useless because dev is dereferenced before the check itself.
      Moreover, correctly speaking, it is the job of the callers of this
      function, i.e., cpuidle_register_device() & cpuidle_enable_device() (which
      also happen to be exported functions) to ensure that
      __cpuidle_register_device() is called with a non-NULL dev.
      
      So add the necessary dev == NULL checks in the two callers and remove the
      (useless) check from __cpuidle_register_device().
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      1b0a0e9a
    • S
      cpuidle: remove unused hrtimer_peek_ahead_timers() call · 0aeb9cac
      Sergey Senozhatsky 提交于
        commit 9a655837
        Author: Arjan van de Ven <arjan@linux.intel.com>
        Date:   Sun Nov 9 12:45:10 2008 -0800
      
           regression: disable timer peek-ahead for 2.6.28
      
           It's showing up as regressions; disabling it very likely just papers
           over an underlying issue, but time is running out for 2.6.28, lets get
           back to this for 2.6.29
      
       Many years has passed since 2008, so it seems ok to remove whole `#if 0' block.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Trinabh Gupta <g.trinabh@gmail.com>
      Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      0aeb9cac
  8. 08 5月, 2012 1 次提交
  9. 07 4月, 2012 1 次提交
  10. 30 3月, 2012 3 次提交
  11. 21 3月, 2012 1 次提交
  12. 13 2月, 2012 1 次提交
  13. 22 12月, 2011 1 次提交
    • K
      cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd
      Kay Sievers 提交于
      This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      
      Userspace relies on events and generic sysfs subsystem infrastructure
      from sysdev devices, which are made available with this conversion.
      
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      8a25a2fd
  14. 07 11月, 2011 4 次提交
  15. 01 11月, 2011 1 次提交
  16. 25 8月, 2011 1 次提交
  17. 04 8月, 2011 3 次提交
  18. 13 1月, 2011 4 次提交
    • T
      cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle... · f77cfe4e
      Thomas Renninger 提交于
      cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
      
      Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
      events -> this patch fixes it and makes cpu_idle events throwing less complex.
      
      It also introduces cpu_idle events for all architectures which use
      the cpuidle subsystem, namely:
        - arch/arm/mach-at91/cpuidle.c
        - arch/arm/mach-davinci/cpuidle.c
        - arch/arm/mach-kirkwood/cpuidle.c
        - arch/arm/mach-omap2/cpuidle34xx.c
        - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
        - arch/x86/kernel/process.c (did throw events before, but was a mess)
        - drivers/idle/intel_idle.c (did throw events before)
      
      Convention should be:
      Fire cpu_idle events inside the current pm_idle function (not somewhere
      down the the callee tree) to keep things easy.
      
      Current possible pm_idle functions in X86:
      c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
      -> this is really easy is now.
      
      This affects userspace:
      The type field of the cpu_idle power event can now direclty get
      mapped to:
      /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
      instead of throwing very CPU/mwait specific values.
      This change is not visible for the intel_idle driver.
      For the acpi_idle driver it should only be visible if the vendor
      misses out C-states in his BIOS.
      Another (perf timechart) patch reads out cpuidle info of cpu_idle
      events from:
      /sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
      to the correct C-/cpuidle state again, even if e.g. vendors miss
      out C-states in their BIOS and for example only export C1 and C3.
      -> everything is fine.
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      CC: Robert Schoene <robert.schoene@tu-dresden.de>
      CC: Jean Pihet <j-pihet@ti.com>
      CC: Arjan van de Ven <arjan@linux.intel.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: linux-pm@lists.linux-foundation.org
      CC: linux-acpi@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      CC: linux-perf-users@vger.kernel.org
      CC: linux-omap@vger.kernel.org
      Signed-off-by: NLen Brown <len.brown@intel.com>
      f77cfe4e
    • L
      cpuidle: delete NOP CPUIDLE_FLAG_POLL · d247632c
      Len Brown 提交于
      it serves no purpose
      Signed-off-by: NLen Brown <len.brown@intel.com>
      d247632c
    • T
      cpuidle: Rename X86 specific idle poll state[0] from C0 to POLL · 720f1c30
      Thomas Renninger 提交于
      C0 means and is well know as "not idle".
      All documentation out there uses this term as "running"/"not idle"
      state. Also Linux userspace tools (e.g. cpufreq-aperf and turbostat)
      show C0 residency which there is correct, but means something totally
      else than cpuidle "POLL" state.
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      720f1c30
    • R
      cpuidle: Make cpuidle_enable_device() call poll_idle_init() · d8c216cf
      Rafael J. Wysocki 提交于
      The following scenario is possible with the current cpuidle code and
      the ACPI cpuidle driver:
      (1) acpi_processor_cst_has_changed() is called,
      (2) cpuidle_disable_device() is called,
      (3) cpuidle_remove_state_sysfs() is called to remove the (presumably
          outdated) states info from sysfs,
      (3) acpi_processor_get_power_info() is called, the first entry in the
          pr->power.states[] table is filled with zeros,
      (4) acpi_processor_setup_cpuidle() is called and it doesn't fill the
          first entry in pr->power.states[],
      (5) cpuidle_enable_device() is called,
      (6) __cpuidle_register_device() is _not_ called, since the device has
          already been registered,
      (7) Consequently, poll_idle_init() is _not_ called either,
      (8) cpuidle_add_state_sysfs() is called to create the sysfs attributes
          for the new states and it uses the bogus first table entry from
          acpi_processor_get_power_info() for creating state0.
      
      This problem is avoided if cpuidle_enable_device()
      unconditionally calls poll_idle_init().
      Reported-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      cc: stable@kernel.org
      d8c216cf
  19. 04 1月, 2011 1 次提交
    • T
      perf: Clean up power events by introducing new, more generic ones · 25e41933
      Thomas Renninger 提交于
      Add these new power trace events:
      
       power:cpu_idle
       power:cpu_frequency
       power:machine_suspend
      
      The old C-state/idle accounting events:
        power:power_start
        power:power_end
      
      Have now a replacement (but we are still keeping the old
      tracepoints for compatibility):
      
        power:cpu_idle
      
      and
        power:power_frequency
      
      is replaced with:
        power:cpu_frequency
      
      power:machine_suspend is newly introduced.
      
      Jean Pihet has a patch integrated into the generic layer
      (kernel/power/suspend.c) which will make use of it.
      
      the type= field got removed from both, it was never
      used and the type is differed by the event type itself.
      
      perf timechart userspace tool gets adjusted in a separate patch.
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NJean Pihet <jean.pihet@newoldbits.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: rjw@sisk.pl
      LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
      25e41933
  20. 17 12月, 2010 1 次提交
    • C
      drivers: Replace __get_cpu_var with __this_cpu_read if not used for an address. · 4a6f4fe8
      Christoph Lameter 提交于
      __get_cpu_var() can be replaced with this_cpu_read and will then use a single
      read instruction with implied address calculation to access the correct per cpu
      instance.
      
      However, the address of a per cpu variable passed to __this_cpu_read() cannot be
      determed (since its an implied address conversion through segment prefixes).
      Therefore apply this only to uses of __get_cpu_var where the addres of the
      variable is not used.
      
      V3->V4:
      	- Move one instance of this_cpu_inc_return to a later patch
      	  so that this one can go in without percpu infrastructrure
      	  changes.
      
      Sedat: fixed compile failure caused by an extra ')'.
      
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      4a6f4fe8
  21. 10 8月, 2010 1 次提交
    • A
      cpuidle: extend cpuidle and menu governor to handle dynamic states · 71abbbf8
      Ai Li 提交于
      On some SoC chips, HW resources may be in use during any particular idle
      period.  As a consequence, the cpuidle states that the SoC is safe to
      enter can change from idle period to idle period.  In addition, the
      latency and threshold of each cpuidle state can vary, depending on the
      operating condition when the CPU becomes idle, e.g.  the current cpu
      frequency, the current state of the HW blocks, etc.
      
      cpuidle core and the menu governor, in the current form, are geared
      towards cpuidle states that are static, i.e.  the availabiltiy of the
      states, their latencies, their thresholds are non-changing during run
      time.  cpuidle does not provide any hook that cpuidle drivers can use to
      adjust those values on the fly for the current idle period before the menu
      governor selects the target cpuidle state.
      
      This patch extends cpuidle core and the menu governor to handle states
      that are dynamic.  There are three additions in the patch and the patch
      maintains backwards-compatibility with existing cpuidle drivers.
      
      1) add prepare() to struct cpuidle_device.  A cpuidle driver can hook
         into the callback and cpuidle will call prepare() before calling the
         governor's select function.  The callback gives the cpuidle driver a
         chance to update the dynamic information of the cpuidle states for the
         current idle period, e.g.  state availability, latencies, thresholds,
         power values, etc.
      
      2) add CPUIDLE_FLAG_IGNORE as one of the state flags.  In the prepare()
         function, a cpuidle driver can set/clear the flag to indicate to the
         menu governor whether a cpuidle state should be ignored, i.e.  not
         available, during the current idle period.
      
      3) add power_specified bit to struct cpuidle_device.  The menu governor
         currently assumes that the cpuidle states are arranged in the order of
         increasing latency, threshold, and power savings.  This is true or can
         be made true for static states.  Once the state parameters are dynamic,
         the latencies, thresholds, and power savings for the cpuidle states can
         increase or decrease by different amounts from idle period to idle
         period.  So the assumption of increasing latency, threshold, and power
         savings from Cn to C(n+1) can no longer be guaranteed.
      
      It can be straightforward to calculate the power consumption of each
      available state and to specify it in power_usage for the idle period.
      Using the power_usage fields, the menu governor then selects the state
      that has the lowest power consumption and that still satisfies all other
      critieria.  The power_specified bit defaults to 0.  For existing cpuidle
      drivers, cpuidle detects that power_specified is 0 and fills in a dummy
      set of power_usage values.
      Signed-off-by: NAi Li <aili@codeaurora.org>
      Cc: Len Brown <len.brown@intel.com>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      71abbbf8
  22. 04 8月, 2010 1 次提交
    • T
      [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent · 6f4f2723
      Thomas Renninger 提交于
      and fix the broken case if a core's frequency depends on others.
      
      trace_power_frequency was only implemented in a rather ungeneric way
      in acpi-cpufreq driver's target() function only.
      -> Move the call to trace_power_frequency to
         cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
         notifier is triggered.
         This will support power frequency tracing by all cpufreq drivers
      
      trace_power_frequency did not trace frequency changes correctly when
      the userspace governor was used or when CPU cores' frequency depend
      on each other.
      -> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
         which gets switched automatically fixes this.
      
      Robert Schoene provided some important fixes on top of my initial
      quick shot version which are integrated in this patch:
      - Forgot some changes in power_end trace (TP_printk/variable names)
      - Variable dummy in power_end must now be cpu_id
      - Use static 64 bit variable instead of unsigned int for cpu_id
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      CC: davej@redhat.com
      CC: arjan@infradead.org
      CC: linux-kernel@vger.kernel.org
      CC: robert.schoene@tu-dresden.de
      Tested-by: robert.schoene@tu-dresden.de
      Signed-off-by: NDave Jones <davej@redhat.com>
      6f4f2723