1. 09 Jul, 2014 1 commit
  2. 07 May, 2014 1 commit
  3. 01 May, 2014 1 commit
  4. 12 Mar, 2014 1 commit
    • P
      cpuidle: delay enabling interrupts until all coupled CPUs leave idle · 0b89e9aa
      Paul Burton committed
      As described by a comment at the end of cpuidle_enter_state_coupled it
      can be inefficient for coupled idle states to return with IRQs enabled
      since they may proceed to service an interrupt instead of clearing the
      coupled idle state. Until they have finished & cleared the idle state
      all CPUs coupled with them will spin rather than being able to enter a
      safe idle state.
      
      Commits e1689795 "cpuidle: Add common time keeping and irq
      enabling" and 554c06ba "cpuidle: remove en_core_tk_irqen flag" led
      to the cpuidle_enter_state enabling interrupts for all idle states,
      including coupled ones, making this inefficiency unavoidable by drivers
      & the local_irq_enable near the end of cpuidle_enter_state_coupled
      redundant. This patch avoids enabling interrupts in cpuidle_enter_state
      after a coupled state has been entered, allowing them to remain disabled
      until all coupled CPUs have exited the idle state and
      cpuidle_enter_state_coupled re-enables them.
      
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      0b89e9aa
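The interrupt handling described in 0b89e9aa can be modelled in userspace. The following is a minimal sketch, not the kernel code: every `sketch_` name and the flag value are hypothetical, and a boolean stands in for the CPU's interrupt-enable state. It shows the generic enter path skipping the IRQ re-enable for a coupled state, and the coupled path re-enabling IRQs only once no coupled CPU remains in the state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this sketch; not the kernel's actual value. */
#define SKETCH_FLAG_COUPLED (1u << 0)

struct sketch_state {
    unsigned int flags;
};

struct sketch_cpu {
    bool irqs_enabled;   /* stand-in for the real interrupt-enable state */
};

/* Models the tail of the generic enter path: re-enable IRQs unless the
 * state that was just left is a coupled one. */
static void sketch_enter_state_tail(struct sketch_cpu *cpu,
                                    const struct sketch_state *st)
{
    assert(cpu != NULL && st != NULL);
    if (!(st->flags & SKETCH_FLAG_COUPLED))
        cpu->irqs_enabled = true;
}

/* Models the tail of the coupled path: IRQs come back on only when no
 * coupled CPU is still inside the idle state. */
static void sketch_coupled_tail(struct sketch_cpu *cpu, int cpus_still_in_state)
{
    assert(cpu != NULL && cpus_still_in_state >= 0);
    if (cpus_still_in_state == 0)
        cpu->irqs_enabled = true;
}
```

In this model a CPU leaving a coupled state keeps interrupts off through `sketch_enter_state_tail()` and gets them back from `sketch_coupled_tail()`, so it cannot start servicing an interrupt while its siblings are still waiting on it.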
  5. 11 Mar, 2014 2 commits
  6. 07 Feb, 2014 1 commit
    • P
      cpuidle: Handle clockevents_notify(BROADCAST_ENTER) failure · ba8f20c2
      Preeti U Murthy committed
      Some archs set the CPUIDLE_FLAG_TIMER_STOP flag for idle states in which the
      local timers stop. The cpuidle_idle_call() currently handles such idle states
      by calling into the broadcast framework so as to wake up CPUs at their next
      wakeup event. With the hrtimer mode of broadcast, the BROADCAST_ENTER call
      into the broadcast framework can fail for archs that do not have an external
      clock device to handle wakeups and the CPU in question has thus to be made
      the standby CPU. This patch handles such cases by failing the call into
      cpuidle so that the arch can take some default action. The arch will certainly
      not enter a similar idle state because a failed cpuidle call will also implicitly
      indicate that the broadcast framework has not registered this CPU to be woken up.
      Hence we are safe if we fail the cpuidle call.
      
      In the process move the functions that trace idle statistics just before and
      after the entry and exit into idle states respectively. In other
      scenarios where the call to cpuidle fails, we end up not tracing idle
      entry and exit since a decision on an idle state could not be taken. Similarly
      when the call to broadcast framework fails, we skip tracing idle statistics
      because we are in no further position to take a decision on an alternative
      idle state to enter into.
      Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: deepthi@linux.vnet.ibm.com
      Cc: paulmck@linux.vnet.ibm.com
      Cc: fweisbec@gmail.com
      Cc: paulus@samba.org
      Cc: srivatsa.bhat@linux.vnet.ibm.com
      Cc: svaidy@linux.vnet.ibm.com
      Cc: peterz@infradead.org
      Cc: benh@kernel.crashing.org
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20140207080652.17187.66344.stgit@preeti.in.ibm.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ba8f20c2
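The control flow that ba8f20c2 describes can be sketched as follows. This is a userspace model under stated assumptions: the `sketch_` names, the flag value and the function-pointer hook are all hypothetical, and the use of `-EBUSY` as the failure code is an assumption of the sketch rather than something the text above states. The point is only the ordering: the broadcast call happens first, and if it fails the idle state is never entered and never traced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bit marking states in which the local timer stops. */
#define SKETCH_FLAG_TIMER_STOP (1u << 0)

struct sketch_state {
    unsigned int flags;
};

/* Hypothetical broadcast hook: 0 on success, negative errno on failure. */
typedef int (*sketch_broadcast_enter_fn)(void);

/* Models the idle call: *entered is set to 1 only if the state was
 * actually entered (which is also where tracing would happen). */
static int sketch_idle_call(const struct sketch_state *st,
                            sketch_broadcast_enter_fn broadcast_enter,
                            int *entered)
{
    assert(st != NULL && broadcast_enter != NULL && entered != NULL);
    *entered = 0;

    if (st->flags & SKETCH_FLAG_TIMER_STOP) {
        int err = broadcast_enter();
        if (err)
            return err;   /* fail the call; the arch takes its default action */
    }

    *entered = 1;         /* stand-in for trace + enter + trace */
    return 0;
}

/* Two stand-in broadcast implementations for exercising the model. */
static int sketch_broadcast_ok(void)   { return 0; }
static int sketch_broadcast_fail(void) { return -EBUSY; }
```

A state without the timer-stop flag never consults the broadcast hook, so it is entered regardless of whether the hook would have failed.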
  7. 04 Dec, 2013 1 commit
    • K
      cpuidle: Check for dev before deregistering it. · 813e8e3d
      Konrad Rzeszutek Wilk committed
      If not, we could end up in the unfortunate situation where
      we dereference a NULL pointer b/c we have cpuidle disabled.
      
      This is the case when booting under Xen (which uses the
      ACPI P/C states but disables the CPU idle driver) - and can
      be easily reproduced when booting with cpuidle.off=1.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8156db4a>] cpuidle_unregister_device+0x2a/0x90
      .. snip..
      Call Trace:
       [<ffffffff813b15b4>] acpi_processor_power_exit+0x3c/0x5c
       [<ffffffff813af0a9>] acpi_processor_stop+0x61/0xb6
       [<ffffffff814215bf>] __device_release_driver+0...
       [<...fffff81421653>] device_release_driver+0x23/0x30
       [<ffffffff81420ed8>] bus_remove_device+0x108/0x180
       [<ffffffff8141d9d9>] device_del+0x129/0x1c0
       [<ffffffff813cb4b0>] ? unregister_xenbus_watch+0x1f0/0x1f0
       [<ffffffff8141da8e>] device_unregister+0x1e/0x60
       [<ffffffff814243e9>] unregister_cpu+0x39/0x60
       [<ffffffff81019e03>] arch_unregister_cpu+0x23/0x30
       [<ffffffff813c3c51>] handle_vcpu_hotplug_event+0xc1/0xe0
       [<ffffffff813cb4f5>] xenwatch_thread+0x45/0x120
       [<ffffffff810af010>] ? abort_exclusive_wait+0xb0/0xb0
       [<ffffffff8108ec42>] kthread+0xd2/0xf0
       [<ffffffff8108eb70>] ? kthread_create_on_node+0x180/0x180
       [<ffffffff816ce17c>] ret_from_fork+0x7c/0xb0
       [<ffffffff8108eb70>] ? kthread_create_on_node+0x180/0x180
      
      This problem also appears in 3.12 and could be a candidate for backport.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      813e8e3d
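The guard that 813e8e3d calls for can be illustrated with a small userspace model. This is a sketch under assumptions: `struct sketch_device` and both functions are hypothetical stand-ins, not the kernel structures. Registration insists on a valid device, while unregistration tolerates a NULL or never-registered one and simply does nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpuidle device. */
struct sketch_device {
    bool registered;
};

/* Registration requires a real device. */
static void sketch_register_device(struct sketch_device *dev)
{
    assert(dev != NULL);
    dev->registered = true;
}

/* Unregistration checks the pointer before touching it. Returns true only
 * if a registered device was actually torn down. */
static bool sketch_unregister_device(struct sketch_device *dev)
{
    if (dev == NULL || !dev->registered)
        return false;   /* nothing to do, e.g. when cpuidle is disabled */

    dev->registered = false;
    return true;
}
```

With the check placed before the first dereference, a caller on a system where no device was ever set up gets a harmless `false` instead of a crash.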
  8. 30 Oct, 2013 7 commits
  9. 15 Jul, 2013 4 commits
  10. 11 Jun, 2013 1 commit
    • D
      cpuidle: simplify multiple driver support · 82467a5a
      Daniel Lezcano committed
      Commit bf4d1b5d (cpuidle: support multiple drivers) introduced support
      for using multiple cpuidle drivers at the same time.  It added a
      couple of new APIs to register the driver per CPU, but that led to
      some unnecessary code complexity related to the kernel config options
      deciding whether or not the multiple driver support is enabled.  The
      code has to work as it did before when the multiple driver support is
      not enabled and the multiple driver support has to be compatible with
      the previously existing API.
      
      Remove the new API, not used by any driver in the tree yet (but
      needed for the HMP cpuidle drivers that will be submitted soon), and
      add a new cpumask pointer to the cpuidle driver structure that will
      point to the mask of CPUs handled by the given driver.  That will
      allow the cpuidle_[un]register_driver() API to be used for the
      multiple driver support along with the cpuidle_[un]register()
      functions added recently.
      
      [rjw: Changelog]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      82467a5a
  11. 24 Apr, 2013 1 commit
  12. 23 Apr, 2013 2 commits
    • D
      cpuidle: make a single register function for all · 4c637b21
      Daniel Lezcano committed
      The usual scheme to initialize a cpuidle driver on a SMP is:
      
      	cpuidle_register_driver(drv);
      	for_each_possible_cpu(cpu) {
      		device = &per_cpu(cpuidle_dev, cpu);
      		cpuidle_register_device(device);
      	}
      
      This code is duplicated in each cpuidle driver.
      
      On UP systems, it is done this way:
      
      	cpuidle_register_driver(drv);
      	device = &per_cpu(cpuidle_dev, cpu);
      	cpuidle_register_device(device);
      
      On UP, the macro 'for_each_cpu' does one iteration:
      
      #define for_each_cpu(cpu, mask)                 \
              for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
      
      Hence, the initialization loop is the same for UP as for SMP.
      
      Besides, we saw various bugs, mis-initializations and unchecked return codes
      in the different drivers; the code is duplicated, bugs included. After fixing
      all of these, it appears the initialization pattern is the same for everyone.
      
      Please note, some drivers are doing dev->state_count = drv->state_count. This is
      not necessary because it is done by the cpuidle_enable_device function in the
      cpuidle framework. This holds as long as all your devices have the same
      states. Otherwise, the 'low level' API should be used instead with the specific
      initialization for the driver.
      
      Let's add a wrapper function doing this initialization with a cpumask parameter
      for the coupled idle states and use it for all the drivers.
      
      That will save a lot of LOC, consolidate the code, and the modifications in the
      future could be done in a single place. Another benefit is the consolidation of
      the cpuidle_device variable which is now in the cpuidle framework and no longer
      spread across the different arch specific drivers.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      4c637b21
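The wrapper idea from 4c637b21 can be sketched in userspace. This is a model under assumptions, not the kernel implementation: all `sketch_` names are hypothetical, the CPU count is fixed at four, the coupled mask is a plain `unsigned long`, and `sketch_fail_cpu` is a test hook that exists only to exercise the error path. It shows the driver and every per-CPU device being registered in one place, with a full unwind if any device fails.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4   /* hypothetical CPU count for the model */

struct sketch_driver {
    int registered;
};

struct sketch_device {
    int cpu;
    int registered;
    unsigned long coupled_mask;   /* bit n set => CPU n is coupled */
};

/* Device storage lives in the "framework", not in each driver. */
static struct sketch_device sketch_devices[SKETCH_NR_CPUS];

/* Test hook only: CPU whose registration should fail, or -1 for none. */
static int sketch_fail_cpu = -1;

static int sketch_register_device(struct sketch_device *dev)
{
    if (dev->cpu == sketch_fail_cpu)
        return -1;
    dev->registered = 1;
    return 0;
}

/* Tear down every device, then the driver. */
static void sketch_unregister(struct sketch_driver *drv)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        sketch_devices[cpu].registered = 0;
    drv->registered = 0;
}

/* Single entry point: register the driver, then one device per CPU. */
static int sketch_register(struct sketch_driver *drv, unsigned long coupled_mask)
{
    assert(drv != NULL);
    if (drv->registered)
        return -1;
    drv->registered = 1;

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        struct sketch_device *dev = &sketch_devices[cpu];

        dev->cpu = cpu;
        dev->coupled_mask = coupled_mask;
        if (sketch_register_device(dev) != 0) {
            sketch_unregister(drv);   /* unwind everything on failure */
            return -1;
        }
    }
    return 0;
}
```

Keeping the loop and the unwind in one function means a return-code check forgotten in one driver cannot happen; each driver only calls the wrapper and checks a single result.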
    • D
      cpuidle: remove en_core_tk_irqen flag · 554c06ba
      Daniel Lezcano committed
      The en_core_tk_irqen flag is set in all the cpuidle drivers, which
      means it is not necessary to specify this flag.
      
      Remove the flag and the code related to it.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Kevin Hilman <khilman@linaro.org>  # for mach-omap2/*
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      554c06ba
  13. 01 Apr, 2013 1 commit
  14. 26 Jan, 2013 1 commit
    • P
      PM / tracing: remove deprecated power trace API · 43720bd6
      Paul Gortmaker committed
      The text in Documentation said it would be removed in 2.6.41;
      the text in the Kconfig said removal in the 3.1 release.  Either
      way you look at it, we are well past both, so push it off a cliff.
      
      Note that the POWER_CSTATE and the POWER_PSTATE are part of the
      legacy tracing API.  Remove all tracepoints which use these flags.
      As can be seen from context, most already have a trace entry via
      trace_cpu_idle anyways.
      
      Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
      compared to the CSTATE ones which all have a clear start/stop.
      As part of this, the trace_power_frequency also becomes orphaned,
      so it too is deleted.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      43720bd6
  15. 15 Jan, 2013 1 commit
  16. 03 Jan, 2013 1 commit
  17. 27 Nov, 2012 1 commit
    • J
      cpuidle: Measure idle state durations with monotonic clock · a474a515
      Julius Werner committed
      Many cpuidle drivers measure their time spent in an idle state by
      reading the wallclock time before and after idling and calculating the
      difference. This leads to erroneous results when the wallclock time gets
      updated by another processor in the meantime, adding that clock
      adjustment to the idle state's time counter.
      
      If the clock adjustment was negative, the result is even worse due to an
      erroneous cast from int to unsigned long long of the last_residency
      variable. The negative 32 bit integer will zero-extend and result in a
      forward time jump of roughly four billion microseconds or about 1.2 hours on
      the idle state residency counter.
      
      This patch changes all affected cpuidle drivers to either use the
      monotonic clock for their measurements or make use of the generic time
      measurement wrapper in cpuidle.c, which was already working correctly.
      Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
      should always already be disabled before entering the idle function, and
      not get reenabled until the generic wrapper has performed its second
      measurement). It also removes the erroneous cast, making sure that
      negative residency values are applied correctly even though they should
      not appear anymore.
      Signed-off-by: Julius Werner <jwerner@chromium.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a474a515
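The widening problem described in a474a515 can be reproduced in plain C. The sketch below assumes the residency is a signed 32-bit value and that it reaches a 64-bit counter through an unsigned 32-bit intermediate; the message does not show the exact expression the drivers used, and both function names are hypothetical. Zero-extension turns a small negative value into a number just below 2^32, while a signed widening preserves it.

```c
#include <assert.h>
#include <stdint.h>

/* The sketch relies on a 4-byte signed residency value. */
static_assert(sizeof(int32_t) == 4, "int32_t must be 32 bits wide");

/* Widen through an unsigned 32-bit intermediate: the upper 32 bits are
 * filled with zeros, so a negative input becomes a large positive one. */
static uint64_t sketch_widen_zero_extend(int32_t residency)
{
    return (uint64_t)(uint32_t)residency;
}

/* Widen as a signed value: the sign is preserved. */
static int64_t sketch_widen_sign_extend(int32_t residency)
{
    return (int64_t)residency;
}
```

For an input of -1000, the zero-extending path yields 4294966296, which is 2^32 minus 1000. If the counter is kept in microseconds, as this sketch assumes, 2^32 units amount to roughly 71.6 minutes of phantom idle time from a single bad sample.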
  18. 15 Nov, 2012 4 commits
  19. 09 Oct, 2012 1 commit
    • S
      ACPI idle, CPU hotplug: Fix NULL pointer dereference during hotplug · cf31cd1a
      Srivatsa S. Bhat committed
      On a KVM guest, when a CPU is taken offline and brought back online, we hit
      the following NULL pointer dereference:
      
      [   45.400843] Unregister pv shared memory for cpu 1
      [   45.412331] smpboot: CPU 1 is now offline
      [   45.529894] SMP alternatives: lockdep: fixing up alternatives
      [   45.533472] smpboot: Booting Node 0 Processor 1 APIC 0x1
      [   45.411526] kvm-clock: cpu 1, msr 0:7d14601, secondary cpu clock
      [   45.571370] KVM setup async PF for cpu 1
      [   45.572331] kvm-stealtime: cpu 1, msr 7d0e040
      [   45.575031] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [   45.576017] IP: [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
      [   45.576017] PGD 5dfb067 PUD 5da8067 PMD 0
      [   45.576017] Oops: 0000 [#1] SMP
      [   45.576017] Modules linked in:
      [   45.576017] CPU 0
      [   45.576017] Pid: 607, comm: stress_cpu_hotp Not tainted 3.6.0-padata-tp-debug #3 Bochs Bochs
      [   45.576017] RIP: 0010:[<ffffffff81519f98>]  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
      [   45.576017] RSP: 0018:ffff880005d93ce8  EFLAGS: 00010286
      [   45.576017] RAX: ffff880005d93fd8 RBX: 0000000000000000 RCX: 0000000000000006
      [   45.576017] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 0000000000000000
      [   45.576017] RBP: ffff880005d93cf8 R08: 2222222222222222 R09: 2222222222222222
      [   45.576017] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [   45.576017] R13: 0000000000000000 R14: ffffffff81c8cca0 R15: 0000000000000001
      [   45.576017] FS:  00007f91936ae700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
      [   45.576017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   45.576017] CR2: 0000000000000000 CR3: 0000000005db3000 CR4: 00000000000006f0
      [   45.576017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   45.576017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   45.576017] Process stress_cpu_hotp (pid: 607, threadinfo ffff880005d92000, task ffff8800066bbf40)
      [   45.576017] Stack:
      [   45.576017]  ffff880007a96400 0000000000000000 ffff880005d93d28 ffffffff813ac689
      [   45.576017]  ffff880007a96400 ffff880007a96400 0000000000000002 ffffffff81cd8d01
      [   45.576017]  ffff880005d93d58 ffffffff813aa498 0000000000000001 00000000ffffffdd
      [   45.576017] Call Trace:
      [   45.576017]  [<ffffffff813ac689>] acpi_processor_hotplug+0x55/0x97
      [   45.576017]  [<ffffffff813aa498>] acpi_cpu_soft_notify+0x93/0xce
      [   45.576017]  [<ffffffff816ae47d>] notifier_call_chain+0x5d/0x110
      [   45.576017]  [<ffffffff8109730e>] __raw_notifier_call_chain+0xe/0x10
      [   45.576017]  [<ffffffff81069050>] __cpu_notify+0x20/0x40
      [   45.576017]  [<ffffffff81069085>] cpu_notify+0x15/0x20
      [   45.576017]  [<ffffffff816978f1>] _cpu_up+0xee/0x137
      [   45.576017]  [<ffffffff81697983>] cpu_up+0x49/0x59
      [   45.576017]  [<ffffffff8168758d>] store_online+0x9d/0xe0
      [   45.576017]  [<ffffffff8140a9f8>] dev_attr_store+0x18/0x30
      [   45.576017]  [<ffffffff812322c0>] sysfs_write_file+0xe0/0x150
      [   45.576017]  [<ffffffff811b389c>] vfs_write+0xac/0x180
      [   45.576017]  [<ffffffff811b3be2>] sys_write+0x52/0xa0
      [   45.576017]  [<ffffffff816b31e9>] system_call_fastpath+0x16/0x1b
      [   45.576017] Code: 48 c7 c7 40 e5 ca 81 e8 07 d0 18 00 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 48 89 fb <f6> 07 02 75 13 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 84 00 00
      [   45.576017] RIP  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
      [   45.576017]  RSP <ffff880005d93ce8>
      [   45.576017] CR2: 0000000000000000
      [   45.656079] ---[ end trace 433d6c9ac0b02cef ]---
      
      Analysis:
      Commit 3d339dcb (cpuidle / ACPI : move cpuidle_device field out of the
      acpi_processor_power structure()) made the allocation of the dev structure
      (struct cpuidle) of a CPU dynamic, whereas previously it was statically
      allocated. And this dynamic allocation occurs in acpi_processor_power_init()
      if pr->flags.power evaluates to non-zero.
      
      On KVM guests, pr->flags.power evaluates to zero, hence dev is never
      allocated. This causes the NULL pointer (dev) dereference in
      cpuidle_disable_device() during a subsequent CPU online operation. Fix this
      by ensuring that dev is non-NULL before dereferencing.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
      cf31cd1a
  20. 11 Jul, 2012 1 commit
    • P
      PM / cpuidle: System resume hang fix with cpuidle · 8651f97b
      Preeti U Murthy committed
      On certain BIOSes, resume hangs if cpus are allowed to enter idle states
      during suspend [1].
      
      This was fixed in the acpi idle driver [2]. But the intel_idle driver does not
      have this fix. Thus instead of replicating the fix in both the idle
      drivers, or in more platform specific idle drivers if needed, the
      more general cpuidle infrastructure could handle this.
      
      A suspend callback in cpuidle_driver could handle this fix. But
      a cpuidle_driver provides only basic functionalities like platform idle
      state detection capability and mechanisms to support entry and exit
      into CPU idle states. All other cpuidle functions are found in the
      cpuidle generic infrastructure for good reason that all cpuidle
      drivers, irrespective of their platforms, will support these functions.
      
      One option therefore would be to register a suspend callback in cpuidle
      which handles this fix. This could be called through a PM_SUSPEND_PREPARE
      notifier. But this is too generic a notifier for a driver to handle.
      
      Also, ideally the job of cpuidle is not to handle side effects of suspend.
      It should expose the interfaces which "handle cpuidle 'during' suspend"
      or any other operation, which the subsystems call during that respective
      operation.
      
      The fix demands that during suspend, no cpus should be allowed to enter
      deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle
      ensures that. It also kicks all the cpus which are already in idle out
      of their idle states, which was being done during cpu hotplug
      through a CPU_DYING_FROZEN callback.
      
      Now the question arises of when during suspend
      cpuidle_uninstall_idle_handler() should be called. Since we are dealing with
      drivers, it seems best to call this function during dpm_suspend().
      Delaying the call till dpm_suspend_noirq() does no harm, as long as it is
      before cpu_hotplug_begin() to avoid race conditions with cpu hotplug
      operations. In dpm_suspend_noirq(), it would be wise to place this call
      before suspend_device_irqs() to avoid ugly interactions with the same.
      
      Analogously, during resume.
      
      References:
      [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075.
      [2] http://marc.info/?l=linux-pm&m=133958534231884&w=2
      Reported-and-tested-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      8651f97b
  21. 04 Jul, 2012 2 commits
    • R
      PM / Domains: Add preliminary support for cpuidle, v2 · cbc9ef02
      Rafael J. Wysocki committed
      On some systems there are CPU cores located in the same power
      domains as I/O devices.  Then, power can only be removed from the
      domain if all I/O devices in it are not in use and the CPU core
      is idle.  Add preliminary support for that to the generic PM domains
      framework.
      
      First, the platform is expected to provide a cpuidle driver with one
      extra state designated for use with the generic PM domains code.
      This state should be initially disabled and its exit_latency value
      should be set to whatever time is needed to bring up the CPU core
      itself after restoring power to it, not including the domain's
      power on latency.  Its .enter() callback should point to a procedure
      that will remove power from the domain containing the CPU core at
      the end of the CPU power transition.
      
      The remaining characteristics of the extra cpuidle state, referred to
      as the "domain" cpuidle state below, (e.g. power usage, target
      residency) should be populated in accordance with the properties of
      the hardware.
      
      Next, the platform should execute genpd_attach_cpuidle() on the PM
      domain containing the CPU core.  That will cause the generic PM
      domains framework to treat that domain in a special way such that:
      
       * When all devices in the domain have been suspended and it is about
         to be turned off, the states of the devices will be saved, but
         power will not be removed from the domain.  Instead, the "domain"
         cpuidle state will be enabled so that power can be removed from
         the domain when the CPU core is idle and the state has been chosen
         as the target by the cpuidle governor.
      
       * When the first I/O device in the domain is resumed and
         __pm_genpd_poweron() is called for the first time after
         power has been removed from the domain, the "domain" cpuidle
         state will be disabled to avoid subsequent surprise power removals
         via cpuidle.
      
      The effective exit_latency value of the "domain" cpuidle state
      depends on the time needed to bring up the CPU core itself after
      restoring power to it as well as on the power on latency of the
      domain containing the CPU core.  Thus the "domain" cpuidle state's
      exit_latency has to be recomputed every time the domain's power on
      latency is updated, which may happen every time power is restored
      to the domain, if the measured power on latency is greater than
      the latency stored in the corresponding generic_pm_domain structure.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      cbc9ef02
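The exit_latency bookkeeping described at the end of cbc9ef02 amounts to a sum that is refreshed whenever a larger power-on latency is measured. The following userspace sketch makes assumptions that the message does not state: all names are hypothetical, and every latency is held in microseconds in one struct so that the arithmetic stays visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a PM domain that contains a CPU core. */
struct sketch_domain {
    uint64_t power_on_latency_us;      /* stored domain power-on latency */
    uint64_t cpu_bringup_latency_us;   /* CPU-only bring-up time */
    uint64_t domain_exit_latency_us;   /* effective "domain" state latency */
};

/* Effective exit latency = CPU bring-up time + domain power-on latency. */
static void sketch_recompute_exit_latency(struct sketch_domain *d)
{
    assert(d != NULL);
    d->domain_exit_latency_us =
        d->cpu_bringup_latency_us + d->power_on_latency_us;
}

/* Called after power has been restored: only a measurement larger than
 * the stored value updates it and triggers a recomputation. */
static void sketch_note_power_on(struct sketch_domain *d, uint64_t measured_us)
{
    assert(d != NULL);
    if (measured_us > d->power_on_latency_us) {
        d->power_on_latency_us = measured_us;
        sketch_recompute_exit_latency(d);
    }
}
```

Because the stored latency only ever grows in this model, the effective exit latency is a worst-case figure, which is the conservative choice for a governor deciding whether the state is worth entering.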
    • S
      cpuidle: move field disable from per-driver to per-cpu · dc7fd275
      ShuoX Liu committed
      Andrew J. Schorr raised a question: when he changes the disable setting on
      a single CPU, it affects all the other CPUs.  Currently, the
      disable field is per-driver instead of per-cpu.  All the C states of the
      same driver are shared by all CPUs in the same machine.
      
      The patch changes the `disable' field to per-cpu, so we could set this
      separately for each cpu.
      Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
      Reported-by: Andrew J.Schorr <aschorr@telemetry-investments.com>
      Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      dc7fd275
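The data-layout change in dc7fd275 can be pictured with a small userspace model. This is a sketch under assumptions: the struct and function names are hypothetical, and the CPU and state counts are fixed small constants. The `disable` flag sits in per-CPU storage, so changing it for one CPU leaves the others untouched.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS   4   /* hypothetical CPU count */
#define SKETCH_NR_STATES 3   /* hypothetical number of C states */

/* Per-CPU, per-state bookkeeping; the disable flag lives here rather
 * than in the state table shared by the whole driver. */
struct sketch_state_usage {
    bool disable;
};

struct sketch_device {
    struct sketch_state_usage usage[SKETCH_NR_STATES];
};

/* One device per CPU; static storage starts out all-enabled. */
static struct sketch_device sketch_devs[SKETCH_NR_CPUS];

static void sketch_set_disable(int cpu, int state, bool disable)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    assert(state >= 0 && state < SKETCH_NR_STATES);
    sketch_devs[cpu].usage[state].disable = disable;
}

static bool sketch_state_allowed(int cpu, int state)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    assert(state >= 0 && state < SKETCH_NR_STATES);
    return !sketch_devs[cpu].usage[state].disable;
}
```

Had the flag stayed in a single driver-wide state table, `sketch_set_disable()` would have had no `cpu` argument to index by, which is exactly the behaviour the report complained about.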
  22. 02 Jun, 2012 4 commits
    • C
      cpuidle: add support for states that affect multiple cpus · 4126c019
      Colin Cross committed
      On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
      cpus cannot be independently powered down, either due to
      sequencing restrictions (on Tegra 2, cpu 0 must be the last to
      power down), or due to HW bugs (on OMAP4460, a cpu powering up
      will corrupt the gic state unless the other cpu runs a work
      around).  Each cpu has a power state that it can enter without
      coordinating with the other cpu (usually Wait For Interrupt, or
      WFI), and one or more "coupled" power states that affect blocks
      shared between the cpus (L2 cache, interrupt controller, and
      sometimes the whole SoC).  Entering a coupled power state must
      be tightly controlled on both cpus.
      
      The easiest solution to implementing coupled cpu power states is
      to hotplug all but one cpu whenever possible, usually using a
      cpufreq governor that looks at cpu load to determine when to
      enable the secondary cpus.  This causes problems, as hotplug is an
      expensive operation, so the number of hotplug transitions must be
      minimized, leading to very slow response to loads, often on the
      order of seconds.
      
      This file implements an alternative solution, where each cpu will
      wait in the WFI state until all cpus are ready to enter a coupled
      state, at which point the coupled state function will be called
      on all cpus at approximately the same time.
      
      Once all cpus are ready to enter idle, they are woken by an smp
      cross call.  At this point, there is a chance that one of the
      cpus will find work to do, and choose not to enter idle.  A
      final pass is needed to guarantee that all cpus will call the
      power state enter function at the same time.  During this pass,
      each cpu will increment the ready counter, and continue once the
      ready counter matches the number of online coupled cpus.  If any
      cpu exits idle, the other cpus will decrement their counter and
      retry.
      
      To use coupled cpuidle states, a cpuidle driver must:
      
         Set struct cpuidle_device.coupled_cpus to the mask of all
         coupled cpus, usually the same as cpu_possible_mask if all cpus
         are part of the same cluster.  The coupled_cpus mask must be
         set in the struct cpuidle_device for each cpu.
      
         Set struct cpuidle_device.safe_state to a state that is not a
         coupled state.  This is usually WFI.
      
         Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
         state that affects multiple cpus.
      
         Provide a struct cpuidle_state.enter function for each state
         that affects multiple cpus.  This function is guaranteed to be
         called on all cpus at approximately the same time.  The driver
         should ensure that the cpus all abort together if any cpu tries
         to abort once the function is called.
      
      update1:
      
      cpuidle: coupled: fix count of online cpus
      
      online_count was never incremented on boot, and was also counting
      cpus that were not part of the coupled set.  Fix both issues by
      introducing a new function that counts online coupled cpus, and
      call it from register as well as the hotplug notifier.
      
      update2:
      
      cpuidle: coupled: fix decrementing ready count
      
      cpuidle_coupled_set_not_ready sometimes refuses to decrement the
      ready count in order to prevent a race condition.  This makes it
      unsuitable for use when finished with idle.  Add a new function
      cpuidle_coupled_set_done that decrements both the ready count and
      waiting count, and call it after idle is complete.
      
      Cc: Amit Kucheria <amit.kucheria@linaro.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Trinabh Gupta <g.trinabh@gmail.com>
      Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Tested-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Colin Cross <ccross@android.com>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Len Brown <len.brown@intel.com>
      4126c019
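The ready-counter pass described in 4126c019 can be sketched with C11 atomics. This is a simplified userspace model, not the kernel implementation: the struct and function names are hypothetical, and the waiting count, the cross call and the race handling mentioned in the updates are all left out. Each CPU bumps the counter when it is ready, proceeds only once the counter equals the number of online coupled CPUs, and drops its contribution again if the pass has to be retried.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared bookkeeping for one set of coupled CPUs. */
struct sketch_coupled {
    atomic_int ready_count;   /* CPUs that have announced readiness */
    int online_count;         /* online CPUs in the coupled set */
};

/* Announce readiness. Returns true if this CPU was the last one, i.e.
 * the counter now matches the number of online coupled CPUs. */
static bool sketch_set_ready(struct sketch_coupled *c)
{
    assert(c != NULL);
    int now = atomic_fetch_add(&c->ready_count, 1) + 1;
    assert(now <= c->online_count);
    return now == c->online_count;
}

/* Withdraw readiness, e.g. because another CPU left idle and the pass
 * has to be retried. */
static void sketch_set_not_ready(struct sketch_coupled *c)
{
    assert(c != NULL);
    int before = atomic_fetch_sub(&c->ready_count, 1);
    assert(before > 0);
}

/* True when every online coupled CPU has announced readiness. */
static bool sketch_all_ready(struct sketch_coupled *c)
{
    assert(c != NULL);
    return atomic_load(&c->ready_count) == c->online_count;
}
```

In a two-CPU set, the first caller of `sketch_set_ready()` sees `false` and would keep waiting in its safe state, while the second sees `true`; at that point `sketch_all_ready()` holds for both and the coupled enter function can run everywhere.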
    • C
      cpuidle: fix error handling in __cpuidle_register_device · 3af272ab
      Colin Cross committed
      Fix the error handling in __cpuidle_register_device to include
      the missing list_del.  Move it to a label, which will simplify
      the error handling when coupled states are added.
      Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Tested-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Colin Cross <ccross@android.com>
      Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Len Brown <len.brown@intel.com>
      3af272ab
    • C
      cpuidle: refactor out cpuidle_enter_state · 56cfbf74
      Colin Cross committed
      Split the code to enter a state and update the stats into a helper
      function, cpuidle_enter_state, and export it.  This function will
      be called by the coupled state code to handle entering the safe
      state and the final coupled state.
      Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Tested-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Colin Cross <ccross@android.com>
      Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Len Brown <len.brown@intel.com>
      56cfbf74
    • S
      cpuidle: add checks to avoid NULL pointer dereference · 1b0a0e9a
      Srivatsa S. Bhat committed
      The existing check for dev == NULL in __cpuidle_register_device() is
      rendered useless because dev is dereferenced before the check itself.
      Moreover, correctly speaking, it is the job of the callers of this
      function, i.e., cpuidle_register_device() & cpuidle_enable_device() (which
      also happen to be exported functions) to ensure that
      __cpuidle_register_device() is called with a non-NULL dev.
      
      So add the necessary dev == NULL checks in the two callers and remove the
      (useless) check from __cpuidle_register_device().
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Len Brown <len.brown@intel.com>
      1b0a0e9a