提交 · 732b6d617a4cfd8363d1f70a06bff38b8c1a19e9 · openeuler / raspberrypi-kernel

15 6月, 2015 3 次提交

cpufreq: governor: Serialize governor callbacks · 732b6d61

由 Viresh Kumar 提交于 6月 03, 2015

There are several races reported in cpufreq core around governors (only
ondemand and conservative) by different people.

There are at least two race scenarios present in governor code:
 (a) Concurrent access/updates of governor internal structures.

 It is possible that fields such as 'dbs_data->usage_count', etc.  are
 accessed simultaneously for different policies using same governor
 structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And
 because of this we can dereference bad pointers.

 For example consider a system with two CPUs with separate 'struct
 cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave.
 CPU0 switching to powersave and CPU1 to ondemand:
	CPU0				CPU1

	store*				store*

	cpufreq_governor_exit()		cpufreq_governor_init()
					dbs_data = cdata->gdbs_data;

	if (!--dbs_data->usage_count)
		kfree(dbs_data);

					dbs_data->usage_count++;
					*Bad pointer dereference*

 There are other races possible between EXIT and START/STOP/LIMIT as
 well. Its really complicated.

 (b) Switching governor state in bad sequence:

 For example trying to switch a governor to START state, when the
 governor is in EXIT state. There are some checks present in
 __cpufreq_governor() but they aren't sufficient as they compare events
 against 'policy->governor_enabled', where as we need to take governor's
 state into account, which can be used by multiple policies.

These two issues need to be solved separately and the responsibility
should be properly divided between cpufreq and governor core.

The first problem is more about the governor core, as it needs to
protect its structures properly. And the second problem should be fixed
in cpufreq core instead of governor, as its all about sequence of
events.

This patch is trying to solve only the first problem.

There are two types of data we need to protect,
- 'struct common_dbs_data': No matter what, there is going to be a
  single copy of this per governor.
- 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we
  will have per-policy copy of this data, otherwise a single copy.

Because of such complexities, the mutex present in 'struct dbs_data' is
insufficient to solve our problem. For example we need to protect
fetching of 'dbs_data' from different structures at the beginning of
cpufreq_governor_dbs(), to make sure it isn't currently being updated.

This can be fixed if we can guarantee serialization of event parsing
code for an individual governor. This is best solved with a mutex per
governor, and the placeholder for that is 'struct common_dbs_data'.

And so this patch moves the mutex from 'struct dbs_data' to 'struct
common_dbs_data' and takes it at the beginning and drops it at the end
of cpufreq_governor_dbs().

Tested with and without following configuration options:

CONFIG_LOCKDEP_SUPPORT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

732b6d61

cpufreq: governor: split cpufreq_governor_dbs() · 714a2d9c

由 Viresh Kumar 提交于 6月 04, 2015

cpufreq_governor_dbs() is hardly readable, it is just too big and
complicated. Lets make it more readable by splitting out event specific
routines.

Order of statements is changed at few places, but that shouldn't bring
any functional change.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

714a2d9c

cpufreq: governor: register notifier from cs_init() · 8e0484d2

由 Viresh Kumar 提交于 6月 03, 2015

Notifiers are required only for conservative governor and the common
governor code is unnecessarily polluted with that. Handle that from
cs_init/exit() instead of cpufreq_governor_dbs().
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

8e0484d2

11 6月, 2015 5 次提交

cpufreq: Remove cpufreq_update_policy() · 37829029

由 Viresh Kumar 提交于 6月 08, 2015

cpufreq_update_policy() was kept as a separate routine earlier as it was
handling migration of sysfs directories, which isn't the case anymore.
It is only updating policy->cpu now and is called by a single caller.

The WARN_ON() isn't really required anymore, as we are just updating the
cpu now, not moving the sysfs directories.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

37829029

cpufreq: Restart governor as soon as possible · 9591becb

由 Viresh Kumar 提交于 6月 10, 2015

__cpufreq_remove_dev_finish() is doing two things today:
- Restarts the governor if some CPUs from concerned policy are still
  online.
- Frees the policy if all CPUs are offline.

The first task of restarting the governor can be moved to
__cpufreq_remove_dev_prepare() to restart the governor early. There is
no race between _prepare() and _finish() as they would be handling
completely different cases. _finish() will only be required if we are
going to free the policy and that has nothing to do with restarting the
governor.
Original-by: NSaravana Kannan <skannan@codeaurora.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

9591becb

cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free() · 3654c5cc

由 Viresh Kumar 提交于 6月 08, 2015

cpufreq_policy_put_kobj() is actually part of freeing the policy and can
be called from cpufreq_policy_free() directly instead of a separate
call.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

3654c5cc

cpufreq: Initialize policy->kobj while allocating policy · 2fc3384d

由 Viresh Kumar 提交于 6月 08, 2015

policy->kobj is required to be initialized once in the lifetime of a
policy. Currently we are initializing it from __cpufreq_add_dev() and
that doesn't look to be the best place for doing so as we have to do
this on special cases (like: !recover_policy).

We can initialize it from a more obvious place cpufreq_policy_alloc()
and that will make code look cleaner, specially the error handling part.

The error handling part of __cpufreq_add_dev() was doing almost the same
thing while recover_policy is true or false. Fix that as well by always
calling cpufreq_policy_put_kobj() with an additional parameter to skip
notification part of it.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

2fc3384d

cpufreq: Stop migrating sysfs files on hotplug · 87549141

由 Viresh Kumar 提交于 6月 10, 2015

When we hot-unplug a cpu, we remove its sysfs cpufreq directory and if
the outgoing cpu was the owner of policy->kobj earlier then we migrate
the sysfs directory to under another online cpu.

There are few disadvantages this brings:
- Code Complexity
- Slower hotplug/suspend/resume
- sysfs file permissions are reset after all policy->cpus are offlined
- CPUFreq stats history lost after all policy->cpus are offlined
- Special management of sysfs stuff during suspend/resume

To overcome these, this patch modifies the way sysfs directories are
managed:
- Select sysfs kobjects owner while initializing policy and don't change
  it during hotplugs. Track it with kobj_cpu created earlier.

- Create symlinks for all related CPUs (can be offline) instead of
  affected CPUs on policy initialization and remove them only when the
  policy is freed.

- Free policy structure only on the removal of cpufreq-driver and not
  during hotplug/suspend/resume, detected by checking 'struct
  subsys_interface *' (Valid only when called from
  subsys_interface_unregister() while unregistering driver).

Apart from this, special care is taken to handle physical hoplug of CPUs
as we wouldn't remove sysfs links or remove policies on logical
hotplugs. Physical hotplug happens in the following sequence.

Hot removal:
- CPU is offlined first, ~ 'echo 0 >
  /sys/devices/system/cpu/cpuX/online'
- Then its device is removed along with all sysfs files, cpufreq core
  notified with cpufreq_remove_dev() callback from subsys-interface..

Hot addition:
- First the device along with its sysfs files is added, cpufreq core
  notified with cpufreq_add_dev() callback from subsys-interface..
- CPU is onlined, ~ 'echo 1 > /sys/devices/system/cpu/cpuX/online'

We call the same routines with both hotplug and subsys callbacks, and we
sense physical hotplug with cpu_offline() check in subsys callback. We
can handle most of the stuff with regular hotplug callback paths and
add/remove cpufreq sysfs links or free policy from subsys callbacks.
Original-by: NSaravana Kannan <skannan@codeaurora.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

87549141

10 6月, 2015 3 次提交

cpufreq: Don't allow updating inactive policies from sysfs · 11e584cf

由 Viresh Kumar 提交于 6月 10, 2015

Later commits would change the way policies are managed today. Policies
wouldn't be freed on cpu hotplug (currently they aren't freed only for
suspend), and while the CPU is offline, the sysfs cpufreq files would
still be present.

User may accidentally try to update the sysfs files in following
directory: '/sys/devices/system/cpu/cpuX/cpufreq/'. And that would
result in undefined behavior as policy wouldn't be active then.

Apart from updating the store() routine, we also update __cpufreq_get()
which can call cpufreq_out_of_sync(). The later routine tries to update
policy->cur and starts notifying kernel about it.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Acked-by: NSaravana Kannan <skannan@codeaurora.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

11e584cf

intel_pstate: Force setting target pstate when required · 6c1e4591

由 Doug Smythies 提交于 6月 01, 2015

During initialization and exit it is possible that the target pstate
might not actually be set. Furthermore, the result can be that the
driver and the processor are out of synch and, under some conditions,
the driver might never send the processor the proper target pstate.

This patch adds a bypass or do_checks flag to the call to
intel_pstate_set_pstate. If bypass, then specifically bypass clamp
checks and the do not send if it is the same as last time check. If
do_checks, then, and as before, do the current policy clamp checks,
and do not do actual send if the new target is the same as the old.
Signed-off-by: NDoug Smythies <dsmythies@telus.net>
Reported-by: NMarien Zwart <marien.zwart@gmail.com>
Reported-by: NAlex Lochmann <alexander.lochmann@tu-dortmund.de>
Reported-by: NPiotr Ko?aczkowski <pkolaczk@gmail.com>
Reported-by: NClemens Eisserer <linuxhippy@gmail.com>
Tested-by: NMarien Zwart <marien.zwart@gmail.com>
Tested-by: NDoug Smythies <dsmythies@telus.net>
[ rjw: Dropped pointless symbol definitions, rebased ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

6c1e4591

intel_pstate: change some inconsistent debug information · f16255eb

由 Doug Smythies 提交于 5月 31, 2015

Commit ce717613 (intel_pstate: Turn per cpu printk into pr_debug)
turned per cpu printk into pr_debug.  However, only half of the change
was done, introducing an inconsistency between entry and exit from
driver pstate control.  This patch changes the exit message to pr_debug
also.

The various messages are inconsistent with respect to any identifier
text that can be used to help isolate the desired information from a
huge log.  This patch makes a consistent identifier portion of the
string.

Amends: ce717613 (intel_pstate: Turn per cpu printk into pr_debug)
Signed-off-by: NDoug Smythies <dsmythies@telus.net>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

f16255eb

23 5月, 2015 2 次提交

cpufreq: Track cpu managing sysfs kobjects separately · 9d16f207

由 Saravana Kannan 提交于 5月 18, 2015

In order to prepare for the next few commits, that will stop migrating
sysfs files on cpu hotplug, this patch starts managing sysfs-cpu
separately.

The behavior is still the same as we are still migrating sysfs files on
hotplug, later commits would change that.
Signed-off-by: NSaravana Kannan <skannan@codeaurora.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

9d16f207

cpufreq: Fix for typos in two comments · 58405af6

由 Shailendra Verma 提交于 5月 22, 2015

Signed-off-by: NShailendra Verma <shailendra.capricorn@gmail.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

58405af6

15 5月, 2015 7 次提交

cpufreq: Mark policy->governor = NULL for inactive policies · 18bf3a12

由 Viresh Kumar 提交于 5月 12, 2015

Later commits would change the way policies are managed today. Policies
wouldn't be freed on cpu hotplug (currently they aren't freed on
suspend), and while the CPU is offline, the sysfs cpufreq files would
still be present.

Because we don't mark policy->governor as NULL, it still contains
pointer of the last used governor. And if the governor is removed, while
all the CPUs of a policy are hotplugged out, this pointer wouldn't be
valid anymore. And if we try to read the 'scaling_governor', etc.  from
sysfs, it will result in kernel OOPs.

To prevent this, mark policy->governor as NULL for all inactive policies
while the governor is removed from kernel.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

18bf3a12

cpufreq: Manage governor usage history with 'policy->last_governor' · 4573237b

由 Viresh Kumar 提交于 5月 12, 2015

History of which governor was used last is common to all CPUs within a
policy and maintaining it per-cpu isn't the best approach for sure.

Apart from wasting memory, this also increases the complexity of
managing this data structure as it has to be updated for all CPUs.

To make that somewhat simpler, lets store this information in a new
field 'last_governor' in struct cpufreq_policy and update it on removal
of last cpu of a policy.

As a side-effect it also solves an old problem, consider a system with
two clusters 0 & 1. And there is one policy per cluster.

Cluster 0: CPU0 and 1.
Cluster 1: CPU2 and 3.

 - CPU2 is first brought online, and governor is set to performance
   (default as cpufreq_cpu_governor wasn't set).
 - Governor is changed to ondemand.
 - CPU2 is taken offline and cpufreq_cpu_governor is updated for CPU2.
 - CPU3 is brought online.
 - Because cpufreq_cpu_governor wasn't set for CPU3, the default governor
   performance is picked for CPU3.

This patch fixes the bug as we now have a single variable to update for
policy.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

4573237b

cpufreq: Don't traverse all active policies to find policy for a cpu · 9104bb26

由 Viresh Kumar 提交于 5月 12, 2015

We reach here while adding policy for a CPU and enter into the 'if'
block only if a policy already exists for the CPU.

As cpufreq_cpu_data is set for all policy->related_cpus now, when the
policy is first added, we can use that to find the CPU's policy instead
of traversing the list of all active policies.
Acked-by: NSaravana Kannan <skannan@codeaurora.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

9104bb26

cpufreq: Get rid of cpufreq_cpu_data_fallback · 3914d379

由 Viresh Kumar 提交于 5月 08, 2015

We can extract the same information from cpufreq_cpu_data as it is also
available for inactive policies now. And so don't need
cpufreq_cpu_data_fallback anymore.

Also add a WARN_ON() for the case where we try to restore from an active
policy.
Acked-by: NSaravana Kannan <skannan@codeaurora.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

3914d379

cpufreq: Don't clear cpufreq_cpu_data and policy list for inactive policies · 988bed09

由 Viresh Kumar 提交于 5月 08, 2015

Now that we can check policy->cpus to find if policy is active or not,
we don't need to clean cpufreq_cpu_data and delete policy from the list
on light weight tear down of policies (like in suspend).

To make it consistent and clean, set cpufreq_cpu_data for all related
CPUs when the policy is first created and clean it only while it is
freed.

Also update cpufreq_cpu_get_raw() to check if cpu is part of
policy->cpus mask, so that we don't end up getting policies for offline
CPUs.

In order to make sure that no users of 'policy' are using an inactive
policy, use cpufreq_cpu_get_raw() instead of directly accessing
cpufreq_cpu_data.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

988bed09

cpufreq: Create for_each_{in}active_policy() · f963735a

由 Viresh Kumar 提交于 5月 12, 2015

policy->cpus is cleared unconditionally now on hotplug-out of a CPU and
it can be checked to know if a policy is active or not. Create helper
routines to iterate over all active/inactive policies, based on
policy->cpus field.

Replace all instances of for_each_policy() with for_each_active_policy()
to make them iterate only for active policies. (We haven't made changes
yet to keep inactive policies in the same list, but that will be
followed in a later patch).
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

f963735a

cpufreq: arm_big_little: remove compile-time dependency on BIG_LITTLE · 14730145

由 Sudeep Holla 提交于 5月 13, 2015

With the addition of switcher code, there's compile-time dependency on
BIG_LITTLE to get arm_big_little driver compiling on ARM64. Since ARM64
will never add support for bL switcher, it's better to remove the
dependency so that the driver can be reused on ARM64 platforms.

This patch adds stubs to remove BIG_LITTLE dependency in the driver.
Signed-off-by: NSudeep Holla <sudeep.holla@arm.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

14730145

13 5月, 2015 1 次提交

intel_pstate: set BYT MSR with wrmsrl_on_cpu() · 0dd23f94

由 Joe Konno 提交于 5月 12, 2015

Commit 007bea09 (intel_pstate: Add setting voltage value for
baytrail P states.) introduced byt_set_pstate() with the assumption that
it would always be run by the CPU whose MSR is to be written by it.  It
turns out, however, that is not always the case in practice, so modify
byt_set_pstate() to enforce the MSR write done by it to always happen on
the right CPU.

Fixes: 007bea09 (intel_pstate: Add setting voltage value for baytrail P states.)
Signed-off-by: NJoe Konno <joe.konno@intel.com>
Acked-by: NKristen Carlson Accardi <kristen@linux.intel.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

0dd23f94

08 5月, 2015 5 次提交

cpufreq: Clear policy->cpus even for the last CPU · 303ae723