提交 · b7723f9d21d8d6043e63f5e3e412f321f5f1900c · openanolis / cloud-kernel

10 5月, 2010 2 次提交

ondemand: Make the iowait-is-busy time a sysfs tunable · 19379b11

由 Arjan van de Ven 提交于 5月 09, 2010

Pavel Machek pointed out that not all CPUs have an efficient
idle at high frequency. Specifically, older Intel and various
AMD cpus would get a higher powerusage when copying files from
USB.

Mike Chan pointed out that the same is true for various ARM
chips as well.

Thomas Renninger suggested to make this a sysfs tunable with a
reasonable default.

This patch adds a sysfs tunable for the new behavior, and uses
a very simple function to determine a reasonable default,
depending on the CPU vendor/type.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Acked-by: NRik van Riel <riel@redhat.com>
Acked-by: NPavel Machek <pavel@ucw.cz>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082651.46914d04@infradead.org>
[ minor tidyup ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

19379b11

ondemand: Solve a big performance issue by counting IOWAIT time as busy · 6b8fcd90

由 Arjan van de Ven 提交于 5月 09, 2010

The ondemand cpufreq governor uses CPU busy time (e.g. not-idle
time) as a measure for scaling the CPU frequency up or down.
If the CPU is busy, the CPU frequency scales up, if it's idle,
the CPU frequency scales down. Effectively, it uses the CPU busy
time as proxy variable for the more nebulous "how critical is
performance right now" question.

This algorithm falls flat on its face in the light of workloads
where you're alternatingly disk and CPU bound, such as the ever
popular "git grep", but also things like startup of programs and
maildir using email clients... much to the chagarin of Andrew
Morton.

This patch changes the ondemand algorithm to count iowait time
as busy, not idle, time. As shown in the breakdown cases above,
iowait is performance critical often, and by counting iowait,
the proxy variable becomes a more accurate representation of the
"how critical is performance" question.

The problem and fix are both verified with the "perf timechar"
tool.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDave Jones <davej@redhat.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100509082606.3d9f00d0@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6b8fcd90

01 4月, 2010 2 次提交

[CPUFREQ] use max load in conservative governor · fd187aaf

由 Dominik Brodowski 提交于 3月 26, 2010

Instead of using the load of the last CPU in a package, use the
maximum load of all CPUs in a package.
Reported-by: NJean-Christian Goussard <jeanchristian.goussard@sfr.fr>
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NDave Jones <davej@redhat.com>

fd187aaf

[CPUFREQ] fix a lockdep warning · 499bca9b

由 Amerigo Wang 提交于 3月 04, 2010

There is no need to do sysfs_remove_link() or kobject_put() etc.
when policy_rwsem_write is held, move them after releasing the lock.

This fixes the lockdep warning:

halt/4071 is trying to acquire lock:
 (s_active){++++.+}, at: [<c0000000001ef868>] .sysfs_addrm_finish+0x58/0xc0

but task is already holding lock:
 (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>] .lock_policy_rwsem_write+0x84/0xf4
Reported-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NWANG Cong <amwang@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NDave Jones <davej@redhat.com>

499bca9b

30 3月, 2010 1 次提交

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

由 Tejun Heo 提交于 3月 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

08 3月, 2010 1 次提交

Driver core: Constify struct sysfs_ops in struct kobj_type · 52cf25d0

由 Emese Revfy 提交于 1月 19, 2010

Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing
Signed-off-by: NEmese Revfy <re.emese@gmail.com>
Acked-by: NDavid Teigland <teigland@redhat.com>
Acked-by: NMatt Domsch <Matt_Domsch@dell.com>
Acked-by: NMaciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: NHans J. Koch <hjk@linutronix.de>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Acked-by: NJens Axboe <jens.axboe@oracle.com>
Acked-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

52cf25d0

13 1月, 2010 1 次提交

[CPUFREQ] Fix ondemand to not request targets outside policy limits · 1dbf5888

由 Nagananda.Chumbalkar@hp.com 提交于 12月 21, 2009

Dominik said:
target_freq cannot be below policy->min or above policy->max.
If it were, the whole cpufreq subsystem is broken.

But (answer):
I think the "ondemand" governor can ask for a target frequency that is
below policy->min.
...
A patch such as below may be needed to sanitize the target frequency
requested by "ondemand". The "conservative" governor already has this check:
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NDave Jones <davej@redhat.com>

# diff -bur x/drivers/cpufreq/cpufreq_ondemand.c.orig y/drivers/cpufreq/cpufreq_ondemand.c

1dbf5888

25 11月, 2009 3 次提交

[ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface · e2f74f35

由 Thomas Renninger 提交于 11月 19, 2009

This interface is mainly intended (and implemented) for ACPI _PPC BIOS
frequency limitations, but other cpufreq drivers can also use it for
similar use-cases.

Why is this needed:

Currently it's not obvious why cpufreq got limited.
People see cpufreq/scaling_max_freq reduced, but this could have
happened by:
  - any userspace prog writing to scaling_max_freq
  - thermal limitations
  - hardware (_PPC in ACPI case) limitiations

Therefore export bios_limit (in kHz) to:
  - Point the user that it's the BIOS (broken or intended) which limits
    frequency
  - Export it as a sysfs interface for userspace progs.
    While this was a rarely used feature on laptops, there will appear
    more and more server implemenations providing "Green IT" features like
    allowing the service processor to limit the frequency. People want
    to know about HW/BIOS frequency limitations.

All ACPI P-state driven cpufreq drivers are covered with this patch:
  - powernow-k8
  - powernow-k7
  - acpi-cpufreq

Tested with a patched DSDT which limits the first two cores (_PPC returns 1)
via _PPC, exposed by bios_limit:
# echo 2200000 >cpu2/cpufreq/scaling_max_freq
# cat cpu*/cpufreq/scaling_max_freq
2600000
2600000
2200000
2200000
# #scaling_max_freq shows general user/thermal/BIOS limitations

# cat cpu*/cpufreq/bios_limit
2600000
2600000
2800000
2800000
# #bios_limit only shows the HW/BIOS limitation

CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
CC: Len Brown <lenb@kernel.org>
CC: davej@codemonkey.org.uk
CC: linux@dominikbrodowski.net
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NDave Jones <davej@redhat.com>

e2f74f35

[CPUFREQ] make internal cpufreq_add_dev_* static · cf3289d0

由 Alex Chiang 提交于 11月 17, 2009

No need to export these symbols; make them static.

	cpufreq_add_dev_policy
	cpufreq_add_dev_symlink
	cpufreq_add_dev_interface
Signed-off-by: NAlex Chiang <achiang@hp.com>
Signed-off-by: NDave Jones <davej@redhat.com>

cf3289d0

[CPUFREQ] Use global sysfs cpufreq structure for conservative governor tunings · 49b015ce

由 Thomas Renninger 提交于 10月 01, 2009

Same adustments that have been added to the ondemand recently.
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NDave Jones <davej@redhat.com>

49b015ce

18 11月, 2009 3 次提交

[CPUFREQ] Fix stale cpufreq_cpu_governor pointer · 90e41bac

由 Prarit Bhargava 提交于 11月 12, 2009

Dave,

Attached is an update of my patch against the cpufreq fixes branch.

Before applying the patch I compiled and booted the tree to see if the panic
was still there -- to my surprise it was not.  This is because of the conversion
of cpufreq_cpu_governor to a char[].

While the panic is kaput, the problem of stale data continues and my patch is
still valid.  It is possible to end up with the wrong governor after hotplug
events because CPUFREQ_DEFAULT_GOVERNOR is statically linked to a default,
while the cpu siblings may have had a different governor assigned by a user.

ie) the patch is still needed in order to keep the governors assigned
properly when hotplugging devices
Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
Signed-off-by: NDave Jones <davej@redhat.com>

90e41bac

[CPUFREQ] Resolve time unit thinko in ondemand/conservative govs · 54c9a35d

由 Pallipadi, Venkatesh 提交于 11月 11, 2009

ondemand and conservative governors are messing up time units in the
code path where NO_HZ is not enabled and ignore_nice is set. The walltime
idletime stored is in jiffies and nice time calculation is happening in
microseconds.

The problem was reported and diagnosed by Alexander here.
http://marc.info/?l=linux-kernel&m=125752550404513&w=2

The patch below fixes this thinko.
Reported-by: NAlexander Miller <Miller@fmi.uni-stuttgart.de>
Tested-by: NAlexander Miller <Miller@fmi.uni-stuttgart.de>
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NDave Jones <davej@redhat.com>

54c9a35d

[CPUFREQ] Fix use after free on governor restore · e77b89f1

由 Dmitry Monakhov 提交于 10月 05, 2009

Currently on governer backup/restore path we storing governor's pointer.
This is wrong because one may unload governor's module after cpu goes
offline. As result use-after-free will take place on restored cpu.
It is not easy to exploit this bug, but still we have to close this
issue ASAP. Issue was introduced by following commit
084f3493

##TESTCASE##
#!/bin/sh -x
modprobe acpi_cpufreq
# Any non default governor, in may case it is "ondemand"
modprobe cpufreq_ondemand
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
rmmod acpi_cpufreq
rmmod cpufreq_ondemand
modprobe acpi_cpufreq  # << use-after-free here.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NDave Jones <davej@redhat.com>

e77b89f1

29 10月, 2009 1 次提交

percpu: make percpu symbols in cpufreq unique · f1625066

由 Tejun Heo 提交于 10月 29, 2009

This patch updates percpu related symbols in cpufreq such that percpu
symbols are unique and don't clash with local symbols.  This serves
two purposes of decreasing the possibility of global percpu symbol
collision and allowing dropping per_cpu__ prefix from percpu symbols.

* drivers/cpufreq/cpufreq.c: s/policy_cpu/cpufreq_policy_cpu/
* drivers/cpufreq/freq_table.c: s/show_table/cpufreq_show_table/
* arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c: s/drv_data/acfreq_data/
  					      s/old_perf/acfreq_old_perf/

Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>

f1625066

02 9月, 2009 10 次提交

[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site) · 395913d0

由 Mathieu Desnoyers 提交于 6月 08, 2009

remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)

commit	42a06f21

Missed a call site for CPUFREQ_GOV_STOP to remove the rwlock taken around the
teardown. To make a long story short, the rwlock write-lock causes a circular
dependency with cancel_delayed_work_sync(), because the timer handler takes the
read lock.

Note that all callers to __cpufreq_set_policy are taking the rwsem. All sysfs
callers (writers) hold the write rwsem at the earliest sysfs calling stage.

However, the rwlock write-lock is not needed upon governor stop.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: trenn@suse.de
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
Signed-off-by: NDave Jones <davej@redhat.com>

395913d0

[CPUFREQ] ondemand - Use global sysfs dir for tuning settings · 0e625ac1

由 Thomas Renninger 提交于 7月 24, 2009

Ondemand has only global variables for userspace tunings via sysfs.
But they were exposed per CPU which wrongly implies to the user that
his settings are applied per cpu. Also locking sysfs against concurrent
access won't be necessary anymore after deprecation time.

This means the ondemand config dir is moved:
/sys/devices/system/cpu/cpu*/cpufreq/ondemand ->
     /sys/devices/system/cpu/cpufreq/ondemand

The old files will still exist, but reading or writing to them will
result in one (printk_once) deprecation msg to syslog per file.
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NDave Jones <davej@redhat.com>

0e625ac1

[CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreq · 8aa84ad8

由 Thomas Renninger 提交于 7月 24, 2009

Currently everything in the cpufreq layer is per core based.
This does not reflect reality, for example ondemand on conservative
governors have global sysfs variables.

Introduce a global cpufreq directory and add the kobject to the governor
struct, so that governors can easily access it.
The directory is initialized in the cpufreq_core_init initcall and thus will
always be created if cpufreq is compiled in, even if no cpufreq driver is
active later.
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NDave Jones <davej@redhat.com>

8aa84ad8

[CPUFREQ] Bail out of cpufreq_add_dev if the link for a managed CPU got created · 4bfa042c

由 Thomas Renninger 提交于 7月 24, 2009

Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online

on a managed CPU will result in:
Jul 22 15:15:37 linux kernel: [ 80.013864] WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xcf/0xe6()
Jul 22 15:15:37 linux kernel: [ 80.013866] Hardware name: To Be Filled By O.E.M.
Jul 22 15:15:37 linux kernel: [ 80.013868] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Jul 22 15:15:37 linux kernel: [ 80.013870] Modules linked in: powernow_k8
Jul 22 15:15:37 linux kernel: [ 80.013874] Pid: 5750, comm: bash Not tainted 2.6.31-rc2 #40
Jul 22 15:15:37 linux kernel: [ 80.013876] Call Trace:
Jul 22 15:15:37 linux kernel: [ 80.013879] [<ffffffff8112ebda>] ? sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [ 80.013884] [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 15:15:37 linux kernel: [ 80.013888] [<ffffffff810419a0>] warn_slowpath_fmt+0x3c/0x3e
Jul 22 15:15:37 linux kernel: [ 80.013891] [<ffffffff8112ebda>] sysfs_add_one+0xcf/0xe6
Jul 22 15:15:37 linux kernel: [ 80.013894] [<ffffffff8112f213>] create_dir+0x58/0x87
Jul 22 15:15:37 linux kernel: [ 80.013898] [<ffffffff8112f27a>] sysfs_create_dir+0x38/0x4f
Jul 22 15:15:37 linux kernel: [ 80.013902] [<ffffffff811ffb8a>] kobject_add_internal+0x11f/0x1de
Jul 22 15:15:37 linux kernel: [ 80.013905] [<ffffffff811ffd21>] kobject_add_varg+0x41/0x4e
Jul 22 15:15:37 linux kernel: [ 80.013908] [<ffffffff811ffd7a>] kobject_init_and_add+0x4c/0x57
Jul 22 15:15:37 linux kernel: [ 80.013913] [<ffffffff810667bc>] ? mark_lock+0x22/0x228
Jul 22 15:15:37 linux kernel: [ 80.013918] [<ffffffff813e8a3b>] cpufreq_add_dev_interface+0x40/0x1e4
...

This bug slipped in by git commit:
150b06f7f223cfd0f808737a5243cceca8ea47fa

When splitting up cpufreq_add_dev, the whole cpufreq_add_dev function
is not left anymore, only cpufreq_add_dev_policy.
This patch should reconstruct the identical functionality again as it
was before the split.

CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NDave Jones <davej@redhat.com>

4bfa042c

D
[CPUFREQ] Factor out policy setting from cpufreq_add_dev · ecf7e461
由 Dave Jones 提交于 7月 08, 2009
```
Signed-off-by: NDave Jones <davej@redhat.com>
```
ecf7e461
D
[CPUFREQ] Factor out interface creation from cpufreq_add_dev · 909a694e
由 Dave Jones 提交于 7月 08, 2009
```
Signed-off-by: NDave Jones <davej@redhat.com>
```
909a694e
D
[CPUFREQ] Factor out symlink creation from cpufreq_add_dev · 19d6f7ec
由 Dave Jones 提交于 7月 08, 2009
```
Signed-off-by: NDave Jones <davej@redhat.com>
```
19d6f7ec
D
[CPUFREQ] cleanup up -ENOMEM handling in cpufreq_add_dev · 059019a3
由 Dave Jones 提交于 7月 08, 2009
```
Signed-off-by: NDave Jones <davej@redhat.com>
```
059019a3
D
[CPUFREQ] Reduce scope of cpu_sys_dev in cpufreq_add_dev · 54e6fe16
由 Dave Jones 提交于 7月 08, 2009
```
Signed-off-by: NDave Jones <davej@redhat.com>
```
54e6fe16

[CPUFREQ] Re-enable cpufreq suspend and resume code · ce6c3997

由 Dominik Brodowski 提交于 8月 07, 2009

Commit 4bc5d341 is broken and causes regressions:

(1) cpufreq_driver->resume() and ->suspend() were only called on
__powerpc__, but you could set them on all architectures. In fact,
->resume() was defined and used before the PPC-related commit
42d4dc3f complained about in 4bc5d341.

(2) Therfore, the resume functions in acpi_cpufreq and speedstep-smi
would never be called.

(3) This means speedstep-smi would be unusuable after suspend or resume.

The _real_ problem was calling cpufreq_driver->get() with interrupts
off, but it re-enabling interrupts on some platforms. Why is ->get()
necessary?

Some systems like to change the CPU frequency behind our
back, especially during BIOS-intensive operations like suspend or
resume. If such systems also use a CPU frequency-dependant timing loop,
delays might be off by large factors. Therefore, we need to ascertain
as soon as possible that the CPU frequency is indeed at the speed we
think it is. You can do this two ways: either setting it anew, or trying
to get it. The latter is what was done, the former also has the same IRQ
issue.

So, let's try something different: defer the checking to after interrupts
are re-enabled, by calling cpufreq_update_policy() (via schedule_work()).
Timings may be off until this later stage, so let's watch out for
resume regressions caused by the deferred handling of frequency changes
behind the kernel's back.
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NDave Jones <davej@redhat.com>

ce6c3997

05 8月, 2009 4 次提交

[CPUFREQ] Make cpufreq suspend code conditional on powerpc. · 4bc5d341

由 Dave Jones 提交于 8月 04, 2009

The suspend code runs with interrupts disabled, and the powerpc workaround we
do in the cpufreq suspend hook calls the drivers ->get method.

powernow-k8's ->get does an smp_call_function_single
which needs interrupts enabled

cpufreq's suspend/resume code was added in 42d4dc3f to work around
a hardware problem on ppc powerbooks.  If we make all this code
conditional on powerpc, we avoid the issue above.
Signed-off-by: NDave Jones <davej@redhat.com>

4bc5d341

[CPUFREQ] Fix a kobject reference bug related to managed CPUs · d5194dec

由 Thomas Renninger 提交于 7月 29, 2009

The first offline/online cycle is successful, the second not.
Doing:
echo 0 >cpu1/online
echo 1 >cpu1/online
echo 0 >cpu1/online

The last command will trigger:
Jul 22 14:39:50 linux kernel: [  593.210125] ------------[ cut here ]------------
Jul 22 14:39:50 linux kernel: [  593.210139] WARNING: at lib/kref.c:43 kref_get+0x23/0x2b()
Jul 22 14:39:50 linux kernel: [  593.210144] Hardware name: To Be Filled By O.E.M.
Jul 22 14:39:50 linux kernel: [  593.210148] Modules linked in: powernow_k8
Jul 22 14:39:50 linux kernel: [  593.210158] Pid: 378, comm: kondemand/2 Tainted: G        W  2.6.31-rc2 #38
Jul 22 14:39:50 linux kernel: [  593.210163] Call Trace:
Jul 22 14:39:50 linux kernel: [  593.210171]  [<ffffffff812008e8>] ? kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [  593.210181]  [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4
Jul 22 14:39:50 linux kernel: [  593.210190]  [<ffffffff81041962>] warn_slowpath_null+0xf/0x11
Jul 22 14:39:50 linux kernel: [  593.210198]  [<ffffffff812008e8>] kref_get+0x23/0x2b
Jul 22 14:39:50 linux kernel: [  593.210206]  [<ffffffff811ffa19>] kobject_get+0x1a/0x22
Jul 22 14:39:50 linux kernel: [  593.210214]  [<ffffffff813e815d>] cpufreq_cpu_get+0x8a/0xcb
Jul 22 14:39:50 linux kernel: [  593.210222]  [<ffffffff813e87d1>] __cpufreq_driver_getavg+0x1d/0x67
Jul 22 14:39:50 linux kernel: [  593.210231]  [<ffffffff813ea18f>] do_dbs_timer+0x158/0x27f
Jul 22 14:39:50 linux kernel: [  593.210240]  [<ffffffff810529ea>] worker_thread+0x200/0x313
...

The output continues on every do_dbs_timer ondemand freq checking poll.
This regression was introduced by git commit:
3f4a782b

The policy is released when the cpufreq device is removed in:
__cpufreq_remove_dev():
	/* if this isn't the CPU which is the parent of the kobj, we
	 * only need to unlink, put and exit
	 */

Not creating the symlink is not sever at all.
As long as:
sysfs_remove_link(&sys_dev->kobj, "cpufreq");
handles it gracefully that the symlink did not exist.
Possibly no error should be returned at all, because ondemand
governor would still provide the same functionality.
Userspace in userspace gov case might be confused if the link
is missing.

Resolves http://bugzilla.kernel.org/show_bug.cgi?id=13903

CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NDave Jones <davej@redhat.com>

d5194dec

[CPUFREQ] Do not set policy for offline cpus · 42c74b84

由 Prarit Bhargava 提交于 8月 03, 2009

Suspend/Resume fails on multi socket, multi core systems because the cpufreq
code erroneously sets the per_cpu policy_cpu value when a logical cpu is
offline.

This most notably results in missing sysfs files that are used to set the
cpu frequencies of the various cpus.
Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
Signed-off-by: NDave Jones <davej@redhat.com>

42c74b84

[CPUFREQ] Fix NULL pointer dereference regression in conservative governor · 26d204af

由 Pallipadi, Venkatesh 提交于 7月 29, 2009

Commit ee88415c
introduced this regression when it removed enable bit in cpu_dbs_info_s.
That added a possibility of dbs_cpufreq_notifier getting called for a
CPU that is not yet managed by conservative governor. That will happen
as the transition notifier is set as soon as one CPU switches to
conservative governor and other CPUs can get a NULL pointer dereference
without the enable bit check. Add the enable bit back again.
Reported-by: NLermytte Christophe <Christophe.Lermytte@thomson.net>
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NDave Jones <davej@redhat.com>

26d204af

09 7月, 2009 1 次提交

[CPUFREQ] Fix compile failure in cpufreq.c · 5e1596f7

由 Dave Jones 提交于 7月 08, 2009

managed_policy is out of scope for the non-smp case.
Declare it locally where used (twice)
Signed-off-by: NDave Jones <davej@redhat.com>

5e1596f7

07 7月, 2009 4 次提交

[CPUFREQ] fix (utter) cpufreq_add_dev mess · 3f4a782b

由 Mathieu Desnoyers 提交于 7月 03, 2009

OK, I've tried to clean it up the best I could, but please test this with
concurrent cpu hotplug and cpufreq add/remove in loops. I'm sure we will make
other interesting findings.

This is step one of fixing the overall locking dependency mess in cpufreq.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
CC: rjw@sisk.pl
CC: mingo@elte.hu
CC: Shaohua Li <shaohua.li@intel.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Dave Young <hidave.darkstar@gmail.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: sven.wegener@stealer.net
CC: cpufreq@vger.kernel.org
CC: Thomas Renninger <trenn@suse.de>
Signed-off-by: NDave Jones <davej@redhat.com>

3f4a782b

[CPUFREQ] Cleanup locking in conservative governor · ee88415c

由 venkatesh.pallipadi@intel.com 提交于 7月 02, 2009

Redesign the locking inside conservative driver. Make dbs_mutex handle all the
global state changes inside the driver and invent a new percpu mutex
to serialize percpu timer and frequency limit change.
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NDave Jones <davej@redhat.com>

ee88415c

[CPUFREQ] Cleanup locking in ondemand governor · 5a75c828

由 venkatesh.pallipadi@intel.com 提交于 7月 02, 2009

Redesign the locking inside ondemand driver. Make dbs_mutex handle all the
global state changes inside the driver and invent a new percpu mutex
to serialize percpu timer and frequency limit change.
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NDave Jones <davej@redhat.com>

5a75c828

[CPUFREQ] Eliminate the recent lockdep warnings in cpufreq · 7d26e2d5

由 venkatesh.pallipadi@intel.com 提交于 7月 02, 2009

Commit b14893a6 although it was very
much needed to properly cleanup ondemand timer, opened-up a can of worms
related to locking dependencies in cpufreq.

Patch here defines the need for dbs_mutex and cleans up its usage in
ondemand governor. This also resolves the lockdep warnings reported here

http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/01925.html
http://lkml.indiana.edu/hypermail/linux/kernel/0907.0/00820.html

and few others..
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NDave Jones <davej@redhat.com>

7d26e2d5

24 6月, 2009 1 次提交

percpu: clean up percpu variable definitions · 245b2e70

由 Tejun Heo 提交于 6月 24, 2009

Percpu variable definition is about to be updated such that all percpu
symbols including the static ones must be unique.  Update percpu
variable definitions accordingly.

* as,cfq: rename ioc_count uniquely

* cpufreq: rename cpu_dbs_info uniquely

* xen: move nesting_count out of xen_evtchn_do_upcall() and rename it

* mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
  rename it

* ipv4,6: rename cookie_scratch uniquely

* x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
  pmc_irq_entry and nmi_entry to pmc_nmi_entry

* perf_counter: rename disable_count to perf_disable_count

* ftrace: rename test_event_disable to ftrace_test_event_disable

* kmemleak: rename test_pointer to kmemleak_test_pointer

* mce: rename next_interval to mce_next_interval

[ Impact: percpu usage cleanups, no duplicate static percpu var names ]
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>

245b2e70

15 6月, 2009 2 次提交

[CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful · 4f4d1ad6

由 Thomas Renninger 提交于 4月 22, 2009

Update the documentation accordingly.
Cleanup and use printk_once.
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NDave Jones <davej@redhat.com>

4f4d1ad6

[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case · cef9615a

由 Thomas Renninger 提交于 4月 22, 2009

With this patch you have following minimal sampling rate restrictions:

Kernel restrictions:
If CONFIG_NO_HZ is set, the limit is 10ms fixed.
If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the
limits depend on the CONFIG_HZ option:
HZ=1000: min=20000us  (20ms)
HZ=250:  min=80000us  (80ms)
HZ=100:  min=200000us (200ms)

HW restrictions:
Do not sample/poll more often than HW latency * 100  exported by the low
level cpufreq HW driver

The higher value of above restrictions is the minimal sampling rate
that can be set (and can be seen via ondemand/sampling_rate_min sysfs file)

Default sampling rate still is HW latency * 1000, but this will now end
up in lower values on latest (Intel and AMD) hardware as these can switch
really fast and sampling rate mostly was limited to the 80ms or 200ms
(depending on whether HZ=250 or HZ=1000 is used).
Signed-off-by: NThomas Renninger <trenn@suse.de>
Cc: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
Signed-off-by: NDave Jones <davej@redhat.com>

cef9615a

09 6月, 2009 1 次提交

cpumask: alloc zeroed cpumask for static cpumask_var_ts · eaa95840

由 Yinghai Lu 提交于 6月 06, 2009

These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already.  Avoid surprises when MAXSMP is enabled.
Signed-off-by: NYinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

eaa95840

27 5月, 2009 3 次提交

[CPUFREQ] fix timer teardown in ondemand governor · b14893a6

由 Mathieu Desnoyers 提交于 5月 17, 2009

* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject		: cpufreq timer teardown problem
> Submitter	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date		: 2009-04-23 14:00 (24 days old)
> References	: http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch		: http://patchwork.kernel.org/patch/19754/
> 		  http://patchwork.kernel.org/patch/19753/
>

(updated changelog)

cpufreq fix timer teardown in ondemand governor

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit.

The ondemand governor does not seem to be affected because the
"if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
immediately without rescheduling the work. The conservative governor in
2.6.30-rc has the same check as the ondemand governor, which makes things
usually run smoothly. However, if the governor is quickly stopped and then
started, this could lead to the following race :

dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
This is why a synchronized teardown is required.

The following patch applies to, at least, 2.6.28.x, 2.6.29.1, 2.6.30-rc2.

Depends on patch
cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
Signed-off-by: NDave Jones <davej@redhat.com>

b14893a6

[CPUFREQ] fix timer teardown in conservative governor · b253d2b2

由 Mathieu Desnoyers 提交于 5月 17, 2009

* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject		: cpufreq timer teardown problem
> Submitter	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date		: 2009-04-23 14:00 (24 days old)
> References	: http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch		: http://patchwork.kernel.org/patch/19754/
> 		  http://patchwork.kernel.org/patch/19753/
>

(re-send with updated changelog)

cpufreq fix timer teardown in conservative governor

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit.

The ondemand governor does not seem to be affected because the
"if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
immediately without rescheduling the work. The conservative governor in
2.6.30-rc has the same check as the ondemand governor, which makes things
usually run smoothly. However, if the governor is quickly stopped and then
started, this could lead to the following race :

dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
This is why a synchronized teardown is required.

Depends on patch
cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

The following patch applies to 2.6.30-rc2. Stable kernels have a similar
issue which should also be fixed, but the code changed between 2.6.29
and 2.6.30, so this patch only applies to 2.6.30-rc.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
Signed-off-by: NDave Jones <davej@redhat.com>

b253d2b2

[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call · 42a06f21

由 Mathieu Desnoyers 提交于 5月 17, 2009

* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject		: cpufreq timer teardown problem
> Submitter	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date		: 2009-04-23 14:00 (24 days old)
> References	: http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By	: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch		: http://patchwork.kernel.org/patch/19754/
> 		  http://patchwork.kernel.org/patch/19753/

The patches linked above depend on the following patch to remove
circular locking dependency :

cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

(the following issue was faced when using cancel_delayed_work_sync() in the
timer teardown (which fixes a race).

* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
>
> my box output following warnings.
> it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.
>
> A: work -> do_dbs_timer()  -> cpu_policy_rwsem
> B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work
>
>

Hrm, I think it must be due to my attempt to fix the timer teardown race
in ondemand governor mixed with new locking behavior in 2.6.30-rc.

The rwlock seems to be taken around the whole call to
cpufreq_governor_dbs(), when it should be only taken around accesses to
the locked data, and especially *not* around the call to
dbs_timer_exit().

Reverting my fix attempt would put the teardown race back in place
(replacing the cancel_delayed_work_sync by cancel_delayed_work).
Instead, a proper fix would imply modifying this critical section :

cpufreq.c: __cpufreq_remove_dev()
...
        if (cpufreq_driver->target)
                __cpufreq_governor(data, CPUFREQ_GOV_STOP);

        unlock_policy_rwsem_write(cpu);

To make sure the __cpufreq_governor() callback is not called with rwsem
held. This would allow execution of cancel_delayed_work_sync() without
being nested within the rwsem.

Applies on top of the 2.6.30-rc5 tree.

Required to remove circular dep in teardown of both conservative and
ondemande governors so they can use cancel_delayed_work_sync().
CPUFREQ_GOV_STOP does not modify the policy, therefore this locking seemed
unneeded.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Chris Wright <chrisw@sous-sol.org>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDave Jones <davej@redhat.com>

42a06f21

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功