- 24 April 2013 (6 commits)
-
-
Committed by Joonsoo Kim

Commit 88b8dac0 made load_balance() consider other CPUs in its group, but it has no code to prevent the same dst CPU from being re-selected, so the same dst CPU can be chosen over and over. This patch adds functionality to load_balance() to exclude a CPU once it has been selected. We prevent re-selecting a dst_cpu via env's cpus, so env's cpus is now a candidate set not only for src CPUs but also for dst CPUs. With this patch we can also remove lb_iterations and max_lb_iterations, because env's cpus now decides whether we can go ahead or not.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jason Low <jason.low2@hp.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-7-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
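A minimal sketch of the mechanism, assuming the env->cpus / env->dst_cpu fields of struct lb_env from kernel/sched/fair.c; the helper name and the retry flow around it are illustrative only:

    /* Once a dst CPU has been tried and failed, drop it from the
     * candidate mask so the retry path can never select it again. */
    static int pick_another_dst_cpu(struct lb_env *env)
    {
            cpumask_clear_cpu(env->dst_cpu, env->cpus);

            /* env->cpus now gates both src and dst candidates: when it
             * runs empty, there is nothing left to try. */
            if (cpumask_empty(env->cpus))
                    return -1;

            return cpumask_first(env->cpus);
    }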
-
Committed by Joonsoo Kim

The name doesn't convey any specific meaning, so rename it to reflect its purpose.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jason Low <jason.low2@hp.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-6-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Joonsoo Kim

Currently, LBF_ALL_PINNED is cleared only after the affinity check passes. So if a task migration is skipped in move_tasks() because of a small load or imbalance value, we don't clear LBF_ALL_PINNED and end up triggering 'redo' in load_balance(). The imbalance value is often so small that no task can be moved to another CPU, and of course this situation may persist after we change the target CPU. So this patch moves the affinity check up and clears LBF_ALL_PINNED before evaluating the load value, to mitigate the useless redo overhead. In addition, it re-orders some comments correctly.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jason Low <jason.low2@hp.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-5-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
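A sketch of the reordered checks, loosely following can_migrate_task(); the function name and the elided load/hotness logic are simplifications:

    static int can_migrate_task_sketch(struct task_struct *p, struct lb_env *env)
    {
            /* 1) Affinity first: bail out if p cannot run on dst_cpu. */
            if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p)))
                    return 0;

            /* 2) At least one task may move to dst_cpu, so this group is
             *    not "all pinned"; clear the flag before any load check. */
            env->flags &= ~LBF_ALL_PINNED;

            /* 3) Only now evaluate load and cache hotness; skipping the
             *    migration here no longer leaves LBF_ALL_PINNED set. */
            /* ... */
            return 1;
    }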
-
Committed by Joonsoo Kim

Commit 88b8dac0 made load_balance() consider other CPUs in its group regardless of idle type. For NEWLY_IDLE balancing we should not do that, because the motivation of NEWLY_IDLE balancing is to bring this CPU out of the idle state if needed; that does not apply to the other CPUs. So, change the code not to consider other CPUs for NEWLY_IDLE balancing. With this patch, the assignment 'if (pulled_task) this_rq->idle_stamp = 0' in idle_balance() becomes correct, because NEWLY_IDLE balancing no longer considers other CPUs, so assigning to 'this_rq->idle_stamp' is valid again.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jason Low <jason.low2@hp.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-4-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
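A sketch of the guard in load_balance(); the dst_grpmask field name follows the kernel's struct lb_env, but the setup around it is abbreviated and hedged:

    struct lb_env env = {
            .sd          = sd,
            .dst_cpu     = this_cpu,
            .dst_grpmask = sched_group_cpus(sd->groups),
            /* ... */
    };

    /* A NEWLY_IDLE balance exists solely to get this_cpu running again,
     * so never offer its siblings as alternative destinations. */
    if (idle == CPU_NEWLY_IDLE)
            env.dst_grpmask = NULL;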
-
Committed by Joonsoo Kim

After commit 88b8dac0, the dst CPU can be changed inside load_balance(), so we can't know the cpu_idle_type of the dst CPU when load_balance() returns a positive value. So, add explicit cpu_idle_type checking.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jason Low <jason.low2@hp.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-3-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Joonsoo Kim

cur_ld_moved is reset when env.flags has LBF_NEED_BREAK set, so there is a chance that we miss calling resched_cpu(). Fix this by moving the resched_cpu() call before the LBF_NEED_BREAK check.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jason Low <jason.low2@hp.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366705662-3587-2-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
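The resulting ordering, sketched (condensed from the tail of the move_tasks() loop in load_balance()):

    /* Kick the dst CPU while cur_ld_moved still reflects this pass;
     * the NEED_BREAK path below clears state and loops back. */
    if (cur_ld_moved && env.dst_cpu != smp_processor_id())
            resched_cpu(env.dst_cpu);

    if (env.flags & LBF_NEED_BREAK) {
            env.flags &= ~LBF_NEED_BREAK;
            goto more_balance;
    }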
-
- 21 April 2013 (1 commit)
-
-
Committed by Vincent Guittot

The current update of the rq's load can be erroneous when RT tasks are involved. The update of the load of a rq that becomes idle is done only if avg_idle is less than sysctl_sched_migration_cost. If RT tasks and short idle durations alternate, the runnable_avg is not updated correctly and the time is accounted as idle time when a CFS task wakes up.

A new idle_enter function is called when the next task is the idle task, so the elapsed time is accounted as run time in the rq's load, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance.

When an RT task is scheduled on an idle CPU, the rq's load is not updated when the rq exits the idle state, because CFS's functions are not called. Then idle_balance, which is called just before entering the idle function, updates the rq's load on the assumption that all the time elapsed since the last update was running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task stays close to LOAD_AVG_MAX, whatever the actual running duration of the RT task is.

A new idle_exit function is called when the prev task is the idle task, so the elapsed time is accounted as idle time in the rq's load.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: linaro-kernel@lists.linaro.org
Cc: peterz@infradead.org
Cc: pjt@google.com
Cc: fweisbec@gmail.com
Cc: efault@gmx.de
Link: http://lkml.kernel.org/r/1366302867-5055-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
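A sketch of the two hooks as described; the hook names and the update_rq_runnable_avg() call mirror the changelog, while the exact wiring into the idle sched class is simplified and assumed:

    /* Next task is the idle task: the time that just elapsed was run
     * time, so fold it into the runnable average as such. */
    void idle_enter(struct rq *this_rq)
    {
            update_rq_runnable_avg(this_rq, 1);
    }

    /* Prev task was the idle task: the elapsed time was idle time. */
    void idle_exit(struct rq *this_rq)
    {
            update_rq_runnable_avg(this_rq, 0);
    }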
-
- 10 April 2013 (14 commits)
-
-
Committed by Ingo Molnar

The cpuacct split caused this build failure on UML:

  kernel/sched/cpuacct.c:94:2: error: implicit declaration of function 'ERR_PTR'

Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
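A build failure on an implicit ERR_PTR() declaration is normally fixed by a one-line include; presumably (sketch):

    #include <linux/err.h>   /* ERR_PTR(), PTR_ERR(), IS_ERR() */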
-
Committed by Li Zefan

We are now guaranteed that cpuacct has been properly initialized by the time cpuacct_charge() and cpuacct_account_field() are called, so we no longer need those checks.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5155384C.7000508@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
Initialize cpuacct before the scheduler is functioning, so when cpuacct_charge() and cpuacct_account_field() are called, task_ca() won't return NULL.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5155383F.8000005@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
Now we don't need cpuacct_init(), and instead we just initialize root_cpuacct when it's defined.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51553834.9090701@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan

This is a preparation patch, so that later we can initialize cpuacct earlier.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51553822.5000403@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
Now most of the code in cpuacct.h can be moved to cpuacct.c.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/515536D5.2080401@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan

This is a micro optimization for a hot path.

- We don't need to check if @ca returned from task_ca() is NULL.
- We don't need to check if @ca returned from parent_ca() is NULL.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/515536B7.6060602@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan

This is a micro optimization for the hot path.

- We don't need to check if @ca is NULL in parent_ca().
- We don't need to check if @ca is NULL at the beginning of the for loop.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/515536A9.5000700@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
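A hedged sketch of what the charge loop can look like once task_ca() is guaranteed non-NULL; the per-CPU body follows cpuacct_charge()-style accounting but is abbreviated:

    struct cpuacct *ca = task_ca(tsk);   /* never NULL after early init */

    do {
            u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
            *cpuusage += cputime;
            ca = parent_ca(ca);          /* NULL once we pass the root */
    } while (ca);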
-
Committed by Li Zefan

So that we can remove the open-coded cpuacct code in cputime.c.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51553692.9060008@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan

So that we don't open-code the initialization of cpuacct in core.c.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51553687.1060906@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
Add cpuacct.h and let sched.h include it.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5155367B.2060506@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5155366F.5060404@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Libin
A comment in function rebalance_domains() mentions arch_init_sched_domains(), but that function does not exist anymore. The proper function is init_sched_domains().

Signed-off-by: Libin <huawei.libin@huawei.com>
Cc: <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1364814841-49156-1-git-send-email-huawei.libin@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Zhang Hang
At this point tsk_cache_hot is always true, so no need to check it.

Signed-off-by: Zhang Hang <bob.zhanghang@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51650107.9040606@huawei.com
[ Also remove unnecessary schedstat #ifdefs. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 08 April 2013 (1 commit)
-
-
Committed by Viresh Kumar
Fix typo: sched_domains_nume_distance -> sched_domains_numa_distance

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: patches@linaro.org
Cc: robin.randhawa@arm.com
Cc: Steve.Bannister@arm.com
Cc: Liviu.Dudau@arm.com
Cc: charles.garcia-tobin@arm.com
Cc: arvind.chauhan@arm.com
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/cd8084746ac932106d6fa6be388b8f2d6aa9617c.1365159023.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 18 March 2013 (1 commit)
-
-
Committed by Peter Zijlstra

Thomas noted that we do the wakeup preemption check after the wakeup tracepoint. This means the tracepoint cannot test or report this decision, which is rather important for latency-sensitive workloads. Therefore, move the tracepoint after we do the preemption check.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Link: http://lkml.kernel.org/r/1363254519.26965.9.camel@laptop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 14 March 2013 (2 commits)
-
-
Committed by Andrei Epure

The min_vruntime variable actually stores the maximum value. The added comment was taken from the place_entity() function.

Signed-off-by: Andrei Epure <epure.andrei@gmail.com>
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1363115544-1964-1-git-send-email-epure.andrei@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Frederic Weisbecker

Some users have reported that, after running a process with hundreds of threads on intensive CPU-bound loads, the cputime of the group started to freeze after a few days.

This is due to how we scale the tick-based cputime against the scheduler's precise execution time value. We add the values of all threads in the group and multiply that against the sum of the scheduler exec runtime of the whole group. This easily overflows after a few days/weeks of execution.

A proposed solution was to compute that multiplication on stime instead of utime: 62188451 ("cputime: Avoid multiplication overflow on utime scaling"). The rationale behind that was that it's easy for a thread to spend most of its time in userspace under an intensive CPU-bound workload, but much harder to do a long CPU-bound intensive run in the kernel.

This postulate got defeated when a user recently reported that he was still seeing cputime freezes after the above patch. The workload that triggers this issue is an intensive networking workload where most of the cputime is consumed in the kernel.

To reduce the opportunities for multiplication overflow much further, let's reduce the multiplication factors to the remainders of the division between sched exec runtime and cputime. Assuming the difference between these shouldn't ever be that large, this should work in many situations. This gets the same results as the upstream scaling code except for one small difference: the upstream code always rounds the result to the nearest integer not greater than the precise result, while the new code rounds to the nearest integer, whether greater or not. In practice this difference probably shouldn't matter, but it's worth mentioning.

If this solution appears not to be enough in the end, we'll need to partly revert to the behaviour prior to commit 0cf55e1e ("sched, cputime: Introduce thread_group_times()"). Back then, the scaling was done at exit() time, before adding the cputime of an exiting thread to the signal struct, and we would then scale the live threads' cputime one by one in thread_group_cputime(). The drawback may be slightly slower code at exit time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
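A hedged sketch of the remainder-based scaling; div64_u64_rem() and div64_u64() are the kernel's 64-bit division helpers, and the function shape approximates the idea rather than reproducing the patch. The identity used is: if rtime = res * total + rem, then stime * rtime / total = stime * res + stime * rem / total, and symmetrically when total is the larger value:

    /* Compute stime * rtime / total while keeping every intermediate
     * product small enough not to overflow u64. */
    static u64 scale_stime(u64 stime, u64 rtime, u64 total)
    {
            u64 rem, res, scaled;

            if (rtime >= total) {
                    /* rtime = res * total + rem */
                    res = div64_u64_rem(rtime, total, &rem);
                    scaled = stime * res;
                    scaled += div64_u64(stime * rem, total);
            } else {
                    /* total = res * rtime + rem, so the result is
                     * stime / res, corrected down by the rem fraction */
                    res = div64_u64_rem(total, rtime, &rem);
                    scaled = div64_u64(stime, res);
                    scaled -= div64_u64(scaled * rem, total);
            }
            return scaled;
    }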
-
- 11 March 2013 (2 commits)
-
-
Committed by Andrei Epure
Signed-off-by: Andrei Epure <epure.andrei@gmail.com>
Cc: trivial@kernel.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1362996200-2674-1-git-send-email-epure.andrei@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan

All warnings:

  In file included from kernel/sched/core.c:85:0:
  kernel/sched/sched.h:1036:39: warning: 'struct sched_domain' declared inside parameter list
  kernel/sched/sched.h:1036:39: warning: its scope is only this definition or declaration, which is probably not what you want

This is because struct sched_domain is defined inside #if CONFIG_SMP, while update_group_power() is declared unconditionally. Fix the warning by declaring update_group_power() only when CONFIG_SMP is enabled.

Build tested with CONFIG_SMP enabled and then disabled.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5137F4BA.2060101@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
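The shape of the fix, sketched; the prototype matches update_group_power() in kernel/sched/sched.h of that era, and the point is its placement:

    #ifdef CONFIG_SMP
    /* struct sched_domain only exists when CONFIG_SMP=y, so the
     * declaration must live inside the same conditional. */
    extern void update_group_power(struct sched_domain *sd, int cpu);
    #endif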
-
- 08 March 2013 (2 commits)
-
-
Committed by Frederic Weisbecker

The full dynticks cputime accounting can account either using the tick or the context tracking subsystem. This way the housekeeping CPU can keep the low-overhead, tick-based solution.

This latter mode has a coarse, jiffies-resolution granularity and needs to be scaled against CFS's precise runtime accounting to improve its results. We already do this for CONFIG_TICK_CPU_ACCOUNTING; now we need to expand it to the full dynticks accounting dynamic off-case as well.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Frederic Weisbecker

From the context tracking POV, preempt_schedule_irq() behaves pretty much like an exception: it can be called at any time and schedule another task. But it currently doesn't restore the context tracking state of the preempted code on return.

As a result, if preempt_schedule_irq() is called in the tiny window between user_enter() and the actual return to userspace, we resume userspace with the wrong context tracking state.

Fix this by using exception_enter/exit(), which are a perfect fit for this kind of issue.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
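A condensed sketch of the fixed function, simplified from kernel/sched/core.c of that era; the BUG_ON sanity checks and some barriers are elided:

    asmlinkage void __sched preempt_schedule_irq(void)
    {
            enum ctx_state prev_state;

            /* Behaves like an exception: record the tracked state and
             * switch to "in kernel" while we reschedule. */
            prev_state = exception_enter();

            do {
                    add_preempt_count(PREEMPT_ACTIVE);
                    local_irq_enable();
                    __schedule();
                    local_irq_disable();
                    sub_preempt_count(PREEMPT_ACTIVE);
            } while (need_resched());

            /* Restore whatever state the preempted code was in, so a
             * preemption between user_enter() and the return to user
             * space no longer leaves the wrong state behind. */
            exception_exit(prev_state);
    }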
-
- 06 March 2013 (7 commits)
-
-
Committed by Li Zefan
It's already declared in include/linux/sched.h.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5135A7D8.7000107@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
- Make sched_group_{set_,}runtime(), sched_group_{set_,}period() and sched_rt_can_attach() static.
- Move sched_{create,destroy,online,offline}_group() to kernel/sched/sched.h.
- Remove declaration of sched_group_shares().

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5135A7C5.3000708@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
As default_scale_{freq,smt}_power() and update_rt_power() are used in kernel/sched/fair.c only, annotate them as static functions.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5135A7AF.8010900@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
It's used internally only.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5135A79F.8090502@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
They are used internally only.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5135A78E.7040609@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
Move struct sched_group_power and sched_group and related inline functions to kernel/sched/sched.h, as they are used internally only.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5135A77F.2010705@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Li Zefan
They are used internally only.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5135A771.4070104@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 28 February 2013 (1 commit)
-
-
Committed by Sasha Levin

I'm not sure why, but the hlist for-each-entry iterators were conceived differently from the list ones:

  list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

  hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only do they not really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate.

Besides the semantic patch, some manual work was required:

- Fix up the actual hlist iterators in linux/list.h.
- Fix up the declaration of other iterators based on the hlist ones.
- A very small number of places were using the 'node' parameter; these were modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually.

The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:

  @@
  iterator name hlist_for_each_entry, hlist_for_each_entry_continue,
    hlist_for_each_entry_from, hlist_for_each_entry_rcu,
    hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh,
    for_each_busy_worker, ax25_uid_for_each, ax25_for_each,
    inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each,
    sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound,
    hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu,
    nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each,
    nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp,
    for_each_host;
  type T;
  expression a,c,d,e;
  identifier b;
  statement S;
  @@

  -T b;
  <+... when != b
  (
  hlist_for_each_entry(a,
  - b,
  c, d) S
  |
  hlist_for_each_entry_continue(a,
  - b,
  c) S
  |
  hlist_for_each_entry_from(a,
  - b,
  c) S
  |
  hlist_for_each_entry_rcu(a,
  - b,
  c, d) S
  |
  hlist_for_each_entry_rcu_bh(a,
  - b,
  c, d) S
  |
  hlist_for_each_entry_continue_rcu_bh(a,
  - b,
  c) S
  |
  for_each_busy_worker(a, c,
  - b,
  d) S
  |
  ax25_uid_for_each(a,
  - b,
  c) S
  |
  ax25_for_each(a,
  - b,
  c) S
  |
  inet_bind_bucket_for_each(a,
  - b,
  c) S
  |
  sctp_for_each_hentry(a,
  - b,
  c) S
  |
  sk_for_each(a,
  - b,
  c) S
  |
  sk_for_each_rcu(a,
  - b,
  c) S
  |
  sk_for_each_from
  -(a, b)
  +(a)
  S
  + sk_for_each_from(a) S
  |
  sk_for_each_safe(a,
  - b,
  c, d) S
  |
  sk_for_each_bound(a,
  - b,
  c) S
  |
  hlist_for_each_entry_safe(a,
  - b,
  c, d, e) S
  |
  hlist_for_each_entry_continue_rcu(a,
  - b,
  c) S
  |
  nr_neigh_for_each(a,
  - b,
  c) S
  |
  nr_neigh_for_each_safe(a,
  - b,
  c, d) S
  |
  nr_node_for_each(a,
  - b,
  c) S
  |
  nr_node_for_each_safe(a,
  - b,
  c, d) S
  |
  - for_each_gfn_sp(a, c, d, b) S
  + for_each_gfn_sp(a, c, d) S
  |
  - for_each_gfn_indirect_valid_sp(a, c, d, b) S
  + for_each_gfn_indirect_valid_sp(a, c, d) S
  |
  for_each_host(a,
  - b,
  c) S
  |
  for_each_host_safe(a,
  - b,
  c, d) S
  |
  for_each_mesh_entry(a,
  - b,
  c, d) S
  )
  ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
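For illustration, a typical call site changes like this (the struct and field names here are hypothetical):

    struct item *obj;
    struct hlist_node *pos;    /* the extra cursor, no longer needed */

    /* before: */
    hlist_for_each_entry(obj, pos, &bucket, hash_node)
            process(obj);

    /* after: */
    hlist_for_each_entry(obj, &bucket, hash_node)
            process(obj);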
-
- 24 February 2013 (2 commits)
-
-
Committed by Frederic Weisbecker

Running the full dynticks cputime accounting with preemptible kernel debugging triggers the following warning:

  [ 4.488303] BUG: using smp_processor_id() in preemptible [00000000] code: init/1
  [ 4.490971] caller is native_sched_clock+0x22/0x80
  [ 4.493663] Pid: 1, comm: init Not tainted 3.8.0+ #13
  [ 4.496376] Call Trace:
  [ 4.498996] [<ffffffff813410eb>] debug_smp_processor_id+0xdb/0xf0
  [ 4.501716] [<ffffffff8101e642>] native_sched_clock+0x22/0x80
  [ 4.504434] [<ffffffff8101db99>] sched_clock+0x9/0x10
  [ 4.507185] [<ffffffff81096ccd>] fetch_task_cputime+0xad/0x120
  [ 4.509916] [<ffffffff81096dd5>] task_cputime+0x35/0x60
  [ 4.512622] [<ffffffff810f146e>] acct_update_integrals+0x1e/0x40
  [ 4.515372] [<ffffffff8117d2cf>] do_execve_common+0x4ff/0x5c0
  [ 4.518117] [<ffffffff8117cf14>] ? do_execve_common+0x144/0x5c0
  [ 4.520844] [<ffffffff81867a10>] ? rest_init+0x160/0x160
  [ 4.523554] [<ffffffff8117d457>] do_execve+0x37/0x40
  [ 4.526276] [<ffffffff810021a3>] run_init_process+0x23/0x30
  [ 4.528953] [<ffffffff81867aac>] kernel_init+0x9c/0xf0
  [ 4.531608] [<ffffffff8188356c>] ret_from_fork+0x7c/0xb0

We use sched_clock() to perform and fix up the cputime accounting. However, we are calling it with preemption enabled from the read side, which triggers the bug above.

To fix this, use local_clock() instead. It takes care of preemption and also provides a more reliable clock source. This is welcome for this kind of statistic, which userspace relies on widely.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1361636925-22288-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
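The substance of the change, sketched; the surrounding cputime fixup logic is elided, and nsecs_to_cputime() stands in for whatever conversion the call site uses:

    /* before: preempt-unsafe when reached from a preemptible read side,
     * which is exactly what the BUG above complains about */
    rtime = nsecs_to_cputime(sched_clock());

    /* after: local_clock() handles preemption itself and copes with
     * unstable per-CPU clocks */
    rtime = nsecs_to_cputime(local_clock());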
-
Committed by Tang Chen

If a CPU is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause a panic. Here is a call trace:

  Call Trace:
  <IRQ>
  select_fallback_rq+0x71/0x190
  try_to_wake_up+0x2cb/0x2f0
  wake_up_process+0x15/0x20
  hrtimer_wakeup+0x22/0x30
  __run_hrtimer+0x83/0x320
  hrtimer_interrupt+0x106/0x280
  smp_apic_timer_interrupt+0x69/0x99
  apic_timer_interrupt+0x6f/0x80

There is a sleeping hrtimer process whose CPU has already been offlined. When it is woken up, it tries to find another CPU to run on and gets a -1 nid. As a result, cpumask_of_node(-1) returns NULL and causes a kernel panic.

This patch fixes the problem by checking whether the nid is -1. If nid is not -1, a CPU on the same node will be picked; otherwise, an online CPU on another node will be picked.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
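A sketch of the guard, roughly following select_fallback_rq(); the online/allowed checks inside the loop are abbreviated:

    int nid = cpu_to_node(cpu);
    const struct cpumask *nodemask = NULL;
    int dest_cpu;

    /* cpu_to_node() returns -1 once the node has been offlined, and
     * cpumask_of_node(-1) would return NULL: skip the node-local scan. */
    if (nid != -1) {
            nodemask = cpumask_of_node(nid);

            /* Look for an allowed, online CPU on the same node. */
            for_each_cpu(dest_cpu, nodemask)
                    if (cpu_online(dest_cpu) &&
                        cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
                            return dest_cpu;
    }

    /* Otherwise fall through and pick any allowed, online CPU. */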
-
- 22 February 2013 (1 commit)
-
-
Committed by Nathan Zimmer

On systems with 4096 cores, attempting to read /proc/sched_debug fails because we try to push all of the data into a single kmalloc buffer, and on these very large machines it will not all fit in 4 MB.

A better solution is not to use the single_open() mechanism but to provide our own seq_operations and treat each CPU as an individual record. The output should be identical to the previous version.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
[ Whitespace fixlet ]
[ Fix spello in comment ]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
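A hedged sketch of the per-CPU seq_file approach; the function names and the print_cpu() helper are illustrative, not the exact patch:

    #include <linux/seq_file.h>

    /* Map position N to CPU N; returning NULL ends the sequence. The +1
     * keeps the cookie non-NULL for position 0. */
    static void *sched_debug_start(struct seq_file *m, loff_t *pos)
    {
            return (*pos < nr_cpu_ids) ? (void *)(unsigned long)(*pos + 1)
                                       : NULL;
    }

    static void *sched_debug_next(struct seq_file *m, void *v, loff_t *pos)
    {
            (*pos)++;
            return sched_debug_start(m, pos);
    }

    static void sched_debug_stop(struct seq_file *m, void *v)
    {
    }

    static int sched_debug_show(struct seq_file *m, void *v)
    {
            int cpu = (unsigned long)v - 1;   /* undo the +1 from start() */

            print_cpu(m, cpu);                /* illustrative per-CPU dump */
            return 0;
    }

    static const struct seq_operations sched_debug_sops = {
            .start = sched_debug_start,
            .next  = sched_debug_next,
            .stop  = sched_debug_stop,
            .show  = sched_debug_show,
    };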
-