1. 03 June 2016, 1 commit
    • cpufreq: governor: Get rid of governor events · e788892b
      Authored by Rafael J. Wysocki
      The design of the cpufreq governor API is not very straightforward,
      as struct cpufreq_governor provides only one callback to be invoked
      from different code paths for different purposes.  The purpose it is
      invoked for is determined by its second "event" argument, causing it
      to act as a "callback multiplexer" of sorts.
      
      Unfortunately, that leads to extra complexity in governors, some of
      which implement the ->governor() callback as a switch statement
      that simply checks the event argument and invokes a separate function
      to handle that specific event.
      
      That extra complexity can be eliminated by replacing the all-purpose
      ->governor() callback with a family of callbacks that carry out specific
      governor operations: initialization and exit, start and stop, and policy
      limits updates.  That also turns out to reduce code size, so do it.
      (A sketch of the resulting callback family follows this entry.)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      e788892b
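      A minimal sketch of the direction of the change, with illustrative
      signatures and the "old" struct name invented for contrast (the exact
      upstream definitions may differ):

          struct cpufreq_policy;

          /* Before: one multiplexed entry point, dispatched on 'event'. */
          struct cpufreq_governor_old {
                  int (*governor)(struct cpufreq_policy *policy, unsigned int event);
          };

          /* After: one callback per governor operation, no event switch. */
          struct cpufreq_governor {
                  int  (*init)(struct cpufreq_policy *policy);
                  void (*exit)(struct cpufreq_policy *policy);
                  int  (*start)(struct cpufreq_policy *policy);
                  void (*stop)(struct cpufreq_policy *policy);
                  void (*limits)(struct cpufreq_policy *policy);
          };

      Governors then fill in only the callbacks they need, instead of
      implementing a switch (event) dispatcher.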
  2. 25 May 2016, 1 commit
    • sched/core: Fix remote wakeups · b7e7ade3
      Authored by Peter Zijlstra
      Commit:
      
        b5179ac7 ("sched/fair: Prepare to fix fairness problems on migration")
      
      ... introduced a bug: Mike Galbraith found a performance regression,
      while Paul E. McKenney reported lost wakeups and bisected it to this
      commit.
      
      The reason is that I mis-read ttwu_queue() such that I assumed any
      wakeup that got a remote queue must have had the task migrated.
      
      Since this is not so, we need to transfer this information between
      queueing the wakeup and actually doing the wakeup.  Use a new
      task_struct::sched_flag for this; we already write to
      sched_contributes_to_load in the wakeup path, so this is a hot and
      already-modified cacheline.  (A sketch of the idea follows this entry.)
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Fixes: b5179ac7 ("sched/fair: Prepare to fix fairness problems on migration")
      Link: http://lkml.kernel.org/r/20160523091907.GD15728@worktop.ger.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b7e7ade3
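      A sketch of the idea; the field name sched_remote_wakeup is an
      assumption for illustration (the changelog only says a new task_struct
      flag is used), and the calls below are trimmed fragments, not the
      verbatim diff:

          /* A spare bit next to sched_contributes_to_load records whether
           * the queued wakeup was a migrating one. */
          struct task_struct {
                  /* ... */
                  unsigned                sched_contributes_to_load:1;
                  unsigned                sched_remote_wakeup:1;
                  /* ... */
          };

          /* Queueing side: remember whether this wakeup migrated the task. */
          p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);

          /* Remote CPU, when it finally performs the wakeup (arguments trimmed): */
          ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0);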
  3. 19 May 2016, 1 commit
  4. 12 May 2016, 10 commits
    • sched/core: Provide a tsk_nr_cpus_allowed() helper · 50605ffb
      Authored by Thomas Gleixner
      tsk_nr_cpus_allowed() is an accessor for task->nr_cpus_allowed, which
      allows us to change the representation of ->nr_cpus_allowed if required
      (a sketch follows this entry).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462969411-17735-2-git-send-email-bigeasy@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      50605ffb
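      The helper is small; a sketch of what such an accessor looks like
      (kernel context assumed, the upstream definition may differ in detail):

          static inline int tsk_nr_cpus_allowed(struct task_struct *p)
          {
                  return p->nr_cpus_allowed;
          }

      Call sites then read tsk_nr_cpus_allowed(p) instead of
      p->nr_cpus_allowed, so a later change of representation only has to
      touch the helper.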
    • sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed · ade42e09
      Authored by Thomas Gleixner
      Use the future-safe accessor instead of dereferencing struct
      task_struct's ->cpus_allowed directly (an illustrative call-site
      conversion follows this entry).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462969411-17735-1-git-send-email-bigeasy@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ade42e09
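      An illustrative call-site conversion (example only, not a specific hunk
      from the patch):

          /* before */
          cpumask_copy(&mask, &p->cpus_allowed);

          /* after */
          cpumask_copy(&mask, tsk_cpus_allowed(p));

      where tsk_cpus_allowed() is the existing accessor that simply returns
      &p->cpus_allowed, mirroring the tsk_nr_cpus_allowed() helper above.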
    • sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems · 20878232
      Authored by Vik Heyndrickx
      Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
      have no load at all.
      
      Uptime and /proc/loadavg on all systems with kernels released during the
      last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
      minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
      idle systems, but the way the kernel calculates this value prevents it
      from getting lower than the mentioned values.
      
      Likewise, but not as obviously noticeable, a fully loaded system with
      no processes waiting shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
      (multiplied by the number of cores).
      
      Once the (old) load becomes 93 or higher, it mathematically can never
      get lower than 93, even when the active (load) remains 0 forever.
      This results in the strange 0.00, 0.01, 0.05 uptime values on idle
      systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.
      
      It is not correct to add a 0.5 rounding (= 1024/2048) here, since the
      result of this function is fed back into the next iteration: that +0.5
      rounding value then gets multiplied by (2048-2037) and rounded again,
      creating a virtual "ghost" load next to the old and active load terms.
      
      By changing the way the internally kept value is rounded, the
      equivalent of that internal value can now reach 0.00 on idle and 1.00
      on full load: while the load is increasing, the internally kept value
      is rounded up; while it is decreasing, it is rounded down.  (A sketch
      of the changed rounding, with a worked example, follows this entry.)
      
      The modified code was tested on nohz=off and nohz kernels.  It was
      tested on vanilla kernel 4.6-rc5 and on the CentOS 7.1 kernel
      3.10.0-327.  It was tested on single-, dual-, and octal-core systems,
      on virtual hosts and on bare hardware.  No unwanted effects were
      observed, and the problems the patch intended to fix were indeed gone.
      Tested-by: Damien Wyart <damien.wyart@free.fr>
      Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Doug Smythies <dsmythies@telus.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 0f004f5a ("sched: Cure more NO_HZ load average woes")
      Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      20878232
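      A sketch of the changed rounding, assuming the usual fixed-point
      constants (FIXED_1 = 2048; the exact upstream hunk may differ):

          static unsigned long
          calc_load(unsigned long load, unsigned long exp, unsigned long active)
          {
                  unsigned long newload;

                  newload = load * exp + active * (FIXED_1 - exp);
                  if (active >= load)
                          newload += FIXED_1 - 1;   /* load rising: round up */
                                                    /* load falling: round down */
                  return newload / FIXED_1;
          }

      Worked example for the 15-minute average (exp = 2037) on an idle system
      (active = 0), starting from load = 93: the old code computed
      (93*2037 + 1024) / 2048 = 190465 / 2048 = 93, so 93 was a fixed point;
      the new code computes 93*2037 / 2048 = 189441 / 2048 = 92, so the value
      keeps decaying toward 0.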
    • sched/fair: Correct unit of load_above_capacity · cfa10334
      Authored by Morten Rasmussen
      In calculate_imbalance(), load_above_capacity currently has the unit
      [capacity] while it is used as being [load/capacity].  Not only is that
      wrong, it also makes it unlikely that load_above_capacity is ever used,
      as the subsequent code picks the smaller of load_above_capacity and the
      avg_load.
      
      This patch ensures that load_above_capacity has the right unit
      [load/capacity].  (A sketch of the conversion follows this entry.)
      Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
      [ Changed changelog to note it was in capacity unit; +rebase. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggemann@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cfa10334
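      A sketch of the conversion, reconstructed from the changelog (not the
      verbatim hunk): the surplus expressed in [capacity] is scaled by the
      NICE_0 load and divided by the group's capacity so that it becomes
      comparable with avg_load.

          load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE;
          if (load_above_capacity > busiest->group_capacity) {
                  load_above_capacity -= busiest->group_capacity;
                  load_above_capacity *= scale_load_down(NICE_0_LOAD);
                  load_above_capacity /= busiest->group_capacity;
          } else {
                  load_above_capacity = ~0UL;
          }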
    • sched/fair: Clean up scale confusion · 1be0eb2a
      Authored by Peter Zijlstra
      Wanpeng noted that the scale_load_down() in calculate_imbalance() was
      weird.  I agree; it should be SCHED_CAPACITY_SCALE, since we're going
      to compare against busiest->group_capacity, which is in [capacity]
      units.  (An illustrative one-liner follows this entry.)
      Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1be0eb2a
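      An illustrative one-liner showing the intent (not the verbatim diff):

          /* before: load units, yet compared against a capacity value */
          load_above_capacity = busiest->sum_nr_running * scale_load_down(NICE_0_LOAD);

          /* after: capacity units, same scale as busiest->group_capacity */
          load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE;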
    • sched/nohz: Fix affine unpinned timers mess · 44496922
      Authored by Wanpeng Li
      The following commit:
      
        9642d18e ("nohz: Affine unpinned timers to housekeepers")'
      
      intended to affine unpinned timers to housekeepers:
      
        unpinned timers (full dynticks, idle)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (full dynticks, busy)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (housekeeper, idle)     =>   nearest busy housekeeper (otherwise, fall back to itself)
      
      However, the !idle_cpu(i) && is_housekeeping_cpu(cpu) check changed
      that intention to:
      
        unpinned timers (full dynticks, idle)   =>   any housekeeper (no matter the CPU topology)
        unpinned timers (full dynticks, busy)   =>   any housekeeper (no matter the CPU topology)
        unpinned timers (housekeeper, idle)     =>   any busy CPU (otherwise, fall back to any housekeeper)
      
      This patch fixes it by checking whether there are busy housekeepers
      nearby, and otherwise falling back to any housekeeper, or to the CPU
      itself.  (A sketch of the resulting selection logic follows this
      entry.)  After the patch:
      
        unpinned timers (full dynticks, idle)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (full dynticks, busy)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (housekeeper, idle)     =>   nearest busy housekeeper (otherwise, fall back to itself)
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ Fixed the changelog. ]
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 9642d18e ("nohz: Affine unpinned timers to housekeepers")
      Link: http://lkml.kernel.org/r/1462344334-8303-1-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      44496922
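      A sketch of the resulting selection logic, reconstructed around
      get_nohz_timer_target() as described in the changelog (the exact
      upstream code may differ); the key point is testing the candidate CPU
      'i', not the local 'cpu', inside the domain walk:

          int get_nohz_timer_target(void)
          {
                  int i, cpu = smp_processor_id();
                  struct sched_domain *sd;

                  /* A busy housekeeping CPU keeps its own timers. */
                  if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu))
                          return cpu;

                  rcu_read_lock();
                  for_each_domain(cpu, sd) {
                          for_each_cpu(i, sched_domain_span(sd)) {
                                  if (cpu == i)
                                          continue;
                                  /* Nearest CPU that is both busy and a housekeeper. */
                                  if (!idle_cpu(i) && is_housekeeping_cpu(i)) {
                                          cpu = i;
                                          goto unlock;
                                  }
                          }
                  }

                  /* No busy housekeeper nearby: fall back to any housekeeper. */
                  if (!is_housekeeping_cpu(cpu))
                          cpu = housekeeping_any_cpu();
          unlock:
                  rcu_read_unlock();
                  return cpu;
          }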
    • sched/fair: Fix fairness issue on migration · 2f950354
      Authored by Peter Zijlstra
      Pavan reported that in the presence of very light tasks (or cgroups)
      the placement of migrated tasks can cause severe fairness issues.
      
      The problem is that enqueue_entity() places the task before it updates
      time, and can thereby place the task far in the past (remember that
      light tasks shoot virtual time forward at a high speed, so relative to
      the pre-existing light task we can land far in the past).
      
      This is done because update_curr() needs the current task, and we
      might be placing the current task.
      
      The obvious solution is to differentiate between the current task and
      any other task: place the current task before we update time, and any
      other task after, such that !curr tasks end up at the current moment in
      time rather than in the past.  (A sketch of the resulting ordering
      follows this entry.)
      
      This commit re-introduces the previously reverted commit:
      
        3a47d512 ("sched/fair: Fix fairness issue on migration")
      
      ... which is now safe to do, after we've also fixed another
      underlying bug first, in:
      
        sched/fair: Prepare to fix fairness problems on migration
      
      and cleaned up other details in the migration code:
      
        sched/core: Kill sched_class::task_waking
      Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2f950354
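      A sketch of the resulting ordering in enqueue_entity(), simplified and
      with the flag names as assumptions (the real code carries more state):

          static void
          enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
          {
                  bool renorm = !(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_MIGRATED);
                  bool curr = cfs_rq->curr == se;

                  /* The current entity must be renormalised before
                   * update_curr(), which needs its vruntime to be sane. */
                  if (renorm && curr)
                          se->vruntime += cfs_rq->min_vruntime;

                  update_curr(cfs_rq);

                  /* Everyone else is renormalised after update_curr(), so
                   * they are placed at the present, not in the past. */
                  if (renorm && !curr)
                          se->vruntime += cfs_rq->min_vruntime;

                  /* ... placement, load tracking, etc. ... */
          }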
    • sched/core: Kill sched_class::task_waking to clean up the migration logic · 59efa0ba
      Authored by Peter Zijlstra
      With sched_class::task_waking now being called only when we do
      set_task_cpu(), we can make sched_class::migrate_task_rq() do the work
      and eliminate sched_class::task_waking entirely.  (A sketch follows
      this entry.)
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      59efa0ba
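      A rough sketch of the direction, under the assumption that the vruntime
      renormalisation previously done by task_waking_fair() simply moves into
      migrate_task_rq_fair() (simplified, not the verbatim diff):

          static void migrate_task_rq_fair(struct task_struct *p)
          {
                  struct sched_entity *se = &p->se;
                  struct cfs_rq *cfs_rq = cfs_rq_of(se);

                  /* Formerly task_waking_fair(): make vruntime relative to
                   * the old runqueue before the task changes CPUs. */
                  se->vruntime -= cfs_rq->min_vruntime;

                  /* ... existing "leave the old runqueue" bookkeeping ... */
          }

      With that in place, the ->task_waking member can be dropped from
      struct sched_class altogether.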
    • sched/fair: Prepare to fix fairness problems on migration · b5179ac7
      Authored by Peter Zijlstra
      Mike reported that our recent attempt to fix migration problems:
      
        3a47d512 ("sched/fair: Fix fairness issue on migration")
      
      broke interactivity and the signal starve test. We reverted that
      commit and now let's try it again more carefully, with some other
      underlying problems fixed first.
      
      One problem is that I assumed ENQUEUE_WAKING was only set when we do a
      cross-cpu wakeup (migration), which isn't true. This means we now
      destroy the vruntime history of tasks and wakeup-preemption suffers.
      
      Cure this by making my assumption true: only call
      sched_class::task_waking() when we do a cross-CPU wakeup.  This also
      avoids the indirect call in the case of a local wakeup.  (A sketch of
      the wakeup path follows this entry.)
      Reported-by: Mike Galbraith <mgalbraith@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Fixes: 3a47d512 ("sched/fair: Fix fairness issue on migration")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b5179ac7
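      A simplified sketch of the wakeup path after this change (structure
      assumed from the changelog, not the verbatim diff): ->task_waking()
      only runs when the task actually changes CPU, so ENQUEUE_WAKING really
      does imply a migration.

          cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
          if (task_cpu(p) != cpu) {
                  wake_flags |= WF_MIGRATED;
                  if (p->sched_class->task_waking)
                          p->sched_class->task_waking(p);  /* cross-CPU only */
                  set_task_cpu(p, cpu);
          }
          ttwu_queue(p, cpu, wake_flags);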
    • sched/fair: Move record_wakee() · c58d25f3
      Authored by Peter Zijlstra
      Since I want to make ->task_woken() conditional on the task getting
      migrated, we cannot use it to call record_wakee().
      
      Move it to select_task_rq_fair(), which gets called under almost the
      same conditions.  The only exception is when the woken task (@p) is
      CPU-bound (as per the nr_cpus_allowed test in select_task_rq()).  (A
      sketch follows this entry.)
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c58d25f3
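      A sketch of where the bookkeeping ends up, assuming the usual shape of
      select_task_rq_fair() (simplified):

          static int
          select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag,
                              int wake_flags)
          {
                  /* Only genuine wakeups should update wakee statistics. */
                  if (sd_flag & SD_BALANCE_WAKE)
                          record_wakee(p);

                  /* ... existing wake_affine / domain-walk logic ... */
          }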
  5. 11 May 2016, 1 commit
  6. 10 May 2016, 1 commit
    • sched/rt, sched/dl: Don't push if task's scheduling class was changed · 13b5ab02
      Authored by Xunlei Pang
      We got this warning:
      
          WARNING: CPU: 1 PID: 2468 at kernel/sched/core.c:1161 set_task_cpu+0x1af/0x1c0
          [...]
          Call Trace:
      
          dump_stack+0x63/0x87
          __warn+0xd1/0xf0
          warn_slowpath_null+0x1d/0x20
          set_task_cpu+0x1af/0x1c0
          push_dl_task.part.34+0xea/0x180
          push_dl_tasks+0x17/0x30
          __balance_callback+0x45/0x5c
          __sched_setscheduler+0x906/0xb90
          SyS_sched_setattr+0x150/0x190
          do_syscall_64+0x62/0x110
          entry_SYSCALL64_slow_path+0x25/0x25
      
      This corresponds to:
      
          WARN_ON_ONCE(p->state == TASK_RUNNING &&
                   p->sched_class == &fair_sched_class &&
                   (p->on_rq && !task_on_rq_migrating(p)))
      
      It happens because in find_lock_later_rq(), the task whose scheduling
      class was changed to fair class is still pushed away as if it were
      a deadline task ...
      
      So, in find_lock_later_rq(), check after double_lock_balance() whether
      the scheduling class of the deadline task has changed; if it has, break
      and retry.
      
      Apply the same logic to RT tasks.  (A sketch of the added check follows
      this entry.)
      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Link: http://lkml.kernel.org/r/1462767091-1215-1-git-send-email-xlpang@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      13b5ab02
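      A sketch of the added check in find_lock_later_rq(), reconstructed from
      the changelog (the RT side gains an analogous !rt_task() test in
      find_lock_lowest_rq(); not the verbatim hunk):

          if (double_lock_balance(rq, later_rq)) {
                  if (unlikely(task_rq(task) != rq ||
                               !cpumask_test_cpu(later_rq->cpu,
                                                 tsk_cpus_allowed(task)) ||
                               task_running(rq, task) ||
                               !dl_task(task) ||        /* class changed under us */
                               !task_on_rq_queued(task))) {
                          double_unlock_balance(rq, later_rq);
                          later_rq = NULL;
                          break;
                  }
          }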
  7. 09 May 2016, 2 commits
  8. 07 May 2016, 1 commit
  9. 06 May 2016, 13 commits
  10. 05 May 2016, 9 commits