- 30 9月, 2016 2 次提交
-
-
由 Peter Zijlstra 提交于
select_idle_siblings() is a known pain point for a number of workloads; it either does too much or not enough and sometimes just does plain wrong. This rewrite attempts to address a number of issues (but sadly not all). The current code does an unconditional sched_domain iteration; with the intent of finding an idle core (on SMT hardware). The problems which this patch tries to address are: - its pointless to look for idle cores if the machine is real busy; at which point you're just wasting cycles. - it's behaviour is inconsistent between SMT and !SMT hardware in that !SMT hardware ends up doing a scan for any idle CPU in the LLC domain, while SMT hardware does a scan for idle cores and if that fails, falls back to a scan for idle threads on the 'target' core. The new code replaces the sched_domain scan with 3 explicit scans: 1) search for an idle core in the LLC 2) search for an idle CPU in the LLC 3) search for an idle thread in the 'target' core where 1 and 3 are conditional on SMT support and 1 and 2 have runtime heuristics to skip the step. Step 1) is conditional on sd_llc_shared->has_idle_cores; when a cpu goes idle and sd_llc_shared->has_idle_cores is false, we scan all SMT siblings of the CPU going idle. Similarly, we clear sd_llc_shared->has_idle_cores when we fail to find an idle core. Step 2) tracks the average cost of the scan and compares this to the average idle time guestimate for the CPU doing the wakeup. There is a significant fudge factor involved to deal with the variability of the averages. Esp. hackbench was sensitive to this. Step 3) is unconditional; we assume (also per step 1) that scanning all SMT siblings in a core is 'cheap'. With this; SMT systems gain step 2, which cures a few benchmarks -- notably one from Facebook. One 'feature' of the sched_domain iteration, which we preserve in the new code, is that it would start scanning from the 'target' CPU, instead of scanning the cpumask in cpu id order. This avoids multiple CPUs in the LLC scanning for idle to gang up and find the same CPU quite as much. The down side is that tasks can end up hopping across the LLC for no apparent reason. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Move the nr_busy_cpus thing from its hacky sd->parent->groups->sgc location into the much more natural sched_domain_shared location. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 8月, 2016 2 次提交
-
-
由 Dietmar Eggemann 提交于
To be able to compare the capacity of the target CPU with the highest available CPU capacity, store the maximum per-CPU capacity in the root domain. The max per-CPU capacity should be 1024 for all systems except SMT, where the capacity is currently based on smt_gain and the number of hardware threads and is <1024. If SMT can be brought to work with a per-thread capacity of 1024, this patch can be dropped and replaced by a hard-coded max capacity of 1024 (=SCHED_CAPACITY_SCALE). Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NMorten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/26c69258-9947-f830-a53e-0c54e7750646@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
The nohz_stamp member of struct rq has been unused since 2010, when this commit removed the code that referenced it: 396e894d ("sched: Revert nohz_ratelimit() for now") Signed-off-by: NRik van Riel <riel@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160815121410.5ea1c98f@annuminas.surriel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 7月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into account that the function is now called from a thread running on the outgoing CPU. As a result a cpu unplug leakes a load of 1 into the global load accounting mechanism. Fix it by adjusting for the currently running thread which calls calc_load_migrate(). Reported-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: rt@linutronix.de Cc: shreyas@linux.vnet.ibm.com Fixes: e9cd8fa4: ("sched/migration: Move calc_load_migrate() into CPU_DYING") Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607121744350.4083@nanosSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 6月, 2016 3 次提交
-
-
由 Peter Zijlstra 提交于
Since we already take rq->lock when creating a cgroup, use it to also sync the throttle_count and avoid the extra state and enqueue path branch. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: linux-kernel@vger.kernel.org [ Fixed build warning. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
A future patch needs rq->lock held _after_ we link the task_group into the hierarchy. In order to avoid taking every rq->lock twice, reorder things a little and create online_fair_sched_group() to be called after we link the task_group. All this code is still ran from css_alloc() so css_online() isn't in fact used for this. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Vincent Guittot 提交于
A new fair task is detached and attached from/to task_group with: cgroup_post_fork() ss->fork(child) := cpu_cgroup_fork() sched_move_task() task_move_group_fair() Which is wrong, because at this point in fork() the task isn't fully initialized and it cannot 'move' to another group, because its not attached to any group as yet. In fact, cpu_cgroup_fork() needs a small part of sched_move_task() so we can just call this small part directly instead sched_move_task(). And the task doesn't really migrate because it is not yet attached so we need the following sequence: do_fork() sched_fork() __set_task_cpu() cgroup_post_fork() set_task_rq() # set task group and runqueue wake_up_new_task() select_task_rq() can select a new cpu __set_task_cpu post_init_entity_util_avg attach_task_cfs_rq() activate_task enqueue_task This patch makes that happen. Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> [ Added TASK_SET_GROUP to set depth properly. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 6月, 2016 1 次提交
-
-
由 Konstantin Khlebnikov 提交于
Cgroup created inside throttled group must inherit current throttle_count. Broken throttle_count allows to nominate throttled entries as a next buddy, later this leads to null pointer dereference in pick_next_task_fair(). This patch initialize cfs_rq->throttle_count at first enqueue: laziness allows to skip locking all rq at group creation. Lazy approach also allows to skip full sub-tree scan at throttling hierarchy (not in this patch). Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Link: http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzzSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 6月, 2016 2 次提交
-
-
由 Peter Zijlstra 提交于
This new form allows using hardware assisted waiting. Some hardware (ARM64 and x86) allow monitoring an address for changes, so by providing a pointer we can use this to replace the cpu_relax() with hardware optimized methods in the future. Requested-by: NWill Deacon <will.deacon@arm.com> Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Wanpeng Li 提交于
Commit: e9532e69 ("sched/cputime: Fix steal time accounting vs. CPU hotplug") ... set rq->prev_* to 0 after a CPU hotplug comes back, in order to fix the case where (after CPU hotplug) steal time is smaller than rq->prev_steal_time. However, this should never happen. Steal time was only smaller because of the KVM-specific bug fixed by the previous patch. Worse, the previous patch triggers a bug on CPU hot-unplug/plug operation: because rq->prev_steal_time is cleared, all of the CPU's past steal time will be accounted again on hot-plug. Since the root cause has been fixed, we can just revert commit e9532e69. Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 'commit e9532e69 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")' Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 5月, 2016 1 次提交
-
-
由 Peter Zijlstra 提交于
With sched_class::task_waking being called only when we do set_task_cpu(), we can make sched_class::migrate_task_rq() do the work and eliminate sched_class::task_waking entirely. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Hunter <ahh@google.com> Cc: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Pavan Kondeti <pkondeti@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: byungchul.park@lge.com Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 5月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
No need for an extra notifier. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160310120025.693720241@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 05 5月, 2016 6 次提交
-
-
由 Yuyang Du 提交于
After cleaning up the sched metrics, there are two definitions that are ambiguous and confusing: SCHED_LOAD_SHIFT and SCHED_LOAD_SHIFT. Resolve this: - Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT, which better reflects what it is. - Replace SCHED_LOAD_SCALE use with SCHED_CAPACITY_SCALE and remove SCHED_LOAD_SCALE. Suggested-by: NBen Segall <bsegall@google.com> Signed-off-by: NYuyang Du <yuyang.du@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-3-git-send-email-yuyang.du@intel.com [ Rewrote the changelog and fixed the build on 32-bit kernels. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yuyang Du 提交于
Integer metric needs fixed point arithmetic. In sched/fair, a few metrics, e.g., weight, load, load_avg, util_avg, freq, and capacity, may have different fixed point ranges, which makes their update and usage error-prone. In order to avoid the errors relating to the fixed point range, we definie a basic fixed point range, and then formalize all metrics to base on the basic range. The basic range is 1024 or (1 << 10). Further, one can recursively apply the basic range to have larger range. Pointed out by Ben Segall, weight (visible to user, e.g., NICE-0 has 1024) and load (e.g., NICE_0_LOAD) have independent ranges, but they must be well calibrated. Signed-off-by: NYuyang Du <yuyang.du@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-2-git-send-email-yuyang.du@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Mike ran into the low load resolution limitation on his big machine. So reenable these bits; nobody could ever reproduce/analyze the reported power usage claim and Google has been running with this for years as well. Reported-by: NMike Galbraith <umgwanakikbuti@gmail.com> Tested-by: NMike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
The problem with the existing lock pinning is that each pin is of value 1; this mean you can simply unpin if you know its pinned, without having any extra information. This scheme generates a random (16 bit) cookie for each pin and requires this same cookie to unpin. This means you have to keep the cookie in context. No objsize difference for !LOCKDEP kernels. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
In order to be able to pass around more than just the IRQ flags in the future, add a rq_flags structure. No difference in code generation for the x86_64-defconfig build I tested. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Its a rather large function, inline doesn't seems to make much sense: $ size defconfig-build/kernel/sched/core.o{.orig,} text data bss dec hex filename 56533 21037 2320 79890 13812 defconfig-build/kernel/sched/core.o.orig 55733 21037 2320 79090 134f2 defconfig-build/kernel/sched/core.o The 'perf bench sched messaging' micro-benchmark shows a visible improvement of 4-5%: $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done $ perf stat --null --repeat 25 -- perf bench sched messaging -g 40 -l 5000 pre: 4.582798193 seconds time elapsed ( +- 1.41% ) 4.733374877 seconds time elapsed ( +- 2.10% ) 4.560955136 seconds time elapsed ( +- 1.43% ) 4.631062303 seconds time elapsed ( +- 1.40% ) post: 4.364765213 seconds time elapsed ( +- 0.91% ) 4.454442734 seconds time elapsed ( +- 1.18% ) 4.448893817 seconds time elapsed ( +- 1.41% ) 4.424346872 seconds time elapsed ( +- 0.97% ) Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 4月, 2016 2 次提交
-
-
由 Frederic Weisbecker 提交于
Some code in CPU load update only concern NO_HZ configs but it is built on all configurations. When NO_HZ isn't built, that code is harmless but just happens to take some useless ressources in CPU and memory: 1) one useless field in struct rq 2) jiffies record on every tick that is never used (cpu_load_update_periodic) 3) decay_load_missed is called two times on every tick to eventually return immediately with no action taken. And that function is dead code. For pure optimization purposes, lets conditionally build the NO_HZ related code. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1461080211-16271-1-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Frederic Weisbecker 提交于
The CPU load update related functions have a weak naming convention currently, starting with update_cpu_load_*() which isn't ideal as "update" is a very generic concept. Since two of these functions are public already (and a third is to come) that's enough to introduce a more conventional naming scheme. So let's do the following rename instead: update_cpu_load_*() -> cpu_load_update_*() Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1460555812-25375-2-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 02 4月, 2016 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Add a new cpufreq scaling governor, called "schedutil", that uses scheduler-provided CPU utilization information as input for making its decisions. Doing that is possible after commit 34e2c555 (cpufreq: Add mechanism for registering utilization update callbacks) that introduced cpufreq_update_util() called by the scheduler on utilization changes (from CFS) and RT/DL task status updates. In particular, CPU frequency scaling decisions may be based on the the utilization data passed to cpufreq_update_util() by CFS. The new governor is relatively simple. The frequency selection formula used by it depends on whether or not the utilization is frequency-invariant. In the frequency-invariant case the new CPU frequency is given by next_freq = 1.25 * max_freq * util / max where util and max are the last two arguments of cpufreq_update_util(). In turn, if util is not frequency-invariant, the maximum frequency in the above formula is replaced with the current frequency of the CPU: next_freq = 1.25 * curr_freq * util / max The coefficient 1.25 corresponds to the frequency tipping point at (util / max) = 0.8. All of the computations are carried out in the utilization update handlers provided by the new governor. One of those handlers is used for cpufreq policies shared between multiple CPUs and the other one is for policies with one CPU only (and therefore it doesn't need to use any extra synchronization means). The governor supports fast frequency switching if that is supported by the cpufreq driver in use and possible for the given policy. In the fast switching case, all operations of the governor take place in its utilization update handlers. If fast switching cannot be used, the frequency switch operations are carried out with the help of a work item which only calls __cpufreq_driver_target() (under a mutex) to trigger a frequency update (to a value already computed beforehand in one of the utilization update handlers). Currently, the governor treats all of the RT and DL tasks as "unknown utilization" and sets the frequency to the allowed maximum when updated from the RT or DL sched classes. That heavy-handed approach should be replaced with something more subtle and specifically targeted at RT and DL tasks. The governor shares some tunables management code with the "ondemand" and "conservative" governors and uses some common definitions from cpufreq_governor.h, but apart from that it is stand-alone. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
- 31 3月, 2016 1 次提交
-
-
由 Yuyang Du 提交于
A new task's util_avg is set to full utilization of a CPU (100% time running). This accelerates a new task's utilization ramp-up, useful to boost its execution in early time. However, it may result in (insanely) high utilization for a transient time period when a flood of tasks are spawned. Importantly, it violates the "fundamentally bounded" CPU utilization, and its side effect is negative if we don't take any measure to bound it. This patch proposes an algorithm to address this issue. It has two methods to approach a sensible initial util_avg: (1) An expected (or average) util_avg based on its cfs_rq's util_avg: util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * se.load.weight (2) A trajectory of how successive new tasks' util develops, which gives 1/2 of the left utilization budget to a new task such that the additional util is noticeably large (when overall util is low) or unnoticeably small (when overall util is high enough). In the meantime, the aggregate utilization is well bounded: util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n where n denotes the nth task. If util_avg is larger than util_avg_cap, then the effective util is clamped to the util_avg_cap. Reported-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NYuyang Du <yuyang.du@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: steve.muckle@linaro.org Link: http://lkml.kernel.org/r/1459283456-21682-1-git-send-email-yuyang.du@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 3月, 2016 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Create cpufreq.c under kernel/sched/ and move the cpufreq code related to the scheduler to that file and to sched.h. Redefine cpufreq_update_util() as a static inline function to avoid function calls at its call sites in the scheduler code (as suggested by Peter Zijlstra). Also move the definition of struct update_util_data and declaration of cpufreq_set_update_util_data() from include/linux/cpufreq.h to include/linux/sched.h. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
- 09 3月, 2016 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Introduce a mechanism by which parts of the cpufreq subsystem ("setpolicy" drivers or the core) can register callbacks to be executed from cpufreq_update_util() which is invoked by the scheduler's update_load_avg() on CPU utilization changes. This allows the "setpolicy" drivers to dispense with their timers and do all of the computations they need and frequency/voltage adjustments in the update_load_avg() code path, among other things. The update_load_avg() changes were suggested by Peter Zijlstra. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NIngo Molnar <mingo@kernel.org>
-
- 05 3月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time value over CPU down and up. So after the CPU comes up again the delta calculation in steal_account_process_tick() wreckages itself due to the unsigned math: u64 steal = paravirt_steal_clock(smp_processor_id()); steal -= this_rq()->prev_steal_time; So if steal is smaller than rq->prev_steal_time we end up with an insane large value which then gets added to rq->prev_steal_time, resulting in a permanent wreckage of the accounting. As a consequence the per CPU stats in /proc/stat become stale. Nice trick to tell the world how idle the system is (100%) while the CPU is 100% busy running tasks. Though we prefer realistic numbers. None of the accounting values which use a previous value to account for fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity check for prev_irq_time and prev_steal_time_rq, but that sanity check solely deals with clock warps and limits the /proc/stat visible wreckage. The prev_time values are still wrong. Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: commit 095c0aa8 "sched: adjust scheduler cpu power for stolen time" Fixes: commit aa483808 "sched: Remove irq time from available CPU power" Fixes: commit e6e6685a "KVM guest: Steal time accounting" Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanosSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 02 3月, 2016 2 次提交
-
-
由 Frederic Weisbecker 提交于
Instead of providing asynchronous checks for the nohz subsystem to verify sched tick dependency, migrate sched to the new mask. Everytime a task is enqueued or dequeued, we evaluate the state of the tick dependency on top of the policy of the tasks in the runqueue, by order of priority: SCHED_DEADLINE: Need the tick in order to periodically check for runtime SCHED_FIFO : Don't need the tick (no round-robin) SCHED_RR : Need the tick if more than 1 task of the same priority for round robin (simplified with checking if more than one SCHED_RR task no matter what priority). SCHED_NORMAL : Need the tick if more than 1 task for round-robin. We could optimize that further with one flag per sched policy on the tick dependency mask and perform only the checks relevant to the policy concerned by an enqueue/dequeue operation. Since the checks aren't based on the current task anymore, we could get rid of the task switch hook but it's still needed for posix cpu timers. Reviewed-by: NChris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Frederic Weisbecker 提交于
In order to evaluate the scheduler tick dependency without probing context switches, we need to know how much SCHED_RR and SCHED_FIFO tasks are enqueued as those policies don't have the same preemption requirements. To prepare for that, let's account SCHED_RR tasks, we'll be able to deduce SCHED_FIFO tasks as well from it and the total RT tasks in the runqueue. Reviewed-by: NChris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
- 29 2月, 2016 4 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
The sched_domain_sysctl setup is only enabled when SCHED_DEBUG is configured. As debug.c is only compiled when SCHED_DEBUG is configured as well, move the setup of sched_domain_sysctl into that file. Note, the (un)register_sched_domain_sysctl() functions had to be changed from static to allow access to them from core.c. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Clark Williams <williams@redhat.com> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160222212825.599278093@goodmis.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Andrea Parri reported: > I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not > handled correctly: > > T1 (prio = 20) > lock(rtmutex); > > T2 (prio = 20) > blocks on rtmutex (rt_nr_boosted = 0 on T1's rq) > > T1 (prio = 20) > sys_set_scheduler(prio = 0) > [new_effective_prio == oldprio] > T1 prio = 20 (rt_nr_boosted = 0 on T1's rq) > > The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted()); > in particular, if we continue with > > T1 (prio = 20) > unlock(rtmutex) > wakeup(T2) > adjust_prio(T1) > [prio != rt_mutex_getprio(T1)] > dequeue(T1) > rt_nr_boosted = (unsigned long)(-1) > ... > T1 prio = 0 > > then we end up leaving rt_nr_boosted in an "inconsistent" state. > > The simple program attached could reproduce the previous scenario; note > that, as a consequence of the presence of this state, the "assertion" > > WARN_ON(!rt_nr_running && rt_nr_boosted) > > from dec_rt_group() may trigger. So normally we dequeue/enqueue tasks in sched_setscheduler(), which would ensure the accounting stays correct. However in the early PI path we fail to do so. So this was introduced at around v3.14, by: c365c292 ("sched: Consider pi boosting in setscheduler()") which fixed another problem exactly because that dequeue/enqueue, joy. Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it preserve runqueue location with that option. This requires decoupling the on_rt_rq() state from being on the list. In order to allow for explicit movement during the SAVE/RESTORE, introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these cases to preserve other invariants. Respecting the SAVE/RESTORE flags also has the (nice) side-effect that things like sys_nice()/sys_sched_setaffinity() also do not reorder FIFO tasks (whereas they used to before this patch). Reported-by: NAndrea Parri <parri.andrea@gmail.com> Tested-by: NAndrea Parri <parri.andrea@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dongsheng Yang 提交于
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <lizefan@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452674558-31897-1-git-send-email-yangds.fnst@cn.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
When a cgroup's CPU runqueue is destroyed, it should remove its remaining load accounting from its parent cgroup. The current site for doing so it unsuited because its far too late and unordered against other cgroup removal (->css_free() will be, but we're also in an RCU callback). Put it in the ->css_offline() callback, which is the start of cgroup destruction, right after the group has been made unavailable to userspace. The ->css_offline() callbacks are called in hierarchical order after the following v4.4 commit: aa226ff4 ("cgroup: make sure a parent css isn't offlined before its children") Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160121212416.GL6357@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 2月, 2016 1 次提交
-
-
由 Mel Gorman 提交于
schedstats is very useful during debugging and performance tuning but it incurs overhead to calculate the stats. As such, even though it can be disabled at build time, it is often enabled as the information is useful. This patch adds a kernel command-line and sysctl tunable to enable or disable schedstats on demand (when it's built in). It is disabled by default as someone who knows they need it can also learn to enable it when necessary. The benefits are dependent on how scheduler-intensive the workload is. If it is then the patch reduces the number of cycles spent calculating the stats with a small benefit from reducing the cache footprint of the scheduler. These measurements were taken from a 48-core 2-socket machine with Xeon(R) E5-2670 v3 cpus although they were also tested on a single socket machine 8-core machine with Intel i7-3770 processors. netperf-tcp 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Hmean 64 560.45 ( 0.00%) 575.98 ( 2.77%) Hmean 128 766.66 ( 0.00%) 795.79 ( 3.80%) Hmean 256 950.51 ( 0.00%) 981.50 ( 3.26%) Hmean 1024 1433.25 ( 0.00%) 1466.51 ( 2.32%) Hmean 2048 2810.54 ( 0.00%) 2879.75 ( 2.46%) Hmean 3312 4618.18 ( 0.00%) 4682.09 ( 1.38%) Hmean 4096 5306.42 ( 0.00%) 5346.39 ( 0.75%) Hmean 8192 10581.44 ( 0.00%) 10698.15 ( 1.10%) Hmean 16384 18857.70 ( 0.00%) 18937.61 ( 0.42%) Small gains here, UDP_STREAM showed nothing intresting and neither did the TCP_RR tests. The gains on the 8-core machine were very similar. tbench4 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Hmean mb/sec-1 500.85 ( 0.00%) 522.43 ( 4.31%) Hmean mb/sec-2 984.66 ( 0.00%) 1018.19 ( 3.41%) Hmean mb/sec-4 1827.91 ( 0.00%) 1847.78 ( 1.09%) Hmean mb/sec-8 3561.36 ( 0.00%) 3611.28 ( 1.40%) Hmean mb/sec-16 5824.52 ( 0.00%) 5929.03 ( 1.79%) Hmean mb/sec-32 10943.10 ( 0.00%) 10802.83 ( -1.28%) Hmean mb/sec-64 15950.81 ( 0.00%) 16211.31 ( 1.63%) Hmean mb/sec-128 15302.17 ( 0.00%) 15445.11 ( 0.93%) Hmean mb/sec-256 14866.18 ( 0.00%) 15088.73 ( 1.50%) Hmean mb/sec-512 15223.31 ( 0.00%) 15373.69 ( 0.99%) Hmean mb/sec-1024 14574.25 ( 0.00%) 14598.02 ( 0.16%) Hmean mb/sec-2048 13569.02 ( 0.00%) 13733.86 ( 1.21%) Hmean mb/sec-3072 12865.98 ( 0.00%) 13209.23 ( 2.67%) Small gains of 2-4% at low thread counts and otherwise flat. The gains on the 8-core machine were slightly different tbench4 on 8-core i7-3770 single socket machine Hmean mb/sec-1 442.59 ( 0.00%) 448.73 ( 1.39%) Hmean mb/sec-2 796.68 ( 0.00%) 794.39 ( -0.29%) Hmean mb/sec-4 1322.52 ( 0.00%) 1343.66 ( 1.60%) Hmean mb/sec-8 2611.65 ( 0.00%) 2694.86 ( 3.19%) Hmean mb/sec-16 2537.07 ( 0.00%) 2609.34 ( 2.85%) Hmean mb/sec-32 2506.02 ( 0.00%) 2578.18 ( 2.88%) Hmean mb/sec-64 2511.06 ( 0.00%) 2569.16 ( 2.31%) Hmean mb/sec-128 2313.38 ( 0.00%) 2395.50 ( 3.55%) Hmean mb/sec-256 2110.04 ( 0.00%) 2177.45 ( 3.19%) Hmean mb/sec-512 2072.51 ( 0.00%) 2053.97 ( -0.89%) In constract, this shows a relatively steady 2-3% gain at higher thread counts. Due to the nature of the patch and the type of workload, it's not a surprise that the result will depend on the CPU used. hackbench-pipes 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Amean 1 0.0637 ( 0.00%) 0.0660 ( -3.59%) Amean 4 0.1229 ( 0.00%) 0.1181 ( 3.84%) Amean 7 0.1921 ( 0.00%) 0.1911 ( 0.52%) Amean 12 0.3117 ( 0.00%) 0.2923 ( 6.23%) Amean 21 0.4050 ( 0.00%) 0.3899 ( 3.74%) Amean 30 0.4586 ( 0.00%) 0.4433 ( 3.33%) Amean 48 0.5910 ( 0.00%) 0.5694 ( 3.65%) Amean 79 0.8663 ( 0.00%) 0.8626 ( 0.43%) Amean 110 1.1543 ( 0.00%) 1.1517 ( 0.22%) Amean 141 1.4457 ( 0.00%) 1.4290 ( 1.16%) Amean 172 1.7090 ( 0.00%) 1.6924 ( 0.97%) Amean 192 1.9126 ( 0.00%) 1.9089 ( 0.19%) Some small gains and losses and while the variance data is not included, it's close to the noise. The UMA machine did not show anything particularly different pipetest 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v2r2 Min Time 4.13 ( 0.00%) 3.99 ( 3.39%) 1st-qrtle Time 4.38 ( 0.00%) 4.27 ( 2.51%) 2nd-qrtle Time 4.46 ( 0.00%) 4.39 ( 1.57%) 3rd-qrtle Time 4.56 ( 0.00%) 4.51 ( 1.10%) Max-90% Time 4.67 ( 0.00%) 4.60 ( 1.50%) Max-93% Time 4.71 ( 0.00%) 4.65 ( 1.27%) Max-95% Time 4.74 ( 0.00%) 4.71 ( 0.63%) Max-99% Time 4.88 ( 0.00%) 4.79 ( 1.84%) Max Time 4.93 ( 0.00%) 4.83 ( 2.03%) Mean Time 4.48 ( 0.00%) 4.39 ( 1.91%) Best99%Mean Time 4.47 ( 0.00%) 4.39 ( 1.91%) Best95%Mean Time 4.46 ( 0.00%) 4.38 ( 1.93%) Best90%Mean Time 4.45 ( 0.00%) 4.36 ( 1.98%) Best50%Mean Time 4.36 ( 0.00%) 4.25 ( 2.49%) Best10%Mean Time 4.23 ( 0.00%) 4.10 ( 3.13%) Best5%Mean Time 4.19 ( 0.00%) 4.06 ( 3.20%) Best1%Mean Time 4.13 ( 0.00%) 4.00 ( 3.39%) Small improvement and similar gains were seen on the UMA machine. The gain is small but it stands to reason that doing less work in the scheduler is a good thing. The downside is that the lack of schedstats and tracepoints may be surprising to experts doing performance analysis until they find the existence of the schedstats= parameter or schedstats sysctl. It will be automatically activated for latencytop and sleep profiling to alleviate the problem. For tracepoints, there is a simple warning as it's not safe to activate schedstats in the context when it's known the tracepoint may be wanted but is unavailable. Signed-off-by: NMel Gorman <mgorman@techsingularity.net> Reviewed-by: NMatt Fleming <matt@codeblueprint.co.uk> Reviewed-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454663316-22048-1-git-send-email-mgorman@techsingularity.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 12月, 2015 5 次提交
-
-
由 Waiman Long 提交于
If a system with large number of sockets was driven to full utilization, it was found that the clock tick handling occupied a rather significant proportion of CPU time when fair group scheduling and autogroup were enabled. Running a java benchmark on a 16-socket IvyBridge-EX system, the perf profile looked like: 10.52% 0.00% java [kernel.vmlinux] [k] smp_apic_timer_interrupt 9.66% 0.05% java [kernel.vmlinux] [k] hrtimer_interrupt 8.65% 0.03% java [kernel.vmlinux] [k] tick_sched_timer 8.56% 0.00% java [kernel.vmlinux] [k] update_process_times 8.07% 0.03% java [kernel.vmlinux] [k] scheduler_tick 6.91% 1.78% java [kernel.vmlinux] [k] task_tick_fair 5.24% 5.04% java [kernel.vmlinux] [k] update_cfs_shares In particular, the high CPU time consumed by update_cfs_shares() was mostly due to contention on the cacheline that contained the task_group's load_avg statistical counter. This cacheline may also contains variables like shares, cfs_rq & se which are accessed rather frequently during clock tick processing. This patch moves the load_avg variable into another cacheline separated from the other frequently accessed variables. It also creates a cacheline aligned kmemcache for task_group to make sure that all the allocated task_group's are cacheline aligned. By doing so, the perf profile became: 9.44% 0.00% java [kernel.vmlinux] [k] smp_apic_timer_interrupt 8.74% 0.01% java [kernel.vmlinux] [k] hrtimer_interrupt 7.83% 0.03% java [kernel.vmlinux] [k] tick_sched_timer 7.74% 0.00% java [kernel.vmlinux] [k] update_process_times 7.27% 0.03% java [kernel.vmlinux] [k] scheduler_tick 5.94% 1.74% java [kernel.vmlinux] [k] task_tick_fair 4.15% 3.92% java [kernel.vmlinux] [k] update_cfs_shares The %cpu time is still pretty high, but it is better than before. The benchmark results before and after the patch was as follows: Before patch - Max-jOPs: 907533 Critical-jOps: 134877 After patch - Max-jOPs: 916011 Critical-jOps: 142366 Signed-off-by: NWaiman Long <Waiman.Long@hpe.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Ben Segall <bsegall@google.com> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Link: http://lkml.kernel.org/r/1449081710-20185-3-git-send-email-Waiman.Long@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
When building a kernel with a gcc 6 snapshot the compiler complains about unused const static variables for prio_to_weight and prio_to_mult for multiple scheduler files (all but core.c and autogroup.c) The way the array is currently declared it will be duplicated in every scheduler file that includes sched.h, which seems rather wasteful. Move the array out of line into core.c. I also added a sched_ prefix to avoid any potential name space collisions. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448859583-3252-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
The current code accounts for the time a task was absent from the fair class (per ATTACH_AGE_LOAD). However it does not work correctly when a task got migrated or moved to another cgroup while outside of the fair class. This patch tries to address that by aging on migration. We locklessly read the 'last_update_time' stamp from both the old and new cfs_rq, ages the load upto the old time, and sets it to the new time. These timestamps should in general not be more than 1 tick apart from one another, so there is a definite bound on things. Signed-off-by: NByungchul Park <byungchul.park@lge.com> [ Changelog, a few edits and !SMP build fix ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Introduce smp_cond_acquire() which combines a control dependency and a read barrier to form acquire semantics. This primitive has two benefits: - it documents control dependencies, - its typically cheaper than using smp_load_acquire() in a loop. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Explain how the control dependency and smp_rmb() end up providing ACQUIRE semantics and pair with smp_store_release() in finish_lock_switch(). Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 11月, 2015 1 次提交
-
-
由 Dietmar Eggemann 提交于
Commit cd126afe ("sched/fair: Remove rq's runnable avg") got rid of rq->avg and so there is no need to update it any more when entering or exiting idle. Remove the now empty functions idle_{enter|exit}_fair(). Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Link: http://lkml.kernel.org/r/1445342681-17171-1-git-send-email-dietmar.eggemann@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 10月, 2015 1 次提交
-
-
由 xiaofeng.yan 提交于
The parameter "int next_cpu" in the following function is unused: migrate_task_rq(struct task_struct *p, int next_cpu) Remove it. Signed-off-by: Nxiaofeng.yan <yanxiaofeng@inspur.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1442991360-31945-1-git-send-email-yanxiaofeng@inspur.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-