- 06 5月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
No need for an extra notifier. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160310120025.693720241@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 05 5月, 2016 6 次提交
-
-
由 Yuyang Du 提交于
After cleaning up the sched metrics, there are two definitions that are ambiguous and confusing: SCHED_LOAD_SHIFT and SCHED_LOAD_SHIFT. Resolve this: - Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT, which better reflects what it is. - Replace SCHED_LOAD_SCALE use with SCHED_CAPACITY_SCALE and remove SCHED_LOAD_SCALE. Suggested-by: NBen Segall <bsegall@google.com> Signed-off-by: NYuyang Du <yuyang.du@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-3-git-send-email-yuyang.du@intel.com [ Rewrote the changelog and fixed the build on 32-bit kernels. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yuyang Du 提交于
Integer metric needs fixed point arithmetic. In sched/fair, a few metrics, e.g., weight, load, load_avg, util_avg, freq, and capacity, may have different fixed point ranges, which makes their update and usage error-prone. In order to avoid the errors relating to the fixed point range, we definie a basic fixed point range, and then formalize all metrics to base on the basic range. The basic range is 1024 or (1 << 10). Further, one can recursively apply the basic range to have larger range. Pointed out by Ben Segall, weight (visible to user, e.g., NICE-0 has 1024) and load (e.g., NICE_0_LOAD) have independent ranges, but they must be well calibrated. Signed-off-by: NYuyang Du <yuyang.du@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1459829551-21625-2-git-send-email-yuyang.du@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Mike ran into the low load resolution limitation on his big machine. So reenable these bits; nobody could ever reproduce/analyze the reported power usage claim and Google has been running with this for years as well. Reported-by: NMike Galbraith <umgwanakikbuti@gmail.com> Tested-by: NMike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
The problem with the existing lock pinning is that each pin is of value 1; this mean you can simply unpin if you know its pinned, without having any extra information. This scheme generates a random (16 bit) cookie for each pin and requires this same cookie to unpin. This means you have to keep the cookie in context. No objsize difference for !LOCKDEP kernels. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
In order to be able to pass around more than just the IRQ flags in the future, add a rq_flags structure. No difference in code generation for the x86_64-defconfig build I tested. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Its a rather large function, inline doesn't seems to make much sense: $ size defconfig-build/kernel/sched/core.o{.orig,} text data bss dec hex filename 56533 21037 2320 79890 13812 defconfig-build/kernel/sched/core.o.orig 55733 21037 2320 79090 134f2 defconfig-build/kernel/sched/core.o The 'perf bench sched messaging' micro-benchmark shows a visible improvement of 4-5%: $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done $ perf stat --null --repeat 25 -- perf bench sched messaging -g 40 -l 5000 pre: 4.582798193 seconds time elapsed ( +- 1.41% ) 4.733374877 seconds time elapsed ( +- 2.10% ) 4.560955136 seconds time elapsed ( +- 1.43% ) 4.631062303 seconds time elapsed ( +- 1.40% ) post: 4.364765213 seconds time elapsed ( +- 0.91% ) 4.454442734 seconds time elapsed ( +- 1.18% ) 4.448893817 seconds time elapsed ( +- 1.41% ) 4.424346872 seconds time elapsed ( +- 0.97% ) Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 4月, 2016 2 次提交
-
-
由 Frederic Weisbecker 提交于
Some code in CPU load update only concern NO_HZ configs but it is built on all configurations. When NO_HZ isn't built, that code is harmless but just happens to take some useless ressources in CPU and memory: 1) one useless field in struct rq 2) jiffies record on every tick that is never used (cpu_load_update_periodic) 3) decay_load_missed is called two times on every tick to eventually return immediately with no action taken. And that function is dead code. For pure optimization purposes, lets conditionally build the NO_HZ related code. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1461080211-16271-1-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Frederic Weisbecker 提交于
The CPU load update related functions have a weak naming convention currently, starting with update_cpu_load_*() which isn't ideal as "update" is a very generic concept. Since two of these functions are public already (and a third is to come) that's enough to introduce a more conventional naming scheme. So let's do the following rename instead: update_cpu_load_*() -> cpu_load_update_*() Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1460555812-25375-2-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 31 3月, 2016 1 次提交
-
-
由 Yuyang Du 提交于
A new task's util_avg is set to full utilization of a CPU (100% time running). This accelerates a new task's utilization ramp-up, useful to boost its execution in early time. However, it may result in (insanely) high utilization for a transient time period when a flood of tasks are spawned. Importantly, it violates the "fundamentally bounded" CPU utilization, and its side effect is negative if we don't take any measure to bound it. This patch proposes an algorithm to address this issue. It has two methods to approach a sensible initial util_avg: (1) An expected (or average) util_avg based on its cfs_rq's util_avg: util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * se.load.weight (2) A trajectory of how successive new tasks' util develops, which gives 1/2 of the left utilization budget to a new task such that the additional util is noticeably large (when overall util is low) or unnoticeably small (when overall util is high enough). In the meantime, the aggregate utilization is well bounded: util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n where n denotes the nth task. If util_avg is larger than util_avg_cap, then the effective util is clamped to the util_avg_cap. Reported-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NYuyang Du <yuyang.du@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: steve.muckle@linaro.org Link: http://lkml.kernel.org/r/1459283456-21682-1-git-send-email-yuyang.du@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 3月, 2016 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Create cpufreq.c under kernel/sched/ and move the cpufreq code related to the scheduler to that file and to sched.h. Redefine cpufreq_update_util() as a static inline function to avoid function calls at its call sites in the scheduler code (as suggested by Peter Zijlstra). Also move the definition of struct update_util_data and declaration of cpufreq_set_update_util_data() from include/linux/cpufreq.h to include/linux/sched.h. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
- 09 3月, 2016 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Introduce a mechanism by which parts of the cpufreq subsystem ("setpolicy" drivers or the core) can register callbacks to be executed from cpufreq_update_util() which is invoked by the scheduler's update_load_avg() on CPU utilization changes. This allows the "setpolicy" drivers to dispense with their timers and do all of the computations they need and frequency/voltage adjustments in the update_load_avg() code path, among other things. The update_load_avg() changes were suggested by Peter Zijlstra. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NIngo Molnar <mingo@kernel.org>
-
- 05 3月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time value over CPU down and up. So after the CPU comes up again the delta calculation in steal_account_process_tick() wreckages itself due to the unsigned math: u64 steal = paravirt_steal_clock(smp_processor_id()); steal -= this_rq()->prev_steal_time; So if steal is smaller than rq->prev_steal_time we end up with an insane large value which then gets added to rq->prev_steal_time, resulting in a permanent wreckage of the accounting. As a consequence the per CPU stats in /proc/stat become stale. Nice trick to tell the world how idle the system is (100%) while the CPU is 100% busy running tasks. Though we prefer realistic numbers. None of the accounting values which use a previous value to account for fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity check for prev_irq_time and prev_steal_time_rq, but that sanity check solely deals with clock warps and limits the /proc/stat visible wreckage. The prev_time values are still wrong. Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: commit 095c0aa8 "sched: adjust scheduler cpu power for stolen time" Fixes: commit aa483808 "sched: Remove irq time from available CPU power" Fixes: commit e6e6685a "KVM guest: Steal time accounting" Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanosSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 02 3月, 2016 2 次提交
-
-
由 Frederic Weisbecker 提交于
Instead of providing asynchronous checks for the nohz subsystem to verify sched tick dependency, migrate sched to the new mask. Everytime a task is enqueued or dequeued, we evaluate the state of the tick dependency on top of the policy of the tasks in the runqueue, by order of priority: SCHED_DEADLINE: Need the tick in order to periodically check for runtime SCHED_FIFO : Don't need the tick (no round-robin) SCHED_RR : Need the tick if more than 1 task of the same priority for round robin (simplified with checking if more than one SCHED_RR task no matter what priority). SCHED_NORMAL : Need the tick if more than 1 task for round-robin. We could optimize that further with one flag per sched policy on the tick dependency mask and perform only the checks relevant to the policy concerned by an enqueue/dequeue operation. Since the checks aren't based on the current task anymore, we could get rid of the task switch hook but it's still needed for posix cpu timers. Reviewed-by: NChris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Frederic Weisbecker 提交于
In order to evaluate the scheduler tick dependency without probing context switches, we need to know how much SCHED_RR and SCHED_FIFO tasks are enqueued as those policies don't have the same preemption requirements. To prepare for that, let's account SCHED_RR tasks, we'll be able to deduce SCHED_FIFO tasks as well from it and the total RT tasks in the runqueue. Reviewed-by: NChris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
- 29 2月, 2016 4 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
The sched_domain_sysctl setup is only enabled when SCHED_DEBUG is configured. As debug.c is only compiled when SCHED_DEBUG is configured as well, move the setup of sched_domain_sysctl into that file. Note, the (un)register_sched_domain_sysctl() functions had to be changed from static to allow access to them from core.c. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Clark Williams <williams@redhat.com> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160222212825.599278093@goodmis.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Andrea Parri reported: > I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not > handled correctly: > > T1 (prio = 20) > lock(rtmutex); > > T2 (prio = 20) > blocks on rtmutex (rt_nr_boosted = 0 on T1's rq) > > T1 (prio = 20) > sys_set_scheduler(prio = 0) > [new_effective_prio == oldprio] > T1 prio = 20 (rt_nr_boosted = 0 on T1's rq) > > The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted()); > in particular, if we continue with > > T1 (prio = 20) > unlock(rtmutex) > wakeup(T2) > adjust_prio(T1) > [prio != rt_mutex_getprio(T1)] > dequeue(T1) > rt_nr_boosted = (unsigned long)(-1) > ... > T1 prio = 0 > > then we end up leaving rt_nr_boosted in an "inconsistent" state. > > The simple program attached could reproduce the previous scenario; note > that, as a consequence of the presence of this state, the "assertion" > > WARN_ON(!rt_nr_running && rt_nr_boosted) > > from dec_rt_group() may trigger. So normally we dequeue/enqueue tasks in sched_setscheduler(), which would ensure the accounting stays correct. However in the early PI path we fail to do so. So this was introduced at around v3.14, by: c365c292 ("sched: Consider pi boosting in setscheduler()") which fixed another problem exactly because that dequeue/enqueue, joy. Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it preserve runqueue location with that option. This requires decoupling the on_rt_rq() state from being on the list. In order to allow for explicit movement during the SAVE/RESTORE, introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these cases to preserve other invariants. Respecting the SAVE/RESTORE flags also has the (nice) side-effect that things like sys_nice()/sys_sched_setaffinity() also do not reorder FIFO tasks (whereas they used to before this patch). Reported-by: NAndrea Parri <parri.andrea@gmail.com> Tested-by: NAndrea Parri <parri.andrea@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dongsheng Yang 提交于
Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <lizefan@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452674558-31897-1-git-send-email-yangds.fnst@cn.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
When a cgroup's CPU runqueue is destroyed, it should remove its remaining load accounting from its parent cgroup. The current site for doing so it unsuited because its far too late and unordered against other cgroup removal (->css_free() will be, but we're also in an RCU callback). Put it in the ->css_offline() callback, which is the start of cgroup destruction, right after the group has been made unavailable to userspace. The ->css_offline() callbacks are called in hierarchical order after the following v4.4 commit: aa226ff4 ("cgroup: make sure a parent css isn't offlined before its children") Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160121212416.GL6357@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 2月, 2016 1 次提交
-
-
由 Mel Gorman 提交于
schedstats is very useful during debugging and performance tuning but it incurs overhead to calculate the stats. As such, even though it can be disabled at build time, it is often enabled as the information is useful. This patch adds a kernel command-line and sysctl tunable to enable or disable schedstats on demand (when it's built in). It is disabled by default as someone who knows they need it can also learn to enable it when necessary. The benefits are dependent on how scheduler-intensive the workload is. If it is then the patch reduces the number of cycles spent calculating the stats with a small benefit from reducing the cache footprint of the scheduler. These measurements were taken from a 48-core 2-socket machine with Xeon(R) E5-2670 v3 cpus although they were also tested on a single socket machine 8-core machine with Intel i7-3770 processors. netperf-tcp 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Hmean 64 560.45 ( 0.00%) 575.98 ( 2.77%) Hmean 128 766.66 ( 0.00%) 795.79 ( 3.80%) Hmean 256 950.51 ( 0.00%) 981.50 ( 3.26%) Hmean 1024 1433.25 ( 0.00%) 1466.51 ( 2.32%) Hmean 2048 2810.54 ( 0.00%) 2879.75 ( 2.46%) Hmean 3312 4618.18 ( 0.00%) 4682.09 ( 1.38%) Hmean 4096 5306.42 ( 0.00%) 5346.39 ( 0.75%) Hmean 8192 10581.44 ( 0.00%) 10698.15 ( 1.10%) Hmean 16384 18857.70 ( 0.00%) 18937.61 ( 0.42%) Small gains here, UDP_STREAM showed nothing intresting and neither did the TCP_RR tests. The gains on the 8-core machine were very similar. tbench4 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Hmean mb/sec-1 500.85 ( 0.00%) 522.43 ( 4.31%) Hmean mb/sec-2 984.66 ( 0.00%) 1018.19 ( 3.41%) Hmean mb/sec-4 1827.91 ( 0.00%) 1847.78 ( 1.09%) Hmean mb/sec-8 3561.36 ( 0.00%) 3611.28 ( 1.40%) Hmean mb/sec-16 5824.52 ( 0.00%) 5929.03 ( 1.79%) Hmean mb/sec-32 10943.10 ( 0.00%) 10802.83 ( -1.28%) Hmean mb/sec-64 15950.81 ( 0.00%) 16211.31 ( 1.63%) Hmean mb/sec-128 15302.17 ( 0.00%) 15445.11 ( 0.93%) Hmean mb/sec-256 14866.18 ( 0.00%) 15088.73 ( 1.50%) Hmean mb/sec-512 15223.31 ( 0.00%) 15373.69 ( 0.99%) Hmean mb/sec-1024 14574.25 ( 0.00%) 14598.02 ( 0.16%) Hmean mb/sec-2048 13569.02 ( 0.00%) 13733.86 ( 1.21%) Hmean mb/sec-3072 12865.98 ( 0.00%) 13209.23 ( 2.67%) Small gains of 2-4% at low thread counts and otherwise flat. The gains on the 8-core machine were slightly different tbench4 on 8-core i7-3770 single socket machine Hmean mb/sec-1 442.59 ( 0.00%) 448.73 ( 1.39%) Hmean mb/sec-2 796.68 ( 0.00%) 794.39 ( -0.29%) Hmean mb/sec-4 1322.52 ( 0.00%) 1343.66 ( 1.60%) Hmean mb/sec-8 2611.65 ( 0.00%) 2694.86 ( 3.19%) Hmean mb/sec-16 2537.07 ( 0.00%) 2609.34 ( 2.85%) Hmean mb/sec-32 2506.02 ( 0.00%) 2578.18 ( 2.88%) Hmean mb/sec-64 2511.06 ( 0.00%) 2569.16 ( 2.31%) Hmean mb/sec-128 2313.38 ( 0.00%) 2395.50 ( 3.55%) Hmean mb/sec-256 2110.04 ( 0.00%) 2177.45 ( 3.19%) Hmean mb/sec-512 2072.51 ( 0.00%) 2053.97 ( -0.89%) In constract, this shows a relatively steady 2-3% gain at higher thread counts. Due to the nature of the patch and the type of workload, it's not a surprise that the result will depend on the CPU used. hackbench-pipes 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Amean 1 0.0637 ( 0.00%) 0.0660 ( -3.59%) Amean 4 0.1229 ( 0.00%) 0.1181 ( 3.84%) Amean 7 0.1921 ( 0.00%) 0.1911 ( 0.52%) Amean 12 0.3117 ( 0.00%) 0.2923 ( 6.23%) Amean 21 0.4050 ( 0.00%) 0.3899 ( 3.74%) Amean 30 0.4586 ( 0.00%) 0.4433 ( 3.33%) Amean 48 0.5910 ( 0.00%) 0.5694 ( 3.65%) Amean 79 0.8663 ( 0.00%) 0.8626 ( 0.43%) Amean 110 1.1543 ( 0.00%) 1.1517 ( 0.22%) Amean 141 1.4457 ( 0.00%) 1.4290 ( 1.16%) Amean 172 1.7090 ( 0.00%) 1.6924 ( 0.97%) Amean 192 1.9126 ( 0.00%) 1.9089 ( 0.19%) Some small gains and losses and while the variance data is not included, it's close to the noise. The UMA machine did not show anything particularly different pipetest 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v2r2 Min Time 4.13 ( 0.00%) 3.99 ( 3.39%) 1st-qrtle Time 4.38 ( 0.00%) 4.27 ( 2.51%) 2nd-qrtle Time 4.46 ( 0.00%) 4.39 ( 1.57%) 3rd-qrtle Time 4.56 ( 0.00%) 4.51 ( 1.10%) Max-90% Time 4.67 ( 0.00%) 4.60 ( 1.50%) Max-93% Time 4.71 ( 0.00%) 4.65 ( 1.27%) Max-95% Time 4.74 ( 0.00%) 4.71 ( 0.63%) Max-99% Time 4.88 ( 0.00%) 4.79 ( 1.84%) Max Time 4.93 ( 0.00%) 4.83 ( 2.03%) Mean Time 4.48 ( 0.00%) 4.39 ( 1.91%) Best99%Mean Time 4.47 ( 0.00%) 4.39 ( 1.91%) Best95%Mean Time 4.46 ( 0.00%) 4.38 ( 1.93%) Best90%Mean Time 4.45 ( 0.00%) 4.36 ( 1.98%) Best50%Mean Time 4.36 ( 0.00%) 4.25 ( 2.49%) Best10%Mean Time 4.23 ( 0.00%) 4.10 ( 3.13%) Best5%Mean Time 4.19 ( 0.00%) 4.06 ( 3.20%) Best1%Mean Time 4.13 ( 0.00%) 4.00 ( 3.39%) Small improvement and similar gains were seen on the UMA machine. The gain is small but it stands to reason that doing less work in the scheduler is a good thing. The downside is that the lack of schedstats and tracepoints may be surprising to experts doing performance analysis until they find the existence of the schedstats= parameter or schedstats sysctl. It will be automatically activated for latencytop and sleep profiling to alleviate the problem. For tracepoints, there is a simple warning as it's not safe to activate schedstats in the context when it's known the tracepoint may be wanted but is unavailable. Signed-off-by: NMel Gorman <mgorman@techsingularity.net> Reviewed-by: NMatt Fleming <matt@codeblueprint.co.uk> Reviewed-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454663316-22048-1-git-send-email-mgorman@techsingularity.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 12月, 2015 5 次提交
-
-
由 Waiman Long 提交于
If a system with large number of sockets was driven to full utilization, it was found that the clock tick handling occupied a rather significant proportion of CPU time when fair group scheduling and autogroup were enabled. Running a java benchmark on a 16-socket IvyBridge-EX system, the perf profile looked like: 10.52% 0.00% java [kernel.vmlinux] [k] smp_apic_timer_interrupt 9.66% 0.05% java [kernel.vmlinux] [k] hrtimer_interrupt 8.65% 0.03% java [kernel.vmlinux] [k] tick_sched_timer 8.56% 0.00% java [kernel.vmlinux] [k] update_process_times 8.07% 0.03% java [kernel.vmlinux] [k] scheduler_tick 6.91% 1.78% java [kernel.vmlinux] [k] task_tick_fair 5.24% 5.04% java [kernel.vmlinux] [k] update_cfs_shares In particular, the high CPU time consumed by update_cfs_shares() was mostly due to contention on the cacheline that contained the task_group's load_avg statistical counter. This cacheline may also contains variables like shares, cfs_rq & se which are accessed rather frequently during clock tick processing. This patch moves the load_avg variable into another cacheline separated from the other frequently accessed variables. It also creates a cacheline aligned kmemcache for task_group to make sure that all the allocated task_group's are cacheline aligned. By doing so, the perf profile became: 9.44% 0.00% java [kernel.vmlinux] [k] smp_apic_timer_interrupt 8.74% 0.01% java [kernel.vmlinux] [k] hrtimer_interrupt 7.83% 0.03% java [kernel.vmlinux] [k] tick_sched_timer 7.74% 0.00% java [kernel.vmlinux] [k] update_process_times 7.27% 0.03% java [kernel.vmlinux] [k] scheduler_tick 5.94% 1.74% java [kernel.vmlinux] [k] task_tick_fair 4.15% 3.92% java [kernel.vmlinux] [k] update_cfs_shares The %cpu time is still pretty high, but it is better than before. The benchmark results before and after the patch was as follows: Before patch - Max-jOPs: 907533 Critical-jOps: 134877 After patch - Max-jOPs: 916011 Critical-jOps: 142366 Signed-off-by: NWaiman Long <Waiman.Long@hpe.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Ben Segall <bsegall@google.com> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Link: http://lkml.kernel.org/r/1449081710-20185-3-git-send-email-Waiman.Long@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
When building a kernel with a gcc 6 snapshot the compiler complains about unused const static variables for prio_to_weight and prio_to_mult for multiple scheduler files (all but core.c and autogroup.c) The way the array is currently declared it will be duplicated in every scheduler file that includes sched.h, which seems rather wasteful. Move the array out of line into core.c. I also added a sched_ prefix to avoid any potential name space collisions. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448859583-3252-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
The current code accounts for the time a task was absent from the fair class (per ATTACH_AGE_LOAD). However it does not work correctly when a task got migrated or moved to another cgroup while outside of the fair class. This patch tries to address that by aging on migration. We locklessly read the 'last_update_time' stamp from both the old and new cfs_rq, ages the load upto the old time, and sets it to the new time. These timestamps should in general not be more than 1 tick apart from one another, so there is a definite bound on things. Signed-off-by: NByungchul Park <byungchul.park@lge.com> [ Changelog, a few edits and !SMP build fix ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Introduce smp_cond_acquire() which combines a control dependency and a read barrier to form acquire semantics. This primitive has two benefits: - it documents control dependencies, - its typically cheaper than using smp_load_acquire() in a loop. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Explain how the control dependency and smp_rmb() end up providing ACQUIRE semantics and pair with smp_store_release() in finish_lock_switch(). Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 11月, 2015 1 次提交
-
-
由 Dietmar Eggemann 提交于
Commit cd126afe ("sched/fair: Remove rq's runnable avg") got rid of rq->avg and so there is no need to update it any more when entering or exiting idle. Remove the now empty functions idle_{enter|exit}_fair(). Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Link: http://lkml.kernel.org/r/1445342681-17171-1-git-send-email-dietmar.eggemann@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 10月, 2015 3 次提交
-
-
由 xiaofeng.yan 提交于
The parameter "int next_cpu" in the following function is unused: migrate_task_rq(struct task_struct *p, int next_cpu) Remove it. Signed-off-by: Nxiaofeng.yan <yanxiaofeng@inspur.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1442991360-31945-1-git-send-email-yanxiaofeng@inspur.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Mike Meyer reported the following bug: > During evaluation of some performance data, it was discovered thread > and run queue run_delay accounting data was inconsistent with the other > accounting data that was collected. Further investigation found under > certain circumstances execution time was leaking into the task and > run queue accounting of run_delay. > > Consider the following sequence: > > a. thread is running. > b. thread moves beween cgroups, changes scheduling class or priority. > c. thread sleeps OR > d. thread involuntarily gives up cpu. > > a. implies: > > thread->sched_info.last_queued = 0 > > a. and b. results in the following: > > 1. dequeue_task(rq, thread) > > sched_info_dequeued(rq, thread) > delta = 0 > > sched_info_reset_dequeued(thread) > thread->sched_info.last_queued = 0 > > thread->sched_info.run_delay += delta > > 2. enqueue_task(rq, thread) > > sched_info_queued(rq, thread) > > /* thread is still on cpu at this point. */ > thread->sched_info.last_queued = task_rq(thread)->clock; > > c. results in: > > dequeue_task(rq, thread) > > sched_info_dequeued(rq, thread) > > /* delta is execution time not run_delay. */ > delta = task_rq(thread)->clock - thread->sched_info.last_queued > > sched_info_reset_dequeued(thread) > thread->sched_info.last_queued = 0 > > thread->sched_info.run_delay += delta > > Since thread was running between enqueue_task(rq, thread) and > dequeue_task(rq, thread), the delta above is really execution > time and not run_delay. > > d. results in: > > __sched_info_switch(thread, next_thread) > > sched_info_depart(rq, thread) > > sched_info_queued(rq, thread) > > /* last_queued not updated due to being non-zero */ > return > > Since thread was running between enqueue_task(rq, thread) and > __sched_info_switch(thread, next_thread), the execution time > between enqueue_task(rq, thread) and > __sched_info_switch(thread, next_thread) now will become > associated with run_delay due to when last_queued was last updated. > This alternative patch solves the problem by not calling sched_info_{de,}queued() in {de,en}queue_task(). Therefore the sched_info state is preserved and things work as expected. By inlining the {de,en}queue_task() functions the new condition becomes (mostly) a compile-time constant and we'll not emit any new branch instructions. It even shrinks the code (due to inlining {en,de}queue_task()): $ size defconfig-build/kernel/sched/core.o defconfig-build/kernel/sched/core.o.orig text data bss dec hex filename 64019 23378 2344 89741 15e8d defconfig-build/kernel/sched/core.o 64149 23378 2344 89871 15f0f defconfig-build/kernel/sched/core.o.orig Reported-by: NMike Meyer <Mike.Meyer@Teradata.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150930154413.GO3604@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
So the problem this patch is trying to address is as follows: CPU0 CPU1 context_switch(A, B) ttwu(A) LOCK A->pi_lock A->on_cpu == 0 finish_task_switch(A) prev_state = A->state <-. WMB | A->on_cpu = 0; | UNLOCK rq0->lock | | context_switch(C, A) `-- A->state = TASK_DEAD prev_state == TASK_DEAD put_task_struct(A) context_switch(A, C) finish_task_switch(A) A->state == TASK_DEAD put_task_struct(A) The argument being that the WMB will allow the load of A->state on CPU0 to cross over and observe CPU1's store of A->state, which will then result in a double-drop and use-after-free. Now the comment states (and this was true once upon a long time ago) that we need to observe A->state while holding rq->lock because that will order us against the wakeup; however the wakeup will not in fact acquire (that) rq->lock; it takes A->pi_lock these days. We can obviously fix this by upgrading the WMB to an MB, but that is expensive, so we'd rather avoid that. The alternative this patch takes is: smp_store_release(&A->on_cpu, 0), which avoids the MB on some archs, but not important ones like ARM. Reported-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> # v3.1+ Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: manfred@colorfullife.com Cc: will.deacon@arm.com Fixes: e4a52bcb ("sched: Remove rq->lock from the first half of ttwu()") Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 9月, 2015 1 次提交
-
-
由 Juri Lelli 提交于
Move dl_time_before() static definition in include/linux/sched/deadline.h so that it can be used by different parties without being re-defined. Reported-by: NLuca Abeni <luca.abeni@unitn.it> Signed-off-by: NJuri Lelli <juri.lelli@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1441188096-23021-3-git-send-email-juri.lelli@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 9月, 2015 1 次提交
-
-
由 Henrik Austad 提交于
Most of the policy-tests are done via the <class>_policy() helpers with the notable exception of idle. A new wrapper for valid_policy() has also been added to improve readability in set_load_weight(). This commit does not change the logical behavior of the scheduler core. Signed-off-by: NHenrik Austad <henrik@austad.us> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1441810841-4756-1-git-send-email-henrik@austad.usSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 9月, 2015 6 次提交
-
-
由 Dietmar Eggemann 提交于
Besides the existing frequency scale-invariance correction factor, apply CPU scale-invariance correction factor to utilization tracking to compensate for any differences in compute capacity. This could be due to micro-architectural differences (i.e. instructions per seconds) between cpus in HMP systems (e.g. big.LITTLE), and/or differences in the current maximum frequency supported by individual cpus in SMP systems. In the existing implementation utilization isn't comparable between cpus as it is relative to the capacity of each individual CPU. Each segment of the sched_avg.util_sum geometric series is now scaled by the CPU performance factor too so the sched_avg.util_avg of each sched entity will be invariant from the particular CPU of the HMP/SMP system on which the sched entity is scheduled. With this patch, the utilization of a CPU stays relative to the max CPU performance of the fastest CPU in the system. In contrast to utilization (sched_avg.util_sum), load (sched_avg.load_sum) should not be scaled by compute capacity. The utilization metric is based on running time which only makes sense when cpus are _not_ fully utilized (utilization cannot go beyond 100% even if more tasks are added), where load is runnable time which isn't limited by the capacity of the CPU and therefore is a better metric for overloaded scenarios. If we run two nice-0 busy loops on two cpus with different compute capacity their load should be similar since their compute demands are the same. We have to assume that the compute demand of any task running on a fully utilized CPU (no spare cycles = 100% utilization) is high and the same no matter of the compute capacity of its current CPU, hence we shouldn't scale load by CPU capacity. Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NMorten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/55CE7409.1000700@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Morten Rasmussen 提交于
Bring arch_scale_cpu_capacity() in line with the recent change of its arch_scale_freq_capacity() sibling in commit dfbca41f ("sched: Optimize freq invariant accounting") from weak function to #define to allow inlining of the function. While at it, remove the ARCH_CAPACITY sched_feature as well. With the change to #define there isn't a straightforward way to allow runtime switch between an arch implementation and the default implementation of arch_scale_cpu_capacity() using sched_feature. The default was to use the arch-specific implementation, but only the arm architecture provides one and that is essentially equivalent to the default implementation. Signed-off-by: NMorten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <Dietmar.Eggemann@arm.com> Cc: Juri Lelli <Juri.Lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: daniel.lezcano@linaro.org Cc: mturquette@baylibre.com Cc: pang.xunlei@zte.com.cn Cc: rjw@rjwysocki.net Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1439569394-11974-3-git-send-email-morten.rasmussen@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Variable sched_numa_balancing toggles numa_balancing feature. Hence moving from a simple read mostly variable to a more apt static_branch. Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439310261-16124-1-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Commit 2a1ed24c ("sched/numa: Prefer NUMA hotness over cache hotness") sets sched feature NUMA to true. However this can enable NUMA hinting faults on a UMA system. This commit ensures that NUMA hinting faults occur only on a NUMA system by setting/resetting sched_numa_balancing. This commit: - Makes sched_numa_balancing common to CONFIG_SCHED_DEBUG and !CONFIG_SCHED_DEBUG. Earlier it was only in !CONFIG_SCHED_DEBUG. - Checks for sched_numa_balancing instead of sched_feat(NUMA). Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439290813-6683-3-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Simple rename of the 'numabalancing_enabled' variable to 'sched_numa_balancing'. No functional changes. Suggested-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439290813-6683-2-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
The previous patches made the second argument go unused, remove it. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 8月, 2015 1 次提交
-
-
由 Peter Zijlstra 提交于
Give every class a set_cpus_allowed() method, this enables some small optimization in the RT,DL implementation by avoiding a double cpumask_weight() call. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.614517487@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 8月, 2015 1 次提交
-
-
由 Peter Zijlstra 提交于
One less arch hook.. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 8月, 2015 1 次提交
-
-
由 Yuyang Du 提交于
The cfs_rq's load_avg is composed of runnable_load_avg and blocked_load_avg. Before this series, sometimes the runnable_load_avg is used, and sometimes the load_avg is used. Completely replacing all uses of runnable_load_avg with load_avg may be too big a leap, i.e., the blocked_load_avg is concerned to result in overrated load. Therefore, we get runnable_load_avg back. The new cfs_rq's runnable_load_avg is improved to be updated with all of the runnable sched_eneities at the same time, so the one sched_entity updated and the others stale problem is solved. Signed-off-by: NYuyang Du <yuyang.du@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arjan@linux.intel.com Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: fengguang.wu@intel.com Cc: len.brown@intel.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: rafael.j.wysocki@intel.com Cc: umgwanakikbuti@gmail.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1436918682-4971-7-git-send-email-yuyang.du@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-