- 25 8月, 2017 4 次提交
-
-
由 Peter Zijlstra 提交于
Currently we unconditionally destroy all sysctl bits and regenerate them after we've rebuild the domains (even if that rebuild is a no-op). And since we unconditionally (re)build the sysctl for all possible CPUs, onlining all CPUs gets us O(n^2) time. Instead change this to only rebuild the bits for CPUs we've actually installed new domains on. Reported-by: NOfer Levi(SW) <oferle@mellanox.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Fix partition_sched_domains() to try and preserve the existing machine wide domain instead of unconditionally destroying it. We do this by attempting to allocate the new single domain, only when that fails to we reuse the fallback_doms. When using fallback_doms we need to first destroy and then recreate because both the old and new could be backed by it. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ofer Levi(SW) <oferle@mellanox.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet.Gupta1@synopsys.com <Vineet.Gupta1@synopsys.com> Cc: rusty@rustcorp.com.au <rusty@rustcorp.com.au> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Mike provided a better comment for destroy_sched_domain() ... Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Shu Wang 提交于
Found this issue by kmemleak: the 'sg' and 'sgc' pointers from __sdt_alloc() might be leaked as each domain holds many groups' ref, but in destroy_sched_domain(), it only declined the first group ref. Onlining and offlining a CPU can trigger this leak, and cause OOM. Reproducer for my 6 CPUs machine: while true do echo 0 > /sys/devices/system/cpu/cpu5/online; echo 1 > /sys/devices/system/cpu/cpu5/online; done unreferenced object 0xffff88007d772a80 (size 64): comm "cpuhp/5", pid 39, jiffies 4294719962 (age 35.251s) hex dump (first 32 bytes): c0 22 77 7d 00 88 ff ff 02 00 00 00 01 00 00 00 ."w}............ 40 2a 77 7d 00 88 ff ff 00 00 00 00 00 00 00 00 @*w}............ backtrace: [<ffffffff8176525a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff8121efe1>] __kmalloc_node+0xf1/0x280 [<ffffffff810d94a8>] build_sched_domains+0x1e8/0xf20 [<ffffffff810da674>] partition_sched_domains+0x304/0x360 [<ffffffff81139557>] cpuset_update_active_cpus+0x17/0x40 [<ffffffff810bdb2e>] sched_cpu_activate+0xae/0xc0 [<ffffffff810900e0>] cpuhp_invoke_callback+0x90/0x400 [<ffffffff81090597>] cpuhp_up_callbacks+0x37/0xb0 [<ffffffff81090887>] cpuhp_thread_fun+0xd7/0xf0 [<ffffffff810b37e0>] smpboot_thread_fn+0x110/0x160 [<ffffffff810af5d9>] kthread+0x109/0x140 [<ffffffff81770e45>] ret_from_fork+0x25/0x30 [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffff88007d772a40 (size 64): comm "cpuhp/5", pid 39, jiffies 4294719962 (age 35.251s) hex dump (first 32 bytes): 03 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 ................ 00 04 00 00 00 00 00 00 4f 3c fc ff 00 00 00 00 ........O<...... backtrace: [<ffffffff8176525a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff8121efe1>] __kmalloc_node+0xf1/0x280 [<ffffffff810da16d>] build_sched_domains+0xead/0xf20 [<ffffffff810da674>] partition_sched_domains+0x304/0x360 [<ffffffff81139557>] cpuset_update_active_cpus+0x17/0x40 [<ffffffff810bdb2e>] sched_cpu_activate+0xae/0xc0 [<ffffffff810900e0>] cpuhp_invoke_callback+0x90/0x400 [<ffffffff81090597>] cpuhp_up_callbacks+0x37/0xb0 [<ffffffff81090887>] cpuhp_thread_fun+0xd7/0xf0 [<ffffffff810b37e0>] smpboot_thread_fn+0x110/0x160 [<ffffffff810af5d9>] kthread+0x109/0x140 [<ffffffff81770e45>] ret_from_fork+0x25/0x30 [<ffffffffffffffff>] 0xffffffffffffffff Reported-by: NChunyu Hu <chuhu@redhat.com> Signed-off-by: NShu Wang <shuwang@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NChunyu Hu <chuhu@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: liwang@redhat.com Link: http://lkml.kernel.org/r/1502351536-9108-1-git-send-email-shuwang@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 8月, 2017 1 次提交
-
-
由 Steven Rostedt 提交于
The complete_all() function modifies the completion's "done" variable to UINT_MAX, and no other caller (wait_for_completion(), etc) will modify it back to zero. That means that any call to complete_all() must have a reinit_completion() before that completion can be used again. Document this fact by the complete_all() function. Also document that completion_done() will always return true if complete_all() is called. Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170816131202.195c2f4b@gandalf.local.homeSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 8月, 2017 20 次提交
-
-
由 Anshuman Khandual 提交于
Its kzalloc() not kmalloc() which has failed earlier. While here remove a redundant empty line. Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170802084300.29487-1-khandual@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
In commit: 3fed382b ("sched/numa: Implement NUMA node level wake_affine()") Rik changed wake_affine to consider NUMA information when balancing between LLC domains. There are a number of problems here which this patch tries to address: - LLC < NODE; in this case we'd use the wrong information to balance - !NUMA_BALANCING: in this case, the new code doesn't do any balancing at all - re-computes the NUMA data for every wakeup, this can mean iterating up to 64 CPUs for every wakeup. - default affine wakeups inside a cache We address these by saving the load/capacity values for each sched_domain during regular load-balance and using these values in wake_affine_llc(). The obvious down-side to using cached values is that they can be too old and poorly reflect reality. But this way we can use LLC wide information and thus not rely on assuming LLC matches NODE. We also don't rely on NUMA_BALANCING nor do we have to aggegate two nodes (or even cache domains) worth of CPUs for each wakeup. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: 3fed382b ("sched/numa: Implement NUMA node level wake_affine()") [ Minor readability improvements. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Xie XiuQi 提交于
Now that we have more than one place to get the task state, intruduce the task_state_to_char() helper function to save some code. No functionality changed. Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <cj.chengjian@huawei.com> Cc: <huawei.libin@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1502095463-160172-3-git-send-email-xiexiuqi@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Xie XiuQi 提交于
Currently we print the runnable task in /proc/sched_debug, but there is no task state information. We don't know which task is in the runqueue and which task is sleeping. Add task state in the runnable task list, like this: runnable tasks: S task PID tree-key switches prio wait-time sum-exec sum-sleep ----------------------------------------------------------------------------------------------------------- S watchdog/239 1452 -11.917445 2811 0 0.000000 8.949306 0.000000 7 0 / S migration/239 1453 20686.367740 8 0 0.000000 16215.720897 0.000000 7 0 / S ksoftirqd/239 1454 115383.841071 12 120 0.000000 0.200683 0.000000 7 0 / >R test 21287 4872.190970 407 120 0.000000 4874.911790 0.000000 7 0 /autogroup-150 R test 21288 4868.385454 401 120 0.000000 3672.341489 0.000000 7 0 /autogroup-150 R test 21289 4868.326776 384 120 0.000000 3424.934159 0.000000 7 0 /autogroup-150 Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <cj.chengjian@huawei.com> Cc: <huawei.libin@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1502095463-160172-2-git-send-email-xiexiuqi@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Aleksa Sarai 提交于
It appears as though the addition of the PID namespace did not update the output code for /proc/*/sched, which resulted in it providing PIDs that were not self-consistent with the /proc mount. This additionally made it trivial to detect whether a process was inside &init_pid_ns from userspace, making container detection trivial: https://github.com/jessfraz/amicontained This leads to situations such as: % unshare -pmf % mount -t proc proc /proc % head -n1 /proc/1/sched head (10047, #threads: 1) Fix this by just using task_pid_nr_ns for the output of /proc/*/sched. All of the other uses of task_pid_nr in kernel/sched/debug.c are from a sysctl context and thus don't need to be namespaced. Signed-off-by: NAleksa Sarai <asarai@suse.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Jess Frazelle <acidburn@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cyphar@cyphar.com Link: http://lkml.kernel.org/r/20170806044141.5093-1-asarai@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Cheng Jian 提交于
init_idle_bootup_task( ) is called in rest_init( ) to switch the scheduling class of the boot thread to the idle class. the function only sets: idle->sched_class = &idle_sched_class; which has been set in init_idle() called by sched_init(): /* * The idle tasks have their own, simple scheduling class: */ idle->sched_class = &idle_sched_class; We've already set the boot thread to idle class in start_kernel()->sched_init()->init_idle() so it's unnecessary to set it again in start_kernel()->rest_init()->init_idle_bootup_task() Signed-off-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <huawei.libin@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1501838377-109720-1-git-send-email-cj.chengjian@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
cpudl_find() users are only interested in knowing if suitable CPU(s) were found or not (and then they look at later_mask to know which). Change cpudl_find() return type accordingly. Aligns with rt code. Signed-off-by: NByungchul Park <byungchul.park@lge.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <bristot@redhat.com> Cc: <juri.lelli@gmail.com> Cc: <kernel-team@lge.com> Cc: <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1495504859-10960-3-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
When cpudl_find() returns any among free_cpus, the CPU might not be closer than others, considering sched domain. For example: this_cpu: 15 free_cpus: 0, 1,..., 14 (== later_mask) best_cpu: 0 topology: 0 --+ +--+ 1 --+ | +-- ... --+ 2 --+ | | +--+ | 3 --+ | ... ... 12 --+ | +--+ | 13 --+ | | +-- ... -+ 14 --+ | +--+ 15 --+ In this case, it would be best to select 14 since it's a free CPU and closest to 15 (this_cpu). However, currently the code selects 0 (best_cpu) even though that's just any among free_cpus. Fix it. This (re)aligns the deadline behaviour with the rt behaviour. Signed-off-by: NByungchul Park <byungchul.park@lge.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <bristot@redhat.com> Cc: <juri.lelli@gmail.com> Cc: <kernel-team@lge.com> Cc: <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1495504859-10960-2-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Running 80 tasks in the same group, or as threads of the same process, results in the memory getting scanned 80x as fast as it would be if a single task was using the memory. This really hurts some workloads. Scale the scan period by the number of tasks in the numa group, and the shared / private ratio, so the average rate at which memory in the group is scanned corresponds roughly to the rate at which a single task would scan its memory. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMel Gorman <mgorman@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jhladky@redhat.com Cc: lvenanci@redhat.com Link: http://lkml.kernel.org/r/20170731192847.23050-3-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
The comment above update_task_scan_period() says the scan period should be increased (scanning slows down) if the majority of memory accesses are on the local node, or if the majority of the page accesses are shared with other tasks. However, with the current code, all a high ratio of shared accesses does is slow down the rate at which scanning is made faster. This patch changes things so either lots of shared accesses or lots of local accesses will slow down scanning, and numa scanning is sped up only when there are lots of private faults on remote memory pages. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMel Gorman <mgorman@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jhladky@redhat.com Cc: lvenanci@redhat.com Link: http://lkml.kernel.org/r/20170731192847.23050-2-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Vincent Guittot 提交于
The running state is a subset of runnable state which means that running can't be set if runnable (weight) is cleared. There are corner cases where the current sched_entity has been already dequeued but cfs_rq->curr has not been updated yet and still points to the dequeued sched_entity. If ___update_load_avg() is called at that time, weight will be 0 and running will be set which is not possible. This case happens during pick_next_task_fair() when a cfs_rq becomes idles. The current sched_entity has been dequeued so se->on_rq is cleared and cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be cleared when picking the idle thread). Because the cfs_rq becomes idle, idle_balance() is called and ends up to call update_blocked_averages() with these wrong running and runnable states. Add a test in ___update_load_avg() to correct the running state in this case. Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Link: http://lkml.kernel.org/r/1498885573-18984-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Viresh Kumar 提交于
pick_next_task_dl() and build_sched_domain() aren't used outside deadline.c and topology.c. Make them static. Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/36e4cbb6210002cadae89920ae97e19e7e513008.1493281605.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Viresh Kumar 提交于
The 'struct cpupri' passed to cpupri_init() is already initialized to zero. Don't do that again. Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/8a71d48c5a077500b6ddc1a41484c0ac8d3aad94.1492065513.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Viresh Kumar 提交于
The 'struct cpudl' passed to cpudl_init() is already initialized to zero. Don't do that again. Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/bd4c229806bc96694b15546207afcc221387d2f5.1492065513.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Viresh Kumar 提交于
There are only two callers of init_rootdomain(). One of them passes a global to it and another one sends dynamically allocated root-domain. There is no need to memset the root-domain in the first case as the structure is already reset. Update alloc_rootdomain() to allocate the memory with kzalloc() and remove the memset() call from init_rootdomain(). Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/fc2f6cc90b098040970c85a97046512572d765bc.1492065513.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Viresh Kumar 提交于
update_freq is always true and there is no need to pass it to update_cfs_rq_load_avg(). Remove it. Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/2d28d295f3f591ede7e931462bce1bda5aaa4896.1495603536.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Viresh Kumar 提交于
Rearrange pick_next_task_fair() a bit to avoid checking cfs_rq->nr_running twice for the case where FAIR_GROUP_SCHED is enabled and the previous task doesn't belong to the fair class. Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/000903ab3df3350943d3271c53615893a230dc95.1495603536.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Viresh Kumar 提交于
weighted_cpuload() uses the cpu number passed to it get pointer to the runqueue. Almost all callers of weighted_cpuload() already have the rq pointer with them and can send that directly to weighted_cpuload(). In some cases the callers actually get the CPU number by doing cpu_of(rq). It would be simpler to pass rq to weighted_cpuload(). Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/b7720627e0576dc29b4ba3f9b6edbc913bb4f684.1495603536.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Viresh Kumar 提交于
Reuse put_prev_task() instead of copying its implementation. Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/e2e50578223d05c5e90a9feb964fe1ec5d09a052.1495603536.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Viresh Kumar 提交于
For SMP systems, update_load_avg() calls the cpufreq update util handlers only for the top level cfs_rq (i.e. rq->cfs). But that is not the case for UP systems. update_load_avg() calls util handler for any cfs_rq for which it is called. This would result in way too many calls from the scheduler to the cpufreq governors when CONFIG_FAIR_GROUP_SCHED is enabled. Reduce the frequency of these calls by copying the behavior from the SMP case, i.e. Only call util handlers for the top level cfs_rq. Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linaro-kernel@lists.linaro.org Fixes: 536bd00c ("sched/fair: Fix !CONFIG_SMP kernel cpufreq governor breakage") Link: http://lkml.kernel.org/r/6abf69a2107525885b616a2c1ec03d9c0946171c.1495603536.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 7月, 2017 1 次提交
-
-
由 Jonathan Corbet 提交于
The kerneldoc comments for try_to_wake_up_local() were out of date, leading to these documentation build warnings: ./kernel/sched/core.c:2080: warning: No description found for parameter 'rf' ./kernel/sched/core.c:2080: warning: Excess function parameter 'cookie' description in 'try_to_wake_up_local' Update the comment to reflect current reality and give us some peace and quiet. Signed-off-by: NJonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/20170724135628.695cecfc@lwn.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 7月, 2017 2 次提交
-
-
由 Joel Fernandes 提交于
This comment in the code is incomplete, and I believe it begs a definition of dl_boosted to make sense of the condition that follows. Rewrite the comment and also rearrange the condition that follows to reflect the first condition "we have a top pi-waiter which is a SCHED_DEADLINE task" in that order. Also fix a typo that follows. Signed-off-by: NJoel Fernandes <joelaf@google.com> Reviewed-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Acked-by: NJuri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170713022429.10307-1-joelaf@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Wanpeng Li 提交于
Recent kernels trigger this warning: BUG: using smp_processor_id() in preemptible [00000000] code: 99-trinity/181 caller is debug_smp_processor_id+0x17/0x19 CPU: 0 PID: 181 Comm: 99-trinity Not tainted 4.12.0-01059-g2a42eb95 #1 Call Trace: dump_stack+0x82/0xb8 check_preemption_disabled() debug_smp_processor_id() vtime_delta() task_cputime() thread_group_cputime() thread_group_cputime_adjusted() wait_consider_task() do_wait() SYSC_wait4() do_syscall_64() entry_SYSCALL64_slow_path() As Frederic pointed out: | Although those sched_clock_cpu() things seem to only matter when the | sched_clock() is unstable. And that stability is a condition for nohz_full | to work anyway. So probably sched_clock() alone would be enough. This patch fixes it by replacing sched_clock_cpu() with sched_clock() to avoid calling smp_processor_id() in a preemptible context. Reported-by: NXiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1499586028-7402-1-git-send-email-wanpeng.li@hotmail.com [ Prettified the changelog. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 7月, 2017 1 次提交
-
-
由 Vikram Mulukutla 提交于
With a shared policy in place, when one of the CPUs in the policy is hotplugged out and then brought back online, sugov_stop() and sugov_start() are called in order. sugov_stop() removes utilization hooks for each CPU in the policy and does nothing else in the for_each_cpu() loop. sugov_start() on the other hand iterates through the CPUs in the policy and re-initializes the per-cpu structure _and_ adds the utilization hook. This implies that the scheduler is allowed to invoke a CPU's utilization update hook when the rest of the per-cpu structures have yet to be re-inited. Apart from some strange values in tracepoints this doesn't cause a problem, but if we do end up accessing a pointer from the per-cpu sugov_cpu structure somewhere in the sugov_update_shared() path, we will likely see crashes since the memset for another CPU in the policy is free to race with sugov_update_shared from the CPU that is ready to go. So let's fix this now to first init all per-cpu structures, and then add the per-cpu utilization update hooks all at once. Signed-off-by: NVikram Mulukutla <markivx@codeaurora.org> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 05 7月, 2017 6 次提交
-
-
由 Jeffrey Hugo 提交于
If load_balance() fails to migrate any tasks because all tasks were affined, load_balance() removes the source CPU from consideration and attempts to redo and balance among the new subset of CPUs. There is a bug in this code path where the algorithm considers all active CPUs in the system (minus the source that was just masked out). This is not valid for two reasons: some active CPUs may not be in the current scheduling domain and one of the active CPUs is dst_cpu. These CPUs should not be considered, as we cannot pull load from them. Instead of failing out of load_balance(), we may end up redoing the search with no valid CPUs and incorrectly concluding the domain is balanced. Additionally, if the group_imbalance flag was just set, it may also be incorrectly unset, thus the flag will not be seen by other CPUs in future load_balance() runs as that algorithm intends. Fix the check by removing CPUs not in the current domain and the dst_cpu from considertation, thus limiting the evaluation to valid remaining CPUs from which load might be migrated. Co-authored-by: NAustin Christ <austinwc@codeaurora.org> Co-authored-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: NTyler Baicar <tbaicar@codeaurora.org> Signed-off-by: NJeffrey Hugo <jhugo@codeaurora.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Austin Christ <austinwc@codeaurora.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Timur Tabi <timur@codeaurora.org> Link: http://lkml.kernel.org/r/1496863138-11322-2-git-send-email-jhugo@codeaurora.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Wanpeng Li 提交于
Currently the cputime source used by vtime is jiffies. When we cross a context boundary and jiffies have changed since the last snapshot, the pending cputime is accounted to the switching out context. This system works ok if the ticks are not aligned across CPUs. If they instead are aligned (ie: all fire at the same time) and the CPUs run in userspace, the jiffies change is only observed on tick exit and therefore the user cputime is accounted as system cputime. This is because the CPU that maintains timekeeping fires its tick at the same time as the others. It updates jiffies in the middle of the tick and the other CPUs see that update on IRQ exit: CPU 0 (timekeeper) CPU 1 ------------------- ------------- jiffies = N ... run in userspace for a jiffy tick entry tick entry (sees jiffies = N) set jiffies = N + 1 tick exit tick exit (sees jiffies = N + 1) account 1 jiffy as stime Fix this with using a nanosec clock source instead of jiffies. The cputime is then accumulated and flushed everytime the pending delta reaches a jiffy in order to mitigate the accounting overhead. [ fweisbec: changelog, rebase on struct vtime, field renames, add delta on cputime readers, keep idle vtime as-is (low overhead accounting), harmonize clock sources. ] Suggested-by: NThomas Gleixner <tglx@linutronix.de> Reported-by: NLuiz Capitulino <lcapitulino@redhat.com> Tested-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-6-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Frederic Weisbecker 提交于
We are about to add vtime accumulation fields to the task struct. Let's avoid more bloatification and gather vtime information to their own struct. Tested-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-5-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Frederic Weisbecker 提交于
The current "snapshot" based naming on vtime fields suggests we record some past event but that's a low level picture of their actual purpose which comes out blurry. The real point of these fields is to run a basic state machine that tracks down cputime entry while switching between contexts. So lets reflect that with more meaningful names. Tested-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-4-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Frederic Weisbecker 提交于
Even though it doesn't have functional consequences, setting the task's new context state after we actually accounted the pending vtime from the old context state makes more sense from a review perspective. vtime_user_exit() is the only function that doesn't follow that rule and that can bug the reviewer for a little while until he realizes there is no reason for this special case. Tested-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-3-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Frederic Weisbecker 提交于
It's an unnecessary function between vtime_user_exit() and account_user_time(). Tested-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-2-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 7月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
This reverts commit 72298e5c. As Peter explains: > Argh, no... That code was perfectly fine. The new code otoh is > convoluted. > > The old code had the following form: > > if (exception1) > deal with exception1 > > if (execption2) > deal with exception2 > > do normal stuff > > Which is as simple and straight forward as it gets. > > The new code otoh reads like: > > if (!exception1) { > if (exception2) > deal with exception 2 > else > do normal stuff > } So restore the old form. Also fix the comment describing the logic, as it was confusing. Requested-by: NPeter Zijlstra <peterz@infradead.org> Cc: Gustavo A. R. Silva <garsilva@embeddedor.com> Cc: Frans Klaver <fransklaver@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 30 6月, 2017 2 次提交
-
-
由 Gustavo A. R. Silva 提交于
Address a Coverity false positive, which is caused by overly convoluted code: Value assigned to variable 'utime' at line 619:utime = rtime; is overwritten at line 642:utime = rtime - stime; before it can be used. This makes such variable assignment useless. Remove this variable assignment and refactor the code related. Addresses-Coverity-ID: 1371643 Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com> Cc: Frans Klaver <fransklaver@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/20170629184128.GA5271@embeddedgusSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
Add the value of the rt_rq.rt_nr_migratory and dl_rq.dl_nr_migratory to the sched_debug output, for instance: rt_rq[0]: .rt_nr_running : 2 .rt_nr_migratory : 1 <--- Like this .rt_throttled : 0 .rt_time : 828.645877 .rt_runtime : 1000.000000 This is useful to debug problems related to the RT/DL schedulers. This also fixes the format of some variables, that were unsigned, rather than signed. Signed-off-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Link: http://lkml.kernel.org/r/7896f71cada54ee7dd8507bb666063a2e051c3d4.1498482127.git.bristot@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 6月, 2017 1 次提交
-
-
由 Thomas Gleixner 提交于
Stephen reported the following build warning in UP: kernel/sched/fair.c:2657:9: warning: 'struct sched_domain' declared inside parameter list ^ /home/sfr/next/next/kernel/sched/fair.c:2657:9: warning: its scope is only this definition or declaration, which is probably not what you want Hide the numa_wake_affine() inline stub on UP builds to get rid of it. Fixes: 3fed382b ("sched/numa: Implement NUMA node level wake_affine()") Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org>
-
- 24 6月, 2017 1 次提交
-
-
由 Rik van Riel 提交于
The effective_load() function was only used by the NUMA balancing code, and not by the regular load balancing code. Now that the NUMA balancing code no longer uses it either, get rid of it. Signed-off-by: NRik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jhladky@redhat.com Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20170623165530.22514-5-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-