- 07 5月, 2013 1 次提交
-
-
由 Paul Gortmaker 提交于
This large chunk of load calculation code can be easily divorced from the main core.c scheduler file, with only a couple prototypes and externs added to a kernel/sched header. Some recent commits expanded the code and the documentation of it, making it large enough to warrant separation. For example, see: 556061b0, "sched/nohz: Fix rq->cpu_load[] calculations" 5aaa0b7a, "sched/nohz: Fix rq->cpu_load calculations some more" 5167e8d5, "sched/nohz: Rewrite and fix load-avg computation -- again" More importantly, it helps reduce the size of the main sched/core.c by yet another significant amount (~600 lines). Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1366398650-31599-2-git-send-email-paul.gortmaker@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 5月, 2013 1 次提交
-
-
由 Frederic Weisbecker 提交于
The scheduler doesn't yet fully support environments with a single task running without a periodic tick. In order to ensure we still maintain the duties of scheduler_tick(), keep at least 1 tick per second. This makes sure that we keep the progression of various scheduler accounting and background maintainance even with a very low granularity. Examples include cpu load, sched average, CFS entity vruntime, avenrun and events such as load balancing, amongst other details handled in sched_class::task_tick(). This limitation will be removed in the future once we get these individual items to work in full dynticks CPUs. Suggested-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
-
- 01 5月, 2013 4 次提交
-
-
由 Tejun Heo 提交于
One of the problems that arise when converting dedicated custom threadpool to workqueue is that the shared worker pool used by workqueue anonimizes each worker making it more difficult to identify what the worker was doing on which target from the output of sysrq-t or debug dump from oops, BUG() and friends. This patch implements set_worker_desc() which can be called from any workqueue work function to set its description. When the worker task is dumped for whatever reason - sysrq-t, WARN, BUG, oops, lockdep assertion and so on - the description will be printed out together with the workqueue name and the worker function pointer. The printing side is implemented by print_worker_info() which is called from functions in task dump paths - sched_show_task() and dump_stack_print_info(). print_worker_info() can be safely called on any task in any state as long as the task struct itself is accessible. It uses probe_*() functions to access worker fields. It may print garbage if something went very wrong, but it wouldn't cause (another) oops. The description is currently limited to 24bytes including the terminating \0. worker->desc_valid and workder->desc[] are added and the 64 bytes marker which was already incorrect before adding the new fields is moved to the correct position. Here's an example dump with writeback updated to set the bdi name as worker desc. Hardware name: Bochs Modules linked in: Pid: 7, comm: kworker/u9:0 Not tainted 3.9.0-rc1-work+ #1 Workqueue: writeback bdi_writeback_workfn (flush-8:0) ffffffff820a3ab0 ffff88000f6e9cb8 ffffffff81c61845 ffff88000f6e9cf8 ffffffff8108f50f 0000000000000000 0000000000000000 ffff88000cde16b0 ffff88000cde1aa8 ffff88001ee19240 ffff88000f6e9fd8 ffff88000f6e9d08 Call Trace: [<ffffffff81c61845>] dump_stack+0x19/0x1b [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81200150>] bdi_writeback_workfn+0x2a0/0x3b0 ... Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: NJan Kara <jack@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stanislaw Gruszka 提交于
Dave Hansen reported strange utime/stime values on his system: https://lkml.org/lkml/2013/4/4/435 This happens because prev->stime value is bigger than rtime value. Root of the problem are non-monotonic rtime values (i.e. current rtime is smaller than previous rtime) and that should be debugged and fixed. But since problem did not manifest itself before commit 62188451 "cputime: Avoid multiplication overflow on utime scaling", it should be threated as regression, which we can easily fixed on cputime_adjust() function. For now, let's apply this fix, but further work is needed to fix root of the problem. Reported-and-tested-by: NDave Hansen <dave@sr71.net> Cc: <stable@vger.kernel.org> # 3.9+ Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: rostedt@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1367314507-9728-3-git-send-email-sgruszka@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stanislaw Gruszka 提交于
Due to rounding in scale_stime(), for big numbers, scaled stime values will grow in chunks. Since rtime grow in jiffies and we calculate utime like below: prev->stime = max(prev->stime, stime); prev->utime = max(prev->utime, rtime - prev->stime); we could erroneously account stime values as utime. To prevent that only update prev->{u,s}time values when they are smaller than current rtime. Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: rostedt@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1367314507-9728-2-git-send-email-sgruszka@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stanislaw Gruszka 提交于
Here is patch, which adds Linus's cputime scaling algorithm to the kernel. This is a follow up (well, fix) to commit d9a3c982 ("sched: Lower chances of cputime scaling overflow") which commit tried to avoid multiplication overflow, but did not guarantee that the overflow would not happen. Linus crated a different algorithm, which completely avoids the multiplication overflow by dropping precision when numbers are big. It was tested by me and it gives good relative error of scaled numbers. Testing method is described here: http://marc.info/?l=linux-kernel&m=136733059505406&w=2 Originally-From: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: rostedt@goodmis.org Cc: Dave Hansen <dave@sr71.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130430151441.GC10465@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 30 4月, 2013 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 4月, 2013 1 次提交
-
-
由 Vincent Guittot 提交于
On my SMP platform which is made of 5 cores in 2 clusters, I have the nr_busy_cpu field of sched_group_power struct that is not null when the platform is fully idle - which makes the scheduler unhappy. The root cause is: During the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state whereas some CPUs have already set their NOHZ_IDLE flag. More generally, the NOHZ_IDLE flag must be initialized when new sched_domains are created in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned. This condition can be ensured by adding a synchronize_rcu() between the destruction of old sched_domains and the creation of new ones so the NOHZ_IDLE flag will not be updated with old sched_domain once it has been initialized. But this solution introduces a additionnal latency in the rebuild sequence that is called during cpu hotplug. As suggested by Frederic Weisbecker, another solution is to have the same rcu lifecycle for both NOHZ_IDLE and sched_domain struct. A new nohz_idle field is added to sched_domain so both status and sched_domain will share the same RCU lifecycle and will be always synchronized. In addition, there is no more need to protect nohz_idle against concurrent access as it is only modified by 2 exclusive functions called by local cpu. This solution has been prefered to the creation of a new struct with an extra pointer indirection for sched_domain. The synchronization is done at the cost of : - An additional indirection and a rcu_dereference for accessing nohz_idle. - We use only the nohz_idle field of the top sched_domain. Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: linaro-kernel@lists.linaro.org Cc: peterz@infradead.org Cc: fweisbec@gmail.com Cc: pjt@google.com Cc: rostedt@goodmis.org Cc: efault@gmx.de Link: http://lkml.kernel.org/r/1366729142-14662-1-git-send-email-vincent.guittot@linaro.org [ Fixed !NO_HZ build bug. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 4月, 2013 6 次提交
-
-
由 Joonsoo Kim 提交于
Commit 88b8dac0 makes load_balance() consider other cpus in its group. But, in that, there is no code for preventing to re-select dst-cpu. So, same dst-cpu can be selected over and over. This patch add functionality to load_balance() in order to exclude cpu which is selected once. We prevent to re-select dst_cpu via env's cpus, so now, env's cpus is a candidate not only for src_cpus, but also dst_cpus. With this patch, we can remove lb_iterations and max_lb_iterations, because we decide whether we can go ahead or not via env's cpus. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: NJason Low <jason.low2@hp.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366705662-3587-7-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Joonsoo Kim 提交于
This name doesn't represent specific meaning. So rename it to imply it's purpose. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: NJason Low <jason.low2@hp.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366705662-3587-6-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Joonsoo Kim 提交于
Currently, LBF_ALL_PINNED is cleared after affinity check is passed. So, if task migration is skipped by small load value or small imbalance value in move_tasks(), we don't clear LBF_ALL_PINNED. At last, we trigger 'redo' in load_balance(). Imbalance value is often so small that any tasks cannot be moved to other cpus and, of course, this situation may be continued after we change the target cpu. So this patch move up affinity check code and clear LBF_ALL_PINNED before evaluating load value in order to mitigate useless redoing overhead. In addition, re-order some comments correctly. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: NJason Low <jason.low2@hp.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366705662-3587-5-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Joonsoo Kim 提交于
Commit 88b8dac0 makes load_balance() consider other cpus in its group, regardless of idle type. When we do NEWLY_IDLE balancing, we should not consider it, because a motivation of NEWLY_IDLE balancing is to turn this cpu to non idle state if needed. This is not the case of other cpus. So, change code not to consider other cpus for NEWLY_IDLE balancing. With this patch, assign 'if (pulled_task) this_rq->idle_stamp = 0' in idle_balance() is corrected, because NEWLY_IDLE balancing doesn't consider other cpus. Assigning to 'this_rq->idle_stamp' is now valid. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Tested-by: NJason Low <jason.low2@hp.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366705662-3587-4-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Joonsoo Kim 提交于
After commit 88b8dac0, dst-cpu can be changed in load_balance(), then we can't know cpu_idle_type of dst-cpu when load_balance() return positive. So, add explicit cpu_idle_type checking. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Tested-by: NJason Low <jason.low2@hp.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366705662-3587-3-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Joonsoo Kim 提交于
cur_ld_moved is reset if env.flags hit LBF_NEED_BREAK. So, there is possibility that we miss doing resched_cpu(). Correct it as changing position of resched_cpu() before checking LBF_NEED_BREAK. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Tested-by: NJason Low <jason.low2@hp.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366705662-3587-2-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 4月, 2013 4 次提交
-
-
由 Frederic Weisbecker 提交于
When a task is scheduled in, it may have some properties of its own that could make the CPU reconsider the need for the tick: posix cpu timers, perf events, ... So notify the full dynticks subsystem when a task gets scheduled in and re-check the tick dependency at this stage. This is done through a self IPI to avoid messing up with any current lock scenario. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
-
由 Frederic Weisbecker 提交于
The scheduler IPI is used by the scheduler to kick full dynticks CPUs asynchronously when more than one task are running or when a new timer list timer is enqueued. This way the destination CPU can decide to restart the tick to handle this new situation. Now let's call that kick in the scheduler IPI. (Reusing the scheduler IPI rather than implementing a new IPI was suggested by Peter Zijlstra a while ago) Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
-
由 Frederic Weisbecker 提交于
Provide a new helper to be called from the full dynticks engine before stopping the tick in order to make sure we don't stop it when there is more than one task running on the CPU. This way we make sure that the tick stays alive to maintain fairness. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
-
由 Frederic Weisbecker 提交于
Kick the tick on full dynticks CPUs when they get more than one task running on their queue. This makes sure that local fairness is maintained by the tick on the destination. This is done regardless of these tasks' class. We should be able to be more clever in the future depending on these. eg: a CPU that runs a SCHED_FIFO task doesn't need to maintain fairness against local pending tasks of the fair class. But keep things simple for now. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
-
- 21 4月, 2013 1 次提交
-
-
由 Vincent Guittot 提交于
The current update of the rq's load can be erroneous when RT tasks are involved. The update of the load of a rq that becomes idle, is done only if the avg_idle is less than sysctl_sched_migration_cost. If RT tasks and short idle duration alternate, the runnable_avg will not be updated correctly and the time will be accounted as idle time when a CFS task wakes up. A new idle_enter function is called when the next task is the idle function so the elapsed time will be accounted as run time in the load of the rq, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance. When a RT task is scheduled on an idle CPU, the update of the rq's load is not done when the rq exit idle state because CFS's functions are not called. Then, the idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update, was only running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task, is close to LOAD_AVG_MAX whatever the running duration of the RT task is. A new idle_exit function is called when the prev task is the idle function so the elapsed time will be accounted as idle time in the rq's load. Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: linaro-kernel@lists.linaro.org Cc: peterz@infradead.org Cc: pjt@google.com Cc: fweisbec@gmail.com Cc: efault@gmx.de Link: http://lkml.kernel.org/r/1366302867-5055-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 19 4月, 2013 1 次提交
-
-
由 Waiman Long 提交于
As mentioned by Ingo, the SCHED_FEAT_OWNER_SPIN scheduler feature bit was really just an early hack to make with/without mutex-spinning testable. So it is no longer necessary. This patch removes the SCHED_FEAT_OWNER_SPIN feature bit and move the mutex spinning code from kernel/sched/core.c back to kernel/mutex.c which is where they should belong. Signed-off-by: NWaiman Long <Waiman.Long@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chandramouleeswaran Aswin <aswin@hp.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Norton Scott J <scott.norton@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366226594-5506-2-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 4月, 2013 1 次提交
-
-
由 Frederic Weisbecker 提交于
"Extended nohz" was used as a naming base for the full dynticks API and Kconfig symbols. It reflects the fact the system tries to stop the tick in more places than just idle. But that "extended" name is a bit opaque and vague. Rename it to "full" makes it clearer what the system tries to do under this config: try to shutdown the tick anytime it can. The various constraints that prevent that to happen shouldn't be considered as fundamental properties of this feature but rather technical issues that may be solved in the future. Reported-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
-
- 10 4月, 2013 14 次提交
-
-
由 Ingo Molnar 提交于
The cpuacct split caused this build failure on UML: kernel/sched/cpuacct.c:94:2: error: implicit declaration of function 'ERR_PTR' Cc: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
Now we're guaranteed when cpuacct_charge() and cpuacct_account_field() are called, cpuacct has already been properly initialized, so we no longer need those checks. Signed-off-by: NLi Zefan <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155384C.7000508@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
Initialize cpuacct before the scheduler is functioning, so when cpuacct_charge() and cpuacct_account_field() are called, task_ca() won't return NULL. Signed-off-by: NLi Zefan <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155383F.8000005@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
Now we don't need cpuacct_init(), and instead we just initialize root_cpuacct when it's defined. Signed-off-by: NLi Zefan <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51553834.9090701@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
This is a preparation, so later we can initialize cpuacct earlier. Signed-off-by: NLi Zefan <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51553822.5000403@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
Now most of the code in cpuacct.h can be moved to cpuacct.c Signed-off-by: NLi Zefan <lizefan@huawei.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/515536D5.2080401@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
This is a micro optimazation for a hot path. - We don't need to check if @CA returned from task_ca() is NULL. - We don't need to check if @CA returned from parent_ca() is NULL. Signed-off-by: NLi Zefan <lizefan@huawei.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/515536B7.6060602@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
This is a micro optimization for the hot path. - We don't need to check if @CA is NULL in parent_ca(). - We don't need to check if @CA is NULL in the beginning of the for loop. Signed-off-by: NLi Zefan <lizefan@huawei.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/515536A9.5000700@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
So we can remove open-coded cpuacct code in cputime.c. Signed-off-by: NLi Zefan <lizefan@huawei.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51553692.9060008@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
So we don't open-coded initialization of cpuacct in core.c. Signed-off-by: NLi Zefan <lizefan@huawei.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51553687.1060906@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
Add cpuacct.h and let sched.h include it. Signed-off-by: NLi Zefan <lizefan@huawei.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155367B.2060506@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
Signed-off-by: NLi Zefan <lizefan@huawei.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155366F.5060404@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Libin 提交于
A comment in function rebalance_domains() mentions arch_init_sched_domains(), but that function does not exist anymore. The proper function is init_sched_domains(). Signed-off-by: NLibin <huawei.libin@huawei.com> Cc: <peterz@infradead.org> Link: http://lkml.kernel.org/r/1364814841-49156-1-git-send-email-huawei.libin@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Zhang Hang 提交于
At this point tsk_cache_hot is always true, so no need to check it. Signed-off-by: NZhang Hang <bob.zhanghang@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51650107.9040606@huawei.com [ Also remove unnecessary schedstat #ifdefs. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 4月, 2013 5 次提交
-
-
由 Stanislaw Gruszka 提交于
Recent commit 6fac4829 ("cputime: Use accessors to read task cputime stats") introduced a bug, where we account many times the cputime of the first thread, instead of cputimes of all the different threads. Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com> Acked-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130404085740.GA2495@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Move it to a common place. Preparatory patch for implementing set/clear for the idle need_resched poll implementation. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: NCc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130321215233.446034505@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Viresh Kumar 提交于
Fix typo: sched_domains_nume_distance -> sched_domains_numa_distance Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: patches@linaro.org Cc: robin.randhawa@arm.com Cc: Steve.Bannister@arm.com Cc: Liviu.Dudau@arm.com Cc: charles.garcia-tobin@arm.com Cc: arvind.chauhan@arm.com Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/cd8084746ac932106d6fa6be388b8f2d6aa9617c.1365159023.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 libin 提交于
Commit 201c373e ("sched/debug: Limit sd->*_idx range on sysctl") was an incomplete bug fix. This patch fixes sd->*_idx limit range to [0 ~ CPU_LOAD_IDX_MAX-1] avoiding array overflow caused by setting sd->*_idx to CPU_LOAD_IDX_MAX on sysctl. Signed-off-by: NLibin <huawei.libin@huawei.com> Cc: <jiang.liu@huawei.com> Cc: <guohanjun@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51626610.2040607@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The sched_clock_remote() implementation has the following inatomicity problem on 32bit systems when accessing the remote scd->clock, which is a 64bit value. CPU0 CPU1 sched_clock_local() sched_clock_remote(CPU0) ... remote_clock = scd[CPU0]->clock read_low32bit(scd[CPU0]->clock) cmpxchg64(scd->clock,...) read_high32bit(scd[CPU0]->clock) While the update of scd->clock is using an atomic64 mechanism, the readout on the remote cpu is not, which can cause completely bogus readouts. It is a quite rare problem, because it requires the update to hit the narrow race window between the low/high readout and the update must go across the 32bit boundary. The resulting misbehaviour is, that CPU1 will see the sched_clock on CPU1 ~4 seconds ahead of it's own and update CPU1s sched_clock value to this bogus timestamp. This stays that way due to the clamping implementation for about 4 seconds until the synchronization with CLOCK_MONOTONIC undoes the problem. The issue is hard to observe, because it might only result in a less accurate SCHED_OTHER timeslicing behaviour. To create observable damage on realtime scheduling classes, it is necessary that the bogus update of CPU1 sched_clock happens in the context of an realtime thread, which then gets charged 4 seconds of RT runtime, which results in the RT throttler mechanism to trigger and prevent scheduling of RT tasks for a little less than 4 seconds. So this is quite unlikely as well. The issue was quite hard to decode as the reproduction time is between 2 days and 3 weeks and intrusive tracing makes it less likely, but the following trace recorded with trace_clock=global, which uses sched_clock_local(), gave the final hint: <idle>-0 0d..30 400269.477150: hrtimer_cancel: hrtimer=0xf7061e80 <idle>-0 0d..30 400269.477151: hrtimer_start: hrtimer=0xf7061e80 ... irq/20-S-587 1d..32 400273.772118: sched_wakeup: comm= ... target_cpu=0 <idle>-0 0dN.30 400273.772118: hrtimer_cancel: hrtimer=0xf7061e80 What happens is that CPU0 goes idle and invokes sched_clock_idle_sleep_event() which invokes sched_clock_local() and CPU1 runs a remote wakeup for CPU0 at the same time, which invokes sched_remote_clock(). The time jump gets propagated to CPU0 via sched_remote_clock() and stays stale on both cores for ~4 seconds. There are only two other possibilities, which could cause a stale sched clock: 1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic wrong value. 2) sched_clock() which reads the TSC returns a sporadic wrong value. #1 can be excluded because sched_clock would continue to increase for one jiffy and then go stale. #2 can be excluded because it would not make the clock jump forward. It would just result in a stale sched_clock for one jiffy. After quite some brain twisting and finding the same pattern on other traces, sched_clock_remote() remained the only place which could cause such a problem and as explained above it's indeed racy on 32bit systems. So while on 64bit systems the readout is atomic, we need to verify the remote readout on 32bit machines. We need to protect the local->clock readout in sched_clock_remote() on 32bit as well because an NMI could hit between the low and the high readout, call sched_clock_local() and modify local->clock. Thanks to Siegfried Wulsch for bearing with my debug requests and going through the tedious tasks of running a bunch of reproducer systems to generate the debug information which let me decode the issue. Reported-by: NSiegfried Wulsch <Siegfried.Wulsch@rovema.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionosSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
-