- 20 8月, 2018 1 次提交
-
-
由 Rafael J. Wysocki 提交于
If the tick has been stopped already, but the governor has not asked to stop it (which it can do sometimes), the idle loop should invoke tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care of this case properly. Fixes: 554c8aa8 (sched: idle: Select idle state before stopping the tick) Cc: 4.17+ <stable@vger.kernel.org> # 4.17+ Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 09 4月, 2018 1 次提交
-
-
由 Rafael J. Wysocki 提交于
In order to address the issue with short idle duration predictions by the idle governor after the scheduler tick has been stopped, reorder the code in cpuidle_idle_call() so that the governor idle state selection runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned by cpuidle_select() to decide whether or not to stop the tick. This isn't straightforward, because menu_select() invokes tick_nohz_get_sleep_length() to get the time to the next timer event and the number returned by the latter comes from __tick_nohz_idle_stop_tick(). Fortunately, however, it is possible to compute that number without actually stopping the tick and with the help of the existing code. Namely, tick_nohz_get_sleep_length() can be made call tick_nohz_next_event(), introduced earlier, to get the time to the next non-highres timer event. If that happens, tick_nohz_next_event() need not be called by __tick_nohz_idle_stop_tick() again. If it turns out that the scheduler tick cannot be stopped going forward or the next timer event is too close for the tick to be stopped, tick_nohz_get_sleep_length() can simply return the time to the next event currently programmed into the corresponding clock event device. In addition to knowing the return value of tick_nohz_next_event(), however, tick_nohz_get_sleep_length() needs to know the time to the next highres timer event, but with the scheduler tick timer excluded, which can be computed with the help of hrtimer_get_next_event(). That minimum of that number and the tick_nohz_next_event() return value is the total time to the next timer event with the assumption that the tick will be stopped. It can be returned to the idle governor which can use it for predicting idle duration (under the assumption that the tick will be stopped) and deciding whether or not it makes sense to stop the tick before putting the CPU into the selected idle state. With the above, the sleep_length field in struct tick_sched is not necessary any more, so drop it. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227Reported-by: NDoug Smythies <dsmythies@telus.net> Reported-by: NThomas Ilsche <thomas.ilsche@tu-dresden.de> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
-
- 06 4月, 2018 4 次提交
-
-
由 Rafael J. Wysocki 提交于
Add a new pointer argument to cpuidle_select() and to the ->select cpuidle governor callback to allow a boolean value indicating whether or not the tick should be stopped before entering the selected state to be returned from there. Make the ladder governor ignore that pointer (to preserve its current behavior) and make the menu governor return 'false" through it if: (1) the idle exit latency is constrained at 0, or (2) the selected state is a polling one, or (3) the expected idle period duration is within the tick period range. In addition to that, the correction factor computations in the menu governor need to take the possibility that the tick may not be stopped into account to avoid artificially small correction factor values. To that end, add a mechanism to record tick wakeups, as suggested by Peter Zijlstra, and use it to modify the menu_update() behavior when tick wakeup occurs. Namely, if the CPU is woken up by the tick and the return value of tick_nohz_get_sleep_length() is not within the tick boundary, the predicted idle duration is likely too short, so make menu_update() try to compensate for that by updating the governor statistics as though the CPU was idle for a long time. Since the value returned through the new argument pointer of cpuidle_select() is not used by its caller yet, this change by itself is not expected to alter the functionality of the code. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
由 Rafael J. Wysocki 提交于
Make cpuidle_idle_call() decide whether or not to stop the tick. First, the cpuidle_enter_s2idle() path deals with the tick (and with the entire timekeeping for that matter) by itself and it doesn't need the tick to be stopped beforehand. Second, to address the issue with short idle duration predictions by the idle governor after the tick has been stopped, it will be necessary to change the ordering of cpuidle_select() with respect to tick_nohz_idle_stop_tick(). To prepare for that, put a tick_nohz_idle_stop_tick() call in the same branch in which cpuidle_select() is called. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
由 Rafael J. Wysocki 提交于
Push the decision whether or not to stop the tick somewhat deeper into the idle loop. Stopping the tick upfront leads to unpleasant outcomes in case the idle governor doesn't agree with the nohz code on the duration of the upcoming idle period. Specifically, if the tick has been stopped and the idle governor predicts short idle, the situation is bad regardless of whether or not the prediction is accurate. If it is accurate, the tick has been stopped unnecessarily which means excessive overhead. If it is not accurate, the CPU is likely to spend too much time in the (shallow, because short idle has been predicted) idle state selected by the governor [1]. As the first step towards addressing this problem, change the code to make the tick stopping decision inside of the loop in do_idle(). In particular, do not stop the tick in the cpu_idle_poll() code path. Also don't do that in tick_nohz_irq_exit() which doesn't really have enough information on whether or not to stop the tick. Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1] Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdfSuggested-by: NFrederic Weisbecker <frederic@kernel.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
由 Rafael J. Wysocki 提交于
Prepare the scheduler tick code for reworking the idle loop to avoid stopping the tick in some cases. The idea is to split the nohz idle entry call to decouple the idle time stats accounting and preparatory work from the actual tick stop code, in order to later be able to delay the tick stop once we reach more power-knowledgeable callers. Move away the tick_nohz_start_idle() invocation from __tick_nohz_idle_enter(), rename the latter to __tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick() as a wrapper around it for calling it from the outside. Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead of calling the entire __tick_nohz_idle_enter(), add another wrapper disabling and enabling interrupts around tick_nohz_idle_stop_tick() and make the current callers of tick_nohz_idle_enter() call it too to retain their current functionality. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
- 04 3月, 2018 2 次提交
-
-
由 Ingo Molnar 提交于
Merge these two small .c modules as they implement two aspects of idle task handling. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Do the following cleanups and simplifications: - sched/sched.h already includes <asm/paravirt.h>, so no need to include it in sched/core.c again. - order the <linux/sched/*.h> headers alphabetically - add all <linux/sched/*.h> headers to kernel/sched/sched.h - remove all unnecessary includes from the .c files that are already included in kernel/sched/sched.h. Finally, make all scheduler .c files use a single common header: #include "sched.h" ... which now contains a union of the relied upon headers. This makes the various .c files easier to read and easier to handle. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 3月, 2018 1 次提交
-
-
由 Ingo Molnar 提交于
A good number of small style inconsistencies have accumulated in the scheduler core, so do a pass over them to harmonize all these details: - fix speling in comments, - use curly braces for multi-line statements, - remove unnecessary parentheses from integer literals, - capitalize consistently, - remove stray newlines, - add comments where necessary, - remove invalid/unnecessary comments, - align structure definitions and other data types vertically, - add missing newlines for increased readability, - fix vertical tabulation where it's misaligned, - harmonize preprocessor conditional block labeling and vertical alignment, - remove line-breaks where they uglify the code, - add newline after local variable definitions, No change in functionality: md5: 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.before.asm 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.after.asm Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 26 10月, 2017 1 次提交
-
-
由 Cheng Jian 提交于
Move the loop-invariant calculation of 'cpu' in do_idle() out of the loop body, because the current CPU is always constant. This improves the generated code both on x86-64 and ARM64: x86-64: Before patch (execution in loop): 864: 0f ae e8 lfence 867: 65 8b 05 c2 38 f1 7e mov %gs:0x7ef138c2(%rip),%eax 86e: 89 c0 mov %eax,%eax 870: 48 0f a3 05 68 19 08 bt %rax,0x1081968(%rip) 877: 01 After patch (execution in loop): 872: 0f ae e8 lfence 875: 4c 0f a3 25 63 19 08 bt %r12,0x1081963(%rip) 87c: 01 ARM64: Before patch (execution in loop): c58: d5033d9f dsb ld c5c: d538d080 mrs x0, tpidr_el1 c60: b8606a61 ldr w1, [x19,x0] c64: 1100fc20 add w0, w1, #0x3f c68: 7100003f cmp w1, #0x0 c6c: 1a81b000 csel w0, w0, w1, lt c70: 13067c00 asr w0, w0, #6 c74: 93407c00 sxtw x0, w0 c78: f8607a80 ldr x0, [x20,x0,lsl #3] c7c: 9ac12401 lsr x1, x0, x1 c80: 36000581 tbz w1, #0, d30 <do_idle+0x128> After patch (execution in loop): c84: d5033d9f dsb ld c88: f9400260 ldr x0, [x19] c8c: ea14001f tst x0, x20 c90: 54000580 b.eq d40 <do_idle+0x138> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com> [ Rewrote the title and the changelog. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: huawei.libin@huawei.com Cc: xiexiuqi@huawei.com Link: http://lkml.kernel.org/r/1508930907-107755-1-git-send-email-cj.chengjian@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 10月, 2017 1 次提交
-
-
由 Peter Zijlstra 提交于
quiet_vmstat() is an expensive function that only makes sense when we go into NOHZ. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: aubrey.li@linux.intel.com Cc: cl@linux.com Cc: fweisbec@gmail.com Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 8月, 2017 2 次提交
-
-
由 Rafael J. Wysocki 提交于
Rename the ->enter_freeze cpuidle driver callback to ->enter_s2idle to make it clear that it is used for entering suspend-to-idle and rename the related functions, variables and so on accordingly. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Rafael J. Wysocki 提交于
Rename the freeze_state enum representing the suspend-to-idle state machine states to s2idle_states and rename the related variables and functions accordingly. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 08 6月, 2017 1 次提交
-
-
由 Aubrey Li 提交于
Deferrable vmstat_updater was missing in commit: c1de45ca ("sched/idle: Add support for tasks that inject idle") Add it back. Signed-off-by: NAubrey Li <aubrey.li@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Aubrey Li <aubrey.li@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1496803742-38274-1-git-send-email-aubrey.li@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 5月, 2017 1 次提交
-
-
由 Steven Rostedt (VMware) 提交于
I finally got around to creating trampolines for dynamically allocated ftrace_ops with using synchronize_rcu_tasks(). For users of the ftrace function hook callbacks, like perf, that allocate the ftrace_ops descriptor via kmalloc() and friends, ftrace was not able to optimize the functions being traced to use a trampoline because they would also need to be allocated dynamically. The problem is that they cannot be freed when CONFIG_PREEMPT is set, as there's no way to tell if a task was preempted on the trampoline. That was before Paul McKenney implemented synchronize_rcu_tasks() that would make sure all tasks (except idle) have scheduled out or have entered user space. While testing this, I triggered this bug: BUG: unable to handle kernel paging request at ffffffffa0230077 ... RIP: 0010:0xffffffffa0230077 ... Call Trace: schedule+0x5/0xe0 schedule_preempt_disabled+0x18/0x30 do_idle+0x172/0x220 What happened was that the idle task was preempted on the trampoline. As synchronize_rcu_tasks() ignores the idle thread, there's nothing that lets ftrace know that the idle task was preempted on a trampoline. The idle task shouldn't need to ever enable preemption. The idle task is simply a loop that calls schedule or places the cpu into idle mode. In fact, having preemption enabled is inefficient, because it can happen when idle is just about to call schedule anyway, which would cause schedule to be called twice. Once for when the interrupt came in and was returning back to normal context, and then again in the normal path that the idle loop is running in, which would be pointless, as it had already scheduled. The only reason schedule_preempt_disable() enables preemption is to be able to call sched_submit_work(), which requires preemption enabled. As this is a nop when the task is in the RUNNING state, and idle is always in the running state, there's no reason that idle needs to enable preemption. But that means it cannot use schedule_preempt_disable() as other callers of that function require calling sched_submit_work(). Adding a new function local to kernel/sched/ that allows idle to call the scheduler without enabling preemption, fixes the synchronize_rcu_tasks() issue, as well as removes the pointless spurious schedule calls caused by interrupts happening in the brief window where preemption is enabled just before it calls schedule. Reviewed: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170414084809.3dacde2a@gandalf.local.homeSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 3月, 2017 1 次提交
-
-
由 Josh Poimboeuf 提交于
Change livepatch to use a basic per-task consistency model. This is the foundation which will eventually enable us to patch those ~10% of security patches which change function or data semantics. This is the biggest remaining piece needed to make livepatch more generally useful. This code stems from the design proposal made by Vojtech [1] in November 2014. It's a hybrid of kGraft and kpatch: it uses kGraft's per-task consistency and syscall barrier switching combined with kpatch's stack trace switching. There are also a number of fallback options which make it quite flexible. Patches are applied on a per-task basis, when the task is deemed safe to switch over. When a patch is enabled, livepatch enters into a transition state where tasks are converging to the patched state. Usually this transition state can complete in a few seconds. The same sequence occurs when a patch is disabled, except the tasks converge from the patched state to the unpatched state. An interrupt handler inherits the patched state of the task it interrupts. The same is true for forked tasks: the child inherits the patched state of the parent. Livepatch uses several complementary approaches to determine when it's safe to patch tasks: 1. The first and most effective approach is stack checking of sleeping tasks. If no affected functions are on the stack of a given task, the task is patched. In most cases this will patch most or all of the tasks on the first try. Otherwise it'll keep trying periodically. This option is only available if the architecture has reliable stacks (HAVE_RELIABLE_STACKTRACE). 2. The second approach, if needed, is kernel exit switching. A task is switched when it returns to user space from a system call, a user space IRQ, or a signal. It's useful in the following cases: a) Patching I/O-bound user tasks which are sleeping on an affected function. In this case you have to send SIGSTOP and SIGCONT to force it to exit the kernel and be patched. b) Patching CPU-bound user tasks. If the task is highly CPU-bound then it will get patched the next time it gets interrupted by an IRQ. c) In the future it could be useful for applying patches for architectures which don't yet have HAVE_RELIABLE_STACKTRACE. In this case you would have to signal most of the tasks on the system. However this isn't supported yet because there's currently no way to patch kthreads without HAVE_RELIABLE_STACKTRACE. 3. For idle "swapper" tasks, since they don't ever exit the kernel, they instead have a klp_update_patch_state() call in the idle loop which allows them to be patched before the CPU enters the idle state. (Note there's not yet such an approach for kthreads.) All the above approaches may be skipped by setting the 'immediate' flag in the 'klp_patch' struct, which will disable per-task consistency and patch all tasks immediately. This can be useful if the patch doesn't change any function or data semantics. Note that, even with this flag set, it's possible that some tasks may still be running with an old version of the function, until that function returns. There's also an 'immediate' flag in the 'klp_func' struct which allows you to specify that certain functions in the patch can be applied without per-task consistency. This might be useful if you want to patch a common function like schedule(), and the function change doesn't need consistency but the rest of the patch does. For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user must set patch->immediate which causes all tasks to be patched immediately. This option should be used with care, only when the patch doesn't change any function or data semantics. In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE may be allowed to use per-task consistency if we can come up with another way to patch kthreads. The /sys/kernel/livepatch/<patch>/transition file shows whether a patch is in transition. Only a single patch (the topmost patch on the stack) can be in transition at a given time. A patch can remain in transition indefinitely, if any of the tasks are stuck in the initial patch state. A transition can be reversed and effectively canceled by writing the opposite value to the /sys/kernel/livepatch/<patch>/enabled file while the transition is in progress. Then all the tasks will attempt to converge back to the original patch state. [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.czSigned-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Acked-by: NMiroslav Benes <mbenes@suse.cz> Acked-by: Ingo Molnar <mingo@kernel.org> # for the scheduler changes Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 02 3月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
We are going to split <linux/sched/idle.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/idle.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 11月, 2016 2 次提交
-
-
由 Peter Zijlstra 提交于
Idle injection drivers such as Intel powerclamp and ACPI PAD drivers use realtime tasks to take control of CPU then inject idle. There are two issues with this approach: 1. Low efficiency: injected idle task is treated as busy so sched ticks do not stop during injected idle period, the result of these unwanted wakeups can be ~20% loss in power savings. 2. Idle accounting: injected idle time is presented to user as busy. This patch addresses the issues by introducing a new PF_IDLE flag which allows any given task to be treated as idle task while the flag is set. Therefore, idle injection tasks can run through the normal flow of NOHZ idle enter/exit to get the correct accounting as well as tick stop when possible. The implication is that idle task is then no longer limited to PID == 0. Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Jacob Pan 提交于
When idle injection is used to cap power, we need to override the governor's choice of idle states. For this reason, make it possible the deepest idle state selection to be enforced by setting a flag on a given CPU to achieve the maximum potential power draw reduction. Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com> [ rjw: Subject & changelog ] Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 08 10月, 2016 1 次提交
-
-
由 Chris Metcalf 提交于
When doing an nmi backtrace of many cores, most of which are idle, the output is a little overwhelming and very uninformative. Suppress messages for cpus that are idling when they are interrupted and just emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN". We do this by grouping all the cpuidle code together into a new .cpuidle.text section, and then checking the address of the interrupted PC to see if it lies within that section. This commit suitably tags x86 and tile idle routines, and only adds in the minimal framework for other architectures. Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.comSigned-off-by: NChris Metcalf <cmetcalf@mellanox.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Tested-by: NPetr Mladek <pmladek@suse.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 6月, 2016 2 次提交
-
-
Currently, smp_processor_id() is used to fetch the current CPU in cpu_idle_loop(). Every time the idle thread runs, it fetches the current CPU using smp_processor_id(). Since the idle thread is per CPU, the current CPU is constant, so we can lift the load out of the loop, saving execution cycles/time in the loop. x86-64: Before patch (execution in loop): 148: 0f ae e8 lfence 14b: 65 8b 04 25 00 00 00 00 mov %gs:0x0,%eax 152: 00 153: 89 c0 mov %eax,%eax 155: 49 0f a3 04 24 bt %rax,(%r12) After patch (execution in loop): 150: 0f ae e8 lfence 153: 4d 0f a3 34 24 bt %r14,(%r12) ARM64: Before patch (execution in loop): 168: d5033d9f dsb ld 16c: b9405661 ldr w1,[x19,#84] 170: 1100fc20 add w0,w1,#0x3f 174: 6b1f003f cmp w1,wzr 178: 1a81b000 csel w0,w0,w1,lt 17c: 130c7000 asr w0,w0,#6 180: 937d7c00 sbfiz x0,x0,#3,#32 184: f8606aa0 ldr x0,[x21,x0] 188: 9ac12401 lsr x1,x0,x1 18c: 36000e61 tbz w1,#0,358 After patch (execution in loop): 1a8: d50339df dsb ld 1ac: f8776ac0 ldr x0,[x22,x23] ab0: ea18001f tst x0,x24 1b4: 54000ea0 b.eq 388 Further observance on ARM64 for 4 seconds shows that cpu_idle_loop is called 8672 times. Shifting the code will save instructions executed in loop and eventually time as well. Signed-off-by: NGaurav Jindal <gaurav.jindal@spreadtrum.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSanjeev Yadav <sanjeev.yadav@spreadtrum.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160512101330.GA488@gauravjindalubtnb.del.spreadtrum.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Catalin Marinas 提交于
The cpuidle_devices per-CPU variable is only defined when CPU_IDLE is enabled. Commit c8cc7d4d ("sched/idle: Reorganize the idle loop") removed the #ifdef CONFIG_CPU_IDLE around cpuidle_idle_call() with the compiler optimising away __this_cpu_read(cpuidle_devices). However, with CONFIG_UBSAN && !CONFIG_CPU_IDLE, this optimisation no longer happens and the kernel fails to link since cpuidle_devices is not defined. This patch introduces an accessor function for the current CPU cpuidle device (returning NULL when !CONFIG_CPU_IDLE) and uses it in cpuidle_idle_call(). Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Cc: 4.5+ <stable@vger.kernel.org> # 4.5+ Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 02 3月, 2016 3 次提交
-
-
由 Thomas Gleixner 提交于
Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets invoked at the proper place. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
Kill the busy spinning on the control side and just wait for the hotplugged cpu to tell that it reached the dead state. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.776157858@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
Let the upcoming cpu kick the hotplug thread and let itself complete the bringup. That way the controll side can just wait for the completion or later when we made the hotplug machinery async not care at all. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.697655464@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 22 1月, 2016 1 次提交
-
-
由 Sudeep Holla 提交于
Commit 51164251 "sched / idle: Drop default_idle_call() fallback from call_cpuidle()" made find_deepest_state() return non-negative value and check all the states with index > 0. Also as a result, find_deepest_state() returns 0 even when enter_freeze callbacks are not implemented and enter_freeze_proper() is called which ends up crashing the kernel. This patch updates the check for index > 0 in cpuidle_enter_freeze and cpuidle_idle_call(when idle_should_freeze is true) to restore the suspend-to-idle functionality in absence of enter_freeze callback. Fixes: 51164251 "sched / idle: Drop default_idle_call() fallback from call_cpuidle()" Signed-off-by: NSudeep Holla <sudeep.holla@arm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 19 1月, 2016 1 次提交
-
-
由 Rafael J. Wysocki 提交于
After commit 9c4b2867 (cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0) it is clear that menu_select() cannot return negative values. Moreover, ladder_select_state() will never return a negative value too, so make find_deepest_state() return non-negative values too and drop the default_idle_call() fallback from call_cpuidle(). This eliminates one branch from the idle loop and makes the governors and find_deepest_state() handle the case when all states have been disabled from sysfs consistently. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NIngo Molnar <mingo@kernel.org> Tested-by: NSudeep Holla <sudeep.holla@arm.com>
-
- 15 1月, 2016 1 次提交
-
-
由 Christoph Lameter 提交于
Currently the vmstat updater is not deferrable as a result of commit ba4877b9 ("vmstat: do not use deferrable delayed work for vmstat_update"). This in turn can cause multiple interruptions of the applications because the vmstat updater may run at Make vmstate_update deferrable again and provide a function that folds the differentials when the processor is going to idle mode thus addressing the issue of the above commit in a clean way. Note that the shepherd thread will continue scanning the differentials from another processor and will reenable the vmstat workers if it detects any changes. Fixes: ba4877b9 ("vmstat: do not use deferrable delayed work for vmstat_update") Signed-off-by: NChristoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 10月, 2015 1 次提交
-
-
When using idle=poll, the preemptoff tracer is always showing the idle task as the culprit for long latencies. That happens because critical timings are not stopped before idle loop. This patch stops critical timings before entering the idle loop, starting it again after the idle loop. This problem does not affect the irqsoff tracer because interruptions are enabled before entering the idle loop. Signed-off-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Reviewed-by: NLuis Claudio R. Goncalves <lgoncalv@redhat.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/10fc3705874aef11dbe152a068b591a7be1899b4.1444314899.git.bristot@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 21 7月, 2015 1 次提交
-
-
由 Lucas Stach 提交于
Make sure to stop tracing only once we are past a point where all latency tracing events have been processed (irqs are not enabled again). This has the slight advantage of capturing more latency related events in the idle path, but most importantly it makes sure that latency tracing doesn't get re-enabled inadvertently when new events are coming in. This makes the irqsoff latency tracer useful again, as we stop capturing CPU sleep time as IRQ latency. Signed-off-by: NLucas Stach <l.stach@pengutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel@pengutronix.de Cc: patchwork-lst@pengutronix.de Link: http://lkml.kernel.org/r/1437410090-3747-1-git-send-email-l.stach@pengutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 5月, 2015 2 次提交
-
-
由 Rafael J. Wysocki 提交于
The check of the cpuidle_enter() return value against -EBUSY made in call_cpuidle() will not be necessary any more if cpuidle_enter_state() calls default_idle_call() directly when it is about to return -EBUSY, so make that happen and eliminate the check. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: NSudeep Holla <sudeep.holla@arm.com> Acked-by: NKevin Hilman <khilman@linaro.org>
-
由 Rafael J. Wysocki 提交于
Introduce a wrapper function around idle_set_state() called sched_idle_set_state() that will pass this_rq() to it as the first argument and make cpuidle_enter_state() call the new function before and after entering the target state. At the same time, remove direct invocations of idle_set_state() from call_cpuidle(). This will allow the invocation of default_idle_call() to be moved from call_cpuidle() to cpuidle_enter_state() safely and call_cpuidle() to be simplified a bit as a result. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: NSudeep Holla <sudeep.holla@arm.com> Acked-by: NKevin Hilman <khilman@linaro.org>
-
- 05 5月, 2015 2 次提交
-
-
由 Rafael J. Wysocki 提交于
Since cpuidle_reflect() should only be called if the idle state to enter was selected by cpuidle_select(), there is the "reflect" variable in cpuidle_idle_call() whose value is used to determine whether or not that is the case. However, if the entire code run between the conditional setting "reflect" and the call to cpuidle_reflect() is moved to a separate function, it will be possible to call that new function in both branches of the conditional, in which case cpuidle_reflect() will only need to be called from one of them too and the "reflect" variable won't be necessary any more. This eliminates one check made by cpuidle_idle_call() on the majority of its invocations, so change the code as described. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
由 Rafael J. Wysocki 提交于
Move the code under the "use_default" label in cpuidle_idle_call() into a separate (new) function. This just allows the subsequent changes to be more stratightforward. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
- 29 4月, 2015 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Commit 335f4919 (sched/idle: Use explicit broadcast oneshot control function) replaced clockevents_notify() invocations in cpuidle_idle_call() with direct calls to tick_broadcast_enter() and tick_broadcast_exit(), but it overlooked the fact that interrupts were already enabled before calling the latter which led to functional breakage on systems using idle states with the CPUIDLE_FLAG_TIMER_STOP flag set. Fix that by moving the invocations of tick_broadcast_enter() and tick_broadcast_exit() down into cpuidle_enter_state() where interrupts are still disabled when tick_broadcast_exit() is called. Also ensure that interrupts will be disabled before running tick_broadcast_exit() even if they have been enabled by the idle state's ->enter callback. Trigger a WARN_ON_ONCE() in that case, as we generally don't want that to happen for states with CPUIDLE_FLAG_TIMER_STOP set. Fixes: 335f4919 (sched/idle: Use explicit broadcast oneshot control function) Reported-and-tested-by: NLinus Walleij <linus.walleij@linaro.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Reported-and-tested-by: NSudeep Holla <sudeep.holla@arm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 03 4月, 2015 1 次提交
-
-
由 Thomas Gleixner 提交于
Replace the clockevents_notify() call with an explicit function call. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/6422336.RMm7oUHcXh@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 3月, 2015 2 次提交
-
-
由 Paul E. McKenney 提交于
This commit informs RCU of an outgoing CPU just before that CPU invokes arch_cpu_idle_dead() during its last pass through the idle loop (via a new CPU_DYING_IDLE notifier value). This change means that RCU need not deal with outgoing CPUs passing through the scheduler after informing RCU that they are no longer online. Note that removing the CPU from the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time, and orphaning callbacks is still done at CPU_DEAD time, the reason being that at CPU_DEAD time we have another CPU that can adopt them. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
This commit uses a per-CPU variable to make the CPU-offline code path through the idle loop more precise, so that the outgoing CPU is guaranteed to make it into the idle loop before it is powered off. This commit is in preparation for putting the RCU offline-handling code on this code path, which will eliminate the magic one-jiffy wait that RCU uses as the maximum time for an outgoing CPU to get all the way through the scheduler. The magic one-jiffy wait for incoming CPUs remains a separate issue. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 06 3月, 2015 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Commit 38106313 (PM / sleep: Re-implement suspend-to-idle handling) overlooked the fact that entering some sufficiently deep idle states by CPUs may cause their local timers to stop and in those cases it is necessary to switch over to a broadcast timer prior to entering the idle state. If the cpuidle driver in use does not provide the new ->enter_freeze callback for any of the idle states, that problem affects suspend-to-idle too, but it is not taken into account after the changes made by commit 38106313. Fix that by changing the definition of cpuidle_enter_freeze() and re-arranging of the code in cpuidle_idle_call(), so the former does not call cpuidle_enter() any more and the fallback case is handled by cpuidle_idle_call() directly. Fixes: 38106313 (PM / sleep: Re-implement suspend-to-idle handling) Reported-and-tested-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
- 03 3月, 2015 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Move the fallback code path in cpuidle_idle_call() to the end of the function to avoid jumping to a label in an if () branch. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-