1. 20 August 2018 (1 commit)
  2. 09 April 2018 (1 commit)
    • sched: idle: Select idle state before stopping the tick · 554c8aa8
      Authored by Rafael J. Wysocki
      In order to address the issue with short idle duration predictions
      by the idle governor after the scheduler tick has been stopped,
      reorder the code in cpuidle_idle_call() so that the governor idle
      state selection runs before tick_nohz_idle_stop_tick() and use the
      "nohz" hint returned by cpuidle_select() to decide whether or not
      to stop the tick.
      
      This isn't straightforward, because menu_select() invokes
      tick_nohz_get_sleep_length() to get the time to the next timer
      event and the number returned by the latter comes from
      __tick_nohz_idle_stop_tick().  Fortunately, however, it is possible
      to compute that number without actually stopping the tick and with
      the help of the existing code.
      
      Namely, tick_nohz_get_sleep_length() can be made to call
      tick_nohz_next_event(), introduced earlier, to get the time to the
      next non-highres timer event.  If that happens, tick_nohz_next_event()
      need not be called by __tick_nohz_idle_stop_tick() again.
      
      If it turns out that the scheduler tick cannot be stopped going
      forward or the next timer event is too close for the tick to be
      stopped, tick_nohz_get_sleep_length() can simply return the time to
      the next event currently programmed into the corresponding clock
      event device.
      
      In addition to knowing the return value of tick_nohz_next_event(),
      however, tick_nohz_get_sleep_length() needs to know the time to the
      next highres timer event, but with the scheduler tick timer excluded,
      which can be computed with the help of hrtimer_get_next_event().
      
      The minimum of that number and the tick_nohz_next_event() return
      value is the total time to the next timer event with the assumption
      that the tick will be stopped.  It can be returned to the idle
      governor which can use it for predicting idle duration (under the
      assumption that the tick will be stopped) and deciding whether or
      not it makes sense to stop the tick before putting the CPU into the
      selected idle state.
      
      With the above, the sleep_length field in struct tick_sched is not
      necessary any more, so drop it.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227
      Reported-by: Doug Smythies <dsmythies@telus.net>
      Reported-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      554c8aa8
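
      A minimal sketch of the reordered selection path described in the commit above (simplified, not the literal kernel/sched/idle.c code; cpuidle_enter() stands in for the idle loop's local helper, and the retain-tick helper name follows this series):

          #include <linux/cpuidle.h>
          #include <linux/tick.h>

          /* Sketch: the governor picks a state first, and its "nohz" hint then
           * decides whether the tick is stopped at all. */
          static void idle_select_then_stop_tick(struct cpuidle_driver *drv,
                                                 struct cpuidle_device *dev)
          {
                  bool stop_tick = true;
                  int next_state;

                  /* Ask the governor for a state *before* touching the tick. */
                  next_state = cpuidle_select(drv, dev, &stop_tick);

                  if (stop_tick)
                          tick_nohz_idle_stop_tick();
                  else
                          tick_nohz_idle_retain_tick();   /* assumed helper from this series */

                  cpuidle_enter(drv, dev, next_state);
          }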
  3. 06 April 2018 (4 commits)
    • cpuidle: Return nohz hint from cpuidle_select() · 45f1ff59
      Authored by Rafael J. Wysocki
      Add a new pointer argument to cpuidle_select() and to the ->select
      cpuidle governor callback, through which the governor can return a
      boolean value indicating whether or not the tick should be stopped
      before entering the selected state.
      
      Make the ladder governor ignore that pointer (to preserve its
      current behavior) and make the menu governor return 'false' through
      it if:
       (1) the idle exit latency is constrained at 0, or
       (2) the selected state is a polling one, or
       (3) the expected idle period duration is within the tick period
           range.
      
      In addition to that, the correction factor computations in the menu
      governor need to take the possibility that the tick may not be
      stopped into account to avoid artificially small correction factor
      values.  To that end, add a mechanism to record tick wakeups, as
      suggested by Peter Zijlstra, and use it to modify the menu_update()
      behavior when tick wakeup occurs.  Namely, if the CPU is woken up by
      the tick and the return value of tick_nohz_get_sleep_length() is not
      within the tick boundary, the predicted idle duration is likely too
      short, so make menu_update() try to compensate for that by updating
      the governor statistics as though the CPU was idle for a long time.
      
      Since the value returned through the new argument pointer of
      cpuidle_select() is not used by its caller yet, this change by
      itself is not expected to alter the functionality of the code.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      45f1ff59
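
      A hedged sketch of how the menu governor can fill in the new hint for the three cases listed above (the parameters here are illustrative; the real menu_select() derives the latency constraint and the predicted duration from its own statistics):

          #include <linux/cpuidle.h>
          #include <linux/jiffies.h>

          /* Illustrative only: compute the "nohz" hint for a chosen state index. */
          static void menu_fill_nohz_hint(struct cpuidle_driver *drv, int idx,
                                          unsigned int latency_req,
                                          u64 predicted_ns, bool *stop_tick)
          {
                  if (latency_req == 0 ||                                 /* (1) exit latency capped at 0 */
                      (drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||  /* (2) polling state */
                      predicted_ns < TICK_NSEC)                           /* (3) idle shorter than a tick */
                          *stop_tick = false;
                  else
                          *stop_tick = true;
          }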
    • sched: idle: Do not stop the tick before cpuidle_idle_call() · ed98c349
      Authored by Rafael J. Wysocki
      Make cpuidle_idle_call() decide whether or not to stop the tick.
      
      First, the cpuidle_enter_s2idle() path deals with the tick (and with
      the entire timekeeping for that matter) by itself and it doesn't need
      the tick to be stopped beforehand.
      
      Second, to address the issue with short idle duration predictions
      by the idle governor after the tick has been stopped, it will be
      necessary to change the ordering of cpuidle_select() with respect
      to tick_nohz_idle_stop_tick().  To prepare for that, put a
      tick_nohz_idle_stop_tick() call in the same branch in which
      cpuidle_select() is called.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      ed98c349
    • sched: idle: Do not stop the tick upfront in the idle loop · 2aaf709a
      Authored by Rafael J. Wysocki
      Push the decision whether or not to stop the tick somewhat deeper
      into the idle loop.
      
      Stopping the tick upfront leads to unpleasant outcomes in case the
      idle governor doesn't agree with the nohz code on the duration of the
      upcoming idle period.  Specifically, if the tick has been stopped and
      the idle governor predicts short idle, the situation is bad regardless
      of whether or not the prediction is accurate.  If it is accurate, the
      tick has been stopped unnecessarily which means excessive overhead.
      If it is not accurate, the CPU is likely to spend too much time in
      the (shallow, because short idle has been predicted) idle state
      selected by the governor [1].
      
      As the first step towards addressing this problem, change the code
      to make the tick stopping decision inside of the loop in do_idle().
      In particular, do not stop the tick in the cpu_idle_poll() code path.
      Also don't do that in tick_nohz_irq_exit() which doesn't really have
      enough information on whether or not to stop the tick.
      
      Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1]
      Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf
      Suggested-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      2aaf709a
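
      At this point in the series the loop in do_idle() is roughly shaped as in the sketch below (abridged; the rest of the loop body and the idle.c-local symbols are taken as given):

          #include <linux/sched.h>
          #include <linux/tick.h>

          static void do_idle_loop_sketch(void)
          {
                  while (!need_resched()) {
                          rmb();

                          if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
                                  /* Polling path: the tick is no longer stopped here. */
                                  cpu_idle_poll();
                          } else {
                                  /* The tick decision now sits next to cpuidle_idle_call(). */
                                  tick_nohz_idle_stop_tick();
                                  cpuidle_idle_call();
                          }
                  }
          }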
    • time: tick-sched: Reorganize idle tick management code · 0e776768
      Authored by Rafael J. Wysocki
      Prepare the scheduler tick code for reworking the idle loop to
      avoid stopping the tick in some cases.
      
      The idea is to split the nohz idle entry call to decouple the idle
      time stats accounting and preparatory work from the actual tick stop
      code, in order to later be able to delay the tick stop once we reach
      more power-knowledgeable callers.
      
      Move the tick_nohz_start_idle() invocation out of
      __tick_nohz_idle_enter(), rename the latter to
      __tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick()
      as a wrapper around it for calling it from the outside.
      
      Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead
      of calling the entire __tick_nohz_idle_enter(), add another wrapper
      disabling and enabling interrupts around tick_nohz_idle_stop_tick()
      and make the current callers of tick_nohz_idle_enter() also call
      that wrapper, so that they retain their current functionality.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      0e776768
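
      An abridged sketch of the resulting split in kernel/time/tick-sched.c (bookkeeping such as the ts->inidle handling is left out):

          /* Entry now does only the idle-time accounting/preparatory part ... */
          void tick_nohz_idle_enter(void)
          {
                  local_irq_disable();
                  tick_nohz_start_idle(this_cpu_ptr(&tick_cpu_sched));
                  local_irq_enable();
          }

          /* ... while the actual tick stop is a separate, externally callable step. */
          void tick_nohz_idle_stop_tick(void)
          {
                  __tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched));
          }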
  4. 04 March 2018 (2 commits)
    • sched/idle: Merge kernel/sched/idle.c and kernel/sched/idle_task.c · a92057e1
      Authored by Ingo Molnar
      Merge these two small .c modules as they implement two aspects
      of idle task handling.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a92057e1
    • sched/headers: Simplify and clean up header usage in the scheduler · 325ea10c
      Authored by Ingo Molnar
      Do the following cleanups and simplifications:
      
       - sched/sched.h already includes <asm/paravirt.h>, so no need to
         include it in sched/core.c again.
      
       - order the <linux/sched/*.h> headers alphabetically
      
       - add all <linux/sched/*.h> headers to kernel/sched/sched.h
      
       - remove all unnecessary includes from the .c files that
         are already included in kernel/sched/sched.h.
      
      Finally, make all scheduler .c files use a single common header:
      
        #include "sched.h"
      
      ... which now contains a union of the relied upon headers.
      
      This makes the various .c files easier to read and easier to handle.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      325ea10c
  5. 03 March 2018 (1 commit)
    • sched: Clean up and harmonize the coding style of the scheduler code base · 97fb7a0a
      Authored by Ingo Molnar
      A good number of small style inconsistencies have accumulated
      in the scheduler core, so do a pass over them to harmonize
      all these details:
      
       - fix spelling in comments,
      
       - use curly braces for multi-line statements,
      
       - remove unnecessary parentheses from integer literals,
      
       - capitalize consistently,
      
       - remove stray newlines,
      
       - add comments where necessary,
      
       - remove invalid/unnecessary comments,
      
       - align structure definitions and other data types vertically,
      
       - add missing newlines for increased readability,
      
       - fix vertical tabulation where it's misaligned,
      
       - harmonize preprocessor conditional block labeling
         and vertical alignment,
      
       - remove line-breaks where they uglify the code,
      
       - add newline after local variable definitions,
      
      No change in functionality:
      
        md5:
           1191fa0a890cfa8132156d2959d7e9e2  built-in.o.before.asm
           1191fa0a890cfa8132156d2959d7e9e2  built-in.o.after.asm
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      97fb7a0a
  6. 26 October 2017 (1 commit)
    • sched/idle: Micro-optimize the idle loop · 54b933c6
      Authored by Cheng Jian
      Move the loop-invariant calculation of 'cpu' in do_idle() out of the loop body,
      because the current CPU is always constant.
      
      This improves the generated code both on x86-64 and ARM64:
      
      x86-64:
      
      Before patch (execution in loop):
      	864:       0f ae e8                lfence
      	867:       65 8b 05 c2 38 f1 7e    mov %gs:0x7ef138c2(%rip),%eax
      	86e:       89 c0                   mov %eax,%eax
      	870:       48 0f a3 05 68 19 08    bt  %rax,0x1081968(%rip)
      	877:	   01
      
      After patch (execution in loop):
      	872:       0f ae e8                lfence
      	875:       4c 0f a3 25 63 19 08    bt  %r12,0x1081963(%rip)
      	87c:       01
      
      ARM64:
      
      Before patch (execution in loop):
      	c58:       d5033d9f        dsb     ld
      	c5c:       d538d080        mrs     x0, tpidr_el1
      	c60:       b8606a61        ldr     w1, [x19,x0]
      	c64:       1100fc20        add     w0, w1, #0x3f
      	c68:       7100003f        cmp     w1, #0x0
      	c6c:       1a81b000        csel    w0, w0, w1, lt
      	c70:       13067c00        asr     w0, w0, #6
      	c74:       93407c00        sxtw    x0, w0
      	c78:       f8607a80        ldr     x0, [x20,x0,lsl #3]
      	c7c:       9ac12401        lsr     x1, x0, x1
      	c80:       36000581        tbz     w1, #0, d30 <do_idle+0x128>
      
      After patch (execution in loop):
      	c84:       d5033d9f        dsb     ld
      	c88:       f9400260        ldr     x0, [x19]
      	c8c:       ea14001f        tst     x0, x20
      	c90:       54000580        b.eq    d40 <do_idle+0x138>
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      [ Rewrote the title and the changelog. ]
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: huawei.libin@huawei.com
      Cc: xiexiuqi@huawei.com
      Link: http://lkml.kernel.org/r/1508930907-107755-1-git-send-email-cj.chengjian@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      54b933c6
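
      In essence (abridged sketch; only the affected lines of do_idle() are shown):

          static void do_idle(void)
          {
                  int cpu = smp_processor_id();   /* hoisted: the idle task never migrates */

                  while (!need_resched()) {
                          rmb();

                          if (cpu_is_offline(cpu)) {      /* was: cpu_is_offline(smp_processor_id()) */
                                  cpuhp_report_idle_dead();
                                  arch_cpu_idle_dead();
                          }
                          /* remainder of the loop body unchanged */
                  }
          }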
  7. 10 October 2017 (1 commit)
  8. 11 August 2017 (2 commits)
  9. 08 June 2017 (1 commit)
  10. 15 May 2017 (1 commit)
    • sched/core: Call __schedule() from do_idle() without enabling preemption · 8663effb
      Authored by Steven Rostedt (VMware)
      I finally got around to creating trampolines for dynamically allocated
      ftrace_ops by using synchronize_rcu_tasks(). For users of the ftrace
      function hook callbacks, like perf, that allocate the ftrace_ops
      descriptor via kmalloc() and friends, ftrace was not able to optimize
      the functions being traced to use a trampoline because they would also
      need to be allocated dynamically. The problem is that they cannot be
      freed when CONFIG_PREEMPT is set, as there's no way to tell if a task
      was preempted on the trampoline. That was before Paul McKenney
      implemented synchronize_rcu_tasks() that would make sure all tasks
      (except idle) have scheduled out or have entered user space.
      
      While testing this, I triggered this bug:
      
       BUG: unable to handle kernel paging request at ffffffffa0230077
       ...
       RIP: 0010:0xffffffffa0230077
       ...
       Call Trace:
        schedule+0x5/0xe0
        schedule_preempt_disabled+0x18/0x30
        do_idle+0x172/0x220
      
      What happened was that the idle task was preempted on the trampoline.
      As synchronize_rcu_tasks() ignores the idle thread, there's nothing
      that lets ftrace know that the idle task was preempted on a trampoline.
      
      The idle task shouldn't ever need to enable preemption. The idle task
      is simply a loop that calls schedule() or places the CPU into idle mode.
      In fact, having preemption enabled is inefficient, because an interrupt
      can arrive when idle is just about to call schedule() anyway, which
      causes schedule() to be called twice: once when the interrupt returns
      to normal context, and again in the normal path of the idle loop, which
      is pointless because it has already scheduled.
      
      The only reason schedule_preempt_disabled() enables preemption is to be
      able to call sched_submit_work(), which requires preemption enabled. As
      that is a nop when the task is in the RUNNING state, and idle is always
      in the RUNNING state, there's no reason for idle to enable preemption.
      But that also means idle cannot use schedule_preempt_disabled(), as
      other callers of that function rely on sched_submit_work() being called.
      
      Add a new function, local to kernel/sched/, that allows idle to call
      the scheduler without enabling preemption. This fixes the
      synchronize_rcu_tasks() issue and also removes the pointless spurious
      schedule calls caused by interrupts arriving in the brief window where
      preemption is enabled just before schedule() is called.
      
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170414084809.3dacde2a@gandalf.local.home
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8663effb
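
      The new kernel/sched-local helper is close to this sketch (simplified comments, not the verbatim patch):

          /* Idle-only variant of schedule(): never enables preemption and skips
           * sched_submit_work(), which is a nop for the always-running idle task. */
          void __sched schedule_idle(void)
          {
                  do {
                          __schedule(false);      /* not a preemption-triggered reschedule */
                  } while (need_resched());
          }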
  11. 08 March 2017 (1 commit)
    • livepatch: change to a per-task consistency model · d83a7cb3
      Authored by Josh Poimboeuf
      Change livepatch to use a basic per-task consistency model.  This is the
      foundation which will eventually enable us to patch those ~10% of
      security patches which change function or data semantics.  This is the
      biggest remaining piece needed to make livepatch more generally useful.
      
      This code stems from the design proposal made by Vojtech [1] in November
      2014.  It's a hybrid of kGraft and kpatch: it uses kGraft's per-task
      consistency and syscall barrier switching combined with kpatch's stack
      trace switching.  There are also a number of fallback options which make
      it quite flexible.
      
      Patches are applied on a per-task basis, when the task is deemed safe to
      switch over.  When a patch is enabled, livepatch enters into a
      transition state where tasks are converging to the patched state.
      Usually this transition state can complete in a few seconds.  The same
      sequence occurs when a patch is disabled, except the tasks converge from
      the patched state to the unpatched state.
      
      An interrupt handler inherits the patched state of the task it
      interrupts.  The same is true for forked tasks: the child inherits the
      patched state of the parent.
      
      Livepatch uses several complementary approaches to determine when it's
      safe to patch tasks:
      
      1. The first and most effective approach is stack checking of sleeping
         tasks.  If no affected functions are on the stack of a given task,
         the task is patched.  In most cases this will patch most or all of
         the tasks on the first try.  Otherwise it'll keep trying
         periodically.  This option is only available if the architecture has
         reliable stacks (HAVE_RELIABLE_STACKTRACE).
      
      2. The second approach, if needed, is kernel exit switching.  A
         task is switched when it returns to user space from a system call, a
         user space IRQ, or a signal.  It's useful in the following cases:
      
         a) Patching I/O-bound user tasks which are sleeping on an affected
            function.  In this case you have to send SIGSTOP and SIGCONT to
            force it to exit the kernel and be patched.
         b) Patching CPU-bound user tasks.  If the task is highly CPU-bound
            then it will get patched the next time it gets interrupted by an
            IRQ.
         c) In the future it could be useful for applying patches for
            architectures which don't yet have HAVE_RELIABLE_STACKTRACE.  In
            this case you would have to signal most of the tasks on the
            system.  However this isn't supported yet because there's
            currently no way to patch kthreads without
            HAVE_RELIABLE_STACKTRACE.
      
      3. For idle "swapper" tasks, since they don't ever exit the kernel, they
         instead have a klp_update_patch_state() call in the idle loop which
         allows them to be patched before the CPU enters the idle state.
      
         (Note there's not yet such an approach for kthreads.)
      
      All the above approaches may be skipped by setting the 'immediate' flag
      in the 'klp_patch' struct, which will disable per-task consistency and
      patch all tasks immediately.  This can be useful if the patch doesn't
      change any function or data semantics.  Note that, even with this flag
      set, it's possible that some tasks may still be running with an old
      version of the function, until that function returns.
      
      There's also an 'immediate' flag in the 'klp_func' struct which allows
      you to specify that certain functions in the patch can be applied
      without per-task consistency.  This might be useful if you want to patch
      a common function like schedule(), and the function change doesn't need
      consistency but the rest of the patch does.
      
      For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user
      must set patch->immediate which causes all tasks to be patched
      immediately.  This option should be used with care, only when the patch
      doesn't change any function or data semantics.
      
      In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE
      may be allowed to use per-task consistency if we can come up with
      another way to patch kthreads.
      
      The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
      is in transition.  Only a single patch (the topmost patch on the stack)
      can be in transition at a given time.  A patch can remain in transition
      indefinitely, if any of the tasks are stuck in the initial patch state.
      
      A transition can be reversed and effectively canceled by writing the
      opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
      the transition is in progress.  Then all the tasks will attempt to
      converge back to the original patch state.
      
      [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Ingo Molnar <mingo@kernel.org>        # for the scheduler changes
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      d83a7cb3
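
      For item 3 above, the idle-loop side boils down to a hook of roughly this shape (sketch; the wrapper name is illustrative, the two klp_* calls are the interface this work introduces):

          #include <linux/livepatch.h>
          #include <linux/sched.h>

          /* Called from do_idle(): idle tasks never leave the kernel, so they
           * switch their own patch state here instead of at kernel exit. */
          static inline void idle_klp_transition(void)
          {
                  if (unlikely(klp_patch_pending(current)))
                          klp_update_patch_state(current);
          }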
  12. 02 March 2017 (1 commit)
  13. 29 November 2016 (2 commits)
  14. 08 October 2016 (1 commit)
  15. 03 June 2016 (2 commits)
    • sched/idle: Optimize the generic idle loop · df55f462
      Authored by Gaurav Jindal
      Currently, smp_processor_id() is used to fetch the current CPU in
      cpu_idle_loop(). Every time the idle thread runs, it fetches the
      current CPU using smp_processor_id().
      
      Since the idle thread is per CPU, the current CPU is constant, so we
      can lift the load out of the loop, saving execution cycles/time in the
      loop.
      
      x86-64:
      
      Before patch (execution in loop):
      	148:    0f ae e8                lfence
      	14b:    65 8b 04 25 00 00 00 00 mov %gs:0x0,%eax
      	152:    00
      	153:    89 c0                   mov %eax,%eax
      	155:    49 0f a3 04 24          bt %rax,(%r12)
      
      After patch (execution in loop):
      	150:    0f ae e8                lfence
      	153:    4d 0f a3 34 24          bt %r14,(%r12)
      
      ARM64:
      
      Before patch (execution in loop):
      	168:    d5033d9f        dsb     ld
      	16c:    b9405661        ldr     w1,[x19,#84]
      	170:    1100fc20        add     w0,w1,#0x3f
      	174:    6b1f003f        cmp     w1,wzr
      	178:    1a81b000        csel    w0,w0,w1,lt
      	17c:    130c7000        asr     w0,w0,#6
      	180:    937d7c00        sbfiz   x0,x0,#3,#32
      	184:    f8606aa0        ldr     x0,[x21,x0]
      	188:    9ac12401        lsr     x1,x0,x1
      	18c:    36000e61        tbz     w1,#0,358
      
      After patch (execution in loop):
      	1a8:    d50339df        dsb     ld
      	1ac:    f8776ac0        ldr     x0,[x22,x23]
      	1b0:    ea18001f        tst     x0,x24
      	1b4:    54000ea0        b.eq    388
      
      Further observation on ARM64 over 4 seconds shows that cpu_idle_loop() is
      called 8672 times. Hoisting the load out of the loop saves instructions
      executed in the loop, and ultimately time as well.
      Signed-off-by: Gaurav Jindal <gaurav.jindal@spreadtrum.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Sanjeev Yadav <sanjeev.yadav@spreadtrum.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160512101330.GA488@gauravjindalubtnb.del.spreadtrum.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      df55f462
    • cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE · 9bd616e3
      Authored by Catalin Marinas
      The cpuidle_devices per-CPU variable is only defined when CPU_IDLE is
      enabled. Commit c8cc7d4d ("sched/idle: Reorganize the idle loop")
      removed the #ifdef CONFIG_CPU_IDLE around cpuidle_idle_call() with the
      compiler optimising away __this_cpu_read(cpuidle_devices). However, with
      CONFIG_UBSAN && !CONFIG_CPU_IDLE, this optimisation no longer happens
      and the kernel fails to link since cpuidle_devices is not defined.
      
      This patch introduces an accessor function for the current CPU cpuidle
      device (returning NULL when !CONFIG_CPU_IDLE) and uses it in
      cpuidle_idle_call().
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      9bd616e3
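
      The accessor is essentially the following (include/linux/cpuidle.h, abridged):

          #ifdef CONFIG_CPU_IDLE
          static inline struct cpuidle_device *cpuidle_get_device(void)
          {
                  return __this_cpu_read(cpuidle_devices);
          }
          #else
          static inline struct cpuidle_device *cpuidle_get_device(void)
          {
                  return NULL;    /* keeps cpuidle_idle_call() linkable without CPU_IDLE */
          }
          #endif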
  16. 02 March 2016 (3 commits)
    • rcu: Make CPU_DYING_IDLE an explicit call · 27d50c7e
      Authored by Thomas Gleixner
      Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets
      invoked at the proper place.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      27d50c7e
    • cpu/hotplug: Make wait for dead cpu completion based · e69aab13
      Authored by Thomas Gleixner
      Kill the busy spinning on the control side and just wait for the hotplugged
      CPU to report that it has reached the dead state.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.776157858@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e69aab13
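
      The underlying pattern, as a hedged sketch with illustrative names (the real change threads a struct completion through the hotplug state machine rather than using a single global):

          #include <linux/completion.h>

          static DECLARE_COMPLETION(cpu_dead_done);       /* illustrative name */

          /* Control side: previously busy-spun on a per-CPU state, now just sleeps. */
          static void wait_for_dying_cpu(void)
          {
                  wait_for_completion(&cpu_dead_done);
          }

          /* Dying CPU: signals that it has reached the dead state. */
          static void report_cpu_dead(void)
          {
                  complete(&cpu_dead_done);
          }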
    • cpu/hotplug: Let upcoming cpu bring itself fully up · 8df3e07e
      Authored by Thomas Gleixner
      Let the upcoming CPU kick the hotplug thread and complete the bringup
      itself. That way the control side can just wait for the completion, or,
      once the hotplug machinery is made asynchronous, not care at all.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.697655464@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      8df3e07e
  17. 22 January 2016 (1 commit)
  18. 19 January 2016 (1 commit)
  19. 15 January 2016 (1 commit)
    • vmstat: make vmstat_updater deferrable again and shut down on idle · 0eb77e98
      Authored by Christoph Lameter
      Currently the vmstat updater is not deferrable as a result of commit
      ba4877b9 ("vmstat: do not use deferrable delayed work for
      vmstat_update").  This in turn can cause multiple interruptions of the
      applications because the vmstat updater may run at
      
      Make vmstat_update deferrable again and provide a function that folds
      the differentials when the processor is going to idle mode thus
      addressing the issue of the above commit in a clean way.
      
      Note that the shepherd thread will continue scanning the differentials
      from another processor and will reenable the vmstat workers if it
      detects any changes.
      
      Fixes: ba4877b9 ("vmstat: do not use deferrable delayed work for vmstat_update")
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0eb77e98
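
      A hedged sketch of the fold-on-idle entry point this adds (names as in mm/vmstat.c to the best of my recollection; treat the exact body as illustrative):

          /* Called when a CPU goes idle: stop this CPU's vmstat worker and fold
           * its per-cpu differentials into the global counters right away. */
          void quiet_vmstat(void)
          {
                  if (system_state != SYSTEM_RUNNING)
                          return;

                  if (!delayed_work_pending(this_cpu_ptr(&vmstat_work)))
                          return;

                  cancel_delayed_work(this_cpu_ptr(&vmstat_work));
                  refresh_cpu_vm_stats(false);    /* fold counter diffs, skip pagesets */
          }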
  20. 12 October 2015 (1 commit)
  21. 21 July 2015 (1 commit)
  22. 15 May 2015 (2 commits)
  23. 05 May 2015 (2 commits)
  24. 29 April 2015 (1 commit)
  25. 03 April 2015 (1 commit)
  26. 13 March 2015 (2 commits)
    • rcu: Handle outgoing CPUs on exit from idle loop · 88428cc5
      Authored by Paul E. McKenney
      This commit informs RCU of an outgoing CPU just before that CPU invokes
      arch_cpu_idle_dead() during its last pass through the idle loop (via a
      new CPU_DYING_IDLE notifier value).  This change means that RCU need not
      deal with outgoing CPUs passing through the scheduler after informing
      RCU that they are no longer online.  Note that removing the CPU from
      the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
      and orphaning callbacks is still done at CPU_DEAD time, the reason being
      that at CPU_DEAD time we have another CPU that can adopt them.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      88428cc5
    • cpu: Make CPU-offline idle-loop transition point more precise · 528a25b0
      Authored by Paul E. McKenney
      This commit uses a per-CPU variable to make the CPU-offline code path
      through the idle loop more precise, so that the outgoing CPU is
      guaranteed to make it into the idle loop before it is powered off.
      This commit is in preparation for putting the RCU offline-handling
      code on this code path, which will eliminate the magic one-jiffy
      wait that RCU uses as the maximum time for an outgoing CPU to get
      all the way through the scheduler.
      
      The magic one-jiffy wait for incoming CPUs remains a separate issue.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      528a25b0
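
      A hedged sketch of the mechanism (the per-CPU flag matches my reading of the patch; the wrapper name is illustrative):

          DEFINE_PER_CPU(bool, cpu_dead_idle);

          /* Outgoing CPU, from the idle loop: mark the precise transition point
           * just before powering off, so the control side can wait on this flag
           * instead of guessing how long the CPU needs to leave the scheduler. */
          static void idle_report_dead(void)
          {
                  smp_mb();       /* order all prior activity before the flag */
                  this_cpu_write(cpu_dead_idle, true);
                  arch_cpu_idle_dead();
          }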
  27. 06 March 2015 (1 commit)
  28. 03 March 2015 (1 commit)