1. 09 March 2015, 1 commit
    • context_tracking: Rename context symbols to prepare for transition state · c467ea76
      Frederic Weisbecker authored
      Current context tracking symbols are designed to express the state we are
      currently running in. As such they are prefixed with "IN_": IN_USER, IN_KERNEL.
      
      Now we are going to use these symbols to also express state transitions
      such as context_tracking_enter(IN_USER) or context_tracking_exit(IN_USER).
      But while the "IN_" prefix works well to express entering a context, it's
      confusing to depict a context exit: context_tracking_exit(IN_USER)
      could mean two things:
      	1) We are exiting the current context to enter user context.
      	2) We are exiting the user context.
      We want 2), but a reviewer may be confused and read it as 1).
      
      So let's disambiguate these symbols and rename them to CONTEXT_USER and
      CONTEXT_KERNEL.
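      For illustration, a minimal sketch of what the renamed symbols look like
      after this change (illustrative only, not a verbatim copy of the patch):

        /* The symbols now name the context itself, so they read naturally
         * for both the entry and the exit primitive. */
        enum ctx_state {
                CONTEXT_KERNEL = 0,
                CONTEXT_USER,
        };

        /* context_tracking_enter(CONTEXT_USER) -- enter the user context */
        /* context_tracking_exit(CONTEXT_USER)  -- exit the user context  */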
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      c467ea76
  2. 18 February 2015, 9 commits
  3. 14 February 2015, 2 commits
    • sched: use %*pb[l] to print bitmaps including cpumasks and nodemasks · 333470ee
      Tejun Heo authored
      printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
      and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
      respectively which can be used to generate the two printf arguments
      necessary to format the specified cpu/nodemask.
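      As a quick illustration (hypothetical caller, not part of this patch),
      printing the online cpumask with the new specifier looks like this:

        #include <linux/cpumask.h>
        #include <linux/printk.h>

        static void print_online_cpus(void)
        {
                /* cpumask_pr_args() expands to the (width, bits pointer) pair
                 * that the '%*pbl' specifier expects. */
                printk(KERN_INFO "online CPUs: %*pbl\n",
                       cpumask_pr_args(cpu_online_mask));
        }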
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      333470ee
    • PM / sleep: Re-implement suspend-to-idle handling · 38106313
      Rafael J. Wysocki authored
      In preparation for adding support for quiescing timers in the final
      stage of suspend-to-idle transitions, rework the freeze_enter()
      function making the system wait on a wakeup event, the freeze_wake()
      function terminating the suspend-to-idle loop and the mechanism by
      which deep idle states are entered during suspend-to-idle.
      
      First of all, introduce a simple state machine for suspend-to-idle
      and make the code in question use it.
      
      Second, prevent freeze_enter() from losing wakeup events due to race
      conditions and ensure that the number of online CPUs won't change
      while it is being executed.  In addition to that, make it force
      all of the CPUs to re-enter the idle loop in case they are in idle
      states already (so they can enter deeper idle states if possible).
      
      Next, drop cpuidle_use_deepest_state() and replace use_deepest_state
      checks in cpuidle_select() and cpuidle_reflect() with a single
      suspend-to-idle state check in cpuidle_idle_call().
      
      Finally, introduce cpuidle_enter_freeze() that will simply find the
      deepest idle state available to the given CPU and enter it using
      cpuidle_enter().
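      A rough, hypothetical sketch of the idea behind cpuidle_enter_freeze()
      described above -- scan for the deepest usable state and enter it
      directly (simplified; field usage here is an assumption, not a copy of
      the patch):

        #include <linux/cpuidle.h>
        #include <linux/errno.h>

        /* Hypothetical simplification: walk the state table from the deepest
         * state down and enter the first one that is usable on this CPU. */
        static int enter_deepest_state_sketch(struct cpuidle_driver *drv,
                                              struct cpuidle_device *dev)
        {
                int i;

                for (i = drv->state_count - 1; i >= 0; i--) {
                        if (drv->states[i].disabled || dev->states_usage[i].disable)
                                continue;
                        return cpuidle_enter(drv, dev, i);
                }
                return -ENODEV;
        }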
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      38106313
  4. 13 February 2015, 1 commit
    • kernel/sched/clock.c: add another clock for use with the soft lockup watchdog · 545a2bf7
      Cyril Bur authored
      When the hypervisor pauses a virtualised kernel, the kernel will observe a
      jump in the timebase, which can cause spurious messages from the softlockup
      detector.
      
      Whilst these messages are harmless, they are accompanied by a stack
      trace which causes undue concern; more problematically, the stack trace
      in the guest has nothing to do with the observed problem and can only be
      misleading.
      
      Furthermore, on POWER8 this is completely avoidable with the introduction
      of the Virtual Time Base (VTB) register.
      
      This patch (of 2):
      
      This permits the use of arch-specific clocks so that virtualised kernels
      can use their notion of 'running' time rather than the elapsed wall time,
      which will include host execution time.
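      A minimal sketch of the kind of hook this describes, assuming a weak
      default that architectures can override (the exact form in the patch may
      differ):

        #include <linux/sched.h>

        /* Weak default: on bare metal, "running" time is just local_clock().
         * A virtualised architecture can override this with a source (e.g.
         * POWER8's VTB) that excludes time during which the guest was paused
         * by the host. */
        u64 __weak running_clock(void)
        {
                return local_clock();
        }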
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Andrew Jones <drjones@redhat.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: chai wen <chaiw.fnst@cn.fujitsu.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Ben Zhang <benzh@chromium.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      545a2bf7
  5. 04 February 2015, 9 commits
  6. 02 February 2015, 1 commit
    • sched: don't cause task state changes in nested sleep debugging · 00845eb9
      Linus Torvalds authored
      Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
      on nested sleep conditions, which we generally want to avoid because the
      inner sleeping operation can re-set the thread state to TASK_RUNNING,
      but that will then cause the outer sleep loop to not actually sleep when it
      calls schedule().
      
      However, that's actually valid traditional behavior, with the inner
      sleep being some fairly rare case (like taking a sleeping lock that
      normally doesn't actually need to sleep).
      
      And the debug code would actually change the state of the task to
      TASK_RUNNING internally, which makes that kind of traditional and
      working code not work at all, because now the nested sleep doesn't just
      sometimes cause the outer one to not block, but will cause it to happen
      every time.
      
      In particular, it will cause the cardbus kernel daemon (pccardd) to
      basically busy-loop doing scheduling, converting a laptop into a heater,
      as reported by Bruno Prémont.  But there may be other legacy uses of
      that nested sleep model in other drivers that are also likely to never
      get converted to the new model.
      
      This fixes both cases:
      
       - don't set TASK_RUNNING when the nested condition happens (note: even
         if WARN_ONCE() only _warns_ once, the return value isn't whether the
         warning happened, but whether the condition for the warning was true.
         So despite the warning only happening once, the "if (WARN_ON(..))"
         would trigger for every nested sleep; see the short sketch below.)
      
       - in the cases where we knowingly disable the warning by using
         "sched_annotate_sleep()", don't change the task state (that is used
         for all core scheduling decisions); instead use '->task_state_change',
         which is used for the debugging decision itself.
      
      (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
      avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
      suggested change to use 'task_state_change' as part of the test)
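      As a short illustration of the WARN_ONCE() point above (hypothetical
      code; the condition and message are simplified, not taken from the
      patch):

        #include <linux/bug.h>
        #include <linux/sched.h>

        static void nested_sleep_check_sketch(long state)
        {
                /* WARN_ONCE() prints its splat only once, but it evaluates to
                 * the condition every time -- so this branch is taken on every
                 * nested sleep, not just the one that produced the warning. */
                if (WARN_ONCE(state != TASK_RUNNING,
                              "do not call blocking ops when !TASK_RUNNING"))
                        return;
        }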
      Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org>
      Tested-by: Rafael J Wysocki <rjw@rjwysocki.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ilya Dryomov <ilya.dryomov@inktank.com>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      00845eb9
  7. 31 January 2015, 4 commits
  8. 29 January 2015, 1 commit
  9. 28 January 2015, 2 commits
    • sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask · bb2bc55a
      Mike Galbraith authored
      While creating an exclusive cpuset, we passed cpuset_cpumask_can_shrink()
      an empty cpumask (cur), and dl_bw_of(cpumask_any(cur)) blew up on it
      (a sketch of the fix follows the trace below):
      
       CPU: 0 PID: 6942 Comm: shield.sh Not tainted 3.19.0-master #19
       Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
       task: ffff880224552450 ti: ffff8800caab8000 task.ti: ffff8800caab8000
       RIP: 0010:[<ffffffff81073846>]  [<ffffffff81073846>] cpuset_cpumask_can_shrink+0x56/0xb0
       [...]
       Call Trace:
        [<ffffffff810cb82a>] validate_change+0x18a/0x200
        [<ffffffff810cc877>] cpuset_write_resmask+0x3b7/0x720
        [<ffffffff810c4d58>] cgroup_file_write+0x38/0x100
        [<ffffffff811d953a>] kernfs_fop_write+0x12a/0x180
        [<ffffffff8116e1a3>] vfs_write+0xb3/0x1d0
        [<ffffffff8116ed06>] SyS_write+0x46/0xb0
        [<ffffffff8159ced6>] system_call_fastpath+0x16/0x1b
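
      The shape of the fix, as a hedged sketch (the exact guard used in the
      patch may differ):

        #include <linux/cpumask.h>

        int cpuset_cpumask_can_shrink(const struct cpumask *cur,
                                      const struct cpumask *trial)
        {
                int ret = 1;

                /* Assumed guard: an empty 'cur' has no deadline bandwidth to
                 * check, and cpumask_any() on it would be meaningless. */
                if (cpumask_empty(cur))
                        return ret;

                /* ... existing deadline-bandwidth comparison via dl_bw_of() ... */
                return ret;
        }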
      Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Acked-by: Zefan Li <lizefan@huawei.com>
      Fixes: f82f8042 ("sched/deadline: Ensure that updates to exclusive cpusets don't break AC")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1422417235.5716.5.camel@marge.simpson.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bb2bc55a
    • sched/fair: Avoid using uninitialized variable in preferred_group_nid() · 81907478
      Jan Beulich authored
      At least some gcc versions - validly afaict - warn about potentially
      using max_group uninitialized: There's no way the compiler can prove
      that the body of the conditional where it and max_faults get set/
      updated gets executed; in fact, without knowing all the details of
      other scheduler code, I can't prove this either.
      
      Generally the necessary change would appear to be to clear max_group
      prior to entering the inner loop, and break out of the outer loop when
      it ends up being all clear after the inner one. This, however, seems
      inefficient, and afaict the same effect can be achieved by exiting the
      outer loop when max_faults is still zero after the inner loop.
      
      [ mingo: changed the solution to zero initialization: uninitialized_var()
        needs to die, as it's an actively dangerous construct: if in the future
        a known-proven-good piece of code is changed to have a true, buggy
        uninitialized variable, the compiler warning is then suppressed...
      
        The better long term solution is to clean up the code flow, so that
        even simple minded compilers (and humans!) are able to read it without
        getting a headache.  ]
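      A hedged skeleton contrasting the two approaches (identifiers are
      simplified, and node_score() is a hypothetical stand-in for the real
      faults computation in preferred_group_nid() in kernel/sched/fair.c):

        #include <linux/nodemask.h>

        static void preferred_group_sketch(unsigned long (*node_score)(int))
        {
                nodemask_t max_group = NODE_MASK_NONE;  /* merged fix: zero init */
                unsigned long max_faults = 0;
                int nid;

                for_each_online_node(nid) {
                        unsigned long faults = node_score(nid);

                        if (faults > max_faults) {
                                max_faults = faults;
                                nodes_clear(max_group);
                                node_set(nid, max_group);
                        }
                }

                /* Jan's alternative: stop here when nothing was found, since
                 * max_group can only be stale while max_faults is still zero. */
                if (!max_faults)
                        return;

                /* ... otherwise use max_group for the next level ... */
        }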
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/54C2139202000078000588F7@mail.emea.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      81907478
  10. 14 January 2015, 9 commits
  11. 09 January 2015, 1 commit
    • sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group() · 7f1a169b
      Tetsuo Handa authored
      When alloc_fair_sched_group() in sched_create_group() fails,
      free_sched_group() is called, and free_fair_sched_group() is called by
      free_sched_group(). Since destroy_cfs_bandwidth() is then called by
      free_fair_sched_group() without init_cfs_bandwidth() ever having been
      called, an RCU stall occurs at hrtimer_cancel():
      
        INFO: rcu_sched self-detected stall on CPU { 1}  (t=60000 jiffies g=13074 c=13073 q=0)
        Task dump for CPU 1:
        (fprintd)       R  running task        0  6249      1 0x00000088
        ...
        Call Trace:
         <IRQ>  [<ffffffff81094988>] sched_show_task+0xa8/0x110
         [<ffffffff81097acd>] dump_cpu_task+0x3d/0x50
         [<ffffffff810c3a80>] rcu_dump_cpu_stacks+0x90/0xd0
         [<ffffffff810c7751>] rcu_check_callbacks+0x491/0x700
         [<ffffffff810cbf2b>] update_process_times+0x4b/0x80
         [<ffffffff810db046>] tick_sched_handle.isra.20+0x36/0x50
         [<ffffffff810db0a2>] tick_sched_timer+0x42/0x70
         [<ffffffff810ccb19>] __run_hrtimer+0x69/0x1a0
         [<ffffffff810db060>] ? tick_sched_handle.isra.20+0x50/0x50
         [<ffffffff810ccedf>] hrtimer_interrupt+0xef/0x230
         [<ffffffff810452cb>] local_apic_timer_interrupt+0x3b/0x70
         [<ffffffff8164a465>] smp_apic_timer_interrupt+0x45/0x60
         [<ffffffff816485bd>] apic_timer_interrupt+0x6d/0x80
         <EOI>  [<ffffffff810cc588>] ? lock_hrtimer_base.isra.23+0x18/0x50
         [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230
         [<ffffffff810cc9d2>] hrtimer_try_to_cancel+0x22/0xd0
         [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230
         [<ffffffff810ccaa2>] hrtimer_cancel+0x22/0x30
         [<ffffffff810a3cb5>] free_fair_sched_group+0x25/0xd0
         [<ffffffff8108df46>] free_sched_group+0x16/0x40
         [<ffffffff810971bb>] sched_create_group+0x4b/0x80
         [<ffffffff810aa383>] sched_autogroup_create_attach+0x43/0x1c0
         [<ffffffff8107dc9c>] sys_setsid+0x7c/0x110
         [<ffffffff81647729>] system_call_fastpath+0x12/0x17
      
      Check whether init_cfs_bandwidth() was called before calling
      destroy_cfs_bandwidth().
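      A hedged sketch of the guard this describes (struct cfs_bandwidth is the
      kernel-internal type from kernel/sched/sched.h; the field used here as
      the "was init_cfs_bandwidth() called?" marker is an assumption):

        static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
        {
                /* Assumption: a list head that init_cfs_bandwidth() would have
                 * initialised is still NULL if it never ran, so there are no
                 * timers to cancel. */
                if (!cfs_b->throttled_cfs_rq.next)
                        return;

                hrtimer_cancel(&cfs_b->period_timer);
                hrtimer_cancel(&cfs_b->slack_timer);
        }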
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      [ Move the check into destroy_cfs_bandwidth() to aid compilability. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/201412252210.GCC30204.SOMVFFOtQJFLOH@I-love.SAKURA.ne.jp
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7f1a169b