1. 18 Feb, 2015 6 commits
  2. 05 Feb, 2015 1 commit
  3. 04 Feb, 2015 20 commits
  4. 02 Feb, 2015 1 commit
    • sched: don't cause task state changes in nested sleep debugging · 00845eb9
      Authored by Linus Torvalds
      Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
      on nested sleep conditions, which we generally want to avoid because the
      inner sleeping operation can re-set the thread state to TASK_RUNNING,
      but that will then cause the outer sleep loop to not actually sleep
      when it calls schedule().
      
      However, that's actually valid traditional behavior, with the inner
      sleep being some fairly rare case (like taking a sleeping lock that
      normally doesn't actually need to sleep).
      
      And the debug code would actually change the state of the task to
      TASK_RUNNING internally, which makes that kind of traditional and
      working code not work at all, because now the nested sleep doesn't just
      sometimes cause the outer one to not block, but will cause it to happen
      every time.
      
      In particular, it will cause the cardbus kernel daemon (pccardd) to
      basically busy-loop doing scheduling, converting a laptop into a heater,
      as reported by Bruno Prémont.  But there may be other legacy uses of
      that nested sleep model in other drivers that are also likely to never
      get converted to the new model.
      
      This fixes both cases:
      
       - don't set TASK_RUNNING when the nested condition happens (note: even
         if WARN_ONCE() only _warns_ once, the return value isn't whether the
         warning happened, but whether the condition for the warning was true.
         So despite the warning only happening once, the "if (WARN_ON(..))"
         would trigger for every nested sleep).
      
       - in the cases where we knowingly disable the warning by using
         "sched_annotate_sleep()", don't change the task state (that is used
         for all core scheduling decisions), instead use '->task_state_change'
         that is used for the debugging decision itself.
      
      (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
      avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
      suggested change to use 'task_state_change' as part of the test)
      Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org>
      Tested-by: Rafael J Wysocki <rjw@rjwysocki.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ilya Dryomov <ilya.dryomov@inktank.com>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      00845eb9
  5. 31 Jan, 2015 4 commits
  6. 28 Jan, 2015 3 commits
    • sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask · bb2bc55a
      Authored by Mike Galbraith
      While creating an exclusive cpuset, we passed cpuset_cpumask_can_shrink()
      an empty cpumask (cur), and dl_bw_of(cpumask_any(cur)) made boom with it:
      
       CPU: 0 PID: 6942 Comm: shield.sh Not tainted 3.19.0-master #19
       Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
       task: ffff880224552450 ti: ffff8800caab8000 task.ti: ffff8800caab8000
       RIP: 0010:[<ffffffff81073846>]  [<ffffffff81073846>] cpuset_cpumask_can_shrink+0x56/0xb0
       [...]
       Call Trace:
        [<ffffffff810cb82a>] validate_change+0x18a/0x200
        [<ffffffff810cc877>] cpuset_write_resmask+0x3b7/0x720
        [<ffffffff810c4d58>] cgroup_file_write+0x38/0x100
        [<ffffffff811d953a>] kernfs_fop_write+0x12a/0x180
        [<ffffffff8116e1a3>] vfs_write+0xb3/0x1d0
        [<ffffffff8116ed06>] SyS_write+0x46/0xb0
        [<ffffffff8159ced6>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Acked-by: Zefan Li <lizefan@huawei.com>
      Fixes: f82f8042 ("sched/deadline: Ensure that updates to exclusive cpusets don't break AC")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1422417235.5716.5.camel@marge.simpson.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bb2bc55a
    • perf: Tighten (and fix) the grouping condition · c3c87e77
      Authored by Peter Zijlstra
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage where, when we create an event
      for CPU Y and then migrate it to form a group on CPU X, the code gets
      confused about where the counter is programmed -- triggered in practice
      as well by me via the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means for the same task and/or the same cpu.
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c3c87e77
    • sched/fair: Avoid using uninitialized variable in preferred_group_nid() · 81907478
      Authored by Jan Beulich
      At least some gcc versions - validly afaict - warn about potentially
      using max_group uninitialized: There's no way the compiler can prove
      that the body of the conditional where it and max_faults get set/
      updated gets executed; in fact, without knowing all the details of
      other scheduler code, I can't prove this either.
      
      Generally the necessary change would appear to be to clear max_group
      prior to entering the inner loop, and break out of the outer loop when
      it ends up being all clear after the inner one. This, however, seems
      inefficient, and afaict the same effect can be achieved by exiting the
      outer loop when max_faults is still zero after the inner loop.
      
      [ mingo: changed the solution to zero initialization: uninitialized_var()
        needs to die, as it's an actively dangerous construct: if in the future
        a known-proven-good piece of code is changed to have a true, buggy
        uninitialized variable, the compiler warning is then suppressed...
      
        The better long term solution is to clean up the code flow, so that
        even simple minded compilers (and humans!) are able to read it without
        getting a headache.  ]
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/54C2139202000078000588F7@mail.emea.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      81907478
  7. 27 Jan, 2015 1 commit
  8. 24 Jan, 2015 4 commits