1. 05 Feb 2015, 1 commit
  2. 04 Feb 2015, 20 commits
  3. 02 Feb 2015, 1 commit
    • sched: don't cause task state changes in nested sleep debugging · 00845eb9
      Authored by Linus Torvalds
      Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
      on nested sleep conditions, which we generally want to avoid because the
      inner sleeping operation can re-set the thread state to TASK_RUNNING,
      but that will then cause the outer sleep loop to not actually sleep when
      it calls schedule().
      
      However, that's actually valid traditional behavior, with the inner
      sleep being some fairly rare case (like taking a sleeping lock that
      normally doesn't actually need to sleep).
      
      And the debug code would actually change the state of the task to
      TASK_RUNNING internally, which makes that kind of traditional and
      working code not work at all, because now the nested sleep doesn't just
      sometimes cause the outer one to not block, but will cause it to happen
      every time.
      
      In particular, it will cause the cardbus kernel daemon (pccardd) to
      basically busy-loop doing scheduling, converting a laptop into a heater,
      as reported by Bruno Prémont.  But there may be other legacy uses of
      that nested sleep model in other drivers that are also likely to never
      get converted to the new model.
      
      This fixes both cases:
      
       - don't set TASK_RUNNING when the nested condition happens (note: even
         if WARN_ONCE() only _warns_ once, the return value isn't whether the
         warning happened, but whether the condition for the warning was true.
         So despite the warning only happening once, the "if (WARN_ON(..))"
         would trigger for every nested sleep; see the sketch below).
      
       - in the cases where we knowingly disable the warning by using
         "sched_annotate_sleep()", don't change the task state (that is used
         for all core scheduling decisions); instead use '->task_state_change',
         which is used for the debugging decision itself.
      
      (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
      avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
      suggested change to use 'task_state_change' as part of the test)
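      
      A minimal, self-contained sketch of the WARN_ONCE() semantics described
      in the first point above (plain userspace C for illustration; warn_once()
      and nested_sleep_detected() are hypothetical stand-ins, not kernel code):
      
          #include <stdbool.h>
          #include <stdio.h>
          
          /* Stand-in for the kernel's WARN_ONCE(): the message is printed only
           * on the first hit, but the expression always evaluates to the truth
           * value of the condition. */
          static bool warn_once(bool cond, const char *msg)
          {
                  static bool warned;
          
                  if (cond && !warned) {
                          warned = true;
                          fprintf(stderr, "WARNING: %s\n", msg);
                  }
                  return cond;    /* truth of the condition, not "did we warn" */
          }
          
          /* Assume every iteration hits a nested sleep. */
          static bool nested_sleep_detected(void)
          {
                  return true;
          }
          
          int main(void)
          {
                  for (int i = 0; i < 3; i++) {
                          /* The body runs on every nested sleep, even though
                           * the warning itself fired only once. */
                          if (warn_once(nested_sleep_detected(), "do not call blocking ops"))
                                  printf("iteration %d: condition acted on again\n", i);
                  }
                  return 0;
          }
      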
      Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org>
      Tested-by: Rafael J Wysocki <rjw@rjwysocki.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ilya Dryomov <ilya.dryomov@inktank.com>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      00845eb9
  4. 31 Jan 2015, 4 commits
  5. 28 Jan 2015, 3 commits
    • sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask · bb2bc55a
      Authored by Mike Galbraith
      While creating an exclusive cpuset, we passed cpuset_cpumask_can_shrink()
      an empty cpumask (cur), and dl_bw_of(cpumask_any(cur)) made boom with it:
      
       CPU: 0 PID: 6942 Comm: shield.sh Not tainted 3.19.0-master #19
       Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
       task: ffff880224552450 ti: ffff8800caab8000 task.ti: ffff8800caab8000
       RIP: 0010:[<ffffffff81073846>]  [<ffffffff81073846>] cpuset_cpumask_can_shrink+0x56/0xb0
       [...]
       Call Trace:
        [<ffffffff810cb82a>] validate_change+0x18a/0x200
        [<ffffffff810cc877>] cpuset_write_resmask+0x3b7/0x720
        [<ffffffff810c4d58>] cgroup_file_write+0x38/0x100
        [<ffffffff811d953a>] kernfs_fop_write+0x12a/0x180
        [<ffffffff8116e1a3>] vfs_write+0xb3/0x1d0
        [<ffffffff8116ed06>] SyS_write+0x46/0xb0
        [<ffffffff8159ced6>] system_call_fastpath+0x16/0x1b
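      
      The fix amounts to bailing out early when the cpumask to shrink is empty,
      before dl_bw_of(cpumask_any(cur)) dereferences per-CPU data for a CPU
      that does not exist. A rough sketch of that kind of guard (simplified,
      not the literal upstream diff):
      
          int cpuset_cpumask_can_shrink(const struct cpumask *cur,
                                        const struct cpumask *trial)
          {
                  int ret = 1;
          
                  /* An empty 'cur' has nothing to shrink and must not reach
                   * dl_bw_of(cpumask_any(cur)), which assumes a valid CPU. */
                  if (cpumask_empty(cur))
                          return ret;
          
                  /* ... original deadline-bandwidth check against 'trial' ... */
                  return ret;
          }
      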
      Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Acked-by: Zefan Li <lizefan@huawei.com>
      Fixes: f82f8042 ("sched/deadline: Ensure that updates to exclusive cpusets don't break AC")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1422417235.5716.5.camel@marge.simpson.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bb2bc55a
    • perf: Tighten (and fix) the grouping condition · c3c87e77
      Authored by Peter Zijlstra
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage where, when we create an event
      for CPU Y and then migrate it to form a group on CPU X, the code gets
      confused about where the counter is programmed -- triggered in practice
      as well by me via the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means for the same task and/or the same cpu.
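      
      In code terms, the tightened rule boils down to rejecting a group whose
      leader is pinned to a different CPU or belongs to a different task
      context than the new event. A simplified sketch of that check (field
      names follow struct perf_event, but this is an illustration rather than
      the exact upstream diff):
      
          /* Reject groups that could never be co-scheduled. */
          if (group_leader) {
                  /* Events pinned to different CPUs cannot run together. */
                  if (group_leader->cpu != event->cpu)
                          goto err_context;
          
                  /* Likewise for events belonging to different tasks. */
                  if (group_leader->ctx->task != ctx->task)
                          goto err_context;
          }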
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c3c87e77
    • sched/fair: Avoid using uninitialized variable in preferred_group_nid() · 81907478
      Authored by Jan Beulich
      At least some gcc versions - validly afaict - warn about potentially
      using max_group uninitialized: There's no way the compiler can prove
      that the body of the conditional where it and max_faults get set/
      updated gets executed; in fact, without knowing all the details of
      other scheduler code, I can't prove this either.
      
      Generally the necessary change would appear to be to clear max_group
      prior to entering the inner loop, and break out of the outer loop when
      it ends up being all clear after the inner one. This, however, seems
      inefficient, and afaict the same effect can be achieved by exiting the
      outer loop when max_faults is still zero after the inner loop.
      
      [ mingo: changed the solution to zero initialization: uninitialized_var()
        needs to die, as it's an actively dangerous construct: if in the future
        a known-proven-good piece of code is changed to have a true, buggy
        uninitialized variable, the compiler warning is then suppressed...
      
        The better long term solution is to clean up the code flow, so that
        even simple minded compilers (and humans!) are able to read it without
        getting a headache.  ]
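      
      A sketch of the resulting pattern in preferred_group_nid() (simplified;
      the inner loop over the candidate nodes is omitted):
      
          for (dist = sched_max_numa_distance; dist > LOCAL_DISTANCE; dist--) {
                  unsigned long max_faults = 0;
                  nodemask_t max_group = NODE_MASK_NONE;  /* zero-initialized */
          
                  /* Inner loop (omitted) raises max_faults and records the
                   * corresponding node set in max_group. */
          
                  if (!max_faults)
                          break;  /* nothing found: max_group was never written */
          
                  nodes = max_group;
          }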
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/54C2139202000078000588F7@mail.emea.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      81907478
  6. 27 Jan 2015, 1 commit
  7. 24 Jan 2015, 4 commits
  8. 23 Jan 2015, 3 commits
    • hrtimer: Prevent stale expiry time in hrtimer_interrupt() · 9bc74919
      Authored by Thomas Gleixner
      hrtimer_interrupt() has the following subtle issue:
      
      hrtimer_interrupt()
        lock(cpu_base);
        expires_next = KTIME_MAX;
      
        expire_timers(CLOCK_MONOTONIC);
        expires = get_next_timer(CLOCK_MONOTONIC);
        if (expires < expires_next)
          expires_next = expires;
      
        expire_timers(CLOCK_REALTIME);
          unlock(cpu_base);
          wakeup()
          hrtimer_start(CLOCK_MONOTONIC, newtimer);
          lock(cpu_base);
        expires = get_next_timer(CLOCK_REALTIME);
        if (expires < expires_next)
          expires_next = expires;
      
      So because we already evaluated the next expiring timer of
      CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
      earlier than the overall next expiry time in hrtimer_interrupt().
      
      To solve this, remove the caching of the next expiry value from
      hrtimer_interrupt() and reevaluate all active clock bases for the next
      expiry value. To avoid further code duplication, create a shared
      evaluation function and use it for hrtimer_get_next_event(),
      hrtimer_force_reprogram() and hrtimer_interrupt().
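      
      The shared helper essentially walks all active clock bases and keeps the
      earliest expiry, so every caller computes the value the same way. A
      rough sketch of such an evaluation (simplified from the logic in
      kernel/time/hrtimer.c, offsets and edge cases trimmed):
      
          static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
          {
                  ktime_t expires_next = { .tv64 = KTIME_MAX };
                  int i;
          
                  for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
                          struct hrtimer_clock_base *base = &cpu_base->clock_base[i];
                          struct hrtimer *timer;
                          ktime_t expires;
          
                          if (!(cpu_base->active_bases & (1 << i)))
                                  continue;       /* no armed timers on this base */
          
                          timer = container_of(timerqueue_getnext(&base->active),
                                               struct hrtimer, node);
                          expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
                          if (expires.tv64 < expires_next.tv64)
                                  expires_next = expires;
                  }
                  return expires_next;
          }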
      
      There is another subtlety in this mechanism:
      
      While hrtimer_interrupt() is running, we want to avoid touching the
      hardware device because we will reprogram it anyway at the end of
      hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
      via the HRTIMER_RESTART mechanism, because we drop out when the
      callback on that CPU is running. But that fails if a new timer gets
      enqueued as in the example above.
      
      This has another implication: While hrtimer_interrupt() is running we
      refuse remote enqueueing of timers - see hrtimer_interrupt() and
      hrtimer_check_target().
      
      hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
      to KTIME_MAX, but that fails if a new timer gets queued.
      
      Prevent both the hardware access and the remote enqueue
      explicitly. We can loosen the restriction on the remote enqueue now
      due to reevaluation of the next expiry value, but that needs a
      separate patch.
      
      Folded in a fix from Vignesh Radhakrishnan.
      Reported-and-tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
      Based-on-patch-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: vigneshr@codeaurora.org
      Cc: john.stultz@linaro.org
      Cc: viresh.kumar@linaro.org
      Cc: fweisbec@gmail.com
      Cc: cl@linux.com
      Cc: stuart.w.hayes@gmail.com
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9bc74919
    • smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread() · 4bee9686
      Authored by Lai Jiangshan
      The following race exists in the smpboot percpu threads management:
      
      CPU0	      	   	     CPU1
      cpu_up(2)
        get_online_cpus();
        smpboot_create_threads(2);
      			     smpboot_register_percpu_thread();
      			     for_each_online_cpu();
      			       __smpboot_create_thread();
        __cpu_up(2);
      
      This results in a missing per-cpu thread for the newly onlined cpu2 and
      in a NULL pointer dereference on a subsequent offline of that cpu.
      
      Protect smpboot_register_percpu_thread() with get_online_cpus() to
      prevent that.
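      
      Concretely, the fix pins the set of online CPUs around the
      for_each_online_cpu() walk so that a concurrent cpu_up() cannot slip
      past it. A rough sketch of the registration path after the fix
      (simplified; error handling shortened):
      
          int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
          {
                  unsigned int cpu;
                  int ret = 0;
          
                  get_online_cpus();      /* block concurrent cpu_up()/cpu_down() */
                  mutex_lock(&smpboot_threads_lock);
                  for_each_online_cpu(cpu) {
                          ret = __smpboot_create_thread(plug_thread, cpu);
                          if (ret) {
                                  smpboot_destroy_threads(plug_thread);
                                  goto out;
                          }
                          smpboot_unpark_thread(plug_thread, cpu);
                  }
                  list_add(&plug_thread->list, &hotplug_threads);
          out:
                  mutex_unlock(&smpboot_threads_lock);
                  put_online_cpus();
                  return ret;
          }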
      
      [ tglx: Massaged changelog and removed the change in
              smpboot_unregister_percpu_thread() because that's an
              optimization and therefore not stable material. ]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1406777421-12830-1-git-send-email-laijs@cn.fujitsu.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      4bee9686
    • x86, mpx: Strictly enforce empty prctl() args · e9d1b4f3
      Authored by Dave Hansen
      Description from Michael Kerrisk.  He suggested an identical patch
      to one I had already coded up and tested.
      
      Commit fe3d197f ("x86, mpx: On-demand kernel allocation of bounds
      tables") added two new prctl() operations, PR_MPX_ENABLE_MANAGEMENT and
      PR_MPX_DISABLE_MANAGEMENT. However, no checks were included to ensure
      that unused arguments are zero, as is done in many existing prctl()s
      and as should be done for all new prctl()s. This patch adds the
      required checks.
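      
      The check itself is the usual prctl() pattern: refuse the call outright
      if any of the unused arguments is non-zero. A sketch of how that looks
      in the sys_prctl() switch (simplified):
      
          case PR_MPX_ENABLE_MANAGEMENT:
                  if (arg2 || arg3 || arg4 || arg5)
                          return -EINVAL;         /* unused args must be zero */
                  error = MPX_ENABLE_MANAGEMENT(me);
                  break;
          case PR_MPX_DISABLE_MANAGEMENT:
                  if (arg2 || arg3 || arg4 || arg5)
                          return -EINVAL;
                  error = MPX_DISABLE_MANAGEMENT(me);
                  break;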
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Suggested-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20150108223022.7F56FD13@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e9d1b4f3
  9. 22 Jan 2015, 2 commits
  10. 20 Jan 2015, 1 commit
    • module: fix race in kallsyms resolution during module load success. · c7496379
      Authored by Rusty Russell
      The kallsyms routines (module_symbol_name, lookup_module_* etc) disable
      preemption to walk the modules rather than taking the module_mutex:
      this is because they are used for symbol resolution during oopses.
      
      This works because there are synchronize_sched() and synchronize_rcu()
      in the unload and failure paths.  However, there's one case which doesn't
      have that: the normal case where module loading succeeds, and we free
      the init section.
      
      We don't want a synchronize_rcu() there, because it would slow down
      module loading: this bug was introduced in 2009 to speed module
      loading in the first place.
      
      Thus, we want to do the free in an RCU callback.  We do this in the
      simplest possible way by allocating a new rcu_head: if we put it in
      the module structure we'd have to worry about that getting freed.
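      
      The shape of the fix is a small helper struct that carries its own
      rcu_head plus the pointer to free, with the actual free done from an RCU
      callback. A sketch (simplified; allocation error handling omitted):
      
          struct mod_initfree {
                  struct rcu_head rcu;
                  void *module_init;
          };
          
          static void do_free_init(struct rcu_head *head)
          {
                  struct mod_initfree *m = container_of(head, struct mod_initfree, rcu);
          
                  module_memfree(m->module_init);
                  kfree(m);
          }
          
          /* ... in the "init succeeded" path of do_init_module(): */
          struct mod_initfree *freeinit;
          
          freeinit = kmalloc(sizeof(*freeinit), GFP_KERNEL);
          freeinit->module_init = mod->module_init;
          mod->module_init = NULL;
          /*
           * Defer the actual free until after a scheduler RCU grace period,
           * so preempt-disabled kallsyms walkers cannot see freed memory.
           */
          call_rcu_sched(&freeinit->rcu, do_free_init);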
      Reported-by: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      c7496379