1. 05 Feb, 2015 (1 commit)
  2. 04 Feb, 2015 (20 commits)
  3. 02 Feb, 2015 (1 commit)
    • sched: don't cause task state changes in nested sleep debugging · 00845eb9
      Linus Torvalds committed
      Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
      on nested sleep conditions, which we generally want to avoid because the
      inner sleeping operation can re-set the thread state to TASK_RUNNING,
      and that will then cause the outer sleep loop to not actually sleep
      when it calls schedule().
      
      However, that's actually valid traditional behavior, with the inner
      sleep being some fairly rare case (like taking a sleeping lock that
      normally doesn't actually need to sleep).
      
      And the debug code would actually change the state of the task to
      TASK_RUNNING internally, which makes that kind of traditional,
      working code not work at all: the nested sleep no longer just
      sometimes causes the outer one to not block, it causes it to happen
      every time.
      
      In particular, it will cause the cardbus kernel daemon (pccardd) to
      basically busy-loop doing scheduling, converting a laptop into a heater,
      as reported by Bruno Prémont.  But there may be other legacy uses of
      that nested sleep model in other drivers that are also likely to never
      get converted to the new model.
      
      This fixes both cases:
      
       - don't set TASK_RUNNING when the nested condition happens (note: even
         if WARN_ONCE() only _warns_ once, the return value isn't whether the
         warning happened, but whether the condition for the warning was true.
         So despite the warning only happening once, the "if (WARN_ON(..))"
         would trigger for every nested sleep).
      
       - in the cases where we knowingly disable the warning by using
         "sched_annotate_sleep()", don't change the task state (that is used
         for all core scheduling decisions), instead use '->task_state_change'
         that is used for the debugging decision itself.
      
      (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
      avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
      suggested change to use 'task_state_change' as part of the test)
      Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org>
      Tested-by: Rafael J Wysocki <rjw@rjwysocki.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ilya Dryomov <ilya.dryomov@inktank.com>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      00845eb9
  4. 31 Jan, 2015 (4 commits)
  5. 28 Jan, 2015 (3 commits)
    • sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask · bb2bc55a
      Mike Galbraith committed
      While creating an exclusive cpuset, we passed cpuset_cpumask_can_shrink()
      an empty cpumask (cur), and dl_bw_of(cpumask_any(cur)) made boom with it:
      
       CPU: 0 PID: 6942 Comm: shield.sh Not tainted 3.19.0-master #19
       Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
       task: ffff880224552450 ti: ffff8800caab8000 task.ti: ffff8800caab8000
       RIP: 0010:[<ffffffff81073846>]  [<ffffffff81073846>] cpuset_cpumask_can_shrink+0x56/0xb0
       [...]
       Call Trace:
        [<ffffffff810cb82a>] validate_change+0x18a/0x200
        [<ffffffff810cc877>] cpuset_write_resmask+0x3b7/0x720
        [<ffffffff810c4d58>] cgroup_file_write+0x38/0x100
        [<ffffffff811d953a>] kernfs_fop_write+0x12a/0x180
        [<ffffffff8116e1a3>] vfs_write+0xb3/0x1d0
        [<ffffffff8116ed06>] SyS_write+0x46/0xb0
        [<ffffffff8159ced6>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Acked-by: Zefan Li <lizefan@huawei.com>
      Fixes: f82f8042 ("sched/deadline: Ensure that updates to exclusive cpusets don't break AC")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1422417235.5716.5.camel@marge.simpson.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bb2bc55a
    • perf: Tighten (and fix) the grouping condition · c3c87e77
      Peter Zijlstra committed
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage where, when we create an event
      for CPU Y and then migrate it to form a group on CPU X, the code gets
      confused about where the counter is programmed -- triggered in practice
      as well by me via the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means for the same task and/or the same cpu.
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c3c87e77
    • sched/fair: Avoid using uninitialized variable in preferred_group_nid() · 81907478
      Jan Beulich committed
      At least some gcc versions - validly afaict - warn about max_group
      potentially being used uninitialized: there's no way the compiler can
      prove that the body of the conditional where it and max_faults get
      set/updated ever executes; in fact, without knowing all the details of
      other scheduler code, I can't prove this either.
      
      Generally the necessary change would appear to be to clear max_group
      prior to entering the inner loop, and break out of the outer loop when
      it ends up being all clear after the inner one. This, however, seems
      inefficient, and afaict the same effect can be achieved by exiting the
      outer loop when max_faults is still zero after the inner loop.
      
      [ mingo: changed the solution to zero initialization: uninitialized_var()
        needs to die, as it's an actively dangerous construct: if in the future
        a known-proven-good piece of code is changed to have a true, buggy
        uninitialized variable, the compiler warning is then suppressed...
      
        The better long term solution is to clean up the code flow, so that
        even simple minded compilers (and humans!) are able to read it without
        getting a headache.  ]
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/54C2139202000078000588F7@mail.emea.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      81907478
  6. 27 Jan, 2015 (1 commit)
  7. 23 Jan, 2015 (2 commits)
  8. 22 Jan, 2015 (2 commits)
  9. 20 Jan, 2015 (4 commits)
    • module: fix race in kallsyms resolution during module load success. · c7496379
      Rusty Russell committed
      The kallsyms routines (module_symbol_name, lookup_module_* etc) disable
      preemption to walk the modules rather than taking the module_mutex:
      this is because they are used for symbol resolution during oopses.
      
      This works because there are synchronize_sched() and synchronize_rcu()
      in the unload and failure paths.  However, there's one case which doesn't
      have that: the normal case where module loading succeeds, and we free
      the init section.
      
      We don't want a synchronize_rcu() there, because it would slow down
      module loading: this bug was introduced in 2009 to speed module
      loading in the first place.
      
      Thus, we want to do the free in an RCU callback.  We do this in the
      simplest possible way by allocating a new rcu_head: if we put it in
      the module structure we'd have to worry about that getting freed.
      Reported-by: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      c7496379
    • module: remove mod arg from module_free, rename module_memfree(). · be1f221c
      Rusty Russell committed
      Nothing needs the module pointer any more, and the next patch will
      call it from RCU, where the module itself might no longer exist.
      Removing the arg is the safest approach.
      
      This just codifies the use of the module_alloc/module_free pattern
      which ftrace and bpf use.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Cc: netdev@vger.kernel.org
      be1f221c
    • module_arch_freeing_init(): new hook for archs before module->module_init freed. · d453cded
      Rusty Russell committed
      Archs have been abusing module_free() to clean up their arch-specific
      allocations.  Since module_free() is also (ab)used by BPF and trace code,
      let's keep it to simple allocations, and provide a hook called before
      that.
      
      This means that avr32, ia64, parisc and s390 no longer need to implement
      their own module_free() at all.  avr32 doesn't need module_finalize()
      either.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      d453cded
    • param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC · c772be52
      Rusty Russell committed
      ignore_lockdep is uninitialized, and sysfs_attr_init() doesn't initialize
      it, so memset it to 0.
      Reported-by: Huang Ying <ying.huang@intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      c772be52
  10. 19 Jan, 2015 (1 commit)
    • futex: Fix argument handling in futex_lock_pi() calls · 996636dd
      Michael Kerrisk committed
      This patch fixes two separate buglets in calls to futex_lock_pi():
      
        * Eliminate unused 'detect' argument
        * Change unused 'timeout' argument of FUTEX_TRYLOCK_PI to NULL
      
      The 'detect' argument of futex_lock_pi() seems never to have been
      used (when it was included with the initial PI mutex implementation
      in Linux 2.6.18, all checks against its value were disabled by
      ANDing against 0 (i.e., if (detect... && 0)), and with
      commit 778e9a9c, any mention of
      this argument in futex_lock_pi() went away altogether. Its presence
      now serves only to confuse readers of the code, by giving the
      impression that the futex() FUTEX_LOCK_PI operation actually does
      use the 'val' argument. This patch removes the argument.
      
      The futex_lock_pi() call that corresponds to FUTEX_TRYLOCK_PI includes
      'timeout' as one of its arguments. This misleads the reader into thinking
      that the FUTEX_TRYLOCK_PI operation does employ timeouts for some sensible
      purpose; but it does not.  Indeed, it cannot, because the checks at the
      start of sys_futex() exclude FUTEX_TRYLOCK_PI from the set of operations
      that do copy_from_user() on the timeout argument. So, in the
      FUTEX_TRYLOCK_PI futex_lock_pi() call it would be simplest to change
      'timeout' to 'NULL'. This patch does that.
      Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Reviewed-by: Darren Hart <darren@dvhart.com>
      Link: http://lkml.kernel.org/r/54B96646.8010200@gmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      996636dd
  11. 17 Jan, 2015 (1 commit)