1. 15 March 2008, 1 commit
    • sched: fix race in schedule() · 0e1f3483
      Authored by Hiroshi Shimamoto
      Fix a hard-to-trigger crash seen in the -rt kernel that also affects
      the vanilla scheduler.
      
      There is a race condition between schedule() and some dequeue/enqueue
      functions: rt_mutex_setprio(), __setscheduler() and sched_move_task().
      
      When scheduling to idle, idle_balance() is called to pull tasks from
      other busy processors. It might drop the rq lock, which means that those
      three functions can encounter on_rq=0 and running=1. The current task
      should be put while it is still running.
      
      Here is a possible scenario:
      
         CPU0                               CPU1
          |                              schedule()
          |                              ->deactivate_task()
          |                              ->idle_balance()
          |                              -->load_balance_newidle()
      rt_mutex_setprio()                     |
          |                              --->double_lock_balance()
          *get lock                          *rel lock
          * on_rq=0, running=1               |
          * sched_class is changed           |
          *rel lock                          *get lock
          :                                  |
                                             :
                                         ->put_prev_task_rt()
                                         ->pick_next_task_fair()
                                             => panic
      
      The current process on CPU1 (P1) is scheduling. P1 is deactivated, and
      the scheduler looks for another process on other CPUs' runqueues because
      CPU1 will be idle. idle_balance(), load_balance_newidle() and
      double_lock_balance() are called, and double_lock_balance() can drop
      the rq lock. Meanwhile, CPU0 is trying to boost the priority of P1.
      As a result of the boost, only P1's prio and sched_class are changed to
      RT. The sched entities of P1 and P1's group are never put. This leaves
      the cfs_rq invalid, because the cfs_rq has curr but no leaf, yet
      pick_next_task_fair() is called and the kernel panics.
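      
      A minimal sketch of the put/restore pattern those three functions need,
      assuming the dequeue_task()/put_prev_task()/set_curr_task()/enqueue_task()
      helpers from kernel/sched.c of that era (an illustration, not the literal
      patch):
      
      	/* with the rq lock held, inside rt_mutex_setprio(),
      	 * __setscheduler() or sched_move_task() */
      	on_rq = p->se.on_rq;
      	running = task_current(rq, p);
      	if (on_rq)
      		dequeue_task(rq, p, 0);
      	if (running)
      		p->sched_class->put_prev_task(rq, p);	/* put the running task too */
      
      	/* ... change the task's prio / sched_class here ... */
      
      	if (running)
      		p->sched_class->set_curr_task(rq);
      	if (on_rq)
      		enqueue_task(rq, p, 0);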
      Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 13 March 2008, 1 commit
  3. 12 March 2008, 1 commit
    • Hibernation: Fix mark_nosave_pages() · a82f7119
      Authored by Rafael J. Wysocki
      There is a problem in the hibernation code that triggers on some NUMA
      systems on which pfn_valid() returns 'true' for some PFNs that don't
      belong to any zone.  Namely, there is a BUG_ON() in
      memory_bm_find_bit() that triggers for PFNs not belonging to any
      zone and passing the pfn_valid() test.  On the affected systems it
      triggers when we mark PFNs reported by the platform as not saveable,
      because the PFNs in question belong to a region mapped directly using
      ioremap() (i.e. the ACPI data area) and they pass the pfn_valid()
      test.
      
      Modify memory_bm_find_bit() so that it returns an error if the given
      PFN doesn't belong to any zone, instead of crashing the kernel, and
      ignore the result it returns in mark_nosave_pages() while marking the
      "nosave" memory regions.
      
      This doesn't affect the hibernation functionality, as we won't touch
      the PFNs in question anyway.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=9966
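      
      A rough sketch of the behaviour described above; the signatures and the
      zone-lookup helper are assumed for illustration, this is not the literal
      patch:
      
      	/* in memory_bm_find_bit(): report failure instead of BUG()
      	 * when the PFN falls outside every zone */
      	zone_bm = find_zone_bitmap(bm, pfn);	/* hypothetical lookup helper */
      	if (!zone_bm)
      		return -EFAULT;
      
      	/* in mark_nosave_pages(): ignore such PFNs instead of crashing */
      	if (memory_bm_find_bit(bm, pfn, &addr, &bit))
      		continue;
      	set_bit(bit, addr);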
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Len Brown <len.brown@intel.com>
  4. 11 March 2008, 5 commits
    • keep rd->online and cpu_online_map in sync · 08f503b0
      Authored by Gregory Haskins
      It is possible for the root-domain cache of online cpus to become
      out of sync with the global cpu_online_map.  This is because we
      currently trigger removal of cpus too early in the notifier chain.
      Other DOWN_PREPARE handlers may in fact run and reconfigure the
      root-domain topology, thereby stomping on our own offline handling.
      
      The end result is that rd->online may become out of sync with
      cpu_online_map, which results in potential task misrouting.
      
      So change the offline handling to be more tightly coupled with the
      global offline process by triggering on CPU_DYING instead of
      CPU_DOWN_PREPARE.
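      
      A rough sketch of the idea; the notifier shape is assumed for
      illustration, while rd->online and cpu_online_map are the structures
      named above:
      
      	/* hotplug notifier: take the cpu out of the root-domain's online
      	 * mask at CPU_DYING, in lockstep with the cpu_online_map update,
      	 * rather than at CPU_DOWN_PREPARE time */
      	case CPU_DYING:
      	case CPU_DYING_FROZEN:
      		spin_lock_irqsave(&rq->lock, flags);
      		if (rq->rd) {
      			BUG_ON(!cpu_isset(cpu, rq->rd->span));
      			cpu_clear(cpu, rq->rd->online);
      		}
      		spin_unlock_irqrestore(&rq->lock, flags);
      		break;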
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Revert "cpu hotplug: adjust root-domain->online span in response to hotplug event" · 1f94ef59
      Authored by Gregory Haskins
      This reverts commit 393d94d9.
      
      Let's fix this right.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: move PREEMPT_RCU config option back under PREEMPT · 21bbb39c
      Authored by Paul E. McKenney
      The original preemptible-RCU patch put the choice between classic and
      preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures
      on machines not supporting CONFIG_PREEMPT.  This choice was therefore moved to
      init/Kconfig, which worked, but placed the choice between classic and
      preemptible RCU at the top level, a very obtuse choice indeed.
      
      This patch changes from the Kconfig "choice" mechanism to a pair of booleans,
      only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in
      kernel/Kconfig.preempt, where one would expect it to be.  The other
      (CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all
      architectures, hopefully avoiding build breakage.  Thanks to Roman Zippel for
      suggesting this approach.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Josh Triplett <josh@freedesktop.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • modules: warn about suspicious return values from module's ->init() hook · e24e2e64
      Authored by Alexey Dobriyan
      The return value convention for a module's init function is 0/-E.
      Sometimes, e.g. during forward-porting, mistakes happen and a buggy
      module is created in which the result of a comparison such as
      "workqueue != NULL" is propagated all the way up to sys_init_module.
      What happens is that some other module already created the workqueue in
      question, our module creates it again, and the module is loaded
      successfully.
      
      Or it could be some other bug.
      
      Let's make such mistakes much more visible.  In retrospect, such
      messages would have noticeably shortened some of my head-scratching
      sessions.
      
      Note that dump_stack() is just a way to get the user's attention.
      Sample message:
      
      sys_init_module: 'foo'->init suspiciously returned 1, it should follow 0/-E convention
      sys_init_module: loading module anyway...
      Pid: 4223, comm: modprobe Not tainted 2.6.24-25f66630 #5
      
      Call Trace:
       [<ffffffff80254b05>] sys_init_module+0xe5/0x1d0
       [<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80
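      
      For illustration, a hypothetical module init of the kind that triggers
      the message above (the 'foo' module and its workqueue are made up):
      
      	#include <linux/module.h>
      	#include <linux/workqueue.h>
      
      	static struct workqueue_struct *foo_wq;
      
      	static int __init foo_init(void)
      	{
      		foo_wq = create_singlethread_workqueue("foo");
      		/* BUG: returns 1 on success instead of following the
      		 * 0/-E convention; correct would be
      		 * "return foo_wq ? 0 : -ENOMEM;" */
      		return foo_wq != NULL;
      	}
      	module_init(foo_init);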
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • modules: fix module waiting for dependent modules' init · 6c5db22d
      Authored by Rusty Russell
      Commit c9a3ba55 (module: wait for dependent modules doing init.) didn't quite
      work because the waiter holds the module lock, meaning that the state of the
      module it's waiting for cannot change.
      
      Fortunately, it's fairly simple to update the state outside the lock and do
      the wakeup.
      
      Thanks to Jan Glauber for tracking this down and testing (qdio and qeth).
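      
      A minimal sketch of the approach, assuming the module_wq wait queue and
      MODULE_STATE_LIVE from kernel/module.c of that era (not the literal
      patch):
      
      	/* after ->init() succeeds in sys_init_module(): update the state
      	 * and wake dependent waiters *before* taking module_mutex, so a
      	 * waiter that holds the mutex still sees the change */
      	mod->state = MODULE_STATE_LIVE;
      	wake_up(&module_wq);
      
      	mutex_lock(&module_mutex);
      	/* drop the initial reference, free init sections, ... */
      	mutex_unlock(&module_mutex);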
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jan Glauber <jang@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 10 March 2008, 1 commit
  6. 09 March 2008, 4 commits
  7. 07 March 2008, 6 commits
  8. 06 March 2008, 1 commit
  9. 05 March 2008, 7 commits
  10. 04 March 2008, 4 commits
    • freezer vs stopped or traced · 13b1c3d4
      Authored by Roland McGrath
      This changes the "freezer" code used by suspend/hibernate in its treatment
      of tasks in TASK_STOPPED (job control stop) and TASK_TRACED (ptrace) states.
      
      As I understand it, the intent of the "freezer" is to hold all tasks
      from doing anything significant.  For this purpose, TASK_STOPPED and
      TASK_TRACED are "frozen enough".  It's possible the tasks might resume
      from ptrace calls (if the tracer were unfrozen) or from signals
      (including ones that could come via timer interrupts, etc).  But this
      doesn't matter as long as they quickly block again while "freezing" is
      in effect.  Some minor adjustments to the signal.c code make sure that
      try_to_freeze() very shortly follows all wakeups from both kinds of
      stop.  This lets the freezer code safely leave stopped tasks unmolested.
      
      Changing this fixes the longstanding bug where, after resuming from
      suspend/hibernate, your shell reports "[1] Stopped" and the like for all
      the jobs you stopped with ^Z et al, as if you had freshly fg'd and ^Z'd them.
      It also removes from the freezer the arcane special case treatment for
      ptrace'd tasks, which relied on intimate knowledge of ptrace internals.
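      
      A hypothetical helper expressing the rule described above (the name and
      exact form are made up; the real change is spread across the freezer and
      signal code):
      
      	/* a task in job-control stop or ptrace stop counts as frozen
      	 * enough, as long as it is not itself being asked to enter the
      	 * refrigerator */
      	static inline int frozen_enough(struct task_struct *p)
      	{
      		return frozen(p) ||
      		       (task_is_stopped_or_traced(p) && !freezing(p));
      	}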
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • exit_notify: fix kill_orphaned_pgrp() usage with mt exit · 821c7de7
      Authored by Oleg Nesterov
      1. exit_notify() always calls kill_orphaned_pgrp(). This is wrong; we
         should do this only when the whole process exits.
      
      2. exit_notify() uses "current" as "ignored_task", which is obviously
         wrong.  Use ->group_leader instead.
      
      Test case:
      
      	void hup(int sig)
      	{
      		printf("HUP received\n");
      	}
      
      	void *tfunc(void *arg)
      	{
      		sleep(2);
      		printf("sub-thread exited\n");
      		return NULL;
      	}
      
      	int main(int argc, char *argv[])
      	{
      		if (!fork()) {
      			signal(SIGHUP, hup);
      			kill(getpid(), SIGSTOP);
      			exit(0);
      		}
      
      		pthread_t thr;
      		pthread_create(&thr, NULL, tfunc, NULL);
      
      		sleep(1);
      		printf("main thread exited\n");
      		syscall(__NR_exit, 0);
      
      		return 0;
      	}
      
      output:
      
      	main thread exited
      	HUP received
      	Hangup
      
      With this patch the output is:
      
      	main thread exited
      	sub-thread exited
      	HUP received
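      
      A minimal sketch of the intended call site, assuming exit_notify()
      computes a group_dead flag for the exiting thread (illustrative only):
      
      	/* only when the whole process exits, and ignoring the group
      	 * leader rather than the exiting thread itself */
      	if (group_dead)
      		kill_orphaned_pgrp(tsk->group_leader, NULL);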
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • will_become_orphaned_pgrp: partially fix insufficient ->exit_state check · 05e83df6
      Authored by Oleg Nesterov
      p->exit_state != 0 doesn't mean the process is dead; it may still have
      sub-threads.  Change the code to use
      "p->exit_state && thread_group_empty(p)" instead.
      
      Without this patch, ^Z doesn't deliver SIGTSTP to the foreground process
      if the main thread has exited.
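      
      The new check, written out as a hypothetical helper (the name is made
      up; the real test is inline in the pgrp walk):
      
      	/* a task stops counting for the orphaned-pgrp decision only once
      	 * it is dead *and* has no remaining live sub-threads */
      	static int task_fully_dead(struct task_struct *p)
      	{
      		return p->exit_state && thread_group_empty(p);
      	}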
      
      However, the new check is not perfect either.  There is a window after
      exit_notify() drops the tasklist lock and before release_task() runs.
      Suppose that the last (non-leader) thread exits.  The entire group has
      then exited, but thread_group_empty() is not yet true.
      
      As Eric pointed out, is_global_init() is wrong as well, but I did not
      dare to do other changes.
      
      Just for the record, has_stopped_jobs() is absolutely wrong too.  But we
      can't fix it now, we should first fix SIGNAL_STOP_STOPPED issues.
      
      Even with this patch, ^Z doesn't play well with a dead main thread.
      The task is stopped correctly, but do_wait(WSTOPPED) won't see it.  This
      is another, unrelated issue that will (hopefully) be fixed separately.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • introduce kill_orphaned_pgrp() helper · f49ee505
      Authored by Oleg Nesterov
      Factor out the common code in reparent_thread() and exit_notify().
      
      No functional changes.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 01 March 2008, 7 commits
  12. 26 February 2008, 2 commits