1. 26 7月, 2008 4 次提交
    • P
      pidns: remove now unused kill_proc function · 19b0cfcc
      Pavel Emelyanov 提交于
      This function operated on a pid_t to kill a task, which is no longer valid
      in a containerized system.
      
      It has finally lost all its users and we can safely remove it from the
      tree.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19b0cfcc
    • O
      kill PF_BORROWED_MM in favour of PF_KTHREAD · 246bb0b1
      Oleg Nesterov 提交于
      Kill PF_BORROWED_MM.  Change use_mm/unuse_mm to not play with ->flags, and
      do s/PF_BORROWED_MM/PF_KTHREAD/ for a couple of other users.
      
      No functional changes yet.  But this allows us to do further
      fixes/cleanups.
      
      oom_kill/ptrace/etc often check "p->mm != NULL" to filter out the
      kthreads, this is wrong because of use_mm().  The problem with
      PF_BORROWED_MM is that we need task_lock() to avoid races.  With this
      patch we can check PF_KTHREAD directly, or use a simple lockless helper:
      
      	/* The result must not be dereferenced !!! */
      	struct mm_struct *__get_task_mm(struct task_struct *tsk)
      	{
      		if (tsk->flags & PF_KTHREAD)
      			return NULL;
      		return tsk->mm;
      	}
      
      Note also ecard_task().  It runs with ->mm != NULL, but it's the kernel
      thread without PF_BORROWED_MM.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      246bb0b1
    • O
      introduce PF_KTHREAD flag · 7b34e428
      Oleg Nesterov 提交于
      Introduce the new PF_KTHREAD flag to mark the kernel threads.  It is set
      by INIT_TASK() and copied to the forked childs (we could set it in
      kthreadd() along with PF_NOFREEZE instead).
      
      daemonize() was changed as well.  In that case testing of PF_KTHREAD is
      racy, but daemonize() is hopeless anyway.
      
      This flag is cleared in do_execve(), before search_binary_handler().
      Probably not the best place, we can do this in exec_mmap() or in
      start_thread(), or clear it along with PF_FORKNOEXEC.  But I think this
      doesn't matter in practice, and if do_execve() fails kthread should die
      soon.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b34e428
    • O
      ptrace: give more respect to SIGKILL · 364d3c13
      Oleg Nesterov 提交于
      ptrace_stop() has some complicated checks to prevent the scheduling in the
      TASK_TRACED state with the pending SIGKILL, but these checks are racy, and
      they depend on arch_ptrace_stop_needed().
      
      This patch assumes that the traced task should die asap if it was killed by
      SIGKILL, in that case schedule()->signal_pending_state() has no reason to
      ignore the TASK_WAKEKILL part of TASK_TRACED, and we can kill this nasty
      special case.
      
      Note: do_exit()->ptrace_notify() is special, the killed task can already
      dequeue SIGKILL at this point. Another indication that fatal_signal_pending()
      is not exactly right.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      364d3c13
  2. 25 7月, 2008 1 次提交
  3. 18 7月, 2008 1 次提交
  4. 17 7月, 2008 2 次提交
    • R
      ptrace children revamp · f470021a
      Roland McGrath 提交于
      ptrace no longer fiddles with the children/sibling links, and the
      old ptrace_children list is gone.  Now ptrace, whether of one's own
      children or another's via PTRACE_ATTACH, just uses the new ptraced
      list instead.
      
      There should be no user-visible difference that matters.  The only
      change is the order in which do_wait() sees multiple stopped
      children and stopped ptrace attachees.  Since wait_task_stopped()
      was changed earlier so it no longer reorders the children list, we
      already know this won't cause any new problems.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      f470021a
    • R
      Freezer: Introduce PF_FREEZER_NOSIG · ebb12db5
      Rafael J. Wysocki 提交于
      The freezer currently attempts to distinguish kernel threads from
      user space tasks by checking if their mm pointer is unset and it
      does not send fake signals to kernel threads.  However, there are
      kernel threads, mostly related to networking, that behave like
      user space tasks and may want to be sent a fake signal to be frozen.
      
      Introduce the new process flag PF_FREEZER_NOSIG that will be set
      by default for all kernel threads and make the freezer only send
      fake signals to the tasks having PF_FREEZER_NOSIG unset.  Provide
      the set_freezable_with_signal() function to be called by the kernel
      threads that want to be sent a fake signal for freezing.
      
      This patch should not change the freezer's observable behavior.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NPavel Machek <pavel@suse.cz>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      ebb12db5
  5. 11 7月, 2008 1 次提交
    • S
      sched_clock: stop maximum check on NO HZ · af52a90a
      Steven Rostedt 提交于
      Working with ftrace I would get large jumps of 11 millisecs or more with
      the clock tracer. This killed the latencing timings of ftrace and also
      caused the irqoff self tests to fail.
      
      What was happening is with NO_HZ the idle would stop the jiffy counter and
      before the jiffy counter was updated the sched_clock would have a bad
      delta jiffies to compare with the gtod with the maximum.
      
      The jiffies would stop and the last sched_tick would record the last gtod.
      On wakeup, the sched clock update would compare the gtod + delta jiffies
      (which would be zero) and compare it to the TSC. The TSC would have
      correctly (with a stable TSC) moved forward several jiffies. But because the
      jiffies has not been updated yet the clock would be prevented from moving
      forward because it would appear that the TSC jumped too far ahead.
      
      The clock would then virtually stop, until the jiffies are updated. Then
      the next sched clock update would see that the clock was very much behind
      since the delta jiffies is now correct. This would then jump the clock
      forward by several jiffies.
      
      This caused ftrace to report several milliseconds of interrupts off
      latency at every resume from NO_HZ idle.
      
      This patch adds hooks into the nohz code to disable the checking of the
      maximum clock update when nohz is in effect. It resumes the max check
      when nohz has updated the jiffies again.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      af52a90a
  6. 27 6月, 2008 3 次提交
  7. 24 6月, 2008 1 次提交
    • R
      sched: add new API sched_setscheduler_nocheck: add a flag to control access checks · 961ccddd
      Rusty Russell 提交于
      Hidehiro Kawai noticed that sched_setscheduler() can fail in
      stop_machine: it calls sched_setscheduler() from insmod, which can
      have CAP_SYS_MODULE without CAP_SYS_NICE.
      
      Two cases could have failed, so are changed to sched_setscheduler_nocheck:
        kernel/softirq.c:cpu_callback()
      	- CPU hotplug callback
        kernel/stop_machine.c:__stop_machine_run()
      	- Called from various places, including modprobe()
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mm@kvack.org
      Cc: sugita <yumiko.sugita.yf@hitachi.com>
      Cc: Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      961ccddd
  8. 10 6月, 2008 2 次提交
    • D
      sched: prevent bound kthreads from changing cpus_allowed · 9985b0ba
      David Rientjes 提交于
      Kthreads that have called kthread_bind() are bound to specific cpus, so
      other tasks should not be able to change their cpus_allowed from under
      them.  Otherwise, it is possible to move kthreads, such as the migration
      or software watchdog threads, so they are not allowed access to the cpu
      they work on.
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9985b0ba
    • O
      sched: fix TASK_WAKEKILL vs SIGKILL race · 16882c1e
      Oleg Nesterov 提交于
      schedule() has the special "TASK_INTERRUPTIBLE && signal_pending()" case,
      this allows us to do
      
      	current->state = TASK_INTERRUPTIBLE;
      	schedule();
      
      without fear to sleep with pending signal.
      
      However, the code like
      
      	current->state = TASK_KILLABLE;
      	schedule();
      
      is not right, schedule() doesn't take TASK_WAKEKILL into account. This means
      that mutex_lock_killable(), wait_for_completion_killable(), down_killable(),
      schedule_timeout_killable() can miss SIGKILL (and btw the second SIGKILL has
      no effect).
      
      Introduce the new helper, signal_pending_state(), and change schedule() to
      use it. Hopefully it will have more users, that is why the task's state is
      passed separately.
      
      Note this "__TASK_STOPPED | __TASK_TRACED" check in signal_pending_state().
      This is needed to preserve the current behaviour (ptrace_notify). I hope
      this check will be removed soon, but this (afaics good) change needs the
      separate discussion.
      
      The fast path is "(state & (INTERRUPTIBLE | WAKEKILL)) + signal_pending(p)",
      basically the same that schedule() does now. However, this patch of course
      bloats schedule().
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      16882c1e
  9. 06 6月, 2008 3 次提交
    • G
      sched: fix cpupri hotplug support · 1f11eb6a
      Gregory Haskins 提交于
      The RT folks over at RedHat found an issue w.r.t. hotplug support which
      was traced to problems with the cpupri infrastructure in the scheduler:
      
      https://bugzilla.redhat.com/show_bug.cgi?id=449676
      
      This bug affects 23-rt12+, 24-rtX, 25-rtX, and sched-devel.  This patch
      applies to 25.4-rt4, though it should trivially apply to most cpupri enabled
      kernels mentioned above.
      
      It turned out that the issue was that offline cpus could get inadvertently
      registered with cpupri so that they were erroneously selected during
      migration decisions.  The end result would be an OOPS as the offline cpu
      had tasks routed to it.
      
      This patch generalizes the old join/leave domain interface into an
      online/offline interface, and adjusts the root-domain/hotplug code to
      utilize it.
      
      I was able to easily reproduce the issue prior to this patch, and am no
      longer able to reproduce it after this patch.  I can offline cpus
      indefinately and everything seems to be in working order.
      
      Thanks to Arnaldo (acme), Thomas, and Peter for doing the legwork to point
      me in the right direction.  Also thank you to Peter for reviewing the
      early iterations of this patch.
      Signed-off-by: NGregory Haskins <ghaskins@novell.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      1f11eb6a
    • R
      sched: reorder task_struct to reduce padding on 64bit builds · c7aceaba
      Richard Kennedy 提交于
      This patch removes 24 bytes of padding and allows 1 extra object per
      slab on my fedora based config.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      c7aceaba
    • I
      namespacecheck: more sched.c fixes · 554ec22f
      Ingo Molnar 提交于
      [ Stephen Rothwell <sfr@canb.auug.org.au>: build fix ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      554ec22f
  10. 29 5月, 2008 1 次提交
  11. 27 5月, 2008 1 次提交
  12. 25 5月, 2008 2 次提交
  13. 24 5月, 2008 8 次提交
  14. 13 5月, 2008 2 次提交
    • L
      Make 'cond_resched()' nullification depend on PREEMPT_BKL · c714a534
      Linus Torvalds 提交于
      Because it's not correct with a non-preemptable BKL and just causes
      PREEMPT kernels to have longer latencies than non-PREEMPT ones (which is
      obviously not the point of it at all).
      
      Of course, that config option actually got removed as an option earlier,
      so for now this basically disables it entirely, but if BKL preemption is
      ever resurrected it will be a meaningful optimization.  And in the
      meantime, it at least documents the intent of the code, while not doing
      the wrong thing.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c714a534
    • L
      Fix up 'need_resched()' definition · 9404ef02
      Linus Torvalds 提交于
      We should not go through the task pointer to get at the thread info,
      since it's usually cheaper to just access the thread info directly.
      
      So don't make the code look up 'current', when we can just use the
      thread info accessor functions directly.  This generally avoids one
      level of indirection and tends to work better together with code that
      also looks at other thread flags (eg preempt_count).
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9404ef02
  15. 12 5月, 2008 1 次提交
    • L
      Add new 'cond_resched_bkl()' helper function · c3921ab7
      Linus Torvalds 提交于
      It acts exactly like a regular 'cond_resched()', but will not get
      optimized away when CONFIG_PREEMPT is set.
      
      Normal kernel code is already preemptable in the presense of
      CONFIG_PREEMPT, so cond_resched() is optimized away (see commit
      02b67cc3 "sched: do not do
      cond_resched() when CONFIG_PREEMPT").
      
      But when wanting to conditionally reschedule while holding a lock, you
      need to use "cond_sched_lock(lock)", and the new function is the BKL
      equivalent of that.
      
      Also make fs/locks.c use it.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c3921ab7
  16. 06 5月, 2008 3 次提交
  17. 30 4月, 2008 4 次提交