1. 23 Aug, 2009 1 commit
    • rcu: Merge preemptable-RCU functionality into hierarchical RCU · f41d911f
      Paul E. McKenney committed
      Create a kernel/rcutree_plugin.h file that contains definitions
      for preemptable RCU (or, under the #else branch of the #ifdef,
      empty definitions for the classic non-preemptable semantics).
      These definitions fit into plugins defined in kernel/rcutree.c
      for this purpose.
      
      This variant of preemptable RCU uses a new algorithm whose
      read-side expense is roughly that of classic hierarchical RCU
      under CONFIG_PREEMPT. This new algorithm's update-side expense
      is similar to that of classic hierarchical RCU, and, in absence
      of read-side preemption or blocking, is exactly that of classic
      hierarchical RCU.  Perhaps more important, this new algorithm
      has a much simpler implementation, saving well over 1,000 lines
      of code compared to mainline's implementation of preemptable
      RCU, which will hopefully be retired in favor of this new
      algorithm.
      
      The simplifications are obtained by maintaining per-task
      nesting state for running tasks, and using a simple
      lock-protected algorithm to handle accounting when tasks block
      within RCU read-side critical sections, making use of lessons
      learned while creating numerous user-level RCU implementations
      over the past 18 months.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746134003-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 09 Jul, 2009 1 commit
  3. 20 Jun, 2009 1 commit
  4. 19 Jun, 2009 11 commits
  5. 22 May, 2009 1 commit
    • perf_counter: Dynamically allocate tasks' perf_counter_context struct · a63eaf34
      Paul Mackerras committed
      This replaces the struct perf_counter_context in the task_struct with
      a pointer to a dynamically allocated perf_counter_context struct.  The
      main reason for doing this is to allow us to transfer a
      perf_counter_context from one task to another when we do lazy PMU
      switching in a later patch.
      
      This has a few side-benefits: the task_struct becomes a little smaller,
      we save some memory because only tasks that have perf_counters attached
      get a perf_counter_context allocated for them, and we can remove the
      inclusion of <linux/perf_counter.h> in sched.h, meaning that we don't
      end up recompiling nearly everything whenever perf_counter.h changes.
      
      The perf_counter_context structures are reference-counted and freed
      when the last reference is dropped.  A context can have references
      from its task and the counters on its task.  Counters can outlive the
      task so it is possible that a context will be freed well after its
      task has exited.
      
      Contexts are allocated on fork if the parent had a context, or
      otherwise the first time that a per-task counter is created on a task.
      In the latter case, we set the context pointer in the task struct
      locklessly using an atomic compare-and-exchange operation in case we
      raced with some other task in creating a context for the subject task.
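
The lockless installation described above can be sketched in user space with C11 atomics. This is only an illustration under stated assumptions: `struct ctx`, `struct task`, and `find_or_install_ctx()` are hypothetical stand-ins, not the kernel's structures or functions, and the kernel uses its own compare-and-exchange primitive rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the structures named in the commit text. */
struct ctx {
    int refcount;
};

struct task {
    _Atomic(struct ctx *) ctxp;   /* NULL until a context is installed */
};

/*
 * Return the task's context, allocating and installing one if needed.
 * If another thread installs a context first, free ours and use theirs.
 */
struct ctx *find_or_install_ctx(struct task *t)
{
    struct ctx *cur = atomic_load(&t->ctxp);
    if (cur)
        return cur;

    struct ctx *fresh = calloc(1, sizeof(*fresh));
    if (!fresh)
        return NULL;
    fresh->refcount = 1;

    struct ctx *expected = NULL;
    if (atomic_compare_exchange_strong(&t->ctxp, &expected, fresh))
        return fresh;             /* we won the race */

    free(fresh);                  /* we lost; expected now holds the winner */
    assert(expected != NULL);
    return expected;
}
```

Calling `find_or_install_ctx()` twice on the same task returns the same pointer, and a racing caller never overwrites a context that is already installed.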
      
      This also removes the task pointer from the perf_counter struct.  The
      task pointer was not used anywhere and would make it harder to move a
      context from one task to another.  Anything that needed to know which
      task a counter was attached to was already using counter->ctx->task.
      
      The __perf_counter_init_context function moves up in perf_counter.c
      so that it can be called from find_get_context, and now initializes
      the refcount, but is otherwise unchanged.
      
      We were potentially calling list_del_counter twice: once from
      __perf_counter_exit_task when the task exits and once from
      __perf_counter_remove_from_context when the counter's fd gets closed.
      This adds a check in list_del_counter so it doesn't do anything if
      the counter has already been removed from the lists.
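
A minimal sketch of such a "delete at most once" guard, assuming a circular doubly linked list in which an unlinked node points to itself. The names `struct node`, `node_init()`, `node_add_tail()`, and `node_del_once()` are hypothetical; the actual check in list_del_counter may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked list node; an unlinked node points to itself. */
struct node {
    struct node *prev, *next;
};

void node_init(struct node *n)
{
    n->prev = n->next = n;
}

bool node_is_unlinked(const struct node *n)
{
    return n->next == n;
}

void node_add_tail(struct node *n, struct node *head)
{
    assert(node_is_unlinked(n));  /* adding a linked node would corrupt lists */
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Remove n from its list; a second call is a harmless no-op. */
bool node_del_once(struct node *n)
{
    if (node_is_unlinked(n))
        return false;             /* already removed by another path */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);                 /* mark as unlinked for later callers */
    return true;
}
```

Re-initialising the node on removal is what makes the second call detectable, so both the exit path and the fd-close path can call the helper without coordinating.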
      
      Since perf_counter_task_sched_in doesn't do anything if the task doesn't
      have a context, and leaves cpuctx->task_ctx = NULL, this adds code to
      __perf_install_in_context to set cpuctx->task_ctx if necessary, i.e. in
      the case where the current task adds the first counter to itself and
      thus creates a context for itself.
      
      This also adds similar code to __perf_counter_enable to handle a
      similar situation which can arise when the counters have been disabled
      using prctl; that also leaves cpuctx->task_ctx = NULL.
      
      [ Impact: refactor counter context management to prepare for new feature ]
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10075.781053.231153@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 20 May, 2009 1 commit
  7. 17 May, 2009 2 commits
    • perf_counter: fix threaded task exit · 0203026b
      Ingo Molnar committed
      Flushing counters in __exit_signal() with irqs disabled is not
      a good idea as perf_counter_exit_task() acquires mutexes. So
      flush it before acquiring the tasklist lock.
      
      (Note, we still need a fix for when the PID has been unhashed.)
      
      [ Impact: fix crash with inherited counters ]
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Fix counter inheritance · 856d56b9
      Peter Zijlstra committed
      Srivatsa Vaddagiri reported that a Java workload triggers this
      warning in kernel/exit.c:
      
         WARN_ON_ONCE(!list_empty(&tsk->perf_counter_ctx.counter_list));
      
      Add the inherited counter propagation on self-detach. Without it,
      counters could leak and stats could be incomplete in threaded code
      like the below:
      
        #include <pthread.h>
        #include <unistd.h>
      
        void *thread(void *arg)
        {
                sleep(5);
                return NULL;
        }
      
        int main(void)
        {
                pthread_t thr;
                pthread_create(&thr, NULL, thread, NULL);
                return 0;
        }
      Reported-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 01 May, 2009 1 commit
    • do_wait: do take security_task_wait() into account · 78a3d9d5
      Oleg Nesterov committed
      I was never able to understand what we should actually do when
      security_task_wait() fails, but the current code doesn't look right.
      
      If ->task_wait() returns an error, we update *notask_error correctly.
      But then we either reap the child (despite the fact this was forbidden)
      or clear *notask_error (and hide the security policy problems).
      
      This patch assumes that "stolen by ptrace" doesn't matter. If selinux
      denies the child we should ignore it but make sure we report -EACCES
      instead of -ECHILD if there are no other eligible children.
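
The intended error selection can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel's do_wait() code: `struct wait_child` and `wait_scan()` are hypothetical names, and a boolean `denied` flag stands in for a failing security_task_wait().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one child as seen by the waiting parent. */
struct wait_child {
    bool denied;   /* security policy forbids waiting on this child */
    bool ready;    /* child has a state change to report */
};

/*
 * Returns 1 and stores the index in *found if a permitted child is ready,
 * 0 if permitted children exist but none is ready (caller would sleep),
 * -EACCES if children exist but every one was denied,
 * -ECHILD if there are no children at all.
 */
int wait_scan(const struct wait_child *kids, size_t n, size_t *found)
{
    int notask_error = -ECHILD;

    assert(found != NULL);
    for (size_t i = 0; i < n; i++) {
        if (kids[i].denied) {
            /* Ignore the child, but remember why nothing was eligible. */
            if (notask_error)
                notask_error = -EACCES;
            continue;
        }
        if (kids[i].ready) {
            *found = i;
            return 1;
        }
        notask_error = 0;  /* an eligible child exists; worth waiting for */
    }
    return notask_error;
}
```

A denied child is never reaped and never clears the error on its own, which are the two behaviours the commit message objects to.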
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
  9. 15 Apr, 2009 2 commits
    • tracing/events: move trace point headers into include/trace/events · ad8d75ff
      Steven Rostedt committed
      Impact: clean up
      
      Create a sub directory in include/trace called events to keep the
      trace point headers in their own separate directory. Only headers that
      declare trace points should be defined in this directory.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: create automated trace defines · a8d154b0
      Steven Rostedt committed
      This patch lowers the number of places a developer must modify to add
      new tracepoints. The current method to add a new tracepoint
      into an existing system is to write the trace point macro in the
      trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
      DECLARE_TRACE, then they must add the same named item into the C file
      with the macro DEFINE_TRACE(name) and then add the trace point.
      
      This change removes the need to add the DEFINE_TRACE(name).
      Every file that uses the tracepoint must still include the trace/<type>.h
      file, but the one C file must also add a define before the including
      of that file.
      
       #define CREATE_TRACE_POINTS
       #include <trace/mytrace.h>
      
      This will cause the trace/mytrace.h file to also produce the C code
      necessary to implement the trace point.
      
      Note, if more than one trace/<type>.h is used to create the C code
      it is best to list them all together.
      
       #define CREATE_TRACE_POINTS
       #include <trace/foo.h>
       #include <trace/bar.h>
       #include <trace/fido.h>
      
      Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
      the cleaner solution of the define above the includes over my first
      design to have the C code include a "special" header.
      
      This patch converts sched, irq and lockdep and skb to use this new
      method.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  10. 07 Apr, 2009 1 commit
  11. 03 Apr, 2009 10 commits
    • pids: kill signal_struct-> __pgrp/__session and friends · 1b0f7ffd
      Oleg Nesterov committed
      We are wasting 2 words in signal_struct without any reason to implement
      task_pgrp_nr() and task_session_nr().
      
      task_session_nr() has no callers since
      2e2ba22e, we can remove it.
      
      task_pgrp_nr() is still (I believe wrongly) used in fs/autofsX and
      fs/coda.
      
      This patch reimplements task_pgrp_nr() via task_pgrp_nr_ns(), and kills
      __pgrp/__session and the related helpers.
      
      The change in drivers/char/tty_io.c is cosmetic, but hopefully makes sense
      anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Alan Cox <number6@the-village.bc.nu>		[tty parts]
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pids: improve get_task_pid() to fix the unsafe sys_wait4()->task_pgrp() · 2ae448ef
      Oleg Nesterov committed
      sys_wait4() does get_pid(task_pgrp(current)), this is not safe.  We can
      add rcu lock/unlock around, but we already have get_task_pid() which can
      be improved to handle the special pids in more reliable manner.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • forget_original_parent: do not abuse child->ptrace_entry · 5dfc80be
      Oleg Nesterov committed
      By discussion with Roland.
      
      - Use ->sibling instead of ->ptrace_entry to chain the need to be
        release_task'd childs. Nobody else can use ->sibling, this task
        is EXIT_DEAD and nobody can find it on its own list.
      
      - rename ptrace_dead to dead_childs.
      
      - Now that we don't have the "parallel" untrace code, change back
        reparent_thread() to return void, pass dead_childs as an argument.
      
      Actually, I don't understand why do we notify /sbin/init when we
      reparent a zombie, probably it is better to reap it unconditionally.
      
      [akpm@linux-foundation.org: s/childs/children/]
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Metzger, Markus T" <markus.t.metzger@intel.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • forget_original_parent: split out the un-ptrace part · 39c626ae
      Oleg Nesterov committed
      By discussion with Roland.
      
      - Rename ptrace_exit() to exit_ptrace(), and change it to do all the
        necessary work with ->ptraced list by its own.
      
      - Move this code from exit.c to ptrace.c
      
      - Update the comment in ptrace_detach() to explain the rechecking of
        the child->ptrace.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Metzger, Markus T" <markus.t.metzger@intel.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • reparent_thread: fix a zombie leak if /sbin/init ignores SIGCHLD · 7f5d3652
      Oleg Nesterov committed
      If /sbin/init ignores SIGCHLD and we re-parent a zombie, it is leaked.
      reparent_thread() does do_notify_parent() which sets ->exit_signal = -1 in
      this case.  This means that nobody except us can reap it, the detached
      task is not visible to do_wait().
      
      Change reparent_thread() to return a boolean (like __pthread_detach) to
      indicate that the thread is dead and must be released.  Also change
      forget_original_parent() to add the child to ptrace_dead list in this
      case.
      
      The naming becomes insane, the next patch does the cleanup.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • reparent_thread: fix the "is it traced" check · b1442b05
      Oleg Nesterov committed
      reparent_thread() uses ptrace_reparented() to check whether this thread is
      ptraced, in that case we should not notify the new parent.
      
      But ptrace_reparented() is not exactly correct when the reparented thread
      is traced by /sbin/init, because forget_original_parent() has already
      changed ->real_parent.
      
      Currently, the only problem is the false notification.  But with the next
      patch the kernel crashes in this (yes, pathological) case.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • reparent_thread: don't call kill_orphaned_pgrp() if task_detached() · 0a967a04
      Oleg Nesterov committed
      If task_detached(p) == T, then either
      
        a) p is not the main thread, we will find the group leader on the
           ->children list.
      
      or
      
        b) p is the group leader but its ->exit_state = EXIT_DEAD.  This
           can only happen when the last sub-thread has died, but in that case
           that thread has already called kill_orphaned_pgrp() from
           exit_notify().
      
      In both cases kill_orphaned_pgrp() looks bogus.
      
      Move the task_detached() check up and simplify the code, this is also
      right from the "common sense" pov: we should do nothing with the detached
      childs, except move them to the new parent's ->children list.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: reintroduce __ptrace_detach() as a callee of ptrace_exit() · b1b4c679
      Oleg Nesterov committed
      No functional changes, preparation for the next patch.
      
      Move the "should we release this child" logic into the separate handler,
      __ptrace_detach().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: simplify ptrace_exit()->ignoring_children() path · 6d69cb87
      Oleg Nesterov committed
      ignoring_children() takes parent->sighand->siglock and checks
      k_sigaction[SIGCHLD] atomically.  But this buys nothing, we can't get the
      "really" wrong result even if we race with sigaction(SIGCHLD).  If we read
      the "stale" sa_handler/sa_flags we can pretend it was changed right after
      the check.
      
      Remove spin_lock(->siglock), and kill "int ign" which caches the result of
      ignoring_children() which becomes rather trivial.
      
      Perhaps it makes sense to export this helper, do_notify_parent() can use
      it too.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • do_wait: fix waiting for the group stop with the dead leader · 90bc8d8b
      Oleg Nesterov committed
      do_wait(WSTOPPED) assumes that p->state must be == TASK_STOPPED, this is
      not true if the leader is already dead.  Check SIGNAL_STOP_STOPPED instead
      and use signal->group_exit_code.
      
      Trivial test-case:
      
      	#include <pthread.h>
      	#include <unistd.h>
      
      	void *tfunc(void *arg)
      	{
      		pause();
      		return NULL;
      	}
      
      	int main(void)
      	{
      		pthread_t thr;
      		pthread_create(&thr, NULL, tfunc, NULL);
      		pthread_exit(NULL);
      		return 0;
      	}
      
      It doesn't react to ^Z (and then to ^C or ^\). The task is stopped, but
      bash can't see this.
      
      The bug is very old, and it was reported multiple times. This patch was sent
      more than a year ago (http://marc.info/?t=119713920000003) but it was ignored.
      
      This change also fixes other oddities (but not all) in this area.  For
      example, before this patch:
      
      	$ sleep 100
      	^Z
      	[1]+  Stopped                 sleep 100
      	$ strace -p `pidof sleep`
      	Process 11442 attached - interrupt to quit
      
      strace hangs in do_wait(), because ->exit_code was already consumed by
      bash.  After this patch, strace happily proceeds:
      
      	--- SIGTSTP (Stopped) @ 0 (0) ---
      	restart_syscall(<... resuming interrupted call ...>
      
      To me, this looks much more "natural" and correct.
      
      Another example.  Let's suppose we have the main thread M and sub-thread
      T, the process is stopped, and its parent did wait(WSTOPPED).  Now we can
      ptrace T but not M.  This looks at least strange to me.
      
      Imho, do_wait() should not confuse the per-thread ptrace stops with the
      per-process job control stops.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Kaz Kylheku <kkylheku@gmail.com>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 01 Apr, 2009 2 commits
  13. 24 Mar, 2009 1 commit
    • genirq: add threaded interrupt handler support · 3aa551c9
      Thomas Gleixner committed
      Add support for threaded interrupt handlers:
      
      A device driver can request that its main interrupt handler runs in a
      thread. To achieve this the device driver requests the interrupt with
      request_threaded_irq() and provides, in addition to the handler, a
      thread function. The handler function is called in hard interrupt
      context and needs to check whether the interrupt originated from the
      device. If the interrupt originated from the device then the handler
      can either return IRQ_HANDLED or IRQ_WAKE_THREAD. IRQ_HANDLED is
      returned when no further action is required. IRQ_WAKE_THREAD causes
      the genirq code to invoke the threaded (main) handler. When
      IRQ_WAKE_THREAD is returned, the handler must have disabled the interrupt
      on the device level. This is mandatory for shared interrupt handlers,
      but we need to do it as well for obscure x86 hardware where disabling
      an interrupt on the IO_APIC level redirects the interrupt to the
      legacy PIC interrupt lines.
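
The two-stage flow described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: every `model_` and `MODEL_` name is hypothetical, the device is a plain struct, and the thread function is called directly where the genirq code would wake a kernel thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three outcomes named in the commit text. */
enum model_irqreturn {
    MODEL_IRQ_NONE,         /* interrupt did not come from this device */
    MODEL_IRQ_HANDLED,      /* fully handled in the primary handler */
    MODEL_IRQ_WAKE_THREAD   /* defer the rest to the thread function */
};

struct model_dev {
    bool irq_pending;       /* device raised the interrupt */
    bool irq_enabled;       /* device-level interrupt enable */
    bool needs_slow_work;   /* handling requires the thread function */
    int  thread_runs;       /* how often the thread function ran */
};

/* Primary handler: runs in (modelled) hard interrupt context. */
enum model_irqreturn model_primary(struct model_dev *d)
{
    if (!d->irq_pending)
        return MODEL_IRQ_NONE;
    d->irq_pending = false;
    if (!d->needs_slow_work)
        return MODEL_IRQ_HANDLED;
    d->irq_enabled = false; /* mask at device level before waking thread */
    return MODEL_IRQ_WAKE_THREAD;
}

/* Thread function: does the slow work, then re-enables the device. */
void model_thread_fn(struct model_dev *d)
{
    d->thread_runs++;
    d->needs_slow_work = false;
    d->irq_enabled = true;
}

/* Stand-in for the core dispatch logic. */
enum model_irqreturn model_dispatch(struct model_dev *d)
{
    enum model_irqreturn ret = model_primary(d);

    if (ret == MODEL_IRQ_WAKE_THREAD) {
        assert(!d->irq_enabled);  /* rule from the text: must be masked */
        model_thread_fn(d);       /* a real kernel wakes a thread instead */
    }
    return ret;
}
```

The assertion in `model_dispatch()` encodes the requirement that the primary handler has disabled the interrupt at the device level before returning the wake-thread result.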
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
  14. 05 Feb, 2009 1 commit
    • signal: re-add dead task accumulation stats. · 32bd671d
      Peter Zijlstra committed
      We're going to split the process wide cpu accounting into two parts:
      
       - clocks; which can take all the time they want since they run
                 from user context.
      
       - timers; which need constant time tracing but can afford the overhead
                 because they're default off -- and rare.
      
      The clock readout will go back to a full sum of the thread group, for this
      we need to re-add the exit stats that were removed in the initial itimer
      rework (f06febc9: timers: fix itimer/many thread hang).
      
      Furthermore, since that full sum can be rather slow for large thread groups
      and we have the complete dead task stats, revert the do_notify_parent time
      computation.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 14 Jan, 2009 3 commits
  16. 07 Jan, 2009 1 commit