1. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  2. 02 9月, 2009 1 次提交
    • D
      CRED: Add some configurable debugging [try #6] · e0e81739
      David Howells 提交于
      Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
      for credential management.  The additional code keeps track of the number of
      pointers from task_structs to any given cred struct, and checks to see that
      this number never exceeds the usage count of the cred struct (which includes
      all references, not just those from task_structs).
      
      Furthermore, if SELinux is enabled, the code also checks that the security
      pointer in the cred struct is never seen to be invalid.
      
      This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
      kernel thread on seeing cred->security be a NULL pointer (it appears that the
      credential struct has been previously released):
      
      	http://www.kerneloops.org/oops.php?number=252883Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      e0e81739
  3. 27 8月, 2009 1 次提交
    • O
      clone(): fix race between copy_process() and de_thread() · 4ab6c083
      Oleg Nesterov 提交于
      Spotted by Hiroshi Shimamoto who also provided the test-case below.
      
      copy_process() uses signal->count as a reference counter, but it is not.
      This test case
      
      	#include <sys/types.h>
      	#include <sys/wait.h>
      	#include <unistd.h>
      	#include <stdio.h>
      	#include <errno.h>
      	#include <pthread.h>
      
      	void *null_thread(void *p)
      	{
      		for (;;)
      			sleep(1);
      
      		return NULL;
      	}
      
      	void *exec_thread(void *p)
      	{
      		execl("/bin/true", "/bin/true", NULL);
      
      		return null_thread(p);
      	}
      
      	int main(int argc, char **argv)
      	{
      		for (;;) {
      			pid_t pid;
      			int ret, status;
      
      			pid = fork();
      			if (pid < 0)
      				break;
      
      			if (!pid) {
      				pthread_t tid;
      
      				pthread_create(&tid, NULL, exec_thread, NULL);
      				for (;;)
      					pthread_create(&tid, NULL, null_thread, NULL);
      			}
      
      			do {
      				ret = waitpid(pid, &status, 0);
      			} while (ret == -1 && errno == EINTR);
      		}
      
      		return 0;
      	}
      
      quickly creates an unkillable task.
      
      If copy_process(CLONE_THREAD) races with de_thread()
      copy_signal()->atomic(signal->count) breaks the signal->notify_count
      logic, and the execing thread can hang forever in kernel space.
      
      Change copy_process() to increment count/live only when we know for sure
      we can't fail.  In this case the forked thread will take care of its
      reference to signal correctly.
      
      If copy_process() fails, check CLONE_THREAD flag.  If it it set - do
      nothing, the counters were not changed and current belongs to the same
      thread group.  If it is not set, ->signal must be released in any case
      (and ->count must be == 1), the forked child is the only thread in the
      thread group.
      
      We need more cleanups here, in particular signal->count should not be used
      by de_thread/__exit_signal at all.  This patch only fixes the bug.
      Reported-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Tested-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ab6c083
  4. 23 8月, 2009 1 次提交
    • P
      rcu: Merge preemptable-RCU functionality into hierarchical RCU · f41d911f
      Paul E. McKenney 提交于
      Create a kernel/rcutree_plugin.h file that contains definitions
      for preemptable RCU (or, under the #else branch of the #ifdef,
      empty definitions for the classic non-preemptable semantics).
      These definitions fit into plugins defined in kernel/rcutree.c
      for this purpose.
      
      This variant of preemptable RCU uses a new algorithm whose
      read-side expense is roughly that of classic hierarchical RCU
      under CONFIG_PREEMPT. This new algorithm's update-side expense
      is similar to that of classic hierarchical RCU, and, in absence
      of read-side preemption or blocking, is exactly that of classic
      hierarchical RCU.  Perhaps more important, this new algorithm
      has a much simpler implementation, saving well over 1,000 lines
      of code compared to mainline's implementation of preemptable
      RCU, which will hopefully be retired in favor of this new
      algorithm.
      
      The simplifications are obtained by maintaining per-task
      nesting state for running tasks, and using a simple
      lock-protected algorithm to handle accounting when tasks block
      within RCU read-side critical sections, making use of lessons
      learned while creating numerous user-level RCU implementations
      over the past 18 months.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746134003-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f41d911f
  5. 19 8月, 2009 1 次提交
    • K
      mm: revert "oom: move oom_adj value" · 0753ba01
      KOSAKI Motohiro 提交于
      The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to
      the mm_struct.  It was a very good first step for sanitize OOM.
      
      However Paul Menage reported the commit makes regression to his job
      scheduler.  Current OOM logic can kill OOM_DISABLED process.
      
      Why? His program has the code of similar to the following.
      
      	...
      	set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
      	...
      	if (vfork() == 0) {
      		set_oom_adj(0); /* Invoked child can be killed */
      		execve("foo-bar-cmd");
      	}
      	....
      
      vfork() parent and child are shared the same mm_struct.  then above
      set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also
      change oom_adj for vfork() parent.  Then, vfork() parent (job scheduler)
      lost OOM immune and it was killed.
      
      Actually, fork-setting-exec idiom is very frequently used in userland program.
      We must not break this assumption.
      
      Then, this patch revert commit 2ff05b2b and related commit.
      
      Reverted commit list
      ---------------------
      - commit 2ff05b2b (oom: move oom_adj value from task_struct to mm_struct)
      - commit 4d8b9135 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
      - commit 81236810 (oom: only oom kill exiting tasks with attached memory)
      - commit 933b787b (mm: copy over oom_adj value at fork time)
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0753ba01
  6. 08 8月, 2009 1 次提交
    • E
      execve: must clear current->clear_child_tid · 9c8a8228
      Eric Dumazet 提交于
      While looking at Jens Rosenboom bug report
      (http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
      a dying "ps" program, we found following problem.
      
      clone() syscall has special support for TID of created threads.  This
      support includes two features.
      
      One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
      TID value.
      
      One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
      thread dies.
      
      The integer location is a user provided pointer, provided at clone()
      time.
      
      kernel keeps this pointer value into current->clear_child_tid.
      
      At execve() time, we should make sure kernel doesnt keep this user
      provided pointer, as full user memory is replaced by a new one.
      
      As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
      CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
      memory in forked processes.
      
      Following sequence could happen:
      
      1) bash (or any program) starts a new process, by a fork() call that
         glibc maps to a clone( ...  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
         ...) syscall
      
      2) When new process starts, its current->clear_child_tid is set to a
         location that has a meaning only in bash (or initial program) context
         (&THREAD_SELF->tid)
      
      3) This new process does the execve() syscall to start a new program.
         current->clear_child_tid is left unchanged (a non NULL value)
      
      4) If this new program creates some threads, and initial thread exits,
         kernel will attempt to clear the integer pointed by
         current->clear_child_tid from mm_release() :
      
              if (tsk->clear_child_tid
                  && !(tsk->flags & PF_SIGNALED)
                  && atomic_read(&mm->mm_users) > 1) {
                      u32 __user * tidptr = tsk->clear_child_tid;
                      tsk->clear_child_tid = NULL;
      
                      /*
                       * We don't check the error code - if userspace has
                       * not set up a proper pointer then tough luck.
                       */
      << here >>      put_user(0, tidptr);
                      sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
              }
      
      5) OR : if new program is not multi-threaded, but spied by /proc/pid
         users (ps command for example), mm_users > 1, and the exiting program
         could corrupt 4 bytes in a persistent memory area (shm or memory mapped
         file)
      
      If current->clear_child_tid points to a writeable portion of memory of the
      new program, kernel happily and silently corrupts 4 bytes of memory, with
      unexpected effects.
      
      Fix is straightforward and should not break any sane program.
      Reported-by: NJens Rosenboom <jens@mcbone.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sonny Rao <sonnyrao@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c8a8228
  7. 02 8月, 2009 1 次提交
    • P
      perf_counter: Full task tracing · 9f498cc5
      Peter Zijlstra 提交于
      In order to be able to distinguish between no samples due to
      inactivity and no samples due to task ended, Arjan asked for
      PERF_EVENT_EXIT events. This is useful to the boot delay
      instrumentation (bootchart) app.
      
      This patch changes the PERF_EVENT_FORK to be emitted on every
      clone, and adds PERF_EVENT_EXIT to be emitted on task exit,
      after the task's counters have been closed.
      
      This task tracing is controlled through: attr.comm || attr.mmap
      and through the new attr.task field.
      Suggested-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      [ cleaned up perf_counter.h a bit ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9f498cc5
  8. 30 7月, 2009 1 次提交
  9. 18 7月, 2009 1 次提交
  10. 09 7月, 2009 1 次提交
  11. 19 6月, 2009 1 次提交
  12. 15 6月, 2009 1 次提交
    • V
      kmemcheck: add mm functions · 2dff4405
      Vegard Nossum 提交于
      With kmemcheck enabled, the slab allocator needs to do this:
      
      1. Tell kmemcheck to allocate the shadow memory which stores the status of
         each byte in the allocation proper, e.g. whether it is initialized or
         uninitialized.
      2. Tell kmemcheck which parts of memory that should be marked uninitialized.
         There are actually a few more states, such as "not yet allocated" and
         "recently freed".
      
      If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
      memory that can take page faults because of kmemcheck.
      
      If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
      request memory with the __GFP_NOTRACK flag. This does not prevent the page
      faults from occuring, however, but marks the object in question as being
      initialized so that no warnings will ever be produced for this object.
      
      In addition to (and in contrast to) __GFP_NOTRACK, the
      __GFP_NOTRACK_FALSE_POSITIVE flag indicates that the allocation should
      not be tracked _because_ it would produce a false positive. Their values
      are identical, but need not be so in the future (for example, we could now
      enable/disable false positives with a config option).
      
      Parts of this patch were contributed by Pekka Enberg but merged for
      atomicity.
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      
      [rebased for mainline inclusion]
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      2dff4405
  13. 05 6月, 2009 1 次提交
    • O
      ptrace: tracehook_report_clone: fix false positives · 087eb437
      Oleg Nesterov 提交于
      The "trace || CLONE_PTRACE" check in tracehook_report_clone() is not right,
      
      - If the untraced task does clone(CLONE_PTRACE) the new child is not traced,
        we must not queue SIGSTOP.
      
      - If we forked the traced task, but the tracer exits and untraces both the
        forking task and the new child (after copy_process() drops tasklist_lock),
        we should not queue SIGSTOP too.
      
      Change the code to check task_ptrace() != 0 instead. This is still racy, but
      the race is harmless.
      
      We can race with another tracer attaching to this child, or the tracer can
      exit and detach in parallel. But giwen that we didn't do wake_up_new_task()
      yet, the child must have the pending SIGSTOP anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      087eb437
  14. 04 6月, 2009 1 次提交
  15. 03 6月, 2009 2 次提交
    • P
      perf_counter: Add a comm hook for pure fork()s · 226f62fd
      Peter Zijlstra 提交于
      I noticed missing COMM events and found that we missed
      reporting them for pure forks.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      226f62fd
    • S
      function-graph: move initialization of new tasks up in fork · f7e8b616
      Steven Rostedt 提交于
      When the function graph tracer is enabled, all new tasks must allocate
      a ret_stack to place the return address of functions. This is because
      the function graph tracer will replace the real return address with a
      call to the tracing of the exit function.
      
      This initialization happens in fork, but it happens too late. If fork
      fails, then it will call free_task and that calls the freeing of this
      ret_stack. But before initialization happens, the new (failed) task
      points to its parents ret_stack. If a fork failure happens during
      the function trace, it would be catastrophic for the parent.
      
      Also, there's no need to call ftrace_graph_exit_task from fork, since
      it is called by free_task which fork calls on failure.
      
      [ Impact: prevent crash during failed fork running function graph tracer ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f7e8b616
  16. 29 5月, 2009 1 次提交
    • P
      perf_counter: Ammend cleanup in fork() fail · bbbee908
      Peter Zijlstra 提交于
      When fork() fails we cannot use perf_counter_exit_task() since that
      assumes to operate on current. Write a new helper that cleans up
      unused/clean contexts.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bbbee908
  17. 25 5月, 2009 2 次提交
  18. 22 5月, 2009 1 次提交
    • P
      perf_counter: Dynamically allocate tasks' perf_counter_context struct · a63eaf34
      Paul Mackerras 提交于
      This replaces the struct perf_counter_context in the task_struct with
      a pointer to a dynamically allocated perf_counter_context struct.  The
      main reason for doing is this is to allow us to transfer a
      perf_counter_context from one task to another when we do lazy PMU
      switching in a later patch.
      
      This has a few side-benefits: the task_struct becomes a little smaller,
      we save some memory because only tasks that have perf_counters attached
      get a perf_counter_context allocated for them, and we can remove the
      inclusion of <linux/perf_counter.h> in sched.h, meaning that we don't
      end up recompiling nearly everything whenever perf_counter.h changes.
      
      The perf_counter_context structures are reference-counted and freed
      when the last reference is dropped.  A context can have references
      from its task and the counters on its task.  Counters can outlive the
      task so it is possible that a context will be freed well after its
      task has exited.
      
      Contexts are allocated on fork if the parent had a context, or
      otherwise the first time that a per-task counter is created on a task.
      In the latter case, we set the context pointer in the task struct
      locklessly using an atomic compare-and-exchange operation in case we
      raced with some other task in creating a context for the subject task.
      
      This also removes the task pointer from the perf_counter struct.  The
      task pointer was not used anywhere and would make it harder to move a
      context from one task to another.  Anything that needed to know which
      task a counter was attached to was already using counter->ctx->task.
      
      The __perf_counter_init_context function moves up in perf_counter.c
      so that it can be called from find_get_context, and now initializes
      the refcount, but is otherwise unchanged.
      
      We were potentially calling list_del_counter twice: once from
      __perf_counter_exit_task when the task exits and once from
      __perf_counter_remove_from_context when the counter's fd gets closed.
      This adds a check in list_del_counter so it doesn't do anything if
      the counter has already been removed from the lists.
      
      Since perf_counter_task_sched_in doesn't do anything if the task doesn't
      have a context, and leaves cpuctx->task_ctx = NULL, this adds code to
      __perf_install_in_context to set cpuctx->task_ctx if necessary, i.e. in
      the case where the current task adds the first counter to itself and
      thus creates a context for itself.
      
      This also adds similar code to __perf_counter_enable to handle a
      similar situation which can arise when the counters have been disabled
      using prctl; that also leaves cpuctx->task_ctx = NULL.
      
      [ Impact: refactor counter context management to prepare for new feature ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10075.781053.231153@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a63eaf34
  19. 15 4月, 2009 2 次提交
    • S
      tracing/events: move trace point headers into include/trace/events · ad8d75ff
      Steven Rostedt 提交于
      Impact: clean up
      
      Create a sub directory in include/trace called events to keep the
      trace point headers in their own separate directory. Only headers that
      declare trace points should be defined in this directory.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ad8d75ff
    • S
      tracing: create automated trace defines · a8d154b0
      Steven Rostedt 提交于
      This patch lowers the number of places a developer must modify to add
      new tracepoints. The current method to add a new tracepoint
      into an existing system is to write the trace point macro in the
      trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
      DECLARE_TRACE, then they must add the same named item into the C file
      with the macro DEFINE_TRACE(name) and then add the trace point.
      
      This change cuts out the needing to add the DEFINE_TRACE(name).
      Every file that uses the tracepoint must still include the trace/<type>.h
      file, but the one C file must also add a define before the including
      of that file.
      
       #define CREATE_TRACE_POINTS
       #include <trace/mytrace.h>
      
      This will cause the trace/mytrace.h file to also produce the C code
      necessary to implement the trace point.
      
      Note, if more than one trace/<type>.h is used to create the C code
      it is best to list them all together.
      
       #define CREATE_TRACE_POINTS
       #include <trace/foo.h>
       #include <trace/bar.h>
       #include <trace/fido.h>
      
      Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
      the cleaner solution of the define above the includes over my first
      design to have the C code include a "special" header.
      
      This patch converts sched, irq and lockdep and skb to use this new
      method.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a8d154b0
  20. 08 4月, 2009 1 次提交
  21. 07 4月, 2009 1 次提交
  22. 03 4月, 2009 4 次提交
    • O
      pids: kill signal_struct-> __pgrp/__session and friends · 1b0f7ffd
      Oleg Nesterov 提交于
      We are wasting 2 words in signal_struct without any reason to implement
      task_pgrp_nr() and task_session_nr().
      
      task_session_nr() has no callers since
      2e2ba22e, we can remove it.
      
      task_pgrp_nr() is still (I believe wrongly) used in fs/autofsX and
      fs/coda.
      
      This patch reimplements task_pgrp_nr() via task_pgrp_nr_ns(), and kills
      __pgrp/__session and the related helpers.
      
      The change in drivers/char/tty_io.c is cosmetic, but hopefully makes sense
      anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: Alan Cox <number6@the-village.bc.nu>		[tty parts]
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b0f7ffd
    • S
      signals: protect cinit from blocked fatal signals · b3bfa0cb
      Sukadev Bhattiprolu 提交于
      Normally SIG_DFL signals to global and container-init are dropped early.
      But if a signal is blocked when it is posted, we cannot drop the signal
      since the receiver may install a handler before unblocking the signal.
      Once this signal is queued however, the receiver container-init has no way
      of knowing if the signal was sent from an ancestor or descendant
      namespace.  This patch ensures that contianer-init drops all SIG_DFL
      signals in get_signal_to_deliver() except SIGKILL/SIGSTOP.
      
      If SIGSTOP/SIGKILL originate from a descendant of container-init they are
      never queued (i.e dropped in sig_ignored() in an earler patch).
      
      If SIGSTOP/SIGKILL originate from parent namespace, the signal is queued
      and container-init processes the signal.
      
      IOW, if get_signal_to_deliver() sees a sig_kernel_only() signal for global
      or container-init, the signal must have been generated internally or must
      have come from an ancestor ns and we process the signal.
      
      Further, the signal_group_exit() check was needed to cover the case of a
      multi-threaded init sending SIGKILL to other threads when doing an exit()
      or exec().  But since the new sig_kernel_only() check covers the SIGKILL,
      the signal_group_exit() check is no longer needed and can be removed.
      
      Finally, now that we have all pieces in place, set SIGNAL_UNKILLABLE for
      container-inits.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3bfa0cb
    • A
      Simplify copy_thread() · 6f2c55b8
      Alexey Dobriyan 提交于
      First argument unused since 2.3.11.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6f2c55b8
    • D
      nommu: fix a number of issues with the per-MM VMA patch · 33e5d769
      David Howells 提交于
      Fix a number of issues with the per-MM VMA patch:
      
       (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
           a NOMMU system with more than 2G pages.  Makes no difference on a 32-bit
           system.
      
       (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
           lest it overflow.
      
       (3) Move the allocation of the vm_area_struct slab back for fork.c.
      
       (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.
      
       (5) Use BUG_ON() rather than if () BUG().
      
       (6) Make the default validate_nommu_regions() a static inline rather than a
           #define.
      
       (7) Make free_page_series()'s objection to pages with a refcount != 1 more
           informative.
      
       (8) Adjust the __put_nommu_region() banner comment to indicate that the
           semaphore must be held for writing.
      
       (9) Limit the number of warnings about munmaps of non-mmapped regions.
      Reported-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33e5d769
  23. 01 4月, 2009 3 次提交
    • A
      Get rid of indirect include of fs_struct.h · 5ad4e53b
      Al Viro 提交于
      Don't pull it in sched.h; very few files actually need it and those
      can include directly.  sched.h itself only needs forward declaration
      of struct fs_struct;
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5ad4e53b
    • A
      New locking/refcounting for fs_struct · 498052bb
      Al Viro 提交于
      * all changes of current->fs are done under task_lock and write_lock of
        old fs->lock
      * refcount is not atomic anymore (same protection)
      * its decrements are done when removing reference from current; at the
        same time we decide whether to free it.
      * put_fs_struct() is gone
      * new field - ->in_exec.  Set by check_unsafe_exec() if we are trying to do
        execve() and only subthreads share fs_struct.  Cleared when finishing exec
        (success and failure alike).  Makes CLONE_FS fail with -EAGAIN if set.
      * check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
        is in progress.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      498052bb
    • A
      Take fs_struct handling to new file (fs/fs_struct.c) · 3e93cd67
      Al Viro 提交于
      Pure code move; two new helper functions for nfsd and daemonize
      (unshare_fs_struct() and daemonize_fs_struct() resp.; for now -
      the same code as used to be in callers).  unshare_fs_struct()
      exported (for nfsd, as copy_fs_struct()/exit_fs() used to be),
      copy_fs_struct() and exit_fs() don't need exports anymore.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3e93cd67
  24. 30 3月, 2009 1 次提交
  25. 10 3月, 2009 1 次提交
  26. 11 2月, 2009 1 次提交
    • O
      ptrace, x86: fix the usage of ptrace_fork() · 06eb23b1
      Oleg Nesterov 提交于
      I noticed by pure accident we have ptrace_fork() and friends. This was
      added by "x86, bts: add fork and exit handling", commit
      bf53de90.
      
      I can't test this, ds_request_bts() returns -EOPNOTSUPP, but I strongly
      believe this needs the fix. I think something like this program
      
      	int main(void)
      	{
      		int pid = fork();
      
      		if (!pid) {
      			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
      			kill(getpid(), SIGSTOP);
      			fork();
      		} else {
      			struct ptrace_bts_config bts = {
      				.flags = PTRACE_BTS_O_ALLOC,
      				.size  = 4 * 4096,
      			};
      
      			wait(NULL);
      
      			ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEFORK);
      			ptrace(PTRACE_BTS_CONFIG, pid, &bts, sizeof(bts));
      			ptrace(PTRACE_CONT, pid, NULL, NULL);
      
      			sleep(1);
      		}
      
      		return 0;
      	}
      
      should crash the kernel.
      
      If the task is traced by its natural parent ptrace_reparented() returns 0
      but we should clear ->btsxxx anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      06eb23b1
  27. 09 2月, 2009 1 次提交
  28. 07 2月, 2009 1 次提交
  29. 05 2月, 2009 1 次提交
    • P
      signal: re-add dead task accumulation stats. · 32bd671d
      Peter Zijlstra 提交于
      We're going to split the process wide cpu accounting into two parts:
      
       - clocks; which can take all the time they want since they run
                 from user context.
      
       - timers; which need constant time tracing but can affort the overhead
                 because they're default off -- and rare.
      
      The clock readout will go back to a full sum of the thread group, for this
      we need to re-add the exit stats that were removed in the initial itimer
      rework (f06febc9: timers: fix itimer/many thread hang).
      
      Furthermore, since that full sum can be rather slow for large thread groups
      and we have the complete dead task stats, revert the do_notify_parent time
      computation.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      32bd671d
  30. 17 1月, 2009 1 次提交
  31. 14 1月, 2009 2 次提交