1. 23 8月, 2009 1 次提交
    • P
      rcu: Merge preemptable-RCU functionality into hierarchical RCU · f41d911f
      Paul E. McKenney 提交于
      Create a kernel/rcutree_plugin.h file that contains definitions
      for preemptable RCU (or, under the #else branch of the #ifdef,
      empty definitions for the classic non-preemptable semantics).
      These definitions fit into plugins defined in kernel/rcutree.c
      for this purpose.
      
      This variant of preemptable RCU uses a new algorithm whose
      read-side expense is roughly that of classic hierarchical RCU
      under CONFIG_PREEMPT. This new algorithm's update-side expense
      is similar to that of classic hierarchical RCU, and, in absence
      of read-side preemption or blocking, is exactly that of classic
      hierarchical RCU.  Perhaps more important, this new algorithm
      has a much simpler implementation, saving well over 1,000 lines
      of code compared to mainline's implementation of preemptable
      RCU, which will hopefully be retired in favor of this new
      algorithm.
      
      The simplifications are obtained by maintaining per-task
      nesting state for running tasks, and using a simple
      lock-protected algorithm to handle accounting when tasks block
      within RCU read-side critical sections, making use of lessons
      learned while creating numerous user-level RCU implementations
      over the past 18 months.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746134003-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f41d911f
  2. 08 8月, 2009 1 次提交
    • E
      execve: must clear current->clear_child_tid · 9c8a8228
      Eric Dumazet 提交于
      While looking at Jens Rosenboom bug report
      (http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
      a dying "ps" program, we found following problem.
      
      clone() syscall has special support for TID of created threads.  This
      support includes two features.
      
      One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
      TID value.
      
      One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
      thread dies.
      
      The integer location is a user provided pointer, provided at clone()
      time.
      
      kernel keeps this pointer value into current->clear_child_tid.
      
      At execve() time, we should make sure kernel doesnt keep this user
      provided pointer, as full user memory is replaced by a new one.
      
      As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
      CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
      memory in forked processes.
      
      Following sequence could happen:
      
      1) bash (or any program) starts a new process, by a fork() call that
         glibc maps to a clone( ...  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
         ...) syscall
      
      2) When new process starts, its current->clear_child_tid is set to a
         location that has a meaning only in bash (or initial program) context
         (&THREAD_SELF->tid)
      
      3) This new process does the execve() syscall to start a new program.
         current->clear_child_tid is left unchanged (a non NULL value)
      
      4) If this new program creates some threads, and initial thread exits,
         kernel will attempt to clear the integer pointed by
         current->clear_child_tid from mm_release() :
      
              if (tsk->clear_child_tid
                  && !(tsk->flags & PF_SIGNALED)
                  && atomic_read(&mm->mm_users) > 1) {
                      u32 __user * tidptr = tsk->clear_child_tid;
                      tsk->clear_child_tid = NULL;
      
                      /*
                       * We don't check the error code - if userspace has
                       * not set up a proper pointer then tough luck.
                       */
      << here >>      put_user(0, tidptr);
                      sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
              }
      
      5) OR : if new program is not multi-threaded, but spied by /proc/pid
         users (ps command for example), mm_users > 1, and the exiting program
         could corrupt 4 bytes in a persistent memory area (shm or memory mapped
         file)
      
      If current->clear_child_tid points to a writeable portion of memory of the
      new program, kernel happily and silently corrupts 4 bytes of memory, with
      unexpected effects.
      
      Fix is straightforward and should not break any sane program.
      Reported-by: NJens Rosenboom <jens@mcbone.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sonny Rao <sonnyrao@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c8a8228
  3. 02 8月, 2009 1 次提交
    • P
      perf_counter: Full task tracing · 9f498cc5
      Peter Zijlstra 提交于
      In order to be able to distinguish between no samples due to
      inactivity and no samples due to task ended, Arjan asked for
      PERF_EVENT_EXIT events. This is useful to the boot delay
      instrumentation (bootchart) app.
      
      This patch changes the PERF_EVENT_FORK to be emitted on every
      clone, and adds PERF_EVENT_EXIT to be emitted on task exit,
      after the task's counters have been closed.
      
      This task tracing is controlled through: attr.comm || attr.mmap
      and through the new attr.task field.
      Suggested-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      [ cleaned up perf_counter.h a bit ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9f498cc5
  4. 30 7月, 2009 1 次提交
  5. 18 7月, 2009 1 次提交
  6. 09 7月, 2009 1 次提交
  7. 19 6月, 2009 1 次提交
  8. 15 6月, 2009 1 次提交
    • V
      kmemcheck: add mm functions · 2dff4405
      Vegard Nossum 提交于
      With kmemcheck enabled, the slab allocator needs to do this:
      
      1. Tell kmemcheck to allocate the shadow memory which stores the status of
         each byte in the allocation proper, e.g. whether it is initialized or
         uninitialized.
      2. Tell kmemcheck which parts of memory that should be marked uninitialized.
         There are actually a few more states, such as "not yet allocated" and
         "recently freed".
      
      If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
      memory that can take page faults because of kmemcheck.
      
      If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
      request memory with the __GFP_NOTRACK flag. This does not prevent the page
      faults from occuring, however, but marks the object in question as being
      initialized so that no warnings will ever be produced for this object.
      
      In addition to (and in contrast to) __GFP_NOTRACK, the
      __GFP_NOTRACK_FALSE_POSITIVE flag indicates that the allocation should
      not be tracked _because_ it would produce a false positive. Their values
      are identical, but need not be so in the future (for example, we could now
      enable/disable false positives with a config option).
      
      Parts of this patch were contributed by Pekka Enberg but merged for
      atomicity.
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      
      [rebased for mainline inclusion]
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      2dff4405
  9. 05 6月, 2009 1 次提交
    • O
      ptrace: tracehook_report_clone: fix false positives · 087eb437
      Oleg Nesterov 提交于
      The "trace || CLONE_PTRACE" check in tracehook_report_clone() is not right,
      
      - If the untraced task does clone(CLONE_PTRACE) the new child is not traced,
        we must not queue SIGSTOP.
      
      - If we forked the traced task, but the tracer exits and untraces both the
        forking task and the new child (after copy_process() drops tasklist_lock),
        we should not queue SIGSTOP too.
      
      Change the code to check task_ptrace() != 0 instead. This is still racy, but
      the race is harmless.
      
      We can race with another tracer attaching to this child, or the tracer can
      exit and detach in parallel. But giwen that we didn't do wake_up_new_task()
      yet, the child must have the pending SIGSTOP anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      087eb437
  10. 04 6月, 2009 1 次提交
  11. 03 6月, 2009 2 次提交
    • P
      perf_counter: Add a comm hook for pure fork()s · 226f62fd
      Peter Zijlstra 提交于
      I noticed missing COMM events and found that we missed
      reporting them for pure forks.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      226f62fd
    • S
      function-graph: move initialization of new tasks up in fork · f7e8b616
      Steven Rostedt 提交于
      When the function graph tracer is enabled, all new tasks must allocate
      a ret_stack to place the return address of functions. This is because
      the function graph tracer will replace the real return address with a
      call to the tracing of the exit function.
      
      This initialization happens in fork, but it happens too late. If fork
      fails, then it will call free_task and that calls the freeing of this
      ret_stack. But before initialization happens, the new (failed) task
      points to its parents ret_stack. If a fork failure happens during
      the function trace, it would be catastrophic for the parent.
      
      Also, there's no need to call ftrace_graph_exit_task from fork, since
      it is called by free_task which fork calls on failure.
      
      [ Impact: prevent crash during failed fork running function graph tracer ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f7e8b616
  12. 29 5月, 2009 1 次提交
    • P
      perf_counter: Ammend cleanup in fork() fail · bbbee908
      Peter Zijlstra 提交于
      When fork() fails we cannot use perf_counter_exit_task() since that
      assumes to operate on current. Write a new helper that cleans up
      unused/clean contexts.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bbbee908
  13. 25 5月, 2009 2 次提交
  14. 22 5月, 2009 1 次提交
    • P
      perf_counter: Dynamically allocate tasks' perf_counter_context struct · a63eaf34
      Paul Mackerras 提交于
      This replaces the struct perf_counter_context in the task_struct with
      a pointer to a dynamically allocated perf_counter_context struct.  The
      main reason for doing is this is to allow us to transfer a
      perf_counter_context from one task to another when we do lazy PMU
      switching in a later patch.
      
      This has a few side-benefits: the task_struct becomes a little smaller,
      we save some memory because only tasks that have perf_counters attached
      get a perf_counter_context allocated for them, and we can remove the
      inclusion of <linux/perf_counter.h> in sched.h, meaning that we don't
      end up recompiling nearly everything whenever perf_counter.h changes.
      
      The perf_counter_context structures are reference-counted and freed
      when the last reference is dropped.  A context can have references
      from its task and the counters on its task.  Counters can outlive the
      task so it is possible that a context will be freed well after its
      task has exited.
      
      Contexts are allocated on fork if the parent had a context, or
      otherwise the first time that a per-task counter is created on a task.
      In the latter case, we set the context pointer in the task struct
      locklessly using an atomic compare-and-exchange operation in case we
      raced with some other task in creating a context for the subject task.
      
      This also removes the task pointer from the perf_counter struct.  The
      task pointer was not used anywhere and would make it harder to move a
      context from one task to another.  Anything that needed to know which
      task a counter was attached to was already using counter->ctx->task.
      
      The __perf_counter_init_context function moves up in perf_counter.c
      so that it can be called from find_get_context, and now initializes
      the refcount, but is otherwise unchanged.
      
      We were potentially calling list_del_counter twice: once from
      __perf_counter_exit_task when the task exits and once from
      __perf_counter_remove_from_context when the counter's fd gets closed.
      This adds a check in list_del_counter so it doesn't do anything if
      the counter has already been removed from the lists.
      
      Since perf_counter_task_sched_in doesn't do anything if the task doesn't
      have a context, and leaves cpuctx->task_ctx = NULL, this adds code to
      __perf_install_in_context to set cpuctx->task_ctx if necessary, i.e. in
      the case where the current task adds the first counter to itself and
      thus creates a context for itself.
      
      This also adds similar code to __perf_counter_enable to handle a
      similar situation which can arise when the counters have been disabled
      using prctl; that also leaves cpuctx->task_ctx = NULL.
      
      [ Impact: refactor counter context management to prepare for new feature ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10075.781053.231153@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a63eaf34
  15. 15 4月, 2009 2 次提交
    • S
      tracing/events: move trace point headers into include/trace/events · ad8d75ff
      Steven Rostedt 提交于
      Impact: clean up
      
      Create a sub directory in include/trace called events to keep the
      trace point headers in their own separate directory. Only headers that
      declare trace points should be defined in this directory.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ad8d75ff
    • S
      tracing: create automated trace defines · a8d154b0
      Steven Rostedt 提交于
      This patch lowers the number of places a developer must modify to add
      new tracepoints. The current method to add a new tracepoint
      into an existing system is to write the trace point macro in the
      trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
      DECLARE_TRACE, then they must add the same named item into the C file
      with the macro DEFINE_TRACE(name) and then add the trace point.
      
      This change cuts out the needing to add the DEFINE_TRACE(name).
      Every file that uses the tracepoint must still include the trace/<type>.h
      file, but the one C file must also add a define before the including
      of that file.
      
       #define CREATE_TRACE_POINTS
       #include <trace/mytrace.h>
      
      This will cause the trace/mytrace.h file to also produce the C code
      necessary to implement the trace point.
      
      Note, if more than one trace/<type>.h is used to create the C code
      it is best to list them all together.
      
       #define CREATE_TRACE_POINTS
       #include <trace/foo.h>
       #include <trace/bar.h>
       #include <trace/fido.h>
      
      Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
      the cleaner solution of the define above the includes over my first
      design to have the C code include a "special" header.
      
      This patch converts sched, irq and lockdep and skb to use this new
      method.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a8d154b0
  16. 08 4月, 2009 1 次提交
  17. 07 4月, 2009 1 次提交
  18. 03 4月, 2009 4 次提交
    • O
      pids: kill signal_struct-> __pgrp/__session and friends · 1b0f7ffd
      Oleg Nesterov 提交于
      We are wasting 2 words in signal_struct without any reason to implement
      task_pgrp_nr() and task_session_nr().
      
      task_session_nr() has no callers since
      2e2ba22e, we can remove it.
      
      task_pgrp_nr() is still (I believe wrongly) used in fs/autofsX and
      fs/coda.
      
      This patch reimplements task_pgrp_nr() via task_pgrp_nr_ns(), and kills
      __pgrp/__session and the related helpers.
      
      The change in drivers/char/tty_io.c is cosmetic, but hopefully makes sense
      anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: Alan Cox <number6@the-village.bc.nu>		[tty parts]
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b0f7ffd
    • S
      signals: protect cinit from blocked fatal signals · b3bfa0cb
      Sukadev Bhattiprolu 提交于
      Normally SIG_DFL signals to global and container-init are dropped early.
      But if a signal is blocked when it is posted, we cannot drop the signal
      since the receiver may install a handler before unblocking the signal.
      Once this signal is queued however, the receiver container-init has no way
      of knowing if the signal was sent from an ancestor or descendant
      namespace.  This patch ensures that contianer-init drops all SIG_DFL
      signals in get_signal_to_deliver() except SIGKILL/SIGSTOP.
      
      If SIGSTOP/SIGKILL originate from a descendant of container-init they are
      never queued (i.e dropped in sig_ignored() in an earler patch).
      
      If SIGSTOP/SIGKILL originate from parent namespace, the signal is queued
      and container-init processes the signal.
      
      IOW, if get_signal_to_deliver() sees a sig_kernel_only() signal for global
      or container-init, the signal must have been generated internally or must
      have come from an ancestor ns and we process the signal.
      
      Further, the signal_group_exit() check was needed to cover the case of a
      multi-threaded init sending SIGKILL to other threads when doing an exit()
      or exec().  But since the new sig_kernel_only() check covers the SIGKILL,
      the signal_group_exit() check is no longer needed and can be removed.
      
      Finally, now that we have all pieces in place, set SIGNAL_UNKILLABLE for
      container-inits.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3bfa0cb
    • A
      Simplify copy_thread() · 6f2c55b8
      Alexey Dobriyan 提交于
      First argument unused since 2.3.11.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6f2c55b8
    • D
      nommu: fix a number of issues with the per-MM VMA patch · 33e5d769
      David Howells 提交于
      Fix a number of issues with the per-MM VMA patch:
      
       (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
           a NOMMU system with more than 2G pages.  Makes no difference on a 32-bit
           system.
      
       (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
           lest it overflow.
      
       (3) Move the allocation of the vm_area_struct slab back for fork.c.
      
       (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.
      
       (5) Use BUG_ON() rather than if () BUG().
      
       (6) Make the default validate_nommu_regions() a static inline rather than a
           #define.
      
       (7) Make free_page_series()'s objection to pages with a refcount != 1 more
           informative.
      
       (8) Adjust the __put_nommu_region() banner comment to indicate that the
           semaphore must be held for writing.
      
       (9) Limit the number of warnings about munmaps of non-mmapped regions.
      Reported-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33e5d769
  19. 01 4月, 2009 3 次提交
    • A
      Get rid of indirect include of fs_struct.h · 5ad4e53b
      Al Viro 提交于
      Don't pull it in sched.h; very few files actually need it and those
      can include directly.  sched.h itself only needs forward declaration
      of struct fs_struct;
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5ad4e53b
    • A
      New locking/refcounting for fs_struct · 498052bb
      Al Viro 提交于
      * all changes of current->fs are done under task_lock and write_lock of
        old fs->lock
      * refcount is not atomic anymore (same protection)
      * its decrements are done when removing reference from current; at the
        same time we decide whether to free it.
      * put_fs_struct() is gone
      * new field - ->in_exec.  Set by check_unsafe_exec() if we are trying to do
        execve() and only subthreads share fs_struct.  Cleared when finishing exec
        (success and failure alike).  Makes CLONE_FS fail with -EAGAIN if set.
      * check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
        is in progress.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      498052bb
    • A
      Take fs_struct handling to new file (fs/fs_struct.c) · 3e93cd67
      Al Viro 提交于
      Pure code move; two new helper functions for nfsd and daemonize
      (unshare_fs_struct() and daemonize_fs_struct() resp.; for now -
      the same code as used to be in callers).  unshare_fs_struct()
      exported (for nfsd, as copy_fs_struct()/exit_fs() used to be),
      copy_fs_struct() and exit_fs() don't need exports anymore.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3e93cd67
  20. 30 3月, 2009 1 次提交
  21. 10 3月, 2009 1 次提交
  22. 11 2月, 2009 1 次提交
    • O
      ptrace, x86: fix the usage of ptrace_fork() · 06eb23b1
      Oleg Nesterov 提交于
      I noticed by pure accident we have ptrace_fork() and friends. This was
      added by "x86, bts: add fork and exit handling", commit
      bf53de90.
      
      I can't test this, ds_request_bts() returns -EOPNOTSUPP, but I strongly
      believe this needs the fix. I think something like this program
      
      	int main(void)
      	{
      		int pid = fork();
      
      		if (!pid) {
      			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
      			kill(getpid(), SIGSTOP);
      			fork();
      		} else {
      			struct ptrace_bts_config bts = {
      				.flags = PTRACE_BTS_O_ALLOC,
      				.size  = 4 * 4096,
      			};
      
      			wait(NULL);
      
      			ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEFORK);
      			ptrace(PTRACE_BTS_CONFIG, pid, &bts, sizeof(bts));
      			ptrace(PTRACE_CONT, pid, NULL, NULL);
      
      			sleep(1);
      		}
      
      		return 0;
      	}
      
      should crash the kernel.
      
      If the task is traced by its natural parent ptrace_reparented() returns 0
      but we should clear ->btsxxx anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      06eb23b1
  23. 09 2月, 2009 1 次提交
  24. 07 2月, 2009 1 次提交
  25. 05 2月, 2009 1 次提交
    • P
      signal: re-add dead task accumulation stats. · 32bd671d
      Peter Zijlstra 提交于
      We're going to split the process wide cpu accounting into two parts:
      
       - clocks; which can take all the time they want since they run
                 from user context.
      
       - timers; which need constant time tracing but can affort the overhead
                 because they're default off -- and rare.
      
      The clock readout will go back to a full sum of the thread group, for this
      we need to re-add the exit stats that were removed in the initial itimer
      rework (f06febc9: timers: fix itimer/many thread hang).
      
      Furthermore, since that full sum can be rather slow for large thread groups
      and we have the complete dead task stats, revert the do_notify_parent time
      computation.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      32bd671d
  26. 17 1月, 2009 1 次提交
  27. 14 1月, 2009 2 次提交
  28. 11 1月, 2009 1 次提交
  29. 09 1月, 2009 1 次提交
    • E
      pid: generalize task_active_pid_ns · 61bce0f1
      Eric W. Biederman 提交于
      Currently task_active_pid_ns is not safe to call after a task becomes a
      zombie and exit_task_namespaces is called, as nsproxy becomes NULL.  By
      reading the pid namespace from the pid of the task we can trivially solve
      this problem at the cost of one extra memory read in what should be the
      same cacheline as we read the namespace from.
      
      When moving things around I have made task_active_pid_ns out of line
      because keeping it in pid_namespace.h would require adding includes of
      pid.h and sched.h that I don't think we want.
      
      This change does make task_active_pid_ns unsafe to call during
      copy_process until we attach a pid on the task_struct which seems to be a
      reasonable trade off.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Bastian Blank <bastian@waldi.eu.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      61bce0f1
  30. 08 1月, 2009 2 次提交
    • D
      NOMMU: Make VMAs per MM as for MMU-mode linux · 8feae131
      David Howells 提交于
      Make VMAs per mm_struct as for MMU-mode linux.  This solves two problems:
      
       (1) In SYSV SHM where nattch for a segment does not reflect the number of
           shmat's (and forks) done.
      
       (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
           exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
           that a VMA might be shared and already have its vm_mm assigned to another
           process or a dead process.
      
      A new struct (vm_region) is introduced to track a mapped region and to remember
      the circumstances under which it may be shared and the vm_list_struct structure
      is discarded as it's no longer required.
      
      This patch makes the following additional changes:
      
       (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
           with no recourse to __GFP_COMP, so the pages are not composite.  Instead,
           each page has a reference on it held by the region.  Anything else that is
           interested in such a page will have to get a reference on it to retain it.
           When the pages are released due to unmapping, each page is passed to
           put_page() and will be freed when the page usage count reaches zero.
      
       (2) Excess pages are trimmed after an allocation as the allocation must be
           made as a power-of-2 quantity of pages.
      
       (3) VMAs are added to the parent MM's R/B tree and mmap lists.  As an MM may
           end up with overlapping VMAs within the tree, the VMA struct address is
           appended to the sort key.
      
       (4) Non-anonymous VMAs are now added to the backing inode's prio list.
      
       (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
           the backing region.  The VMA and region structs will be split if
           necessary.
      
       (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
           segment instead of all the attachments at that addresss.  Multiple
           shmat()'s return the same address under NOMMU-mode instead of different
           virtual addresses as under MMU-mode.
      
       (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.
      
       (8) /proc/maps is now the global list of mapped regions, and may list bits
           that aren't actually mapped anywhere.
      
       (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
           of RAM currently allocated by mmap to hold mappable regions that can't be
           mapped directly.  These are copies of the backing device or file if not
           anonymous.
      
      These changes make NOMMU mode more similar to MMU mode.  The downside is that
      NOMMU mode requires some extra memory to track things over NOMMU without this
      patch (VMAs are no longer shared, and there are now region structs).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMike Frysinger <vapier.adi@gmail.com>
      Acked-by: NPaul Mundt <lethal@linux-sh.org>
      8feae131
    • P
      itimers: remove the per-cpu-ish-ness · 490dea45
      Peter Zijlstra 提交于
      Either we bounce once cacheline per cpu per tick, yielding n^2 bounces
      or we just bounce a single..
      
      Also, using per-cpu allocations for the thread-groups complicates the
      per-cpu allocator in that its currently aimed to be a fixed sized
      allocator and the only possible extention to that would be vmap based,
      which is seriously constrained on 32 bit archs.
      
      So making the per-cpu memory requirement depend on the number of
      processes is an issue.
      
      Lastly, it didn't deal with cpu-hotplug, although admittedly that might
      be fixable.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      490dea45