1. 14 10月, 2008 1 次提交
    • M
      tracing, sched: LTTng instrumentation - scheduler · 0a16b607
      Mathieu Desnoyers 提交于
      Instrument the scheduler activity (sched_switch, migration, wakeups,
      wait for a task, signal delivery) and process/thread
      creation/destruction (fork, exit, kthread stop). Actually, kthread
      creation is not instrumented in this patch because it is architecture
      dependent. It allows to connect tracers such as ftrace which detects
      scheduling latencies, good/bad scheduler decisions. Tools like LTTng can
      export this scheduler information along with instrumentation of the rest
      of the kernel activity to perform post-mortem analysis on the scheduler
      activity.
      
      About the performance impact of tracepoints (which is comparable to
      markers), even without immediate values optimizations, tests done by
      Hideo Aoki on ia64 show no regression. His test case was using hackbench
      on a kernel where scheduler instrumentation (about 5 events in code
      scheduler code) was added. See the "Tracepoints" patch header for
      performance result detail.
      
      Changelog :
      
      - Change instrumentation location and parameter to match ftrace
        instrumentation, previously done with kernel markers.
      
      [ mingo@elte.hu: conflict resolutions ]
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: N'Peter Zijlstra' <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0a16b607
  2. 29 9月, 2008 1 次提交
    • B
      mm owner: fix race between swapoff and exit · 31a78f23
      Balbir Singh 提交于
      There's a race between mm->owner assignment and swapoff, more easily
      seen when task slab poisoning is turned on.  The condition occurs when
      try_to_unuse() runs in parallel with an exiting task.  A similar race
      can occur with callers of get_task_mm(), such as /proc/<pid>/<mmstats>
      or ptrace or page migration.
      
      CPU0                                    CPU1
                                              try_to_unuse
                                              looks at mm = task0->mm
                                              increments mm->mm_users
      task 0 exits
      mm->owner needs to be updated, but no
      new owner is found (mm_users > 1, but
      no other task has task->mm = task0->mm)
      mm_update_next_owner() leaves
                                              mmput(mm) decrements mm->mm_users
      task0 freed
                                              dereferencing mm->owner fails
      
      The fix is to notify the subsystem via mm_owner_changed callback(),
      if no new owner is found, by specifying the new task as NULL.
      
      Jiri Slaby:
      mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
      must be set after that, so as not to pass NULL as old owner causing oops.
      
      Daisuke Nishimura:
      mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
      and its callers need to take account of this situation to avoid oops.
      
      Hugh Dickins:
      Lockdep warning and hang below exec_mmap() when testing these patches.
      exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
      so exec_mmap() now needs to do the same.  And with that repositioning,
      there's now no point in mm_need_new_owner() allowing for NULL mm.
      Reported-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31a78f23
  3. 06 9月, 2008 1 次提交
    • B
      sched: fix process time monotonicity · 49048622
      Balbir Singh 提交于
      Spencer reported a problem where utime and stime were going negative despite
      the fixes in commit b27f03d4. The suspected
      reason for the problem is that signal_struct maintains it's own utime and
      stime (of exited tasks), these are not updated using the new task_utime()
      routine, hence sig->utime can go backwards and cause the same problem
      to occur (sig->utime, adds tsk->utime and not task_utime()). This patch
      fixes the problem
      
      TODO: using max(task->prev_utime, derived utime) works for now, but a more
      generic solution is to implement cputime_max() and use the cputime_gt()
      function for comparison.
      
      Reported-by: spencer@bluehost.com
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      49048622
  4. 03 9月, 2008 1 次提交
  5. 27 8月, 2008 1 次提交
  6. 02 8月, 2008 1 次提交
  7. 28 7月, 2008 1 次提交
    • A
      task IO accounting: improve code readability · 5995477a
      Andrea Righi 提交于
      Put all i/o statistics in struct proc_io_accounting and use inline functions to
      initialize and increment statistics, removing a lot of single variable
      assignments.
      
      This also reduces the kernel size as following (with CONFIG_TASK_XACCT=y and
      CONFIG_TASK_IO_ACCOUNTING=y).
      
          text    data     bss     dec     hex filename
         11651       0       0   11651    2d83 kernel/exit.o.before
         11619       0       0   11619    2d63 kernel/exit.o.after
         10886     132     136   11154    2b92 kernel/fork.o.before
         10758     132     136   11026    2b12 kernel/fork.o.after
      
       3082029  807968 4818600 8708597  84e1f5 vmlinux.o.before
       3081869  807968 4818600 8708437  84e155 vmlinux.o.after
      Signed-off-by: NAndrea Righi <righi.andrea@gmail.com>
      Acked-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5995477a
  8. 27 7月, 2008 4 次提交
  9. 26 7月, 2008 8 次提交
  10. 17 7月, 2008 4 次提交
    • R
      fix dangling zombie when new parent ignores children · 666f164f
      Roland McGrath 提交于
      This fixes an arcane bug that we think was a regression introduced
      by commit b2b2cbc4.  When a parent
      ignores SIGCHLD (or uses SA_NOCLDWAIT), its children would self-reap
      but they don't if it's using ptrace on them.  When the parent thread
      later exits and ceases to ptrace a child but leaves other live
      threads in the parent's thread group, any zombie children are left
      dangling.  The fix makes them self-reap then, as they would have
      done earlier if ptrace had not been in use.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      666f164f
    • R
      do_wait: return security_task_wait() error code in place of -ECHILD · 14dd0b81
      Roland McGrath 提交于
      This reverts the effect of commit f2cc3eb1
      "do_wait: fix security checks".  That change reverted the effect of commit
      73243284.  The rationale for the original
      commit still stands.  The inconsistent treatment of children hidden by
      ptrace was an unintended omission in the original change and in no way
      invalidates its purpose.
      
      This makes do_wait return the error returned by security_task_wait()
      (usually -EACCES) in place of -ECHILD when there are some children the
      caller would be able to wait for if not for the permission failure.  A
      permission error will give the user a clue to look for security policy
      problems, rather than for mysterious wait bugs.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      14dd0b81
    • R
      ptrace children revamp · f470021a
      Roland McGrath 提交于
      ptrace no longer fiddles with the children/sibling links, and the
      old ptrace_children list is gone.  Now ptrace, whether of one's own
      children or another's via PTRACE_ATTACH, just uses the new ptraced
      list instead.
      
      There should be no user-visible difference that matters.  The only
      change is the order in which do_wait() sees multiple stopped
      children and stopped ptrace attachees.  Since wait_task_stopped()
      was changed earlier so it no longer reorders the children list, we
      already know this won't cause any new problems.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      f470021a
    • R
      do_wait reorganization · 98abed02
      Roland McGrath 提交于
      This breaks out the guts of do_wait into three subfunctions.
      The control flow is less nonobvious without so much goto.
      do_wait_thread and ptrace_do_wait contain the main work of the outer loop.
      wait_consider_task contains the main work of the inner loop.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      98abed02
  11. 03 7月, 2008 1 次提交
  12. 25 5月, 2008 1 次提交
  13. 02 5月, 2008 1 次提交
  14. 30 4月, 2008 6 次提交
  15. 29 4月, 2008 1 次提交
    • B
      cgroups: add an owner to the mm_struct · cf475ad2
      Balbir Singh 提交于
      Remove the mem_cgroup member from mm_struct and instead adds an owner.
      
      This approach was suggested by Paul Menage.  The advantage of this approach
      is that, once the mm->owner is known, using the subsystem id, the cgroup
      can be determined.  It also allows several control groups that are
      virtually grouped by mm_struct, to exist independent of the memory
      controller i.e., without adding mem_cgroup's for each controller, to
      mm_struct.
      
      A new config option CONFIG_MM_OWNER is added and the memory resource
      controller selects this config option.
      
      This patch also adds cgroup callbacks to notify subsystems when mm->owner
      changes.  The mm_cgroup_changed callback is called with the task_lock() of
      the new task held and is called just prior to changing the mm->owner.
      
      I am indebted to Paul Menage for the several reviews of this patchset and
      helping me make it lighter and simpler.
      
      This patch was tested on a powerpc box, it was compiled with both the
      MM_OWNER config turned on and off.
      
      After the thread group leader exits, it's moved to init_css_state by
      cgroup_exit(), thus all future charges from runnings threads would be
      redirected to the init_css_set's subsystem.
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>,
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Reviewed-by: NPaul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf475ad2
  16. 28 4月, 2008 1 次提交
    • L
      mempolicy: rename mpol_free to mpol_put · f0be3d32
      Lee Schermerhorn 提交于
      This is a change that was requested some time ago by Mel Gorman.  Makes sense
      to me, so here it is.
      
      Note: I retain the name "mpol_free_shared_policy()" because it actually does
      free the shared_policy, which is NOT a reference counted object.  However, ...
      
      The mempolicy object[s] referenced by the shared_policy are reference counted,
      so mpol_put() is used to release the reference held by the shared_policy.  The
      mempolicy might not be freed at this time, because some task attached to the
      shared object associated with the shared policy may be in the process of
      allocating a page based on the mempolicy.  In that case, the task performing
      the allocation will hold a reference on the mempolicy, obtained via
      mpol_shared_policy_lookup().  The mempolicy will be freed when all tasks
      holding such a reference have called mpol_put() for the mempolicy.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0be3d32
  17. 25 4月, 2008 2 次提交
    • A
      [PATCH] sanitize unshare_files/reset_files_struct · 3b125388
      Al Viro 提交于
      * let unshare_files() give caller the displaced files_struct
      * don't bother with grabbing reference only to drop it in the
        caller if it hadn't been shared in the first place
      * in that form unshare_files() is trivially implemented via
        unshare_fd(), so we eliminate the duplicate logics in fork.c
      * reset_files_struct() is not just only called for current;
        it will break the system if somebody ever calls it for anything
        else (we can't modify ->files of somebody else).  Lose the
        task_struct * argument.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3b125388
    • A
      [PATCH] sanitize handling of shared descriptor tables in failing execve() · fd8328be
      Al Viro 提交于
      * unshare_files() can fail; doing it after irreversible actions is wrong
        and de_thread() is certainly irreversible.
      * since we do it unconditionally anyway, we might as well do it in do_execve()
        and save ourselves the PITA in binfmt handlers, etc.
      * while we are at it, binfmt_som actually leaked files_struct on failure.
      
      As a side benefit, unshare_files(), put_files_struct() and reset_files_struct()
      become unexported.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fd8328be
  18. 23 4月, 2008 1 次提交
  19. 11 4月, 2008 1 次提交
    • R
      asmlinkage_protect replaces prevent_tail_call · 54a01510
      Roland McGrath 提交于
      The prevent_tail_call() macro works around the problem of the compiler
      clobbering argument words on the stack, which for asmlinkage functions
      is the caller's (user's) struct pt_regs.  The tail/sibling-call
      optimization is not the only way that the compiler can decide to use
      stack argument words as scratch space, which we have to prevent.
      Other optimizations can do it too.
      
      Until we have new compiler support to make "asmlinkage" binding on the
      compiler's own use of the stack argument frame, we have work around all
      the manifestations of this issue that crop up.
      
      More cases seem to be prevented by also keeping the incoming argument
      variables live at the end of the function.  This makes their original
      stack slots attractive places to leave those variables, so the compiler
      tends not clobber them for something else.  It's still no guarantee, but
      it handles some observed cases that prevent_tail_call() did not.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54a01510
  20. 09 3月, 2008 1 次提交
  21. 04 3月, 2008 1 次提交
    • O
      exit_notify: fix kill_orphaned_pgrp() usage with mt exit · 821c7de7
      Oleg Nesterov 提交于
      1. exit_notify() always calls kill_orphaned_pgrp(). This is wrong, we
         should do this only when the whole process exits.
      
      2. exit_notify() uses "current" as "ignored_task", obviously wrong.
         Use ->group_leader instead.
      
      Test case:
      
      	void hup(int sig)
      	{
      		printf("HUP received\n");
      	}
      
      	void *tfunc(void *arg)
      	{
      		sleep(2);
      		printf("sub-thread exited\n");
      		return NULL;
      	}
      
      	int main(int argc, char *argv[])
      	{
      		if (!fork()) {
      			signal(SIGHUP, hup);
      			kill(getpid(), SIGSTOP);
      			exit(0);
      		}
      
      		pthread_t thr;
      		pthread_create(&thr, NULL, tfunc, NULL);
      
      		sleep(1);
      		printf("main thread exited\n");
      		syscall(__NR_exit, 0);
      
      		return 0;
      	}
      
      output:
      
      	main thread exited
      	HUP received
      	Hangup
      
      With this patch the output is:
      
      	main thread exited
      	sub-thread exited
      	HUP received
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      821c7de7