1. 10 Jul, 2009 1 commit
  2. 03 Jul, 2009 1 commit
  3. 01 Jul, 2009 3 commits
  4. 29 Jun, 2009 1 commit
  5. 27 Jun, 2009 2 commits
  6. 26 Jun, 2009 1 commit
  7. 25 Jun, 2009 2 commits
    • P
      ring-buffer: Make it generally available · 1155de47
      Paul Mundt committed
      In hunting down the cause for the hwlat_detector ring buffer spew in
      my failed -next builds it became obvious that folks are now treating
      ring_buffer as something that is generic independent of tracing and thus,
      suitable for public driver consumption.
      
      Given that there are only a few minor areas in ring_buffer that have any
      reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
      those and make it generally available.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Jon Masters <jcm@jonmasters.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20090625053012.GB19944@linux-sh.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1155de47
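The stub approach this commit describes can be sketched roughly as follows. This is a minimal illustration of the general pattern, not the actual ring_buffer code: `CONFIG_DEMO_TRACER`, `demo_tracer_disabled()`, `struct demo_ring`, and `demo_ring_write()` are hypothetical names invented for the example.

```c
#include <stddef.h>

/*
 * Hypothetical config symbol. When it is not defined, the helper below
 * collapses to a stub so generic code needs no #ifdef at the call site.
 */
#ifdef CONFIG_DEMO_TRACER
int demo_tracer_disabled(void);		/* would be provided by the tracer */
#else
static inline int demo_tracer_disabled(void)
{
	return 0;			/* no tracer built in: never disabled */
}
#endif

/* A tiny stand-in for a generally available buffer. */
struct demo_ring {
	char data[8];
	size_t head;
};

/* Returns 0 on success, -1 if the (optional) tracer vetoed the write. */
static inline int demo_ring_write(struct demo_ring *r, char c)
{
	if (demo_tracer_disabled())
		return -1;
	r->data[r->head % sizeof(r->data)] = c;
	r->head++;
	return 0;
}
```

With the config symbol undefined, a driver could call `demo_ring_write()` without pulling in any tracing code, which is the effect the commit is after.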
    • L
      ftrace: Remove duplicate newline · 00e54d08
      Li Zefan committed
      Before:
        # echo 'sys_open:traceon:' > set_ftrace_filter
        # echo 'sys_close:traceoff:5' > set_ftrace_filter
        # cat set_ftrace_filter
        #### all functions enabled ####
        sys_open:traceon:unlimited
      
        sys_close:traceoff:count=0
      
      After:
        # cat set_ftrace_filter
        #### all functions enabled ####
        sys_open:traceon:unlimited
        sys_close:traceoff:count=0
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A4313A7.7030105@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      00e54d08
  8. 24 Jun, 2009 8 commits
  9. 23 Jun, 2009 1 commit
  10. 20 Jun, 2009 4 commits
    • P
      perf_counter: Push perf_sample_data through the swcounter code · 92bf309a
      Peter Zijlstra committed
      Push the perf_sample_data further outwards to the swcounter interface,
      to abstract it away some more.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      92bf309a
    • F
      tracing/urgent: warn in case of ftrace_start_up inbalance · 9ea1a153
      Frederic Weisbecker committed
      Prevent further ftrace_start_up imbalances so that we avoid
      future nop patching omissions with dynamic ftrace.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      9ea1a153
    • F
      tracing/urgent: fix unbalanced ftrace_start_up · c85a17e2
      Frederic Weisbecker committed
      Perfcounter reports the following stats for a wide system
      profiling:
      
       #
       # (2364 samples)
       #
       # Overhead  Symbol
       # ........  ......
       #
          15.40%  [k] mwait_idle_with_hints
           8.29%  [k] read_hpet
           5.75%  [k] ftrace_caller
           3.60%  [k] ftrace_call
           [...]
      
      This snapshot has been taken while neither the function tracer nor
      the function graph tracer was running.
      With dynamic ftrace, such results show a wrong ftrace behaviour
      because all calls to ftrace_caller or ftrace_graph_caller (the patched
      calls to mcount) are supposed to be patched into nop if none of those
      tracers are running.
      
      The problem occurs after the first run of the function tracer. Once we
      launch it a second time, the callsites will never be nopped back,
      unless you set custom filters.
      For example it happens during the self tests at boot time.
      The function tracer selftest runs, and then the dynamic tracing is
      tested too. After that, the callsites are left un-nopped.
      
      This is because the reset callback of the function tracer tries to
      unregister two ftrace callbacks at once: the common function tracer
      and the function tracer with stack backtrace, regardless of which
      one is currently in use.
      This creates an imbalance in the ftrace_start_up value, which is
      expected to be zero when the last ftrace callback is unregistered.
      When it reaches zero, FTRACE_DISABLE_CALLS is set on the next ftrace
      command, triggering the patching into nops. But once the counter is
      unbalanced, i.e. lower than zero, kernel functions that are patched
      again (as in every further function tracer run) will never be nopped
      back.
      
      Note that ftrace_call and ftrace_graph_call are still patched back
      to ftrace_stub in the off case, but not the callers of ftrace_call
      and ftrace_graph_caller. This means that tracing is properly
      deactivated, but we waste a useless call in every kernel function.
      
      This patch just unregisters the right ftrace_ops for the function
      tracer on its reset callback and ignores the other one, which is
      not registered, fixing the imbalance. The problem also happens
      in .30.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: stable@kernel.org
      c85a17e2
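The counter imbalance described in this commit can be modelled with a small sketch. It is only an illustration under simplifying assumptions, not the ftrace code: `demo_start_up` stands in for ftrace_start_up, and `struct demo_ops`, `demo_start()`, `demo_reset_buggy()`, `demo_reset_fixed()`, and `demo_can_nop()` are hypothetical names.

```c
#include <stdbool.h>

/* Stand-in for the global count of registered callbacks. */
static int demo_start_up;

struct demo_ops {
	const char *name;
};

static struct demo_ops demo_plain_ops = { "plain" };
static struct demo_ops demo_stack_ops = { "stack" };

/* Registration and unregistration only adjust the counter in this model. */
static void demo_register(struct demo_ops *ops)   { (void)ops; demo_start_up++; }
static void demo_unregister(struct demo_ops *ops) { (void)ops; demo_start_up--; }

/* Start the tracer with exactly one of the two callbacks. */
static void demo_start(bool use_stack)
{
	demo_register(use_stack ? &demo_stack_ops : &demo_plain_ops);
}

/* Buggy reset: drops both callbacks although only one was registered. */
static void demo_reset_buggy(void)
{
	demo_unregister(&demo_plain_ops);
	demo_unregister(&demo_stack_ops);
}

/* Fixed reset: drops only the callback that is actually in use. */
static void demo_reset_fixed(bool use_stack)
{
	demo_unregister(use_stack ? &demo_stack_ops : &demo_plain_ops);
}

/* Call sites may be nopped only when the counter is back at exactly zero. */
static bool demo_can_nop(void)
{
	return demo_start_up == 0;
}
```

After one buggy reset the counter sits at -1, so every later start/reset cycle ends at -1 instead of 0 and `demo_can_nop()` stays false while the tracer is idle, which mirrors the "never nopped back" symptom.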
    • O
      ptrace: wait_task_zombie: do not account traced sub-threads · befca967
      Oleg Nesterov committed
      The bug is ancient.
      
      If we trace the sub-thread of our natural child and this sub-thread exits,
      we update parent->signal->cxxx fields.  But we should not do this until
      the whole thread-group exits, otherwise we account this thread (and all
      other live threads) twice.
      
      Add the task_detached() check.  No need to check thread_group_empty(),
      wait_consider_task()->delay_group_leader() already did this.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      befca967
  11. 19 Jun, 2009 16 commits
    • P
      perf_counter: Close race in perf_lock_task_context() · b49a9e7e
      Peter Zijlstra committed
      perf_lock_task_context() is buggy because it can return a dead
      context.
      
      The RCU read lock in perf_lock_task_context() only guarantees
      that the memory won't get freed; it doesn't guarantee that the
      object is valid (in our case refcount > 0).
      
      Therefore we can return a locked object that can get freed the
      moment we release the rcu read lock.
      
      perf_pin_task_context() then increases the refcount and does an
      unlock on freed memory.
      
      That increased refcount will cause a double free, in case it
      started out with 0.
      
      Amend this by including the get_ctx() functionality in
      perf_lock_task_context() (all users already did this later
      anyway), and return a NULL context when the found one is
      already dead.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b49a9e7e
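The "take a reference only if the object is still live" idea from this commit can be sketched in user space. This is a minimal model using C11 atomics rather than the kernel's own primitives, and it is not the perf_counter code: `struct demo_ctx`, `demo_get_unless_zero()`, and `demo_lock_ctx()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a reference-counted context object. */
struct demo_ctx {
	atomic_int refcount;
};

/* Take a reference only if the object is still live (refcount > 0). */
static bool demo_get_unless_zero(struct demo_ctx *ctx)
{
	int old = atomic_load(&ctx->refcount);

	while (old > 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&ctx->refcount, &old, old + 1))
			return true;
	}
	return false;	/* already dead: do not resurrect it */
}

/*
 * Lookup helper: returns the context with a reference held,
 * or NULL when nothing was found or the found context is dead.
 */
static struct demo_ctx *demo_lock_ctx(struct demo_ctx *found)
{
	if (!found || !demo_get_unless_zero(found))
		return NULL;
	return found;
}
```

Because the reference is taken inside the lookup, a caller never sees a context whose count started at zero, so the later increment-then-free sequence described above cannot occur in this model.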
    • P
      perf_counter: Simplify and fix task migration counting · e5289d4a
      Peter Zijlstra committed
      The task migrations counter was causing rare and hard-to-decipher
      memory corruptions under load. After a day of debugging and bisection
      we found that the problem was introduced with:
      
        3f731ca6: perf_counter: Fix cpu migration counter
      
      Turning them off fixes the crashes. Incidentally, the whole
      perf_counter_task_migration() logic can be done simpler as well,
      by injecting a proper sw-counter event.
      
      This cleanup also fixed the crashes. The precise failure mode is
      not completely clear yet, but we are clearly not unhappy about
      having a fix ;-)
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e5289d4a
    • S
      function-graph: add stack frame test · 71e308a2
      Steven Rostedt committed
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86, where it passes in the stack
      frame of the parent function and tests that frame on exit.
      
      There was a case in x86_32 with optimize for size (-Os) where, for a
      few functions, gcc would align the stack frame and place a copy of the
      return address into it. The function graph tracer modified the copy and
      not the actual return address. On return from the function, it did not go
      to the tracer hook, but returned to the parent. This broke the function
      graph tracer, because the return of the parent (where gcc did not do
      this funky manipulation) returned to the location that the child function
      was supposed to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I notice that the parisc arch implements its own push_return_trace.
      This is now a generic function and the ftrace_push_return_trace should be
      used instead. This patch does not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      71e308a2
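The entry/exit check this commit describes can be sketched as a small return stack that records a frame token at function entry and compares it at exit. This is a simplified model under stated assumptions, not the kernel implementation: `struct demo_ret_stack`, `demo_push_return()`, `demo_pop_return()`, and the depth of 16 are hypothetical.

```c
#include <stdbool.h>

#define DEMO_DEPTH 16

struct demo_ret_entry {
	unsigned long ret;	/* original return address */
	unsigned long fp;	/* frame token recorded at function entry */
};

struct demo_ret_stack {
	struct demo_ret_entry entries[DEMO_DEPTH];
	int top;		/* number of entries in use */
};

/* Record the return address and frame token; fails when the stack is full. */
static bool demo_push_return(struct demo_ret_stack *s,
			     unsigned long ret, unsigned long fp)
{
	if (s->top >= DEMO_DEPTH)
		return false;
	s->entries[s->top].ret = ret;
	s->entries[s->top].fp = fp;
	s->top++;
	return true;
}

/*
 * Pop the most recent entry and hand back its return address in *ret.
 * Returns false if the stack is empty (leaving *ret untouched) or if the
 * frame token seen at exit differs from the one recorded at entry.
 */
static bool demo_pop_return(struct demo_ret_stack *s,
			    unsigned long fp, unsigned long *ret)
{
	if (s->top <= 0)
		return false;
	s->top--;
	*ret = s->entries[s->top].ret;
	return s->entries[s->top].fp == fp;
}
```

A mismatch between the two tokens is the signal that the compiler did something unexpected with the frame, which is the situation the test is meant to catch before it turns into a crash.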
    • S
      function-graph: disable when both x86_32 and optimize for size are configured · eb4a0378
      Steven Rostedt committed
      On x86_32, when optimize for size is set, gcc may align the frame pointer
      and make a copy of the return address inside the stack frame.
      The return address that is located in the stack frame may not be
      the one used to return to the calling function. This will break the
      function graph tracer.
      
      The function graph tracer replaces the return address with a jump to a hook
      function that can trace the exit of the function. If it only replaces
      a copy, then the hook will not be called when the function returns.
      Worse yet, when the parent function returns, the function graph tracer
      will return back to the location of the child function which will
      easily crash the kernel with weird results.
      
      To see the problem, when i386 is compiled with -Os we get:
      
      c106be03:       57                      push   %edi
      c106be04:       8d 7c 24 08             lea    0x8(%esp),%edi
      c106be08:       83 e4 e0                and    $0xffffffe0,%esp
      c106be0b:       ff 77 fc                pushl  0xfffffffc(%edi)
      c106be0e:       55                      push   %ebp
      c106be0f:       89 e5                   mov    %esp,%ebp
      c106be11:       57                      push   %edi
      c106be12:       56                      push   %esi
      c106be13:       53                      push   %ebx
      c106be14:       81 ec 8c 00 00 00       sub    $0x8c,%esp
      c106be1a:       e8 f5 57 fb ff          call   c1021614 <mcount>
      
      When it is compiled with -O2 instead we get:
      
      c10896f0:       55                      push   %ebp
      c10896f1:       89 e5                   mov    %esp,%ebp
      c10896f3:       83 ec 28                sub    $0x28,%esp
      c10896f6:       89 5d f4                mov    %ebx,0xfffffff4(%ebp)
      c10896f9:       89 75 f8                mov    %esi,0xfffffff8(%ebp)
      c10896fc:       89 7d fc                mov    %edi,0xfffffffc(%ebp)
      c10896ff:       e8 d0 08 fa ff          call   c1029fd4 <mcount>
      
      The compile with -Os aligns the stack pointer, then sets up the
      frame pointer (%ebp), and copies the return address back into
      the stack frame. The change to the return address in mcount is done
      to the copy and not to the real placeholder of the return address.
      
      The compile with -O2 sets up the frame pointer first, so the
      change to the return address by mcount affects where the function
      will jump on exit.
      Reported-by: Jake Edge <jake@lwn.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      eb4a0378
    • P
      gcov: enable GCOV_PROFILE_ALL for x86_64 · 7bf99fb6
      Peter Oberparleiter committed
      Enable gcov profiling of the entire kernel on x86_64. Required changes
      include disabling profiling for:
      
      * arch/kernel/acpi/realmode and arch/kernel/boot/compressed:
        not linked to main kernel
      * arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet:
        profiling causes segfaults during boot (incompatible context)
      Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Li Wei <W.Li@Sun.COM>
      Cc: Michael Ellerman <michaele@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7bf99fb6
    • P
      gcov: add gcov profiling infrastructure · 2521f2c2
      Peter Oberparleiter committed
      Enable the use of GCC's coverage testing tool gcov [1] with the Linux
      kernel.  gcov may be useful for:
      
       * debugging (has this code been reached at all?)
       * test improvement (how do I change my test to cover these lines?)
       * minimizing kernel configurations (do I need this option if the
         associated code is never run?)
      
      The profiling patch incorporates the following changes:
      
       * change kbuild to include profiling flags
       * provide functions needed by profiling code
       * present profiling data as files in debugfs
      
      Note that on some architectures, enabling gcc's profiling option
      "-fprofile-arcs" for the entire kernel may trigger compile/link/
      run-time problems, some of which are caused by toolchain bugs and
      others which require adjustment of architecture code.
      
      For this reason profiling the entire kernel is initially restricted
      to those architectures for which it is known to work without changes.
      This restriction can be lifted once an architecture has been tested
      and found compatible with gcc's profiling. Profiling of single files
      or directories is still available on all platforms (see config help
      text).
      
      [1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
      Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Li Wei <W.Li@Sun.COM>
      Cc: Michael Ellerman <michaele@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2521f2c2
    • P
      kernel: constructor support · b99b87f7
      Peter Oberparleiter committed
      Call constructors (gcc-generated initcall-like functions) during kernel
      start and module load.  Constructors are used, e.g., for gcov data
      initialization.
      
      Disable constructor support for usermode Linux to prevent conflicts with
      host glibc.
      Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Li Wei <W.Li@Sun.COM>
      Cc: Michael Ellerman <michaele@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b99b87f7
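Calling constructors at start-up amounts to walking a table of function pointers and invoking each one once. The sketch below shows that idea only; it assumes a plain array in place of whatever table the toolchain and linker actually provide, and `demo_ctor_fn_t`, `demo_do_ctors()`, and the two sample constructors are hypothetical names.

```c
#include <stddef.h>

/* A constructor takes no arguments and returns nothing. */
typedef void (*demo_ctor_fn_t)(void);

/* Run every constructor in the table once, in table order. */
static inline void demo_do_ctors(const demo_ctor_fn_t *table, size_t count)
{
	for (size_t i = 0; i < count; i++)
		table[i]();
}

/* Two sample constructors that leave a visible trace of having run. */
static int demo_ctor_calls;

static void demo_ctor_a(void) { demo_ctor_calls += 1; }
static void demo_ctor_b(void) { demo_ctor_calls += 10; }

/* Stand-in for the table that would normally be collected at link time. */
static const demo_ctor_fn_t demo_ctors[] = { demo_ctor_a, demo_ctor_b };
```

A start-up path (or a module loader, for a module's own table) would call `demo_do_ctors(demo_ctors, sizeof(demo_ctors) / sizeof(demo_ctors[0]))` before any code that depends on the initialized data runs.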
    • A
      nsproxy: extract create_nsproxy() · 90af90d7
      Alexey Dobriyan committed
      clone_nsproxy() does useless copying of the old nsproxy -- every pointer
      will be rewritten to the new ns or to the old ns.  Remove the copying and
      rename clone_nsproxy() to create_nsproxy(), which will be used by C/R code
      to create a fresh nsproxy on restart.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      90af90d7
    • A
      utsns: extract creeate_uts_ns() · 4c2a7e72
      Alexey Dobriyan committed
      create_uts_ns() will be used by C/R to create a fresh uts_ns.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4c2a7e72
    • A
      pidns: rewrite copy_pid_ns() · dca4a979
      Alexey Dobriyan committed
      copy_pid_ns() is a perfect example of a case where unwinding leads to more
      code and makes it less clear.  Watch the diffstat.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Reviewed-by: Serge Hallyn <serue@us.ibm.com>
      Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dca4a979
    • A
      pidns: make create_pid_namespace() accept parent pidns · ed469a63
      Alexey Dobriyan committed
      create_pid_namespace() creates everything, but the caller has to assign
      the parent pidns by hand, which is unnatural.  At the moment of the call
      the new ->level has to be taken from somewhere, and the parent pidns is
      already available.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ed469a63
    • C
      pids: clean up find_task_by_pid variants · 17f98dcf
      Christoph Hellwig committed
      find_task_by_pid_type_ns is only used to implement find_task_by_vpid and
      find_task_by_pid_ns, but both of them pass PIDTYPE_PID as first argument.
      So just fold find_task_by_pid_type_ns into find_task_by_pid_ns and use
      find_task_by_pid_ns to implement find_task_by_vpid.
      
      While we're at it, also remove the exports for find_task_by_pid_ns and
      find_task_by_vpid - we don't have any modular callers left, as the only
      modular caller of the old pre-pid-namespace find_task_by_pid (gfs2) was
      switched to pid_task, which operates on a struct pid pointer instead of a
      pid_t.  Given the confusion about pid_t values vs namespaces, that's
      generally the better option anyway, and I think we're better off restricting
      modules to do it that way.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      17f98dcf
    • S
      sysctl.c: remove unused variable · 7338f299
      Sukanto Ghosh committed
      Remove the unused variable 'val' from __do_proc_dointvec().
      
      The integer is declared and used only as 'val = -val'; there is no
      other reference to it anywhere.
      Signed-off-by: Sukanto Ghosh <sukanto.cse.iitb@gmail.com>
      Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Cc: Sukanto Ghosh <sukanto.cse.iitb@gmail.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7338f299
    • O
      kthreads: simplify migration_thread() exit path · 371cbb38
      Oleg Nesterov committed
      Now that kthread_stop() can be used even if the task has already exited,
      we can kill the "wait_to_die:" loop in migration_thread().  But we must
      pin rq->migration_thread after creation.
      
      Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for
      ->migration_thread exit.  Perhaps we can simplify this code a bit more.
      migration_call() can set ->should_stop and forget about this thread.  But
      we need a new helper in kthread.c for that.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Vitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      371cbb38
    • O
      kthreads: rework kthread_stop() · 63706172
      Oleg Nesterov committed
      Based on Eric's patch which in turn was based on my patch.
      
      kthread_stop() has several nasty problems:
      
      - it runs unpredictably long with the global semaphore held.
      
      - it deadlocks if kthread itself does kthread_stop() before it obeys
        the kthread_should_stop() request.
      
      - it is not useable if kthread exits on its own, see for example the
        ugly "wait_to_die:" hack in migration_thread()
      
      - it is not possible to just tell kthread it should stop, we must always
        wait for its exit.
      
      With this patch kthread() allocates all necessary data (struct kthread) on
      its own stack, and the kthread_stop_xxx globals are deleted.  ->vfork_done
      is used as a pointer into "struct kthread", which means kthread_stop() can
      easily wait for kthread's exit.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Vitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      63706172
    • O
      kthreads: simplify the startup synchronization · cdd140bd
      Oleg Nesterov committed
      We use two completions to create the kernel thread, which is a bit ugly.
      kthread() wakes up create_kthread() via ->started, then create_kthread()
      wakes up the caller kthread_create() via ->done.  But create_kthread()
      does not need to wait for kthread(), it can just return.  Instead
      kthread() itself can wake up the caller of kthread_create().
      
      Kill kthread_create_info->started, ->done is enough.  This improves the
      scalability a bit and simplifies the code.
      
      The only problem is if kernel_thread() fails; in that case
      create_kthread() must do complete(&create->done).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Vitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cdd140bd