1. 06 May, 2008 1 commit
    • sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK · 3e51f33f
      Peter Zijlstra committed
      this replaces the rq->clock stuff (and possibly cpu_clock()).
      
       - architectures that have an 'imperfect' hardware clock can set
         CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
      
       - the 'jiffy' window might be superfluous when we update tick_gtod
         before the __update_sched_clock() call in sched_clock_tick()
      
       - cpu_clock() might be implemented as:
      
           sched_clock_cpu(smp_processor_id())
      
         if the accuracy proves good enough - how far can TSC drift in a
         single jiffy when considering the filtering and idle hooks?
      
      [ mingo@elte.hu: various fixes and cleanups ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
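
The filtering idea mentioned in this commit message can be illustrated with a small userspace model. This is a hedged sketch, not the kernel's code: `struct clock_data`, `update_clock()`, `clock_tick()` and the 1 ms `TICK_NSEC` value are assumptions made for illustration. It only shows the general technique of keeping an unreliable raw clock monotonic and within one tick of the last tick-time reference.

```c
#include <stdint.h>

#define TICK_NSEC 1000000ULL   /* assumed tick length: 1 ms */

/* Hypothetical per-CPU state for the filtered clock. */
struct clock_data {
    uint64_t tick_gtod;  /* reference time recorded at the last tick */
    uint64_t prev_raw;   /* last raw (possibly unstable) reading */
    uint64_t clock;      /* filtered value handed out to callers */
};

/* Advance the filtered clock from a new raw reading. */
static uint64_t update_clock(struct clock_data *cd, uint64_t raw_now)
{
    int64_t delta = (int64_t)(raw_now - cd->prev_raw);
    uint64_t min_clock = cd->clock;                 /* never go backwards */
    uint64_t max_clock = cd->tick_gtod + TICK_NSEC; /* at most one tick ahead */
    uint64_t clock;

    if (delta < 0)          /* raw clock jumped backwards: ignore the jump */
        delta = 0;

    clock = cd->clock + (uint64_t)delta;
    if (clock > max_clock)
        clock = max_clock;
    if (clock < min_clock)
        clock = min_clock;

    cd->prev_raw = raw_now;
    cd->clock = clock;
    return clock;
}

/* Resynchronise against the reference time on every tick. */
static void clock_tick(struct clock_data *cd, uint64_t gtod_now, uint64_t raw_now)
{
    cd->tick_gtod = gtod_now;
    cd->prev_raw = raw_now;
    if (cd->clock < gtod_now)
        cd->clock = gtod_now;
}
```

In this model a backwards jump of the raw clock leaves the filtered value unchanged, and a large forward jump is capped until the next tick moves the window.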
  2. 30 April, 2008 3 commits
    • infrastructure to debug (dynamic) objects · 3ac7fe5a
      Thomas Gleixner committed
      We can see an ever repeating problem pattern with objects of any kind in the
      kernel:
      
      1) freeing of active objects
      2) reinitialization of active objects
      
      Both problems can be hard to debug because the crash happens at a point where
      we have no chance to decode the root cause anymore.  One problem spot is
      kernel timers, where the detection of the problem often happens in interrupt
      context and usually causes the machine to panic.
      
      While working on a timer related bug report I had to hack specialized code
      into the timer subsystem to get a reasonable hint for the root cause.  This
      debug hack was fine for temporary use, but far from a mergeable solution due
      to the intrusiveness into the timer code.
      
      The code further lacked the ability to detect and report the root cause
      instantly and keep the system operational.
      
      Keeping the system operational is important to get hold of the debug
      information without special debugging aids like serial consoles and special
      knowledge of the bug reporter.
      
      The problems described above are not restricted to timers, but timers tend to
      expose them, usually through a full system crash.  Other objects are less explosive,
      but the symptoms caused by such mistakes can be even harder to debug.
      
      Instead of creating specialized debugging code for the timer subsystem a
      generic infrastructure is created which allows developers to verify their code
      and provides an easy to enable debug facility for users in case of trouble.
      
      The debugobjects core code keeps track of operations on static and dynamic
      objects by inserting them into a hashed list and sanity checking them on
      object operations and provides additional checks whenever kernel memory is
      freed.
      
      The tracked object operations are:
      - initializing an object
      - adding an object to a subsystem list
      - deleting an object from a subsystem list
      
      Each operation is sanity checked before the operation is executed, and the
      subsystem specific code can provide a fixup function which makes it possible
      to prevent the damage of the operation.  When the sanity check triggers, a
      warning message and a stack trace are printed.
      
      The list of operations can be extended if the need arises.  For now it's
      limited to the requirements of the first user (timers).
      
      The core code enqueues the objects into hash buckets.  The hash index is
      generated from the address of the object to simplify the lookup for the check
      on kfree/vfree.  Each bucket has its own spinlock to avoid contention on a
      global lock.
      
      The debug code can be compiled in without being active.  The runtime overhead
      is minimal and could be optimized by asm alternatives.  A kernel command line
      option enables the debugging code.
      
      Thanks to Ingo Molnar for review, suggestions and cleanup patches.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
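
The tracking scheme described in this commit message (objects hashed by address into buckets, one lock per bucket, a state check before each operation) can be sketched in userspace C. This is a simplified model under stated assumptions, not the debugobjects API: every name below (`track_setup`, `track_init`, `track_change`, `track_free`, the two-state enum) is hypothetical, and a C11 `atomic_flag` stands in for the per-bucket spinlock.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS 64

enum obj_state { OBJ_INIT, OBJ_ACTIVE };

struct tracked {
    const void *addr;
    enum obj_state state;
    struct tracked *next;
};

struct bucket {
    atomic_flag lock;        /* per-bucket spinlock, no global lock */
    struct tracked *head;
};

static struct bucket table[NBUCKETS];

/* Call once before any other thread uses the tracker. */
static void track_setup(void)
{
    for (int i = 0; i < NBUCKETS; i++) {
        atomic_flag_clear(&table[i].lock);
        table[i].head = NULL;
    }
}

/* The hash index is derived from the object's address. */
static struct bucket *lock_bucket(const void *addr)
{
    struct bucket *b = &table[((uintptr_t)addr >> 4) % NBUCKETS];

    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;   /* spin until the bucket is free */
    return b;
}

static void unlock_bucket(struct bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

static struct tracked *find(struct bucket *b, const void *addr)
{
    for (struct tracked *t = b->head; t; t = t->next)
        if (t->addr == addr)
            return t;
    return NULL;
}

/* Start tracking an object; warn if it is reinitialized while active. */
static int track_init(const void *addr)
{
    struct bucket *b = lock_bucket(addr);
    struct tracked *t = find(b, addr);
    int ret = 0;

    if (t && t->state == OBJ_ACTIVE) {
        fprintf(stderr, "warning: init of active object %p\n", (void *)addr);
        ret = -1;
    } else if (!t) {
        t = malloc(sizeof(*t));
        if (t) {
            t->addr = addr;
            t->state = OBJ_INIT;
            t->next = b->head;
            b->head = t;
        } else {
            ret = -1;
        }
    }
    unlock_bucket(b);
    return ret;
}

/* Move a tracked object from state 'from' to 'to'; warn on any mismatch. */
static int track_change(const void *addr, enum obj_state from, enum obj_state to)
{
    struct bucket *b = lock_bucket(addr);
    struct tracked *t = find(b, addr);
    int ret = (t && t->state == from) ? 0 : -1;

    if (ret == 0)
        t->state = to;
    else
        fprintf(stderr, "warning: bad state change for object %p\n", (void *)addr);
    unlock_bucket(b);
    return ret;
}

/* Check on free: refuse (and warn) if the object is still active. */
static int track_free(const void *addr)
{
    struct bucket *b = lock_bucket(addr);
    int ret = 0;

    for (struct tracked **pp = &b->head; *pp; pp = &(*pp)->next) {
        struct tracked *t = *pp;

        if (t->addr != addr)
            continue;
        if (t->state == OBJ_ACTIVE) {
            fprintf(stderr, "warning: free of active object %p\n", (void *)addr);
            ret = -1;
        } else {
            *pp = t->next;
            free(t);
        }
        break;
    }
    unlock_bucket(b);
    return ret;
}
```

The sketch only reports the two problem patterns named in the message (freeing and reinitializing active objects). The fixup callbacks that let a subsystem prevent the damage are left out.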
    • Deprecate find_task_by_pid() · 5cd20455
      Pavel Emelyanov committed
      There are some places that are known to operate on tasks'
      global pids only:
      
      * the rest_init() call (called on boot)
      * the kgdb's getthread
      * the create_kthread() (since the kthread is run in init ns)
      
      So use the find_task_by_pid_ns(..., &init_pid_ns) there
      and schedule the find_task_by_pid for removal.
      
      [sukadev@us.ibm.com: Fix warning in kernel/pid.c]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signals: fix /sbin/init protection from unwanted signals · fae5fa44
      Oleg Nesterov committed
      The global init has a lot of long standing problems with the unhandled fatal
      signals.
      
      	- The "is_global_init(current)" check in get_signal_to_deliver()
      	  protects only the main thread. A sub-thread can dequeue the fatal
      	  signal and shut down the whole thread group except the main thread.
      	  If it dequeues SIGSTOP, /sbin/init will be stopped, which is not
      	  right either. Note that we can't use is_global_init(->group_leader):
      	  this breaks exec and can't solve the other problems we have.
      
      	- Even if ignored afterwards, a fatal signal sets SIGNAL_GROUP_EXIT
      	  on delivery. This breaks exec, has other bad implications, and
      	  is just wrong.
      
      Introduce the new SIGNAL_UNKILLABLE flag to fix these problems.  It also helps
      to solve some other problems addressed by the subsequent patches.
      
      Currently we use this flag for the global init only, but it could also be used
      by kthreads and (perhaps) by the sub-namespace inits.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 29 April, 2008 3 commits
    • idr: create idr_layer_cache at boot time · 199f0ca5
      Akinobu Mita committed
      Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
      unconditionally at boot time rather than creating it on-demand when idr_init()
      is called the first time.
      
      This change also enables us to eliminate the check every time idr_init() is
      called.
      
      [akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
      [akpm@linux-foundation.org: fix alpha build]
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: add an owner to the mm_struct · cf475ad2
      Balbir Singh committed
      Remove the mem_cgroup member from mm_struct and instead add an owner.
      
      This approach was suggested by Paul Menage.  The advantage of this approach
      is that, once the mm->owner is known, using the subsystem id, the cgroup
      can be determined.  It also allows several control groups that are
      virtually grouped by mm_struct, to exist independent of the memory
      controller i.e., without adding mem_cgroup's for each controller, to
      mm_struct.
      
      A new config option CONFIG_MM_OWNER is added and the memory resource
      controller selects this config option.
      
      This patch also adds cgroup callbacks to notify subsystems when mm->owner
      changes.  The mm_cgroup_changed callback is called with the task_lock() of
      the new task held and is called just prior to changing the mm->owner.
      
      I am indebted to Paul Menage for the several reviews of this patchset and
      helping me make it lighter and simpler.
      
      This patch was tested on a powerpc box.  It was compiled with the
      MM_OWNER config option both turned on and turned off.
      
      After the thread group leader exits, it's moved to init_css_set by
      cgroup_exit(), thus all future charges from running threads would be
      redirected to the init_css_set's subsystem.
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Reviewed-by: Paul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Simplify initcall_debug output · 626adeb6
      Bjorn Helgaas committed
      print_fn_descriptor_symbol() prints the address if we don't have a symbol, so
      no need to print both.
      
      Also, combine printing return value with elapsed time.  Changes this:
      
        Calling initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50()
        initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50() returned 1.
        initcall 0xc05b7a70 ran for 0 msecs: pci_mmcfg_late_insert_resources+0x0/0x50()
        initcall at 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50(): returned with error code 1
      
      to this:
      
        calling  pci_mmcfg_late_insert_resources+0x0/0x50()
        initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned 1 after 0 msecs
        initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned with error code 1
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
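
The simplified output format shown in this commit message can be reproduced with a short userspace sketch. The wrapper `run_initcall()` and the sample `demo_initcall()` are hypothetical names used only for illustration, and `clock()` stands in for whatever time source the kernel uses. The sketch combines the return value and the elapsed time on one line, as the message describes.

```c
#include <stdio.h>
#include <time.h>

typedef int (*initcall_t)(void);

/* Hypothetical initcall used for demonstration; it always returns 1. */
static int demo_initcall(void)
{
    return 1;
}

/*
 * Run one initcall and report it in the combined format.
 * The summary line is also copied into 'line' so callers can inspect it.
 */
static int run_initcall(const char *name, initcall_t fn, char *line, size_t len)
{
    clock_t start = clock();
    long msecs;
    int ret;

    printf("calling  %s()\n", name);
    ret = fn();
    msecs = (long)((clock() - start) * 1000 / CLOCKS_PER_SEC);

    /* Return value and elapsed time share a single line. */
    snprintf(line, len, "initcall %s() returned %d after %ld msecs",
             name, ret, msecs);
    puts(line);

    if (ret)
        printf("initcall %s() returned with error code %d\n", name, ret);
    return ret;
}
```

A caller would pass a buffer for the summary line, for example `char line[128]; run_initcall("demo_initcall", demo_initcall, line, sizeof line);`.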
  4. 24 April, 2008 2 commits
  5. 20 April, 2008 2 commits
  6. 16 March, 2008 1 commit
    • ACPI: Remove ACPI_CUSTOM_DSDT_INITRD option · 9a9e0d68
      Linus Torvalds committed
      This essentially reverts commit 71fc47a9
      ("ACPI: basic initramfs DSDT override support"), because the code simply
      isn't ready.
      
      It did ugly things to the init sequence to populate the rootfs image
      early, but that just ended up showing other problems with the whole
      approach.  The fact is, the VFS layer simply isn't initialized this
      early, and the relevant ACPI code should either run much later, or this
      shouldn't be done at all.
      
      For 2.6.25, we'll just pick the latter option.  We can revisit this
      concept later if necessary.
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Tilman Schmidt <tilman@imap.cc>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Eric Piel <eric.piel@tremplin-utc.net>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Markus Gaugusch <dsdt@gaugusch.at>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 05 March, 2008 1 commit
    • Fix "Malformed early option 'loglevel'" · d9d4fcfe
      Alex Riesen committed
      Keith Mannthey said:
      
        The parameter hotadd_percent is setup right but there is a "Malformed
        early option 'numa'" message.
      
      Rusty Russell said:
      
        This happens when the function registered with early_param() returns
        non-zero.  __setup() functions return 1 if OK, module_param() and
        early_param() return 0 or a -ve error code.
      
      For instance:
      
      Linux version 2.6.25-rc3-t (raa@steel) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #22 SMP PREEMPT Tue Feb 26
      BIOS-provided physical RAM map:
       BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
       BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
       BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
       BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
       BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
       BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
      Malformed early option 'loglevel'
      127MB HIGHMEM available.
      896MB LOWMEM available.
      
      Command line:
      
      BOOT_IMAGE=2.6.25-t ro root=809 ro console=ttyS0,57600n8 console=tty0 loglevel=5
      Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Keith Mannthey <kmannth@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
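
The return-value convention Rusty Russell describes above can be modelled in a few lines of C. This is a sketch under assumptions, not kernel code: the table type, the dispatcher `do_early()` and both handlers are hypothetical. It shows why a handler written in the `__setup()` style (returning 1 for "handled") triggers the warning when it is registered as an early parameter, where 0 means success.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*param_fn)(const char *val);

struct early_entry {
    const char *name;
    param_fn fn;
};

static int console_level = 7;

/* early_param()-style handler: 0 on success, negative on error. */
static int loglevel_early_style(const char *val)
{
    if (!val || !*val)
        return -1;
    console_level = atoi(val);
    return 0;
}

/* __setup()-style handler: 1 means "handled". Wrong for early params. */
static int loglevel_setup_style(const char *val)
{
    console_level = atoi(val);
    return 1;
}

static const struct early_entry good_table[] = {
    { "loglevel", loglevel_early_style },
};

static const struct early_entry bad_table[] = {
    { "loglevel", loglevel_setup_style },
};

/* Dispatch one parameter; any non-zero return is reported as malformed. */
static int do_early(const struct early_entry *tbl, size_t n,
                    const char *param, const char *val)
{
    int warnings = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(param, tbl[i].name) != 0)
            continue;
        if (tbl[i].fn(val) != 0) {
            printf("Malformed early option '%s'\n", param);
            warnings++;
        }
    }
    return warnings;
}
```

With `good_table` the value is applied silently. With `bad_table` the value is still applied, but the spurious warning is printed, which matches the symptom in the boot log above.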
  8. 10 February, 2008 1 commit
  9. 09 February, 2008 3 commits
  10. 07 February, 2008 2 commits
  11. 30 January, 2008 4 commits
  12. 26 January, 2008 1 commit
    • cpu-hotplug: refcount based cpu hotplug · d221938c
      Gautham R Shenoy committed
      This patch implements a Refcount + Waitqueue based model for
      cpu-hotplug.
      
      Now, a thread which wants to prevent cpu-hotplug, will bump up a global
      refcount and the thread which wants to perform a cpu-hotplug operation
      will block till the global refcount goes to zero.
      
      The readers, if any, during an ongoing cpu-hotplug operation are blocked
      until the cpu-hotplug operation is over.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
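
The refcount plus waitqueue model described in this commit message can be sketched with POSIX threads. This is an illustrative userspace analogue, not the kernel implementation: the names `reader_get`, `reader_put`, `writer_begin` and `writer_done` are hypothetical, and a mutex with a condition variable stands in for the kernel's locking and waitqueue.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical global hotplug state. */
static struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int refcount;          /* threads currently preventing hotplug */
    bool hotplug_active;   /* a hotplug operation is in progress */
} hp = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false };

/* Reader side: block while a hotplug operation runs, then take a reference. */
static void reader_get(void)
{
    pthread_mutex_lock(&hp.lock);
    while (hp.hotplug_active)
        pthread_cond_wait(&hp.changed, &hp.lock);
    hp.refcount++;
    pthread_mutex_unlock(&hp.lock);
}

/* Reader side: drop the reference and wake a waiting writer at zero. */
static void reader_put(void)
{
    pthread_mutex_lock(&hp.lock);
    if (--hp.refcount == 0)
        pthread_cond_broadcast(&hp.changed);
    pthread_mutex_unlock(&hp.lock);
}

/* Writer side: wait until the global refcount drops to zero. */
static void writer_begin(void)
{
    pthread_mutex_lock(&hp.lock);
    while (hp.hotplug_active || hp.refcount > 0)
        pthread_cond_wait(&hp.changed, &hp.lock);
    hp.hotplug_active = true;
    pthread_mutex_unlock(&hp.lock);
}

/* Writer side: finish the operation and release blocked readers. */
static void writer_done(void)
{
    pthread_mutex_lock(&hp.lock);
    hp.hotplug_active = false;
    pthread_cond_broadcast(&hp.changed);
    pthread_mutex_unlock(&hp.lock);
}

/* Helpers for inspecting the state. */
static int reader_count(void)
{
    pthread_mutex_lock(&hp.lock);
    int n = hp.refcount;
    pthread_mutex_unlock(&hp.lock);
    return n;
}

static bool writer_active(void)
{
    pthread_mutex_lock(&hp.lock);
    bool active = hp.hotplug_active;
    pthread_mutex_unlock(&hp.lock);
    return active;
}
```

Code that must not race with hotplug would bracket its critical section with `reader_get()` and `reader_put()`, while the hotplug path brackets its work with `writer_begin()` and `writer_done()`.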
  13. 10 November, 2007 1 commit
  14. 20 October, 2007 3 commits
    • spelling fixes: init/ · 211fee8a
      Simon Arlott committed
      Spelling fix in init/.
      Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
    • Drop the superfluous test for an old version of gcc. · d8af7c6a
      Robert P. J. Day committed
      The header file <linux/compiler.h> already enforces a suitably recent
      version of gcc, so there's no point checking for that again.
      Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
    • Task Control Groups: basic task cgroup framework · ddbcc7e8
      Paul Menage committed
      Generic Process Control Groups
      --------------------------
      
      There have recently been various proposals floating around for
      resource management/accounting and other task grouping subsystems in
      the kernel, including ResGroups, User BeanCounters, NSProxy
      cgroups, and others.  These all need the basic abstraction of being
      able to group together multiple processes in an aggregate, in order to
      track/limit the resources permitted to those processes, or control
      other behaviour of the processes, and all implement this grouping in
      different ways.
      
      This patchset provides a framework for tracking and grouping processes
      into arbitrary "cgroups" and assigning arbitrary state to those
      groupings, in order to control the behaviour of the cgroup as an
      aggregate.
      
      The intention is that the various resource management and
      virtualization/cgroup efforts can also become task cgroup
      clients, with the result that:
      
      - the userspace APIs are (somewhat) normalised
      
      - it's easier to test e.g. the ResGroups CPU controller in
       conjunction with the BeanCounters memory controller, or use either of
       them as the resource-control portion of a virtual server system.
      
      - the additional kernel footprint of any of the competing resource
       management systems is substantially reduced, since it doesn't need
       to provide process grouping/containment, hence improving their
       chances of getting into the kernel
      
      This patch:
      
      Add the main task cgroups framework - the cgroup filesystem, and the
      basic structures for tracking membership and associating subsystem state
      objects to tasks.
      Signed-off-by: Paul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 31 August, 2007 1 commit
    • fix maxcpus=1 oops in show_stat() · 62e6f1e8
      Hugh Dickins committed
      Alexey Dobriyan reports that maxcpus=1 is still broken in 2.6.23-rc4:
      if CONFIG_HOTPLUG_CPU is not set, x86_64 bootup oopses in show_stat() -
      for_each_possible_cpu accesses a per-cpu area which was never set up.
      
      Alexey identified commit 61ec7567
      (ACPI: boot correctly with "nosmp" or "maxcpus=0") as the origin;
      but it's not really to blame, just exposes a bug in 2.6.23-rc1's commit
      8b3b2955 (Especially when !CONFIG_HOTPLUG_CPU,
      avoid needlessly allocating resources for CPUs that can never become available).
      
      rc1's test for max_cpus < 2 in start_kernel() wasn't working because
      max_cpus was still NR_CPUS at that point: until rc4 moved the maxcpus
      parsing earlier.  Now it sets cpu_possible_map to 1 before allocating
      all possible per-cpu areas; then smp_init() expands cpu_possible_map
      to cpu_present_map (0xf in my case) later on.
      
      rc1's commit has good intentions, but expects cpu_present_map to be
      limited by maxcpus, which is only the case on i386.  cpus_and(possible,
      possible,present) might be good, but needs an audit of cpu_present_map
      uses - there may well be assumptions that any cpu present is possible.
      
      So stay safe for now and just revert those #ifndef CONFIG_HOTPLUG_CPU
      optimizations in rc1's commit.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 28 August, 2007 1 commit
    • fix maxcpus=N parsing · 81340977
      Hugh Dickins committed
      Commit 61ec7567 ('ACPI: boot correctly
      with "nosmp" or "maxcpus=0"') broke 'maxcpus=' handling on x86[-64].
      
      maxcpus=N is now having no effect on x86_64, and freezing bootup on i386
      (because of inconsistency with the separate maxcpus parsing down in
      arch/i386, I guess).  That's because early_param parsing is a little
      different from __setup parsing, and needs the "=" omitted: then it seems
      to work as the original commit intended (no mention of IO-APIC in
      /proc/interrupts when maxcpus=0).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
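
The parsing difference behind this fix can be illustrated with two small matching functions. This is a hedged model of the behaviour the commit message describes, not the kernel's parser: `setup_matches()` and `early_matches()` are hypothetical names. The assumption is that a `__setup`-style entry is compared as a prefix of the raw `name=value` token, while an early-param-style entry is compared against the name after the token has been split at `=`.

```c
#include <stdbool.h>
#include <string.h>

/*
 * __setup-style matching (assumed): the registered string is a prefix
 * of the raw token, so it normally ends in "=".
 */
static bool setup_matches(const char *registered, const char *token)
{
    return strncmp(token, registered, strlen(registered)) == 0;
}

/*
 * early_param-style matching (assumed): the token is split at '=' first
 * and only the name part is compared, exactly.
 */
static bool early_matches(const char *registered, const char *token)
{
    const char *eq = strchr(token, '=');
    size_t name_len = eq ? (size_t)(eq - token) : strlen(token);

    return strlen(registered) == name_len &&
           strncmp(token, registered, name_len) == 0;
}
```

Under this model, registering `"maxcpus="` works for the prefix matcher but never matches in the early matcher, because the name extracted from `maxcpus=2` is `maxcpus` without the trailing `=`. Registering `"maxcpus"` makes the early matcher succeed.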
  17. 21 August, 2007 1 commit
    • ACPI: boot correctly with "nosmp" or "maxcpus=0" · 61ec7567
      Len Brown committed
      In MPS mode, "nosmp" and "maxcpus=0" boot a UP kernel with IOAPIC disabled.
      However, in ACPI mode, these parameters didn't completely disable
      the IO APIC initialization code and boot failed.
      
      init/main.c:
      	Disable the IO_APIC if "nosmp" or "maxcpus=0"
      	undefine disable_ioapic_setup() when it doesn't apply.
      
      i386:
      	delete ioapic_setup(), it was a duplicate of parse_noapic()
      	delete undefinition of disable_ioapic_setup()
      
      x86_64:
      	rename disable_ioapic_setup() to parse_noapic() to match i386
      	define disable_ioapic_setup() in header to match i386
      
      http://bugzilla.kernel.org/show_bug.cgi?id=1641

      Acked-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Len Brown <len.brown@intel.com>
  18. 17 July, 2007 3 commits
    • adjust nosmp handling · 8b3b2955
      Jan Beulich committed
      Especially when !CONFIG_HOTPLUG_CPU, avoid needlessly allocating resources for
      CPUs that can never become available.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Allow softlockup to be runtime disabled · 97842216
      Dave Jones committed
      It's useful sometimes to disable the softlockup checker at boottime.
      Especially if it triggers during a distro install.
      Signed-off-by: Dave Jones <davej@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • serial: convert early_uart to earlycon for 8250 · 18a8bd94
      Yinghai Lu committed
      Because SERIAL_PORT_DFNS was removed from include/asm-i386/serial.h and
      include/asm-x86_64/serial.h, the serial8250_ports have to be probed late, in
      the serial initialization stage.  The call chain console_init =>
      serial8250_console_init => register_console => serial8250_console_setup
      returns -ENODEV, so console ttyS0 cannot be enabled at that time.  It only
      becomes available once uart_add_one_port in drivers/serial/serial_core.c
      calls register_console, which is too late.
      
      Make early_uart use early_param so that the uart console can be used earlier.
      Register it as a boot console with the CON_BOOT flag so that the console
      handover feature applies and it switches to the corresponding normal serial
      console automatically.
      
      new command line will be:
      	console=uart8250,io,0x3f8,9600n8
      	console=uart8250,mmio,0xff5e0000,115200n8
      or
      	earlycon=uart8250,io,0x3f8,9600n8
      	earlycon=uart8250,mmio,0xff5e0000,115200n8
      
      it will print in very early stage:
      	Early serial console at I/O port 0x3f8 (options '9600n8')
      	console [uart0] enabled
      later for console it will print:
      	console handover: boot [uart0] -> real [ttyS0]
      
      Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Gerd Hoffmann <kraxel@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 10 July, 2007 1 commit
  20. 19 May, 2007 1 commit
  21. 11 May, 2007 1 commit
  22. 10 May, 2007 1 commit
    • kthread: don't depend on work queues · 73c27992
      Eric W. Biederman committed
      Currently there is a circular reference between work queue initialization
      and kthread initialization.  This prevents the kthread infrastructure from
      initializing until after work queues have been initialized.
      
      We want the properties of tasks created with kthread_create to be as close
      as possible to the init_task and to not be contaminated by user processes.
      The later we start our kthreadd that creates these tasks the harder it is
      to avoid contamination from user processes and the more of a mess we have
      to clean up because the defaults have changed on us.
      
      So this patch modifies the kthread support to not use work queues but to
      instead use a simple list of structures, and to have kthreadd start from
      init_task immediately after our kernel thread that execs /sbin/init.
      
      By being a true child of init_task we only have to change those process
      settings that we want to have different from init_task, such as our process
      name, the cpus that are allowed, blocking all signals and setting SIGCHLD
      to SIG_IGN so that all of our children are reaped automatically.
      
      By being a true child of init_task we also naturally get our ppid set to 0
      and do not wind up as a child of PID == 1.  This ensures that tasks generated
      by kthread_create will not slow down the functioning of the wait family of
      functions.
      
      [akpm@linux-foundation.org: use interruptible sleeps]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 09 May, 2007 2 commits