1. 15 November 2012, 1 commit
    • ACPI / processor: prevent cpu from becoming online · 5e5041f3
      Committed by Yasuaki Ishimatsu
      Even if acpi_processor_handle_eject() offlines a cpu, there is still a
      window in which the cpu can be brought back online afterwards. Close that
      window by using get/put_online_cpus().
      
      Why does the patch change _cpu_up() logic?
      
      The patch addresses the race between cpu hot-remove and _cpu_up(). Without
      the change to _cpu_up(), the following race is possible.
      
      hot-remove cpu                         |  _cpu_up()
      ------------------------------------- ------------------------------------
      call acpi_processor_handle_eject()     |
           call cpu_down()                   |
           call get_online_cpus()            |
                                             | call cpu_hotplug_begin() and stop here
           call arch_unregister_cpu()        |
           call acpi_unmap_lsapic()          |
           call put_online_cpus()            |
                                             | start and continue _cpu_up()
           return acpi_processor_remove()    |
      continue hot-remove the cpu            |
      
      So _cpu_up() can continue on its own, and the cpu hot-remove can also
      continue. With the changed _cpu_up() logic, the race disappears as below:
      
      hot-remove cpu                         | _cpu_up()
      -----------------------------------------------------------------------
      call acpi_processor_handle_eject()     |
           call cpu_down()                   |
           call get_online_cpus()            |
                                             | call cpu_hotplug_begin() and stop here
           call arch_unregister_cpu()        |
           call acpi_unmap_lsapic()          |
                cpu's cpu_present is set     |
                to false by set_cpu_present()|
           call put_online_cpus()            |
                                             | start _cpu_up()
                                             | check cpu_present() and return -EINVAL
           return acpi_processor_remove()    |
      continue hot-remove the cpu            |
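      
      A minimal sketch of the added guard in _cpu_up() (illustrative only; the
      __cpu_up_work() helper name is hypothetical and locking details are
      simplified):
      
        static int _cpu_up(unsigned int cpu, int tasks_frozen)
        {
            int ret = 0;
      
            cpu_hotplug_begin();
      
            /* The cpu may have been hot-removed (and cleared from
             * cpu_present_mask) while we waited in cpu_hotplug_begin(). */
            if (!cpu_present(cpu))
                ret = -EINVAL;
            else
                ret = __cpu_up_work(cpu, tasks_frozen);  /* hypothetical helper */
      
            cpu_hotplug_done();
            return ret;
        }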
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Reviewed-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      5e5041f3
  2. 09 October 2012, 1 commit
  3. 23 August 2012, 1 commit
    • x86/smp: Don't ever patch back to UP if we unplug cpus · 816afe4f
      Committed by Rusty Russell
      We still patch SMP instructions to UP variants if we boot with a
      single CPU, but not at any other time.  In particular, not if we
      unplug CPUs to return to a single cpu.
      
      Paul McKenney points out:
      
       mean offline overhead is 6251/48=130.2 milliseconds.
      
       If I remove the alternatives_smp_switch() from the offline
       path [...] the mean offline overhead is 550/42=13.1 milliseconds
      
      Basically, we're never going to get those 120ms back, and the
      code is pretty messy.
      
      We get rid of:
      
       1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
          documentation is wrong. It's now the default.
      
       2) The skip_smp_alternatives flag used by suspend.
      
       3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
          which were only used to set this one flag.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul McKenney <paul.mckenney@us.ibm.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      816afe4f
  4. 13 August 2012, 1 commit
  5. 01 August 2012, 1 commit
    • mm/hotplug: correctly setup fallback zonelists when creating new pgdat · 9adb62a5
      Committed by Jiang Liu
      When hotadd_new_pgdat() is called to create a new pgdat for a new node, a
      fallback zonelist should be created for the new node.  There's code that
      tries to achieve that in hotadd_new_pgdat(), as below:
      
      	/*
      	 * The node we allocated has no zone fallback lists. For avoiding
      	 * to access not-initialized zonelist, build here.
      	 */
      	mutex_lock(&zonelists_mutex);
      	build_all_zonelists(pgdat, NULL);
      	mutex_unlock(&zonelists_mutex);
      
      But it doesn't work as expected.  When hotadd_new_pgdat() is called, the
      new node is still in offline state because node_set_online(nid) hasn't
      been called yet.  And build_all_zonelists() only builds zonelists for
      online nodes as:
      
              for_each_online_node(nid) {
                      pg_data_t *pgdat = NODE_DATA(nid);
      
                      build_zonelists(pgdat);
                      build_zonelist_cache(pgdat);
              }
      
      Although we intend to create a zonelist for the new pgdat, it never happens.
      So add a new parameter "pgdat" to build_all_zonelists(), so that zonelists
      are built for the new pgdat as well.
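      
      Roughly, the fix looks like this (a sketch of __build_all_zonelists() with
      the new argument, not the exact upstream diff):
      
        static int __build_all_zonelists(void *data)
        {
            int nid;
            pg_data_t *self = data;   /* the newly created pgdat, may be NULL */
      
            /* Build zonelists for the new node although it is not online yet. */
            if (self && !node_online(self->node_id)) {
                build_zonelists(self);
                build_zonelist_cache(self);
            }
      
            for_each_online_node(nid) {
                pg_data_t *pgdat = NODE_DATA(nid);
      
                build_zonelists(pgdat);
                build_zonelist_cache(pgdat);
            }
            return 0;
        }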
      Signed-off-by: Jiang Liu <liuj97@gmail.com>
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9adb62a5
  6. 01 June 2012, 2 commits
    • kernel/cpu.c: document clear_tasks_mm_cpumask() · e4cc2f87
      Committed by Anton Vorontsov
      Add more comments on clear_tasks_mm_cpumask(), plus add a runtime check:
      the function is only suitable for offlined CPUs, and if called
      inappropriately, the kernel should scream aloud.
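      
      The runtime check boils down to something like this (sketch only):
      
        void clear_tasks_mm_cpumask(int cpu)
        {
            /* Only valid once the cpu is fully offline; scream otherwise. */
            WARN_ON(cpu_online(cpu));
            /* ... walk tasks and clear the cpu from their mm_cpumask ... */
        }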
      
      [akpm@linux-foundation.org: tweak comment: s/walks up/walks/, use 80 cols]
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4cc2f87
    • cpu: introduce clear_tasks_mm_cpumask() helper · cb79295e
      Committed by Anton Vorontsov
      Many architectures clear tasks' mm_cpumask like this:
      
      	read_lock(&tasklist_lock);
      	for_each_process(p) {
      		if (p->mm)
      			cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
      	}
      	read_unlock(&tasklist_lock);
      
      Depending on the context, the code above may have several problems,
      such as:
      
      1. Working with task->mm w/o getting the mm or grabbing the task lock is
         dangerous as ->mm might disappear (exit_mm() assigns NULL under
         task_lock(), so tasklist lock is not enough).
      
      2. Checking for process->mm is not enough because process' main
         thread may exit or detach its mm via use_mm(), but other threads
         may still have a valid mm.
      
      This patch implements a small helper function that does things
      correctly, i.e.:
      
      1. We take the task's lock while we handle its mm (we can't use
         get_task_mm()/mmput() pair as mmput() might sleep);
      
      2. To catch the exited-main-thread case, we use find_lock_task_mm(),
         which walks all threads and returns an appropriate task
         (with the task lock held).
      
      Also, per Peter Zijlstra's idea, we no longer grab tasklist_lock in
      the new helper; instead we take the rcu read lock. We can do this
      because the function is called after the cpu is taken down and marked
      offline, so no new tasks will get this cpu set in their mm mask.
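      
      A simplified sketch of the resulting helper (close to, but not verbatim,
      the kernel/cpu.c code):
      
        void clear_tasks_mm_cpumask(int cpu)
        {
            struct task_struct *p;
      
            rcu_read_lock();
            for_each_process(p) {
                struct task_struct *t;
      
                /* Returns a thread with a valid ->mm and its task lock
                 * held, or NULL if the whole group has no mm left. */
                t = find_lock_task_mm(p);
                if (!t)
                    continue;
                cpumask_clear_cpu(cpu, mm_cpumask(t->mm));
                task_unlock(t);
            }
            rcu_read_unlock();
        }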
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb79295e
  7. 04 May 2012, 1 commit
  8. 26 April 2012, 3 commits
    • smp: Provide generic idle thread allocation · 29d5e047
      Committed by Thomas Gleixner
      All SMP architectures have magic to fork the idle task and to store it
      for reuse when cpu hotplug is enabled. Provide generic infrastructure
      for it.
      
      Create/reinit the idle thread for the cpu which is brought up in the
      generic code and hand the thread pointer to the architecture code via
      __cpu_up().
      
      Note that fork_idle() is called via a workqueue, because this
      guarantees that the idle thread does not get a reference to a user
      space VM. This can happen when the boot process did not bring up all
      possible cpus and a later cpu_up() is initiated via the sysfs
      interface. In that case fork_idle() would be called in the context of
      the user space task and take a reference on the user space VM.
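      
      The generic side amounts to a per-cpu cache of idle task_structs, roughly
      like this (a sketch along the lines of kernel/smpboot.c; details
      simplified):
      
        static DEFINE_PER_CPU(struct task_struct *, idle_threads);
      
        struct task_struct *idle_thread_get(unsigned int cpu)
        {
            struct task_struct *tsk = per_cpu(idle_threads, cpu);
      
            if (!tsk)
                return ERR_PTR(-ENOMEM);
            init_idle(tsk, cpu);   /* reinit when the cpu is plugged again */
            return tsk;
        }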
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: x86@kernel.org
      Acked-by: Venkatesh Pallipadi <venki@google.com>
      Link: http://lkml.kernel.org/r/20120420124557.102478630@linutronix.de
      29d5e047
    • smp: Add generic smpboot facility · 38498a67
      Committed by Thomas Gleixner
      Start a new file, which will hold SMP and CPU hotplug related generic
      infrastructure.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: x86@kernel.org
      Link: http://lkml.kernel.org/r/20120420124557.035417523@linutronix.de
      38498a67
    • smp: Add task_struct argument to __cpu_up() · 8239c25f
      Committed by Thomas Gleixner
      Preparatory patch to make the idle thread allocation for secondary
      cpus generic.
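      
      The interface change itself is small; every architecture's entry point
      grows a task_struct argument, along these lines (sketch):
      
        /* before */
        int __cpu_up(unsigned int cpu);
      
        /* after: generic code passes in the idle thread it allocated */
        int __cpu_up(unsigned int cpu, struct task_struct *tidle);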
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: x86@kernel.org
      Link: http://lkml.kernel.org/r/20120420124556.964170564@linutronix.de
      8239c25f
  9. 15 December 2011, 1 commit
  10. 13 December 2011, 1 commit
  11. 24 November 2011, 1 commit
  12. 05 November 2011, 1 commit
    • PM / Sleep: Fix race between CPU hotplug and freezer · 79cfbdfa
      Committed by Srivatsa S. Bhat
      The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down()
      functions depend on the value of the 'tasks_frozen' argument passed to them
      (which indicates whether tasks have been frozen or not).
      (Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN,
      CPU_DEAD, CPU_DEAD_FROZEN).
      
      Thus, it is essential that while the callbacks for those notifications are
      running, the state of the system with respect to the tasks being frozen or
      not remains unchanged, *throughout that duration*. Hence there is a need for
      synchronizing the CPU hotplug code with the freezer subsystem.
      
      Since the freezer is involved only in the Suspend/Hibernate call paths, this
      patch hooks the CPU hotplug code to the suspend/hibernate notifiers
      PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent
      the race between CPU hotplug and freezer, thus ensuring that CPU hotplug
      notifications will always be run with the state of the system really being
      what the notifications indicate, _throughout_ their execution time.
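      
      The hook is essentially a PM notifier that gates CPU hotplug for the whole
      suspend/hibernate window; a sketch of the idea (the disable/enable helper
      names here are illustrative, not necessarily the ones added by the patch):
      
        static int cpu_hotplug_pm_callback(struct notifier_block *nb,
                                           unsigned long action, void *ptr)
        {
            switch (action) {
            case PM_SUSPEND_PREPARE:
            case PM_HIBERNATION_PREPARE:
                cpu_hotplug_disable();  /* cpu_up()/cpu_down() fail until resume */
                break;
            case PM_POST_SUSPEND:
            case PM_POST_HIBERNATION:
                cpu_hotplug_enable();
                break;
            default:
                return NOTIFY_DONE;
            }
            return NOTIFY_OK;
        }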
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      79cfbdfa
  13. 31 October 2011, 1 commit
  14. 31 March 2011, 1 commit
  15. 23 March 2011, 1 commit
  16. 14 December 2010, 1 commit
  17. 23 November 2010, 2 commits
  18. 18 November 2010, 1 commit
    • sched: Simplify cpu-hot-unplug task migration · 48c5ccae
      Committed by Peter Zijlstra
      While discussing the need for sched_idle_next(), Oleg remarked that
      since try_to_wake_up() ensures sleeping tasks will end up running on a
      sane cpu, we can do away with migrate_live_tasks().
      
      If we then extend the existing hack of migrating current from
      CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the
      need for the sched_idle_next() abomination disappears as well, since
      idle will be the only possible thread left after the migration thread
      stops.
      
      This greatly simplifies the hot-unplug task migration path, as can be
      seen from the resulting code reduction (and about half the new lines
      are comments).
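      
      The resulting hot-unplug path boils down to migrating everything off the
      dying runqueue from CPU_DYING context; a simplified sketch (locking and
      the stop/migration-thread special case are omitted):
      
        static void migrate_tasks(unsigned int dead_cpu)
        {
            struct rq *rq = cpu_rq(dead_cpu);
            struct task_struct *next;
            int dest_cpu;
      
            for (;;) {
                /* Only the task running this code is left. */
                if (rq->nr_running == 1)
                    break;
                next = pick_next_task(rq);   /* grab any runnable task */
                next->sched_class->put_prev_task(rq, next);
                dest_cpu = select_fallback_rq(dead_cpu, next);
                __migrate_task(next, dead_cpu, dest_cpu);
            }
        }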
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1289851597.2109.547.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      48c5ccae
  19. 09 June 2010, 1 commit
    • sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining · 3a101d05
      Committed by Tejun Heo
      Currently, when a cpu goes down, cpu_active is cleared before
      CPU_DOWN_PREPARE starts and cpuset configuration is updated from a
      default priority cpu notifier.  When a cpu is coming up, it's set
      before CPU_ONLINE but cpuset configuration again is updated from the
      same cpu notifier.
      
      For cpu notifiers, this presents an inconsistent state.  Threads which
      a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be
      migrated to other cpus because the cpu is no longer active.
      
      Fix it by updating cpu_active in the highest priority cpu notifier and
      cpuset configuration in the second highest when a cpu is coming up.
      Down path is updated similarly.  This guarantees that all other cpu
      notifiers see consistent cpu_active and cpuset configuration.
      
      cpuset_track_online_cpus() notifier is converted to
      cpuset_update_active_cpus(), which just updates the configuration and
      is now called from the cpuset_cpu_[in]active() notifiers registered from
      sched_init_smp().  If cpuset is disabled, cpuset_update_active_cpus()
      degenerates into partition_sched_domains(), making a separate notifier
      for !CONFIG_CPUSETS unnecessary.
      
      This problem is triggered by cmwq.  During CPU_DOWN_PREPARE, hotplug
      callback creates a kthread and kthread_bind()s it to the target cpu,
      and the thread is expected to run on that cpu.
      
      * Ingo's test discovered __cpuinit/exit markups were incorrect.
        Fixed.
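      
      The ordering is enforced with explicit cpu-notifier priorities, roughly as
      below (sketch; the constants are used as the .priority of the respective
      notifier blocks):
      
        /* cpu_active is always updated back-to-back with the cpuset /
         * sched-domain update, so no other notifier sees them out of sync. */
        enum {
            CPU_PRI_SCHED_ACTIVE    = INT_MAX,      /* up: set cpu_active first */
            CPU_PRI_CPUSET_ACTIVE   = INT_MAX - 1,  /* ... then update cpusets */
            CPU_PRI_SCHED_INACTIVE  = INT_MIN + 1,  /* down: clear cpu_active last */
            CPU_PRI_CPUSET_INACTIVE = INT_MIN,      /* ... then update cpusets */
        };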
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Menage <menage@google.com>
      3a101d05
  20. 02 June 2010, 1 commit
  21. 31 May 2010, 1 commit
  22. 28 May 2010, 4 commits
  23. 25 May 2010, 3 commits
    • mem-hotplug: fix potential race while building zonelist for new populated zone · 4eaf3f64
      Committed by Haicheng Li
      Add global mutex zonelists_mutex to fix the possible race:
      
           CPU0                                  CPU1                    CPU2
      (1) zone->present_pages += online_pages;
      (2)                                       build_all_zonelists();
      (3)                                                               alloc_page();
      (4)                                                               free_page();
      (5) build_all_zonelists();
      (6)   __build_all_zonelists();
      (7)     zone->pageset = alloc_percpu();
      
      In steps (3) and (4), zone->pageset still points to boot_pageset, so bad
      things may happen if two or more nodes are in this state. Even if only one
      node is accessing the boot_pageset, step (3) may still consume enough
      memory to make the allocation in step (7) fail.
      
      Besides, making these steps atomic ensures that alloc_percpu() in step (7)
      will never fail, since a fresh new memory block has been added in step (6).
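      
      The intended usage on the memory-online side looks roughly like this
      (sketch only; the online_pages_sketch() name is made up here and
      build_all_zonelists()'s argument list differs across kernel versions):
      
        static DEFINE_MUTEX(zonelists_mutex);
      
        static int online_pages_sketch(struct zone *zone, unsigned long nr_pages)
        {
            bool need_rebuild;
      
            mutex_lock(&zonelists_mutex);
            need_rebuild = !populated_zone(zone);   /* zone gets its first pages */
            /* ... add the nr_pages new pages to the zone ... */
            if (need_rebuild)
                build_all_zonelists(NULL);          /* new zone becomes visible */
            mutex_unlock(&zonelists_mutex);
            return 0;
        }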
      
      [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
      Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: Andi Kleen <andi.kleen@intel.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4eaf3f64
    • mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset · 1f522509
      Committed by Haicheng Li
      For each newly populated zone of a hotadded node, we need to update its
      pagesets with dynamically allocated per_cpu_pageset structs for all
      possible CPUs:
      
          1) Detach zone->pageset from the shared boot_pageset
             at the end of __build_all_zonelists().
      
          2) Use a mutex to protect zone->pageset while it is still
             shared in onlined_pages().
      
      Otherwise, multiple zones of different nodes would share the same
      bootstrapping boot_pageset for the same CPU, which eventually causes the
      kernel panic below:
      
        ------------[ cut here ]------------
        kernel BUG at mm/page_alloc.c:1239!
        invalid opcode: 0000 [#1] SMP
        ...
        Call Trace:
         [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0
         [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0
         [<ffffffff81128407>] __page_cache_alloc+0x67/0x70
         [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260
         [<ffffffff81132751>] ra_submit+0x21/0x30
         [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0
         [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0
         [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670
         [<ffffffff81266cfa>] nfs_file_read+0xca/0x130
         [<ffffffff8117b20a>] do_sync_read+0xfa/0x140
         [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0
         [<ffffffff8117c151>] sys_read+0x51/0x80
         [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b
        RIP  [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900
         RSP <ffff88000d1e78a8>
        ---[ end trace 4bda28328b9990db ]
      
      [akpm@linux-foundation.org: merge fix]
      Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: Andi Kleen <andi.kleen@intel.com>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f522509
    • cpu/mem hotplug: enable CPUs online before local memory online · cf23422b
      Committed by minskey guo
      Enable users to online CPUs even if those CPUs belong to a numa node
      which doesn't have online local memory.
      
      The zonelists (pg_data_t.node_zonelists[]) of a numa node are created
      either during system boot/init, or at the time its local memory is onlined.
      For a numa node without onlined local memory, its zonelists are not
      initialized at present.  As a result, any memory allocation operation
      executed by CPUs within this node will fail.  In fact, an out-of-memory
      error is triggered when attempting to online CPUs before memory comes
      online.
      
      This patch tries to create zonelists for such numa nodes, so that memory
      allocations on this node can fall back to other nodes.
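      
      The gist, shown as a collapsed sketch of cpu_up() (the real patch factors
      the node bring-up into a separate helper):
      
        int cpu_up(unsigned int cpu)
        {
            int nid = cpu_to_node(cpu);
      
            if (!node_online(nid)) {
                pg_data_t *pgdat = hotadd_new_pgdat(nid, 0);
      
                if (!pgdat)
                    return -ENOMEM;
                node_set_online(nid);
      
                /* Build zonelists so allocations on this node can fall
                 * back to nodes that do have memory. */
                mutex_lock(&zonelists_mutex);
                build_all_zonelists(NULL);
                mutex_unlock(&zonelists_mutex);
            }
      
            return _cpu_up(cpu, 0);
        }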
      
      [akpm@linux-foundation.org: remove unneeded export]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: minskey guo <chaohong.guo@intel.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf23422b
  24. 07 May 2010, 1 commit
    • stop_machine: reimplement using cpu_stop · 3fc1f1e2
      Committed by Tejun Heo
      Reimplement stop_machine using cpu_stop.  As cpu stoppers are
      guaranteed to be available for all online cpus,
      stop_machine_create/destroy() are no longer necessary and removed.
      
      With resource management and synchronization handled by cpu_stop, the
      new implementation is much simpler.  Asking the cpu_stop to execute
      the stop_cpu() state machine on all online cpus with cpu hotplug
      disabled is enough.
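      
      In sketch form (simplified from the new implementation; field and function
      names are approximate):
      
        int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
        {
            struct stop_machine_data smdata = {
                .fn          = fn,
                .data        = data,
                .active_cpus = cpus,
            };
            int ret;
      
            get_online_cpus();              /* cpu hotplug disabled */
            smdata.num_threads = num_online_cpus();
            /* Run the stop_cpu() state machine on every online cpu. */
            ret = stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
            put_online_cpus();
            return ret;
        }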
      
      stop_machine itself doesn't need to manage any global resources
      anymore, so all per-instance information is rolled into struct
      stop_machine_data and the mutex and all static data variables are
      removed.
      
      The previous implementation created and destroyed RT workqueues as
      necessary, which made stop_machine() calls highly expensive on very
      large machines.  According to Dimitri Sivanich, avoiding the dynamic
      creation/destruction makes booting more than twice as fast on very
      large machines.  cpu_stop resources are preallocated for all online
      cpus and should have the same effect.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      3fc1f1e2
  25. 03 April 2010, 1 commit
  26. 30 March 2010, 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Committed by Tejun Heo
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following:
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of the
      specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  27. 07 March 2010, 1 commit
  28. 28 January 2010, 2 commits
  29. 17 December 2009, 1 commit
  30. 07 December 2009, 1 commit
    • sched: Fix balance vs hotplug race · 6ad4c188
      Committed by Peter Zijlstra
      Since (e761b772: cpu hotplug, sched: Introduce cpu_active_map and redo
      sched domain managment) we have cpu_active_mask which is supposed to rule
      scheduler migration and load-balancing, except it never (fully) did.
      
      The particular problem being solved here is a crash in try_to_wake_up()
      where select_task_rq() ends up selecting an offline cpu because
      select_task_rq_fair() trusts the sched_domain tree to reflect the
      current state of affairs, similarly select_task_rq_rt() trusts the
      root_domain.
      
      However, the sched_domains are updated from CPU_DEAD, which is after the
      cpu is taken offline and after stop_machine is done. Therefore it can
      race perfectly well with code assuming the domains are right.
      
      Cure this by building the domains from cpu_active_mask on
      CPU_DOWN_PREPARE.
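      
      Conceptually the notifier change looks like this (sketch; the real patch
      touches both the scheduler and cpuset notifiers):
      
        static int update_sched_domains(struct notifier_block *nfb,
                                        unsigned long action, void *hcpu)
        {
            switch (action & ~CPU_TASKS_FROZEN) {
            case CPU_ONLINE:
            case CPU_DOWN_PREPARE:  /* was CPU_DEAD: rebuild before the cpu dies */
            case CPU_DOWN_FAILED:
                /* Rebuild the domains from cpu_active_mask. */
                partition_sched_domains(1, NULL, NULL);
                return NOTIFY_OK;
            default:
                return NOTIFY_DONE;
            }
        }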
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6ad4c188