1. 14 Dec, 2010 1 commit
  2. 23 Nov, 2010 2 commits
  3. 18 Nov, 2010 1 commit
    • sched: Simplify cpu-hot-unplug task migration · 48c5ccae
      Peter Zijlstra committed
      While discussing the need for sched_idle_next(), Oleg remarked that
      since try_to_wake_up() ensures sleeping tasks will end up running on a
      sane cpu, we can do away with migrate_live_tasks().
      
      If we then extend the existing hack of migrating current from
      CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the
      need for the sched_idle_next() abomination disappears as well, since
      idle will be the only possible thread left after the migration thread
      stops.
      
      This greatly simplifies the hot-unplug task migration path, as can be
      seen from the resulting code reduction (and about half the new lines
      are comments).
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1289851597.2109.547.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 09 Jun, 2010 1 commit
    • sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining · 3a101d05
      Tejun Heo committed
      Currently, when a cpu goes down, cpu_active is cleared before
      CPU_DOWN_PREPARE starts and cpuset configuration is updated from a
      default priority cpu notifier.  When a cpu is coming up, it's set
      before CPU_ONLINE but cpuset configuration again is updated from the
      same cpu notifier.
      
      For cpu notifiers, this presents an inconsistent state.  Threads which
      a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be
      migrated to other cpus because the cpu is no longer active.
      
      Fix it by updating cpu_active in the highest priority cpu notifier and
      cpuset configuration in the second highest when a cpu is coming up.
      Down path is updated similarly.  This guarantees that all other cpu
      notifiers see consistent cpu_active and cpuset configuration.
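
      The ordering described above can be illustrated with a small
      userspace model.  This is only a sketch, not the kernel's notifier
      code: struct notifier, chain_register(), chain_call() and
      bring_up_order() are hypothetical names, and the priorities are
      assumed values chosen so that the cpu_active update runs first and
      the cpuset update second.

      ```c
      #include <assert.h>
      #include <limits.h>
      #include <stddef.h>

      /* Hypothetical userspace model of a priority-ordered notifier chain. */
      struct notifier {
          int priority;              /* higher priority runs first */
          void (*call)(int cpu);
          struct notifier *next;
      };

      static struct notifier *chain; /* kept sorted, highest priority first */
      static char trace[8];          /* records the order of the callbacks */
      static size_t trace_len;

      static void record(char tag)
      {
          if (trace_len < sizeof(trace) - 1)
              trace[trace_len++] = tag;
          trace[trace_len] = '\0';
      }

      /* Insert n behind every entry of equal or higher priority. */
      static void chain_register(struct notifier *n)
      {
          struct notifier **pos = &chain;

          assert(n != NULL);
          while (*pos && (*pos)->priority >= n->priority)
              pos = &(*pos)->next;
          n->next = *pos;
          *pos = n;
      }

      static void chain_call(int cpu)
      {
          for (struct notifier *n = chain; n; n = n->next)
              n->call(cpu);
      }

      static void set_active(int cpu)     { (void)cpu; record('A'); }
      static void update_cpuset(int cpu)  { (void)cpu; record('C'); }
      static void other_notifier(int cpu) { (void)cpu; record('O'); }

      /* Assumed priorities: highest, second highest, default. */
      static struct notifier active_nb = { INT_MAX,     set_active,     NULL };
      static struct notifier cpuset_nb = { INT_MAX - 1, update_cpuset,  NULL };
      static struct notifier other_nb  = { 0,           other_notifier, NULL };

      /* Register in arbitrary order, then report the order of invocation. */
      const char *bring_up_order(void)
      {
          chain = NULL;
          trace_len = 0;
          chain_register(&other_nb);
          chain_register(&cpuset_nb);
          chain_register(&active_nb);
          chain_call(1);
          return trace;
      }
      ```

      In this model the default-priority callback always runs after both
      updates, whatever the registration order, which is the consistency
      property the commit message asks for on the cpu-up path.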
      
      cpuset_track_online_cpus() notifier is converted to
      cpuset_update_active_cpus() which just updates the configuration and
      now called from cpuset_cpu_[in]active() notifiers registered from
      sched_init_smp().  If cpuset is disabled, cpuset_update_active_cpus()
      degenerates into partition_sched_domains() making separate notifier
      for !CONFIG_CPUSETS unnecessary.
      
      This problem is triggered by cmwq.  During CPU_DOWN_PREPARE, hotplug
      callback creates a kthread and kthread_bind()s it to the target cpu,
      and the thread is expected to run on that cpu.
      
      * Ingo's test discovered __cpuinit/exit markups were incorrect.
        Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Menage <menage@google.com>
  5. 02 Jun, 2010 1 commit
  6. 31 May, 2010 1 commit
  7. 28 May, 2010 4 commits
  8. 25 May, 2010 3 commits
    • mem-hotplug: fix potential race while building zonelist for new populated zone · 4eaf3f64
      Haicheng Li committed
      Add global mutex zonelists_mutex to fix the possible race:
      
           CPU0                                  CPU1                    CPU2
      (1) zone->present_pages += online_pages;
      (2)                                       build_all_zonelists();
      (3)                                                               alloc_page();
      (4)                                                               free_page();
      (5) build_all_zonelists();
      (6)   __build_all_zonelists();
      (7)     zone->pageset = alloc_percpu();
      
      In step (3,4), zone->pageset still points to boot_pageset, so bad
      things may happen if 2+ nodes are in this state. Even if only 1 node
      is accessing the boot_pageset, (3) may still consume too much memory
      to fail the memory allocations in step (7).
      
      Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
      since there is a new fresh memory block added in step(6).
      
      [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
      Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: Andi Kleen <andi.kleen@intel.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset · 1f522509
      Haicheng Li committed
      For each new populated zone of hotadded node, need to update its pagesets
      with dynamically allocated per_cpu_pageset struct for all possible CPUs:
      
          1) Detach zone->pageset from the shared boot_pageset
             at end of __build_all_zonelists().
      
          2) Use mutex to protect zone->pageset when it's still
             shared in online_pages()
      
      Otherwise, multiple zones of different nodes would share the same bootstrap
      boot_pageset for the same CPU, which will finally cause the kernel panic below:
      
        ------------[ cut here ]------------
        kernel BUG at mm/page_alloc.c:1239!
        invalid opcode: 0000 [#1] SMP
        ...
        Call Trace:
         [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0
         [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0
         [<ffffffff81128407>] __page_cache_alloc+0x67/0x70
         [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260
         [<ffffffff81132751>] ra_submit+0x21/0x30
         [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0
         [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0
         [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670
         [<ffffffff81266cfa>] nfs_file_read+0xca/0x130
         [<ffffffff8117b20a>] do_sync_read+0xfa/0x140
         [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0
         [<ffffffff8117c151>] sys_read+0x51/0x80
         [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b
        RIP  [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900
         RSP <ffff88000d1e78a8>
        ---[ end trace 4bda28328b9990db ]
      
      [akpm@linux-foundation.org: merge fix]
      Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: Andi Kleen <andi.kleen@intel.com>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cpu/mem hotplug: enable CPUs online before local memory online · cf23422b
      minskey guo committed
      Enable users to online CPUs even if the CPUs belong to a numa node which
      doesn't have onlined local memory.
      
      The zonelists (pg_data_t.node_zonelists[]) of a numa node are created either
      in system boot/init period, or at the time of local memory online.  For a
      numa node without onlined local memory, its zonelists are not initialized
      at present.  As a result, any memory allocation operations executed by
      CPUs within this node will fail.  In fact, an out-of-memory error is
      triggered when attempting to online CPUs before memory comes online.
      
      This patch tries to create zonelists for such numa nodes, so that the
      memory allocation for this node can fall back to other nodes.
      
      [akpm@linux-foundation.org: remove unneeded export]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: minskey guo <chaohong.guo@intel.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 07 May, 2010 1 commit
    • stop_machine: reimplement using cpu_stop · 3fc1f1e2
      Tejun Heo committed
      Reimplement stop_machine using cpu_stop.  As cpu stoppers are
      guaranteed to be available for all online cpus,
      stop_machine_create/destroy() are no longer necessary and removed.
      
      With resource management and synchronization handled by cpu_stop, the
      new implementation is much simpler.  Asking the cpu_stop to execute
      the stop_cpu() state machine on all online cpus with cpu hotplug
      disabled is enough.
      
      stop_machine itself doesn't need to manage any global resources
      anymore, so all per-instance information is rolled into struct
      stop_machine_data and the mutex and all static data variables are
      removed.
      
      The previous implementation created and destroyed RT workqueues as
      necessary, which made stop_machine() calls highly expensive on very
      large machines.  According to Dimitri Sivanich, preventing the dynamic
      creation/destruction makes booting more than twice as fast on very
      large machines.  cpu_stop resources are preallocated for all online
      cpus and should have the same effect.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
  10. 03 Apr, 2010 1 commit
  11. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  12. 07 Mar, 2010 1 commit
  13. 28 Jan, 2010 2 commits
  14. 17 Dec, 2009 1 commit
  15. 07 Dec, 2009 1 commit
    • sched: Fix balance vs hotplug race · 6ad4c188
      Peter Zijlstra committed
      Since (e761b772: cpu hotplug, sched: Introduce cpu_active_map and redo
      sched domain managment) we have cpu_active_mask which is supposed to rule
      scheduler migration and load-balancing, except it never (fully) did.
      
      The particular problem being solved here is a crash in try_to_wake_up()
      where select_task_rq() ends up selecting an offline cpu because
      select_task_rq_fair() trusts the sched_domain tree to reflect the
      current state of affairs, similarly select_task_rq_rt() trusts the
      root_domain.
      
      However, the sched_domains are updated from CPU_DEAD, which is after the
      cpu is taken offline and after stop_machine is done. Therefore it can
      race perfectly well with code assuming the domains are right.
      
      Cure this by building the domains from cpu_active_mask on
      CPU_DOWN_PREPARE.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 26 Nov, 2009 1 commit
    • timers, init: Limit the number of per cpu calibration bootup messages · feae3203
      Mike Travis committed
      Limit the number of per cpu calibration messages by only
      printing out results for the first cpu to boot.
      
      Also, don't print "CPUx is down" as this is expected, and we
      don't need 4096 reminders... ;-)
      Signed-off-by: Mike Travis <travis@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091118002219.889552000@alcatraz.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 02 Sep, 2009 1 commit
  18. 22 Aug, 2009 1 commit
    • x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init · d0af9eed
      Suresh Siddha committed
      SDM Vol 3a section titled "MTRR considerations in MP systems" specifies
      the need for synchronizing the logical cpu's while initializing/updating
      MTRR.
      
      Currently Linux kernel does the synchronization of all cpu's only when
      a single MTRR register is programmed/updated. During an AP online
      (during boot/cpu-online/resume)  where we initialize all the MTRR/PAT registers,
      we don't follow this synchronization algorithm.
      
      This can lead to scenarios where, during a dynamic cpu online, that logical cpu
      is initializing MTRR/PAT with cache disabled (cr0.cd=1) etc while the other logical
      HT sibling continues to run (also with cache disabled because of cr0.cd=1
      on its sibling).
      
      Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly
      (because of some VMX performance optimizations) and the above scenario
      (with one logical cpu doing VMX activity and another logical cpu coming online)
      can result in system crash.
      
      Fix the MTRR initialization by doing rendezvous of all the cpus. During
      boot and resume, we delay the MTRR/PAT init for APs till all the
      logical cpu's come online and the rendezvous process at the end of AP's bringup,
      will initialize the MTRR/PAT for all AP's.
      
      For dynamic single cpu online, we synchronize all the logical cpus and
      do the MTRR/PAT init on the AP that is coming online.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  19. 22 Jul, 2009 1 commit
    • x86, intel_txt: Intel TXT Sx shutdown support · 86886e55
      Joseph Cihula committed
      Support for graceful handling of sleep states (S3/S4/S5) after an Intel(R) TXT launch.
      
      Without this patch, attempting to place the system in one of the ACPI sleep
      states (S3/S4/S5) will cause the TXT hardware to treat this as an attack and
      will cause a system reset, with memory locked.  Not only may the subsequent
      memory scrub take some time, but the platform will be unable to enter the
      requested power state.
      
      This patch calls back into the tboot so that it may properly and securely clean
      up system state and clear the secrets-in-memory flag, after which it will place
      the system into the requested sleep state using ACPI information passed by the kernel.
      
       arch/x86/kernel/smpboot.c     |    2 ++
       drivers/acpi/acpica/hwsleep.c |    3 +++
       kernel/cpu.c                  |    7 ++++++-
       3 files changed, 11 insertions(+), 1 deletion(-)
      Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
      Signed-off-by: Shane Wang <shane.wang@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  20. 23 Jun, 2009 1 commit
  21. 30 Mar, 2009 1 commit
  22. 08 Jan, 2009 1 commit
    • stop_machine/cpu hotplug: fix disable_nonboot_cpus · a0e280e0
      Heiko Carstens committed
      disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the
      caller already created the stop_machine workqueue (like cpu_down does).
      Otherwise a call to stop_machine will lead to accesses to random memory
      regions.
      
      When introducing this new interface (9ea09af3
      "stop_machine: introduce stop_machine_create/destroy") I missed the second
      call site of _cpu_down.
      So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus
      as well.
      
      Fixes suspend-to-ram/disk and also this bug:
      
      [  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
      [  286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
      [  286.550598] Oops: 0002 [#1] SMP
      [  286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d54
      [  286.560580] EIP: is at __stop_machine+0x88/0xe3
      [  286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30
      [  286.560580] Call Trace:
      [  286.560580]  [<c03d04e4>] ? _cpu_down+0x10f/0x234
      [  286.560580]  [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
      [  286.560580]  [<c01360c0>] ? kernel_poweroff+0x22/0x39
      [  286.560580]  [<c0136301>] ? sys_reboot+0xde/0x14c
      [  286.560580]  [<c01331b2>] ? complete_signal+0x179/0x191
      [  286.560580]  [<c0133396>] ? send_signal+0x1cc/0x1e1
      [  286.560580]  [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
      [  286.560580]  [<c0133b65>] ? group_send_signal_info+0x58/0x61
      [  286.560580]  [<c0133b9e>] ? kill_pid_info+0x30/0x3a
      [  286.560580]  [<c0133d49>] ? sys_kill+0x75/0x13a
      [  286.560580]  [<c01a06cb>] ? mntput_no_expire+0x1f/0x101
      [  286.560580]  [<c019b3b3>] ? dput+0x1e/0x105
      [  286.560580]  [<c018ef87>] ?  __fput+0x150/0x158
      [  286.560580]  [<c0157abf>] ? audit_syscall_entry+0x137/0x159
      [  286.560580]  [<c010329f>] ? sysenter_do_call+0x12/0x34
      Reported-and-tested-by: "Justin P. Mattock" <justinmattock@gmail.com>
      Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Tested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 05 Jan, 2009 1 commit
    • stop_machine: introduce stop_machine_create/destroy. · 9ea09af3
      Heiko Carstens committed
      Introduce stop_machine_create/destroy. With this interface, subsystems
      that need a non-failing stop_machine environment can create the
      stop_machine threads before actually calling stop_machine.
      When the threads aren't needed anymore they can be killed again with
      stop_machine_destroy.
      
      When stop_machine gets called and the threads aren't present they
      will be created and destroyed automatically. This restores the old
      behaviour of stop_machine.
      
      This patch also converts cpu hotplug to the new interface since it
      is special: cpu_down calls __stop_machine instead of stop_machine.
      However the kstop threads will only be created when stop_machine
      gets called.
      
      Changing the code so that the threads would be created automatically
      on __stop_machine is currently not possible: when __stop_machine gets
      called we hold cpu_add_remove_lock, which is the same lock that
      create_rt_workqueue would take. So the workqueue needs to be created
      before the cpu hotplug code locks cpu_add_remove_lock.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  24. 01 Jan, 2009 1 commit
    • cpumask: convert kernel/cpu.c · e0b582ec
      Rusty Russell committed
      Impact: Reduce kernel stack and memory usage, use new cpumask API.
      
      Use cpumask_var_t for take_cpu_down() stack var, and frozen_cpus.
      
      Note that notify_cpu_starting() can be called before core_initcall
      allocates frozen_cpus, but the NULL check is optimized out by gcc for
      the CONFIG_CPUMASK_OFFSTACK=n case.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  25. 30 Dec, 2008 2 commits
  26. 13 Dec, 2008 1 commit
    • cpumask: centralize cpu_online_map and cpu_possible_map · 98a79d6a
      Rusty Russell committed
      Impact: cleanup
      
      Each SMP arch defines these themselves.  Move them to a central
      location.
      
      Twists:
      1) Some archs (m32r, parisc, s390) set possible_map to all 1, so we add a
         CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
      
      2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
         Those archs simply have phys_cpu_present_map replaced everywhere.
      
      3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
         so I just manipulate them both in sync.
      
      4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
         declarations.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
      Tested-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Travis <travis@sgi.com>
      Cc: ink@jurassic.park.msu.ru
      Cc: rmk@arm.linux.org.uk
      Cc: starvik@axis.com
      Cc: tony.luck@intel.com
      Cc: takata@linux-m32r.org
      Cc: ralf@linux-mips.org
      Cc: grundler@parisc-linux.org
      Cc: paulus@samba.org
      Cc: schwidefsky@de.ibm.com
      Cc: lethal@linux-sh.org
      Cc: wli@holomorphy.com
      Cc: davem@davemloft.net
      Cc: jdike@addtoit.com
      Cc: mingo@redhat.com
  27. 01 Dec, 2008 1 commit
  28. 06 Nov, 2008 1 commit
    • cpumask: introduce new API, without changing anything · 2d3854a3
      Rusty Russell committed
      Impact: introduce new APIs
      
      We want to deprecate cpumasks on the stack, as we are headed for
      ginormous numbers of CPUs.  Eventually, we want to head towards an
      undefined 'struct cpumask' so they can never be declared on stack.
      
      1) New cpumask functions which take pointers instead of copies.
         (cpus_* -> cpumask_*)
      
      2) Several new helpers to reduce requirements for temporary cpumasks
         (cpumask_first_and, cpumask_next_and, cpumask_any_and)
      
      3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
         (cpumask_var_t, alloc_cpumask_var and free_cpumask_var)
      
      4) 'struct cpumask' for explicitness and to mark new-style code.
      
      5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
         not NR_CPUS for time efficiency and for smaller dynamic allocations
         in future.
      
      6) cpumask_copy() so we can allocate less than a full cpumask eventually
         (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
         definition eventually.
      
      7) work_on_cpu() helper for doing task on a CPU, rather than saving old
         cpumask for current thread and manipulating it.
      
      8) smp_call_function_many() which is smp_call_function_mask() except
         taking a cpumask pointer.
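
      As an illustration of points 1, 2 and 5 above, here is a minimal
      userspace model of a pointer-based mask API.  It is a sketch under
      stated assumptions, not the kernel implementation: struct
      toy_cpumask, mask_set_cpu(), mask_test_cpu(), mask_first_and() and
      toy_nr_cpu_ids are hypothetical stand-ins, and the values of
      NR_CPUS and toy_nr_cpu_ids are made up for the example.

      ```c
      #include <assert.h>
      #include <limits.h>

      #define NR_CPUS 128            /* assumed compile-time maximum */
      #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
      #define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

      /* Hypothetical stand-in for 'struct cpumask'. */
      struct toy_cpumask {
          unsigned long bits[MASK_LONGS];
      };

      /* Assumed runtime count of possible cpus, well below NR_CPUS. */
      unsigned int toy_nr_cpu_ids = 8;

      /* Masks are passed by pointer, never copied by value. */
      void mask_set_cpu(unsigned int cpu, struct toy_cpumask *m)
      {
          assert(cpu < NR_CPUS);
          m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
      }

      int mask_test_cpu(unsigned int cpu, const struct toy_cpumask *m)
      {
          assert(cpu < NR_CPUS);
          return (int)((m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL);
      }

      /*
       * First cpu set in both masks, without building a temporary mask.
       * The scan stops at toy_nr_cpu_ids rather than NR_CPUS, and that
       * value is returned when no cpu is set in both.
       */
      unsigned int mask_first_and(const struct toy_cpumask *a,
                                  const struct toy_cpumask *b)
      {
          for (unsigned int cpu = 0; cpu < toy_nr_cpu_ids; cpu++)
              if (mask_test_cpu(cpu, a) && mask_test_cpu(cpu, b))
                  return cpu;
          return toy_nr_cpu_ids;
      }
      ```

      A caller that wants "any cpu present in both masks" can use
      mask_first_and() directly instead of and-ing the two masks into a
      third one on the stack, which is the kind of temporary the new
      helpers are meant to avoid.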
      
      Note that this patch simply introduces the new functions and leaves
      the obsolescent ones in place.  This is to simplify the transition
      patches.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  29. 09 Sep, 2008 1 commit
    • kernel/cpu.c: create a CPU_STARTING cpu_chain notifier · e545a614
      Manfred Spraul committed
      Right now, there is no notifier that is called on a new cpu before the new
      cpu begins processing interrupts/softirqs.
      Various kernel functions would need that notification, e.g. kvm works around
      it by calling smp_call_function_single(), and rcu polls cpu_online_map.
      
      The patch adds a CPU_STARTING notification. It also adds a helper function
      that sends the message to all cpu_chain handlers.
      
      Tested on x86-64.
      All other archs are untested. Especially on sparc, I'm not sure if I got
      it right.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  30. 07 Sep, 2008 1 commit
    • kernel/cpu.c: Move the CPU_DYING notifiers · 3ba35573
      Manfred Spraul committed
      When a cpu is taken offline, the CPU_DYING notifiers are called on the
      dying cpu. According to <linux/notifier.h>, the cpu should be "not
      running any task, not handling interrupts, soon dead".
      
      For the current implementation, this is not true:
      - __cpu_disable can fail. If it fails, then the cpu will remain alive
        and happy.
      - At least on x86, __cpu_disable() briefly enables the local interrupts
        to handle any outstanding interrupts.
      
      What about moving CPU_DYING down a few lines, behind the __cpu_disable()
      line?
      There are only two CPU_DYING handlers in the kernel right now: one in
      kvm, one in the scheduler. Both should work with the patch applied
      [and: I'm not sure if either one handles a failing __cpu_disable()]
      
      The patch survives simply offlining a cpu. kvm untested due to lack
      of a test setup.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  31. 13 Aug, 2008 1 commit
  32. 11 Aug, 2008 1 commit
    • sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks · 279ef6bb
      Dmitry Adamushko committed
      Mark Langsdorf reported:
      
      > One of my co-workers noticed that the powernow-k8
      > driver no longer restarts when a CPU core is
      > hot-disabled and then hot-enabled on AMD quad-core
      > systems.
      >
      > The following commands work fine on 2.6.26 and fail
      > on 2.6.27-rc1:
      >
      > echo 0 > /sys/devices/system/cpu/cpu3/online
      > echo 1 > /sys/devices/system/cpu/cpu3/online
      > find /sys -name cpufreq
      >
      > For 2.6.26, the find will return a cpufreq
      > directory for each processor.  In 2.6.27-rc1,
      > the cpu3 directory is missing.
      >
      > After digging through the code, the following
      > logic is failing when the core is hot-enabled
      > at runtime.  The code works during the boot
      > sequence.
      >
      >       cpumask_t = current->cpus_allowed;
      >       set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
      >       if (smp_processor_id() != cpu)
      >               return -ENODEV;
      
      So set the CPU active before calling the CPU_ONLINE notifier chain,
      there are a handful of notifiers that use set_cpus_allowed().
      
      This fix also solves the problem with x86-microcode. I've sent
      alternative patches for microcode, but as this "rely on
      set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)"
      assumption seems to be more broad than what we thought, perhaps this fix
      should be applied.
      
      With this patch we define that by the moment CPU_ONLINE is being sent,
      a 'cpu' is online and ready for tasks to be migrated onto it.
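
      The effect of the reordering can be shown with a toy userspace
      model.  This is a sketch only: toy_cpu_active, toy_bind_current_to(),
      toy_online_notifier(), toy_cpu_up_old() and toy_cpu_up_new() are
      hypothetical names, and the -1 error value is an assumption rather
      than the kernel's actual return code.

      ```c
      #include <assert.h>

      #define TOY_NR_CPUS 4

      /* 1 once tasks may be migrated onto the cpu (models the active bit). */
      static int toy_cpu_active[TOY_NR_CPUS];

      /* Model: binding the caller to a cpu only works if the cpu is active. */
      int toy_bind_current_to(int cpu)
      {
          assert(cpu >= 0 && cpu < TOY_NR_CPUS);
          return toy_cpu_active[cpu] ? 0 : -1;
      }

      /* A CPU_ONLINE-style callback that needs to run on the new cpu. */
      int toy_online_notifier(int cpu)
      {
          return toy_bind_current_to(cpu);
      }

      /* Ordering before the fix: notifier first, active bit afterwards. */
      int toy_cpu_up_old(int cpu)
      {
          int ret = toy_online_notifier(cpu);

          toy_cpu_active[cpu] = 1;
          return ret;
      }

      /* Ordering after the fix: active bit first, then the notifier. */
      int toy_cpu_up_new(int cpu)
      {
          toy_cpu_active[cpu] = 1;
          return toy_online_notifier(cpu);
      }
      ```

      With the old ordering the callback fails in the same way as the
      powernow-k8 snippet quoted above; with the new ordering the cpu is
      already a valid migration target when the callback runs.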
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Reported-by: Mark Langsdorf <mark.langsdorf@amd.com>
      Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>