Commit 74b84233 · Author: Linus Torvalds

Merge branch 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 BSP hotplug changes from Ingo Molnar:
 "This tree enables CPU#0 (the boot processor) to be onlined/offlined on
  x86, just like any other CPU.  Enabled on Intel CPUs for now.

  Allowing this required the identification and fixing of latent CPU#0
  assumptions (such as CPU#0 initializations, etc.) in the x86
  architecture code, plus the identification of barriers to
  BSP-offlining, such as active PIC interrupts which can only be
  serviced on the BSP.

  It's behind a default-off option, and there's a debug option that
  allows the automatic testing of this feature.

  The motivation of this feature is to allow and prepare for true
  CPU-hotplug hardware support: recent changes to MCE support enable us
  to detect a deteriorating but not yet hard-failing L1/L2 cache on a
  CPU that could be soft-unplugged - or a failing L3 cache on a
  multi-socket system.

  Note that true hardware hot-plug is not yet fully enabled by this,
  because that requires a special platform wakeup sequence to be sent to
  the freshly powered up CPU#0.  Future patches for this are planned,
  once such a platform exists.  Chicken and egg"

* 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, topology: Debug CPU0 hotplug
  x86/i387.c: Initialize thread xstate only on CPU0 only once
  x86, hotplug: Handle retrigger irq by the first available CPU
  x86, hotplug: The first online processor saves the MTRR state
  x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
  x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
  x86-32, hotplug: Add start_cpu0() entry point to head_32.S
  x86-64, hotplug: Add start_cpu0() entry point to head_64.S
  kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
  x86, hotplug, suspend: Online CPU0 for suspend or hibernate
  x86, hotplug: Support functions for CPU0 online/offline
  x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
  x86, Kconfig: Add config switch for CPU0 hotplug
  doc: Add x86 CPU0 online/offline feature
@@ -207,6 +207,30 @@ by making it not-removable.
 In such cases you will also notice that the online file is missing under cpu0.
+Q: Is CPU0 removable on X86?
+A: Yes. If kernel is compiled with CONFIG_BOOTPARAM_HOTPLUG_CPU0=y, CPU0 is
+removable by default. Otherwise, CPU0 is also removable by kernel option
+cpu0_hotplug.
+But some features depend on CPU0. Two known dependencies are:
+1. Resume from hibernate/suspend depends on CPU0. Hibernate/suspend will fail if
+CPU0 is offline and you need to online CPU0 before hibernate/suspend can
+continue.
+2. PIC interrupts also depend on CPU0. CPU0 can't be removed if a PIC interrupt
+is detected.
+It's said poweroff/reboot may depend on CPU0 on some machines although I haven't
+seen any poweroff/reboot failure so far after CPU0 is offline on a few tested
+machines.
+Please let me know if you know or see any other dependencies of CPU0.
+If the dependencies are under your control, you can turn on CPU0 hotplug feature
+either by CONFIG_BOOTPARAM_HOTPLUG_CPU0 or by kernel parameter cpu0_hotplug.
+--Fenghua Yu <fenghua.yu@intel.com>
 Q: How do i find out if a particular CPU is not removable?
 A: Depending on the implementation, some architectures may show this by the
 absence of the "online" file. This is done if it can be determined ahead of
......
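The Q&A in the hunk above refers to the per-CPU "online" control file. A minimal userspace sketch of toggling it follows. It assumes the usual sysfs location /sys/devices/system/cpu/cpu0/online, root privileges, and a kernel booted with CPU0 hotplug enabled; the helper name set_cpu_online is hypothetical and not part of this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of this commit): write "1" or "0" to a
 * CPU's "online" control file. Returns 0 on success or a negative errno
 * value. The path is a parameter so the helper can be tried against an
 * ordinary file before touching sysfs.
 */
int set_cpu_online(const char *path, int online)
{
	FILE *f;
	int ret = 0;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fputs(online ? "1\n" : "0\n", f) == EOF)
		ret = -EIO;
	/* Buffered data reaches the file at close, so a refusal shows up here. */
	if (fclose(f) == EOF && ret == 0)
		ret = -errno;
	return ret;
}
```

Calling set_cpu_online("/sys/devices/system/cpu/cpu0/online", 0) would request that CPU0 be offlined. A negative return is expected when the control file is missing or the kernel refuses the request.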
@@ -1984,6 +1984,20 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	nox2apic	[X86-64,APIC] Do not enable x2APIC mode.
+	cpu0_hotplug	[X86] Turn on CPU0 hotplug feature when
+			CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
+			Some features depend on CPU0. Known dependencies are:
+			1. Resume from suspend/hibernate depends on CPU0.
+			Suspend/hibernate will fail if CPU0 is offline and you
+			need to online CPU0 before suspend/hibernate.
+			2. PIC interrupts also depend on CPU0. CPU0 can't be
+			removed if a PIC interrupt is detected.
+			It's said poweroff/reboot may depend on CPU0 on some
+			machines although I haven't seen such issues so far
+			after CPU0 is offline on a few tested machines.
+			If the dependencies are under your control, you can
+			turn on cpu0_hotplug.
 	nptcg=		[IA-64] Override max number of concurrent global TLB
 			purges which is reported from either PAL_VM_SUMMARY or
 			SAL PALO.
......
@@ -1698,6 +1698,50 @@ config HOTPLUG_CPU
 	    automatically on SMP systems. )
 	  Say N if you want to disable CPU hotplug.
+config BOOTPARAM_HOTPLUG_CPU0
+	bool "Set default setting of cpu0_hotpluggable"
+	default n
+	depends on HOTPLUG_CPU && EXPERIMENTAL
+	---help---
+	  Set whether default state of cpu0_hotpluggable is on or off.
+	  Say Y here to enable CPU0 hotplug by default. If this switch
+	  is turned on, there is no need to give cpu0_hotplug kernel
+	  parameter and the CPU0 hotplug feature is enabled by default.
+	  Please note: there are two known CPU0 dependencies if you want
+	  to enable the CPU0 hotplug feature either by this switch or by
+	  cpu0_hotplug kernel parameter.
+	  First, resume from hibernate or suspend always starts from CPU0.
+	  So hibernate and suspend are prevented if CPU0 is offline.
+	  Second dependency is PIC interrupts always go to CPU0. CPU0 can not
+	  offline if any interrupt can not migrate out of CPU0. There may
+	  be other CPU0 dependencies.
+	  Please make sure the dependencies are under your control before
+	  you enable this feature.
+	  Say N if you don't want to enable CPU0 hotplug feature by default.
+	  You still can enable the CPU0 hotplug feature at boot by kernel
+	  parameter cpu0_hotplug.
+config DEBUG_HOTPLUG_CPU0
+	def_bool n
+	prompt "Debug CPU0 hotplug"
+	depends on HOTPLUG_CPU && EXPERIMENTAL
+	---help---
+	  Enabling this option offlines CPU0 (if CPU0 can be offlined) as
+	  soon as possible and boots up userspace with CPU0 offlined. User
+	  can online CPU0 back after boot time.
+	  To debug CPU0 hotplug, you need to enable CPU0 offline/online
+	  feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during
+	  compilation or giving cpu0_hotplug kernel parameter at boot.
+	  If unsure, say N.
 config COMPAT_VDSO
 	def_bool y
 	prompt "Compat VDSO support"
......
@@ -28,6 +28,10 @@ struct x86_cpu {
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
+extern void __cpuinit start_cpu0(void);
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+extern int _debug_hotplug_cpu(int cpu, int action);
+#endif
 #endif
 DECLARE_PER_CPU(int, cpu_state);
......
@@ -166,6 +166,7 @@ void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
+void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
 #define cpu_physical_id(cpu)	per_cpu(x86_cpu_to_apicid, cpu)
......
@@ -2199,9 +2199,11 @@ static int ioapic_retrigger_irq(struct irq_data *data)
 {
 	struct irq_cfg *cfg = data->chip_data;
 	unsigned long flags;
+	int cpu;
 	raw_spin_lock_irqsave(&vector_lock, flags);
-	apic->send_IPI_mask(cpumask_of(cpumask_first(cfg->domain)), cfg->vector);
+	cpu = cpumask_first_and(cfg->domain, cpu_online_mask);
+	apic->send_IPI_mask(cpumask_of(cpu), cfg->vector);
 	raw_spin_unlock_irqrestore(&vector_lock, flags);
 	return 1;
......
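The ioapic_retrigger_irq() change above stops assuming that the first CPU in the vector domain is online. A userspace sketch of that selection follows, using 64-bit masks in place of struct cpumask. The names first_cpu_in_both, first_cpu_in_domain, mask_of and NR_CPUS_SKETCH are illustrative assumptions, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 64	/* illustrative limit: one bit per CPU */

/* Single-bit mask for one CPU, loosely analogous to cpumask_of(). */
uint64_t mask_of(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	return UINT64_C(1) << cpu;
}

/*
 * Stand-in for cpumask_first_and(): index of the lowest bit set in both
 * masks, or NR_CPUS_SKETCH when the intersection is empty.
 */
unsigned int first_cpu_in_both(uint64_t domain, uint64_t online)
{
	uint64_t both = domain & online;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPUS_SKETCH;
}

/* The pre-patch choice: lowest bit of the domain, online or not. */
unsigned int first_cpu_in_domain(uint64_t domain)
{
	return first_cpu_in_both(domain, UINT64_MAX);
}
```

With a domain of {CPU0, CPU2} and CPU0 offline, the old selection still targets CPU0, while the intersected selection picks CPU2.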
@@ -1237,7 +1237,7 @@ void __cpuinit cpu_init(void)
 	oist = &per_cpu(orig_ist, cpu);
 #ifdef CONFIG_NUMA
-	if (cpu != 0 && this_cpu_read(numa_node) == 0 &&
+	if (this_cpu_read(numa_node) == 0 &&
 	    early_cpu_to_node(cpu) != NUMA_NO_NODE)
 		set_numa_node(early_cpu_to_node(cpu));
 #endif
@@ -1269,8 +1269,7 @@ void __cpuinit cpu_init(void)
 	barrier();
 	x86_configure_nx();
-	if (cpu != 0)
-		enable_x2apic();
+	enable_x2apic();
 	/*
 	 * set up and load the per-CPU TSS
......
@@ -695,11 +695,16 @@ void mtrr_ap_init(void)
 }
 /**
- * Save current fixed-range MTRR state of the BSP
+ * Save current fixed-range MTRR state of the first cpu in cpu_online_mask.
  */
 void mtrr_save_state(void)
 {
-	smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1);
+	int first_cpu;
+	get_online_cpus();
+	first_cpu = cpumask_first(cpu_online_mask);
+	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
+	put_online_cpus();
 }
 void set_mtrr_aps_delayed_init(void)
......
@@ -266,6 +266,19 @@ num_subarch_entries = (. - subarch_entries) / 4
 	jmp default_entry
 #endif /* CONFIG_PARAVIRT */
+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
+ * up already except stack. We just set up stack here. Then call
+ * start_secondary().
+ */
+ENTRY(start_cpu0)
+	movl stack_start, %ecx
+	movl %ecx, %esp
+	jmp *(initial_code)
+ENDPROC(start_cpu0)
+#endif
 /*
  * Non-boot CPU entry point; entered from trampoline.S
  * We can't lgdt here, because lgdt itself uses a data segment, but
......
@@ -252,6 +252,22 @@ ENTRY(secondary_startup_64)
 	pushq	%rax		# target address in negative space
 	lretq
+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
+ * up already except stack. We just set up stack here. Then call
+ * start_secondary().
+ */
+ENTRY(start_cpu0)
+	movq	stack_start(%rip),%rsp
+	movq	initial_code(%rip),%rax
+	pushq	$0		# fake return address to stop unwinder
+	pushq	$__KERNEL_CS	# set correct cs
+	pushq	%rax		# target address in negative space
+	lretq
+ENDPROC(start_cpu0)
+#endif
 	/* SMP bootup changes these two */
 	__REFDATA
 	.align	8
......
@@ -175,7 +175,11 @@ void __cpuinit fpu_init(void)
 	cr0 |= X86_CR0_EM;
 	write_cr0(cr0);
-	if (!smp_processor_id())
+	/*
+	 * init_thread_xstate is only called once to avoid overriding
+	 * xstate_size during boot time or during CPU hotplug.
+	 */
+	if (xstate_size == 0)
 		init_thread_xstate();
 	mxcsr_feature_mask_init();
......
@@ -127,8 +127,8 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
 atomic_t init_deasserted;
 /*
- * Report back to the Boot Processor.
- * Running on AP.
+ * Report back to the Boot Processor during boot time or to the caller processor
+ * during CPU online.
  */
 static void __cpuinit smp_callin(void)
 {
@@ -140,15 +140,17 @@ static void __cpuinit smp_callin(void)
 	 * we may get here before an INIT-deassert IPI reaches
 	 * our local APIC. We have to wait for the IPI or we'll
 	 * lock up on an APIC access.
+	 *
+	 * Since CPU0 is not wakened up by INIT, it doesn't wait for the IPI.
 	 */
-	if (apic->wait_for_init_deassert)
+	cpuid = smp_processor_id();
+	if (apic->wait_for_init_deassert && cpuid != 0)
 		apic->wait_for_init_deassert(&init_deasserted);
 	/*
 	 * (This works even if the APIC is not enabled.)
 	 */
 	phys_id = read_apic_id();
-	cpuid = smp_processor_id();
 	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
 		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
 					phys_id, cpuid);
@@ -230,6 +232,8 @@ static void __cpuinit smp_callin(void)
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
 }
+static int cpu0_logical_apicid;
+static int enable_start_cpu0;
 /*
  * Activate a secondary processor.
  */
@@ -245,6 +249,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	preempt_disable();
 	smp_callin();
+	enable_start_cpu0 = 0;
 #ifdef CONFIG_X86_32
 	/* switch away from the initial page table */
 	load_cr3(swapper_pg_dir);
@@ -281,19 +287,30 @@ notrace static void __cpuinit start_secondary(void *unused)
 	cpu_idle();
 }
+void __init smp_store_boot_cpu_info(void)
+{
+	int id = 0; /* CPU 0 */
+	struct cpuinfo_x86 *c = &cpu_data(id);
+	*c = boot_cpu_data;
+	c->cpu_index = id;
+}
 /*
  * The bootstrap kernel entry code has set these up. Save them for
  * a given CPU
  */
 void __cpuinit smp_store_cpu_info(int id)
 {
 	struct cpuinfo_x86 *c = &cpu_data(id);
 	*c = boot_cpu_data;
 	c->cpu_index = id;
-	if (id != 0)
-		identify_secondary_cpu(c);
+	/*
+	 * During boot time, CPU0 has this setup already. Save the info when
+	 * bringing up AP or offlined CPU0.
+	 */
+	identify_secondary_cpu(c);
 }
 static bool __cpuinit
@@ -483,7 +500,7 @@ void __inquire_remote_apic(int apicid)
  * won't ... remember to clear down the APIC, etc later.
  */
 int __cpuinit
-wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
+wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
 {
 	unsigned long send_status, accept_status = 0;
 	int maxlvt;
@@ -491,7 +508,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
 	/* Target chip */
 	/* Boot on the stack */
 	/* Kick the second */
-	apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid);
+	apic_icr_write(APIC_DM_NMI | apic->dest_logical, apicid);
 	pr_debug("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -651,6 +668,63 @@ static void __cpuinit announce_cpu(int cpu, int apicid)
 			node, cpu, apicid);
 }
+static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs)
+{
+	int cpu;
+	cpu = smp_processor_id();
+	if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0)
+		return NMI_HANDLED;
+	return NMI_DONE;
+}
+/*
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ *
+ * Instead of waiting for STARTUP after INITs, BSP will execute the BIOS
+ * boot-strap code which is not a desired behavior for waking up BSP. To
+ * avoid the boot-strap code, wake up CPU0 by NMI instead.
+ *
+ * This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined
+ * (i.e. physically hot removed and then hot added), NMI won't wake it up.
+ * We'll change this code in the future to wake up hard offlined CPU0 if
+ * real platform and request are available.
+ */
+static int __cpuinit
+wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
+	       int *cpu0_nmi_registered)
+{
+	int id;
+	int boot_error;
+	/*
+	 * Wake up AP by INIT, INIT, STARTUP sequence.
+	 */
+	if (cpu)
+		return wakeup_secondary_cpu_via_init(apicid, start_ip);
+	/*
+	 * Wake up BSP by nmi.
+	 *
+	 * Register a NMI handler to help wake up CPU0.
+	 */
+	boot_error = register_nmi_handler(NMI_LOCAL,
+					  wakeup_cpu0_nmi, 0, "wake_cpu0");
+	if (!boot_error) {
+		enable_start_cpu0 = 1;
+		*cpu0_nmi_registered = 1;
+		if (apic->dest_logical == APIC_DEST_LOGICAL)
+			id = cpu0_logical_apicid;
+		else
+			id = apicid;
+		boot_error = wakeup_secondary_cpu_via_nmi(id, start_ip);
+	}
+	return boot_error;
+}
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -666,6 +740,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	unsigned long boot_error = 0;
 	int timeout;
+	int cpu0_nmi_registered = 0;
 	/* Just in case we booted with a single CPU. */
 	alternatives_enable_smp();
@@ -713,13 +788,16 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	}
 	/*
-	 * Kick the secondary CPU. Use the method in the APIC driver
-	 * if it's defined - or use an INIT boot APIC message otherwise:
+	 * Wake up a CPU in different cases:
+	 * - Use the method in the APIC driver if it's defined
+	 * Otherwise,
+	 * - Use an INIT boot APIC message for APs or NMI for BSP.
 	 */
 	if (apic->wakeup_secondary_cpu)
 		boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
 	else
-		boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
+		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
+						     &cpu0_nmi_registered);
 	if (!boot_error) {
 		/*
@@ -784,6 +862,13 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		 */
 		smpboot_restore_warm_reset_vector();
 	}
+	/*
+	 * Clean up the nmi handler. Do this after the callin and callout sync
+	 * to avoid impact of possible long unregister time.
+	 */
+	if (cpu0_nmi_registered)
+		unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
 	return boot_error;
 }
@@ -797,7 +882,7 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	pr_debug("++++++++++++++++++++=_---CPU UP %u\n", cpu);
-	if (apicid == BAD_APICID || apicid == boot_cpu_physical_apicid ||
+	if (apicid == BAD_APICID ||
 	    !physid_isset(apicid, phys_cpu_present_map) ||
 	    !apic->apic_id_valid(apicid)) {
 		pr_err("%s: bad cpu %d\n", __func__, cpu);
@@ -995,7 +1080,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	/*
 	 * Setup boot CPU information
 	 */
-	smp_store_cpu_info(0); /* Final full version of the data */
+	smp_store_boot_cpu_info(); /* Final full version of the data */
 	cpumask_copy(cpu_callin_mask, cpumask_of(0));
 	mb();
@@ -1031,6 +1116,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	 */
 	setup_local_APIC();
+	if (x2apic_mode)
+		cpu0_logical_apicid = apic_read(APIC_LDR);
+	else
+		cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
 	/*
 	 * Enable IO APIC before setting up error vector
 	 */
@@ -1219,19 +1309,6 @@ void cpu_disable_common(void)
 int native_cpu_disable(void)
 {
-	int cpu = smp_processor_id();
-	/*
-	 * Perhaps use cpufreq to drop frequency, but that could go
-	 * into generic code.
-	 *
-	 * We won't take down the boot processor on i386 due to some
-	 * interrupts only being able to be serviced by the BSP.
-	 * Especially so if we're not using an IOAPIC -zwane
-	 */
-	if (cpu == 0)
-		return -EBUSY;
 	clear_local_APIC();
 	cpu_disable_common();
@@ -1271,6 +1348,14 @@ void play_dead_common(void)
 	local_irq_disable();
 }
+static bool wakeup_cpu0(void)
+{
+	if (smp_processor_id() == 0 && enable_start_cpu0)
+		return true;
+	return false;
+}
 /*
  * We need to flush the caches before going to sleep, lest we have
  * dirty data in our caches when we come back up.
@@ -1334,6 +1419,11 @@ static inline void mwait_play_dead(void)
 		__monitor(mwait_ptr, 0, 0);
 		mb();
 		__mwait(eax, 0);
+		/*
+		 * If NMI wants to wake up CPU0, start CPU0.
+		 */
+		if (wakeup_cpu0())
+			start_cpu0();
 	}
 }
@@ -1344,6 +1434,11 @@ static inline void hlt_play_dead(void)
 	while (1) {
 		native_halt();
+		/*
+		 * If NMI wants to wake up CPU0, start CPU0.
+		 */
+		if (wakeup_cpu0())
+			start_cpu0();
 	}
 }
......
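The smpboot.c hunks above wake a soft-offlined CPU0 through a flag plus an NMI: the waker sets enable_start_cpu0 and sends the NMI, the handler claims it only for an offline CPU0, and the play_dead loop then jumps to start_cpu0(). The userspace model below is a simplified sketch of that handshake. The struct and function names are hypothetical, and the single restart_cpu0() step collapses what start_secondary() and the later online transition do separately in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

enum { NMI_DONE_SKETCH = 0, NMI_HANDLED_SKETCH = 1 };

/* Simulated state; field names follow the patch, the struct is hypothetical. */
struct wake_state {
	bool cpu0_online;	/* stands in for cpu_online(0) */
	bool enable_start_cpu0;	/* armed by the waker before it sends the NMI */
};

/* Models wakeup_cpu0_nmi(): claim the NMI only for an offline CPU0 being woken. */
int nmi_handler(const struct wake_state *s, int this_cpu)
{
	if (this_cpu == 0 && !s->cpu0_online && s->enable_start_cpu0)
		return NMI_HANDLED_SKETCH;
	return NMI_DONE_SKETCH;
}

/* Models wakeup_cpu0(): polled by the play_dead loop after hlt/mwait returns. */
bool should_restart(const struct wake_state *s, int this_cpu)
{
	return this_cpu == 0 && s->enable_start_cpu0;
}

/* Waker side: arm the flag; the real code then sends the NMI to CPU0. */
void arm_wakeup(struct wake_state *s)
{
	assert(!s->cpu0_online);
	s->enable_start_cpu0 = true;
}

/* Woken side: models start_cpu0() reaching start_secondary(), which clears the flag. */
void restart_cpu0(struct wake_state *s)
{
	s->enable_start_cpu0 = false;
	s->cpu0_online = true;
}
```

Clearing the flag on the woken side means a later stray NMI is no longer claimed by this handler.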
@@ -30,23 +30,110 @@
 #include <linux/mmzone.h>
 #include <linux/init.h>
 #include <linux/smp.h>
+#include <linux/irq.h>
 #include <asm/cpu.h>
 static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
 #ifdef CONFIG_HOTPLUG_CPU
+#ifdef CONFIG_BOOTPARAM_HOTPLUG_CPU0
+static int cpu0_hotpluggable = 1;
+#else
+static int cpu0_hotpluggable;
+static int __init enable_cpu0_hotplug(char *str)
+{
+	cpu0_hotpluggable = 1;
+	return 1;
+}
+__setup("cpu0_hotplug", enable_cpu0_hotplug);
+#endif
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+/*
+ * This function offlines a CPU as early as possible and allows userspace to
+ * boot up without the CPU. The CPU can be onlined back by user after boot.
+ *
+ * This is only called for debugging CPU offline/online feature.
+ */
+int __ref _debug_hotplug_cpu(int cpu, int action)
+{
+	struct device *dev = get_cpu_device(cpu);
+	int ret;
+	if (!cpu_is_hotpluggable(cpu))
+		return -EINVAL;
+	cpu_hotplug_driver_lock();
+	switch (action) {
+	case 0:
+		ret = cpu_down(cpu);
+		if (!ret) {
+			pr_info("CPU %u is now offline\n", cpu);
+			kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
+		} else
+			pr_debug("Can't offline CPU%d.\n", cpu);
+		break;
+	case 1:
+		ret = cpu_up(cpu);
+		if (!ret)
+			kobject_uevent(&dev->kobj, KOBJ_ONLINE);
+		else
+			pr_debug("Can't online CPU%d.\n", cpu);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+	cpu_hotplug_driver_unlock();
+	return ret;
+}
+static int __init debug_hotplug_cpu(void)
+{
+	_debug_hotplug_cpu(0, 0);
+	return 0;
+}
+late_initcall_sync(debug_hotplug_cpu);
+#endif /* CONFIG_DEBUG_HOTPLUG_CPU0 */
 int __ref arch_register_cpu(int num)
 {
+	struct cpuinfo_x86 *c = &cpu_data(num);
+	/*
+	 * Currently CPU0 is only hotpluggable on Intel platforms. Other
+	 * vendors can add hotplug support later.
+	 */
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		cpu0_hotpluggable = 0;
 	/*
-	 * CPU0 cannot be offlined due to several
-	 * restrictions and assumptions in kernel. This basically
-	 * doesn't add a control file, one cannot attempt to offline
-	 * BSP.
+	 * Two known BSP/CPU0 dependencies: Resume from suspend/hibernate
+	 * depends on BSP. PIC interrupts depend on BSP.
 	 *
-	 * Also certain PCI quirks require not to enable hotplug control
-	 * for all CPU's.
+	 * If the BSP dependencies are under control, one can tell kernel to
+	 * enable BSP hotplug. This basically adds a control file and
+	 * one can attempt to offline BSP.
 	 */
-	if (num)
+	if (num == 0 && cpu0_hotpluggable) {
+		unsigned int irq;
+		/*
+		 * We won't take down the boot processor on i386 if some
+		 * interrupts only are able to be serviced by the BSP in PIC.
+		 */
+		for_each_active_irq(irq) {
+			if (!IO_APIC_IRQ(irq) && irq_has_action(irq)) {
+				cpu0_hotpluggable = 0;
+				break;
+			}
+		}
+	}
+	if (num || cpu0_hotpluggable)
 		per_cpu(cpu_devices, num).cpu.hotpluggable = 1;
 	return register_cpu(&per_cpu(cpu_devices, num).cpu, num);
......
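The arch_register_cpu() change above clears cpu0_hotpluggable as soon as it finds an active interrupt that has a handler but is not routed through the IO-APIC. The same decision can be sketched in userspace over a plain array. The irq_sketch struct and the function name are hypothetical stand-ins for what the kernel derives from IO_APIC_IRQ() and irq_has_action().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-IRQ summary used only for this illustration. */
struct irq_sketch {
	bool via_ioapic;	/* stands in for IO_APIC_IRQ(irq) */
	bool has_action;	/* stands in for irq_has_action(irq) */
};

/*
 * Returns false as soon as one in-use interrupt is found that only the
 * PIC (and therefore the boot CPU) can service; true otherwise.
 */
bool cpu0_can_be_hotpluggable(const struct irq_sketch *irqs, size_t n)
{
	size_t i;

	assert(irqs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!irqs[i].via_ioapic && irqs[i].has_action)
			return false;
	return true;
}
```

A PIC interrupt with no handler attached does not block CPU0 hotplug in this model, which matches the irq_has_action() condition in the hunk.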
@@ -21,6 +21,7 @@
 #include <asm/suspend.h>
 #include <asm/debugreg.h>
 #include <asm/fpu-internal.h> /* pcntxt_mask */
+#include <asm/cpu.h>
 #ifdef CONFIG_X86_32
 static struct saved_context saved_context;
@@ -237,3 +238,84 @@ void restore_processor_state(void)
 #ifdef CONFIG_X86_32
 EXPORT_SYMBOL(restore_processor_state);
 #endif
+/*
+ * When bsp_check() is called in hibernate and suspend, cpu hotplug
+ * is disabled already. So it's unnecessary to handle race condition between
+ * cpumask query and cpu hotplug.
+ */
+static int bsp_check(void)
+{
+	if (cpumask_first(cpu_online_mask) != 0) {
+		pr_warn("CPU0 is offline.\n");
+		return -ENODEV;
+	}
+	return 0;
+}
+static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
+			   void *ptr)
+{
+	int ret = 0;
+	switch (action) {
+	case PM_SUSPEND_PREPARE:
+	case PM_HIBERNATION_PREPARE:
+		ret = bsp_check();
+		break;
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+	case PM_RESTORE_PREPARE:
+		/*
+		 * When system resumes from hibernation, online CPU0 because
+		 * 1. it's required for resume and
+		 * 2. the CPU was online before hibernation
+		 */
+		if (!cpu_online(0))
+			_debug_hotplug_cpu(0, 1);
+		break;
+	case PM_POST_RESTORE:
+		/*
+		 * When a resume really happens, this code won't be called.
+		 *
+		 * This code is called only when user space hibernation software
+		 * prepares for snapshot device during boot time. So we just
+		 * call _debug_hotplug_cpu() to restore to CPU0's state prior to
+		 * preparing the snapshot device.
+		 *
+		 * This works for normal boot case in our CPU0 hotplug debug
+		 * mode, i.e. CPU0 is offline and user mode hibernation
+		 * software initializes during boot time.
+		 *
+		 * If CPU0 is online and user application accesses snapshot
+		 * device after boot time, this will offline CPU0 and user may
+		 * see different CPU0 state before and after accessing
+		 * the snapshot device. But hopefully this is not a case when
+		 * user debugging CPU0 hotplug. Even if users hit this case,
+		 * they can easily online CPU0 back.
+		 *
+		 * To simplify this debug code, we only consider normal boot
+		 * case. Otherwise we need to remember CPU0's state and restore
+		 * to that state and resolve racy conditions etc.
+		 */
+		_debug_hotplug_cpu(0, 0);
+		break;
+#endif
+	default:
+		break;
+	}
+	return notifier_from_errno(ret);
+}
+static int __init bsp_pm_check_init(void)
+{
+	/*
+	 * Set this bsp_pm_callback as lower priority than
+	 * cpu_hotplug_pm_callback. So cpu_hotplug_pm_callback will be called
+	 * earlier to disable cpu hotplug before bsp online check.
+	 */
+	pm_notifier(bsp_pm_callback, -INT_MAX);
+	return 0;
+}
+core_initcall(bsp_pm_check_init);
@@ -603,6 +603,11 @@ cpu_hotplug_pm_callback(struct notifier_block *nb,
 static int __init cpu_hotplug_pm_sync_init(void)
 {
+	/*
+	 * cpu_hotplug_pm_callback has higher priority than x86
+	 * bsp_pm_callback which depends on cpu_hotplug_pm_callback
+	 * to disable cpu hotplug to avoid cpu hotplug race.
+	 */
 	pm_notifier(cpu_hotplug_pm_callback, 0);
 	return 0;
 }
......
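The two PM notifier hunks above rely on ordering: cpu_hotplug_pm_callback is registered with priority 0 and bsp_pm_callback with -INT_MAX, so hotplug is disabled before the CPU0 check runs. The sketch below models that ordering in userspace. It sorts an array at call time, whereas the kernel keeps its notifier chain ordered at registration, and every name in it is a hypothetical stand-in.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

struct notifier_sketch {
	int priority;		/* higher priority runs first */
	int (*call)(void *ctx);	/* 0 on success, negative errno on failure */
};

struct pm_ctx {
	int hotplug_disabled;
	int cpu0_online;
};

/* Stands in for cpu_hotplug_pm_callback (priority 0). */
int hotplug_cb(void *ctx)
{
	((struct pm_ctx *)ctx)->hotplug_disabled = 1;
	return 0;
}

/* Stands in for bsp_pm_callback (priority -INT_MAX). */
int bsp_cb(void *ctx)
{
	struct pm_ctx *c = ctx;

	assert(c->hotplug_disabled);	/* only valid after hotplug_cb has run */
	return c->cpu0_online ? 0 : -ENODEV;
}

/* Descending order; comparisons avoid overflow near -INT_MAX. */
static int by_priority_desc(const void *a, const void *b)
{
	const struct notifier_sketch *x = a, *y = b;

	return (x->priority < y->priority) - (x->priority > y->priority);
}

/* Run callbacks from highest to lowest priority, stopping at the first error. */
int run_chain(struct notifier_sketch *chain, size_t n, void *ctx)
{
	size_t i;

	qsort(chain, n, sizeof(*chain), by_priority_desc);
	for (i = 0; i < n; i++) {
		int ret = chain[i].call(ctx);

		if (ret)
			return ret;
	}
	return 0;
}
```

With CPU0 offline the chain returns -ENODEV, which models suspend or hibernate being refused until CPU0 is brought back online.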