1. 21 Apr 2011, 1 commit
  2. 16 Apr 2011, 1 commit
    • x86, NUMA: Fix fakenuma boot failure · 7d6b4670
      KOSAKI Motohiro committed
      Currently, the numa=fake boot parameter is broken. If it is used,
      the kernel may panic with a divide-by-zero error, depending on the
      CPU configuration:
      
      Call Trace:
       [<ffffffff8104ad4c>] find_busiest_group+0x38c/0xd30
       [<ffffffff81086aff>] ? local_clock+0x6f/0x80
       [<ffffffff81050533>] load_balance+0xa3/0x600
       [<ffffffff81050f53>] idle_balance+0xf3/0x180
       [<ffffffff81550092>] schedule+0x722/0x7d0
       [<ffffffff81550538>] ? wait_for_common+0x128/0x190
       [<ffffffff81550a65>] schedule_timeout+0x265/0x320
       [<ffffffff81095815>] ? lock_release_holdtime+0x35/0x1a0
       [<ffffffff81550538>] ? wait_for_common+0x128/0x190
       [<ffffffff8109bb6c>] ? __lock_release+0x9c/0x1d0
       [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40
       [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40
       [<ffffffff81550540>] wait_for_common+0x130/0x190
       [<ffffffff81051920>] ? try_to_wake_up+0x510/0x510
       [<ffffffff8155067d>] wait_for_completion+0x1d/0x20
       [<ffffffff8107f36c>] kthread_create_on_node+0xac/0x150
       [<ffffffff81077bb0>] ? process_scheduled_works+0x40/0x40
       [<ffffffff8155045f>] ? wait_for_common+0x4f/0x190
       [<ffffffff8107a283>] __alloc_workqueue_key+0x1a3/0x590
       [<ffffffff81e0cce2>] cpuset_init_smp+0x6b/0x7b
       [<ffffffff81df3d07>] kernel_init+0xc3/0x182
       [<ffffffff8155d5e4>] kernel_thread_helper+0x4/0x10
       [<ffffffff81553cd4>] ? retint_restore_args+0x13/0x13
       [<ffffffff81df3c44>] ? start_kernel+0x400/0x400
       [<ffffffff8155d5e0>] ? gs_change+0x13/0x13
      
      The divide by zero is caused by the following line, where
      group->cpu_power == 0:
      
       kernel/sched_fair.c::update_sg_lb_stats()
              /* Adjust by relative CPU power of the group */
              sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
      
      This regression was caused by commit e23bba60 ("x86-64, NUMA: Unify
      emulated distance mapping") because it changes cpu -> node
      mapping in the process of dropping fake_physnodes().
      
        old) all cpus are assigned to node 0
        now) cpus are assigned round robin
             (the logic is implemented by numa_init_array())

        Note: The change in behavior only happens if the system has
              neither an ACPI SRAT table nor AMD northbridge NUMA
              information.
      
      Round robin assignment doesn't work because init_numa_sched_groups_power()
      assumes all logical cpus in the same physical cpu share the same node
      (so it only accounts for group_first_cpu()), and simple round robin
      breaks that assumption.
      
      Thus, this patch reassigns node IDs when buggy firmware or NUMA
      emulation produces a wrong cpu -> node map. It enforces that all
      logical cpus in the same physical cpu share the same node.
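      A toy model of the problem and of the fix, assuming a hypothetical machine with four packages of two threads each. The arrays `pkg_of[]` and `node_of[]` and all helper names are illustrative, not kernel code; `accounted_power()` only mimics the group_first_cpu() assumption described above, not the real scheduler accounting.

```c
#define NR_CPUS 8

/* Hypothetical topology: logical CPU -> physical package, two threads each. */
static const int pkg_of[NR_CPUS] = {0, 0, 1, 1, 2, 2, 3, 3};
static int node_of[NR_CPUS];

/* Round-robin cpu -> node assignment, in the spirit of numa_init_array(). */
static void assign_round_robin(int nr_nodes)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		node_of[cpu] = cpu % nr_nodes;
}

/* True if no lower-numbered CPU shares this CPU's package. */
static int is_first_in_package(int cpu)
{
	for (int prev = 0; prev < cpu; prev++)
		if (pkg_of[prev] == pkg_of[cpu])
			return 0;
	return 1;
}

/* Toy version of the scheduler assumption: only the first CPU of each
 * package contributes "power" to the node it is mapped to. */
static int accounted_power(int node)
{
	int power = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (is_first_in_package(cpu) && node_of[cpu] == node)
			power++;
	return power;
}

/* The fix, sketched: move every sibling onto the node of the first CPU
 * of its package, so all logical CPUs of one package share a node. */
static void fixup_package_nodes(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (int prev = 0; prev < cpu; prev++)
			if (pkg_of[prev] == pkg_of[cpu]) {
				node_of[cpu] = node_of[prev];
				break;
			}
}
```

      With two nodes, round robin places CPUs 1, 3, 5 and 7 on node 1, but none of them is first in its package, so node 1 has CPUs yet zero accounted power, the analogue of group->cpu_power == 0. After the fixup, every sibling follows the first CPU of its package.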
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/20110415203928.1303.A69D9226@jp.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 29 Mar 2011, 1 commit
    • x86: A fast way to check capabilities of the current cpu · 349c004e
      Christoph Lameter committed
      Add this_cpu_has() which determines if the current cpu has a certain
      ability using a segment prefix and a bit test operation.
      
      For that, we need to add bit operations to x86's percpu.h.
      
      Many uses of cpu_has use a pointer passed to a function to determine
      the current flags. That is no longer necessary after this patch.
      
      However, this patch only converts the straightforward cases where
      cpu_has is used with this_cpu_ptr. The rest is work for later.
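      A plain-C model of the idea, assuming a hypothetical per-CPU capability array `cpu_caps[][]` and a global `current_cpu` in place of real per-CPU access. It shows only the bit-test logic, not the segment-prefixed bit-test instruction the patch uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS        4
#define NCAPWORDS      2
#define BITS_PER_WORD  32

/* Hypothetical per-CPU capability words, standing in for cpu_info's flags. */
static uint32_t cpu_caps[NR_CPUS][NCAPWORDS];
/* Stand-in for "which CPU am I running on"; the kernel uses a percpu segment. */
static int current_cpu;

static void set_cpu_cap_on(int cpu, unsigned int bit)
{
	cpu_caps[cpu][bit / BITS_PER_WORD] |= UINT32_C(1) << (bit % BITS_PER_WORD);
}

/* Test one capability bit of the current CPU; no cpuinfo pointer is passed in. */
static bool this_cpu_has_cap(unsigned int bit)
{
	return (cpu_caps[current_cpu][bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1u;
}
```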
      
      -tj: Rolled up patch to add x86_ prefix and use percpu_read() instead
           of percpu_read_stable().
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  4. 23 Feb 2011, 1 commit
    • x86: Rework arch_disable_smp_support() for x86 · 7167d08e
      Henrik Kretzschmar committed
      Currently, arch_disable_smp_support() on x86 only disables
      IOAPIC support, and it is compiled in even when SMP support
      is not.

      Therefore, this function is renamed to disable_ioapic_support(),
      which matches its purpose, and it is only compiled into the
      kernel when IOAPIC support is too.

      A new arch_disable_smp_support() is created in smpboot.c; it
      calls disable_ioapic_support() and is only compiled into the
      kernel when SMP support is too.
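      The resulting compile-time layout can be sketched as follows. The CONFIG_* values are hard-coded assumptions for this example, the empty inline stub for the !CONFIG_X86_IO_APIC case is a guess at how callers stay buildable, and `ioapic_disabled` is a hypothetical stand-in for whatever state the real function changes.

```c
/* Assumed configuration for this sketch; a real build gets these from Kconfig. */
#define CONFIG_SMP 1
#define CONFIG_X86_IO_APIC 1

static int ioapic_disabled;

#ifdef CONFIG_X86_IO_APIC
/* Formerly arch_disable_smp_support(): it only ever disabled IOAPIC support. */
void disable_ioapic_support(void)
{
	ioapic_disabled = 1;
}
#else
/* Hypothetical no-op stub so callers still build without IOAPIC support. */
static inline void disable_ioapic_support(void) { }
#endif

#ifdef CONFIG_SMP
/* New SMP-only entry point that delegates to the IOAPIC helper. */
void arch_disable_smp_support(void)
{
	disable_ioapic_support();
}
#endif
```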
      Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
      LKML-Reference: <1298385487-4708-3-git-send-email-henne@nachtwindheim.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 18 Feb 2011, 1 commit
    • x86, trampoline: Common infrastructure for low memory trampolines · 4822b7fc
      H. Peter Anvin committed
      Common infrastructure for low memory trampolines.  This code installs
      the trampolines permanently in low memory very early.  It also permits
      multiple pieces of code to be used for this purpose.
      
      This code also introduces a standard infrastructure for computing
      symbol addresses in the trampoline code.
      
      The only change to the actual SMP trampolines themselves is that the
      64-bit trampoline has been made reusable -- the previous version would
      overwrite the code with a status variable; this moves the status
      variable to a separate location.
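      A minimal sketch of relocating symbol addresses into an installed trampoline copy, assuming the image is copied once to a fixed low-memory buffer. `image`, `low_mem`, `tramp_sym()` and the status offset are hypothetical names chosen for illustration; the status word sits at its own offset rather than overwriting code, mirroring the reusability change described above.

```c
#include <string.h>

/* Hypothetical "linked" trampoline image and two symbols inside it. */
static unsigned char image[64];
static unsigned char *const tramp_start = image;
static unsigned char *const tramp_status = image + 48; /* separate status slot */

/* Hypothetical low-memory area that the image is installed into. */
static unsigned char low_mem[64];
static unsigned char *tramp_base;

/* Install the image once, early, and remember where it landed. */
static void setup_trampolines(void)
{
	memcpy(low_mem, image, sizeof(image));
	tramp_base = low_mem;
}

/* Translate a link-time symbol address into its address in the installed copy. */
static void *tramp_sym(const unsigned char *sym)
{
	return tramp_base + (sym - tramp_start);
}
```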
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <4D5DFBE4.7090104@intel.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Matthieu Castet <castet.matthieu@free.fr>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
  6. 10 Feb 2011, 1 commit
    • x86: Fix section mismatch in LAPIC initialization · 2fb270f3
      Jan Beulich committed
      Additionally, doing things conditionally upon smp_processor_id()
      being zero is generally a bad idea, as this means CPU 0 cannot
      be offlined and brought back online later.
      
      While there may be other places where this is done, I think adding
      more of those should be avoided so that some day SMP can really
      become "symmetrical".
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      LKML-Reference: <4D525C7E0200007800030EE1@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 05 Feb 2011, 1 commit
  8. 28 Jan 2011, 8 commits
    • x86: Unify node_to_cpumask_map handling between 32 and 64bit · de2d9445
      Tejun Heo committed
      x86_32 has been managing node_to_cpumask_map explicitly from
      map_cpu_to_node() and friends in a rather ugly way.  With
      previous changes, it's now possible to share the code with
      64bit.
      
      * When CONFIG_NUMA_EMU is disabled, numa_add/remove_cpu() are
        implemented in numa.c and shared by 32 and 64bit.  CONFIG_NUMA_EMU
        versions still live in numa_64.c.
      
        NUMA_EMU's dependency on 64bit is planned to be removed and the
        above should go away together.
      
      * identify_cpu() now calls numa_add_cpu() for 32bit too.  This
        makes the explicit mask management from map_cpu_to_node() unnecessary.
      
      * The whole x86_32 specific map_cpu_to_node() chunk is no longer
        necessary.  Dropped.
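      A simplified model of the shared helpers, assuming at most 64 CPUs so that a `uint64_t` can stand in for a cpumask. The function names follow the text, but these bodies are an illustrative sketch rather than the kernel implementation.

```c
#include <stdint.h>

#define MAX_NUMNODES 4
#define NR_CPUS      64

/* One bitmask per node; bit N set means CPU N belongs to that node. */
static uint64_t node_to_cpumask_map[MAX_NUMNODES];
/* CPU -> node mapping consulted by the helpers below. */
static int cpu_node[NR_CPUS];

/* Called once the CPU's node is known (identify_cpu() in the text). */
static void numa_add_cpu(int cpu)
{
	node_to_cpumask_map[cpu_node[cpu]] |= UINT64_C(1) << cpu;
}

static void numa_remove_cpu(int cpu)
{
	node_to_cpumask_map[cpu_node[cpu]] &= ~(UINT64_C(1) << cpu);
}
```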
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-16-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
    • x86: Unify CPU -> NUMA node mapping between 32 and 64bit · 645a7919
      Tejun Heo committed
      Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for
      CPU -> NUMA node mapping.  Replace it with early_percpu variable
      x86_cpu_to_node_map and share the mapping code with 64bit.
      
      * USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.
      
      * x86_cpu_to_node_map and numa_set/clear_node() are moved from
        numa_64 to numa.  For now, on 32bit, x86_cpu_to_node_map is initialized
        with 0 instead of NUMA_NO_NODE.  This is to avoid introducing unexpected
        behavior change and will be updated once init path is unified.
      
      * srat_detect_node() is now enabled for x86_32 too.  It calls
        numa_set_node() and initializes the mapping, making explicit
        cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: penberg@kernel.org
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
    • x86: Unify cpu/apicid <-> NUMA node mapping between 32 and 64bit · bbc9e2f4
      Tejun Heo committed
      The mapping between cpu/apicid and node is done via
      apicid_to_node[] on 64bit and apicid_2_node[] +
      apic->x86_32_numa_cpu_node() on 32bit. This difference makes it
      difficult to further unify 32 and 64bit NUMA handling.
      
      This patch unifies it by replacing both apicid_to_node[] and
      apicid_2_node[] with __apicid_to_node[] array, which is accessed
      by two accessors - set_apicid_to_node() and numa_cpu_node().  On
      64bit, numa_cpu_node() always consults __apicid_to_node[]
      directly while 32bit goes through apic->numa_cpu_node() method
      to allow apic implementations to override it.
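      One table plus two accessors can be modelled like this. `apicid_to_node_tbl[]` stands in for __apicid_to_node[] (a leading double underscore is reserved in user-space C), `cpu_to_apicid[]` is a hypothetical CPU to APIC ID table, and the `apic_numa_cpu_node` function pointer approximates the 32-bit apic->numa_cpu_node() override.

```c
#include <stddef.h>

#define MAX_LOCAL_APIC 256
#define NR_CPUS        8
#define NUMA_NO_NODE   (-1)

/* Single APIC ID -> node table shared by both word sizes in this sketch. */
static int apicid_to_node_tbl[MAX_LOCAL_APIC];
/* Hypothetical CPU -> physical APIC ID table. */
static int cpu_to_apicid[NR_CPUS];
/* Optional override hook, NULL unless an apic driver installs one. */
static int (*apic_numa_cpu_node)(int cpu);

static void init_apicid_to_node(void)
{
	for (int i = 0; i < MAX_LOCAL_APIC; i++)
		apicid_to_node_tbl[i] = NUMA_NO_NODE;
}

static void set_apicid_to_node(int apicid, int node)
{
	apicid_to_node_tbl[apicid] = node;
}

static int numa_cpu_node(int cpu)
{
	if (apic_numa_cpu_node != NULL)
		return apic_numa_cpu_node(cpu);
	return apicid_to_node_tbl[cpu_to_apicid[cpu]];
}
```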
      
      srat_detect_node() for AMD cpus contains a workaround for broken
      NUMA configurations which assumes a relationship between APIC ID,
      HT node ID and NUMA topology.  Leave it to access
      __apicid_to_node[] directly, as mapping through the CPU might
      result in an undesirable behavior change.  The comment is
      reformatted and updated to note the ugliness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-14-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
    • x86: Replace apic->apicid_to_node() with ->x86_32_numa_cpu_node() · 89e5dc21
      Tejun Heo committed
      apic->apicid_to_node() is a 32bit-specific apic operation which
      determines the NUMA node for a CPU.  Depending on the APIC
      implementation, it can be easier to determine the NUMA node from
      either the physical or the logical apicid.  Currently,
      ->apicid_to_node() takes @logical_apicid and calls
      hard_smp_processor_id() if the physical apicid is needed.
      
      This prevents NUMA mapping from being queried from a different
      CPU, which in turn makes it impossible to initialize NUMA
      mapping before SMP bringup.
      
      This patch replaces apic->apicid_to_node() with
      ->x86_32_numa_cpu_node() which takes @cpu, from which both
      logical and physical apicids can easily be determined.  While at
      it, drop duplicate implementations from bigsmp_32 and summit_32,
      and use the default one.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-13-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Always use x86_cpu_to_logical_apicid for cpu -> logical apic id · 6f802c4b
      Tejun Heo committed
      Currently, cpu -> logical apic id translation is done by
      apic->cpu_to_logical_apicid() callback which may or may not use
      x86_cpu_to_logical_apicid.  This is unnecessary as it should
      always equal logical_smp_processor_id() which is known early
      during CPU bring up.
      
      Initialize x86_cpu_to_logical_apicid after apic->init_apic_ldr()
      in setup_local_APIC() and always use x86_cpu_to_logical_apicid
      for cpu -> logical apic id mapping.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: penberg@kernel.org
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-6-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Replace cpu_2_logical_apicid[] with early percpu variable · 4c321ff8
      Tejun Heo committed
      Unlike x86_64, on x86_32, the mapping from cpu to logical apicid
      may vary depending on apic in use.  cpu_2_logical_apicid[] array
      is used for this mapping.  Replace it with early percpu variable
      x86_cpu_to_logical_apicid to make it better aligned with other
      mappings.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: penberg@kernel.org
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-5-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Drop x86_32 MAX_APICID · b78aa66b
      Tejun Heo committed
      Commit 56d91f13 (x86, acpi: Add MAX_LOCAL_APIC for 32bit) added
      MAX_LOCAL_APIC for x86_32 but didn't replace MAX_APICID users
      with it. Convert MAX_APICID users to MAX_LOCAL_APIC and drop
      MAX_APICID.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-3-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Kill unused static boot_cpu_logical_apicid in smpboot.c · bd22a2f1
      Tejun Heo committed
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-2-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 26 Jan 2011, 3 commits
    • x86: Don't copy per_cpu cpuinfo for BSP two times · 792363d2
      Yinghai Lu committed
      smp_store_cpu_info(0) will do that.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      LKML-Reference: <4D3A16F2.5090902@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Move llc_shared_map out of cpu_info · b3d7336d
      Yinghai Lu committed
      cpu_info is already per_cpu, so we can take llc_shared_map out
      of cpu_info and declare it as a per_cpu variable directly.

      Later references can then use it directly instead of first
      going through cpu_info.

      This also makes smp_store_cpu_info() much simpler, since it no
      longer needs the save-and-restore trick.
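      A before/after sketch of the data layout, assuming a simplified cpuinfo struct and a 64-bit mask in place of a cpumask. Arrays indexed by CPU stand in for per_cpu variables, and `store_cpu_info()` is a hypothetical reduction of smp_store_cpu_info().

```c
#include <stdint.h>

#define NR_CPUS 8

/* Before: the LLC map was a member of the per-CPU cpu_info structure,
 * so a whole-struct copy would clobber it unless saved and restored. */
struct cpuinfo_before {
	int      cpu_core_id;
	uint64_t llc_shared_map;   /* CPUs sharing the last-level cache */
};

/* After: cpu_info no longer carries the map ... */
struct cpuinfo_after {
	int cpu_core_id;
};

/* ... and the map is a separate per-CPU variable (modelled as arrays). */
static struct cpuinfo_after cpu_info[NR_CPUS];
static uint64_t cpu_llc_shared_map[NR_CPUS];
static struct cpuinfo_after boot_cpu_data;

/* A plain struct copy is now safe: the LLC map is not part of it. */
static void store_cpu_info(int cpu)
{
	cpu_info[cpu] = boot_cpu_data;
}
```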
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Cc: Alok N Kataria <akataria@vmware.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Hans J. Koch <hjk@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <4D3A16E8.5020608@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, amd: Normalize compute unit IDs on multi-node processors · d518573d
      Andreas Herrmann committed
      On multi-node CPUs we don't need the socket wide compute unit ID
      but the node-wide compute unit ID. Thus we need to normalize the
      value. This is similar to what we do with cpu_core_id.
      
      A compute unit is then identified by physical_package_id,
      node_id, and compute_unit_id.
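      The normalization amounts to a modulo, assuming compute unit IDs are numbered consecutively across the nodes of a package and every node has the same number of compute units (`cus_per_node`, a hypothetical parameter here). For instance, with 4 compute units per node, socket-wide compute unit 6 becomes node-wide compute unit 2.

```c
/* A compute unit is identified by this triple after normalization. */
struct cu_ident {
	unsigned int phys_pkg_id;
	unsigned int node_id;
	unsigned int compute_unit_id;   /* node-wide, not socket-wide */
};

/* Map a socket-wide compute unit ID to a node-wide one. */
static unsigned int node_compute_unit_id(unsigned int socket_cu_id,
					 unsigned int cus_per_node)
{
	return socket_cu_id % cus_per_node;
}

static struct cu_ident identify_cu(unsigned int pkg, unsigned int node,
				   unsigned int socket_cu_id,
				   unsigned int cus_per_node)
{
	struct cu_ident id = {
		.phys_pkg_id = pkg,
		.node_id = node,
		.compute_unit_id = node_compute_unit_id(socket_cu_id, cus_per_node),
	};
	return id;
}
```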
      Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      LKML-Reference: <1295881543-572552-2-git-send-email-hans.rosenfeld@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 22 Jan 2011, 1 commit
    • x86, hotplug: Fix powersavings with offlined cores on AMD · 93789b32
      Borislav Petkov committed
      Commit ea530692 made an offlined CPU use monitor/mwait. This is
      not the optimal choice for AMD with respect to power savings; we'd
      prefer our cores to halt (i.e. enter C1) instead. For this, the
      same selection of whether to use monitor/mwait has to be made as
      when we select the idle routine for the machine.

      With this patch, offlining cores 1-5 on an X6 machine allows core0
      to boost again.
      
      [ hpa: putting this in urgent since it is a (power) regression fix ]
      Reported-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: stable@kernel.org # 37.x
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <1295534572-10730-1-git-send-email-bp@amd64.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  11. 09 Jan 2011, 1 commit
  12. 30 Dec 2010, 2 commits
  13. 14 Dec 2010, 1 commit
  14. 26 Nov 2010, 1 commit
  15. 18 Nov 2010, 1 commit
    • x86, nmi_watchdog: Remove all stub function calls from old nmi_watchdog · 072b198a
      Don Zickus committed
      Now that the bulk of the old nmi_watchdog is gone, remove all
      the stub variables and hooks associated with it.
      
      This touches lots of files mainly because of how the io_apic
      nmi_watchdog was implemented.  Now that the io_apic nmi_watchdog
      is forever gone, remove all its fingers.
      
      Most of this code was not being exercised by virtue of
      nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything too
      risky here.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: fweisbec@gmail.com
      Cc: gorcunov@openvz.org
      LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 27 Oct 2010, 2 commits
  17. 21 Oct 2010, 1 commit
  18. 12 Oct 2010, 1 commit
  19. 02 Oct 2010, 1 commit
  20. 21 Sep 2010, 1 commit
  21. 18 Sep 2010, 2 commits
    • x86, hotplug: Move WBINVD back outside the play_dead loop · a68e5c94
      H. Peter Anvin committed
      On processors with hyperthreading, when only one thread is offlined
      the other thread can cause a spurious wakeup on the idled thread.  We
      do not want to re-WBINVD when that happens.
      
      Ideally, we should simply skip WBINVD unless we're the last thread on
      a particular core to shut down, but there might be similar issues
      elsewhere in the system.
      
      Thus, revert to the previous behavior of executing WBINVD only
      outside the loop.  Partly as a result, remove the mb()'s around it:
      they are not necessary since wbinvd() is a serializing instruction,
      but they were intended to make sure the compiler didn't do any funny
      loop optimizations.
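      The control flow after this change, sketched with counting stubs: WBINVD runs once before the loop, so a spurious wakeup from a sibling thread only re-enters MWAIT. `wbinvd()`, `mwait_idle_once()` and the wakeup counter are hypothetical stand-ins; the real loop never exits, while this one stops once the simulated wakeups run out.

```c
#include <stdbool.h>

/* Stubs standing in for the real instructions; they only count calls. */
static int wbinvd_calls;
static int mwait_calls;
static void wbinvd(void) { wbinvd_calls++; }
static void mwait_idle_once(void) { mwait_calls++; }

/* Hypothetical wakeup source: true means the sibling thread woke us
 * spuriously and we must go back to sleep. */
static int spurious_wakeups_left;
static bool woken_spuriously(void) { return spurious_wakeups_left-- > 0; }

static void play_dead_sketch(void)
{
	wbinvd();                 /* flush caches once, outside the loop */
	do {
		mwait_idle_once();    /* a spurious wakeup just loops back here */
	} while (woken_spuriously());
}
```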
      Reported-by: Asit Mallick <asit.k.mallick@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Arjan van de Ven <arjan@linux.kernel.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
      LKML-Reference: <tip-ea530692@git.kernel.org>
    • x86, hotplug: Use mwait to offline a processor, fix the legacy case · ea530692
      H. Peter Anvin committed
      The code in native_play_dead() has a number of problems:
      
      1. We should use MWAIT when available, to put ourselves into a deeper
         sleep state.
      2. We use the existence of CLFLUSH to determine if WBINVD is safe, but
         that is totally bogus -- WBINVD is 486+, whereas CLFLUSH is a much
         later addition.
      3. We should do WBINVD inside the loop, just in case of something like
         setting an A bit on page tables.  Pointed out by Arjan van de Ven.
      
      This code is based in part on a previous patch by Venki Pallipadi, but
      unlike that patch, this one keeps all the detection code local instead
      of pre-caching a bunch of information.  We're shutting down the CPU;
      there is absolutely no hurry.
      
      This patch moves all the code to C and deletes the global
      wbinvd_halt() which is broken anyway.
      Originally-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
      LKML-Reference: <20090522232230.162239000@intel.com>
  22. 16 Sep 2010, 1 commit
  23. 24 Aug 2010, 1 commit
    • x86, vmware: Remove deprecated VMI kernel support · 9863c90f
      Alok Kataria committed
      With the recent innovations in CPU hardware acceleration technologies
      from Intel and AMD, VMware ran a few experiments to compare these
      techniques to guest paravirtualization technique on VMware's platform.
      These hardware-assisted virtualization techniques have outperformed
      VMI in most workloads. VMware expects that these hardware features
      will be ubiquitous in a couple of years; as a result, VMware has
      started a phased retirement of this feature from the hypervisor.
      
      Please note that VMI has always been an optimization and non-VMI kernels
      still work fine on VMware's platform.
      The latest versions of VMware's products that support VMI are
      Workstation 7.0 and VSphere 4.0 on the ESX side; future maintenance
      releases of these products will continue to support VMI.
      
      For more details about VMI retirement take a look at this,
      http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
      
      This feature removal was scheduled for 2.6.37 back in September 2009.
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      LKML-Reference: <1282600151.19396.22.camel@ank32.eng.vmware.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  24. 20 Aug 2010, 1 commit
    • x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues · d7c53c9e
      Borislav Petkov committed
      When testing cpu hotplug code on 32-bit, we kept hitting the "CPU%d:
      Stuck ??" message due to multiple cores concurrently accessing the
      cpu_callin_mask, among others.

      These codepaths are not protected from concurrent access, because
      there's no sane reason to make already complex code unnecessarily
      more complex: we hit the issue only when insanely switching cores
      off- and online. So serialize hotplugging cores at the sysfs level
      and be done with it.
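      The approach can be sketched in user space as one lock taken around every write to a CPU's online attribute. The mutex, `cpu_online_state[]` and `store_online()` are hypothetical names; the point is only that two hotplug operations can no longer interleave.

```c
#include <pthread.h>
#include <stdbool.h>

/* One lock serializes every online/offline request coming from sysfs. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static bool cpu_online_state[8];

/* Hypothetical handler for a write to a CPU's "online" attribute. */
static int store_online(int cpu, bool online)
{
	pthread_mutex_lock(&hotplug_lock);
	/* Only one bring-up or tear-down runs at a time, so shared state
	 * (cpu_callin_mask in the text) is never touched concurrently. */
	cpu_online_state[cpu] = online;
	pthread_mutex_unlock(&hotplug_lock);
	return 0;
}
```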
      
      [ v2.1: fix !HOTPLUG_CPU build ]
      
      Cc: <stable@kernel.org>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <20100819181029.GC17171@aftab>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  25. 19 Aug 2010, 1 commit
    • x86-32: Separate 1:1 pagetables from swapper_pg_dir · fd89a137
      Joerg Roedel committed
      This patch fixes machine crashes which occur when heavily exercising the
      CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
      AMD Erratum 383 and result in a fatal machine check exception. Here's
      the scenario:
      
      1. On 32-bit, the swapper_pg_dir page table is used as the initial page
      table for booting a secondary CPU.
      
      2. To make this work, swapper_pg_dir needs a direct mapping of physical
      memory in it (the low mappings). By adding those low, large page (2M)
      mappings (PAE kernel), we create the necessary conditions for Erratum
      383 to occur.
      
      3. Other CPUs which do not participate in the off- and onlining game may
      use swapper_pg_dir while the low mappings are present (when leave_mm is
      called). For all steps below, the CPU referred to is a CPU that is using
      swapper_pg_dir, and not the CPU which is being onlined.
      
      4. The presence of the low mappings in swapper_pg_dir can result
      in TLB entries for addresses below __PAGE_OFFSET being established
      speculatively. These TLB entries are marked global and large.
      
      5. When the CPU with such a TLB entry switches to another page table,
      this TLB entry remains because it is global.
      
      6. The process then generates an access to an address covered by the
      above TLB entry but there is a permission mismatch - the TLB entry
      covers a large global page not accessible to userspace.
      
      7. Due to this permission mismatch, a new 4KB user TLB entry gets
      established. Further, Erratum 383 provides for a small window of time
      where both TLB entries are present. This results in an uncorrectable
      machine check exception signalling a TLB multimatch, which panics the
      machine.
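      A toy TLB model of steps 4 to 7, assuming entries are keyed by a single virtual page number and ignoring the large/4KB size difference. It shows why reloading CR3 does not help: the speculatively created global entry survives, and the later user entry then produces two matches. All names are hypothetical.

```c
#include <stdbool.h>

#define TLB_SIZE 4

struct tlb_entry {
	bool valid;
	bool global;          /* global entries survive a CR3 reload */
	unsigned long vpage;
};

static struct tlb_entry tlb[TLB_SIZE];

/* Fill the first free slot (no eviction in this toy). */
static void tlb_insert(unsigned long vpage, bool global)
{
	for (int i = 0; i < TLB_SIZE; i++) {
		if (!tlb[i].valid) {
			tlb[i] = (struct tlb_entry){ .valid = true, .global = global, .vpage = vpage };
			return;
		}
	}
}

/* Loading a new CR3 drops only non-global entries (step 5). */
static void load_cr3(void)
{
	for (int i = 0; i < TLB_SIZE; i++)
		if (!tlb[i].global)
			tlb[i].valid = false;
}

/* More than one valid match for a page models the multimatch (step 7). */
static int tlb_matches(unsigned long vpage)
{
	int n = 0;
	for (int i = 0; i < TLB_SIZE; i++)
		if (tlb[i].valid && tlb[i].vpage == vpage)
			n++;
	return n;
}
```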
      
      There are two ways to fix this issue:
      
              1. Always do a global TLB flush when a new cr3 is loaded and the
              old page table was swapper_pg_dir. I consider this a hack that is
              hard to understand and has performance implications.
      
              2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
              does.
      
      This patch implements solution 2. It introduces a trampoline_pg_dir
      which has the same layout as swapper_pg_dir with low_mappings. This page
      table is used as the initial page table of the booting CPU. Later in the
      bringup process, it switches to swapper_pg_dir and does a global TLB
      flush. This fixes the crashes in our test cases.
      
      -v2: switch to swapper_pg_dir right after entering start_secondary() so
      that we are able to access percpu data which might not be mapped in the
      trampoline page table.
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      LKML-Reference: <20100816123833.GB28147@aftab>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  26. 10 Aug 2010, 1 commit
  27. 31 Jul 2010, 1 commit
    • x86, mtrr: Use stop machine context to rendezvous all the cpu's · 68f202e4
      Suresh Siddha committed
      Use the stop machine context rather than IPI's to rendezvous all the cpus for
      MTRR initialization that happens during cpu bringup or for MTRR modifications
      during runtime.
      
      This avoids a deadlock scenario (reported by Prarit) like the following:
      
      cpu A holds a read_lock (tasklist_lock for example) with irqs enabled
      cpu B waits for the same lock with irqs disabled using write_lock_irq
      cpu C doing set_mtrr() (during AP bringup for example), which will try to
      rendezvous all the cpus using IPI's
      
      This will result in C and A coming to the rendezvous point and waiting
      for B. B is stuck forever waiting for the lock and thus never
      reaches the rendezvous point.

      Using stop cpu (run in the process context of per cpu based keventd) to do
      this rendezvous avoids this deadlock scenario.
      
      Also make sure all the cpus are in the rendezvous handler before we proceed
      with the local_irq_save() on each cpu. This lock-step disabling of irqs on
      all the cpus avoids other deadlock scenarios (for example, ones involving
      the blocking smp_call_function()s, etc.).
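      The lock-step entry can be illustrated with POSIX threads standing in for CPUs: each thread waits at a counting rendezvous, built from a mutex and a condition variable, before it "disables irqs". This is an analogy under stated assumptions, not the stop-machine-based kernel code; `mtrr_handler()`, `irqs_off[]` and `run_rendezvous()` are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

#define NCPUS 4

/* Simple counting rendezvous built from a mutex and a condition variable. */
struct rendezvous {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int arrived;
};

static struct rendezvous rv = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };
static int seen_at_disable[NCPUS];   /* arrivals observed when each "CPU" disabled irqs */
static bool irqs_off[NCPUS];

/* Block until `total` participants have arrived. */
static void rendezvous_wait(struct rendezvous *r, int total)
{
	pthread_mutex_lock(&r->lock);
	r->arrived++;
	if (r->arrived == total)
		pthread_cond_broadcast(&r->cond);
	while (r->arrived < total)
		pthread_cond_wait(&r->cond, &r->lock);
	pthread_mutex_unlock(&r->lock);
}

static void *mtrr_handler(void *arg)
{
	int cpu = *(int *)arg;

	rendezvous_wait(&rv, NCPUS);   /* every CPU is in the handler ... */
	pthread_mutex_lock(&rv.lock);
	seen_at_disable[cpu] = rv.arrived;
	pthread_mutex_unlock(&rv.lock);
	irqs_off[cpu] = true;          /* ... before any of them "disables irqs" */
	return NULL;
}

static int run_rendezvous(void)
{
	pthread_t threads[NCPUS];
	int ids[NCPUS];

	for (int i = 0; i < NCPUS; i++) {
		ids[i] = i;
		if (pthread_create(&threads[i], NULL, mtrr_handler, &ids[i]) != 0)
			return -1;
	}
	for (int i = 0; i < NCPUS; i++)
		pthread_join(threads[i], NULL);
	return 0;
}
```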
      
         [ This problem is very old. Marking -stable only for 2.6.35 as the
           stop_one_cpu_nowait() API is present only in 2.6.35. Any older
           kernel interested in this fix needs to do some more work in backporting
           this patch. ]
      Reported-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1280515602.2682.10.camel@sbsiddha-MOBL3.sc.intel.com>
      Acked-by: Prarit Bhargava <prarit@redhat.com>
      Cc: stable@kernel.org [2.6.35]
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  28. 29 Jun 2010, 1 commit
    • workqueue: increase max_active of keventd and kill current_is_keventd() · b71ab8c2
      Tejun Heo committed
      Define WQ_MAX_ACTIVE and create keventd with max_active set to half of
      it, which means that keventd can now process up to WQ_MAX_ACTIVE / 2 - 1
      works concurrently.  Unless some combination can result in a dependency
      loop longer than max_active, deadlock won't happen, and thus it's
      unnecessary to check current_is_keventd() before trying to
      schedule a work.  Kill current_is_keventd().
      
      (Lockdep annotations are broken.  We need lock_map_acquire_read_norecurse())
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>