1. 15 Jan, 2014 2 commits
    • powerpc: Add debug checks to catch invalid cpu-to-node mappings · 68fb18aa
      Committed by Srivatsa S. Bhat
      There have been some weird bugs in the past where the kernel tried to associate
      threads of the same core to different NUMA nodes, and things went haywire after
      that point (as expected).
      
      But unfortunately, root-causing such issues has been quite challenging, due to
      the lack of appropriate debug checks in the kernel. These bugs usually lead to
      some odd soft-lockups in the scheduler's build-sched-domain code in the CPU
      hotplug path, which makes it very hard to trace it back to the incorrect
      cpu-to-node mappings.
      
      So add appropriate debug checks to catch such invalid cpu-to-node mappings
      as early as possible.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
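The kind of check this commit describes can be sketched in userspace as follows. This is only an illustration, not the kernel code: `NR_CPUS`, `THREADS_PER_CORE`, `first_sibling()` and `count_invalid_mappings()` are hypothetical names, and the fixed SMT4 layout is an assumption.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 8            /* hypothetical: 2 cores x 4 threads */
#define THREADS_PER_CORE 4   /* hypothetical SMT4 layout */

/* First thread of the core that 'cpu' belongs to (hypothetical helper). */
static int first_sibling(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return cpu - (cpu % THREADS_PER_CORE);
}

/*
 * Count CPUs whose node differs from the node of the first thread
 * in the same core, reporting each mismatch on stderr.
 */
static int count_invalid_mappings(const int cpu_to_node[NR_CPUS])
{
    int bad = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        int first = first_sibling(cpu);

        if (cpu_to_node[cpu] != cpu_to_node[first]) {
            fprintf(stderr, "CPU %d is on node %d but sibling %d is on node %d\n",
                    cpu, cpu_to_node[cpu], first, cpu_to_node[first]);
            bad++;
        }
    }
    return bad;
}
```

Running such a check right after every mapping change reports the inconsistency at its source, rather than later as a soft-lockup in the scheduler.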
    • powerpc: Fix the setup of CPU-to-Node mappings during CPU online · d4edc5b6
      Committed by Srivatsa S. Bhat
      On POWER platforms, the hypervisor can notify the guest kernel about dynamic
      changes in the cpu-numa associativity (VPHN topology update). Hence the
      cpu-to-node mappings that we got from the firmware during boot, may no longer
      be valid after such updates. This is handled using the arch_update_cpu_topology()
      hook in the scheduler, and the sched-domains are rebuilt according to the new
      mappings.
      
      But unfortunately, at the moment, CPU hotplug ignores these updated mappings
      and instead queries the firmware for the cpu-to-numa relationships and uses
      them during CPU online. So the kernel can end up assigning wrong NUMA nodes
      to CPUs during subsequent CPU hotplug online operations (after booting).
      
      Further, a particularly problematic scenario can result from this bug:
      On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8)
      threads per core. The switch to Single-Threaded (ST) mode is performed by
      offlining all except the first CPU thread in each core. Switching back to
      SMT mode involves onlining those other threads back, in each core.
      
      Now consider this scenario:
      
      1. During boot, the kernel gets the cpu-to-node mappings from the firmware
         and assigns the CPUs to NUMA nodes appropriately, during CPU online.
      
      2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and
         communicates this update to the kernel. The kernel in turn updates its
         cpu-to-node associations and rebuilds its sched domains. Everything is
         fine so far.
      
      3. Now, the user switches the machine from SMT to ST mode (say, by running
         ppc64_cpu --smt=1). This involves offlining all except 1 thread in each
         core.
      
      4. The user then tries to switch back from ST to SMT mode (say, by running
         ppc64_cpu --smt=4), and this involves onlining those threads back. Since
         CPU hotplug ignores the new mappings, it queries the firmware and tries to
         associate the newly onlined sibling threads to the old NUMA nodes. This
         results in sibling threads within the same core getting associated with
         different NUMA nodes, which is incorrect.
      
         The scheduler's build-sched-domains code gets thoroughly confused with this
         and enters an infinite loop and causes soft-lockups, as explained in detail
         in commit 3be7db6a (powerpc: VPHN topology change updates all siblings).
      
      So to fix this, use the numa_cpu_lookup_table to remember the updated
      cpu-to-node mappings, and use them during CPU hotplug online operations.
      Further, we also need to ensure that all threads in a core are assigned to a
      common NUMA node, irrespective of whether all those threads were online during
      the topology update. To achieve this, we take care not to use cpu_sibling_mask()
      since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread()
      and set up the mappings manually using the 'threads_per_core' value for that
      particular platform. This helps us ensure that we don't hit this bug with any
      combination of CPU hotplug and SMT mode switching.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
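The idea behind the fix can be sketched in userspace as follows. It is a simplified model under stated assumptions, not the actual patch: `node_lookup_table` stands in for `numa_cpu_lookup_table`, and `first_sibling()`, `update_core_node()`, `node_for_online()`, `NODE_UNKNOWN` and the SMT4 layout are hypothetical.

```c
#include <assert.h>

#define NR_CPUS 8            /* hypothetical: 2 cores x 4 threads */
#define THREADS_PER_CORE 4   /* hypothetical SMT4 layout */
#define NODE_UNKNOWN (-1)

/* Stand-in for numa_cpu_lookup_table: remembered cpu-to-node mappings. */
static int node_lookup_table[NR_CPUS];

static int first_sibling(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return cpu - (cpu % THREADS_PER_CORE);
}

/* Forget all remembered mappings. */
static void reset_table(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        node_lookup_table[cpu] = NODE_UNKNOWN;
}

/*
 * On a topology update, record the new node for every thread of the
 * core, online or not. Iterating by thread count avoids relying on a
 * sibling mask that only contains online CPUs.
 */
static void update_core_node(int cpu, int new_node)
{
    int first = first_sibling(cpu);

    for (int i = 0; i < THREADS_PER_CORE; i++)
        node_lookup_table[first + i] = new_node;
}

/*
 * On CPU online, prefer the remembered mapping and fall back to the
 * firmware-provided node only when nothing has been recorded.
 */
static int node_for_online(int cpu, int firmware_node)
{
    int node = node_lookup_table[cpu];

    return node != NODE_UNKNOWN ? node : firmware_node;
}
```

In this model, an offline thread that is later brought online lands on the same node as its already-online siblings, even if the firmware still reports the boot-time node.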
  2. 13 Nov, 2013 1 commit
  3. 30 Oct, 2013 1 commit
  4. 14 Aug, 2013 2 commits
  5. 01 Aug, 2013 1 commit
    • powerpc: VPHN topology change updates all siblings · 3be7db6a
      Committed by Robert Jennings
      When an associativity level change is found for one thread, the
      sibling threads need to be updated as well.  This is done today
      for PRRN in stage_topology_update() but is missing for VPHN in
      update_cpu_associativity_changes_mask().  This patch will correctly
      update all thread siblings during a topology change.
      
      Without this patch a topology update can result in a CPU in
      init_sched_groups_power() getting stuck indefinitely in a loop.
      
      This loop is built in build_sched_groups(). As a result of the thread
      moving to a node separate from its siblings the struct sched_group will
      have its next pointer set to point to itself rather than the sched_group
      struct of the next thread.  This happens because we have a domain without
      the SD_OVERLAP flag, which is correct, and a topology that doesn't conform
      with reality (threads on the same core assigned to different numa nodes).
      When this list is traversed by init_sched_groups_power() it will reach
      the thread's sched_group structure and loop indefinitely; the cpu will
      be stuck at this point.
      
      The bug was exposed when VPHN was enabled in commit b7abef04 (v3.9).
      
      Cc: <stable@vger.kernel.org> [v3.9+]
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
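The failure mode described in this commit, a ring of groups in which one `next` pointer refers back to its own element, can be illustrated with a small userspace sketch. `struct group` and `walk_ring()` are hypothetical and much simpler than the scheduler's `struct sched_group`; the step bound exists only so the demonstration terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scheduler group in a ring. */
struct group {
    int id;
    struct group *next;
};

/*
 * Walk the ring starting at 'first' until we come back to it.
 * Returns the number of groups visited, or -1 if the walk has not
 * returned to 'first' after max_steps steps (the ring is broken).
 */
static int walk_ring(const struct group *first, int max_steps)
{
    const struct group *g = first;
    int visited = 0;

    assert(first != NULL);
    do {
        if (visited == max_steps)
            return -1;   /* an unbounded walk would spin here forever */
        visited++;
        g = g->next;
    } while (g != first);

    return visited;
}
```

With a healthy three-element ring the walk returns 3. If the second element's `next` points to itself, a walk that starts at the first element can never get back to its starting point, which mirrors the CPU getting stuck in the loop.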
  6. 01 Jul, 2013 2 commits
  7. 30 Apr, 2013 3 commits
    • powerpc/pseries: Correct builds break when CONFIG_SMP not defined · 601abdc3
      Committed by Nathan Fontenot
      Correct build failure for powerpc/pseries builds with CONFIG_SMP not defined.
      
      The function cpu_sibling_mask has no meaning (or definition) when CONFIG_SMP
      is not defined. Additionally, the updating of NUMA affinity for a CPU in a UP
      system doesn't really make sense.
      
      This patch ifdef's out the code making the affinity updates for PRRN events to
      fix the following build break.
      
      arch/powerpc/mm/numa.c: In function ‘stage_topology_update’:
      arch/powerpc/mm/numa.c:1535: error: implicit declaration of function ‘cpu_sibling_mask’
      arch/powerpc/mm/numa.c:1535: warning: passing argument 3 of ‘cpumask_or’ makes pointer from integer without a cast
      make[1]: *** [arch/powerpc/mm/numa.o] Error 1
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Fix build failure after merge of the cgroup tree · 9f3a90e8
      Committed by Stephen Rothwell
      After merging the cgroup tree, today's linux-next build (powerpc
      ppc64_defconfig) failed like this:
      
      arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology':
      arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
      arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror]
      arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
      
      Caused by commit 30c05350 ("powerpc/pseries: Use stop machine to
      update cpu maps") from the powerpc tree interacting with (probably)
      commit ff794dea ("cpuset: remove include of cgroup.h from cpuset.h")
      from the cgroup tree.  Removing includes from header files is fraught
      with danger ...
      
      The former should have added an include of linux/slab.h to
      arch/powerpc/mm/numa.c.
      
      I have added the following merge fix patch for today (but it should be
      applied to the powerpc tree ASAP).
      
      From: Stephen Rothwell <sfr@canb.auug.org.au>
      Date: Mon, 29 Apr 2013 14:01:44 +1000
      Subject: [PATCH] powerpc: numa.c: using kzalloc/kfree requires including
       slab.h
      
      fixes these build errors:
      
      arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology':
      arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
      arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror]
      arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm/numa: use setup_nr_node_ids() instead of opencoding. · f9d531b8
      Committed by Cody P Schafer
      [sfr@canb.auug.org.au: add missing semicolon]
      Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 26 Apr, 2013 6 commits
  9. 18 Apr, 2013 2 commits
    • powerpc: fix annotation of fake_numa_create_new_node() · 55671f3c
      Committed by Stephen Rothwell
      This function has always been marked as __cpuinit, but is only called
      from functions marked as __init and references an __initdata variable.
      So change its annotation to __init.
      
      Fixes this build warning:
      
      WARNING: arch/powerpc/mm/built-in.o(.cpuinit.text+0x86): Section mismatch in reference from the function .fake_numa_create_new_node() to the variable .init.data:cmdline
      The function __cpuinit .fake_numa_create_new_node() references
      a variable __initdata cmdline.
      If cmdline is only used by .fake_numa_create_new_node then
      annotate cmdline with a matching annotation.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
    • powerpc: fix numa distance for form0 device tree · 7122beee
      Committed by Vaidyanathan Srinivasan
      The following commit breaks numa distance setup for old powerpc
      systems that use form0 encoding in device tree.
      
      commit 41eab6f8
      powerpc/numa: Use form 1 affinity to setup node distance
      
      Device tree node /rtas/ibm,associativity-reference-points would
      index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
      form1 encoding detected by ibm,architecture-vec-5 property.
      
      All modern systems use form1 and the current kernel code is correct for them.
      However, on older systems with form0 encoding, the numa distance
      gets hard coded as LOCAL_DISTANCE for all nodes.  This causes a
      task scheduling anomaly, since the scheduler skips building the numa
      level domain (topmost domain with all cpus) if all numa distances
      are the same.  (The value of 'level' in sched_init_numa() remains 0.)
      
      Prior to the above commit:
      ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
      
      This patch restores compatible behavior for old powerpc systems
      whose device tree encodes numa distances as form0.
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
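The form0 behavior quoted in this commit message can be written out as a small function. This is a sketch only: `form0_node_distance()` is a hypothetical name, and the values 10 and 20 are assumed here to match the kernel's default `LOCAL_DISTANCE` and `REMOTE_DISTANCE`.

```c
#include <assert.h>

/* Assumed to match the kernel's default distance constants. */
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/*
 * Form0 fallback: with no usable distance table, a node is "local"
 * to itself and every other node is equally "remote".
 */
static int form0_node_distance(int from, int to)
{
    assert(from >= 0 && to >= 0);
    return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}
```

Because this yields two distinct distance values, the scheduler sees at least one numa level, instead of the single value produced when every pair reports LOCAL_DISTANCE.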
  10. 15 Nov, 2012 1 commit
  11. 05 Sep, 2012 1 commit
  12. 10 Jul, 2012 1 commit
  13. 03 Jul, 2012 1 commit
  14. 29 Jun, 2012 1 commit
  15. 29 Mar, 2012 1 commit
  16. 13 Jan, 2012 1 commit
  17. 22 Dec, 2011 1 commit
    • cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd
      Committed by Kay Sievers
      This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      
      Userspace relies on events and generic sysfs subsystem infrastructure
      from sysdev devices, which are made available with this conversion.
      
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  18. 19 Dec, 2011 2 commits
  19. 09 Dec, 2011 1 commit
    • powerpc: Use HAVE_MEMBLOCK_NODE_MAP · 1d7cfe18
      Committed by Tejun Heo
      powerpc doesn't access early_node_map[] directly and enabling
      HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls
      with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is
      enough.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
  20. 02 Dec, 2011 1 commit
  21. 08 Nov, 2011 1 commit
  22. 01 Nov, 2011 1 commit
  23. 20 Sep, 2011 3 commits
  24. 15 Jul, 2011 1 commit
  25. 04 May, 2011 1 commit
  26. 27 Apr, 2011 1 commit