1. 10 Oct, 2017 1 commit
    • powerpc: Don't call lockdep_assert_cpus_held() from arch_update_cpu_topology() · 6b2c08f9
      Thiago Jung Bauermann committed
      It turns out that not all paths calling arch_update_cpu_topology() hold
      cpu_hotplug_lock, but that's OK because those paths can't race with
      any concurrent hotplug events.
      
      Warnings were reported with the following trace:
      
        lockdep_assert_cpus_held
        arch_update_cpu_topology
        sched_init_domains
        sched_init_smp
        kernel_init_freeable
        kernel_init
        ret_from_kernel_thread
      
      Which is safe because it's called early in boot when hotplug is not
      live yet.
      
      And also this trace:
      
        lockdep_assert_cpus_held
        arch_update_cpu_topology
        partition_sched_domains
        cpuset_update_active_cpus
        sched_cpu_deactivate
        cpuhp_invoke_callback
        cpuhp_down_callbacks
        cpuhp_thread_fun
        smpboot_thread_fn
        kthread
        ret_from_kernel_thread
      
      Which is safe because it's called as part of CPU hotplug, so although
      we don't hold the CPU hotplug lock, there is another thread driving
      the CPU hotplug operation which does hold the lock, and there is no
      race.
      
      Thanks to tglx for deciphering it for us.
      
      Fixes: 3e401f7a ("powerpc: Only obtain cpu_hotplug_lock if called by rtasd")
      Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 05 Oct, 2017 1 commit
    • timer: Remove init_timer_deferrable() in favor of timer_setup() · df7e828c
      Kees Cook committed
      This refactors the only users of init_timer_deferrable() to use
      the new timer_setup() and from_timer(). Removes definition of
      init_timer_deferrable().
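
The conversion pattern can be sketched in user space as follows. This is a mock, not kernel code: `struct mock_timer`, `mock_timer_setup()`, `mock_container_of()` and `struct watchdog_dev` are hypothetical stand-ins chosen for illustration. The point is only that the callback now receives the timer pointer and recovers its enclosing structure from it, which is the role from_timer() plays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's timer object (not struct timer_list). */
struct mock_timer {
	void (*function)(struct mock_timer *t);
	unsigned int flags;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical driver state that embeds its timer. */
struct watchdog_dev {
	int ticks;
	struct mock_timer timer;
};

/* Mock of the setup step: record the callback and flags in the timer. */
static void mock_timer_setup(struct mock_timer *t,
			     void (*fn)(struct mock_timer *),
			     unsigned int flags)
{
	assert(t != NULL && fn != NULL);
	t->function = fn;
	t->flags = flags;
}

/* The callback gets the timer itself and derives its container from it. */
static void watchdog_tick(struct mock_timer *t)
{
	struct watchdog_dev *wd =
		mock_container_of(t, struct watchdog_dev, timer);

	wd->ticks++;
}
```

Because the container is derived from the timer pointer, no separate data argument has to be stored alongside the callback.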
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: David S. Miller <davem@davemloft.net> # for networking parts
      Acked-by: Sebastian Reichel <sre@kernel.org> # for drivers/hsi parts
      Cc: linux-mips@linux-mips.org
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: linux1394-devel@lists.sourceforge.net
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: linux-s390@vger.kernel.org
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: Wim Van Sebroeck <wim@iguana.be>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Harish Patil <harish.patil@cavium.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Manish Chopra <manish.chopra@cavium.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-pm@vger.kernel.org
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: linux-watchdog@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-wireless@vger.kernel.org
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Michael Reed <mdr@sgi.com>
      Cc: netdev@vger.kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Link: https://lkml.kernel.org/r/1507159627-127660-6-git-send-email-keescook@chromium.org
  3. 23 Jun, 2017 1 commit
  4. 11 Apr, 2017 1 commit
    • powerpc/mm: Remove reduntant initmem information from log · ea614555
      Anshuman Khandual committed
      Generic core VM already prints this information in the log
      buffer, hence there is no need for a second print. This just
      removes the second print from the powerpc NUMA init path.
      
      Before the patch:
      
        $ dmesg | grep "Initmem"
      
        numa: Initmem setup node 0 [mem 0x00000000-0xffffffff]
        numa: Initmem setup node 1 [mem 0x100000000-0x1ffffffff]
        numa: Initmem setup node 2 [mem 0x200000000-0x2ffffffff]
        numa: Initmem setup node 3 [mem 0x300000000-0x3ffffffff]
        numa: Initmem setup node 4 [mem 0x400000000-0x4ffffffff]
        numa: Initmem setup node 5 [mem 0x500000000-0x5ffffffff]
        numa: Initmem setup node 6 [mem 0x600000000-0x6ffffffff]
        numa: Initmem setup node 7 [mem 0x700000000-0x7ffffffff]
        Initmem setup node 0 [mem 0x0000000000000000-0x00000000ffffffff]
        Initmem setup node 1 [mem 0x0000000100000000-0x00000001ffffffff]
        Initmem setup node 2 [mem 0x0000000200000000-0x00000002ffffffff]
        Initmem setup node 3 [mem 0x0000000300000000-0x00000003ffffffff]
        Initmem setup node 4 [mem 0x0000000400000000-0x00000004ffffffff]
        Initmem setup node 5 [mem 0x0000000500000000-0x00000005ffffffff]
        Initmem setup node 6 [mem 0x0000000600000000-0x00000006ffffffff]
        Initmem setup node 7 [mem 0x0000000700000000-0x00000007ffffffff]
      
      After the patch just the latter set is printed.
      Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 10 Feb, 2017 1 commit
  6. 30 Jan, 2017 2 commits
    • powerpc/mm: Allow memory hotplug into an offline node · 2a8628d4
      Reza Arbab committed
      Relax the check preventing us from hotplugging into an offline node.
      
      This limitation was added in commit 482ec7c4 ("[PATCH] powerpc numa:
      Support sparse online node map") to prevent adding resources to an
      uninitialized node.
      
      These days, there is no harm in doing so. The addition will actually
      cause the node to be initialized and onlined; add_memory_resource()
      calls hotadd_new_pgdat() (if necessary) and node_set_online().
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Simplify loop control in parse_numa_properties() · 7656cd8e
      Reza Arbab committed
      The flow of the main loop in parse_numa_properties() is overly
      complicated. Simplify it to be less confusing and easier to read.
      No functional change.
      
      The end of the main loop in parse_numa_properties() looks like this:
      
      	for_each_node_by_type(...) {
      		...
      		if (!condition) {
      			if (--ranges)
      				goto new_range;
      			else
      				continue;
      		}
      
      		statement();
      
      		if (--ranges)
      			goto new_range;
      		/* else
      		 *	continue; <- implicit, this is the end of the loop
      		 */
      	}
      
      The only effect of !condition is to skip execution of statement(). This
      can be rewritten in a simpler way:
      
      	for_each_node_by_type(...) {
      		...
      		if (condition)
      			statement();
      
      		if (--ranges)
      			goto new_range;
      	}
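
The equivalence of the two forms can be checked with a small user-space model. This is a sketch under stated assumptions, not the kernel function: the loop runs over a single hypothetical node, `cond[]` stands in for the per-range condition, and a counter stands in for statement().

```c
#include <assert.h>

/* Original shape: the !condition branch duplicates the range bookkeeping. */
static int statements_run_original(const int *cond, int ranges)
{
	int calls = 0, i = 0;

	assert(ranges >= 1);
	for (int node = 0; node < 1; node++) {	/* one node, for brevity */
new_range:
		if (!cond[i++]) {
			if (--ranges)
				goto new_range;
			else
				continue;
		}

		calls++;	/* stands in for statement() */

		if (--ranges)
			goto new_range;
	}
	return calls;
}

/* Simplified shape: the condition only guards the statement. */
static int statements_run_simplified(const int *cond, int ranges)
{
	int calls = 0, i = 0;

	assert(ranges >= 1);
	for (int node = 0; node < 1; node++) {	/* one node, for brevity */
new_range:
		if (cond[i++])
			calls++;	/* stands in for statement() */

		if (--ranges)
			goto new_range;
	}
	return calls;
}
```

Both functions visit the same ranges in the same order and differ only in how the skip is written.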
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 25 Dec, 2016 1 commit
  8. 13 Dec, 2016 1 commit
    • powerpc/mm: allow memory hotplug into a memoryless node · 4a3bac4e
      Reza Arbab committed
      Patch series "enable movable nodes on non-x86 configs", v7.
      
      This patchset allows more configs to make use of movable nodes.  When
      CONFIG_MOVABLE_NODE is selected, there are two ways to introduce such
      nodes into the system:
      
      1. Discover movable nodes at boot. Currently this is only possible on
         x86, but we will enable configs supporting fdt to do the same.
      
      2. Hotplug and online all of a node's memory using online_movable. This
         is already possible on any config supporting memory hotplug, not
         just x86, but the Kconfig doesn't say so. We will fix that.
      
      We'll also remove some cruft on power which would prevent (2).
      
      This patch (of 5):
      
      Remove the check which prevents us from hotplugging into an empty node.
      
      The original commit b226e462 ("[PATCH] powerpc: don't add memory to
      empty node/zone"), states that this was intended to be a temporary measure.
      It is a workaround for an oops which no longer occurs.
      
      Link: http://lkml.kernel.org/r/1479160961-25840-2-git-send-email-arbab@linux.vnet.ibm.com
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alistair Popple <apopple@au1.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 19 Oct, 2016 2 commits
    • powerpc: Fix numa topology console print · 8467801c
      Aneesh Kumar K.V committed
      With the recent update to printk, we get console output like the following:
      
      [    0.550639] Brought up 160 CPUs
      [    0.550718] Node 0 CPUs:
      [    0.550721]  0
      [    0.550754] -39
      
      [    0.550794] Node 1 CPUs:
      [    0.550798]  40
      [    0.550817] -79
      
      [    0.550856] Node 16 CPUs:
      [    0.550860]  80
      [    0.550880] -119
      
      [    0.550917] Node 17 CPUs:
      [    0.550923]  120
      [    0.550942] -159
      
      Fix this by properly using pr_cont(), i.e. KERN_CONT.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Drop dump_numa_memory_topology() · 08b5e79e
      Michael Ellerman committed
      At boot we dump the NUMA memory topology in dump_numa_memory_topology(),
      at KERN_DEBUG level, resulting in output like:
      
        Node 0 Memory: 0x0-0x100000000
        Node 1 Memory: 0x100000000-0x200000000
      
      Which is nice enough, but immediately after that we iterate over each
      node and call setup_node_data(), which also prints out the node ranges,
      at KERN_INFO, giving eg:
      
        numa: Initmem setup node 0 [mem 0x00000000-0xffffffff]
        numa: Initmem setup node 1 [mem 0x100000000-0x1ffffffff]
      
      Additionally dump_numa_memory_topology() does not use KERN_CONT
      correctly, resulting in split output lines on recent kernels.
      
      So drop dump_numa_memory_topology() as superfluous chatter.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 23 Jul, 2016 1 commit
    • powerpc/numa: Convert to hotplug state machine · bdab88e0
      Sebastian Andrzej Siewior committed
      Install the callbacks via the state machine. On the boot cpu the callback is
      invoked manually because cpuhp is not up yet and everything must be
      preinitialized before additional CPUs are up.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160718140727.GA13132@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  11. 14 Jun, 2016 2 commits
    • powerpc/numa: Fix multiple bugs in memory_hotplug_max() · 45b64ee6
      Bharata B Rao committed
      memory_hotplug_max() uses hot_add_drconf_memory_max() to get the maximum
      addressable memory by referring to the ibm,dynamic-memory property. There
      are three problems with the current approach:
      
      1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
        all the LMBs of the guest, but that is not true for PowerKVM which
        populates only DR LMBs (LMBs that can be hotplugged/removed) in that
        property.
      2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
        at the max possible address. Since ibm,dynamic-memory doesn't include
        RMA LMBs, the address thus obtained will be less than the actual max
        address. For example, if max possible memory size is 32G, with lmb-size
        of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
        which won't be present here).  hot_add_drconf_memory_max() would then
        return the max addressable memory as 127 * 256MB = 31.75GB, whereas the
        max address should have been 32G, which is what ibm,lrdr-capacity shows.
      3 In PowerKVM, there can be a gap between the end of boot time RAM and
        beginning of hotplug RAM area. So just multiplying lmb-count with
        lmb-size will not provide the correct max possible address for PowerKVM.
      
      This patch fixes 1 by using ibm,lrdr-capacity property to return the max
      addressable memory whenever the property is present. Then it fixes 2 & 3
      by fetching the address of the last LMB in ibm,dynamic-memory property.
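
The arithmetic in problem 2 and the fix for problems 2 and 3 can be illustrated with a short sketch. The names below (`struct lmb`, `max_by_lmb_count()`, `max_by_last_lmb()`) are hypothetical and are not the kernel's helpers; the sketch also assumes the LMB array is sorted by ascending base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical, reduced view of one LMB entry: only its base address. */
struct lmb {
	uint64_t base_addr;
};

/* Old approach: count * size, which ignores RMA LMBs and address gaps. */
static uint64_t max_by_lmb_count(size_t lmb_count, uint64_t lmb_size)
{
	return (uint64_t)lmb_count * lmb_size;
}

/* New approach: end address of the last LMB (array assumed sorted). */
static uint64_t max_by_last_lmb(const struct lmb *lmbs, size_t lmb_count,
				uint64_t lmb_size)
{
	assert(lmbs != NULL && lmb_count > 0);
	return lmbs[lmb_count - 1].base_addr + lmb_size;
}
```

With 127 LMBs of 256MB starting right after a 256MB RMA, the first function yields 31.75GB while the second yields 32GB. A gap before the hotplug area shifts the base addresses and is therefore picked up by the second function as well.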
      
      Fixes: cd34206e ("powerpc: Add memory_hotplug_max()")
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  12. 06 Nov, 2015 1 commit
    • arch/powerpc/mm/numa.c: do not allocate bootmem memory for non existing nodes · c118baf8
      Raghavendra K T committed
      With the setup_nr_nodes(), we have already initialized
      node_possible_map.  So it is safe to use for_each_node here.
      
      There are many places in the kernel that use a hardcoded 'for' loop with
      nr_node_ids, because all other architectures have NUMA nodes populated
      serially.  That is probably the reason we had maintained the same for
      powerpc.
      
      But since sparse NUMA node IDs are possible on powerpc, we unnecessarily
      allocate memory for non-existent NUMA nodes.
      
      For example, on a system with 0,1,16,17 as NUMA nodes, nr_node_ids=18 and
      we allocate memory for nodes 2-15.  With this patch we allocate memory
      only for existing NUMA nodes.
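
The difference between the two iteration styles can be modelled with a bitmask. This is a user-space sketch under stated assumptions: `node_mask_t` is a hypothetical 64-bit stand-in for node_possible_map, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per possible node ID; a stand-in for node_possible_map. */
typedef uint64_t node_mask_t;

/* Highest possible node ID plus one, the role nr_node_ids plays. */
static unsigned int mask_nr_node_ids(node_mask_t possible)
{
	unsigned int n = 0;

	while (possible) {
		n++;
		possible >>= 1;
	}
	return n;
}

/* Dense loop: visits every ID below nr_node_ids, existing or not. */
static unsigned int visits_dense(node_mask_t possible)
{
	unsigned int visits = 0;

	for (unsigned int nid = 0; nid < mask_nr_node_ids(possible); nid++)
		visits++;
	return visits;
}

/* Sparse loop: visits only IDs set in the mask, as for_each_node does. */
static unsigned int visits_sparse(node_mask_t possible)
{
	unsigned int visits = 0;

	for (unsigned int nid = 0; nid < 64; nid++)
		if (possible & ((node_mask_t)1 << nid))
			visits++;
	return visits;
}

/* Visits the dense loop spends on node IDs that do not exist. */
static unsigned int wasted_visits(node_mask_t possible)
{
	unsigned int dense = visits_dense(possible);
	unsigned int sparse = visits_sparse(possible);

	assert(dense >= sparse);
	return dense - sparse;
}
```

For nodes 0, 1, 16 and 17 the dense loop makes 18 visits and the sparse loop 4, so 14 allocations are avoided.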
      
      The patch is boot tested on a 4 node tuleta, confirming with printks
      that it works as expected.
      Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 15 Oct, 2015 1 commit
  14. 18 Aug, 2015 1 commit
    • powerpc/numa: initialize distance lookup table from drconf path · 1d805440
      Nikunj A Dadhania committed
      In some situations, a NUMA guest that supports the
      ibm,dynamic-reconfiguration-memory node will end up having flat NUMA
      distances between nodes. This is because of two problems in the
      current code.
      
      1) Different representations of associativity lists.
      
         There is an assumption about the associativity list in
         initialize_distance_lookup_table(). Associativity list has two forms:
      
         a) [cpu,memory]@x/ibm,associativity has following
            format:
                 <N> <N integers>
      
         b) ibm,dynamic-reconfiguration-memory/ibm,associativity-lookup-arrays
      
                 <M> <N> <M associativity lists each having N integers>
                 M = the number of associativity lists
                 N = the number of entries per associativity list
      
         Fix initialize_distance_lookup_table() so that it does not assume
         "case a". And update the caller to skip the length field before
         sending the associativity list.
      
      2) Distance table not getting updated from drconf path.
      
         Node distance table will not get initialized in certain cases as
         ibm,dynamic-reconfiguration-memory path does not initialize the
         lookup table.
      
         Call initialize_distance_lookup_table() from drconf path with
         appropriate associativity list.
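
The two layouts in item 1 can be sketched as plain array accessors. This is an illustration only: the function names are hypothetical, and the cells are assumed to be already converted to host byte order, whereas real device-tree cells are big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Form (a): <N> <N integers>.
 * Skip the leading length cell and return the list itself.
 */
static const uint32_t *assoc_from_node_prop(const uint32_t *prop,
					    uint32_t *len)
{
	assert(prop != NULL && len != NULL);
	*len = prop[0];
	return prop + 1;
}

/*
 * Form (b): <M> <N> <M lists of N integers>.
 * Return list number idx, or NULL if idx is out of range.
 */
static const uint32_t *assoc_from_lookup_arrays(const uint32_t *prop,
						uint32_t idx, uint32_t *len)
{
	uint32_t m, n;

	assert(prop != NULL && len != NULL);
	m = prop[0];
	n = prop[1];
	if (idx >= m)
		return NULL;
	*len = n;
	return prop + 2 + (size_t)idx * n;
}
```

Both accessors hand back a list without its length field, so a consumer that takes a bare list can be fed from either source.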
      Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
      Acked-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 23 Mar, 2015 1 commit
    • powerpc/numa: Reset node_possible_map to only node_online_map · 3af229f2
      Nishanth Aravamudan committed
      Raghu noticed an issue with excessive memory allocation on power with a
      simple cgroup test, specifically, in mem_cgroup_css_alloc ->
      for_each_node -> alloc_mem_cgroup_per_zone_info(), which ends up blowing
      up the kmalloc-2048 slab (to the order of 200MB for 400 cgroup
      directories).
      
      The underlying issue is that NODES_SHIFT on power is 8 (256 NUMA nodes
      possible), which defines node_possible_map, which in turn defines the
      value of nr_node_ids in setup_nr_node_ids and the iteration of
      for_each_node.
      
      In practice, we never see a system with 256 NUMA nodes, and in fact, we
      do not support node hotplug on power in the first place, so the nodes
      that are online when we come up are the nodes that will be present for
      the lifetime of this kernel. So let's, at least, drop the NUMA possible
      map down to the online map at runtime. This is similar to what x86 does
      in its initialization routines.
      
      mem_cgroup_css_alloc should also be fixed to only iterate over
      memory-populated nodes and handle hotplug, but that is a separate
      change.
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 18 Mar, 2015 3 commits
  17. 25 Nov, 2014 1 commit
    • of/reconfig: Always use the same structure for notifiers · f5242e5a
      Grant Likely committed
      The OF_RECONFIG notifier callback uses a different structure depending
      on whether it is a node change or a property change. This is silly, and
      not very safe. Rework the code to use the same data structure regardless
      of the type of notifier.
      Signed-off-by: Grant Likely <grant.likely@linaro.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
      Cc: <linuxppc-dev@lists.ozlabs.org>
  18. 10 Nov, 2014 2 commits
  19. 29 Oct, 2014 2 commits
  20. 16 Oct, 2014 1 commit
    • powerpc/vphn: NUMA node code expects big-endian · 5c9fb189
      Greg Kurz committed
      The associativity domain numbers are obtained from the hypervisor through
      registers and written into memory by the guest: the packed array passed to
      vphn_unpack_associativity() is then native-endian, unlike what was assumed
      in the following commit:
      
      commit b08a2a12
      Author: Alistair Popple <alistair@popple.id.au>
      Date:   Wed Aug 7 02:01:44 2013 +1000
      
          powerpc: Make NUMA device node code endian safe
      
      This issue fills the topology with bogus data and makes it unusable. It may
      lead to severe performance breakdowns.
      
      We should ideally patch the vphn_unpack_associativity() function to do the
      64-bit loads, but this requires some more brainstorming.
      
      In the meantime, let's go for a suboptimal and temporary bug fix: this patch
      converts each 64-bit value of the packed array to big endian, as expected by
      the current parsing code in vphn_unpack_associativity().
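
The byte-order issue can be illustrated portably. The sketch below is not the VPHN packing format and does not use the kernel's conversion helpers; `store_be64()` and `load_be16()` are hypothetical names. It only shows that once a 64-bit register value is laid out in memory as big-endian, a parser reading 16-bit big-endian fields sees the same values on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 64-bit value to memory most-significant byte first. */
static void store_be64(uint8_t *dst, uint64_t value)
{
	assert(dst != NULL);
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(value >> (56 - 8 * i));
}

/* Read a 16-bit big-endian field, as a big-endian parser would. */
static uint16_t load_be16(const uint8_t *src)
{
	assert(src != NULL);
	return (uint16_t)(((unsigned int)src[0] << 8) | src[1]);
}
```

Storing the register value in native order on a little-endian host would instead reverse the bytes, so the same parser would read the fields in the wrong order and with swapped bytes.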
      Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  21. 13 Oct, 2014 2 commits
  22. 25 Sep, 2014 3 commits
  23. 20 Sep, 2014 1 commit
    • powerpc/mm: Use common paging_init() for NUMA · 6db35ad2
      Scott Wood committed
      Commit 1c98025c "powerpc: Dynamic DMA
      zone limits" updated how zones are created in paging_init(), but missed
      the NUMA version of paging_init().  This was noticed via a linker
      error, since dma_pfn_limit_to_zone() was, like the non-NUMA
      paging_init(), limited by #ifndef CONFIG_NEED_MULTIPLE_NODES.
      
      It turns out that the NUMA paging_init() was not actually doing
      anything different from the standard paging_init(), other than a couple
      debug prints, a couple 32-bit-only ifdef sections, and a call to
      mark_nonram_nosave().  It's not clear whether mark_nonram_nosave() is
      inherently wrong to do for NUMA, or just not useful on targets that
      have NUMA, but for now I'm preserving the existing behavior.
      
      Fixes: 1c98025c "powerpc: Dynamic DMA zone limits"
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
  24. 13 Aug, 2014 1 commit
    • powerpc: reorder per-cpu NUMA information's initialization · 2fabf084
      Nishanth Aravamudan committed
      There is an issue currently where NUMA information is used on powerpc
      (and possibly ia64) before it has been read from the device-tree, which
      leads to large slab consumption with CONFIG_SLUB and memoryless nodes.
      
      NUMA powerpc non-boot CPU's cpu_to_node/cpu_to_mem is only accurate
      after start_secondary(), similar to ia64, which is invoked via
      smp_init().
      
      Commit 6ee0578b ("workqueue: mark init_workqueues() as
      early_initcall()") made init_workqueues() be invoked via
      do_pre_smp_initcalls(), which is obviously before the secondary
      processors are online.
      
      Additionally, the following commits changed init_workqueues() to use
      cpu_to_node to determine the node to use for kthread_create_on_node:
      
      bce90380 ("workqueue: add wq_numa_tbl_len and
      wq_numa_possible_cpumask[]")
      f3f90ad4 ("workqueue: determine NUMA node of workers accourding to
      the allowed cpumask")
      
      Therefore, when init_workqueues() runs, it sees all CPUs as being on
      Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to
      a high number of slab deactivations
      (http://www.spinics.net/lists/linux-mm/msg67489.html).
      
      Fix this by initializing the powerpc-specific CPU<->node/local memory
      node mapping as early as possible, which on powerpc is
      do_init_bootmem(). Currently that function initializes the mapping for
      the boot CPU, but we extend it to setup the mapping for all possible
      CPUs. Then, in smp_prepare_cpus(), we can correspondingly set the
      per-cpu values for all possible CPUs. That ensures that before the
      early_initcalls run (and really as early as possible), the per-cpu NUMA
      mapping is accurate.
      
      While testing memoryless nodes on PowerKVM guests with a fix to the
      workqueue logic to use cpu_to_mem() instead of cpu_to_node(), with a
      guest topology of:
      
      available: 2 nodes (0-1)
      node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
      node 0 size: 0 MB
      node 0 free: 0 MB
      node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
      node 1 size: 16336 MB
      node 1 free: 15329 MB
      node distances:
      node   0   1
        0:  10  40
        1:  40  10
      
      the slab consumption decreases from
      
      Slab:             932416 kB
      SUnreclaim:       902336 kB
      
      to
      
      Slab:             395264 kB
      SUnreclaim:       359424 kB
      
      And we see a corresponding increase in the slab efficiency from
      
      slab                                   mem     objs    slabs
                                            used   active   active
      ------------------------------------------------------------
      kmalloc-16384                       337 MB   11.28%  100.00%
      task_struct                         288 MB    9.93%  100.00%
      
      to
      
      slab                                   mem     objs    slabs
                                            used   active   active
      ------------------------------------------------------------
      kmalloc-16384                        37 MB  100.00%  100.00%
      task_struct                          31 MB  100.00%  100.00%
      
      Powerpc didn't support memoryless nodes until recently (64bb80d8
      "powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and 8c272261
      "powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also
      helped improve memory consumption with these kinds of environments.
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  25. 05 Aug, 2014 1 commit
  26. 19 Apr, 2014 1 commit
    • powerpc/mm: fix ".__node_distance" undefined · 12c743eb
      Mike Qiu committed
        CHK     include/config/kernel.release
        CHK     include/generated/uapi/linux/version.h
        CHK     include/generated/utsrelease.h
        ...
        Building modules, stage 2.
      WARNING: 1 bad relocations
      c0000000013d6a30 R_PPC64_ADDR64    uprobes_fetch_type_table
        WRAP    arch/powerpc/boot/zImage.pseries
        WRAP    arch/powerpc/boot/zImage.epapr
        MODPOST 1849 modules
      ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
      make[1]: *** [__modpost] Error 1
      make: *** [modules] Error 2
      make: *** Waiting for unfinished jobs....
      
      The reason is that the symbol "__node_distance" has not been exported on powerpc.
      Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Alistair Popple <alistair@popple.id.au>
      Cc: Mike Qiu <qiudayu@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 09 Apr, 2014 1 commit
    • power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update · 9a013361
      Michael Wang committed
      Since v1:
      	Edited the comment according to Srivatsa's suggestion.
      
      During testing, we encountered the WARN below, followed by an Oops:
      
      	WARNING: at kernel/sched/core.c:6218
      	...
      	NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
      	LR [c000000000101358] .build_sched_domains+0xec8/0x1200
      	PACATMSCRATCH [800000000000f032]
      	Call Trace:
      	[c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
      	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
      	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
      	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
      	...
      	Oops: Kernel access of bad area, sig: 11 [#1]
      	...
      	NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
      	LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
      	PACATMSCRATCH [8000000000029032]
      	Call Trace:
      	[c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
      	[c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
      	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
      	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
      	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
      	...
      
      This was caused by 'sd->groups == NULL' after building groups, which in
      turn was caused by an empty 'sd->span'.
      
      The cpu's domain contained nothing because the cpu was assigned to a wrong
      node, due to the following unfortunate sequence of events:
      
      1. The hypervisor sent a topology update to the guest OS, to notify changes
         to the cpu-node mapping. However, the update was actually redundant - i.e.,
         the "new" mapping was exactly the same as the old one.
      
      2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting
         the 'for-loop' in arch_update_cpu_topology().
      
      3. So we ended up calling stop-machine() with an empty cpumask list, which made
         stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as
         the cpu to run the payload (the update_cpu_topology() function).
      
      4. This causes update_cpu_topology() to be run by CPU0. And since 'updates'
         is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology()
         finds update->cpu as well as update->new_nid to be 0. In other words, we
         end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly.
      
      Together with the subsequent incorrect updates, this causes the
      sched-domain rebuild code to break and crash the system.
      
      Fix this by skipping the topology update in cases where we find that
      the topology has not actually changed (i.e., spurious updates).
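
The failure mode in step 3 and the guard that avoids it can be modelled with bitmasks. This is a sketch only: `cpu_mask_t`, `mock_payload_cpus()` and `guarded_payload_cpus()` are hypothetical, and the mock reproduces just the behaviour described in this message (an empty request falls back to the first online CPU), not the real stop-machine implementation.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; a stand-in for a cpumask. */
typedef uint64_t cpu_mask_t;

/*
 * Mock of the behaviour described in step 3: with an empty request,
 * the payload is run on the first online CPU instead.
 */
static cpu_mask_t mock_payload_cpus(cpu_mask_t requested, cpu_mask_t online)
{
	assert(online != 0);
	if (requested)
		return requested;
	return online & (~online + 1);	/* lowest set bit of online */
}

/* Guarded variant: a spurious (empty) update runs the payload nowhere. */
static cpu_mask_t guarded_payload_cpus(cpu_mask_t updated_cpus,
				       cpu_mask_t online)
{
	if (!updated_cpus)
		return 0;	/* nothing changed, skip the update */
	return mock_payload_cpus(updated_cpus, online);
}
```

Without the guard, an empty update mask still runs the payload on CPU0, which then acts on zero-initialized update data.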
      
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      CC: Stephen Rothwell <sfr@canb.auug.org.au>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Robert Jennings <rcj@linux.vnet.ibm.com>
      CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
      CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      CC: Alistair Popple <alistair@popple.id.au>
      Suggested-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  28. 29 Jan, 2014 1 commit
  29. 22 Jan, 2014 1 commit
    • memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Tang Chen committed
      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  30. 15 Jan, 2014 1 commit
    • powerpc: Add debug checks to catch invalid cpu-to-node mappings · 68fb18aa
      Srivatsa S. Bhat committed
      There have been some weird bugs in the past where the kernel tried to associate
      threads of the same core to different NUMA nodes, and things went haywire after
      that point (as expected).
      
      But unfortunately, root-causing such issues has been quite challenging, due to
      the lack of appropriate debug checks in the kernel. These bugs usually lead to
      some odd soft-lockups in the scheduler's build-sched-domain code in the CPU
      hotplug path, which makes it very hard to trace it back to the incorrect
      cpu-to-node mappings.
      
      So add appropriate debug checks to catch such invalid cpu-to-node mappings
      as early as possible.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>