1. 16 Jul, 2020 3 commits
  2. 04 Mar, 2020 4 commits
  3. 04 Feb, 2020 1 commit
  4. 04 Jul, 2019 6 commits
    • powerpc/mm: Consolidate numa_enable check and min_common_depth check · 495c2ff4
      Authored by Aneesh Kumar K.V
      If we fail to parse min_common_depth from the device tree, we boot with
      NUMA disabled. Reflect this by setting the numa_enabled variable to
      false. Also, switch all min_common_depth failure checks to
      if (!numa_enabled) checks.
      
      This avoids having to check both conditions in different code paths.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
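As a rough illustration of the consolidation described in commit 495c2ff4, the userspace C sketch below records a failed min_common_depth parse in a single numa_enabled flag, so that later code only has to test that flag. The function names, the negative-depth failure convention, and the lookup helper are assumptions made for illustration; this is not the kernel's actual code.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)

/* Stand-ins for kernel state; the names mirror the commit message. */
bool numa_enabled = true;
int min_common_depth = -1;

/*
 * Hypothetical init step. parsed_depth is whatever a device-tree parser
 * returned; a negative value is assumed to mean the parse failed.
 */
void numa_init_sketch(int parsed_depth)
{
	min_common_depth = parsed_depth;
	if (parsed_depth < 0)
		numa_enabled = false;	/* record the failure once, in one place */
}

/* Later code paths test a single flag instead of two conditions. */
int lookup_node_sketch(int node_from_associativity)
{
	if (!numa_enabled)
		return NUMA_NO_NODE;
	return node_from_associativity;
}
```

With this shape, a caller never needs to inspect min_common_depth directly to decide whether NUMA information can be trusted.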
    • powerpc/mm: Fix node look up with numa=off boot · f52741c4
      Authored by Aneesh Kumar K.V
      If we boot with numa=off, we need to make sure we return NUMA_NO_NODE when
      looking up associativity details of resources. Without this, we hit a
      crash like the one below:
      
      BUG: Unable to handle kernel data access at 0x40000000008
      Faulting instruction address: 0xc000000008f31704
      cpu 0x1b: Vector: 380 (Data SLB Access) at [c00000000b9bb320]
          pc: c000000008f31704: _raw_spin_lock+0x14/0x100
          lr: c0000000083f41fc: ____cache_alloc_node+0x5c/0x290
          sp: c00000000b9bb5b0
         msr: 800000010280b033
         dar: 40000000008
        current = 0xc00000000b9a2700
        paca    = 0xc00000000a740c00   irqmask: 0x03   irq_happened: 0x01
          pid   = 1, comm = swapper/27
      Linux version 5.2.0-rc4-00925-g74e188c620b1 (root@linux-d8ip) (gcc version 7.4.1 20190424 [gcc-7-branch revision 270538] (SUSE Linux)) #34 SMP Sat Jun 29 00:41:02 EDT 2019
      enter ? for help
      [link register   ] c0000000083f41fc ____cache_alloc_node+0x5c/0x290
      [c00000000b9bb5b0] 0000000000000dc0 (unreliable)
      [c00000000b9bb5f0] c0000000083f48c8 kmem_cache_alloc_node_trace+0x138/0x360
      [c00000000b9bb670] c000000008aa789c devres_alloc_node+0x4c/0xa0
      [c00000000b9bb6a0] c000000008337218 devm_memremap+0x58/0x130
      [c00000000b9bb6f0] c000000008aed00c devm_nsio_enable+0xdc/0x170
      [c00000000b9bb780] c000000008af3b6c nd_pmem_probe+0x4c/0x180
      [c00000000b9bb7b0] c000000008ad84cc nvdimm_bus_probe+0xac/0x260
      [c00000000b9bb840] c000000008aa0628 really_probe+0x148/0x500
      [c00000000b9bb8d0] c000000008aa0d7c driver_probe_device+0x19c/0x1d0
      [c00000000b9bb950] c000000008aa11bc device_driver_attach+0xcc/0x100
      [c00000000b9bb990] c000000008aa12ec __driver_attach+0xfc/0x1e0
      [c00000000b9bba10] c000000008a9d0a4 bus_for_each_dev+0xb4/0x130
      [c00000000b9bba70] c000000008a9fc04 driver_attach+0x34/0x50
      [c00000000b9bba90] c000000008a9f118 bus_add_driver+0x1d8/0x300
      [c00000000b9bbb20] c000000008aa2358 driver_register+0x98/0x1a0
      [c00000000b9bbb90] c000000008ad7e6c __nd_driver_register+0x5c/0x100
      [c00000000b9bbbf0] c0000000093efbac nd_pmem_driver_init+0x34/0x48
      [c00000000b9bbc10] c0000000080106c0 do_one_initcall+0x60/0x2d0
      [c00000000b9bbce0] c00000000938463c kernel_init_freeable+0x384/0x48c
      [c00000000b9bbdb0] c000000008010a5c kernel_init+0x2c/0x160
      [c00000000b9bbe20] c00000000800ba54 ret_from_kernel_thread+0x5c/0x68
      Reported-and-debugged-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/drconf: Use NUMA_NO_NODE on failures instead of node 0 · ea9f5b70
      Authored by Aneesh Kumar K.V
      If we fail to parse the associativity array, we should default to
      NUMA_NO_NODE instead of node 0. The rest of the code falls back to the
      right default if it finds the NUMA node value NUMA_NO_NODE.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Provide vcpu dispatch statistics · d62c8dee
      Authored by Naveen N. Rao
      For Shared Processor LPARs, the POWER Hypervisor maintains a
      relatively static mapping of the LPAR processors (vcpus) to physical
      processor chips (representing the "home" node) and tries to always
      dispatch vcpus on their associated physical processor chip. However,
      under certain scenarios, vcpus may be dispatched on a different
      processor chip (away from its home node). The actual physical
      processor number on which a certain vcpu is dispatched is available to
      the guest in the 'processor_id' field of each DTL entry.
      
      The guest can discover the home node of each vcpu through the
      H_HOME_NODE_ASSOCIATIVITY(flags=1) hcall. The guest can also discover
      the associativity of physical processors, as represented in the DTL
      entry, through the H_HOME_NODE_ASSOCIATIVITY(flags=2) hcall.
      
      These can then be compared to determine if the vcpu was dispatched on
      its home node or not. If the vcpu was not dispatched on the home node,
      it is possible to determine if the vcpu was dispatched in a different
      chip, socket or drawer.
      
      Introduce a procfs file /proc/powerpc/vcpudispatch_stats that can be
      used to obtain these statistics. Writing '1' to this file enables
      collecting the statistics, while writing '0' disables the statistics.
      The statistics themselves are available by reading the procfs file. By
      default, the DTLB log for each vcpu is processed 50 times a second so
      as not to miss any entries. This processing frequency can be changed
      through /proc/powerpc/vcpudispatch_stats_freq.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
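The procfs interface described in commit d62c8dee could be driven from userspace roughly as follows. This is a minimal sketch: the only detail taken from the commit message is the path /proc/powerpc/vcpudispatch_stats and the '1'/'0' convention. The helper names are hypothetical, and the path is passed in as a parameter so the helpers can be tried against an ordinary file on systems without this feature.

```c
#include <stdio.h>

/* Path named in the commit message; present only on kernels with this feature. */
#define VCPUDISPATCH_STATS "/proc/powerpc/vcpudispatch_stats"

/*
 * Write '1' (enable) or '0' (disable) to the control file at path.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int set_dispatch_stats(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputc(enable ? '1' : '0', f) != EOF;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Copy the statistics text to out. Returns 0 on success, -1 if unreadable. */
int dump_dispatch_stats(const char *path, FILE *out)
{
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		fputs(line, out);
	fclose(f);
	return 0;
}
```

A typical session would call set_dispatch_stats(VCPUDISPATCH_STATS, 1), run the workload, call dump_dispatch_stats(VCPUDISPATCH_STATS, stdout), and then disable collection with set_dispatch_stats(VCPUDISPATCH_STATS, 0).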
    • powerpc/pseries: Move mm/book3s64/vphn.c under platforms/pseries/ · 5a1ea477
      Authored by Naveen N. Rao
      hcall_vphn() is specific to pseries and will be used in a subsequent
      patch. So, move it to a more appropriate place under
      arch/powerpc/platforms/pseries. Also merge vphn.h into lppaca.h
      and update vphn selftest to use the new files.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Generalize hcall_vphn() · ef34e0ef
      Authored by Naveen N. Rao
      H_HOME_NODE_ASSOCIATIVITY hcall can take two different flags and return
      different associativity information in each case. Generalize the
      existing hcall_vphn() function to take flags as an argument and to
      return the result. Update the only existing user to pass the proper
      arguments.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 31 May, 2019 1 commit
  6. 02 May, 2019 1 commit
  7. 20 Apr, 2019 3 commits
  8. 13 Mar, 2019 1 commit
    • memblock: memblock_phys_alloc_try_nid(): don't panic · 33755574
      Authored by Mike Rapoport
      The memblock_phys_alloc_try_nid() function tries to allocate memory from
      the requested node and then falls back to allocation from any node in
      the system.  The memblock_alloc_base() fallback used by this function
      panics if the allocation fails.
      
      Replace the memblock_alloc_base() fallback with the direct call to
      memblock_alloc_range_nid() and update the memblock_phys_alloc_try_nid()
      callers to check the returned value and panic in case of error.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-7-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
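The caller-side pattern that commit 33755574 describes (the allocator reports failure, and each caller decides to panic) can be sketched in userspace C as below. The allocator here is a small mock over a fixed address range, not the real memblock code; the function names, the pool bounds, and the use of abort() in place of the kernel's panic() are all assumptions for illustration. The mock also assumes align is a power of two.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t phys_addr_t;

/* Mock "physical" pool: addresses 0x1000..0x2fff are available. */
static phys_addr_t pool_next = 0x1000;
static const phys_addr_t pool_end = 0x3000;

/*
 * Mock allocator: returns the allocated base address, or 0 on failure.
 * It never panics; failure handling is left to the caller.
 * align must be a power of two.
 */
phys_addr_t sketch_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
{
	phys_addr_t base = (pool_next + align - 1) & ~(align - 1);

	(void)nid;	/* node preference is ignored in this mock */
	if (size == 0 || base + size > pool_end)
		return 0;
	pool_next = base + size;
	return base;
}

/* Caller pattern after the change: check the result, then panic explicitly. */
phys_addr_t alloc_node_data_or_die(phys_addr_t size, phys_addr_t align, int nid)
{
	phys_addr_t pa = sketch_phys_alloc_try_nid(size, align, nid);

	if (!pa) {
		fprintf(stderr, "Cannot allocate %llu bytes for node %d\n",
			(unsigned long long)size, nid);
		abort();	/* userspace stand-in for panic() */
	}
	return pa;
}
```

Moving the panic into the caller lets each call site print a message that names what it was trying to allocate, and lets callers that can tolerate failure handle it differently.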
  9. 06 Mar, 2019 2 commits
    • numa: make "nr_node_ids" unsigned int · b9726c26
      Authored by Alexey Dobriyan
      Number of NUMA nodes can't be negative.
      
      This saves a few bytes on x86_64:
      
      	add/remove: 0/0 grow/shrink: 4/21 up/down: 27/-265 (-238)
      	Function                                     old     new   delta
      	hv_synic_alloc.cold                           88     110     +22
      	prealloc_shrinker                            260     262      +2
      	bootstrap                                    249     251      +2
      	sched_init_numa                             1566    1567      +1
      	show_slab_objects                            778     777      -1
      	s_show                                      1201    1200      -1
      	kmem_cache_init                              346     345      -1
      	__alloc_workqueue_key                       1146    1145      -1
      	mem_cgroup_css_alloc                        1614    1612      -2
      	__do_sys_swapon                             4702    4699      -3
      	__list_lru_init                              655     651      -4
      	nic_probe                                   2379    2374      -5
      	store_user_store                             118     111      -7
      	red_zone_store                               106      99      -7
      	poison_store                                 106      99      -7
      	wq_numa_init                                 348     338     -10
      	__kmem_cache_empty                            75      65     -10
      	task_numa_free                               186     173     -13
      	merge_across_nodes_store                     351     336     -15
      	irq_create_affinity_masks                   1261    1246     -15
      	do_numa_crng_init                            343     321     -22
      	task_numa_fault                             4760    4737     -23
      	swapfile_init                                179     156     -23
      	hv_synic_alloc                               536     492     -44
      	apply_wqattrs_prepare                        746     695     -51
      
      Link: http://lkml.kernel.org/r/20190201223029.GA15820@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: replace all open encodings for NUMA_NO_NODE · 98fa15f3
      Authored by Anshuman Khandual
      Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
      
      All these places for replacement were found by running the following
      grep patterns on the entire kernel code.  Please let me know if this
      might have missed some instances.  This might also have replaced some
      false positives.  I will appreciate suggestions, inputs and review.
      
      1. git grep "nid == -1"
      2. git grep "node == -1"
      3. git grep "nid = -1"
      4. git grep "node = -1"
      
      This patch (of 2):
      
      At present there are multiple places where invalid node number is
      encoded as -1.  Even though implicitly understood it is always better to
      have macros in there.  Replace these open encodings for an invalid node
      number with the global macro NUMA_NO_NODE.  This helps remove NUMA
      related assumptions like 'invalid node' from various places redirecting
      them to a common definition.
      
      Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[ixgbe]
      Acked-by: Jens Axboe <axboe@kernel.dk>			[mtip32xx]
      Acked-by: Vinod Koul <vkoul@kernel.org>			[dmaengine.c]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Acked-by: Doug Ledford <dledford@redhat.com>		[drivers/infiniband]
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
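The replacement described in commit 98fa15f3 amounts to swapping an open-coded -1 for the named macro. The sketch below shows the before and after shapes side by side; the two helper functions are hypothetical and exist only to make the comparison compilable, while the macro value follows the commit message's statement that the invalid node number is encoded as -1.

```c
#define NUMA_NO_NODE (-1)	/* "no node": encoded as -1 per the commit message */

/* Before: the invalid node number is an open-coded -1. */
int pick_node_open_coded(int nid, int fallback)
{
	if (nid == -1)
		return fallback;
	return nid;
}

/* After: the same check, with the intent named by the macro. */
int pick_node(int nid, int fallback)
{
	if (nid == NUMA_NO_NODE)
		return fallback;
	return nid;
}
```

Both functions behave identically; the change is purely about making the "invalid node" assumption searchable and defined in one place.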
  10. 30 Jan, 2019 1 commit
  11. 26 Nov, 2018 1 commit
  12. 14 Nov, 2018 1 commit
  13. 31 Oct, 2018 2 commits
    • mm: remove include/linux/bootmem.h · 57c8a661
      Authored by Mike Rapoport
      Move remaining definitions and declarations from include/linux/bootmem.h
      into include/linux/memblock.h and remove the redundant header.
      
      The includes were replaced with the semantic patch below, followed by
      semi-automated removal of duplicated '#include <linux/memblock.h>'
      lines:
      
      @@
      @@
      - #include <linux/bootmem.h>
      + #include <linux/memblock.h>
      
      [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
      [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
      [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
        Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc* · 9a8dd708
      Authored by Mike Rapoport
      Make it explicit that the caller gets a physical address rather than a
      virtual one.
      
      This will also allow using the memblock_alloc prefix for memblock
      allocations returning a virtual address, which is done in the following
      patches.
      
      The conversion is done using the following semantic patch:
      
      @@
      expression e1, e2, e3;
      @@
      (
      - memblock_alloc(e1, e2)
      + memblock_phys_alloc(e1, e2)
      |
      - memblock_alloc_nid(e1, e2, e3)
      + memblock_phys_alloc_nid(e1, e2, e3)
      |
      - memblock_alloc_try_nid(e1, e2, e3)
      + memblock_phys_alloc_try_nid(e1, e2, e3)
      )
      
      Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 13 Oct, 2018 1 commit
    • powerpc/pseries/mobility: Extend start/stop topology update scope · 65b9fdad
      Authored by Michael Bringmann
      The powerpc mobility code may receive RTAS requests to perform PRRN
      (Platform Resource Reassignment Notification) topology changes at any
      time, including during LPAR migration operations.
      
      In some configurations where the affinity of CPUs or memory is being
      changed on that platform, the PRRN requests may apply or refer to
      outdated information prior to the complete update of the device-tree.
      
      This patch changes the duration for which topology updates are
      suppressed during LPAR migrations from just the rtas_ibm_suspend_me()
      / 'ibm,suspend-me' call(s) to cover the entire migration_store()
      operation to allow all changes to the device-tree to be applied prior
      to accepting and applying any PRRN requests.
      
      For tracking purposes, pr_info notices are added to the functions
      start_topology_update() and stop_topology_update() of 'numa.c'.
      Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
      Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 05 Oct, 2018 1 commit
    • powerpc/numa: Skip onlining a offline node in kdump path · ac1788cc
      Authored by Srikar Dronamraju
      With commit 2ea62630 ("powerpc/topology: Get topology for shared
      processors at boot"), kdump kernel on shared LPAR may crash.
      
      The necessary conditions are
      - Shared LPAR with at least 2 nodes having memory and CPUs.
      - Memory requirement for kdump kernel must be met by the first N-1
        nodes where there are at least N nodes with memory and CPUs.
      
      Example numactl of such a machine.
        $ numactl -H
        available: 5 nodes (0,2,5-7)
        node 0 cpus:
        node 0 size: 0 MB
        node 0 free: 0 MB
        node 2 cpus:
        node 2 size: 255 MB
        node 2 free: 189 MB
        node 5 cpus: 24 25 26 27 28 29 30 31
        node 5 size: 4095 MB
        node 5 free: 4024 MB
        node 6 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
        node 6 size: 6353 MB
        node 6 free: 5998 MB
        node 7 cpus: 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39
        node 7 size: 7640 MB
        node 7 free: 7164 MB
        node distances:
        node   0   2   5   6   7
          0:  10  40  40  40  40
          2:  40  10  40  40  40
          5:  40  40  10  40  40
          6:  40  40  40  10  20
          7:  40  40  40  20  10
      
      Steps to reproduce.
      1. Load / start kdump service.
      2. Trigger a kdump (for example : echo c > /proc/sysrq-trigger)
      
      When booting a kdump kernel with 2048M:
      
        kexec: Starting switchover sequence.
        I'm in purgatory
        Using 1TB segments
        hash-mmu: Initializing hash mmu with SLB
        Linux version 4.19.0-rc5-master+ (srikar@linux-xxu6) (gcc version 4.8.5 (SUSE Linux)) #1 SMP Thu Sep 27 19:45:00 IST 2018
        Found initrd at 0xc000000009e70000:0xc00000000ae554b4
        Using pSeries machine description
        -----------------------------------------------------
        ppc64_pft_size    = 0x1e
        phys_mem_size     = 0x88000000
        dcache_bsize      = 0x80
        icache_bsize      = 0x80
        cpu_features      = 0x000000ff8f5d91a7
          possible        = 0x0000fbffcf5fb1a7
          always          = 0x0000006f8b5c91a1
        cpu_user_features = 0xdc0065c2 0xef000000
        mmu_features      = 0x7c006001
        firmware_features = 0x00000007c45bfc57
        htab_hash_mask    = 0x7fffff
        physical_start    = 0x8000000
        -----------------------------------------------------
        numa:   NODE_DATA [mem 0x87d5e300-0x87d67fff]
        numa:     NODE_DATA(0) on node 6
        numa:   NODE_DATA [mem 0x87d54600-0x87d5e2ff]
        Top of RAM: 0x88000000, Total RAM: 0x88000000
        Memory hole size: 0MB
        Zone ranges:
          DMA      [mem 0x0000000000000000-0x0000000087ffffff]
          DMA32    empty
          Normal   empty
        Movable zone start for each node
        Early memory node ranges
          node   6: [mem 0x0000000000000000-0x0000000087ffffff]
        Could not find start_pfn for node 0
        Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
        On node 0 totalpages: 0
        Initmem setup node 6 [mem 0x0000000000000000-0x0000000087ffffff]
        On node 6 totalpages: 34816
      
        Unable to handle kernel paging request for data at address 0x00000060
        Faulting instruction address: 0xc000000008703a54
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 11 PID: 1 Comm: swapper/11 Not tainted 4.19.0-rc5-master+ #1
        NIP:  c000000008703a54 LR: c000000008703a38 CTR: 0000000000000000
        REGS: c00000000b673440 TRAP: 0380   Not tainted  (4.19.0-rc5-master+)
        MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24022022  XER: 20000002
        CFAR: c0000000086fc238 IRQMASK: 0
        GPR00: c000000008703a38 c00000000b6736c0 c000000009281900 0000000000000000
        GPR04: 0000000000000000 0000000000000000 fffffffffffff001 c00000000b660080
        GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000220
        GPR12: 0000000000002200 c000000009e51400 0000000000000000 0000000000000008
        GPR16: 0000000000000000 c000000008c152e8 c000000008c152a8 0000000000000000
        GPR20: c000000009422fd8 c000000009412fd8 c000000009426040 0000000000000008
        GPR24: 0000000000000000 0000000000000000 c000000009168bc8 c000000009168c78
        GPR28: c00000000b126410 0000000000000000 c00000000916a0b8 c00000000b126400
        NIP [c000000008703a54] bus_add_device+0x84/0x1e0
        LR [c000000008703a38] bus_add_device+0x68/0x1e0
        Call Trace:
        [c00000000b6736c0] [c000000008703a38] bus_add_device+0x68/0x1e0 (unreliable)
        [c00000000b673740] [c000000008700194] device_add+0x454/0x7c0
        [c00000000b673800] [c00000000872e660] __register_one_node+0xb0/0x240
        [c00000000b673860] [c00000000839a6bc] __try_online_node+0x12c/0x180
        [c00000000b673900] [c00000000839b978] try_online_node+0x58/0x90
        [c00000000b673930] [c0000000080846d8] find_and_online_cpu_nid+0x158/0x190
        [c00000000b673a10] [c0000000080848a0] numa_update_cpu_topology+0x190/0x580
        [c00000000b673c00] [c000000008d3f2e4] smp_cpus_done+0x94/0x108
        [c00000000b673c70] [c000000008d5c00c] smp_init+0x174/0x19c
        [c00000000b673d00] [c000000008d346b8] kernel_init_freeable+0x1e0/0x450
        [c00000000b673dc0] [c0000000080102e8] kernel_init+0x28/0x160
        [c00000000b673e30] [c00000000800b65c] ret_from_kernel_thread+0x5c/0x80
        Instruction dump:
        60000000 60000000 e89e0020 7fe3fb78 4bff87d5 60000000 7c7d1b79 4082008c
        e8bf0050 e93e0098 3b9f0010 2fa50000 <e8690060> 38630018 419e0114 7f84e378
        ---[ end trace 593577668c2daa65 ]---
      
      However a regular kernel with 4096M (2048 gets reserved for crash
      kernel) boots properly.
      
      Unlike regular kernels, which mark all available nodes as online,
      the kdump kernel marks just enough nodes as online and marks the rest
      as offline at boot. However, the kdump kernel boots with all available
      CPUs. With commit 2ea62630 ("powerpc/topology: Get topology for
      shared processors at boot"), all CPUs are onlined on their respective
      nodes at boot time. try_online_node() tries to online the offline
      nodes but fails because not all needed subsystems are initialized yet.
      
      As part of the fix, detect and skip early onlining of an offline node.
      
      Fixes: 2ea62630 ("powerpc/topology: Get topology for shared processors at boot")
      Reported-by: Pavithra Prakash <pavrampu@in.ibm.com>
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Tested-by: Hari Bathini <hbathini@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 25 Sep, 2018 1 commit
    • powerpc/numa: Use associativity if VPHN hcall is successful · 2483ef05
      Authored by Srikar Dronamraju
      Currently, associativity is used to look up the node ID even if the
      preceding VPHN hcall failed. This can cause a CPU to be made part of
      the wrong node (most likely node 0). It happens on KVM guests, where
      VPHN is not enabled.
      
      With 2ea62630 ("powerpc/topology: Get topology for shared processors at
      boot"), that associativity is used to assign CPUs to the wrong node.
      Hence KVM guest topology is broken.
      
      For example, a 4-node KVM guest would previously have reported:
      
        [root@localhost ~]#  numactl -H
        available: 4 nodes (0-3)
        node 0 cpus: 0 1 2 3
        node 0 size: 1746 MB
        node 0 free: 1604 MB
        node 1 cpus: 4 5 6 7
        node 1 size: 2044 MB
        node 1 free: 1765 MB
        node 2 cpus: 8 9 10 11
        node 2 size: 2044 MB
        node 2 free: 1837 MB
        node 3 cpus: 12 13 14 15
        node 3 size: 2044 MB
        node 3 free: 1903 MB
        node distances:
        node   0   1   2   3
          0:  10  40  40  40
          1:  40  10  40  40
          2:  40  40  10  40
          3:  40  40  40  10
      
      Would now report:
      
        [root@localhost ~]# numactl -H
        available: 4 nodes (0-3)
        node 0 cpus: 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15
        node 0 size: 1746 MB
        node 0 free: 1244 MB
        node 1 cpus:
        node 1 size: 2044 MB
        node 1 free: 2032 MB
        node 2 cpus: 1
        node 2 size: 2044 MB
        node 2 free: 2028 MB
        node 3 cpus:
        node 3 size: 2044 MB
        node 3 free: 2032 MB
        node distances:
        node   0   1   2   3
          0:  10  40  40  40
          1:  40  10  40  40
          2:  40  40  10  40
          3:  40  40  40  10
      
      Fix this by skipping associativity lookup if the VPHN hcall failed.
      
      Fixes: 2ea62630 ("powerpc/topology: Get topology for shared processors at boot")
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
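The fix described in commit 2483ef05 (skip the associativity lookup when the VPHN hcall failed) can be sketched as a simple guard. Everything below is a userspace illustration under stated assumptions: the struct, the success code of 0, and the function name are hypothetical stand-ins, not the kernel's definitions.

```c
#define NUMA_NO_NODE (-1)
#define SKETCH_HCALL_OK 0	/* assumed success code for this sketch */

/* Hypothetical result of a VPHN-style query. */
struct vphn_result_sketch {
	long rc;		/* hcall status; nonzero means the call failed */
	int node_from_assoc;	/* node derived from the associativity buffer */
};

/*
 * Decide which node a CPU belongs to. If the hcall failed, the
 * associativity buffer holds no valid data, so keep the current node
 * instead of deriving a (probably wrong) node from it.
 */
int cpu_node_after_vphn(struct vphn_result_sketch r, int current_node)
{
	if (r.rc != SKETCH_HCALL_OK)
		return current_node;
	if (r.node_from_assoc == NUMA_NO_NODE)
		return current_node;
	return r.node_from_assoc;
}
```

In the KVM guest case from the commit message, the hcall fails, so each CPU stays on the node it was already assigned to rather than collapsing onto node 0.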
  17. 24 Sep, 2018 1 commit
    • powerpc/pseries: Fix unitialized timer reset on migration · 8604895a
      Authored by Michael Bringmann
      After migration of a powerpc LPAR, the kernel executes code to
      update the system state to reflect new platform characteristics.
      
      Such changes include modifications to device tree properties provided
      to the system by PHYP. Property notifications received by the
      post_mobility_fixup() code are passed along to the kernel in general
      through a call to of_update_property() which in turn passes such
      events back to all modules through entries like the '.notifier_call'
      function within the NUMA module.
      
      When the NUMA module updates its state, it resets its event timer. If
      this occurs after a previous call to stop_topology_update() or on a
      system without VPHN enabled, the code runs into an uninitialized timer
      structure and crashes. This patch adds a safety check along this path
      toward the problem code.
      
      An example crash log is as follows.
      
        ibmvscsi 30000081: Re-enabling adapter!
        ------------[ cut here ]------------
        kernel BUG at kernel/time/timer.c:958!
        Oops: Exception in kernel mode, sig: 5 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag lockd unix_diag af_packet_diag netlink_diag grace fscache sunrpc xts vmx_crypto pseries_rng sg binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
        CPU: 11 PID: 3067 Comm: drmgr Not tainted 4.17.0+ #179
        ...
        NIP mod_timer+0x4c/0x400
        LR  reset_topology_timer+0x40/0x60
        Call Trace:
          0xc0000003f9407830 (unreliable)
          reset_topology_timer+0x40/0x60
          dt_update_callback+0x100/0x120
          notifier_call_chain+0x90/0x100
          __blocking_notifier_call_chain+0x60/0x90
          of_property_notify+0x90/0xd0
          of_update_property+0x104/0x150
          update_dt_property+0xdc/0x1f0
          pseries_devicetree_update+0x2d0/0x510
          post_mobility_fixup+0x7c/0xf0
          migration_store+0xa4/0xc0
          kobj_attr_store+0x30/0x60
          sysfs_kf_write+0x64/0xa0
          kernfs_fop_write+0x16c/0x240
          __vfs_write+0x40/0x200
          vfs_write+0xc8/0x240
          ksys_write+0x5c/0x100
          system_call+0x58/0x6c
      
      Fixes: 5d88aa85 ("powerpc/pseries: Update CPU maps when device tree is updated")
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  18. 21 Aug, 2018 1 commit
    • powerpc/topology: Get topology for shared processors at boot · 2ea62630
      Authored by Srikar Dronamraju
      On a shared LPAR, Phyp will not update the CPU associativity at boot
      time. Just after boot, the system recognizes itself as a shared LPAR
      and triggers a request for the correct CPU associativity. But by then
      the scheduler will have already created/destroyed its sched domains.
      
      This causes:
        - Broken load balancing across nodes, leaving islands of cores.
        - Performance degradation, especially if the system is lightly loaded.
        - dmesg wrongly reporting all CPUs to be in Node 0.
        - Messages in dmesg saying borken topology.
        - With commit 051f3ca0 ("sched/topology: Introduce NUMA identity
          node sched domain"), possible RCU stalls at boot up.
      
      The sched_domains_numa_masks table which is used to generate cpumasks
      is only created at boot time just before creating sched domains and
      never updated. Hence, it's better to get the topology correct before
      the sched domains are created.
      
      For example, on a 64-core POWER8 shared LPAR, dmesg reports:
      
        Brought up 512 CPUs
        Node 0 CPUs: 0-511
        Node 1 CPUs:
        Node 2 CPUs:
        Node 3 CPUs:
        Node 4 CPUs:
        Node 5 CPUs:
        Node 6 CPUs:
        Node 7 CPUs:
        Node 8 CPUs:
        Node 9 CPUs:
        Node 10 CPUs:
        Node 11 CPUs:
        ...
        BUG: arch topology borken
             the DIE domain not a subset of the NUMA domain
        BUG: arch topology borken
             the DIE domain not a subset of the NUMA domain
      
      numactl/lscpu output will still be correct with cores spreading across
      all nodes:
      
        Socket(s):             64
        NUMA node(s):          12
        Model:                 2.0 (pvr 004d 0200)
        Model name:            POWER8 (architected), altivec supported
        Hypervisor vendor:     pHyp
        Virtualization type:   para
        L1d cache:             64K
        L1i cache:             32K
        NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
        NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
        NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
        NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
        NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
        NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
        NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
        NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
        NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
        NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
        NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
        NUMA node11 CPU(s):    160-167,256-263,352-359,448-455
      
      Currently on this LPAR, the scheduler detects 2 levels of NUMA and
      creates NUMA sched domains for all CPUs, but it finds a single DIE
      domain consisting of all CPUs. Hence it deletes all NUMA sched
      domains.
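
      The "DIE domain not a subset of the NUMA domain" message above
      amounts to a containment check between two CPU masks. The following
      is a minimal user-space sketch of that idea, not the scheduler's
      code: cpu_mask_t, mask_is_subset() and domains_consistent() are
      hypothetical names, and a single 64-bit word stands in for a cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_mask_t;

/* True when every CPU set in `inner` is also set in `outer`. */
static bool mask_is_subset(cpu_mask_t inner, cpu_mask_t outer)
{
	return (inner & ~outer) == 0;
}

/*
 * A child domain (for example DIE) is expected to be contained in its
 * parent (for example NUMA). Returns true when that holds for `cpu`.
 */
static bool domains_consistent(int cpu, cpu_mask_t die_span,
			       cpu_mask_t numa_span)
{
	assert(cpu >= 0 && cpu < 64);
	cpu_mask_t self = (cpu_mask_t)1 << cpu;

	/* Both spans must contain the CPU they describe. */
	if (!(die_span & self) || !(numa_span & self))
		return false;

	return mask_is_subset(die_span, numa_span);
}
```

      With a DIE span covering 16 CPUs (0xFFFF) and a NUMA span covering
      only the first 8 (0x00FF), domains_consistent() returns false, which
      mirrors the situation reported in the log.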
      
      To address this, detect the shared processor and update the topology
      soon after the CPUs are set up, so that the correct topology is in
      place just before the scheduler creates the sched domains.
      
      With the fix, dmesg reports:
      
        numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471
        numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479
        numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487
        numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495
        numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
        numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
        numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
        numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
        numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
        numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
        numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
        numa: Node 11 CPUs: 160-167 256-263 352-359 448-455
      
      and lscpu also reports:
      
        Socket(s):             64
        NUMA node(s):          12
        Model:                 2.0 (pvr 004d 0200)
        Model name:            POWER8 (architected), altivec supported
        Hypervisor vendor:     pHyp
        Virtualization type:   para
        L1d cache:             64K
        L1i cache:             32K
        NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
        NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
        NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
        NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
        NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
        NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
        NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
        NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
        NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
        NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
        NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
        NUMA node11 CPU(s):    160-167,256-263,352-359,448-455
      Reported-by: Manjunatha H R <manjuhr1@in.ibm.com>
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      [mpe: Trim / format change log]
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      2ea62630
  19. 13 Jun, 2018 1 commit
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook committed
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
      
              kcalloc(a, b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kcalloc(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
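
      The point of the 2-factor form is that the multiplication is checked
      for overflow instead of being open-coded in the size argument. As a
      rough user-space illustration only (checked_mul(),
      sketch_array3_size() and sketch_calloc2() are hypothetical names,
      not the kernel helpers), the idea can be sketched as:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiply two sizes, saturating to SIZE_MAX if the product overflows. */
static size_t checked_mul(size_t a, size_t b)
{
	if (a != 0 && b > SIZE_MAX / a)
		return SIZE_MAX;
	return a * b;
}

/* 3-factor size; stays saturated once any intermediate step overflows. */
static size_t sketch_array3_size(size_t a, size_t b, size_t c)
{
	size_t ab = checked_mul(a, b);

	if (ab == SIZE_MAX)
		return SIZE_MAX;
	return checked_mul(ab, c);
}

/* Zeroed allocation of n elements of `size` bytes; NULL on overflow. */
static void *sketch_calloc2(size_t n, size_t size)
{
	assert(size != 0); /* a zero element size is a caller bug in this sketch */
	size_t bytes = checked_mul(n, size);

	if (bytes == SIZE_MAX)
		return NULL; /* refuse rather than allocate a wrapped size */
	return calloc(1, bytes);
}
```

      An open-coded a * b that wraps around would instead yield a small
      allocation that later code overruns; the saturated value makes the
      allocation fail cleanly.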
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
      6396bb22
  20. 30 Mar, 2018 2 commits
  21. 08 Feb, 2018 1 commit
    • N
      powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove · 1d9a0907
      Nathan Fontenot committed
      When DLPAR removing a CPU, the unmapping of the CPU from a node in
      unmap_cpu_from_node() should also invalidate the CPU's entry in the
      numa_cpu_lookup_table. There is no guarantee that on a subsequent
      DLPAR add of the CPU the associativity will be the same, so the CPU
      could end up in a different node. Invalidating the entry in the
      numa_cpu_lookup_table causes the associativity to be read from the
      device tree at the time of the add.
      
      The current behavior of not invalidating the CPU's entry in the
      numa_cpu_lookup_table can result in scenarios where the topology
      layout of CPUs in the partition does not match the device tree
      or the topology reported by the HMC.
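
      The effect of a stale entry can be illustrated with a small
      user-space model. This is only a sketch under stated assumptions:
      cpu_to_node_cache plays the role of numa_cpu_lookup_table,
      device_tree_node is a stand-in for the associativity read from the
      device tree, and all sketch_* names are hypothetical.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 8
#define SKETCH_NO_NODE (-1)

/* Cached CPU-to-node mapping (models the lookup table). */
static int cpu_to_node_cache[SKETCH_NR_CPUS];
/* Stand-in for the node the device tree currently reports per CPU. */
static int device_tree_node[SKETCH_NR_CPUS];

static void sketch_init(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		cpu_to_node_cache[cpu] = SKETCH_NO_NODE;
		device_tree_node[cpu] = 0;
	}
}

/* Return the cached node, consulting the device tree only on a miss. */
static int sketch_find_cpu_node(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (cpu_to_node_cache[cpu] == SKETCH_NO_NODE)
		cpu_to_node_cache[cpu] = device_tree_node[cpu];
	return cpu_to_node_cache[cpu];
}

/* On CPU remove, drop the cached entry so the next add re-reads it. */
static void sketch_unmap_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	cpu_to_node_cache[cpu] = SKETCH_NO_NODE;
}
```

      If the device tree changes a CPU's node between remove and add, a
      lookup without the invalidation keeps returning the old node; after
      sketch_unmap_cpu() the next lookup picks up the new one.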
      
      This bug looks like it was introduced in 2004 in the commit titled
      "ppc64: cpu hotplug notifier for numa", which is 6b15e4e87e32 in the
      linux-fullhist tree. Hence tag it for all stable releases.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1d9a0907
  22. 27 Jan, 2018 3 commits
    • M
      powerpc/pseries: Fix cpu hotplug crash with memoryless nodes · e67e02a5
      Michael Bringmann committed
      On powerpc systems with shared configurations of CPUs and memory and
      memoryless nodes at boot, an event ordering problem was observed on a
      SLES12 build platform with the hot-add of CPUs to the memoryless
      nodes.
      
      * The most common error occurred when the memory SLAB driver attempted
        to reference the memoryless node to which a CPU was being added
        before the kernel had finished initializing all of the data
        structures for the CPU and exited 'device_online' under
        DLPAR/hot-add.
      
        Normally the memoryless node would be initialized through the call
        path device_online ... arch_update_cpu_topology ... find_cpu_nid ...
        try_online_node. This patch ensures that the powerpc node will be
        initialized as early as possible, even if it was memoryless and
        CPU-less at the point when we are trying to hot-add a new CPU to it.
      Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
      Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e67e02a5
    • M
      powerpc/numa: Ensure nodes initialized for hotplug · ea05ba7c
      Michael Bringmann committed
      This patch fixes some problems encountered at runtime with
      configurations that support memory-less nodes, or that hot-add CPUs
      into nodes that are memoryless during system execution after boot. The
      problems of interest include:
      
      * Nodes known to powerpc to be memoryless at boot, but to have CPUs in
        them are allowed to be 'possible' and 'online'. Memory allocations
        for those nodes are taken from another node that does have memory
        until and if memory is hot-added to the node.
      
      * Nodes which have no resources assigned at boot, but which may still
        be referenced subsequently by affinity or associativity attributes,
        are kept in the list of 'possible' nodes for powerpc. Hot-add of
        memory or CPUs to the system can reference these nodes and bring
        them online instead of redirecting the references to one of the set
        of nodes known to have memory at boot.
      
      Note that this software operates under the context of CPU hotplug. We
      are not doing memory hotplug in this code, but rather updating the
      kernel's CPU topology (i.e. arch_update_cpu_topology /
      numa_update_cpu_topology). We are initializing a node that may be used
      by CPUs or memory before it can be referenced as invalid by a CPU
      hotplug operation. CPU hotplug operations are protected by a range of
      APIs including cpu_maps_update_begin/cpu_maps_update_done,
      cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
      Memory hotplug operations, including try_online_node, are protected by
      mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the
      case of CPUs being hot-added to a previously memoryless node, the
      try_online_node operation occurs wholly within the CPU locks with no
      overlap. Using HMC hot-add/hot-remove operations, we have been able to
      add and remove CPUs to any possible node without failures. HMC
      operations involve a degree of self-serialization, though.
      Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
      Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ea05ba7c
    • M
      powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes · a346137e
      Michael Bringmann committed
      On powerpc systems which allow 'hot-add' of CPU or memory resources,
      it may occur that the new resources are to be inserted into nodes that
      were not used for these resources at bootup. In the kernel, any node
      that is used must be defined and initialized. These empty nodes may
      occur in the following situations:
      
      * Dedicated vs. shared resources. Shared resources require information
        such as the VPHN hcall for CPU assignment to nodes. Associativity
        decisions made based on dedicated resource rules, such as
        associativity properties in the device tree, may vary from decisions
        made using the values returned by the VPHN hcall.
      
      * memoryless nodes at boot. Nodes need to be defined as 'possible' at
        boot for operation with other code modules. Previously, the powerpc
        code would limit the set of possible nodes to those which have
        memory assigned at boot, and were thus online. Subsequent add/remove
        of CPUs or memory would only work with this subset of possible
        nodes.
      
      * memoryless nodes with CPUs at boot. Due to the previous restriction
        on nodes, nodes that had CPUs but no memory were being collapsed
        into other nodes that did have memory at boot. In practice this
        meant that the node assignment presented by the runtime kernel
        differed from the affinity and associativity attributes presented by
        the device tree or VPHN hcalls. Nodes that might be known to the
        pHyp were not 'possible' in the runtime kernel because they did not
        have memory at boot.
      
      This patch ensures that sufficient nodes are defined to support
      configuration requirements after boot, as well as at boot. This patch
      set fixes a couple of problems.
      
      * Nodes known to powerpc to be memoryless at boot, but to have CPUs in
        them are allowed to be 'possible' and 'online'. Memory allocations
        for those nodes are taken from another node that does have memory
        until and if memory is hot-added to the node.
      
      * Nodes which have no resources assigned at boot, but which may still
        be referenced subsequently by affinity or associativity attributes,
        are kept in the list of 'possible' nodes for powerpc. Hot-add of
        memory or CPUs to the system can reference these nodes and bring
        them online instead of redirecting to one of the set of nodes that
        were known to have memory at boot.
      
      This patch extracts the value of the lowest domain level (number of
      allocable resources) from the device tree property
      "ibm,max-associativity-domains" to use as the maximum number of nodes
      to set up as possibly available in the system. This new setting will
      override the instruction:
      
          nodes_and(node_possible_map, node_possible_map, node_online_map);
      
      presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
      
      If the "ibm,max-associativity-domains" property is not present at
      boot, no operation will be performed to define or enable additional
      nodes, or enable the above 'nodes_and()'.
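
      As a rough model of the behavior described above, the sketch below
      contrasts the old masking of possible nodes with the new one. It is
      one reading of this change log rather than the kernel code:
      node_mask_t, first_n_nodes(), sketch_old_possible_nodes() and
      sketch_possible_nodes() are hypothetical names, and a 64-bit word
      stands in for a nodemask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a nodemask: one bit per node, at most 64. */
typedef uint64_t node_mask_t;

/* Mask with the lowest n bits set, i.e. nodes 0..n-1. */
static node_mask_t first_n_nodes(unsigned int n)
{
	assert(n <= 64);
	return n == 64 ? UINT64_MAX : (((node_mask_t)1 << n) - 1);
}

/* Old behavior: only nodes that are online at boot stay possible. */
static node_mask_t sketch_old_possible_nodes(node_mask_t possible,
					     node_mask_t online)
{
	return possible & online;
}

/*
 * New behavior as described in the text: when the property is present,
 * nodes 0..max_nodes-1 become possible even if they are empty at boot;
 * when it is absent, the possible map is left untouched.
 */
static node_mask_t sketch_possible_nodes(node_mask_t possible,
					 bool has_max_domains,
					 unsigned int max_nodes)
{
	if (!has_max_domains)
		return possible;
	return possible | first_n_nodes(max_nodes);
}
```

      For a boot with only nodes 0 and 1 populated and a property value of
      4, the new path keeps nodes 2 and 3 possible so that a later hot-add
      can bring them online.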
      Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
      Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a346137e
  23. 16 Jan, 2018 1 commit
    • N
      powerpc/numa: Update numa code use walk_drmem_lmbs · 514a9cb3
      Nathan Fontenot committed
      Update code in powerpc/numa.c to use the walk_drmem_lmbs()
      routine instead of parsing the device tree directly. This is
      in anticipation of introducing a new ibm,dynamic-memory-v2
      property with a different format. This will allow the numa code
      to use a single initialization routine per-LMB regardless of
      the device tree format.
      
      Additionally, to support additional routines in numa.c that need
      to look up LMB information, a late_init routine is added to drmem.c
      to allocate the array of LMB information. This LMB array will provide
      per-LMB information to separate the LMB data from the device tree
      format.
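
      The benefit of a walker is that callers supply one per-LMB callback
      and never touch the on-disk property layout. The user-space sketch
      below illustrates that pattern only; struct sketch_lmb,
      sketch_walk_lmbs(), SKETCH_LMB_ASSIGNED and sketch_count_assigned()
      are hypothetical and do not reproduce the signature of the kernel's
      walk_drmem_lmbs().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES    4
#define SKETCH_LMB_ASSIGNED 0x8u /* hypothetical "LMB is in use" flag */

/* Format-independent view of one LMB (assumed fields). */
struct sketch_lmb {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t flags;
	int nid;
};

typedef void (*sketch_lmb_fn)(const struct sketch_lmb *lmb, void *data);

/* Call `fn` once per LMB; callers never see the device tree encoding. */
static void sketch_walk_lmbs(const struct sketch_lmb *lmbs, size_t count,
			     sketch_lmb_fn fn, void *data)
{
	assert(fn != NULL);
	for (size_t i = 0; i < count; i++)
		fn(&lmbs[i], data);
}

/* Example callback: count assigned LMBs per node into an int array. */
static void sketch_count_assigned(const struct sketch_lmb *lmb, void *data)
{
	int *per_node = data;

	if (!(lmb->flags & SKETCH_LMB_ASSIGNED))
		return;
	if (lmb->nid >= 0 && lmb->nid < SKETCH_MAX_NODES)
		per_node[lmb->nid]++;
}
```

      A second property format would then only need a different way of
      filling the array of LMBs; callbacks such as the counter above stay
      unchanged.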
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      514a9cb3