1. 13 2月, 2019 3 次提交
  2. 26 1月, 2019 1 次提交
    • B
      powerpc/xmon: Fix invocation inside lock region · 115a0d66
      Breno Leitao 提交于
      [ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ]
      
      Currently xmon needs to get devtree_lock (through rtas_token()) during its
      invocation (at crash time). If there is a crash while devtree_lock is being
      held, then xmon tries to get the lock but spins forever and never get into
      the interactive debugger, as in the following case:
      
      	int *ptr = NULL;
      	raw_spin_lock_irqsave(&devtree_lock, flags);
      	*ptr = 0xdeadbeef;
      
      This patch avoids calling rtas_token(), thus trying to get the same lock,
      at crash time. This new mechanism proposes getting the token at
      initialization time (xmon_init()) and just consuming it at crash time.
      
      This would allow xmon to be possible invoked independent of devtree_lock
      being held or not.
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Reviewed-by: NThiago Jung Bauermann <bauerman@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      115a0d66
  3. 13 1月, 2019 10 次提交
    • B
      powerpc/tm: Set MSR[TS] just prior to recheckpoint · a854ab8f
      Breno Leitao 提交于
      commit e1c3743e1a20647c53b719dbf28b48f45d23f2cd upstream.
      
      On a signal handler return, the user could set a context with MSR[TS] bits
      set, and these bits would be copied to task regs->msr.
      
      At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
      several __get_user() are called and then a recheckpoint is executed.
      
      This is a problem since a page fault (in kernel space) could happen when
      calling __get_user(). If it happens, the process MSR[TS] bits were
      already set, but recheckpoint was not executed, and SPRs are still invalid.
      
      The page fault can cause the current process to be de-scheduled, with
      MSR[TS] active and without tm_recheckpoint() being called.  More
      importantly, without TEXASR[FS] bit set also.
      
      Since TEXASR might not have the FS bit set, and when the process is
      scheduled back, it will try to reclaim, which will be aborted because of
      the CPU is not in the suspended state, and, then, recheckpoint. This
      recheckpoint will restore thread->texasr into TEXASR SPR, which might be
      zero, hitting a BUG_ON().
      
      	kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
      	cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0]
      	    pc: c000000000054550: restore_gprs+0xb0/0x180
      	    lr: 0000000000000000
      	    sp: c00000041f157950
      	   msr: 8000000100021033
      	  current = 0xc00000041f143000
      	  paca    = 0xc00000000fb86300	 softe: 0	 irq_happened: 0x01
      	    pid   = 1021, comm = kworker/11:1
      	kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
      	Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
      	enter ? for help
      	[c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0
      	[c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0
      	[c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990
      	[c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0
      	[c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610
      	[c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140
      	[c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c
      
      This patch simply delays the MSR[TS] set, so, if there is any page fault in
      the __get_user() section, it does not have regs->msr[TS] set, since the TM
      structures are still invalid, thus avoiding doing TM operations for
      in-kernel exceptions and possible process reschedule.
      
      With this patch, the MSR[TS] will only be set just before recheckpointing
      and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
      invalid state.
      
      Other than that, if CONFIG_PREEMPT is set, there might be a preemption just
      after setting MSR[TS] and before tm_recheckpoint(), thus, this block must
      be atomic from a preemption perspective, thus, calling
      preempt_disable/enable() on this code.
      
      It is not possible to move tm_recheckpoint to happen earlier, because it is
      required to get the checkpointed registers from userspace, with
      __get_user(), thus, the only way to avoid this undesired behavior is
      delaying the MSR[TS] set.
      
      The 32-bits signal handler seems to be safe this current issue, but, it
      might be exposed to the preemption issue, thus, disabling preemption in
      this chunk of code.
      
      Changes from v2:
       * Run the critical section with preempt_disable.
      
      Fixes: 87b4e539 ("powerpc/tm: Fix return of active 64bit signals")
      Cc: stable@vger.kernel.org (v3.9+)
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a854ab8f
    • G
      Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing" · 6d9e96f3
      Greg Kroah-Hartman 提交于
      This reverts commit a9935a12 which is
      commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
      
      It breaks the powerpc build, so drop it from the tree until a fix goes
      upstream.
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Cc: Breno Leitao <leitao@debian.org>
      Cc: Michal Suchánek <msuchanek@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d9e96f3
    • J
      powerpc/boot: Set target when cross-compiling for clang · 7dfb22b5
      Joel Stanley 提交于
      commit 813af51f5d30a2da6a2523c08465f9726e51772e upstream.
      
      Clang needs to be told which target it is building for when cross
      compiling.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/259Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Tested-by: Daniel Axtens <dja@axtens.net> # powerpc 64-bit BE
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7dfb22b5
    • J
      powerpc: Disable -Wbuiltin-requires-header when setjmp is used · 28e1d143
      Joel Stanley 提交于
      commit aea447141c7e7824b81b49acd1bc785506fba46e upstream.
      
      The powerpc kernel uses setjmp which causes a warning when building
      with clang:
      
        In file included from arch/powerpc/xmon/xmon.c:51:
        ./arch/powerpc/include/asm/setjmp.h:15:13: error: declaration of
        built-in function 'setjmp' requires inclusion of the header <setjmp.h>
              [-Werror,-Wbuiltin-requires-header]
        extern long setjmp(long *);
                    ^
        ./arch/powerpc/include/asm/setjmp.h:16:13: error: declaration of
        built-in function 'longjmp' requires inclusion of the header <setjmp.h>
              [-Werror,-Wbuiltin-requires-header]
        extern void longjmp(long *, long);
                    ^
      
      This *is* the header and we're not using the built-in setjump but
      rather the one in arch/powerpc/kernel/misc.S. As the compiler warning
      does not make sense, it for the files where setjmp is used.
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      [mpe: Move subdir-ccflags in xmon/Makefile to not clobber -Werror]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28e1d143
    • N
      powerpc: avoid -mno-sched-epilog on GCC 4.9 and newer · ba1fe90b
      Nicholas Piggin 提交于
      commit 6977f95e63b9b3fb4a5973481a800dd9f48a1338 upstream.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba1fe90b
    • N
      powerpc: consolidate -mno-sched-epilog into FTRACE flags · 24e26062
      Nicholas Piggin 提交于
      commit 2a056f58fd33ccc6a0261b552b0f17e7fa4a12f3 upstream.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24e26062
    • N
      powerpc: remove old GCC version checks · 5b59eeba
      Nicholas Piggin 提交于
      commit f2910f0e6835339e6ce82cef22fa15718b7e3bfa upstream.
      
      GCC 4.6 is the minimum supported now.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      [nc: Applied to minimize unnecessary conflicts]
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b59eeba
    • O
      powerpc/mm: Fallback to RAM if the altmap is unusable · 54568ab2
      Oliver O'Halloran 提交于
      [ Upstream commit 9ef34630a4614ee1cd478f9859ebea55d55f10ec ]
      
      The "altmap" is used to provide a pool of memory that is reserved for
      the vmemmap backing of hot-plugged memory. This is useful when adding
      large amount of ZONE_DEVICE memory to a system with a limited amount of
      normal memory.
      
      On ppc64 we use huge pages to map the vmemmap which requires the backing
      storage to be contigious and aligned to the hugepage size. The altmap
      implementation allows for the altmap provider to reserve a few PFNs at
      the start of the range for it's own uses and when this occurs the
      first chunk of the altmap is not usable for hugepage mappings. On hash
      there is no sane way to fall back to a normal sized page mapping so we
      fail the allocation. This results in memory hotplug failing with
      ENOMEM when the new range doesn't fall into an existing vmemmap block.
      
      This patch handles this case by falling back to using system memory
      rather than failing if we cannot allocate from the altmap. This
      fallback should only ever be used for the first vmemmap block so it
      should not cause excess memory consumption.
      
      Fixes: 7b73d978 ("mm: pass the vmem_altmap to vmemmap_populate")
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      54568ab2
    • M
      powerpc/mm: Fix linux page tables build with some configs · d4fdc5e8
      Michael Ellerman 提交于
      [ Upstream commit 462951cd32e1496dc64b00051dfb777efc8ae5d8 ]
      
      For some configs the build fails with:
      
        arch/powerpc/mm/dump_linuxpagetables.c: In function 'populate_markers':
        arch/powerpc/mm/dump_linuxpagetables.c:306:39: error: 'PKMAP_BASE' undeclared (first use in this function)
        arch/powerpc/mm/dump_linuxpagetables.c:314:50: error: 'LAST_PKMAP' undeclared (first use in this function)
      
      These come from highmem.h, including that fixes the build.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d4fdc5e8
    • P
      powerpc: Fix COFF zImage booting on old powermacs · 4642a5c5
      Paul Mackerras 提交于
      [ Upstream commit 5564597d51c8ff5b88d95c76255e18b13b760879 ]
      
      Commit 6975a783 ("powerpc/boot: Allow building the zImage wrapper
      as a relocatable ET_DYN", 2011-04-12) changed the procedure descriptor
      at the start of crt0.S to have a hard-coded start address of 0x500000
      rather than a reference to _zimage_start, presumably because having
      a reference to a symbol introduced a relocation which is awkward to
      handle in a position-independent executable.  Unfortunately, what is
      at 0x500000 in the COFF image is not the first instruction, but the
      procedure descriptor itself, that is, a word containing 0x500000,
      which is not a valid instruction.  Hence, booting a COFF zImage
      results in a "DEFAULT CATCH!, code=FFF00700" message from Open
      Firmware.
      
      This fixes the problem by (a) putting the procedure descriptor in the
      data section and (b) adding a branch to _zimage_start as the first
      instruction in the program.
      
      Fixes: 6975a783 ("powerpc/boot: Allow building the zImage wrapper as a relocatable ET_DYN")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4642a5c5
  4. 10 1月, 2019 2 次提交
    • B
      powerpc/tm: Unset MSR[TS] if not recheckpointing · a9935a12
      Breno Leitao 提交于
      commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
      
      There is a TM Bad Thing bug that can be caused when you return from a
      signal context in a suspended transaction but with ucontext MSR[TS] unset.
      
      This forces regs->msr[TS] to be set at syscall entrance (since the CPU
      state is transactional). It also calls treclaim() to flush the transaction
      state, which is done based on the live (mfmsr) MSR state.
      
      Since user context MSR[TS] is not set, then restore_tm_sigcontexts() is not
      called, thus, not executing recheckpoint, keeping the CPU state as not
      transactional. When calling rfid, SRR1 will have MSR[TS] set, but the CPU
      state is non transactional, causing the TM Bad Thing with the following
      stack:
      
      	[   33.862316] Bad kernel stack pointer 3fffd9dce3e0 at c00000000000c47c
      	cpu 0x8: Vector: 700 (Program Check) at [c00000003ff7fd40]
      	    pc: c00000000000c47c: fast_exception_return+0xac/0xb4
      	    lr: 00003fff865f442c
      	    sp: 3fffd9dce3e0
      	   msr: 8000000102a03031
      	  current = 0xc00000041f68b700
      	  paca    = 0xc00000000fb84800   softe: 0        irq_happened: 0x01
      	    pid   = 1721, comm = tm-signal-sigre
      	Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
      	WARNING: exception is not recoverable, can't continue
      
      The same problem happens on 32-bits signal handler, and the fix is very
      similar, if tm_recheckpoint() is not executed, then regs->msr[TS] should be
      zeroed.
      
      This patch also fixes a sparse warning related to lack of indentation when
      CONFIG_PPC_TRANSACTIONAL_MEM is set.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      CC: Stable <stable@vger.kernel.org>	# 3.10+
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Tested-by: NMichal Suchánek <msuchanek@suse.de>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9935a12
    • D
      powerpc/fsl: Fix spectre_v2 mitigations reporting · 34fc0919
      Diana Craciun 提交于
      commit 7d8bad99ba5a22892f0cad6881289fdc3875a930 upstream.
      
      Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect:
      
        $ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
        "Mitigation: Software count cache flush"
      
      Which is wrong. Fix it to report vulnerable for now.
      
      Fixes: ee13cb24 ("powerpc/64s: Add support for software count cache flush")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34fc0919
  5. 20 12月, 2018 2 次提交
    • B
      powerpc: Look for "stdout-path" when setting up legacy consoles · c97c353e
      Benjamin Herrenschmidt 提交于
      commit bf3d6afbb234156749b640b6c50f714967a85964 upstream.
      
      Commit 78e5dfea ("powerpc: dts: replace 'linux,stdout-path' with
      'stdout-path'") broke the default console on a number of embedded
      PowerPC systems, because it failed to also update the code in
      arch/powerpc/kernel/legacy_serial.c to look for that property in
      addition to the old one.
      
      This fixes it.
      
      Fixes: 78e5dfea ("powerpc: dts: replace 'linux,stdout-path' with 'stdout-path'")
      Cc: stable@vger.kernel.org # v4.17+
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: NRob Herring <robh@kernel.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c97c353e
    • R
      powerpc/msi: Fix NULL pointer access in teardown code · 73732de1
      Radu Rendec 提交于
      commit 78e7b15e17ac175e7eed9e21c6f92d03d3b0a6fa upstream.
      
      The arch_teardown_msi_irqs() function assumes that controller ops
      pointers were already checked in arch_setup_msi_irqs(), but this
      assumption is wrong: arch_teardown_msi_irqs() can be called even when
      arch_setup_msi_irqs() returns an error (-ENOSYS).
      
      This can happen in the following scenario:
        - msi_capability_init() calls pci_msi_setup_msi_irqs()
        - pci_msi_setup_msi_irqs() returns -ENOSYS
        - msi_capability_init() notices the error and calls free_msi_irqs()
        - free_msi_irqs() calls pci_msi_teardown_msi_irqs()
      
      This is easier to see when CONFIG_PCI_MSI_IRQ_DOMAIN is not set and
      pci_msi_setup_msi_irqs() and pci_msi_teardown_msi_irqs() are just
      aliases to arch_setup_msi_irqs() and arch_teardown_msi_irqs().
      
      The call to free_msi_irqs() upon pci_msi_setup_msi_irqs() failure
      seems legit, as it does additional cleanup; e.g.
      list_del(&entry->list) and kfree(entry) inside free_msi_irqs() do
      happen (MSI descriptors are allocated before pci_msi_setup_msi_irqs()
      is called and need to be cleaned up if that fails).
      
      Fixes: 6b2fd7ef ("PCI/MSI/PPC: Remove arch_msi_check_device()")
      Cc: stable@vger.kernel.org # v3.18+
      Signed-off-by: NRadu Rendec <radu.rendec@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73732de1
  6. 06 12月, 2018 1 次提交
  7. 01 12月, 2018 3 次提交
    • S
      powerpc/numa: Suppress "VPHN is not supported" messages · f5c632cf
      Satheesh Rajendran 提交于
      [ Upstream commit 437ccdc8ce629470babdda1a7086e2f477048cbd ]
      
      When VPHN function is not supported and during cpu hotplug event,
      kernel prints message 'VPHN function not supported. Disabling
      polling...'. Currently it prints on every hotplug event, it floods
      dmesg when a KVM guest tries to hotplug huge number of vcpus, let's
      just print once and suppress further kernel prints.
      Signed-off-by: NSatheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f5c632cf
    • M
      powerpc/io: Fix the IO workarounds code to work with Radix · b7718632
      Michael Ellerman 提交于
      [ Upstream commit 43c6494fa1499912c8177e71450c0279041152a6 ]
      
      Back in 2006 Ben added some workarounds for a misbehaviour in the
      Spider IO bridge used on early Cell machines, see commit
      014da7ff ("[POWERPC] Cell "Spider" MMIO workarounds"). Later these
      were made to be generic, ie. not tied specifically to Spider.
      
      The code stashes a token in the high bits (59-48) of virtual addresses
      used for IO (eg. returned from ioremap()). This works fine when using
      the Hash MMU, but when we're using the Radix MMU the bits used for the
      token overlap with some of the bits of the virtual address.
      
      This is because the maximum virtual address is larger with Radix, up
      to c00fffffffffffff, and in fact we use that high part of the address
      range for ioremap(), see RADIX_KERN_IO_START.
      
      As it happens the bits that are used overlap with the bits that
      differentiate an IO address vs a linear map address. If the resulting
      address lies outside the linear mapping we will crash (see below), if
      not we just corrupt memory.
      
        virtio-pci 0000:00:00.0: Using 64-bit direct DMA at offset 800000000000000
        Unable to handle kernel paging request for data at address 0xc000000080000014
        ...
        CFAR: c000000000626b98 DAR: c000000080000014 DSISR: 42000000 IRQMASK: 0
        GPR00: c0000000006c54fc c00000003e523378 c0000000016de600 0000000000000000
        GPR04: c00c000080000014 0000000000000007 0fffffff000affff 0000000000000030
               ^^^^
        ...
        NIP [c000000000626c5c] .iowrite8+0xec/0x100
        LR [c0000000006c992c] .vp_reset+0x2c/0x90
        Call Trace:
          .pci_bus_read_config_dword+0xc4/0x120 (unreliable)
          .register_virtio_device+0x13c/0x1c0
          .virtio_pci_probe+0x148/0x1f0
          .local_pci_probe+0x68/0x140
          .pci_device_probe+0x164/0x220
          .really_probe+0x274/0x3b0
          .driver_probe_device+0x80/0x170
          .__driver_attach+0x14c/0x150
          .bus_for_each_dev+0xb8/0x130
          .driver_attach+0x34/0x50
          .bus_add_driver+0x178/0x2f0
          .driver_register+0x90/0x1a0
          .__pci_register_driver+0x6c/0x90
          .virtio_pci_driver_init+0x2c/0x40
          .do_one_initcall+0x64/0x280
          .kernel_init_freeable+0x36c/0x474
          .kernel_init+0x24/0x160
          .ret_from_kernel_thread+0x58/0x7c
      
      This hasn't been a problem because CONFIG_PPC_IO_WORKAROUNDS which
      enables this code is usually not enabled. It is only enabled when it's
      selected by PPC_CELL_NATIVE which is only selected by
      PPC_IBM_CELL_BLADE and that in turn depends on BIG_ENDIAN. So in order
      to hit the bug you need to build a big endian kernel, with IBM Cell
      Blade support enabled, as well as Radix MMU support, and then boot
      that on Power9 using Radix MMU.
      
      Still we can fix the bug, so let's do that. We simply use fewer bits
      for the token, taking the union of the restrictions on the address
      from both Hash and Radix, we end up with 8 bits we can use for the
      token. The only user of the token is iowa_mem_find_bus() which only
      supports 8 token values, so 8 bits is plenty for that.
      
      Fixes: 566ca99a ("powerpc/mm/radix: Add dummy radix_enabled()")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b7718632
    • S
      KVM: PPC: Move and undef TRACE_INCLUDE_PATH/FILE · 8d976d7a
      Scott Wood 提交于
      [ Upstream commit 28c5bcf74fa07c25d5bd118d1271920f51ce2a98 ]
      
      TRACE_INCLUDE_PATH and TRACE_INCLUDE_FILE are used by
      <trace/define_trace.h>, so like that #include, they should
      be outside #ifdef protection.
      
      They also need to be #undefed before defining, in case multiple trace
      headers are included by the same C file.  This became the case on
      book3e after commit cf4a6085151a ("powerpc/mm: Add missing tracepoint for
      tlbie"), leading to the following build error:
      
         CC      arch/powerpc/kvm/powerpc.o
      In file included from arch/powerpc/kvm/powerpc.c:51:0:
      arch/powerpc/kvm/trace.h:9:0: error: "TRACE_INCLUDE_PATH" redefined
      [-Werror]
        #define TRACE_INCLUDE_PATH .
        ^
      In file included from arch/powerpc/kvm/../mm/mmu_decl.h:25:0,
                        from arch/powerpc/kvm/powerpc.c:48:
      ./arch/powerpc/include/asm/trace.h:224:0: note: this is the location of
      the previous definition
        #define TRACE_INCLUDE_PATH asm
        ^
      cc1: all warnings being treated as errors
      Reported-by: NChristian Zigotzky <chzigotzky@xenosoft.de>
      Signed-off-by: NScott Wood <oss@buserror.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8d976d7a
  8. 21 11月, 2018 11 次提交
    • C
      Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP" · ce65f0f6
      Christophe Leroy 提交于
      commit cc4ebf5c0a3440ed0a32d25c55ebdb6ce5f3c0bc upstream.
      
      This reverts commit 4f94b2c7.
      
      That commit was buggy, as it used rlwinm instead of rlwimi.
      Instead of fixing that bug, we revert the previous commit in order to
      reduce the dependency between L1 entries and L2 entries
      
      Fixes: 4f94b2c7 ("powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP")
      Cc: stable@vger.kernel.org
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce65f0f6
    • R
      powerpc/memtrace: Remove memory in chunks · 75837895
      Rashmica Gupta 提交于
      [ Upstream commit 3f7daf3d7582dc6628ac40a9045dd1bbd80c5f35 ]
      
      When hot-removing memory release_mem_region_adjustable() splits iomem
      resources if they are not the exact size of the memory being
      hot-deleted. Adding this memory back to the kernel adds a new resource.
      
      Eg a node has memory 0x0 - 0xfffffffff. Hot-removing 1GB from
      0xf40000000 results in the single resource 0x0-0xfffffffff being split
      into two resources: 0x0-0xf3fffffff and 0xf80000000-0xfffffffff.
      
      When we hot-add the memory back we now have three resources:
      0x0-0xf3fffffff, 0xf40000000-0xf7fffffff, and 0xf80000000-0xfffffffff.
      
      This is an issue if we try to remove some memory that overlaps
      resources. Eg when trying to remove 2GB at address 0xf40000000,
      release_mem_region_adjustable() fails as it expects the chunk of memory
      to be within the boundaries of a single resource. We then get the
      warning: "Unable to release resource" and attempting to use memtrace
      again gives us this error: "bash: echo: write error: Resource
      temporarily unavailable"
      
      This patch makes memtrace remove memory in chunks that are always the
      same size from an address that is always equal to end_of_memory -
      n*size, for some n. So hotremoving and hotadding memory of different
      sizes will now not attempt to remove memory that spans multiple
      resources.
      Signed-off-by: NRashmica Gupta <rashmica.g@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75837895
    • J
      powerpc/boot: Ensure _zimage_start is a weak symbol · ce62cb99
      Joel Stanley 提交于
      [ Upstream commit ee9d21b3b3583712029a0db65a4b7c081d08d3b3 ]
      
      When building with clang crt0's _zimage_start is not marked weak, which
      breaks the build when linking the kernel image:
      
       $ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
       0000000000000058 g       .text  0000000000000000 _zimage_start
      
       ld: arch/powerpc/boot/wrapper.a(crt0.o): in function '_zimage_start':
       (.text+0x58): multiple definition of '_zimage_start';
       arch/powerpc/boot/pseries-head.o:(.text+0x0): first defined here
      
      Clang requires the .weak directive to appear after the symbol is
      declared. The binutils manual says:
      
       This directive sets the weak attribute on the comma separated list of
       symbol names. If the symbols do not already exist, they will be
       created.
      
      So it appears this is different with clang. The only reference I could
      see for this was an OpenBSD mailing list post[1].
      
      Changing it to be after the declaration fixes building with Clang, and
      still works with GCC.
      
       $ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
       0000000000000058  w      .text	0000000000000000 _zimage_start
      
      Reported to clang as https://bugs.llvm.org/show_bug.cgi?id=38921
      
      [1] https://groups.google.com/forum/#!topic/fa.openbsd.tech/PAgKKen2YCYSigned-off-by: NJoel Stanley <joel@jms.id.au>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce62cb99
    • C
      powerpc/mm: Don't report hugepage tables as memory leaks when using kmemleak · 5b248698
      Christophe Leroy 提交于
      [ Upstream commit 803d690e68f0c5230183f1a42c7d50a41d16e380 ]
      
      When a process allocates a hugepage, the following leak is
      reported by kmemleak. This is a false positive which is
      due to the pointer to the table being stored in the PGD
      as physical memory address and not virtual memory pointer.
      
      unreferenced object 0xc30f8200 (size 512):
        comm "mmap", pid 374, jiffies 4872494 (age 627.630s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<e32b68da>] huge_pte_alloc+0xdc/0x1f8
          [<9e0df1e1>] hugetlb_fault+0x560/0x8f8
          [<7938ec6c>] follow_hugetlb_page+0x14c/0x44c
          [<afbdb405>] __get_user_pages+0x1c4/0x3dc
          [<b8fd7cd9>] __mm_populate+0xac/0x140
          [<3215421e>] vm_mmap_pgoff+0xb4/0xb8
          [<c148db69>] ksys_mmap_pgoff+0xcc/0x1fc
          [<4fcd760f>] ret_from_syscall+0x0/0x38
      
      See commit a984506c ("powerpc/mm: Don't report PUDs as
      memory leaks when using kmemleak") for detailed explanation.
      
      To fix that, this patch tells kmemleak to ignore the allocated
      hugepage table.
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b248698
    • D
      powerpc/nohash: fix undefined behaviour when testing page size support · b0f85991
      Daniel Axtens 提交于
      [ Upstream commit f5e284803a7206d43e26f9ffcae5de9626d95e37 ]
      
      When enumerating page size definitions to check hardware support,
      we construct a constant which is (1U << (def->shift - 10)).
      
      However, the array of page size definitions is only initalised for
      various MMU_PAGE_* constants, so it contains a number of 0-initialised
      elements with def->shift == 0. This means we end up shifting by a
      very large number, which gives the following UBSan splat:
      
      ================================================================================
      UBSAN: Undefined behaviour in /home/dja/dev/linux/linux/arch/powerpc/mm/tlb_nohash.c:506:21
      shift exponent 4294967286 is too large for 32-bit type 'unsigned int'
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-00045-ga604f927b012-dirty #6
      Call Trace:
      [c00000000101bc20] [c000000000a13d54] .dump_stack+0xa8/0xec (unreliable)
      [c00000000101bcb0] [c0000000004f20a8] .ubsan_epilogue+0x18/0x64
      [c00000000101bd30] [c0000000004f2b10] .__ubsan_handle_shift_out_of_bounds+0x110/0x1a4
      [c00000000101be20] [c000000000d21760] .early_init_mmu+0x1b4/0x5a0
      [c00000000101bf10] [c000000000d1ba28] .early_setup+0x100/0x130
      [c00000000101bf90] [c000000000000528] start_here_multiplatform+0x68/0x80
      ================================================================================
      
      Fix this by first checking if the element exists (shift != 0) before
      constructing the constant.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0f85991
    • S
      powerpc/eeh: Fix possible null deref in eeh_dump_dev_log() · 234bee8b
      Sam Bobroff 提交于
      [ Upstream commit f9bc28aedfb5bbd572d2d365f3095c1becd7209b ]
      
      If an error occurs during an unplug operation, it's possible for
      eeh_dump_dev_log() to be called when edev->pdn is null, which
      currently leads to dereferencing a null pointer.
      
      Handle this by skipping the error log for those devices.
      Signed-off-by: NSam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      234bee8b
    • J
      powerpc/Makefile: Fix PPC_BOOK3S_64 ASFLAGS · 32f2674c
      Joel Stanley 提交于
      [ Upstream commit 960e30029863db95ec79a71009272d4661db5991 ]
      
      Ever since commit 15a3204d ("powerpc/64s: Set assembler machine type
      to POWER4") we force -mpower4 to be passed to the assembler
      irrespective of the CFLAGS used (for Book3s 64).
      
      When building a powerpc64 kernel with clang, clang will not add -many
      to the assembler flags, so any instructions that the compiler has
      generated that are not available on power4 will cause an error:
      
        /usr/bin/as -a64 -mppc64 -mlittle-endian -mpower8 \
         -I ./arch/powerpc/include -I ./arch/powerpc/include/generated \
         -I ./include -I ./arch/powerpc/include/uapi \
         -I ./arch/powerpc/include/generated/uapi -I ./include/uapi \
         -I ./include/generated/uapi -I arch/powerpc -I arch/powerpc \
         -maltivec -mpower4 -o init/do_mounts.o /tmp/do_mounts-3b0a3d.s
        /tmp/do_mounts-51ce54.s:748: Error: unrecognized opcode: `isel'
      
      GCC does include -many, so the GCC driven gas call will succeed:
      
        as -v -I ./arch/powerpc/include -I ./arch/powerpc/include/generated -I
        ./include -I ./arch/powerpc/include/uapi
        -I ./arch/powerpc/include/generated/uapi -I ./include/uapi
        -I ./include/generated/uapi -I arch/powerpc -I arch/powerpc
         -a64 -mpower8 -many -mlittle -maltivec -mpower4 -o init/do_mounts.o
      
      Note that isel is power7 and above for IBM CPUs. GCC only generates it
      for Power9 and above, but the above test was run against the clang
      generated assembly.
      
      Peter Bergner explains:
      
        When using -many -mpower4, gas will first try and find a matching
        power4 mnemonic and failing that, it will then allow any valid
        mnemonic that gas knows about. GCC's use of -many predates me
        though.
      
        IIRC, Alan looked at trying to remove it, but I forget why he
        didn't. Could be either a gcc or gas issue at the time. I'm not sure
        whether issue still exists or not. He and I have modified how gas
        works internally a fair amount since he tried removing gcc use of
        -many.
      
        I will also note that when using -many, gas will choose the first
        mnemonic that matches in the mnemonic table and we have (mostly)
        sorted the table so that server mnemonics show up earlier in the
        table than other mnemonics, so they'll be seen/chosen first.
      
      By explicitly setting -many we can build with Clang and GCC while
      retaining the -mpower4 option.
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32f2674c
    • C
      powerpc/mm: fix always true/false warning in slice.c · 56e4367f
      Christophe Leroy 提交于
      [ Upstream commit 37e9c674e7e6f445e12cb1151017bd4bacdd1e2d ]
      
      This patch fixes the following warnings (obtained with make W=1).
      
      arch/powerpc/mm/slice.c: In function 'slice_range_to_mask':
      arch/powerpc/mm/slice.c:73:12: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (start < SLICE_LOW_TOP) {
                  ^
      arch/powerpc/mm/slice.c:81:20: error: comparison is always false due to limited range of data type [-Werror=type-limits]
        if ((start + len) > SLICE_LOW_TOP) {
                          ^
      arch/powerpc/mm/slice.c: In function 'slice_mask_for_free':
      arch/powerpc/mm/slice.c:136:17: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (high_limit <= SLICE_LOW_TOP)
                       ^
      arch/powerpc/mm/slice.c: In function 'slice_check_range_fits':
      arch/powerpc/mm/slice.c:185:12: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (start < SLICE_LOW_TOP) {
                  ^
      arch/powerpc/mm/slice.c:195:39: error: comparison is always false due to limited range of data type [-Werror=type-limits]
        if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) {
                                             ^
      arch/powerpc/mm/slice.c: In function 'slice_scan_available':
      arch/powerpc/mm/slice.c:306:11: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (addr < SLICE_LOW_TOP) {
                 ^
      arch/powerpc/mm/slice.c: In function 'get_slice_psize':
      arch/powerpc/mm/slice.c:709:11: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (addr < SLICE_LOW_TOP) {
                 ^
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      56e4367f
    • M
      powerpc/mm: Fix page table dump to work on Radix · 287deb22
      Michael Ellerman 提交于
      [ Upstream commit 0d923962ab69c27cca664a2d535e90ef655110ca ]
      
      When we're running on Book3S with the Radix MMU enabled the page table
      dump currently prints the wrong addresses because it uses the wrong
      start address.
      
      Fix it to use PAGE_OFFSET rather than KERN_VIRT_START.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      287deb22
    • N
      powerpc/64/module: REL32 relocation range check · 63880b2f
      Nicholas Piggin 提交于
      [ Upstream commit b851ba02a6f3075f0f99c60c4bc30a4af80cf428 ]
      
      The recent module relocation overflow crash demonstrated that we
      have no range checking on REL32 relative relocations. This patch
      implements a basic check, the same kernel that previously oopsed
      and rebooted now continues with some of these errors when loading
      the module:
      
        module_64: x_tables: REL32 527703503449812 out of range!
      
      Possibly other relocations (ADDR32, REL16, TOC16, etc.) should also have
      overflow checks.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63880b2f
    • C
      powerpc/traps: restore recoverability of machine_check interrupts · 54de3d7d
      Christophe Leroy 提交于
      [ Upstream commit daf00ae71dad8aa05965713c62558aeebf2df48e ]
      
      commit b96672dd ("powerpc: Machine check interrupt is a non-
      maskable interrupt") added a call to nmi_enter() at the beginning of
      machine check restart exception handler. Due to that, in_interrupt()
      always returns true regardless of the state before entering the
      exception, and die() panics even when the system was not already in
      interrupt.
      
      This patch calls nmi_exit() before calling die() in order to restore
      the interrupt state we had before calling nmi_enter()
      
      Fixes: b96672dd ("powerpc: Machine check interrupt is a non-maskable interrupt")
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54de3d7d
  9. 14 11月, 2018 4 次提交
  10. 10 10月, 2018 1 次提交
    • J
      mm: Preserve _PAGE_DEVMAP across mprotect() calls · 4628a645
      Jan Kara 提交于
      Currently _PAGE_DEVMAP bit is not preserved in mprotect(2) calls. As a
      result we will see warnings such as:
      
      BUG: Bad page map in process JobWrk0013  pte:800001803875ea25 pmd:7624381067
      addr:00007f0930720000 vm_flags:280000f9 anon_vma:          (null) mapping:ffff97f2384056f0 index:0
      file:457-000000fe00000030-00000009-000000ca-00000001_2001.fileblock fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage:          (null)
      CPU: 3 PID: 15848 Comm: JobWrk0013 Tainted: G        W          4.12.14-2.g7573215-default #1 SLE12-SP4 (unreleased)
      Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
      Call Trace:
       dump_stack+0x5a/0x75
       print_bad_pte+0x217/0x2c0
       ? enqueue_task_fair+0x76/0x9f0
       _vm_normal_page+0xe5/0x100
       zap_pte_range+0x148/0x740
       unmap_page_range+0x39a/0x4b0
       unmap_vmas+0x42/0x90
       unmap_region+0x99/0xf0
       ? vma_gap_callbacks_rotate+0x1a/0x20
       do_munmap+0x255/0x3a0
       vm_munmap+0x54/0x80
       SyS_munmap+0x1d/0x30
       do_syscall_64+0x74/0x150
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      ...
      
      when mprotect(2) gets used on DAX mappings. Also there is a wide variety
      of other failures that can result from the missing _PAGE_DEVMAP flag
      when the area gets used by get_user_pages() later.
      
      Fix the problem by including _PAGE_DEVMAP in a set of flags that get
      preserved by mprotect(2).
      
      Fixes: 69660fd7 ("x86, mm: introduce _PAGE_DEVMAP")
      Fixes: ebd31197 ("powerpc/mm: Add devmap support for ppc64")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      4628a645
  11. 05 10月, 2018 2 次提交
    • S
      powerpc/numa: Skip onlining a offline node in kdump path · ac1788cc
      Srikar Dronamraju 提交于
      With commit 2ea62630 ("powerpc/topology: Get topology for shared
      processors at boot"), kdump kernel on shared LPAR may crash.
      
      The necessary conditions are
      - Shared LPAR with at least 2 nodes having memory and CPUs.
      - Memory requirement for kdump kernel must be met by the first N-1
        nodes where there are at least N nodes with memory and CPUs.
      
      Example numactl of such a machine.
        $ numactl -H
        available: 5 nodes (0,2,5-7)
        node 0 cpus:
        node 0 size: 0 MB
        node 0 free: 0 MB
        node 2 cpus:
        node 2 size: 255 MB
        node 2 free: 189 MB
        node 5 cpus: 24 25 26 27 28 29 30 31
        node 5 size: 4095 MB
        node 5 free: 4024 MB
        node 6 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
        node 6 size: 6353 MB
        node 6 free: 5998 MB
        node 7 cpus: 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39
        node 7 size: 7640 MB
        node 7 free: 7164 MB
        node distances:
        node   0   2   5   6   7
          0:  10  40  40  40  40
          2:  40  10  40  40  40
          5:  40  40  10  40  40
          6:  40  40  40  10  20
          7:  40  40  40  20  10
      
      Steps to reproduce.
      1. Load / start kdump service.
      2. Trigger a kdump (for example : echo c > /proc/sysrq-trigger)
      
      When booting a kdump kernel with 2048M:
      
        kexec: Starting switchover sequence.
        I'm in purgatory
        Using 1TB segments
        hash-mmu: Initializing hash mmu with SLB
        Linux version 4.19.0-rc5-master+ (srikar@linux-xxu6) (gcc version 4.8.5 (SUSE Linux)) #1 SMP Thu Sep 27 19:45:00 IST 2018
        Found initrd at 0xc000000009e70000:0xc00000000ae554b4
        Using pSeries machine description
        -----------------------------------------------------
        ppc64_pft_size    = 0x1e
        phys_mem_size     = 0x88000000
        dcache_bsize      = 0x80
        icache_bsize      = 0x80
        cpu_features      = 0x000000ff8f5d91a7
          possible        = 0x0000fbffcf5fb1a7
          always          = 0x0000006f8b5c91a1
        cpu_user_features = 0xdc0065c2 0xef000000
        mmu_features      = 0x7c006001
        firmware_features = 0x00000007c45bfc57
        htab_hash_mask    = 0x7fffff
        physical_start    = 0x8000000
        -----------------------------------------------------
        numa:   NODE_DATA [mem 0x87d5e300-0x87d67fff]
        numa:     NODE_DATA(0) on node 6
        numa:   NODE_DATA [mem 0x87d54600-0x87d5e2ff]
        Top of RAM: 0x88000000, Total RAM: 0x88000000
        Memory hole size: 0MB
        Zone ranges:
          DMA      [mem 0x0000000000000000-0x0000000087ffffff]
          DMA32    empty
          Normal   empty
        Movable zone start for each node
        Early memory node ranges
          node   6: [mem 0x0000000000000000-0x0000000087ffffff]
        Could not find start_pfn for node 0
        Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
        On node 0 totalpages: 0
        Initmem setup node 6 [mem 0x0000000000000000-0x0000000087ffffff]
        On node 6 totalpages: 34816
      
        Unable to handle kernel paging request for data at address 0x00000060
        Faulting instruction address: 0xc000000008703a54
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 11 PID: 1 Comm: swapper/11 Not tainted 4.19.0-rc5-master+ #1
        NIP:  c000000008703a54 LR: c000000008703a38 CTR: 0000000000000000
        REGS: c00000000b673440 TRAP: 0380   Not tainted  (4.19.0-rc5-master+)
        MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24022022  XER: 20000002
        CFAR: c0000000086fc238 IRQMASK: 0
        GPR00: c000000008703a38 c00000000b6736c0 c000000009281900 0000000000000000
        GPR04: 0000000000000000 0000000000000000 fffffffffffff001 c00000000b660080
        GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000220
        GPR12: 0000000000002200 c000000009e51400 0000000000000000 0000000000000008
        GPR16: 0000000000000000 c000000008c152e8 c000000008c152a8 0000000000000000
        GPR20: c000000009422fd8 c000000009412fd8 c000000009426040 0000000000000008
        GPR24: 0000000000000000 0000000000000000 c000000009168bc8 c000000009168c78
        GPR28: c00000000b126410 0000000000000000 c00000000916a0b8 c00000000b126400
        NIP [c000000008703a54] bus_add_device+0x84/0x1e0
        LR [c000000008703a38] bus_add_device+0x68/0x1e0
        Call Trace:
        [c00000000b6736c0] [c000000008703a38] bus_add_device+0x68/0x1e0 (unreliable)
        [c00000000b673740] [c000000008700194] device_add+0x454/0x7c0
        [c00000000b673800] [c00000000872e660] __register_one_node+0xb0/0x240
        [c00000000b673860] [c00000000839a6bc] __try_online_node+0x12c/0x180
        [c00000000b673900] [c00000000839b978] try_online_node+0x58/0x90
        [c00000000b673930] [c0000000080846d8] find_and_online_cpu_nid+0x158/0x190
        [c00000000b673a10] [c0000000080848a0] numa_update_cpu_topology+0x190/0x580
        [c00000000b673c00] [c000000008d3f2e4] smp_cpus_done+0x94/0x108
        [c00000000b673c70] [c000000008d5c00c] smp_init+0x174/0x19c
        [c00000000b673d00] [c000000008d346b8] kernel_init_freeable+0x1e0/0x450
        [c00000000b673dc0] [c0000000080102e8] kernel_init+0x28/0x160
        [c00000000b673e30] [c00000000800b65c] ret_from_kernel_thread+0x5c/0x80
        Instruction dump:
        60000000 60000000 e89e0020 7fe3fb78 4bff87d5 60000000 7c7d1b79 4082008c
        e8bf0050 e93e0098 3b9f0010 2fa50000 <e8690060> 38630018 419e0114 7f84e378
        ---[ end trace 593577668c2daa65 ]---
      
      However a regular kernel with 4096M (2048 gets reserved for crash
      kernel) boots properly.
      
      Unlike regular kernels, which mark all available nodes as online,
      kdump kernel only marks just enough nodes as online and marks the rest
      as offline at boot. However kdump kernel boots with all available
      CPUs. With Commit 2ea62630 ("powerpc/topology: Get topology for
      shared processors at boot"), all CPUs are onlined on their respective
      nodes at boot time. try_online_node() tries to online the offline
      nodes but fails as all needed subsystems are not yet initialized.
      
      As part of fix, detect and skip early onlining of a offline node.
      
      Fixes: 2ea62630 ("powerpc/topology: Get topology for shared processors at boot")
      Reported-by: NPavithra Prakash <pavrampu@in.ibm.com>
      Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Tested-by: NHari Bathini <hbathini@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ac1788cc
    • M
      powerpc: Don't print kernel instructions in show_user_instructions() · a932ed3b
      Michael Ellerman 提交于
      Recently we implemented show_user_instructions() which dumps the code
      around the NIP when a user space process dies with an unhandled
      signal. This was modelled on the x86 code, and we even went so far as
      to implement the exact same bug, namely that if the user process
      crashed with its NIP pointing into the kernel we will dump kernel text
      to dmesg. eg:
      
        bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1
        bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6
        bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048
      
      This was discovered on x86 by Jann Horn and fixed in commit
      342db04a ("x86/dumpstack: Don't dump kernel memory based on usermode RIP").
      
      Fix it by checking the adjusted NIP value (pc) and number of
      instructions against USER_DS, and bail if we fail the check, eg:
      
        bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1
        bad-bctr[2969]: Bad NIP, not dumping instructions.
      
      Fixes: 88b0fe17 ("powerpc: Add show_user_instructions()")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a932ed3b