1. 19 Oct, 2017 3 commits
  2. 13 Oct, 2017 1 commit
    • A
      powerpc/perf: Fix IMC initialization crash · 0d8ba162
      Anju T Sudhakar committed
      Panic observed with latest firmware, and upstream kernel:
      
       NIP init_imc_pmu+0x8c/0xcf0
       LR  init_imc_pmu+0x2f8/0xcf0
       Call Trace:
         init_imc_pmu+0x2c8/0xcf0 (unreliable)
         opal_imc_counters_probe+0x300/0x400
         platform_drv_probe+0x64/0x110
         driver_probe_device+0x3d8/0x580
         __driver_attach+0x14c/0x1a0
         bus_for_each_dev+0x8c/0xf0
         driver_attach+0x34/0x50
         bus_add_driver+0x298/0x350
         driver_register+0x9c/0x180
         __platform_driver_register+0x5c/0x70
         opal_imc_driver_init+0x2c/0x40
         do_one_initcall+0x64/0x1d0
         kernel_init_freeable+0x280/0x374
         kernel_init+0x24/0x160
         ret_from_kernel_thread+0x5c/0x74
      
      While registering the nest IMC at init, the cpu-hotplug callback
      nest_pmu_cpumask_init() makes an OPAL call to stop the engine. If
      the OPAL call fails, imc_common_cpuhp_mem_free() is invoked to clean
      up memory and the cpuhotplug setup.

      But when cleaning up the attribute group, we dereference the
      attribute element array without first checking that the backing
      element is non-NULL. This causes the kernel panic.
      
      Add a check for the backing element prior to dereferencing the
      attribute element, to handle the failing case gracefully.
      Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      [mpe: Trim change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
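The guard described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `struct sk_attr_group`, `struct sk_pmu` and `sk_pmu_free_event_group()` are hypothetical stand-ins, and `free()` stands in for the kernel allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's attribute group structures. */
struct sk_attr_group {
    void **attrs;
};

struct sk_pmu {
    /* May still be NULL if initialization failed before it was allocated. */
    struct sk_attr_group *event_group;
};

/*
 * Free the event attribute group.
 * Returns 1 if a group was freed, 0 if there was nothing to free.
 */
int sk_pmu_free_event_group(struct sk_pmu *pmu)
{
    /* Check the backing element before dereferencing it. */
    if (!pmu->event_group)
        return 0;

    free(pmu->event_group->attrs);
    free(pmu->event_group);
    pmu->event_group = NULL;
    return 1;
}
```

With the check in place, calling the cleanup on a partially initialized PMU is a no-op rather than a NULL dereference.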
  3. 12 Oct, 2017 2 commits
    • A
      powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node() · cd4f2b30
      Anju T Sudhakar committed
      Stack trace output during a stress test:
      [    4.310049] Freeing initrd memory: 22592K
      [    4.310646] rtas_flash: no firmware flash support
      [    4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
      [    4.313465] cpuhp/64 cpuset=/ mems_allowed=0
      [    4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
      [    4.313588] Call Trace:
      [    4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
      [    4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
      [    4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
      [    4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
      [    4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
      [    4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
      [    4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
      [    4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
      [    4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
      [    4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
      [    4.314313] Mem-Info:
      [    4.314356] active_anon:0 inactive_anon:0 isolated_anon:0
      
      At system boot, core_imc_mem_init() uses alloc_pages_node() to get
      memory, and alloc_pages_node() emits this stack dump when it tries to
      allocate memory from a node which has no memory behind it. Add the
      ___GFP_NOWARN flag to the allocation request as a fix.
      Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Reported-by: Venkat R.B <venkatb3@in.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • A
      powerpc/perf: Fix for core/nest imc call trace on cpuhotplug · 0d923820
      Anju T Sudhakar committed
      Nest/core PMU units are enabled only when they are used. A reference count
      is maintained for the events which use the nest/core PMU units. Currently
      the *_imc_counters_release functions use a WARN() to report any underflow
      of the ref count.

      The event ref count goes negative when a perf session is started and all
      the CPUs in a given core are then offlined. In the cpuhotplug offline
      path, ppc_core_imc_cpu_offline() sets ref->count to zero if the CPU
      about to go offline is the last CPU in its core, and makes an OPAL call
      to disable the engine in that core. On perf session termination,
      perf->destroy (core_imc_counters_release) first decrements ref->count for
      this core and, based on the resulting value, makes an OPAL call to
      disable the core-imc engine. Since the cpuhotplug path has already
      cleared ref->count for the core and disabled the engine, the second
      decrement in perf->destroy() at event termination makes the count
      negative, which in turn fires the WARN_ON. The same happens for nest
      units.

      Add a check to see if the reference count is already zero before
      decrementing it, so that the ref count will not hit a negative value.
      Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Reviewed-by: Santosh Sivaraj <santosh@fossix.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
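The interaction between the hotplug path and the release path can be modelled in a few lines. The sketch below is a simplified userspace model: `struct sk_imc_ref`, `sk_core_offline()` and `sk_event_release()` are hypothetical names, and locking and the OPAL calls are omitted (the engine is reduced to a boolean).

```c
#include <stdbool.h>

/* Hypothetical per-core reference counter, modelled on the description above. */
struct sk_imc_ref {
    int count;
    bool engine_on;
};

/* Hotplug offline path: the last CPU in the core goes away, so clear the count. */
void sk_core_offline(struct sk_imc_ref *ref)
{
    ref->count = 0;
    ref->engine_on = false;
}

/* Event release path: skip the decrement if hotplug already cleared the count. */
void sk_event_release(struct sk_imc_ref *ref)
{
    if (ref->count == 0)
        return;

    /* Disable the engine only when the last user goes away. */
    if (--ref->count == 0)
        ref->engine_on = false;
}
```

Without the early return, a release that follows `sk_core_offline()` would take the count to -1, which is the underflow the commit message describes.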
  4. 10 Oct, 2017 3 commits
    • T
      powerpc: Don't call lockdep_assert_cpus_held() from arch_update_cpu_topology() · 6b2c08f9
      Thiago Jung Bauermann committed
      It turns out that not all paths calling arch_update_cpu_topology() hold
      cpu_hotplug_lock, but that's OK because those paths can't race with
      any concurrent hotplug events.
      
      Warnings were reported with the following trace:
      
        lockdep_assert_cpus_held
        arch_update_cpu_topology
        sched_init_domains
        sched_init_smp
        kernel_init_freeable
        kernel_init
        ret_from_kernel_thread
      
      Which is safe because it's called early in boot when hotplug is not
      live yet.
      
      And also this trace:
      
        lockdep_assert_cpus_held
        arch_update_cpu_topology
        partition_sched_domains
        cpuset_update_active_cpus
        sched_cpu_deactivate
        cpuhp_invoke_callback
        cpuhp_down_callbacks
        cpuhp_thread_fun
        smpboot_thread_fn
        kthread
        ret_from_kernel_thread
      
      Which is safe because it's called as part of CPU hotplug, so although
      we don't hold the CPU hotplug lock, there is another thread driving
      the CPU hotplug operation which does hold the lock, and there is no
      race.
      
      Thanks to tglx for deciphering it for us.
      
      Fixes: 3e401f7a ("powerpc: Only obtain cpu_hotplug_lock if called by rtasd")
      Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • S
      powerpc/lib/sstep: Fix count leading zeros instructions · b0490a04
      Sandipan Das committed
      According to the GCC documentation, the behaviour of __builtin_clz()
      and __builtin_clzl() is undefined if the value of the input argument
      is zero. Without handling this special case, these builtins have been
      used for emulating the following instructions:
        * Count Leading Zeros Word (cntlzw[.])
        * Count Leading Zeros Doubleword (cntlzd[.])
      
      This fixes the emulated behaviour of these instructions by adding an
      additional check for this special case.
      
      Fixes: 3cdfcbfd ("powerpc: Change analyse_instr so it doesn't modify *regs")
      Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
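The special case is easy to show in isolation. The following sketch assumes a GCC-compatible compiler; `sk_cntlzw()` and `sk_cntlzd()` are hypothetical helper names, and `__builtin_clzll()` is used for the 64-bit case so the example does not depend on the width of `long`.

```c
#include <stdint.h>

/*
 * Count leading zeros of a 32-bit value.
 * __builtin_clz(0) is undefined, so zero is handled explicitly:
 * a zero word has 32 leading zero bits.
 */
unsigned int sk_cntlzw(uint32_t val)
{
    return val ? (unsigned int)__builtin_clz(val) : 32u;
}

/* Same for a 64-bit doubleword: a zero input yields 64. */
unsigned int sk_cntlzd(uint64_t val)
{
    return val ? (unsigned int)__builtin_clzll(val) : 64u;
}
```

For example, `sk_cntlzw(1)` yields 31 and `sk_cntlzw(0)` yields 32, whereas calling the builtin directly on 0 would give an unspecified result.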
    • K
      powerpc/livepatch: Fix livepatch stack access · e36a82ee
      Kamalesh Babulal committed
      While running a stress test with a livepatch module loaded, a kernel
      bug was triggered.
      
        cpu 0x5: Vector: 400 (Instruction Access) at [c0000000eb9d3b60]
        5:mon> t
        [c0000000eb9d3de0] c0000000eb9d3e30 (unreliable)
        [c0000000eb9d3e30] c000000000008ab4 hardware_interrupt_common+0x114/0x120
         --- Exception: 501 (Hardware Interrupt) at c000000000053040 livepatch_handler+0x4c/0x74
        [c0000000eb9d4120] 0000000057ac6e9d (unreliable)
        [d0000000089d9f78] 2e0965747962382e
        SP (965747962342e09) is in userspace
      
      When an interrupt occurs during the livepatch_handler execution, it's
      possible for the livepatch_stack and/or thread_info to be corrupted.
      eg:
      
        Task A                        Interrupt Handler
        =========                     =================
        livepatch_handler:
        mr r0, r1
        ld r1, TI_livepatch_sp(r12)
                                      hardware_interrupt_common:
                                        do_IRQ+0x8:
                                          mflr    r0          <- saved stack pointer is overwritten
                                          bl      _mcount
                                          ...
                                          std     r27,-40(r1) <- overwrite of thread_info()
      
        lis r2, STACK_END_MAGIC@h
        ori r2, r2, STACK_END_MAGIC@l
        ld  r12, -8(r1)
      
      Fix the corruption by using r11 register for livepatch stack
      manipulation, instead of shuffling task stack and livepatch stack into
      r1 register. Using r11 register also avoids disabling/enabling irq's
      while setting up the livepatch stack.
      Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 06 Oct, 2017 3 commits
    • G
      powerpc/tm: Fix illegal TM state in signal handler · 044215d1
      Gustavo Romero committed
      Currently it's possible that on returning from the signal handler
      through the restore_tm_sigcontexts() code path (e.g. from a signal
      caught due to a `trap` instruction executed in the middle of an HTM
      block, or a deliberately constructed sigframe) an illegal TM state
      (like TS=10 TM=0, i.e. "T0") is set in SRR1. When `rfid` then
      implicitly sets the MSR from SRR1 on return to userspace, it causes
      a TM Bad Thing exception.
      
      That illegal state can be set (a) by a malicious user that disables
      the TM bit by tweaking the bits in uc_mcontext before returning from
      the signal handler or (b) by a sufficient number of context switches
      occurring such that the load_tm counter overflows and TM is disabled
      whilst in the signal handler.
      
      This commit fixes the illegal TM state by ensuring that TM bit is
      always enabled before we return from restore_tm_sigcontexts(). A small
      comment correction is made as well.
      
      Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
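The shape of the fix can be sketched with a toy model of the MSR. Everything here is an assumption made for illustration: the bit positions, the `SK_` constants and both helper functions are hypothetical, and the legality rule (any nonzero TS with TM clear is illegal) is generalized from the single "T0" example in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define SK_MSR_TM      (1ULL << 32)   /* TM facility enabled */
#define SK_MSR_TS_MASK (3ULL << 33)   /* two-bit transaction state field */

/* In this model, a nonzero transaction state with TM disabled is illegal. */
bool sk_tm_state_legal(uint64_t msr)
{
    return (msr & SK_MSR_TS_MASK) == 0 || (msr & SK_MSR_TM) != 0;
}

/*
 * Model of the restore path: whatever the sigframe says,
 * force the TM bit on before the value is used for the return.
 */
uint64_t sk_restore_msr(uint64_t msr_from_sigframe)
{
    return msr_from_sigframe | SK_MSR_TM;
}
```

A sigframe value with TS=10 and TM=0 fails `sk_tm_state_legal()`, but passes after going through `sk_restore_msr()`.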
    • C
      powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks · 265e60a1
      Cyril Bur committed
      When using transactional memory (TM), the CPU can be in one of six
      states as far as TM is concerned, encoded in the Machine State
      Register (MSR). Certain state transitions are illegal and if attempted
      trigger a "TM Bad Thing" type program check exception.
      
      If we ever hit one of these exceptions it's treated as a bug, ie. we
      oops, and kill the process and/or panic, depending on configuration.
      
      One case where we can trigger a TM Bad Thing, is when returning to
      userspace after a system call or interrupt, using RFID. When this
      happens the CPU first restores the user register state, in particular
      r1 (the stack pointer) and then attempts to update the MSR. However
      the MSR update is not allowed and so we take the program check with
      the user register state, but the kernel MSR.
      
      This tricks the exception entry code into thinking we have a bad
      kernel stack pointer, because the MSR says we're coming from the
      kernel, but r1 is pointing to userspace.
      
      To avoid this we instead always switch to the emergency stack if we
      take a TM Bad Thing from the kernel. That way none of the user
      register values are used, other than for printing in the oops message.
      
      This is the fix for CVE-2017-1000255.
      
      Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      [mpe: Rewrite change log & comments, tweak asm slightly]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • A
      powerpc/powernv: Increase memory block size to 1GB on radix · 53ecde0b
      Anton Blanchard committed
      Memory hot unplug on PowerNV radix hosts is broken. Our memory block
      size is 256MB but since we map the linear region with very large
      pages, each pte we tear down maps 1GB.
      
      A hot unplug of one 256MB memory block results in 768MB of memory
      getting unintentionally unmapped. At this point we are likely to oops.
      
      Fix this by increasing our memory block size to 1GB on PowerNV radix
      hosts.
      
      Fixes: 4b5d62ca ("powerpc/mm: add radix__remove_section_mapping()")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
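The 768MB figure follows from simple arithmetic, worked through below. The helper name `sk_collateral_unmapped()` is hypothetical, and the model assumes the block being removed starts on a mapping boundary.

```c
#include <stdint.h>

#define SK_MB (1024ULL * 1024ULL)

/*
 * Bytes unmapped beyond the block being removed, when every page table
 * entry that is torn down maps map_size bytes.
 */
uint64_t sk_collateral_unmapped(uint64_t block_size, uint64_t map_size)
{
    /* Round the block size up to a whole number of mappings. */
    uint64_t torn_down = ((block_size + map_size - 1) / map_size) * map_size;

    return torn_down - block_size;
}
```

With a 256MB block and 1GB mappings the result is 1024MB - 256MB = 768MB. With the block size raised to 1GB the result is 0, which is why matching the block size to the mapping size avoids the problem.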
  6. 04 Oct, 2017 7 commits
    • G
      powerpc/mm: Call flush_tlb_kernel_range with interrupts enabled · 7c6a4f3b
      Guenter Roeck committed
      flush_tlb_kernel_range() may call smp_call_function_many() which expects
      interrupts to be enabled. This results in a traceback.
      
      WARNING: CPU: 0 PID: 1 at kernel/smp.c:416 smp_call_function_many+0xcc/0x2fc
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc1-00009-g0666f560 #1
      task: cf830000 task.stack: cf82e000
      NIP:  c00a93c8 LR: c00a9634 CTR: 00000001
      REGS: cf82fde0 TRAP: 0700   Not tainted  (4.14.0-rc1-00009-g0666f560)
      MSR:  00021000 <CE,ME>  CR: 24000082  XER: 00000000
      
      GPR00: c00a9634 cf82fe90 cf830000 c050ad3c c0015a54 00000000 00000001 00000001
      GPR08: 00000001 00000000 00000000 cf82e000 24000084 00000000 c0003150 00000000
      GPR16: 00000000 00000000 00000000 00000000 00000000 00000001 00000000 c0510000
      GPR24: 00000000 c0015a54 00000000 c050ad3c c051823c c050ad3c 00000025 00000000
      NIP [c00a93c8] smp_call_function_many+0xcc/0x2fc
      LR [c00a9634] smp_call_function+0x3c/0x50
      Call Trace:
      [cf82fe90] [00000010] 0x10 (unreliable)
      [cf82fed0] [c00a9634] smp_call_function+0x3c/0x50
      [cf82fee0] [c0015d2c] flush_tlb_kernel_range+0x20/0x38
      [cf82fef0] [c001524c] mark_initmem_nx+0x154/0x16c
      [cf82ff20] [c001484c] free_initmem+0x20/0x4c
      [cf82ff30] [c000316c] kernel_init+0x1c/0x108
      [cf82ff40] [c000f3a8] ret_from_kernel_thread+0x5c/0x64
      Instruction dump:
      7c0803a6 7d808120 38210040 4e800020 3d20c052 812981a0 2f890000 40beffac
      3d20c051 8929ac64 2f890000 40beff9c <0fe00000> 4bffff94 7fc3f378 7f64db78
      
      Fixes: 3184cc4b ("powerpc/mm: Fix kernel RAM protection after freeing ...")
      Fixes: e611939f ("powerpc/mm: Ensure change_page_attr() doesn't ...")
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • C
      powerpc/xive: Clear XIVE internal structures when a CPU is removed · cc569398
      Cédric Le Goater committed
      Commit eac1e731 ("powerpc/xive: guest exploitation of the XIVE
      interrupt controller") introduced support for the XIVE exploitation
      mode of the P9 interrupt controller on the pseries platform.
      
      At that time, support for CPU removal was not complete on PowerVM and
      CPU hot unplug remained untested. It appears that some cleanups of the
      XIVE internal structures are required before releasing the CPU,
      without which the kernel crashes in a RTAS call doing the CPU
      isolation.
      
      These changes fix the crash by deconfiguring the IPI interrupt source
      and clearing the event queues of the CPU when it is removed.
      
      Fixes: eac1e731 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • C
      powerpc/xive: Fix IPI reset · 74f12821
      Cédric Le Goater committed
      When resetting an IPI, hw_ipi should also be set to zero.
      
      Fixes: eac1e731 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • T
      powerpc/watchdog: Make use of watchdog_nmi_probe() · 34ddaa3e
      Thomas Gleixner committed
      The rework of the core hotplug code triggers the WARN_ON in
      start_wd_on_cpu() on powerpc because it is called multiple times for
      the boot CPU.
      
      The first call is via:
      
        start_wd_on_cpu+0x80/0x2f0
        watchdog_nmi_reconfigure+0x124/0x170
        softlockup_reconfigure_threads+0x110/0x130
        lockup_detector_init+0xbc/0xe0
        kernel_init_freeable+0x18c/0x37c
        kernel_init+0x2c/0x160
        ret_from_kernel_thread+0x5c/0xbc
      
      And then again via the CPU hotplug registration:
      
        start_wd_on_cpu+0x80/0x2f0
        cpuhp_invoke_callback+0x194/0x620
        cpuhp_thread_fun+0x7c/0x1b0
        smpboot_thread_fn+0x290/0x2a0
        kthread+0x168/0x1b0
        ret_from_kernel_thread+0x5c/0xbc
      
      This can be avoided by setting up the cpu hotplug state with nocalls and
      moving the initialization to the watchdog_nmi_probe() function. That
      initializes the hotplug callbacks without invoking the callback and the
      following core initialization function then configures the watchdog for the
      online CPUs (in this case CPU0) via softlockup_reconfigure_threads().
      Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
    • T
      watchdog/core, powerpc: Lock cpus across reconfiguration · e31d6883
      Thomas Gleixner committed
      Instead of dropping the cpu hotplug lock after stopping NMI watchdog and
      threads and reacquiring for restart, the code and the protection rules
      become more obvious when holding cpu hotplug lock across the full
      reconfiguration.
      Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710022105570.2114@nanos
    • T
      watchdog/core, powerpc: Replace watchdog_nmi_reconfigure() · 6b9dc480
      Thomas Gleixner committed
      The recent cleanup of the watchdog code split watchdog_nmi_reconfigure()
      into two stages. One to stop the NMI and one to restart it after
      reconfiguration. That was done by adding a boolean 'run' argument to the
      code, which is functionally correct but not necessarily a piece of art.
      
      Replace it by two explicit functions: watchdog_nmi_stop() and
      watchdog_nmi_start().
      
      Fixes: 6592ad2f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage")
      Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org>
      Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
    • I
      rapidio: remove global irq spinlocks from the subsystem · 31d1e130
      Ioan Nicu committed
      Locking of config and doorbell operations should be done only if the
      underlying hardware requires it.
      
      This patch removes the global spinlocks from the rapidio subsystem and
      moves them to the mport drivers (fsl_rio and tsi721), only to the
      necessary places.  For example, local config space read and write
      operations (lcread/lcwrite) are atomic in all existing drivers, so there
      should be no need for locking, while the cread/cwrite operations which
      generate maintenance transactions need to be synchronized with a lock.
      
      Later, each driver could choose to use a per-port lock instead of a
      global one, or even more granular locking.
      
      Link: http://lkml.kernel.org/r/20170824113023.GD50104@nokia.com
      Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com>
      Signed-off-by: Frank Kunz <frank.kunz@nokia.com>
      Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 03 Oct, 2017 3 commits
    • S
      KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive() · 2fb1e946
      Sam Bobroff committed
      In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the
      value of state->guest_server as "server". However, this value is not
      set by its counterpart kvmppc_xive_set_xive(). When the guest uses
      this interface to migrate interrupts away from a CPU that is going
      offline, it sees all interrupts as belonging to CPU 0, so they are
      left assigned to (now) offline CPUs.
      
      This patch removes the guest_server field from the state, and returns
      act_server in its place (that is, the CPU actually handling the
      interrupt, which may differ from the one requested).
      
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • C
      powerpc/4xx: Fix compile error with 64K pages on 40x, 44x · 070e0049
      Christian Lamparter committed
      The mmu context on the 40x, 44x does not define a pte_frag entry. This
      causes gcc to abort the compilation due to:
      
        setup-common.c: In function ‘setup_arch’:
        setup-common.c:908: error: ‘mm_context_t’ has no ‘pte_frag’
      
      This patch fixes the issue by removing the pte_frag initialization in
      setup-common.c.
      
      This is possible, because the compiler will do the initialization,
      since the mm_context is a sub struct of init_mm. init_mm is declared
      in mm_types.h with external linkage.
      
      According to C99 6.2.4.3:
        An object whose identifier is declared with external linkage
        [...] has static storage duration.
      
      C99 defines in 6.7.8.10 that:
        If an object that has static storage duration is not
        initialized explicitly, then:
        - if it has pointer type, it is initialized to a null pointer
      
      Fixes: b1923caa ("powerpc: Merge 32-bit and 64-bit setup_arch()")
      Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
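The C99 rule quoted above can be checked with a small standalone example. The types and the object `sk_init_mm` are hypothetical miniatures, not the kernel definitions. The object is given a partial initializer, which is an assumption about how such an object is typically defined; C99 6.7.8 paragraph 21 says members left out of an initializer list are initialized as if they had static storage duration.

```c
#include <stddef.h>

/* Hypothetical miniature of a context structure embedded in a larger object. */
struct sk_context {
    void *pte_frag;
    unsigned long id;
};

struct sk_mm {
    int users;
    struct sk_context context;
};

/*
 * File-scope object with external linkage, hence static storage duration.
 * Only 'users' is initialized explicitly; 'context' is not mentioned.
 */
struct sk_mm sk_init_mm = { .users = 1 };

/* Read back the pointer member that was never initialized explicitly. */
void *sk_init_pte_frag(void)
{
    return sk_init_mm.context.pte_frag;
}
```

Here `sk_init_pte_frag()` returns a null pointer without any explicit assignment, which is why the removed initialization was redundant.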
    • J
      powerpc: Fix action argument for cpufeatures-based TLB flush · 3b7af5c0
      Jeremy Kerr committed
      Commit 41d0c2ec ("powerpc/powernv: Fix local TLB flush for boot
      and MCE on POWER9") introduced calls to __flush_tlb_power[89] from the
      cpufeatures code, specifying the number of sets to flush.
      
      However, these functions take an action argument, not a number of
      sets. This means we hit the BUG() in __flush_tlb_{206,300} when using
      cpufeatures-style configuration.
      
      This change passes TLB_INVAL_SCOPE_GLOBAL instead.
      
      Fixes: 41d0c2ec ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9")
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
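The mismatch between an action argument and a set count can be illustrated with a small dispatcher. This is a sketch only: the `SK_` constants, their values and `sk_flush_tlb()` are hypothetical, and a `-1` return stands in for the kernel hitting BUG().

```c
/* Hypothetical action values, for illustration only. */
#define SK_TLB_INVAL_SCOPE_GLOBAL 0u
#define SK_TLB_INVAL_SCOPE_LPID   1u

/*
 * Flush according to 'action'.
 * Returns 0 for a recognized action and -1 for anything else,
 * where the kernel functions described above would BUG().
 */
int sk_flush_tlb(unsigned int action)
{
    switch (action) {
    case SK_TLB_INVAL_SCOPE_GLOBAL:
        /* ... invalidate everything ... */
        return 0;
    case SK_TLB_INVAL_SCOPE_LPID:
        /* ... invalidate entries for the current partition ... */
        return 0;
    default:
        return -1;
    }
}
```

Passing a set count such as 512 lands in the default branch, while passing the global scope constant takes the intended path.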
  8. 29 Sep, 2017 1 commit
  9. 26 Sep, 2017 1 commit
    • M
      powerpc: Handle MCE on POWER9 with only DSISR bit 30 set · d8bd9f3f
      Michael Neuling committed
      On POWER9 DD2.1 and below, it's possible for a paste instruction to
      cause a Machine Check Exception (MCE) where only DSISR bit 30 (IBM 33)
      is set. This will result in the MCE handler seeing an unknown event,
      which causes Linux to crash.
      
      We change this by detecting unknown events caused by load/stores in
      the MCE handler and marking them as handled so that we no longer
      crash.
      
      An MCE that occurs like this is spurious, so we don't need to do
      anything in terms of servicing it. If there is something that needs to
      be serviced, the CPU will raise the MCE again with the correct DSISR
      so that it can be serviced properly.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Expand comment with details from change log, use normal bit #s]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 22 Sep, 2017 1 commit
    • M
      KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception · e001fa78
      Michael Neuling committed
      On POWER9 DD2.1 and below, sometimes on a Hypervisor Data Storage
      Interrupt (HDSI) the HDSISR is not updated at all.

      To work around this we put a canary value into the HDSISR before
      returning to a guest and then check for this canary when we take a
      HDSI. If we find the canary on a HDSI, we know the hardware didn't
      update the HDSISR. In this case we return to the guest to retake the
      HDSI, which should correctly update the HDSISR on the second HDSI
      entry.

      After talking to Paulus we've applied this workaround to all POWER9
      CPUs. The workaround of returning to the guest shouldn't ever be
      triggered on a well-behaved CPU. The extra instructions should have
      negligible performance impact.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
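The canary technique can be modelled without any hardware. In the sketch below the register is a plain struct field, and `SK_HDSISR_CANARY`, `struct sk_vcpu` and both functions are hypothetical names chosen for illustration; the canary value itself is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical canary: a value the status register is assumed never to hold. */
#define SK_HDSISR_CANARY 0x7fffu

/* Toy model of the per-vCPU state; the register is just a field here. */
struct sk_vcpu {
    uint32_t hdsisr;
};

/* Before returning to the guest: plant the canary in the status register. */
void sk_guest_enter(struct sk_vcpu *v)
{
    v->hdsisr = SK_HDSISR_CANARY;
}

/*
 * On taking the interrupt: if the canary is still there, the hardware did
 * not update the register, so the caller should re-enter the guest and let
 * the interrupt be retaken. Returns true when the status can be trusted.
 */
bool sk_hdsi_status_valid(const struct sk_vcpu *v)
{
    return v->hdsisr != SK_HDSISR_CANARY;
}
```

On a CPU that always updates the register, `sk_hdsi_status_valid()` is always true and the retry path is never taken, which matches the claim that the workaround costs only a few extra instructions.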
  11. 21 Sep, 2017 3 commits
    • T
      powerpc/pseries: Fix parent_dn reference leak in add_dt_node() · b537ca6f
      Tyrel Datwyler committed
      A reference to the parent device node is held by add_dt_node() for the
      node to be added. If the call to dlpar_configure_connector() fails
      add_dt_node() returns ENOENT and that reference is not freed.
      
      Add a call to of_node_put(parent_dn) prior to bailing out after a
      failed dlpar_configure_connector() call.
      
      Fixes: 8d5ff320 ("powerpc/pseries: Make dlpar_configure_connector parent node aware")
      Cc: stable@vger.kernel.org # v3.12+
      Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
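The leak and its fix follow the usual pattern of balancing a reference on every exit path. The sketch below uses a hypothetical counted node (`struct sk_node`, `sk_node_get()`, `sk_node_put()`) and a flag in place of the real configure call; it is a model of the pattern, not the pseries code.

```c
/* Hypothetical reference-counted node. */
struct sk_node {
    int refcount;
};

void sk_node_get(struct sk_node *n) { n->refcount++; }
void sk_node_put(struct sk_node *n) { n->refcount--; }

/*
 * Add a child under 'parent'. 'configure_ok' stands in for the result of
 * the configure step. Returns 0 on success and -1 on failure.
 */
int sk_add_node(struct sk_node *parent, int configure_ok)
{
    /* Take a reference on the parent for the duration of the operation. */
    sk_node_get(parent);

    if (!configure_ok) {
        /* The fix: drop the reference before bailing out. */
        sk_node_put(parent);
        return -1;
    }

    /* ... attach the new node under the parent ... */

    sk_node_put(parent);
    return 0;
}
```

With the early `sk_node_put()`, the parent's count is the same before and after the call on both the success and the failure path.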
    • T
      powerpc/pseries: Fix "OF: ERROR: Bad of_node_put() on /cpus" during DLPAR · 087ff6a5
      Tyrel Datwyler committed
      Commit 215ee763 ("powerpc: pseries: remove dlpar_attach_node
      dependency on full path") reworked dlpar_attach_node() to no longer
      look up the parent node "/cpus", but instead to have the parent node
      passed by the caller in the function parameter list.
      
      As a result dlpar_attach_node() is no longer responsible for freeing
      the reference to the parent node. However, commit 215ee763 failed
      to remove the of_node_put(parent) call in dlpar_attach_node(), or to
      take into account that the reference to the parent in the caller
      dlpar_cpu_add() needs to be held until after dlpar_attach_node()
      returns.
      
      As a result doing repeated cpu add/remove dlpar operations will
      eventually result in the following error:
      
        OF: ERROR: Bad of_node_put() on /cpus
        CPU: 0 PID: 10896 Comm: drmgr Not tainted 4.13.0-autotest #1
        Call Trace:
         dump_stack+0x15c/0x1f8 (unreliable)
         of_node_release+0x1a4/0x1c0
         kobject_put+0x1a8/0x310
         kobject_del+0xbc/0xf0
         __of_detach_node_sysfs+0x144/0x210
         of_detach_node+0xf0/0x180
         dlpar_detach_node+0xc4/0x120
         dlpar_cpu_remove+0x280/0x560
         dlpar_cpu_release+0xbc/0x1b0
         arch_cpu_release+0x6c/0xb0
         cpu_release_store+0xa0/0x100
         dev_attr_store+0x68/0xa0
         sysfs_kf_write+0xa8/0xf0
         kernfs_fop_write+0x2cc/0x400
         __vfs_write+0x5c/0x340
         vfs_write+0x1a8/0x3d0
         SyS_write+0xa8/0x1a0
         system_call+0x58/0x6c
      
      Fix the issue by removing the of_node_put(parent) call from
      dlpar_attach_node(), and ensuring that the reference to the parent
      node is properly held and released by the caller dlpar_cpu_add().
      
      Fixes: 215ee763 ("powerpc: pseries: remove dlpar_attach_node dependency on full path")
      Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      [mpe: Add a comment in the code and frob the change log slightly]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • B
      powerpc/eeh: Create PHB PEs after EEH is initialized · 3e77adee
      Benjamin Herrenschmidt committed
      Otherwise we end up not yet having computed the right diag data size
      on powernv where EEH initialization is delayed, thus causing memory
      corruption later on when calling OPAL.
      
      Fixes: 5cb1f8fd ("powerpc/powernv/pci: Dynamically allocate PHB diag data")
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  12. 20 Sep, 2017 8 commits
  13. 15 Sep, 2017 2 commits
  14. 14 Sep, 2017 2 commits
    • T
      watchdog/hardlockup: Clean up hotplug locking mess · ab5fe3ff
      Thomas Gleixner committed
      All watchdog thread related functions are delegated to the smpboot thread
      infrastructure, which handles serialization against CPU hotplug correctly.
      
      The sysctl interface is completely decoupled from anything which requires
      CPU hotplug protection.
      
      No need to protect the sysctl writes against cpu hotplug anymore. Remove it
      and add the now required protection to the powerpc arch_nmi_watchdog
      implementation.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20170912194148.418497420@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • T
      watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage · 6592ad2f
      Thomas Gleixner committed
      Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
      need to be done in two steps.
      
           1) Stop all NMIs
           2) Read the new parameters and start NMIs
      
      Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
      clean reconfiguration add a 'run' argument and split the functionality in
      powerpc.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>