1. 19 6月, 2019 9 次提交
    • N
      powerpc/64s/radix: ioremap use ioremap_page_range · d38153f9
      Nicholas Piggin 提交于
      Radix can use ioremap_page_range for ioremap, after slab is available.
      This makes it possible to enable huge ioremap mapping support.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d38153f9
    • N
      powerpc/64: __ioremap_at clean up in the error case · a72808a7
      Nicholas Piggin 提交于
      __ioremap_at error handling is wonky, it requires caller to clean up
      after it. Implement a helper that does the map and error cleanup and
      remove the requirement from the caller.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a72808a7
    • A
      powerpc/perf: Use cpumask_last() to determine the designated cpu for nest/core units. · 9c9f8fb7
      Anju T Sudhakar 提交于
      Nest and core IMC (In-Memory Collection counters) assigns a particular
      cpu as the designated target for counter data collection. During
      system boot, the first online cpu in a chip gets assigned as the
      designated cpu for that chip(for nest-imc) and the first online cpu in
      a core gets assigned as the designated cpu for that core(for
      core-imc).
      
      If the designated cpu goes offline, the next online cpu from the same
      chip(for nest-imc)/core(for core-imc) is assigned as the next target,
      and the event context is migrated to the target cpu. Currently,
      cpumask_any_but() function is used to find the target cpu. Though this
      function is expected to return a `random` cpu, this always returns the
      next online cpu.
      
      If all cpus in a chip/core is offlined in a sequential manner,
      starting from the first cpu, the event migration has to happen for all
      the cpus which goes offline. Since the migration process involves a
      grace period, the total time taken to offline all the cpus will be
      significantly high.
      
      Example:
        In a system which has 2 sockets, with
        NUMA node0 CPU(s):     0-87
        NUMA node8 CPU(s):     88-175
      
        Time taken to offline cpu 88-175:
        real    2m56.099s
        user    0m0.191s
        sys     0m0.000s
      
      Use cpumask_last() to choose the target cpu, when the designated cpu
      goes online, so the migration will happen only when the last_cpu in
      the mask goes offline. This way the time taken to offline all cpus in
      a chip/core can be reduced.
      
      With the patch:
      
        Time taken  to offline cpu 88-175:
        real    0m12.207s
        user    0m0.171s
        sys     0m0.000s
      
      Offlining all cpus in reverse order is also taken care because,
      cpumask_any_but() is used to find the designated cpu if the last cpu
      in the mask goes offline. Since cpumask_any_but() always return the
      first cpu in the mask, that becomes the designated cpu and migration
      will happen only when the first_cpu in the mask goes offline.
      
      Example: With the patch,
      
        Time taken to offline cpu from 175-88:
        real    0m9.330s
        user    0m0.110s
        sys     0m0.000s
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reviewed-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9c9f8fb7
    • S
      powerpc/64s: Fix misleading SPR and timebase information · 87997471
      Shaokun Zhang 提交于
      pr_info shows SPR and timebase as a decimal value with a '0x'
      prefix, which is somewhat misleading.
      
      Fix it to print hexadecimal, as was intended.
      
      Fixes: 10d91611 ("powerpc/64s: Reimplement book3s idle code in C")
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      87997471
    • N
      powerpc/pseries: avoid blocking in irq when queuing hotplug events · 348ea30f
      Nathan Lynch 提交于
      A couple of bugs in queue_hotplug_event():
      
      1. Unchecked kmalloc result which could lead to an oops.
      2. Use of GFP_KERNEL allocations in interrupt context (this code's
         only caller is ras_hotplug_interrupt()).
      
      Use kmemdup to avoid open-coding the allocation+copy and check for
      failure; use GFP_ATOMIC for both allocations.
      
      Ultimately it probably would be better to avoid or reduce allocations
      in this path if possible.
      Signed-off-by: NNathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      348ea30f
    • R
      powerpc/watchpoint: Restore NV GPRs while returning from exception · f474c28f
      Ravi Bangoria 提交于
      powerpc hardware triggers watchpoint before executing the instruction.
      To make trigger-after-execute behavior, kernel emulates the
      instruction. If the instruction is 'load something into non-volatile
      register', exception handler should restore emulated register state
      while returning back, otherwise there will be register state
      corruption. eg, adding a watchpoint on a list can corrput the list:
      
        # cat /proc/kallsyms | grep kthread_create_list
        c00000000121c8b8 d kthread_create_list
      
      Add watchpoint on kthread_create_list->prev:
      
        # perf record -e mem:0xc00000000121c8c0
      
      Run some workload such that new kthread gets invoked. eg, I just
      logged out from console:
      
        list_add corruption. next->prev should be prev (c000000001214e00), \
      	but was c00000000121c8b8. (next=c00000000121c8b8).
        WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
        CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
        ...
        NIP __list_add_valid+0xb4/0xc0
        LR __list_add_valid+0xb0/0xc0
        Call Trace:
        __list_add_valid+0xb0/0xc0 (unreliable)
        __kthread_create_on_node+0xe0/0x260
        kthread_create_on_node+0x34/0x50
        create_worker+0xe8/0x260
        worker_thread+0x444/0x560
        kthread+0x160/0x1a0
        ret_from_kernel_thread+0x5c/0x70
      
      List corruption happened because it uses 'load into non-volatile
      register' instruction:
      
      Snippet from __kthread_create_on_node:
      
        c000000000136be8:     addis   r29,r2,-19
        c000000000136bec:     ld      r29,31424(r29)
              if (!__list_add_valid(new, prev, next))
        c000000000136bf0:     mr      r3,r30
        c000000000136bf4:     mr      r5,r28
        c000000000136bf8:     mr      r4,r29
        c000000000136bfc:     bl      c00000000059a2f8 <__list_add_valid+0x8>
      
      Register state from WARN_ON():
      
        GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075
        GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000
        GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa
        GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00
        GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370
        GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000
        GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628
        GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0
      
      Watchpoint hit at 0xc000000000136bec.
      
        addis   r29,r2,-19
         => r29 = 0xc000000001344e00 + (-19 << 16)
         => r29 = 0xc000000001214e00
      
        ld      r29,31424(r29)
         => r29 = *(0xc000000001214e00 + 31424)
         => r29 = *(0xc00000000121c8c0)
      
      0xc00000000121c8c0 is where we placed a watchpoint and thus this
      instruction was emulated by emulate_step. But because handle_dabr_fault
      did not restore emulated register state, r29 still contains stale
      value in above register state.
      
      Fixes: 5aae8a53 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors")
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: stable@vger.kernel.org # 2.6.36+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f474c28f
    • G
      powerpc/ps3: Use [] to denote a flexible array member · 0b1be03f
      Geert Uytterhoeven 提交于
      Flexible array members should be denoted using [] instead of [0], else
      gcc will not warn when they are no longer at the end of the structure.
      Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0b1be03f
    • A
      powerpc/mm/32s: fix condition that is always true · 46c2478a
      Andreas Schwab 提交于
      Move a misplaced paren that makes the condition always true.
      
      Fixes: 63b2bc61 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org>
      Reviewed-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      46c2478a
    • C
      powerpc/32s: fix suspend/resume when IBATs 4-7 are used · 6ecb78ef
      Christophe Leroy 提交于
      Previously, only IBAT1 and IBAT2 were used to map kernel linear mem.
      Since commit 63b2bc61 ("powerpc/mm/32s: Use BATs for
      STRICT_KERNEL_RWX"), we may have all 8 BATs used for mapping
      kernel text. But the suspend/restore functions only save/restore
      BATs 0 to 3, and clears BATs 4 to 7.
      
      Make suspend and restore functions respectively save and reload
      the 8 BATs on CPUs having MMU_FTR_USE_HIGH_BATS feature.
      Reported-by: NAndreas Schwab <schwab@linux-m68k.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6ecb78ef
  2. 15 6月, 2019 4 次提交
  3. 14 6月, 2019 2 次提交
    • N
      powerpc/pseries: Fix oops in hotplug memory notifier · 0aa82c48
      Nathan Lynch 提交于
      During post-migration device tree updates, we can oops in
      pseries_update_drconf_memory() if the source device tree has an
      ibm,dynamic-memory-v2 property and the destination has a
      ibm,dynamic_memory (v1) property. The notifier processes an "update"
      for the ibm,dynamic-memory property but it's really an add in this
      scenario. So make sure the old property object is there before
      dereferencing it.
      
      Fixes: 2b31e3ae ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: NNathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0aa82c48
    • D
      powerpc/pseries/hvconsole: Fix stack overread via udbg · 934bda59
      Daniel Axtens 提交于
      While developing KASAN for 64-bit book3s, I hit the following stack
      over-read.
      
      It occurs because the hypercall to put characters onto the terminal
      takes 2 longs (128 bits/16 bytes) of characters at a time, and so
      hvc_put_chars() would unconditionally copy 16 bytes from the argument
      buffer, regardless of supplied length. However, udbg_hvc_putc() can
      call hvc_put_chars() with a single-byte buffer, leading to the error.
      
        ==================================================================
        BUG: KASAN: stack-out-of-bounds in hvc_put_chars+0xdc/0x110
        Read of size 8 at addr c0000000023e7a90 by task swapper/0
      
        CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc2-next-20190528-02824-g048a6ab4835b #113
        Call Trace:
          dump_stack+0x104/0x154 (unreliable)
          print_address_description+0xa0/0x30c
          __kasan_report+0x20c/0x224
          kasan_report+0x18/0x30
          __asan_report_load8_noabort+0x24/0x40
          hvc_put_chars+0xdc/0x110
          hvterm_raw_put_chars+0x9c/0x110
          udbg_hvc_putc+0x154/0x200
          udbg_write+0xf0/0x240
          console_unlock+0x868/0xd30
          register_console+0x970/0xe90
          register_early_udbg_console+0xf8/0x114
          setup_arch+0x108/0x790
          start_kernel+0x104/0x784
          start_here_common+0x1c/0x534
      
        Memory state around the buggy address:
         c0000000023e7980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         c0000000023e7a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
        >c0000000023e7a80: f1 f1 01 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
                                 ^
         c0000000023e7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         c0000000023e7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ==================================================================
      
      Document that a 16-byte buffer is requred, and provide it in udbg.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      934bda59
  4. 02 6月, 2019 6 次提交
    • G
      powerpc/pseries: Fix xive=off command line · a3bf9fbd
      Greg Kurz 提交于
      On POWER9, if the hypervisor supports XIVE exploitation mode, the
      guest OS will unconditionally requests for the XIVE interrupt mode
      even if XIVE was deactivated with the kernel command line xive=off.
      Later on, when the spapr XIVE init code handles xive=off, it disables
      XIVE and tries to fall back on the legacy mode XICS.
      
      This discrepency causes a kernel panic because the hypervisor is
      configured to provide the XIVE interrupt mode to the guest :
      
        kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
        ...
        NIP xics_smp_probe+0x38/0x98
        LR  xics_smp_probe+0x2c/0x98
        Call Trace:
          xics_smp_probe+0x2c/0x98 (unreliable)
          pSeries_smp_probe+0x40/0xa0
          smp_prepare_cpus+0x62c/0x6ec
          kernel_init_freeable+0x148/0x448
          kernel_init+0x2c/0x148
          ret_from_kernel_thread+0x5c/0x68
      
      Look for xive=off during prom_init and don't ask for XIVE in this
      case. One exception though: if the host only supports XIVE, we still
      want to boot so we ignore xive=off.
      
      Similarly, have the spapr XIVE init code to looking at the interrupt
      mode negotiated during CAS, and ignore xive=off if the hypervisor only
      supports XIVE.
      
      Fixes: eac1e731 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.20
      Reported-by: NPavithra R. Prakash <pavrampu@in.ibm.com>
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      Reviewed-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a3bf9fbd
    • G
      powerpc/powernv/npu: Fix reference leak · 02c5f539
      Greg Kurz 提交于
      Since 902bdc57, get_pci_dev() calls pci_get_domain_bus_and_slot(). This
      has the effect of incrementing the reference count of the PCI device, as
      explained in drivers/pci/search.c:
      
       * Given a PCI domain, bus, and slot/function number, the desired PCI
       * device is located in the list of PCI devices. If the device is
       * found, its reference count is increased and this function returns a
       * pointer to its data structure.  The caller must decrement the
       * reference count by calling pci_dev_put().  If no device is found,
       * %NULL is returned.
      
      Nothing was done to call pci_dev_put() and the reference count of GPU and
      NPU PCI devices rockets up.
      
      A natural way to fix this would be to teach the callers about the change,
      so that they call pci_dev_put() when done with the pointer. This turns
      out to be quite intrusive, as it affects many paths in npu-dma.c,
      pci-ioda.c and vfio_pci_nvlink2.c. Also, the issue appeared in 4.16 and
      some affected code got moved around since then: it would be problematic
      to backport the fix to stable releases.
      
      All that code never cared for reference counting anyway. Call pci_dev_put()
      from get_pci_dev() to revert to the previous behavior.
      
      Fixes: 902bdc57 ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn")
      Cc: stable@vger.kernel.org # v4.16
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      02c5f539
    • M
      powerpc: Remove variable ‘path’ since not used · c806a6fd
      Mathieu Malaterre 提交于
      In commit eab00a20 ("powerpc: Move `path` variable inside
      DEBUG_PROM") DEBUG_PROM sentinels were added to silence a warning
      (treated as error with W=1):
      
        arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not used [-Werror=unused-but-set-variable]
      
      Rework the original patch and simplify the code, by removing the
      variable ‘path’ completely. Fix line over 90 characters.
      Suggested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NMathieu Malaterre <malat@debian.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c806a6fd
    • F
      powerpc/powernv: Show checkstop reason for NPU2 HMIs · 89d87bcb
      Frederic Barrat 提交于
      If the kernel is notified of an HMI caused by the NPU2, it's currently
      not being recognized and it logs the default message:
      
          Unknown Malfunction Alert of type 3
      
      The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of
      possible causes, but we should at least log that it's an NPU problem
      and report which FIR and which bit were raised if opal gave us the
      information.
      Signed-off-by: NFrederic Barrat <fbarrat@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      89d87bcb
    • S
      powerpc/powernv: Update firmware archaeology around OPAL_HANDLE_HMI · 1549c42d
      Stewart Smith 提交于
      The first machines to ship with OPAL firmware all got firmware updates
      that have the new call, but just in case someone is foolish enough to
      believe the first 4 months of firmware is the best, we keep this code
      around.
      
      Comment is updated to not refer to late 2014 as recent or the future.
      Signed-off-by: NStewart Smith <stewart@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1549c42d
    • G
      powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property() · efa9ace6
      Gen Zhang 提交于
      In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup().
      kstrdup() may return NULL, so it should be checked and handle error.
      And prop should be freed if 'prop->name' is NULL.
      Signed-off-by: NGen Zhang <blackgod016574@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      efa9ace6
  5. 28 5月, 2019 3 次提交
  6. 24 5月, 2019 8 次提交
  7. 21 5月, 2019 5 次提交
  8. 19 5月, 2019 1 次提交
  9. 18 5月, 2019 1 次提交
  10. 17 5月, 2019 1 次提交