1. 31 Aug, 2017 11 commits
    • powerpc/powernv/vas: Define vas_rx_win_open() interface · 62c4eda4
      Sukadev Bhattiprolu committed
      Define the vas_rx_win_open() interface. This interface is intended to
      be used by the Nest Accelerator (NX) driver(s) to set up receive
      windows for one or more NX engines (which implement compression &
      encryption algorithms in the hardware).
      
      Follow-on patches will provide an interface to close the window and to
      open a send window that kernel subsystems can use to access the NX
      engines.
      
      The interface to open a receive window is expected to be invoked for
      each instance of VAS in the system.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Move GET_FIELD/SET_FIELD to vas.h · b6622a33
      Sukadev Bhattiprolu committed
      Move the GET_FIELD and SET_FIELD macros to vas.h, as VAS and other
      users of VAS, including NX-842, can use those macros.
      
      There is a lot of related code between the VAS/NX kernel drivers
      and skiboot. For consistency, switch the order of parameters in
      SET_FIELD to match the order in skiboot.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Reviewed-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
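
To illustrate the kind of mask-based accessors this commit moves, here is a minimal sketch. It is not copied from vas.h: the parameter order (mask, value, new field value), the `MASK_LSH` helper, and `EXAMPLE_FIELD_MASK` are assumptions made for illustration, and the code relies on the GCC/Clang builtin `__builtin_ffsll`.

```c
#include <assert.h>
#include <stdint.h>

/* Shift count of the mask's least significant set bit (mask must be non-zero). */
#define MASK_LSH(m)          (__builtin_ffsll(m) - 1)

/* Extract the field selected by mask m from value v. */
#define GET_FIELD(m, v)      (((v) & (m)) >> MASK_LSH(m))

/* Return v with the field selected by mask m replaced by val.
 * Assumed argument order: mask, current value, new field value. */
#define SET_FIELD(m, v, val) \
	(((v) & ~(m)) | ((((uint64_t)(val)) << MASK_LSH(m)) & (m)))

/* Hypothetical 8-bit field occupying bits 8..15 of a 64-bit register. */
#define EXAMPLE_FIELD_MASK   0x000000000000ff00ULL

/* Write a field into reg, check the other bits are untouched, read it back. */
static inline uint64_t example_set_then_get(uint64_t reg, uint64_t field)
{
	uint64_t updated = SET_FIELD(EXAMPLE_FIELD_MASK, reg, field);

	assert((updated & ~EXAMPLE_FIELD_MASK) == (reg & ~EXAMPLE_FIELD_MASK));
	return GET_FIELD(EXAMPLE_FIELD_MASK, updated);
}
```

Deriving the shift from the mask means each register field needs only one definition, which is presumably why sharing one set of macros between VAS and NX-842 is attractive.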
    • powerpc/powernv/vas: Define macros, register fields and structures · 96768914
      Sukadev Bhattiprolu committed
      Define macros for the VAS hardware registers and bit-fields as well
      as a couple of data structures needed by the VAS driver.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      [mpe: Fixup include guard to use _ASM_POWERPC_VAS_H]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pci: Remove OF node back pointer from pci_dn · f1e08232
      Alexey Kardashevskiy committed
      The check_req() helper uses pci_get_pdn() to get an OF node pointer.
      pci_get_pdn() returns a pci_dn pointer which comes either:
      1) from the OF node returned by pci_device_to_OF_node(); or
      2) from the parent's child_list, where entries don't have OF node pointers.
      Since check_req() does not care about 2), it can call
      pci_device_to_OF_node() directly, hence the change.
      
      The find_pe_dn() helper uses embedded pci_dn to get an OF node which is
      also stored in edev->pdev so let's take a shortcut and call
      pci_device_to_OF_node() directly.
      
      With these 2 changes, we can finally get rid of the OF node back pointer.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Remove unnecessary config_addr from eeh_dev · 405b33a7
      Alexey Kardashevskiy committed
      The eeh_dev struct holds a config space address of an associated node
      and the very same address is also stored in the pci_dn struct which
      is always present during the eeh_dev lifetime.
      
      This uses bus:devfn directly from pci_dn instead of the cached and
      packed config_addr.
      
      Since config_addr is made from device's bus:dev.fn, there is no point
      in keeping it in the debugfs either so remove that too.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
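
As a rough illustration of why a cached config_addr is redundant, the sketch below packs and unpacks a bus:devfn pair. The `devfn` encoding (5-bit device, 3-bit function) is the standard PCI one; the packed layout `(bus << 8) | devfn` and all function names are assumptions for this example, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI encoding: 5-bit device (slot) number, 3-bit function number. */
static inline uint8_t pci_devfn(uint8_t slot, uint8_t func)
{
	assert(slot < 32 && func < 8);
	return (uint8_t)((slot << 3) | func);
}

/* Hypothetical packed form, assumed here to be (bus << 8) | devfn. */
static inline uint32_t pack_config_addr(uint8_t bus, uint8_t devfn)
{
	return ((uint32_t)bus << 8) | devfn;
}

/* Recover the bus number from the packed value. */
static inline uint8_t config_addr_bus(uint32_t addr)
{
	return (uint8_t)((addr >> 8) & 0xff);
}

/* Recover devfn from the packed value. */
static inline uint8_t config_addr_devfn(uint32_t addr)
{
	return (uint8_t)(addr & 0xff);
}
```

Because the packed value is a pure function of bus and devfn, anything that already has those two numbers can rebuild it on demand instead of storing a second copy.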
    • powerpc/eeh: Remove unnecessary pointer to phb from eeh_dev · 69672bd7
      Alexey Kardashevskiy committed
      The eeh_dev struct already holds a pointer to pci_dn, without which it
      cannot exist, and pci_dn itself holds the very same phb pointer, so
      just use it.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Reduce to one the number of places where edev is allocated · 8bae6a23
      Alexey Kardashevskiy committed
      arch/powerpc/kernel/eeh_dev.c:57 is the only legit place where edev
      is allocated; the other 2 places allocate it on the stack and in the
      heap for a very short period of time, only to call eeh_pe_get(), which
      takes an edev.
      
      This changes eeh_pe_get() to receive required parameters explicitly.
      
      This removes unnecessary temporary allocation of edev.
      
      This uses the "pe_no" name instead of the "pe_config_addr" name as
      it actually is a PE number and not a config space address as it seemed.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Use kernel crash path for machine checks · 6fcd6baa
      Nicholas Piggin committed
      There are quite a few machine check exceptions that can be caused by
      kernel bugs. To make debugging easier, use the kernel crash path in
      cases of synchronous machine checks that occur in kernel mode, if that
      would not result in the machine going straight to panic or crash dump.
      
      There is a downside here that die()ing the process in kernel mode can
      still leave the system unstable. panic_on_oops will always force the
      system to fail-stop, so systems where that behaviour is important will
      still do the right thing.
      
      As a test, when triggering an i-side 0111b error (ifetch from foreign
      address) in kernel mode process context on POWER9, the kernel currently
      dies quickly like this:
      
        Severe Machine check interrupt [Not recovered]
          NIP [ffff000000000000]: 0xffff000000000000
          Initiator: CPU
          Error type: Real address [Instruction fetch (foreign)]
        [  127.426651616,0] OPAL: Reboot requested due to Platform error.
            Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
        opal: Reboot type 1 not supported
        Kernel panic - not syncing: PowerNV Unrecovered Machine Check
        CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #35
        Call Trace:
        [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
          buggy/mising code in OPAL for this BMC
          Rebooting in 10 seconds..
        Trying to free IRQ 496 from IRQ context!
      
      After this patch, the process is killed and the kernel continues with
      this message, which gives enough information to identify the offending
      branch (i.e., with CFAR):
      
        Severe Machine check interrupt [Not recovered]
          NIP [ffff000000000000]: 0xffff000000000000
          Initiator: CPU
          Error type: Real address [Instruction fetch (foreign)]
            Effective address: ffff000000000000
        Oops: Machine check, sig: 7 [#1]
        SMP NR_CPUS=2048
        NUMA
        PowerNV
        Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
        CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #36
        task: c000000932300000 task.stack: c000000932380000
        NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
        REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a26-dirty)
        MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
          CR: 24000484  XER: 20000000
        CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
        GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
        GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
        GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
        GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
        NIP [ffff000000000000] 0xffff000000000000
        LR [00000000217706a4] 0x217706a4
        Call Trace:
        Instruction dump:
        XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Flush console before platform error reboot · b746e3e0
      Nicholas Piggin committed
      Unrecovered MCE and HMI errors are sent through a special restart OPAL
      call to log the platform error. The downside is that they don't go
      through normal Linux crash paths, so they don't give much information
      to the Linux console.
      
      Change this by providing a special crash function which does some of
      the console flushing from the panic() path before calling firmware to
      reboot.
      
      The downside of this is a little more code to execute before reaching
      the firmware reboot. However in practice, it's critical to get the
      Linux console messages output in order to debug a problem. So this is
      a desirable tradeoff.
      
      Note on the implementation: It is difficult to plumb a custom reboot
      handler into the panic path, because panic does a little bit too much
      work. For example, it will try to delay with the timebase, but that
      may be corrupted in some cases resulting in a hang without reaching
      the platform reboot. Another problem is that panic can invoke the
      crash dump code which is not what we want in the case of a hardware
      platform error. Long-term the best solution will be to rework the
      panic path so it can be suitable for this kind of panic, but for now
      we just duplicate a bit of the code.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Do not call ppc_md.panic in fadump panic notifier · a3b2cb30
      Nicholas Piggin committed
      If fadump is not registered, and no other crash or debug handlers are
      registered, the powerpc panic handler stops the guest before the
      generic panic code can push out debug information to the console.
      
      Currently, system reset injection causes the guest to silently stop.
      
      Stop calling ppc_md.panic in the panic notifier. crash_fadump already
      does rtas_os_term() to terminate the guest if fadump is registered.
      
      Remove ppc_md.panic. Move fadump panic notifier into fadump code.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Fix watchdog configuration regressions · 70412c55
      Nicholas Piggin committed
      This fixes a couple more bits of fallout from the new hard lockup watchdog
      patch.
      
      It restores the required hw_nmi_get_sample_period() function for the
      perf watchdog, and removes some function declarations on 64e that are only
      defined for 64s. This fixes the 64e build when the hardlockup detector is
      enabled.
      
      It restores the default behaviour of disabling the perf watchdog, and also
      fixes disabling the 64s watchdog when running as a guest.
      
      Fixes: 2104180a ("powerpc/64s: implement arch-specific hardlockup watchdog")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 29 Aug, 2017 2 commits
  3. 23 Aug, 2017 4 commits
  4. 18 Aug, 2017 1 commit
  5. 17 Aug, 2017 3 commits
    • powerpc/mm: Don't send IPI to all cpus on THP updates · fa4531f7
      Aneesh Kumar K.V committed
      Now that we have made sure that the lockless walk of the linux page
      table is mostly limited to the current task (current->mm->pgdir), we
      can update the THP update sequence to only send an IPI to CPUs on
      which this task has run. This helps in reducing the IPI overload on
      systems with a large number of CPUs.
      
      WRT kvm even though kvm is walking page table with vpc->arch.pgdir,
      it is done only on secondary CPUs and in that case we have primary CPU
      added to task's mm cpumask. Sending an IPI to primary will force the
      secondary to do a vm exit and hence this mm cpumask usage is safe
      here.
      
      WRT CAPI, we still end up walking linux page table with capi context
      MM. For now the pte lookup serialization sends an IPI to all CPUs if
      CAPI is in use. We can further improve this by adding the CAPI
      interrupt handling CPU to task mm cpumask. That will be done in a
      later patch.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Rename find_linux_pte_or_hugepte() · 94171b19
      Aneesh Kumar K.V committed
      Add newer helpers to make the function usage simpler. It is always
      recommended to use find_current_mm_pte() for walking the page table.
      If we cannot use find_current_mm_pte(), it should be documented why
      the said usage of __find_linux_pte() is safe against a parallel THP
      split.
      
      For now we have KVM code using __find_linux_pte(). This is because kvm
      code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
      with PACA soft_enabled = 1. We may want to fix that later and make
      sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
      we can switch kvm to use find_linux_pte().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/string: Implement optimized memset variants · 694fc88c
      Naveen N. Rao committed
      Based on Matthew Wilcox's patches for other architectures.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
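
The commit message does not spell out what the variants do. As a hedged illustration, the sketch below shows the semantics of word-sized memset helpers in plain portable C, assuming they fill a buffer with a repeated 32-bit or 64-bit value rather than a single byte. The `sketch_` names are hypothetical, and the loops stand in for whatever optimized implementation the patch provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill count 32-bit elements starting at s with v; returns s, like memset(). */
static inline void *sketch_memset32(uint32_t *s, uint32_t v, size_t count)
{
	uint32_t *p = s;

	while (count--)
		*p++ = v;
	return s;
}

/* Same idea for 64-bit elements. */
static inline void *sketch_memset64(uint64_t *s, uint64_t v, size_t count)
{
	uint64_t *p = s;

	while (count--)
		*p++ = v;
	return s;
}
```

Note that `count` is a number of elements here, not a number of bytes, which is the main way such helpers differ from plain `memset()`.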
  6. 16 Aug, 2017 3 commits
  7. 15 Aug, 2017 5 commits
  8. 10 Aug, 2017 10 commits
  9. 08 Aug, 2017 1 commit
    • powerpc/mm/hash64: Make vmalloc 56T on hash · 21a0e8c1
      Michael Ellerman committed
      On 64-bit book3s, with the hash MMU, we currently define the kernel
      virtual space (vmalloc, ioremap etc.), to be 16T in size. This is a
      leftover from pre v3.7 when our user VM was also 16T.
      
      Of that 16T we split it 50/50, with half used for PCI IO and ioremap
      and the other 8T for vmalloc.
      
      We never bothered to make it any bigger because 8T of vmalloc ought to
      be enough for anybody. But it turns out that's not true, the per cpu
      allocator wants large amounts of vmalloc space, not to make large
      allocations, but to allow a large stride between allocations, because
      we use pcpu_embed_first_chunk().
      
      With a bit of juggling we can increase the entire kernel virtual space
      to 64T. The only real complication is the check of the address in the
      SLB miss handler, see the comment in the code.
      
      Although we could continue to split virtual space 50/50 as we do now,
      no one seems to be running out of PCI IO or ioremap space. So instead
      keep that as 8T, and use the remaining 56T for vmalloc.
      
      In future we should be able to increase the kernel virtual space to
      512T, the code already supports that, it just needs testing on older
      hardware.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
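
The sizes quoted in this commit message can be checked with a few lines of arithmetic. In the sketch below the constant and function names are hypothetical and are not the kernel's own definitions; only the 16T, 64T and 8T figures come from the message.

```c
#include <assert.h>
#include <stdint.h>

#define TB                 (1ULL << 40)   /* one terabyte */
#define KERN_VIRT_SIZE_OLD (16 * TB)      /* kernel virtual space before the change */
#define KERN_VIRT_SIZE_NEW (64 * TB)      /* kernel virtual space after the change */
#define KERN_IO_SIZE       (8 * TB)       /* PCI IO and ioremap region, unchanged */

/* vmalloc gets whatever is left of the kernel virtual space after the IO region. */
static inline uint64_t vmalloc_size(uint64_t kern_virt_size, uint64_t io_size)
{
	assert(io_size <= kern_virt_size);
	return kern_virt_size - io_size;
}
```

With these figures, `vmalloc_size(KERN_VIRT_SIZE_OLD, KERN_IO_SIZE)` gives the old 8T (the 50/50 split), and `vmalloc_size(KERN_VIRT_SIZE_NEW, KERN_IO_SIZE)` gives the 56T in the commit title.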