1. 31 8月, 2017 8 次提交
    • S
      powerpc/powernv/vas: Define vas_init() and vas_exit() · 4dea2d1a
      Sukadev Bhattiprolu 提交于
      Implement vas_init() and vas_exit() functions for a new VAS module.
      This VAS module is essentially a library for other device drivers
      and kernel users of the NX coprocessors like NX-842 and NX-GZIP.
      In the future this will be extended to add support for user space
      to access the NX coprocessors.
      
      VAS is currently only supported with 64K page size.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4dea2d1a
    • S
      powerpc/powernv/vas: Define macros, register fields and structures · 96768914
      Sukadev Bhattiprolu 提交于
      Define macros for the VAS hardware registers and bit-fields as well
      as couple of data structures needed by the VAS driver.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      [mpe: Fixup include guard to use _ASM_POWERPC_VAS_H]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      96768914
    • A
      powerpc/eeh: Remove unnecessary config_addr from eeh_dev · 405b33a7
      Alexey Kardashevskiy 提交于
      The eeh_dev struct hold a config space address of an associated node
      and the very same address is also stored in the pci_dn struct which
      is always present during the eeh_dev lifetime.
      
      This uses bus:devfn directly from pci_dn instead of cached and packed
      config_addr.
      
      Since config_addr is made from device's bus:dev.fn, there is no point
      in keeping it in the debugfs either so remove that too.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      405b33a7
    • A
      powerpc/eeh: Remove unnecessary pointer to phb from eeh_dev · 69672bd7
      Alexey Kardashevskiy 提交于
      The eeh_dev struct already holds a pointer to pci_dn which it does not
      exist without and pci_dn itself holds the very same pointer so just
      use it.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      69672bd7
    • A
      powerpc/eeh: Reduce to one the number of places where edev is allocated · 8bae6a23
      Alexey Kardashevskiy 提交于
      arch/powerpc/kernel/eeh_dev.c:57 is the only legit place where edev
      is allocated; other 2 places allocate it on stack and in the heap for
      a very short period of time to use eeh_pe_get() as takes edev.
      
      This changes eeh_pe_get() to receive required parameters explicitly.
      
      This removes unnecessary temporary allocation of edev.
      
      This uses the "pe_no" name instead of the "pe_config_addr" name as
      it actually is a PE number and not a config space address as it seemed.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8bae6a23
    • N
      powerpc/powernv: Use kernel crash path for machine checks · 6fcd6baa
      Nicholas Piggin 提交于
      There are quite a few machine check exceptions that can be caused by
      kernel bugs. To make debugging easier, use the kernel crash path in
      cases of synchronous machine checks that occur in kernel mode, if that
      would not result in the machine going straight to panic or crash dump.
      
      There is a downside here that die()ing the process in kernel mode can
      still leave the system unstable. panic_on_oops will always force the
      system to fail-stop, so systems where that behaviour is important will
      still do the right thing.
      
      As a test, when triggering an i-side 0111b error (ifetch from foreign
      address) in kernel mode process context on POWER9, the kernel currently
      dies quickly like this:
      
        Severe Machine check interrupt [Not recovered]
          NIP [ffff000000000000]: 0xffff000000000000
          Initiator: CPU
          Error type: Real address [Instruction fetch (foreign)]
        [  127.426651616,0] OPAL: Reboot requested due to Platform error.
            Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
        opal: Reboot type 1 not supported
        Kernel panic - not syncing: PowerNV Unrecovered Machine Check
        CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #35
        Call Trace:
        [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
          buggy/mising code in OPAL for this BMC
          Rebooting in 10 seconds..
        Trying to free IRQ 496 from IRQ context!
      
      After this patch, the process is killed and the kernel continues with
      this message, which gives enough information to identify the offending
      branch (i.e., with CFAR):
      
        Severe Machine check interrupt [Not recovered]
          NIP [ffff000000000000]: 0xffff000000000000
          Initiator: CPU
          Error type: Real address [Instruction fetch (foreign)]
            Effective address: ffff000000000000
        Oops: Machine check, sig: 7 [#1]
        SMP NR_CPUS=2048
        NUMA
        PowerNV
        Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
        CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #36
        task: c000000932300000 task.stack: c000000932380000
        NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
        REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a26-dirty)
        MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
          CR: 24000484  XER: 20000000
        CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
        GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
        GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
        GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
        GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
        NIP [ffff000000000000] 0xffff000000000000
        LR [00000000217706a4] 0x217706a4
        Call Trace:
        Instruction dump:
        XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6fcd6baa
    • N
      powerpc/powernv: Flush console before platform error reboot · b746e3e0
      Nicholas Piggin 提交于
      Unrecovered MCE and HMI errors are sent through a special restart OPAL
      call to log the platform error. The downside is that they don't go
      through normal Linux crash paths, so they don't give much information
      to the Linux console.
      
      Change this by providing a special crash function which does some of
      the console flushing from the panic() path before calling firmware to
      reboot.
      
      The downside of this is a little more code to execute before reaching
      the firmware reboot. However in practice, it's critical to get the
      Linux console messages output in order to debug a problem. So this is
      a desirable tradeoff.
      
      Note on the implementation: It is difficult to plumb a custom reboot
      handler into the panic path, because panic does a little bit too much
      work. For example, it will try to delay with the timebase, but that
      may be corrupted in some cases resulting in a hang without reaching
      the platform reboot. Another problem is that panic can invoke the
      crash dump code which is not what we want in the case of a hardware
      platform error. Long-term the best solution will be to rework the
      panic path so it can be suitable for this kind of panic, but for now
      we just duplicate a bit of the code.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b746e3e0
    • N
      powerpc/powernv: powernv platform is not constrained by RMA · 76b42e28
      Nicholas Piggin 提交于
      Remove incorrect comment about real mode address restrictions on
      powernv (bare metal), and unnecessary clamping to ppc64_rma_size.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      76b42e28
  2. 28 8月, 2017 1 次提交
  3. 24 8月, 2017 1 次提交
    • R
      powerpc/powernv: Enable removal of memory for in memory tracing · 9d5171a8
      Rashmica Gupta 提交于
      The hardware trace macro feature requires access to a chunk of real
      memory. This patch provides a debugfs interface to do this. By
      writing an integer containing the size of memory to be unplugged into
      /sys/kernel/debug/powerpc/memtrace/enable, the code will attempt to
      remove that much memory from the end of each NUMA node.
      
      This patch also adds additional debugsfs files for each node that
      allows the tracer to interact with the removed memory, as well as
      a trace file that allows userspace to read the generated trace.
      
      Note that this patch does not invoke the hardware trace macro, it
      only allows memory to be removed during runtime for the trace macro
      to utilise.
      Signed-off-by: NRashmica Gupta <rashmica.g@gmail.com>
      [mpe: Minor formatting etc fixups]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9d5171a8
  4. 23 8月, 2017 1 次提交
  5. 17 8月, 2017 1 次提交
  6. 10 8月, 2017 3 次提交
  7. 08 8月, 2017 3 次提交
    • G
      powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails · 785a12af
      Gautham R. Shenoy 提交于
      Currently, we use the opal call opal_slw_set_reg() to inform the
      Sleep-Winkle Engine (SLW) to restore the contents of some of the
      Hypervisor state on wakeup from deep idle states that lose full
      hypervisor context (characterized by the flag
      OPAL_PM_LOSE_FULL_CONTEXT).
      
      However, the current code has a bug in that if opal_slw_set_reg()
      fails, we don't disable the use of these deep states (winkle on
      POWER8, stop4 onwards on POWER9).
      
      This patch fixes this bug by ensuring that if programing the
      sleep-winkle engine to restore the hypervisor states in
      pnv_save_sprs_for_deep_states() fails, then we exclude such states by
      clearing the OPAL_PM_LOSE_FULL_CONTEXT flag from
      supported_cpuidle_states. As a result POWER8 will be prevented from
      using winkle for CPU-Hotplug, and POWER9 will put the offlined CPUs to
      the default stop state when available.
      
      Further, we ensure in the initialization of the cpuidle-powernv driver
      to only include those states whose flags are present in
      supported_cpuidle_states, thereby skipping OPAL_PM_LOSE_FULL_CONTEXT
      states when they have been disabled due to stop-api failure.
      
      Fixes: 1e1601b3 ("powerpc/powernv/idle: Restore SPRs for deep idle
      states via stop API.")
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      785a12af
    • M
      powerpc/powernv: Use darn instruction for get_random_seed() on Power9 · e66ca3db
      Matt Brown 提交于
      This adds powernv_get_random_darn() which utilises the darn instruction,
      introduced in ISA v3.0/POWER9.
      
      The darn instruction can potentially return an error, which is supported
      by the get_random_seed() API, in normal usage if we see an error we just
      return that to the caller.
      
      However when detecting whether darn is functional at boot we try up to
      10 times, before deciding that darn doesn't work and failing the
      registration of get_random_seed(). That way an intermittent failure
      at boot doesn't deprive the system of randomness until the next reboot.
      Signed-off-by: NMatt Brown <matthew.brown.dev@gmail.com>
      [mpe: Move init into a function, tweak change log]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e66ca3db
    • F
      powerpc/powernv: Enable PCI peer-to-peer · 25529100
      Frederic Barrat 提交于
      P9 has support for PCI peer-to-peer, enabling a device to write in the
      MMIO space of another device directly, without interrupting the CPU.
      
      This patch adds support for it on powernv, by adding a new API to be
      called by drivers. The pnv_pci_set_p2p(...) call configures an
      'initiator', i.e the device which will issue the MMIO operation, and a
      'target', i.e. the device on the receiving side.
      
      P9 really only supports MMIO stores for the time being but that's
      expected to change in the future, so the API allows to define both
      load and store operations.
      
        /* PCI p2p descriptor */
        #define OPAL_PCI_P2P_ENABLE           0x1
        #define OPAL_PCI_P2P_LOAD             0x2
        #define OPAL_PCI_P2P_STORE            0x4
      
        int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
                            u64 desc)
      
      It uses a new OPAL call, as the configuration magic is done on the
      PHBs by skiboot.
      Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: NRussell Currey <ruscur@russell.cc>
      [mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      25529100
  8. 01 8月, 2017 1 次提交
    • G
      powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug · 24be85a2
      Gautham R. Shenoy 提交于
      Currently we use the stop-api provided by the firmware to program the
      SLW engine to restore the values of hypervisor resources that get lost
      on deeper idle states (such as winkle). Since the deep states were
      only used for CPU-Hotplug on POWER8 systems, we would program the LPCR
      to have the PECE1 bit since Hotplugged CPUs shouldn't be spuriously
      woken up by decrementer.
      
      On POWER9, some of the deep platform idle states such as stop4 can be
      used in cpuidle as well. In this case, we want the CPU in stop4 to be
      woken up by the decrementer when some timer on the CPU expires.
      
      In this patch, we program the stop-api for LPCR with PECE1
      bit cleared only when we are offlining the CPU and set it
      back once the CPU is online.
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      24be85a2
  9. 28 7月, 2017 1 次提交
  10. 25 7月, 2017 2 次提交
  11. 24 7月, 2017 3 次提交
  12. 17 7月, 2017 1 次提交
    • M
      powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores() · a70b487b
      Michael Ellerman 提交于
      In commit 1c0eaf0f ("powerpc/powernv: Tell OPAL about our MMU mode
      on POWER9"), we added additional flags to the OPAL call to configure
      CPUs at boot.
      
      These flags only work on Power9 firmwares, and worse can cause boot
      failures on Power8 machines, so we check for CPU_FTR_ARCH_300 (aka POWER9)
      before adding the extra flags.
      
      Unfortunately we forgot that opal_configure_cores() is called before
      the CPU feature checks are dynamically patched, meaning the check
      always returns true.
      
      We definitely need to do something to make the CPU feature checks less
      prone to bugs like this, but for now the minimal fix is to use
      early_cpu_has_feature().
      Reported-and-tested-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
      Fixes: 1c0eaf0f ("powerpc/powernv: Tell OPAL about our MMU mode on POWER9")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a70b487b
  13. 10 7月, 2017 1 次提交
  14. 28 6月, 2017 2 次提交
  15. 27 6月, 2017 5 次提交
    • R
      powerpc/powernv/pci: Enable 64-bit devices to access >4GB DMA space · 8e3f1b1d
      Russell Currey 提交于
      On PHB3/POWER8 systems, devices can select between two different sections
      of address space, TVE#0 and TVE#1.  TVE#0 is intended for 32bit devices
      that aren't capable of addressing more than 4GB.  Selecting TVE#1 instead,
      with the capability of addressing over 4GB, is performed by setting bit 59
      of a PCI address.
      
      However, some devices aren't capable of addressing at least 59 bits, but
      still want more than 4GB of DMA space.  In order to enable this, reconfigure
      TVE#0 to be suitable for 64-bit devices by allocating memory past the
      initial 4GB that is inaccessible by 64-bit DMAs.
      
      This bypass mode is only enabled if a device requests 4GB or more of DMA
      address space, if the system has PHB3 (POWER8 systems), and if the device
      does not share a PE with any devices from different vendors.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8e3f1b1d
    • R
      powerpc/powernv/pci: Add helper to check if a PE has a single vendor · a0f98629
      Russell Currey 提交于
      Add a helper that determines if all the devices contained in a given PE
      are all from the same vendor or not.  This can be useful in determining
      if it's okay to make PE-wide changes that may be suitable for some
      devices but not for others.
      
      This is used later in the series.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a0f98629
    • R
      powerpc/powernv/pci: Add support for PHB4 diagnostics · a4b48ba9
      Russell Currey 提交于
      As with P7IOC and PHB3, add kernel-side support for decoding and printing
      diagnostic data for PHB4.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a4b48ba9
    • R
      powerpc/powernv/pci: Dynamically allocate PHB diag data · 5cb1f8fd
      Russell Currey 提交于
      Diagnostic data for PHBs currently works by allocated a fixed-sized buffer.
      This is simple, but either wastes memory (though only a few kilobytes) or
      in the case of PHB4 isn't enough to fit the whole data blob.
      
      For machines that don't describe the diagnostic data size in the device
      tree, use the hardcoded buffer size as before.  For those that do, only
      allocate exactly what's needed.
      
      In the special case of P7IOC (which has two types of diag data), the larger
      should be specified in the device tree.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5cb1f8fd
    • R
      powerpc/powernv/pci: Reduce spam when dumping PEST · 31bbd45a
      Russell Currey 提交于
      Dumping the PE State Tables (PEST) can be highly verbose if a number of PEs
      are affected, especially in the case where the whole PHB is frozen and 512
      lines get printed.  Check for duplicates when dumping the PEST to reduce
      useless output.
      
      For example:
      
          PE[0f8] A/B: 9700002600000000 80000080d00000f8
          PE[0f9] A/B: 8000000000000000 0000000000000000
          PE[..0fe] A/B: as above
          PE[0ff] A/B: 8440002b00000000 0000000000000000
      
      instead of:
      
          PE[0f8] A/B: 9700002600000000 80000080d00000f8
          PE[0f9] A/B: 8000000000000000 0000000000000000
          PE[0fa] A/B: 8000000000000000 0000000000000000
          PE[0fb] A/B: 8000000000000000 0000000000000000
          PE[0fc] A/B: 8000000000000000 0000000000000000
          PE[0fd] A/B: 8000000000000000 0000000000000000
          PE[0fe] A/B: 8000000000000000 0000000000000000
          PE[0ff] A/B: 8440002b00000000 0000000000000000
      
      and you can imagine how much worse it can get for 512 PEs.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      31bbd45a
  16. 22 6月, 2017 1 次提交
  17. 19 6月, 2017 4 次提交
  18. 14 6月, 2017 1 次提交