1. 21 7月, 2016 1 次提交
  2. 08 7月, 2016 1 次提交
  3. 29 6月, 2016 1 次提交
  4. 09 2月, 2016 1 次提交
  5. 27 12月, 2015 2 次提交
    • A
      powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery() · dc3799bb
      Andrew Donnellan 提交于
      Fix off-by-one error in opal_mce_check_early_recovery() when checking
      whether the NIP falls within OPAL space.
      Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dc3799bb
    • R
      powerpc/powernv: Add a kmsg_dumper that flushes console output on panic · affddff6
      Russell Currey 提交于
      On BMC machines, console output is controlled by the OPAL firmware and is
      only flushed when its pollers are called.  When the kernel is in a panic
      state, it no longer calls these pollers and thus console output does not
      completely flush, causing some output from the panic to be lost.
      
      Output is only actually lost when the kernel is configured to not power off
      or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
      flushes the console buffer as part of its power down routines.  Before this
      patch, however, only partial output would be printed during the timeout wait.
      
      This patch adds a new kmsg_dumper which gets called at panic time to ensure
      panic output is not lost.  It accomplishes this by calling OPAL_CONSOLE_FLUSH
      in the OPAL API, and if that is not available, the pollers are called enough
      times to (hopefully) completely flush the buffer.
      
      The flushing mechanism will only affect output printed at and before the
      kmsg_dump call in kernel/panic.c:panic().  As such, the "end Kernel panic"
      message may still be truncated as follows:
      
      >Call Trace:
      >[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
      >[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
      >[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
      >[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
      >[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
      >[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
      >[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
      >---[ end Kernel panic - not
      
      This functionality is implemented as a kmsg_dumper as it seems to be the
      most sensible way to introduce platform-specific functionality to the
      panic function.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      affddff6
  6. 17 12月, 2015 4 次提交
  7. 09 10月, 2015 1 次提交
    • D
      powerpc/powernv: Panic on unhandled Machine Check · f2dd80ec
      Daniel Axtens 提交于
      All unrecovered machine check errors on PowerNV should cause an
      immediate panic. There are 2 reasons that this is the right policy:
      it's not safe to continue, and we're already trying to reboot.
      
      Firstly, if we go through the recovery process and do not successfully
      recover, we can't be sure about the state of the machine, and it is
      not safe to recover and proceed.
      
      Linux knows about the following sources of Machine Check Errors:
      - Uncorrectable Errors (UE)
      - Effective - Real Address Translation (ERAT)
      - Segment Lookaside Buffer (SLB)
      - Translation Lookaside Buffer (TLB)
      - Unknown/Unrecognised
      
      In the SLB, TLB and ERAT cases, we can further categorise these as
      parity errors, multihit errors or unknown/unrecognised.
      
      We can handle SLB errors by flushing and reloading the SLB. We can
      handle TLB and ERAT multihit errors by flushing the TLB. (It appears
      we may not handle TLB and ERAT parity errors: I will investigate
      further and send a followup patch if appropriate.)
      
      This leaves us with uncorrectable errors. Uncorrectable errors are
      usually the result of ECC memory detecting an error that it cannot
      correct, but they also crop up in the context of PCI cards failing
      during DMA writes, and during CAPI error events.
      
      There are several types of UE, and there are 3 places a UE can occur:
      Skiboot, the kernel, and userspace. For Skiboot errors, we have the
      facility to make some recoverable. For userspace, we can simply kill
      (SIGBUS) the affected process. We have no meaningful way to deal with
      UEs in kernel space or in unrecoverable sections of Skiboot.
      
      Currently, these unrecovered UEs fall through to
      machine_check_expection() in traps.c, which calls die(), which OOPSes
      and sends SIGBUS to the process. This sometimes allows us to stumble
      onwards. For example we've seen UEs kill the kernel eehd and
      khugepaged. However, the process killed could have held a lock, or it
      could have been a more important process, etc: we can no longer make
      any assertions about the state of the machine. Similarly if we see a
      UE in skiboot (and again we've seen this happen), we're not in a
      position where we can make any assertions about the state of the
      machine.
      
      Likewise, for unknown or unrecognised errors, we're not able to say
      anything about the state of the machine.
      
      Therefore, if we have an unrecovered MCE, the most appropriate thing
      to do is to panic.
      
      The second reason is that since e784b649 ("powerpc/powernv: Invoke
      opal_cec_reboot2() on unrecoverable machine check errors."), we
      attempt a special OPAL reboot on an unhandled MCE. This is so the
      hardware can record error data for later debugging.
      
      The comments in that commit assert that we are heading down the panic
      path anyway. At the moment this is not always true. With UEs in kernel
      space, for instance, they are marked as recoverable by the hardware,
      so if the attempt to reboot failed (e.g. old Skiboot), we wouldn't
      panic() but would simply die() and OOPS. It doesn't make sense to be
      staggering on if we've just tried to reboot: we should panic().
      
      Explicitly panic() on unrecovered MCEs on PowerNV.
      Update the comments appropriately.
      
      This fixes some hangs following EEH events on cxlflash setups.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f2dd80ec
  8. 20 8月, 2015 1 次提交
  9. 06 8月, 2015 1 次提交
  10. 18 6月, 2015 1 次提交
  11. 05 6月, 2015 3 次提交
  12. 04 6月, 2015 1 次提交
  13. 22 5月, 2015 4 次提交
  14. 11 4月, 2015 1 次提交
  15. 31 3月, 2015 1 次提交
  16. 25 3月, 2015 3 次提交
  17. 16 3月, 2015 2 次提交
  18. 28 1月, 2015 3 次提交
  19. 27 1月, 2015 1 次提交
  20. 15 12月, 2014 1 次提交
  21. 14 12月, 2014 1 次提交
  22. 02 12月, 2014 1 次提交
  23. 17 11月, 2014 1 次提交
    • N
      rtc/tpo: Driver to support rtc and wakeup on PowerNV platform · 16b1d26e
      Neelesh Gupta 提交于
      The patch implements the OPAL rtc driver that binds with the rtc
      driver subsystem. The driver uses the platform device infrastructure
      to probe the rtc device and register it to rtc class framework. The
      'wakeup' is supported depending upon the property 'has-tpo' present
      in the OF node. It provides a way to load the generic rtc driver in
      in the absence of an OPAL driver.
      
      The patch also moves the existing OPAL rtc get/set time interfaces to the
      new driver and exposes the necessary OPAL calls using EXPORT_SYMBOL_GPL.
      
      Test results:
      -------------
      Host:
      [root@tul169p1 ~]# ls -l /sys/class/rtc/
      total 0
      lrwxrwxrwx 1 root root 0 Oct 14 03:07 rtc0 -> ../../devices/opal-rtc/rtc/rtc0
      [root@tul169p1 ~]# cat /sys/devices/opal-rtc/rtc/rtc0/time
      08:10:07
      [root@tul169p1 ~]# echo `date '+%s' -d '+ 2 minutes'` > /sys/class/rtc/rtc0/wakealarm
      [root@tul169p1 ~]# cat /sys/class/rtc/rtc0/wakealarm
      1413274345
      [root@tul169p1 ~]#
      
      FSP:
      $ smgr mfgState
      standby
      $ rtim timeofday
      
      System time is valid: 2014/10/14 08:12:04.225115
      
      $ smgr mfgState
      ipling
      $
      
      CC: devicetree@vger.kernel.org
      CC: tglx@linutronix.de
      CC: rtc-linux@googlegroups.com
      CC: a.zummo@towertech.it
      Signed-off-by: NNeelesh Gupta <neelegup@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      16b1d26e
  24. 12 11月, 2014 1 次提交
  25. 13 10月, 2014 1 次提交
    • M
      powerpc/powernv: Fallback to old HMI handling behavior for old firmware · 6507955c
      Mahesh Salgaonkar 提交于
      Recently we moved HMI handling into Linux kernel instead of taking
      HMI directly in OPAL. This new change is dependent on new OPAL call
      for HMI recovery which was introduced in newer firmware. While this new
      change works fine with latest OPAL firmware, we broke the HMI handling
      if we run newer kernel on old OPAL firmware that results in system hang.
      
      This patch fixes this issue by falling back to old HMI behavior on older
      OPAL firmware.
      
      This patch introduces a check for opal token OPAL_HANDLE_HMI to see
      if we are running on newer firmware or old firmware. On newer firmware
      this check would return OPAL_TOKEN_PRESENT, otherwise we are running on
      old firmware and fallback to old HMI behavior.
      
      Old firmware: POWER8 System Firmware Release as of today <= SV810_087
      Action: Let OPAL handle HMIs
      
      Newer firmware: in development/yet to be released.
      Action: Let Linux host handle HMIs.
      
      This patch depends on opal check token patch posted at ppc-devel
      https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-August/120224.htmlSigned-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      [mpe: Minor comment and printk rewording]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6507955c
  26. 02 10月, 2014 1 次提交