1. 28 4月, 2014 26 次提交
    • G
      powerpc/powernv: Reset root port in firmware · fd5cee7c
      Gavin Shan 提交于
      Resetting root port has more stuff to do than that for PCIe switch
      ports and we should have resetting root port done in firmware instead
      of the kernel itself. The problem was introduced by commit 5b2e198e
      ("powerpc/powernv: Rework EEH reset").
      
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      fd5cee7c
    • G
      powerpc/powernv: Fix endless reporting frozen PE · 63796558
      Gavin Shan 提交于
      Once one specific PE has been marked as EEH_PE_ISOLATED, it's in
      the middile of recovery or removed permenently. We needn't report
      the frozen PE again. Otherwise, we will have endless reporting
      same frozen PE.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      63796558
    • G
      powerpc/eeh: No hotplug on permanently removed dev · d2b0f6f7
      Gavin Shan 提交于
      The issue was detected in a bit complicated test case where
      we have multiple hierarchical PEs shown as following figure:
      
                      +-----------------+
                      | PE#3     p2p#0  |
                      |          p2p#1  |
                      +-----------------+
                              |
                      +-----------------+
                      | PE#4     pdev#0 |
                      |          pdev#1 |
                      +-----------------+
      
      PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p
      bridges. We accidentally had less-known scenario: PE#4 was removed
      permanently from the system because of permanent failure (e.g.
      exceeding the max allowd failure times in last hour), then we detects
      EEH errors on PE#3 and tried to recover it. However, eeh_dev instances
      for pdev#0/1 were not detached from PE#4, which was still connected to
      PE#3. All of that was because of the fact that we rely on count-based
      pcibios_release_device(), which isn't reliable enough. When doing
      recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which
      are not valid any more. Eventually, we run into kernel crash.
      
      The patch fixes above issue from two aspects. For unplug, we simply
      skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED
      && !EEH_PE_STATE_RECOVERING) and its frozen count should be greater
      than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently
      removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read
      its PCI config so that PCI core will omit them.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d2b0f6f7
    • G
      powerpc/eeh: Allow to disable EEH · 7f52a526
      Gavin Shan 提交于
      The patch introduces bootarg "eeh=off" to disable EEH functinality.
      Also, it creates /sys/kerenl/debug/powerpc/eeh_enable to disable
      or enable EEH functionality. By default, we have the functionality
      enabled.
      
      For PowerNV platform, we will restore to have the conventional
      mechanism of clearing frozen PE during PCI config access if we're
      going to disable EEH functionality. Conversely, we will rely on
      EEH for error recovery.
      
      The patch also fixes the issue that we missed to cover the case
      of disabled EEH functionality in function ioda_eeh_event(). Those
      events driven by interrupt should be cleared to avoid endless
      reporting.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7f52a526
    • G
      powerpc/eeh: Use cached capability for log dump · 2a18dfc6
      Gavin Shan 提交于
      When calling into eeh_gather_pci_data() on pSeries platform, we
      possiblly don't have pci_dev instance yet, but eeh_dev is always
      ready. So we use cached capability from eeh_dev instead of pci_dev
      for log dump there. In order to keep things unified, we also cache
      PCI capability positions to eeh_dev for PowerNV as well.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2a18dfc6
    • G
      powerpc/eeh: Avoid I/O access during PE reset · 78954700
      Gavin Shan 提交于
      We have suffered recrusive frozen PE a lot, which was caused
      by IO accesses during the PE reset. Ben came up with the good
      idea to keep frozen PE until recovery (BAR restore) gets done.
      With that, IO accesses during PE reset are dropped by hardware
      and wouldn't incur the recrusive frozen PE any more.
      
      The patch implements the idea. We don't clear the frozen state
      until PE reset is done completely. During the period, the EEH
      core expects unfrozen state from backend to keep going. So we
      have to reuse EEH_PE_RESET flag, which has been set during PE
      reset, to return normal state from backend. The side effect is
      we have to clear frozen state for towice (PE reset and clear it
      explicitly), but that's harmless.
      
      We have some limitations on pHyp. pHyp doesn't allow to enable
      IO or DMA for unfrozen PE. So we don't enable them on unfrozen PE
      in eeh_pci_enable(). We have to enable IO before grabbing logs on
      pHyp. Otherwise, 0xFF's is always returned from PCI config space.
      Also, we had wrong return value from eeh_pci_enable() for
      EEH_OPT_THAW_DMA case. The patch fixes it too.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      78954700
    • G
      powerpc/powernv: Use EEH PCI config accessors · 1d9a5446
      Gavin Shan 提交于
      For EEH PowerNV backends, they need use their own PCI config
      accesors as the normal one could be blocked during PE reset.
      The patch also removes necessary parameter "hose" for the
      function ioda_eeh_bridge_reset().
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1d9a5446
    • G
      powerpc/eeh: Block PCI-CFG access during PE reset · d0914f50
      Gavin Shan 提交于
      We've observed multiple PE reset failures because of PCI-CFG
      access during that period. Potentially, some device drivers
      can't support EEH very well and they can't put the device to
      motionless state before PE reset. So those device drivers might
      produce PCI-CFG accesses during PE reset. Also, we could have
      PCI-CFG access from user space (e.g. "lspci"). Since access to
      frozen PE should return 0xFF's, we can block PCI-CFG access
      during the period of PE reset so that we won't get recrusive EEH
      errors.
      
      The patch adds flag EEH_PE_RESET, which is kept during PE reset.
      The PowerNV/pSeries PCI-CFG accessors reuse the flag to block
      PCI-CFG accordingly.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d0914f50
    • G
      powerpc/powernv: Remove fields in PHB diag-data dump · b34497d1
      Gavin Shan 提交于
      For some fields (e.g. LEM, MMIO, DMA) in PHB diag-data dump, it's
      meaningless to print them if they have non-zero value in the
      corresponding mask registers because we always have non-zero values
      in the mask registers. The patch only prints those fieds if we
      have non-zero values in the primary registers (e.g. LEM, MMIO, DMA
      status) so that we can save couple of lines. The patch also removes
      unnecessary spare line before "brdgCtl:" and two leading spaces as
      prefix in each line as Ben suggested.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b34497d1
    • G
      powerpc/powernv: Move PNV_EEH_STATE_ENABLED around · f5bc6b70
      Gavin Shan 提交于
      The flag PNV_EEH_STATE_ENABLED is put into pnv_phb::eeh_state,
      which is protected by CONFIG_EEH. We needn't that. Instead, we
      can have pnv_phb::flags and maintain all flags there, which is
      the purpose of the patch. The patch also renames PNV_EEH_STATE_ENABLED
      to PNV_PHB_FLAG_EEH.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f5bc6b70
    • G
      powerpc/powernv: Remove PNV_EEH_STATE_REMOVED · 467f79a9
      Gavin Shan 提交于
      The PHB state PNV_EEH_STATE_REMOVED maintained in pnv_phb isn't
      so useful any more and it's duplicated to EEH_PE_ISOLATED. The
      patch replaces PNV_EEH_STATE_REMOVED with EEH_PE_ISOLATED.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      467f79a9
    • P
      ppc/powernv: Set the runlatch bits correctly for offline cpus · f2038911
      Preeti U Murthy 提交于
      Up until now we have been setting the runlatch bits for a busy CPU and
      clearing it when a CPU enters idle state. The runlatch bit has thus
      been consistent with the utilization of a CPU as long as the CPU is online.
      
      However when a CPU is hotplugged out the runlatch bit is not cleared. It
      needs to be cleared to indicate an unused CPU. Hence this patch has the
      runlatch bit cleared for an offline CPU just before entering an idle state
      and sets it immediately after it exits the idle state.
      Signed-off-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f2038911
    • A
    • A
      powerpc/powernv: Create OPAL sglist helper functions and fix endian issues · 3441f04b
      Anton Blanchard 提交于
      We have two copies of code that creates an OPAL sg list. Consolidate
      these into a common set of helpers and fix the endian issues.
      
      The flash interface embedded a version number in the num_entries
      field, whereas the dump interface did did not. Since versioning
      wasn't added to the flash interface and it is impossible to add
      this in a backwards compatible way, just remove it.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      3441f04b
    • A
      powerpc/powernv: Fix little endian issues in OPAL error log code · 14ad0c58
      Anton Blanchard 提交于
      Fix little endian issues with the OPAL error log code.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Reviewed-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      14ad0c58
    • A
      powerpc/powernv: Fix little endian issues with opal_do_notifier calls · 56b4c993
      Anton Blanchard 提交于
      The bitmap in opal_poll_events and opal_handle_interrupt is
      big endian, so we need to byteswap it on little endian builds.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      56b4c993
    • A
      powerpc/powernv: Use uint64_t instead of size_t in OPAL APIs · 2bad7423
      Anton Blanchard 提交于
      Using size_t in our APIs is asking for trouble, especially
      when some OPAL calls use size_t pointers.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Reviewed-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2bad7423
    • W
      powerpc/powernv: Release the refcount for pci_dev · 4966bfa1
      Wei Yang 提交于
      On PowerNV platform, we are holding an unnecessary refcount on a pci_dev, which
      leads to the pci_dev is not destroyed when hotplugging a pci device.
      
      This patch release the unnecessary refcount.
      Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4966bfa1
    • W
      powerpc/powernv: Reduce multi-hit of iommu_add_device() · 3f28c5af
      Wei Yang 提交于
      During the EEH hotplug event, iommu_add_device() will be invoked three times
      and two of them will trigger warning or error.
      
      The three times to invoke the iommu_add_device() are:
      
          pci_device_add
             ...
             set_iommu_table_base_and_group   <- 1st time, fail
          device_add
             ...
             tce_iommu_bus_notifier           <- 2nd time, succees
          pcibios_add_pci_devices
             ...
             pcibios_setup_bus_devices        <- 3rd time, re-attach
      
      The first time fails, since the dev->kobj->sd is not initialized. The
      dev->kobj->sd is initialized in device_add().
      The third time's warning is triggered by the re-attach of the iommu_group.
      
      After applying this patch, the error
      
          iommu_tce: 0003:05:00.0 has not been added, ret=-14
      
      and the warning
      
          [  204.123609] ------------[ cut here ]------------
          [  204.123645] WARNING: at arch/powerpc/kernel/iommu.c:1125
          [  204.123680] Modules linked in: xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT bnep bluetooth 6lowpan_iphc rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc mlx4_ib ib_sa ib_mad ib_core ib_addr ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnx2x tg3 mlx4_core nfsd ptp mdio ses libcrc32c nfs_acl enclosure be2net pps_core shpchp lockd kvm uinput sunrpc binfmt_misc lpfc scsi_transport_fc ipr scsi_tgt
          [  204.124356] CPU: 18 PID: 650 Comm: eehd Not tainted 3.14.0-rc5yw+ #102
          [  204.124400] task: c0000027ed485670 ti: c0000027ed50c000 task.ti: c0000027ed50c000
          [  204.124453] NIP: c00000000003cf80 LR: c00000000006c648 CTR: c00000000006c5c0
          [  204.124506] REGS: c0000027ed50f440 TRAP: 0700   Not tainted  (3.14.0-rc5yw+)
          [  204.124558] MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 88008084  XER: 20000000
          [  204.124682] CFAR: c00000000006c644 SOFTE: 1
          GPR00: c00000000006c648 c0000027ed50f6c0 c000000001398380 c0000027ec260300
          GPR04: c0000027ea92c000 c00000000006ad00 c0000000016e41b0 0000000000000110
          GPR08: c0000000012cd4c0 0000000000000001 c0000027ec2602ff 0000000000000062
          GPR12: 0000000028008084 c00000000fdca200 c0000000000d1d90 c0000027ec281a80
          GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
          GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
          GPR24: 000000005342697b 0000000000002906 c000001fe6ac9800 c000001fe6ac9800
          GPR28: 0000000000000000 c0000000016e3a80 c0000027ea92c090 c0000027ea92c000
          [  204.125353] NIP [c00000000003cf80] .iommu_add_device+0x30/0x1f0
          [  204.125399] LR [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
          [  204.125443] Call Trace:
          [  204.125464] [c0000027ed50f6c0] [c0000027ed50f750] 0xc0000027ed50f750 (unreliable)
          [  204.125526] [c0000027ed50f750] [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
          [  204.125588] [c0000027ed50f7d0] [c000000000069cc8] .pnv_pci_dma_dev_setup+0x78/0x340
          [  204.125650] [c0000027ed50f870] [c000000000044408] .pcibios_setup_device+0x88/0x2f0
          [  204.125712] [c0000027ed50f940] [c000000000046040] .pcibios_setup_bus_devices+0x60/0xd0
          [  204.125774] [c0000027ed50f9c0] [c000000000043acc] .pcibios_add_pci_devices+0xdc/0x1c0
          [  204.125837] [c0000027ed50fa50] [c00000000086f970] .eeh_reset_device+0x36c/0x4f0
          [  204.125939] [c0000027ed50fb20] [c00000000003a2d8] .eeh_handle_normal_event+0x448/0x480
          [  204.126068] [c0000027ed50fbc0] [c00000000003a35c] .eeh_handle_event+0x4c/0x340
          [  204.126192] [c0000027ed50fc80] [c00000000003a74c] .eeh_event_handler+0xfc/0x1b0
          [  204.126319] [c0000027ed50fd30] [c0000000000d1ea0] .kthread+0x110/0x130
          [  204.126430] [c0000027ed50fe30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
          [  204.126556] Instruction dump:
          [  204.126610] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff71 7c7e1b78 60000000
          [  204.126787] 60000000 e87e0298 3143ffff 7d2a1910 <0b090000> 2fa90000 40de00c8 ebfe0218
          [  204.126966] ---[ end trace 6e7aefd80add2973 ]---
      
      are cleared.
      
      This patch removes iommu_add_device() in pnv_pci_ioda_dma_dev_setup(), which
      revert part of the change in commit d905c5df(PPC: POWERNV: move
      iommu_add_device earlier).
      Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      3f28c5af
    • A
      powerpc/powernv: Fix little endian issues in OPAL flash code · cc146d1d
      Anton Blanchard 提交于
      With this patch I was able to update firmware on an LE kernel.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cc146d1d
    • B
      powerpc/powernv: Fix kexec races going back to OPAL · 298b34d7
      Benjamin Herrenschmidt 提交于
      We have a subtle race when sending CPUs back to OPAL on kexec.
      
      We mark them as "in real mode" right before we send them down. Once
      we've booted the new kernel, it might try to call opal_reinit_cpus()
      to change endianness, and that requires all CPUs to be spinning inside
      OPAL.
      
      However there is no synchronization here and we've observed cases
      where the returning CPUs hadn't established their new state inside
      OPAL before opal_reinit_cpus() is called, causing it to fail.
      
      The proper fix is to actually wait for them to go down all the way
      from the kexec'ing kernel.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      298b34d7
    • J
      powerpc/powernv: Check sysparam size before creation · 63aecfb2
      Joel Stanley 提交于
      The size of the sysparam sysfs files is determined from the device tree
      at boot. However the buffer is hard coded to 64 bytes. If we encounter a
      parameter that is larger than 64, or miss-parse the device tree, the
      buffer will overflow when reading or writing to the parameter.
      
      Check it at discovery time, and if the parameter is too large, do not
      create a sysfs entry for it.
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      63aecfb2
    • J
      16003d23
    • J
      powerpc/powernv: Check sysfs size before copying · 85390378
      Joel Stanley 提交于
      The sysparam code currently uses the userspace supplied number of
      bytes when memcpy()ing in to a local 64-byte buffer.
      
      Limit the maximum number of bytes by the size of the buffer.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      85390378
    • J
      powerpc/powernv: Use ssize_t for sysparam return values · b8569d23
      Joel Stanley 提交于
      The OPAL calls are returning int64_t values, which the sysparam code
      stores in an int, and the sysfs callback returns ssize_t. Make code a
      easier to read by consistently using ssize_t.
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b8569d23
    • J
      powerpc/powernv: Fix sysparam sysfs error handling · ba9a32b1
      Joel Stanley 提交于
      When a sysparam query in OPAL returned a negative value (error code),
      sysfs would spew out a decent chunk of memory; almost 64K more than
      expected. This was traced to a sign/unsigned mix up in the OPAL sysparam
      sysfs code at sys_param_show.
      
      The return value of sys_param_show is a ssize_t, calculated using
      
        return ret ? ret : attr->param_size;
      
      Alan Modra explains:
      
        "attr->param_size" is an unsigned int, "ret" an int, so the overall
        expression has type unsigned int.  Result is that ret is cast to
        unsigned int before being cast to ssize_t.
      
      Instead of using the ternary operator, set ret to the param_size if an
      error is not detected. The same bug exists in the sysfs write callback;
      this patch fixes it in the same way.
      
      A note on debugging this next time: on my system gcc will warn about
      this if compiled with -Wsign-compare, which is not enabled by -Wall,
      only -Wextra.
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ba9a32b1
  2. 09 4月, 2014 5 次提交
  3. 07 4月, 2014 3 次提交
  4. 24 3月, 2014 3 次提交
  5. 07 3月, 2014 3 次提交
    • S
      powerpc/powernv Platform dump interface · c7e64b9c
      Stewart Smith 提交于
      This enables support for userspace to fetch and initiate FSP and
      Platform dumps from the service processor (via firmware) through sysfs.
      
      Based on original patch from Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      
      Flow:
        - We register for OPAL notification events.
        - OPAL sends new dump available notification.
        - We make information on dump available via sysfs
        - Userspace requests dump contents
        - We retrieve the dump via OPAL interface
        - User copies the dump data
        - userspace sends ack for dump
        - We send ACK to OPAL.
      
      sysfs files:
        - We add the /sys/firmware/opal/dump directory
        - echoing 1 (well, anything, but in future we may support
          different dump types) to /sys/firmware/opal/dump/initiate_dump
          will initiate a dump.
        - Each dump that we've been notified of gets a directory
          in /sys/firmware/opal/dump/ with a name of the dump type and ID (in hex,
          as this is what's used elsewhere to identify the dump).
        - Each dump has files: id, type, dump and acknowledge
          dump is binary and is the dump itself.
          echoing 'ack' to acknowledge (currently any string will do) will
          acknowledge the dump and it will soon after disappear from sysfs.
      
      OPAL APIs:
        - opal_dump_init()
        - opal_dump_info()
        - opal_dump_read()
        - opal_dump_ack()
        - opal_dump_resend_notification()
      
      Currently we are only ever notified for one dump at a time (until
      the user explicitly acks the current dump, then we get a notification
      of the next dump), but this kernel code should "just work" when OPAL
      starts notifying us of all the dumps present.
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c7e64b9c
    • S
      powerpc/powernv: Read OPAL error log and export it through sysfs · 774fea1a
      Stewart Smith 提交于
      Based on a patch by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      
      This patch adds support to read error logs from OPAL and export
      them to userspace through a sysfs interface.
      
      We export each log entry as a directory in /sys/firmware/opal/elog/
      
      Currently, OPAL will buffer up to 128 error log records, we don't
      need to have any knowledge of this limit on the Linux side as that
      is actually largely transparent to us.
      
      Each error log entry has the following files: id, type, acknowledge, raw.
      Currently we just export the raw binary error log in the 'raw' attribute.
      In a future patch, we may parse more of the error log to make it a bit
      easier for userspace (e.g. to be able to display a brief summary in
      petitboot without having to have a full parser).
      
      If we have >128 logs from OPAL, we'll only be notified of 128 until
      userspace starts acknowledging them. This limitation may be lifted in
      the future and with this patch, that should "just work" from the linux side.
      
      A userspace daemon should:
      - wait for error log entries using normal mechanisms (we announce creation)
      - read error log entry
      - save error log entry safely to disk
      - acknowledge the error log entry
      - rinse, repeat.
      
      On the Linux side, we read the error log when we're notified of it. This
      possibly isn't ideal as it would be better to only read them on-demand.
      However, this doesn't really work with current OPAL interface, so we
      read the error log immediately when notified at the moment.
      
      I've tested this pretty extensively and am rather confident that the
      linux side of things works rather well. There is currently an issue with
      the service processor side of things for >128 error logs though.
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      774fea1a
    • M
      powerpc/book3s: Recover from MC in sapphire on SCOM read via MMIO. · 55672ecf
      Mahesh Salgaonkar 提交于
      Detect and recover from machine check when inside opal on a special
      scom load instructions. On specific SCOM read via MMIO we may get a machine
      check exception with SRR0 pointing inside opal. To recover from MC
      in this scenario, get a recovery instruction address and return to it from
      MC.
      
      OPAL will export the machine check recoverable ranges through
      device tree node mcheck-recoverable-ranges under ibm,opal:
      
      # hexdump /proc/device-tree/ibm,opal/mcheck-recoverable-ranges
      0000000 0000 0000 3000 2804 0000 000c 0000 0000
      0000010 3000 2814 0000 0000 3000 27f0 0000 000c
      0000020 0000 0000 3000 2814 xxxx xxxx xxxx xxxx
      0000030 llll llll yyyy yyyy yyyy yyyy
      ...
      ...
      #
      
      where:
      	xxxx xxxx xxxx xxxx = Starting instruction address
      	llll llll           = Length of the address range.
      	yyyy yyyy yyyy yyyy = recovery address
      
      Each recoverable address range entry is (start address, len,
      recovery address), 2 cells each for start and recovery address, 1 cell for
      len, totalling 5 cells per entry. During kernel boot time, build up the
      recovery table with the list of recovery ranges from device-tree node which
      will be used during machine check exception to recover from MMIO SCOM UE.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      55672ecf