1. 20 5月, 2010 6 次提交
    • H
      ACPI, APEI, Error Record Serialization Table (ERST) support · a08f82d0
      Huang Ying 提交于
      ERST is a way provided by APEI to save and retrieve hardware error
      record to and from some simple persistent storage (such as flash).
      
      The Linux kernel support implementation is quite simple and workable
      in NMI context. So it can be used to save hardware error record into
      flash in hardware error exception or NMI handler, where other more
      complex persistent storage such as disk is not usable. After saving
      hardware error records via ERST in hardware error exception or NMI
      handler, the error records can be retrieved and logged into disk or
      network after a clean reboot.
      
      For more information about ERST, please refer to ACPI Specification
      version 4.0, section 17.4.
      
      This patch incorporate fixes from Jin Dongming.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      a08f82d0
    • H
      ACPI, APEI, Generic Hardware Error Source memory error support · d334a491
      Huang Ying 提交于
      Generic Hardware Error Source provides a way to report platform
      hardware errors (such as that from chipset). It works in so called
      "Firmware First" mode, that is, hardware errors are reported to
      firmware firstly, then reported to Linux by firmware. This way, some
      non-standard hardware error registers or non-standard hardware link
      can be checked by firmware to produce more valuable hardware error
      information for Linux.
      
      Now, only SCI notification type and memory errors are supported. More
      notification type and hardware error type will be added later. These
      memory errors are reported to user space through /dev/mcelog via
      faking a corrected Machine Check, so that the error memory page can be
      offlined by /sbin/mcelog if the error count for one page is beyond the
      threshold.
      
      On some machines, Machine Check can not report physical address for
      some corrected memory errors, but GHES can do that. So this simplified
      GHES is implemented firstly.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      d334a491
    • H
      ACPI, APEI, UEFI Common Platform Error Record (CPER) header · 06d65dea
      Huang Ying 提交于
      CPER stands for Common Platform Error Record, it is the hardware error
      record format used to describe platform hardware error by various APEI
      tables, such as ERST, BERT and HEST etc.
      
      For more information about CPER, please refer to Appendix N of UEFI
      Specification version 2.3.
      
      This patch mainly includes the data structure difinition header file
      used by other files.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      06d65dea
    • H
      ACPI, APEI, EINJ support · e4021345
      Huang Ying 提交于
      EINJ provides a hardware error injection mechanism, this is useful for
      debugging and testing of other APEI and RAS features.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      e4021345
    • H
      ACPI, APEI, HEST table parsing · 9dc96664
      Huang Ying 提交于
      HEST describes error sources in detail; communicating operational
      parameters (i.e. severity levels, masking bits, and threshold values)
      to OS as necessary. It also allows the platform to report error
      sources for which OS would typically not implement support (for
      example, chipset-specific error registers).
      
      HEST information may be needed by other subsystems. For example, HEST
      PCIE AER error source information describes whether a PCIE root port
      works in "firmware first" mode, this is needed by general PCIE AER
      error subsystem. So a public HEST tabling parsing interface is
      provided.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      9dc96664
    • H
      ACPI, APEI, APEI supporting infrastructure · a643ce20
      Huang Ying 提交于
      APEI stands for ACPI Platform Error Interface, which allows to report
      errors (for example from the chipset) to the operating system. This
      improves NMI handling especially. In addition it supports error
      serialization and error injection.
      
      For more information about APEI, please refer to ACPI Specification
      version 4.0, chapter 17.
      
      This patch provides some common functions used by more than one APEI
      tables, mainly framework of interpreter for EINJ and ERST.
      
      A machine readable language is defined for EINJ and ERST for OS to
      execute, and so to drive the firmware to fulfill the corresponding
      functions. The machine language for EINJ and ERST is compatible, so a
      common framework is defined for them.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      a643ce20