1. 03 8月, 2011 2 次提交
    • H
      ACPI, APEI, GHES, Error records content based throttle · 152cef40
      Huang Ying 提交于
      printk is used by GHES to report hardware errors.  Ratelimit is
      enforced on the printk to avoid too many hardware error reports in
      kernel log.  Because there may be thousands or even millions of
      corrected hardware errors during system running.
      
      Currently, a simple scheme is used.  That is, the total number of
      hardware error reporting is ratelimited.  This may cause some issues
      in practice.
      
      For example, there are two kinds of hardware errors occurred in
      system.  One is corrected memory error, because the fault memory
      address is accessed frequently, there may be hundreds error report
      per-second.  The other is corrected PCIe AER error, it will be
      reported once per-second.  Because they share one ratelimit control
      structure, it is highly possible that only memory error is reported.
      
      To avoid the above issue, an error record content based throttle
      algorithm is implemented in the patch.  Where after the first
      successful reporting, all error records that are same are throttled for
      some time, to let other kinds of error records have the opportunity to
      be reported.
      
      In above example, the memory errors will be throttled for some time,
      after being printked.  Then the PCIe AER error will be printked
      successfully.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      152cef40
    • H
      ACPI, APEI, GHES, printk support for recoverable error via NMI · 67eb2e99
      Huang Ying 提交于
      Some APEI GHES recoverable errors are reported via NMI, but printk is
      not safe in NMI context.
      
      To solve the issue, a lock-less memory allocator is used to allocate
      memory in NMI handler, save the error record into the allocated
      memory, put the error record into a lock-less list.  On the other
      hand, an irq_work is used to delay the operation from NMI context to
      IRQ context.  The irq_work IRQ handler will remove nodes from
      lock-less list, printk the error record and do some further processing
      include recovery operation, then free the memory.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      67eb2e99
  2. 14 7月, 2011 10 次提交
  3. 13 7月, 2011 1 次提交
  4. 12 7月, 2011 4 次提交
  5. 11 7月, 2011 6 次提交
  6. 10 7月, 2011 3 次提交
  7. 09 7月, 2011 10 次提交
  8. 08 7月, 2011 4 次提交