1. 29 11月, 2012 1 次提交
  2. 12 6月, 2012 1 次提交
  3. 17 1月, 2012 4 次提交
    • M
      ACPI APEI: Convert atomicio routines · 700130b4
      Myron Stowe 提交于
      APEI needs memory access in interrupt context.  The obvious choice is
      acpi_read(), but originally it couldn't be used in interrupt context
      because it makes temporary mappings with ioremap().  Therefore, we added
      drivers/acpi/atomicio.c, which provides:
          acpi_pre_map_gar()     -- ioremap in process context
      	acpi_atomic_read()     -- memory access in interrupt context
      	acpi_post_unmap_gar()  -- iounmap
      
      Later we added acpi_os_map_generic_address() (29718521) and enhanced
      acpi_read() so it works in interrupt context as long as the address has
      been previously mapped (620242ae).  Now this sequence:
          acpi_os_map_generic_address()    -- ioremap in process context
          acpi_read()/apei_read()          -- now OK in interrupt context
          acpi_os_unmap_generic_address()
      is equivalent to what atomicio.c provides.
      
      This patch introduces apei_read() and apei_write(), which currently are
      functional equivalents of acpi_read() and acpi_write().  This is mainly
      proactive, to prevent APEI breakages if acpi_read() and acpi_write()
      are ever augmented to support the 'bit_offset' field of GAS, as APEI's
      __apei_exec_write_register() precludes splitting up functionality
      related to 'bit_offset' and APEI's 'mask' (see its
      APEI_EXEC_PRESERVE_REGISTER block).
      
      With apei_read() and apei_write() in place, usages of atomicio routines
      are converted to apei_read()/apei_write() and existing calls within
      osl.c and the CA, based on the re-factoring that was done in an earlier
      patch series - http://marc.info/?l=linux-acpi&m=128769263327206&w=2:
          acpi_pre_map_gar()     -->  acpi_os_map_generic_address()
          acpi_post_unmap_gar()  -->  acpi_os_unmap_generic_address()
          acpi_atomic_read()     -->  apei_read()
          acpi_atomic_write()    -->  apei_write()
      
      Note that acpi_read() and acpi_write() currently use 'bit_width'
      for accessing GARs which seems incorrect.  'bit_width' is the size of
      the register, while 'access_width' is the size of the access the
      processor must generate on the bus.  The 'access_width' may be larger,
      for example, if the hardware only supports 32-bit or 64-bit reads.  I
      wanted to minimize any possible impacts with this patch series so I
      did *not* change this behavior.
      Signed-off-by: NMyron Stowe <myron.stowe@redhat.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      700130b4
    • H
      ACPI, APEI, Printk queued error record before panic · 46d12f0b
      Huang Ying 提交于
      Because printk is not safe inside NMI handler, the recoverable error
      records received in NMI handler will be queued to be printked in a
      delayed IRQ context via irq_work.  If a fatal error occurs after the
      recoverable error and before the irq_work processed, we lost a error
      report.
      
      To solve the issue, the queued error records are printked in NMI
      handler if system will go panic.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      46d12f0b
    • H
      ACPI, APEI, GHES, Distinguish interleaved error report in kernel log · 5ba82ab5
      Huang Ying 提交于
      In most cases, printk only guarantees messages from different printk
      calling will not be interleaved between each other.  But, one APEI
      GHES hardware error report will involve multiple printk calling,
      normally each for one line.  So it is possible that the hardware error
      report comes from different generic hardware error source will be
      interleaved.
      
      In this patch, a sequence number is prefixed to each line of error
      report.  So that, even if they are interleaved, they still can be
      distinguished by the prefixed sequence number.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      5ba82ab5
    • H
      ACPI, APEI, GHES: Add PCIe AER recovery support · a654e5ee
      Huang Ying 提交于
      aer_recover_queue() is called when recoverable PCIe AER errors are
      notified by firmware to do the recovery work.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      a654e5ee
  4. 13 1月, 2012 1 次提交
  5. 10 10月, 2011 1 次提交
  6. 03 8月, 2011 4 次提交
    • L
      APEI GHES: 32-bit buildfix · 70cb6e1d
      Len Brown 提交于
      drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
      drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression
      
      ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
        in function ghes_estatus_cache_add().
      Reported-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      70cb6e1d
    • H
      ACPI, APEI, GHES: Add hardware memory error recovery support · ba61ca4a
      Huang Ying 提交于
      memory_failure_queue() is called when recoverable memory errors are
      notified by firmware to do the recovery work.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      ba61ca4a
    • H
      ACPI, APEI, GHES, Error records content based throttle · 152cef40
      Huang Ying 提交于
      printk is used by GHES to report hardware errors.  Ratelimit is
      enforced on the printk to avoid too many hardware error reports in
      kernel log.  Because there may be thousands or even millions of
      corrected hardware errors during system running.
      
      Currently, a simple scheme is used.  That is, the total number of
      hardware error reporting is ratelimited.  This may cause some issues
      in practice.
      
      For example, there are two kinds of hardware errors occurred in
      system.  One is corrected memory error, because the fault memory
      address is accessed frequently, there may be hundreds error report
      per-second.  The other is corrected PCIe AER error, it will be
      reported once per-second.  Because they share one ratelimit control
      structure, it is highly possible that only memory error is reported.
      
      To avoid the above issue, an error record content based throttle
      algorithm is implemented in the patch.  Where after the first
      successful reporting, all error records that are same are throttled for
      some time, to let other kinds of error records have the opportunity to
      be reported.
      
      In above example, the memory errors will be throttled for some time,
      after being printked.  Then the PCIe AER error will be printked
      successfully.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      152cef40
    • H
      ACPI, APEI, GHES, printk support for recoverable error via NMI · 67eb2e99
      Huang Ying 提交于
      Some APEI GHES recoverable errors are reported via NMI, but printk is
      not safe in NMI context.
      
      To solve the issue, a lock-less memory allocator is used to allocate
      memory in NMI handler, save the error record into the allocated
      memory, put the error record into a lock-less list.  On the other
      hand, an irq_work is used to delay the operation from NMI context to
      IRQ context.  The irq_work IRQ handler will remove nodes from
      lock-less list, printk the error record and do some further processing
      include recovery operation, then free the memory.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      67eb2e99
  7. 14 7月, 2011 3 次提交
  8. 31 3月, 2011 1 次提交
  9. 12 1月, 2011 1 次提交
    • H
      ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support · 81e88fdc
      Huang Ying 提交于
      Generic Hardware Error Source provides a way to report platform
      hardware errors (such as that from chipset). It works in so called
      "Firmware First" mode, that is, hardware errors are reported to
      firmware firstly, then reported to Linux by firmware. This way, some
      non-standard hardware error registers or non-standard hardware link
      can be checked by firmware to produce more valuable hardware error
      information for Linux.
      
      This patch adds POLL/IRQ/NMI notification types support.
      
      Because the memory area used to transfer hardware error information
      from BIOS to Linux can be determined only in NMI, IRQ or timer
      handler, but general ioremap can not be used in atomic context, so a
      special version of atomic ioremap is implemented for that.
      
      Known issue:
      
      - Error information can not be printed for recoverable errors notified
        via NMI, because printk is not NMI-safe. Will fix this via delay
        printing to IRQ context via irq_work or make printk NMI-safe.
      
      v2:
      
      - adjust printk format per comments.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      81e88fdc
  10. 14 12月, 2010 1 次提交
  11. 30 9月, 2010 1 次提交
  12. 09 8月, 2010 2 次提交
  13. 20 5月, 2010 1 次提交
    • H
      ACPI, APEI, Generic Hardware Error Source memory error support · d334a491
      Huang Ying 提交于
      Generic Hardware Error Source provides a way to report platform
      hardware errors (such as that from chipset). It works in so called
      "Firmware First" mode, that is, hardware errors are reported to
      firmware firstly, then reported to Linux by firmware. This way, some
      non-standard hardware error registers or non-standard hardware link
      can be checked by firmware to produce more valuable hardware error
      information for Linux.
      
      Now, only SCI notification type and memory errors are supported. More
      notification type and hardware error type will be added later. These
      memory errors are reported to user space through /dev/mcelog via
      faking a corrected Machine Check, so that the error memory page can be
      offlined by /sbin/mcelog if the error count for one page is beyond the
      threshold.
      
      On some machines, Machine Check can not report physical address for
      some corrected memory errors, but GHES can do that. So this simplified
      GHES is implemented firstly.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      d334a491