1. 14 Sep, 2015 1 commit
    • acpi/apei: Use appropriate pgprot_t to map GHES memory · 8ece249a
      Jonathan (Zhixiong) Zhang authored
      If the ACPI APEI firmware handles a hardware error first (called
      "firmware first handling"), the firmware updates the GHES memory
      region with a hardware error record (called a "generic hardware
      error record"). Essentially, the firmware writes hardware error
      records to the GHES memory region and triggers an NMI/interrupt;
      the GHES driver then reads the error record from the GHES region.
      
      The kernel currently maps the GHES memory region as cacheable
      (PAGE_KERNEL) for all architectures. However, on some arm64
      platforms, there is a mismatch between how the kernel maps the
      GHES region (PAGE_KERNEL) and how the firmware maps it
      (EFI_MEMORY_UC, i.e. uncacheable), leading to the possibility of
      the kernel GHES driver reading stale data from the cache when it
      receives the interrupt.
      
      When stale data is read, the kernel is unaware that there is a new
      hardware error to handle; this may lead to further damage in
      various scenarios, such as data corruption caused by error
      propagation. If an uncorrected error (such as a double-bit ECC
      error) occurs during a memory operation and the kernel is unaware
      of it, erroneous data may be propagated to the disk.
      
      Instead, the GHES memory region should be mapped with the page
      protection type returned by arch_apei_get_mem_attribute().

      Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      [ Small stylistic tweaks. ]
      Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1441372302-23242-3-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
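The idea in commit 8ece249a, choosing the kernel mapping type from the attribute the firmware used, can be modelled in user space roughly as follows. This is only a sketch: `enum ghes_prot` and `ghes_pick_prot()` are hypothetical stand-ins for `pgprot_t` and `arch_apei_get_mem_attribute()`, and the precedence among attributes is an assumption, not the kernel's code. The `EFI_MEMORY_*` bit values are those defined by the UEFI specification.

```c
#include <assert.h>
#include <stdint.h>

/* UEFI memory attribute bits (values from the UEFI specification). */
#define EFI_MEMORY_UC 0x1ULL /* uncacheable */
#define EFI_MEMORY_WC 0x2ULL /* write-combining */
#define EFI_MEMORY_WT 0x4ULL /* write-through */
#define EFI_MEMORY_WB 0x8ULL /* write-back */

/* Hypothetical stand-in for pgprot_t; not the kernel's type. */
enum ghes_prot {
    GHES_PROT_CACHED,
    GHES_PROT_WRITE_THROUGH,
    GHES_PROT_WRITE_COMBINE,
    GHES_PROT_UNCACHED,
};

/*
 * Pick a mapping type that matches how the firmware mapped the region.
 * Assumed precedence: the most cacheable attribute that is set wins,
 * and anything else falls back to an uncached mapping.
 */
static enum ghes_prot ghes_pick_prot(uint64_t efi_attr)
{
    if (efi_attr & EFI_MEMORY_WB)
        return GHES_PROT_CACHED;
    if (efi_attr & EFI_MEMORY_WT)
        return GHES_PROT_WRITE_THROUGH;
    if (efi_attr & EFI_MEMORY_WC)
        return GHES_PROT_WRITE_COMBINE;
    return GHES_PROT_UNCACHED;
}
```

With a region the firmware marked `EFI_MEMORY_UC` only, the sketch selects the uncached type, which is the case where a cacheable `PAGE_KERNEL` mapping could return stale data.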
  2. 08 Jul, 2015 1 commit
  3. 28 Apr, 2015 5 commits
  4. 22 Oct, 2014 2 commits
  5. 20 Oct, 2014 1 commit
  6. 23 Jul, 2014 3 commits
  7. 17 Jun, 2014 1 commit
    • ACPICA: Restore error table definitions to reduce code differences between... · 0a00fd5e
      Lv Zheng authored
      ACPICA: Restore error table definitions to reduce code differences between Linux and ACPICA upstream.
      
      The following commit has changed ACPICA table header definitions:
      
       Commit: 88f074f4
       Subject: ACPI, CPER: Update cper info
      
      Such definitions are currently maintained in ACPICA. Because the
      modifications to the table definitions affect other OSPMs'
      drivers, it is very difficult for ACPICA to initiate a process to
      complete the merge. As a result, that commit ultimately leaves
      only divergences.

      Revert these naming modifications to reduce the source code differences
      between Linux and ACPICA upstream. No functional changes.

      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Cc: Bob Moore <robert.moore@intel.com>
      Cc: Chen, Gong <gong.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 21 Dec, 2013 2 commits
  9. 07 Dec, 2013 1 commit
  10. 24 Oct, 2013 1 commit
  11. 22 Oct, 2013 1 commit
  12. 11 Jul, 2013 1 commit
  13. 07 Jun, 2013 1 commit
  14. 05 Jun, 2013 1 commit
  15. 31 May, 2013 1 commit
  16. 26 Feb, 2013 1 commit
  17. 22 Feb, 2013 1 commit
  18. 29 Nov, 2012 1 commit
  19. 22 Nov, 2012 1 commit
  20. 12 Jun, 2012 1 commit
  21. 17 Jan, 2012 4 commits
    • ACPI APEI: Convert atomicio routines · 700130b4
      Myron Stowe authored
      APEI needs memory access in interrupt context.  The obvious choice is
      acpi_read(), but originally it couldn't be used in interrupt context
      because it makes temporary mappings with ioremap().  Therefore, we added
      drivers/acpi/atomicio.c, which provides:
          acpi_pre_map_gar()     -- ioremap in process context
          acpi_atomic_read()     -- memory access in interrupt context
          acpi_post_unmap_gar()  -- iounmap
      
      Later we added acpi_os_map_generic_address() (29718521) and enhanced
      acpi_read() so it works in interrupt context as long as the address has
      been previously mapped (620242ae).  Now this sequence:
          acpi_os_map_generic_address()    -- ioremap in process context
          acpi_read()/apei_read()          -- now OK in interrupt context
          acpi_os_unmap_generic_address()
      is equivalent to what atomicio.c provides.
      
      This patch introduces apei_read() and apei_write(), which currently are
      functional equivalents of acpi_read() and acpi_write().  This is mainly
      proactive, to prevent APEI breakages if acpi_read() and acpi_write()
      are ever augmented to support the 'bit_offset' field of GAS, as APEI's
      __apei_exec_write_register() precludes splitting up functionality
      related to 'bit_offset' and APEI's 'mask' (see its
      APEI_EXEC_PRESERVE_REGISTER block).
      
      With apei_read() and apei_write() in place, usages of atomicio routines
      are converted to apei_read()/apei_write() and existing calls within
      osl.c and the CA, based on the re-factoring that was done in an earlier
      patch series - http://marc.info/?l=linux-acpi&m=128769263327206&w=2:
          acpi_pre_map_gar()     -->  acpi_os_map_generic_address()
          acpi_post_unmap_gar()  -->  acpi_os_unmap_generic_address()
          acpi_atomic_read()     -->  apei_read()
          acpi_atomic_write()    -->  apei_write()
      
      Note that acpi_read() and acpi_write() currently use 'bit_width'
      for accessing GARs which seems incorrect.  'bit_width' is the size of
      the register, while 'access_width' is the size of the access the
      processor must generate on the bus.  The 'access_width' may be larger,
      for example, if the hardware only supports 32-bit or 64-bit reads.  I
      wanted to minimize any possible impacts with this patch series so I
      did *not* change this behavior.

      Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
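The distinction drawn in commit 700130b4 between 'bit_width' (register size) and 'access_width' (bus access size) can be sketched as below. `struct gas_sketch` and both helpers are hypothetical simplifications, not the ACPICA or APEI definitions; the sketch ignores 'bit_offset' and assumes the ACPI access size encoding (1 = byte, 2 = word, 3 = dword, 4 = qword).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of a Generic Address Structure (GAS). */
struct gas_sketch {
    uint8_t bit_width;    /* size of the register, in bits */
    uint8_t access_width; /* ACPI access size code: 1..4 */
};

/*
 * Width in bits of the bus access implied by the access size code.
 * Returns 0 for a code this sketch does not handle (0 or > 4).
 */
static unsigned int gas_access_bits(const struct gas_sketch *gas)
{
    if (gas->access_width < 1 || gas->access_width > 4)
        return 0;
    return 1u << (gas->access_width + 2); /* 8, 16, 32 or 64 */
}

/*
 * Keep only the register's own bits from a raw value that was read
 * with the (possibly wider) bus access. bit_offset is assumed to be 0.
 */
static uint64_t gas_extract(const struct gas_sketch *gas, uint64_t raw)
{
    if (gas->bit_width >= 64)
        return raw;
    return raw & ((1ULL << gas->bit_width) - 1);
}
```

For a 16-bit register on hardware that only supports 32-bit reads, the access is 32 bits wide and the low 16 bits are then kept, which is the case the commit message describes as 'access_width' being larger than 'bit_width'.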
    • ACPI, APEI, Printk queued error record before panic · 46d12f0b
      Huang Ying authored
      Because printk is not safe inside an NMI handler, recoverable error
      records received in the NMI handler are queued to be printed by
      printk later, in IRQ context, via irq_work. If a fatal error occurs
      after the recoverable error and before the irq_work is processed,
      the error report is lost.

      To solve this, the queued error records are printed in the NMI
      handler if the system is about to panic.

      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • ACPI, APEI, GHES, Distinguish interleaved error report in kernel log · 5ba82ab5
      Huang Ying authored
      In most cases, printk only guarantees that messages from different
      printk calls are not interleaved with each other. But one APEI
      GHES hardware error report involves multiple printk calls,
      normally one per line. So hardware error reports that come from
      different generic hardware error sources may be interleaved.

      In this patch, a sequence number is prefixed to each line of an
      error report, so that even if reports are interleaved, their lines
      can still be distinguished by the prefixed sequence number.

      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
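The sequence-number prefix described in commit 5ba82ab5 might look roughly like this in user space. The names `report_begin()` and `report_line()` and the `{%u}` prefix format are assumptions made for illustration; the commit message does not state the exact format the driver prints.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* One counter shared by all error sources. */
static atomic_uint report_seqno;

/* Reserve a single sequence number for a whole multi-line report. */
static unsigned int report_begin(void)
{
    return atomic_fetch_add(&report_seqno, 1);
}

/*
 * Format one line of a report with its sequence number in front.
 * The "{%u}" prefix is illustrative only.
 */
static int report_line(char *buf, size_t len, unsigned int seq,
                       const char *text)
{
    return snprintf(buf, len, "{%u} %s", seq, text);
}
```

Each report calls `report_begin()` once and passes the same number to every `report_line()` call, so lines from two reports that end up interleaved in the log can be regrouped by prefix.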
    • ACPI, APEI, GHES: Add PCIe AER recovery support · a654e5ee
      Huang Ying authored
      aer_recover_queue() is called to do the recovery work when firmware
      reports recoverable PCIe AER errors.

      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
  22. 13 Jan, 2012 1 commit
  23. 10 Oct, 2011 1 commit
  24. 03 Aug, 2011 4 commits
    • APEI GHES: 32-bit buildfix · 70cb6e1d
      Len Brown authored
      drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
      drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression
      
      ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
        in function ghes_estatus_cache_add().

      Reported-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Len Brown <len.brown@intel.com>
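The warnings quoted in commit 70cb6e1d belong to a general class of 32-bit problems that can be sketched as follows. The commit message does not show the actual change, so this illustrates the class of problem, not the fix that was applied; `NSEC_PER_SEC_SKETCH` and `cache_max_age_ns()` are hypothetical names. A product such as `10 * 1000000000L` is evaluated in `long`, which overflows where `long` is 32 bits wide; widening one operand first avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds per second, typed as long as in many C code bases. */
#define NSEC_PER_SEC_SKETCH 1000000000L

/*
 * Convert seconds to nanoseconds without overflowing a 32-bit long:
 * the cast widens the left operand, so the multiplication is carried
 * out in 64 bits on every target.
 */
static uint64_t cache_max_age_ns(unsigned int seconds)
{
    return (uint64_t)seconds * NSEC_PER_SEC_SKETCH;
}
```

The `__udivdi3` link error is a related issue: on 32-bit targets GCC implements 64-bit division by a runtime value with a libgcc helper that the kernel does not link, so kernel code typically uses a helper such as `do_div()` for that.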
    • ACPI, APEI, GHES: Add hardware memory error recovery support · ba61ca4a
      Huang Ying authored
      memory_failure_queue() is called to do the recovery work when
      firmware reports recoverable memory errors.

      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • ACPI, APEI, GHES, Error records content based throttle · 152cef40
      Huang Ying authored
      printk is used by GHES to report hardware errors. A ratelimit is
      enforced on the printk to avoid too many hardware error reports in
      the kernel log, because there may be thousands or even millions of
      corrected hardware errors while the system is running.

      Currently, a simple scheme is used: the total number of hardware
      error reports is ratelimited. This may cause some issues in
      practice.

      For example, suppose two kinds of hardware errors occur in the
      system. One is a corrected memory error; because the faulty memory
      address is accessed frequently, there may be hundreds of error
      reports per second. The other is a corrected PCIe AER error, which
      is reported once per second. Because they share one ratelimit
      control structure, it is highly likely that only the memory error
      is reported.

      To avoid this issue, the patch implements a throttle algorithm
      based on error record content. After the first successful report,
      all identical error records are throttled for some time, to give
      other kinds of error records the opportunity to be reported.

      In the example above, the memory errors are throttled for some
      time after being printed. The PCIe AER error is then printed
      successfully.

      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
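A content-based throttle of the kind commit 152cef40 describes can be sketched as a small cache of recently reported records compared byte for byte. Everything here is an assumption made for illustration: the slot count, the record size limit, the 10-second window, the replacement policy and all names are hypothetical, and the sketch is single-threaded, unlike the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define THROTTLE_SLOTS      4
#define THROTTLE_RECORD_MAX 64
#define THROTTLE_WINDOW_NS  10000000000ULL /* assumed: 10 s */

struct throttle_slot {
    bool used;
    size_t len;
    uint64_t last_reported_ns;
    unsigned char record[THROTTLE_RECORD_MAX];
};

static struct throttle_slot throttle_cache[THROTTLE_SLOTS];

/* Returns true if this record should be reported now. */
static bool throttle_should_report(const void *record, size_t len,
                                   uint64_t now_ns)
{
    struct throttle_slot *victim = &throttle_cache[0];
    size_t i;

    if (len > THROTTLE_RECORD_MAX)
        return true; /* too large to cache: never throttled here */

    for (i = 0; i < THROTTLE_SLOTS; i++) {
        struct throttle_slot *s = &throttle_cache[i];

        if (s->used && s->len == len &&
            memcmp(s->record, record, len) == 0) {
            if (now_ns - s->last_reported_ns < THROTTLE_WINDOW_NS)
                return false; /* identical record reported recently */
            s->last_reported_ns = now_ns;
            return true;
        }
        /* Track a slot to reuse: first empty one, else the oldest. */
        if (!s->used) {
            if (victim->used)
                victim = s;
        } else if (victim->used &&
                   s->last_reported_ns < victim->last_reported_ns) {
            victim = s;
        }
    }

    /* First time this content is seen: remember it and report it. */
    victim->used = true;
    victim->len = len;
    victim->last_reported_ns = now_ns;
    memcpy(victim->record, record, len);
    return true;
}
```

In the commit's example, a repeated memory error record is suppressed inside the window while a different PCIe AER record still gets through, because the two records occupy separate cache entries.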
    • ACPI, APEI, GHES, printk support for recoverable error via NMI · 67eb2e99
      Huang Ying authored
      Some APEI GHES recoverable errors are reported via NMI, but printk is
      not safe in NMI context.
      
      To solve the issue, a lock-less memory allocator is used to allocate
      memory in the NMI handler; the error record is saved into the
      allocated memory and put on a lock-less list. An irq_work is then
      used to defer the operation from NMI context to IRQ context. The
      irq_work IRQ handler removes nodes from the lock-less list, prints
      the error record with printk, does further processing including
      the recovery operation, and then frees the memory.

      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
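The producer/consumer split in commit 67eb2e99 can be modelled in user space with C11 atomics. This is a sketch under stated assumptions, not the kernel's llist or irq_work code: `struct err_node` and the `pending_*` helpers are hypothetical, nodes are supplied by the caller instead of coming from a lock-less allocator, and the reversal step is an optional addition for oldest-first processing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct err_node {
    struct err_node *next;
    int record_id; /* hypothetical payload standing in for a record */
};

static _Atomic(struct err_node *) pending_head;

/* Producer (models the NMI handler): push without taking a lock. */
static void pending_push(struct err_node *node)
{
    struct err_node *old = atomic_load(&pending_head);

    do {
        node->next = old; /* old is refreshed when the CAS fails */
    } while (!atomic_compare_exchange_weak(&pending_head, &old, node));
}

/* Consumer (models the irq_work handler): detach the whole list. */
static struct err_node *pending_take_all(void)
{
    return atomic_exchange(&pending_head, (struct err_node *)NULL);
}

/* The detached list is newest first; reverse it to oldest first. */
static struct err_node *pending_reverse(struct err_node *head)
{
    struct err_node *prev = NULL;

    while (head) {
        struct err_node *next = head->next;

        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}
```

The consumer would walk the reversed list, print each record, run any recovery step, and release the node, which corresponds to the sequence the commit message lists for the irq_work handler.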
  25. 14 Jul, 2011 2 commits