1. 29 11月, 2012 1 次提交
  2. 14 7月, 2012 1 次提交
  3. 12 6月, 2012 1 次提交
  4. 30 3月, 2012 5 次提交
    • J
      CPER failed to handle generic error records with multiple sections · 37d2a362
      Jiang Liu 提交于
      The function apei_estatus_print() and apei_estatus_check() forget to move ahead
      the gdata pointer when dealing with multiple generic error data sections.
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      37d2a362
    • G
      ACPI, APEI: Fix incorrect APEI register bit width check and usage · 15afae60
      Gary Hade 提交于
      The current code incorrectly assumes that
      (1) the APEI register bit width is always 8, 16, 32, or 64 and
      (2) the APEI register bit width is always equal to the APEI
          register access width.
      
      ERST serialization instructions entries such as:
      
      [030h 0048   1]                       Action : 00 [Begin Write Operation]
      [031h 0049   1]                  Instruction : 03 [Write Register Value]
      [032h 0050   1]        Flags (decoded below) : 01
                            Preserve Register Bits : 1
      [033h 0051   1]                     Reserved : 00
      
      [034h 0052  12]              Register Region : [Generic Address Structure]
      [034h 0052   1]                     Space ID : 00 [SystemMemory]
      [035h 0053   1]                    Bit Width : 03
      [036h 0054   1]                   Bit Offset : 00
      [037h 0055   1]         Encoded Access Width : 03 [DWord Access:32]
      [038h 0056   8]                      Address : 000000007F2D7038
      
      [040h 0064   8]                        Value : 0000000000000001
      [048h 0072   8]                         Mask : 0000000000000007
      
      break this assumption by yielding:
        [Firmware Bug]: APEI: Invalid bit width in GAR [0x7f2d7038/3/0]
      
      I have found no ACPI specification requirements corresponding
      with the above assumptions.  There is even a good example in
      the Serialization Instruction Entries section (ACPI 4.0 section
      17.4,1.2, ACPI 4.0a section 2.5.1.2, ACPI 5.0 section 18.5.1.2)
      that mentions a serialization instruction with a bit range of
      [6:2] which is 5 bits wide, _not_ 8, 16, 32, or 64 bits wide.
      
      Compile and boot tested with 3.3.0-rc7 on a IBM HX5.
      Signed-off-by: NGary Hade <garyhade@us.ibm.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      15afae60
    • C
      ACPI, APEI, EINJ, new parameter to control trigger action · ee49089d
      Chen Gong 提交于
      Some APEI firmware implementation will access injected address
      specified in param1 to trigger the error when injecting memory
      error, which means if one SRAR error is injected, the crash
      always happens because it is executed in kernel context. This
      new parameter can disable trigger action and control is taken
      over by the user. In this way, an SRAR error can happen in user
      context instead of crashing the system. This function is highly
      depended on BIOS implementation so please ensure you know the
      BIOS trigger procedure before you enable this switch.
      
      v2:
        notrigger should be created together with param1/param2
      Tested-by: NTony Luck <tony.luck@lintel.com>
      Signed-off-by: NChen Gong <gong.chen@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      ee49089d
    • C
      ACPI, APEI, EINJ, limit the range of einj_param · 185210cc
      Chen Gong 提交于
      On the platforms with ACPI4.x support, parameter extension
      is not always doable, which means only parameter extension
      is enabled, einj_param can take effect.
      
      v2->v1: stopping early in einj_get_parameter_address for einj_param
      Signed-off-by: NChen Gong <gong.chen@linux.intel.com>
      Acked-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      185210cc
    • J
      ACPI, APEI, Fix ERST header length check · 7ed28f2e
      Jiang Liu 提交于
      This fixes a trivial copy & paste error in ERST header length check.
      It's just for future safety because sizeof(struct acpi_table_einj)
      equals to sizeof(struct acpi_table_erst) with current ACPI5.0
      specification. It applies to v3.3-rc6.
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      7ed28f2e
  5. 22 3月, 2012 1 次提交
  6. 24 1月, 2012 3 次提交
  7. 21 1月, 2012 1 次提交
    • M
      ACPI, APEI: Add 64-bit read/write support for APEI on i386 · e615bf5b
      Myron Stowe 提交于
      Base ACPI (CA) currently does not support atomic 64-bit reads and writes
      (acpi_read() and acpi_write() split 64-bit loads/stores into two
      32-bit transfers) yet APEI expects 64-bit transfer capability, even
      when running on 32-bit systems.
      
      This patch implements 64-bit read and write routines for APEI usage.
      
      This patch re-factors similar functionality introduced in commit
      04c25997, bringing it into the ACPI subsystem in preparation for
      removing ./drivers/acpi/atomicio.[ch].  In the implementation I have
      replicated acpi_os_read_memory() and acpi_os_write_memory(), creating
      64-bit versions for APEI to utilize, as opposed to something more
      elegant.  My thinking is that we should attempt to see if we can get
      ACPI's CA/OSL changed so that the existing acpi_read() and acpi_write()
      interfaces are natively 64-bit capable and then subsequently remove the
      replication.
      Signed-off-by: NMyron Stowe <myron.stowe@redhat.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      e615bf5b
  8. 18 1月, 2012 1 次提交
  9. 17 1月, 2012 9 次提交
  10. 13 1月, 2012 1 次提交
  11. 18 11月, 2011 2 次提交
    • K
      pstore: pass reason to backend write callback · 3d6d8d20
      Kees Cook 提交于
      This allows a backend to filter on the dmesg reason as well as the pstore
      reason. When ramoops is switched to pstore, this is needed since it has
      no interest in storing non-crash dmesg details.
      
      Drop pstore_write() as it has no users, and handling the "reason" here
      has no obviously correct value.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      3d6d8d20
    • K
      pstore: pass allocated memory region back to caller · f6f82851
      Kees Cook 提交于
      The buf_lock cannot be held while populating the inodes, so make the backend
      pass forward an allocated and filled buffer instead. This solves the following
      backtrace. The effect is that "buf" is only ever used to notify the backends
      that something was written to it, and shouldn't be used in the read path.
      
      To replace the buf_lock during the read path, isolate the open/read/close
      loop with a separate mutex to maintain serialized access to the backend.
      
      Note that is is up to the pstore backend to cope if the (*write)() path is
      called in the middle of the read path.
      
      [   59.691019] BUG: sleeping function called from invalid context at .../mm/slub.c:847
      [   59.691019] in_atomic(): 0, irqs_disabled(): 1, pid: 1819, name: mount
      [   59.691019] Pid: 1819, comm: mount Not tainted 3.0.8 #1
      [   59.691019] Call Trace:
      [   59.691019]  [<810252d5>] __might_sleep+0xc3/0xca
      [   59.691019]  [<810a26e6>] kmem_cache_alloc+0x32/0xf3
      [   59.691019]  [<810b53ac>] ? __d_lookup_rcu+0x6f/0xf4
      [   59.691019]  [<810b68b1>] alloc_inode+0x2a/0x64
      [   59.691019]  [<810b6903>] new_inode+0x18/0x43
      [   59.691019]  [<81142447>] pstore_get_inode.isra.1+0x11/0x98
      [   59.691019]  [<81142623>] pstore_mkfile+0xae/0x26f
      [   59.691019]  [<810a2a66>] ? kmem_cache_free+0x19/0xb1
      [   59.691019]  [<8116c821>] ? ida_get_new_above+0x140/0x158
      [   59.691019]  [<811708ea>] ? __init_rwsem+0x1e/0x2c
      [   59.691019]  [<810b67e8>] ? inode_init_always+0x111/0x1b0
      [   59.691019]  [<8102127e>] ? should_resched+0xd/0x27
      [   59.691019]  [<8137977f>] ? _cond_resched+0xd/0x21
      [   59.691019]  [<81142abf>] pstore_get_records+0x52/0xa7
      [   59.691019]  [<8114254b>] pstore_fill_super+0x7d/0x91
      [   59.691019]  [<810a7ff5>] mount_single+0x46/0x82
      [   59.691019]  [<8114231a>] pstore_mount+0x15/0x17
      [   59.691019]  [<811424ce>] ? pstore_get_inode.isra.1+0x98/0x98
      [   59.691019]  [<810a8199>] mount_fs+0x5a/0x12d
      [   59.691019]  [<810b9174>] ? alloc_vfsmnt+0xa4/0x14a
      [   59.691019]  [<810b9474>] vfs_kern_mount+0x4f/0x7d
      [   59.691019]  [<810b9d7e>] do_kern_mount+0x34/0xb2
      [   59.691019]  [<810bb15f>] do_mount+0x5fc/0x64a
      [   59.691019]  [<810912fb>] ? strndup_user+0x2e/0x3f
      [   59.691019]  [<810bb3cb>] sys_mount+0x66/0x99
      [   59.691019]  [<8137b537>] sysenter_do_call+0x12/0x26
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      f6f82851
  12. 13 10月, 2011 1 次提交
  13. 10 10月, 2011 1 次提交
  14. 04 10月, 2011 1 次提交
  15. 17 8月, 2011 1 次提交
    • D
      pstore: change mutex locking to spin_locks · abd4d558
      Don Zickus 提交于
      pstore was using mutex locking to protect read/write access to the
      backend plug-ins.  This causes problems when pstore is executed in
      an NMI context through panic() -> kmsg_dump().
      
      This patch changes the mutex to a spin_lock_irqsave then also checks to
      see if we are in an NMI context.  If we are in an NMI and can't get the
      lock, just print a message stating that and blow by the locking.
      
      All this is probably a hack around the bigger locking problem but it
      solves my current situation of trying to sleep in an NMI context.
      
      Tested by loading the lkdtm module and executing a HARDLOCKUP which
      will cause the machine to panic inside the nmi handler.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Acked-by: NMatthew Garrett <mjg@redhat.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      abd4d558
  16. 12 8月, 2011 2 次提交
  17. 03 8月, 2011 5 次提交
    • H
      ACPI, APEI, EINJ Param support is disabled by default · c3e6088e
      Huang Ying 提交于
      EINJ parameter support is only usable for some specific BIOS.
      Originally, it is expected to have no harm for BIOS does not support
      it.  But now, we found it will cause issue (memory overwriting) for
      some BIOS.  So param support is disabled by default and only enabled
      when newly added module parameter named "param_extension" is
      explicitly specified.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Cc: Matthew Garrett <mjg@redhat.com>
      Acked-by: NDon Zickus <dzickus@redhat.com>
      Acked-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      c3e6088e
    • L
      APEI GHES: 32-bit buildfix · 70cb6e1d
      Len Brown 提交于
      drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
      drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression
      
      ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
        in function ghes_estatus_cache_add().
      Reported-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      70cb6e1d
    • H
      ACPI, APEI, GHES: Add hardware memory error recovery support · ba61ca4a
      Huang Ying 提交于
      memory_failure_queue() is called when recoverable memory errors are
      notified by firmware to do the recovery work.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      ba61ca4a
    • H
      ACPI, APEI, GHES, Error records content based throttle · 152cef40
      Huang Ying 提交于
      printk is used by GHES to report hardware errors.  Ratelimit is
      enforced on the printk to avoid too many hardware error reports in
      kernel log.  Because there may be thousands or even millions of
      corrected hardware errors during system running.
      
      Currently, a simple scheme is used.  That is, the total number of
      hardware error reporting is ratelimited.  This may cause some issues
      in practice.
      
      For example, there are two kinds of hardware errors occurred in
      system.  One is corrected memory error, because the fault memory
      address is accessed frequently, there may be hundreds error report
      per-second.  The other is corrected PCIe AER error, it will be
      reported once per-second.  Because they share one ratelimit control
      structure, it is highly possible that only memory error is reported.
      
      To avoid the above issue, an error record content based throttle
      algorithm is implemented in the patch.  Where after the first
      successful reporting, all error records that are same are throttled for
      some time, to let other kinds of error records have the opportunity to
      be reported.
      
      In above example, the memory errors will be throttled for some time,
      after being printked.  Then the PCIe AER error will be printked
      successfully.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      152cef40
    • H
      ACPI, APEI, GHES, printk support for recoverable error via NMI · 67eb2e99
      Huang Ying 提交于
      Some APEI GHES recoverable errors are reported via NMI, but printk is
      not safe in NMI context.
      
      To solve the issue, a lock-less memory allocator is used to allocate
      memory in NMI handler, save the error record into the allocated
      memory, put the error record into a lock-less list.  On the other
      hand, an irq_work is used to delay the operation from NMI context to
      IRQ context.  The irq_work IRQ handler will remove nodes from
      lock-less list, printk the error record and do some further processing
      include recovery operation, then free the memory.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      67eb2e99
  18. 23 7月, 2011 3 次提交