1. 09 Jan, 2013 1 commit
    • x86, MCE: Retract most UAPI exports · f51bde6f
      Borislav Petkov authored
      Retract back most macro definitions which went into the
      user-visible mce.h header. Even though those bits are mostly
      hardware-defined/-architectural, their naming is not. If we export them
      to userspace, any kernel unification/renaming/cleanup cannot be done
      anymore since those are effectively cast in stone. Besides, if userspace
      wants those definitions, they can write their own defines and go crazy.
      Signed-off-by: Borislav Petkov <bp@suse.de>
  2. 15 Dec, 2012 1 commit
  3. 26 Oct, 2012 4 commits
  4. 28 Sep, 2012 1 commit
  5. 18 Sep, 2012 1 commit
  6. 26 Jul, 2012 1 commit
  7. 23 Feb, 2012 1 commit
  8. 17 Jan, 2012 1 commit
  9. 22 Dec, 2011 1 commit
    • cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd
      Kay Sievers authored
      This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      
      Userspace relies on events and generic sysfs subsystem infrastructure
      from sysdev devices, which are made available with this conversion.
      
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  10. 17 Dec, 2011 1 commit
  11. 14 Dec, 2011 1 commit
  12. 08 Nov, 2011 1 commit
  13. 27 Jul, 2011 1 commit
  14. 16 Jun, 2011 2 commits
  15. 21 Apr, 2011 1 commit
  16. 04 Jan, 2011 1 commit
  17. 11 Jun, 2010 2 commits
  18. 20 May, 2010 1 commit
    • ACPI, APEI, Generic Hardware Error Source memory error support · d334a491
      Huang Ying authored
      Generic Hardware Error Source (GHES) provides a way to report platform
      hardware errors (such as those from the chipset). It works in so-called
      "Firmware First" mode, that is, hardware errors are reported to
      firmware first, then reported to Linux by firmware. This way,
      non-standard hardware error registers or non-standard hardware links
      can be checked by firmware to produce more valuable hardware error
      information for Linux.
      
      For now, only the SCI notification type and memory errors are supported.
      More notification types and hardware error types will be added later.
      These memory errors are reported to user space through /dev/mcelog by
      faking a corrected Machine Check, so that the faulty memory page can be
      offlined by /sbin/mcelog if the error count for that page exceeds the
      threshold.
      
      On some machines, Machine Check cannot report the physical address for
      some corrected memory errors, but GHES can. So this simplified
      GHES is implemented first.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
  19. 13 Jan, 2010 1 commit
  20. 10 Nov, 2009 1 commit
    • x86: Under BIOS control, restore AP's APIC_LVTTHMR to the BSP value · a2202aa2
      Yong Wang authored
      On platforms where the BIOS handles the thermal monitor interrupt,
      APIC_LVTTHMR on each logical CPU is programmed to generate an SMI
      and the OS must not touch it.
      
      Unfortunately, the AP bringup sequence using INIT-SIPI-SIPI clears all
      the LVT entries except the mask bit. Essentially, this results in
      all LVT entries, including the thermal monitoring interrupt, being set
      to masked (clearing the BIOS-programmed value for APIC_LVTTHMR).
      
      This leads the kernel to take over the thermal monitoring
      interrupt on the APs but not on the BSP (leaving the BIOS-programmed
      value only on the BSP).
      
      As a result, we have seen system hangs when the thermal
      monitoring interrupt is generated.
      
      Fix this by reading the initial value of the thermal LVT entry on
      the BSP and, if the BIOS has taken over control, programming the
      same value on all APs and leaving the thermal monitoring
      interrupt control on all logical CPUs to the BIOS.
      Signed-off-by: Yong Wang <yong.y.wang@intel.com>
      Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      LKML-Reference: <20091110013824.GA24940@ywang-moblin2.bj.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
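      The save-and-restore idea in this commit can be sketched as a small
      user-space simulation. This is not the kernel code: the register
      array, NR_CPUS and all function names are hypothetical, and only the
      bit layout (delivery mode in bits 8-10, with 010b meaning SMI, and
      the mask bit at bit 16) follows the architectural LVT format.

```c
#include <assert.h>
#include <stdint.h>

#define LVT_DM_MASK 0x700u    /* delivery mode field, bits 8-10 */
#define LVT_DM_SMI  0x200u    /* delivery mode 010b: SMI */
#define LVT_MASKED  0x10000u  /* mask bit, bit 16 */

#define NR_CPUS 4             /* illustrative CPU count (assumption) */

/* Simulated thermal LVT registers; index 0 plays the role of the BSP. */
static uint32_t lvtthmr[NR_CPUS];

/* Value the BIOS left on the BSP, captured before any AP is started. */
static uint32_t lvtthmr_bsp_init;

static void save_bsp_thermal_lvt(void)
{
    lvtthmr_bsp_init = lvtthmr[0];
}

/* Models INIT-SIPI-SIPI: the entry is cleared and ends up masked. */
static void simulate_ap_reset(int cpu)
{
    lvtthmr[cpu] = LVT_MASKED;
}

/*
 * Returns 1 if the BIOS owns the thermal interrupt (SMI delivery on the
 * BSP). In that case the BSP value is copied to the AP and the OS should
 * not install its own handler. Returns 0 otherwise and changes nothing.
 */
static int restore_thermal_lvt(int cpu)
{
    if ((lvtthmr_bsp_init & LVT_DM_MASK) != LVT_DM_SMI)
        return 0;
    lvtthmr[cpu] = lvtthmr_bsp_init;
    return 1;
}
```

      A caller would run save_bsp_thermal_lvt() once on the boot CPU and
      restore_thermal_lvt() on each AP before deciding whether the OS may
      take over the thermal interrupt.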
  21. 16 Oct, 2009 1 commit
    • x86, mce: Fix up MCE naming nomenclature · 5e09954a
      Borislav Petkov authored
      Prefix global/setup routines with "mcheck_" thus differentiating
      from the internal facilities prefixed with "mce_". Also, prefix
      the per cpu calls with mcheck_cpu and rename them to reflect the
      MCE setup hierarchy of calls better.
      
      There should be no functionality change resulting from this
      patch.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      LKML-Reference: <1255689093-26921-1-git-send-email-borislav.petkov@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  22. 12 Oct, 2009 1 commit
  23. 02 Oct, 2009 1 commit
    • x86: EDAC: MCE: Fix MCE decoding callback logic · f436f8bb
      Ingo Molnar authored
      Make decoding of MCEs happen only on AMD hardware by registering a
      non-default callback only on CPU families which support it.
      
      While looking at the interaction of decode_mce() with the other MCE
      code, I also noticed a few other things and made the following
      cleanups/fixes:
      
       - Fixed the mce_decode() weak alias - a weak alias is really not
         good here, it should be a proper callback. A weak alias will be
         overridden if a piece of code is built into the kernel - not
         good, obviously.
      
       - The patch initializes the callback on AMD family 10h and 11h.
      
       - Added the more correct fallback printk of:
      
      	No support for human readable MCE decoding on this CPU type.
      	Transcribe the message and run it through 'mcelog --ascii' to decode.
      
         on CPUs that don't have a decoder.
      
       - Made the surrounding code more readable.
      
      Note that the callback allows us to have a default fallback -
      without having to check the CPU versions during the printout
      itself. When an EDAC module registers itself, it can install the
      decode-print function.
      
      (there's no unregister needed as this is core code.)
      
      version -v2 by Borislav Petkov:
      
       - add K8 to the set of supported CPUs
      
       - always build in edac_mce_amd since we use an early_initcall now
      
       - fix checkpatch warnings
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      LKML-Reference: <20091001141432.GA11410@aftab>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
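      The callback-with-default-fallback pattern described in this commit
      can be sketched in plain C as follows. This is a user-space
      illustration, not the kernel implementation: struct mce_record,
      register_mce_decoder(), print_mce() and the two counters are
      hypothetical names, and only the fallback message is taken from
      the commit text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a machine check record. */
struct mce_record {
    unsigned int bank;
    unsigned long long status;
};

/* Counters that make it observable which handler ran. */
static int fallback_calls;
static int vendor_calls;

/* Default handler, used when no decoder has registered itself. */
static void default_decode_mce(const struct mce_record *m)
{
    (void)m;
    fallback_calls++;
    printf("No support for human readable MCE decoding on this CPU type.\n");
    printf("Transcribe the message and run it through 'mcelog --ascii' to decode.\n");
}

/* The callback pointer always points at something callable. */
static void (*decode_mce_cb)(const struct mce_record *m) = default_decode_mce;

/* A decoder (for example an EDAC driver) installs its own function here. */
static void register_mce_decoder(void (*fn)(const struct mce_record *m))
{
    if (fn)
        decode_mce_cb = fn;
}

/* Example vendor decoder; the output format is made up for illustration. */
static void vendor_decode_mce(const struct mce_record *m)
{
    vendor_calls++;
    printf("bank %u: status 0x%016llx\n", m->bank, m->status);
}

/* The print path needs no CPU-version checks: it just calls the pointer. */
static void print_mce(const struct mce_record *m)
{
    decode_mce_cb(m);
}
```

      Unlike a weak alias, the pointer is resolved at run time, so merely
      building a decoder into the binary does not replace the default.
      The decoder takes effect only when it calls register_mce_decoder().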
  24. 11 Aug, 2009 2 commits
  25. 10 Jul, 2009 2 commits
  26. 21 Jun, 2009 1 commit
  27. 17 Jun, 2009 5 commits
  28. 11 Jun, 2009 1 commit
    • x86, mce: Add boot options for corrected errors · 62fdac59
      Hidetoshi Seto authored
      This patch introduces three boot options (no_cmci, dont_log_ce
      and ignore_ce) to control handling for corrected errors.
      
      The "mce=no_cmci" boot option disables the CMCI feature.
      
      Since CMCI is a new feature, having a boot control to disable
      it will help if the hardware is misbehaving.
      
      The "mce=dont_log_ce" boot option disables logging for corrected
      errors. All reported corrected errors will be cleared silently.
      This option will be useful if you never care about corrected
      errors.
      
      The "mce=ignore_ce" boot option disables the features for corrected
      errors, i.e. the polling timer and CMCI. Corrected events are
      not cleared and are kept in the bank MSRs.
      
      Disabling these features is usually not recommended, but it
      helps if there is a conflict with the BIOS or with hardware
      monitoring applications that clear corrected events in the
      banks instead of the OS.
      
      [ And trivial cleanup (space -> tab) for doc is included. ]
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <4A30ACDF.5030408@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
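      How the three option values map to behaviour can be sketched as a
      small parser. This is an illustration under assumptions, not the
      kernel's option handling: struct mce_ce_config and every function
      name are hypothetical. Only the option strings and their described
      effects come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the corrected-error (CE) settings. */
struct mce_ce_config {
    bool cmci_disabled; /* "no_cmci": do not use CMCI */
    bool dont_log_ce;   /* "dont_log_ce": clear CEs without logging them */
    bool ignore_ce;     /* "ignore_ce": no polling, no CMCI, banks untouched */
};

/* Parses the part after "mce="; returns false for an unknown value. */
static bool parse_mce_option(const char *arg, struct mce_ce_config *cfg)
{
    if (strcmp(arg, "no_cmci") == 0)
        cfg->cmci_disabled = true;
    else if (strcmp(arg, "dont_log_ce") == 0)
        cfg->dont_log_ce = true;
    else if (strcmp(arg, "ignore_ce") == 0)
        cfg->ignore_ce = true;
    else
        return false;
    return true;
}

/* CMCI is off if disabled directly or if CEs are ignored altogether. */
static bool cmci_enabled(const struct mce_ce_config *cfg)
{
    return !cfg->cmci_disabled && !cfg->ignore_ce;
}

/* The polling timer for corrected errors is off only with ignore_ce. */
static bool ce_polling_enabled(const struct mce_ce_config *cfg)
{
    return !cfg->ignore_ce;
}

/* Corrected errors are logged only if neither option suppresses them. */
static bool ce_logged(const struct mce_ce_config *cfg)
{
    return !cfg->ignore_ce && !cfg->dont_log_ce;
}
```

      The difference between the last two options is visible in the
      helpers: with dont_log_ce the events are still collected and
      cleared, while with ignore_ce the OS does not look at them at all.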
  29. 04 Jun, 2009 1 commit
    • x86, mce: support action-optional machine checks · 9b1beaf2
      Andi Kleen authored
      Newer Intel CPUs support a new class of machine checks called
      recoverable action-optional.
      
      Action Optional means that the CPU detected some form of corruption in
      the background and tells the OS about it using a machine check
      exception. The OS can then take appropriate action, like killing the
      process with the corrupted data or logging the event properly to disk.
      
      This is done by the new generic high level memory failure handler added
      in an earlier patch. The high level handler takes the address of the
      failed memory and takes the appropriate action, like killing the process.
      
      In this version of the patch the high level handler is stubbed out
      with a weak function to not create a direct dependency on the hwpoison
      branch.
      
      The high level handler cannot be directly called from the machine check
      exception though, because it has to run in a defined process context to
      be able to sleep when taking VM locks (it is not expected to sleep for a
      long time, just do so in some exceptional cases like lock contention)
      
      Thus the MCE handler has to queue a work item for process context,
      trigger process context and then call the high level handler from there.
      
      This patch adds two paths to process context: through a per-thread kernel
      exit notify_user() callback or through a high priority work item.
      The first runs when the process exits back to user space, the other when
      it goes to sleep and there is no higher priority process.
      
      The machine check handler will schedule both, and whoever runs first
      will grab the event. This is done because quick reaction to this
      event is critical to avoid a potentially more fatal machine check
      when the corruption is consumed.
      
      There is a simple lockless ring buffer to queue the corrupted
      addresses between the exception handler and the process context handler.
      The process context handler then just calls the high level VM code with
      the corrupted PFNs.
      
      The code adds the required code to extract the failed address from
      the CPU's machine check registers. It doesn't try to handle all
      possible cases -- the specification has 6 different ways to specify
      memory address -- but only the linear address.
      
      Most of the required checking has been already done earlier in the
      mce_severity rule checking engine.  Following the Intel
      recommendations Action Optional errors are only enabled for known
      situations (encoded in MCACODs). The errors are ignored otherwise,
      because they are action optional.
      
      v2: Improve comment, disable preemption while processing ring buffer
          (reported by Ying Huang)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
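      A lockless ring buffer of the kind mentioned in this commit can be
      sketched with C11 atomics. This is a generic single-producer,
      single-consumer ring, not the kernel's implementation: struct
      pfn_ring, RING_SIZE and the function names are hypothetical. The
      commit describes two possible consumers racing for the event, which
      this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 16 /* must be a power of two; size chosen for illustration */

struct pfn_ring {
    _Atomic unsigned int head; /* next slot to write, advanced by producer */
    _Atomic unsigned int tail; /* next slot to read, advanced by consumer */
    unsigned long pfn[RING_SIZE];
};

/*
 * Producer side (the exception handler in the commit's design).
 * Never blocks: if the ring is full, the entry is dropped.
 */
static bool pfn_ring_put(struct pfn_ring *r, unsigned long pfn)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false; /* full */

    r->pfn[head & (RING_SIZE - 1)] = pfn;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (process context). Returns false if the ring is empty. */
static bool pfn_ring_get(struct pfn_ring *r, unsigned long *pfn)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false; /* empty */

    *pfn = r->pfn[tail & (RING_SIZE - 1)];
    /* Release: the slot is handed back only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

      The producer never waits on the consumer, which is the property an
      exception handler needs. Entries that do not fit are dropped rather
      than blocking. The indices are free-running unsigned counters, so
      wraparound is harmless as long as RING_SIZE is a power of two.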