1. 16 Jun, 2011 2 commits
  2. 16 May, 2011 1 commit
    • x86, apic: Fix spurious error interrupts triggering on all non-boot APs · e503f9e4
      Committed by Youquan Song
      This patch fixes a bug reported by a customer, who found
      that many unreasonable error interrupts were reported on all
      non-boot CPUs (APs) during the system boot stage.
      
      According to Chapter 10 of Intel Software Developer Manual
      Volume 3A, Local APIC may signal an illegal vector error when
      an LVT entry is set as an illegal vector value (0~15) under
      FIXED delivery mode (bits 8-11 are 0), regardless of whether
      the mask bit is set or an interrupt actually happens. These
      errors are seen as error interrupts.
      
      The initial value of thermal LVT entries on all APs always reads
      0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
      sequence to them and LVT registers are reset to 0s except for
      the mask bits which are set to 1s when APs receive INIT IPI.
      
      When the BIOS takes over the thermal throttling interrupt,
      the LVT thermal delivery mode should be SMI, and the kernel is
      required to keep the AP's LVT thermal monitoring register
      programmed as such as well.
      
      This issue happens when the BIOS does not take over the thermal
      throttling interrupt: the AP's LVT thermal monitor register is
      restored to 0x10000, which means vector 0 and fixed delivery mode,
      so all APs signal illegal vector error interrupts.
      
      This patch checks that the interrupt delivery mode is not fixed mode
      before restoring the AP's LVT thermal monitor register.
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Yong Wang <yong.y.wang@intel.com>
      Cc: hpa@linux.intel.com
      Cc: joe@perches.com
      Cc: jbaron@redhat.com
      Cc: trenn@suse.de
      Cc: kent.liu@intel.com
      Cc: chaohong.guo@intel.com
      Cc: <stable@kernel.org> # As far back as possible
      Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
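The check described in the commit above can be sketched in user-space C. This is only an illustration, not the kernel's code: the macro and function names are hypothetical, and the mask assumes the three-bit delivery-mode field at bits 8-10 of an LVT register with SMI encoded as 010b, as laid out in the SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; field layout assumed from the SDM's LVT description. */
#define LVT_DELIVERY_MODE_MASK 0x00000700u /* bits 8-10: delivery mode */
#define LVT_DELIVERY_FIXED     0x00000000u /* 000b: fixed */
#define LVT_DELIVERY_SMI       0x00000200u /* 010b: SMI */
#define LVT_MASKED             0x00010000u /* bit 16: mask bit */

/* True when the LVT value selects fixed delivery mode. */
static bool lvt_uses_fixed_delivery(uint32_t lvt)
{
    return (lvt & LVT_DELIVERY_MODE_MASK) == LVT_DELIVERY_FIXED;
}

/*
 * Decide whether a saved thermal LVT value should be written to an AP.
 * A value such as 0x10000 (masked, vector 0, fixed mode) is skipped,
 * because writing it would program an illegal vector under fixed mode.
 */
static bool should_restore_thermal_lvt(uint32_t saved_lvt)
{
    return !lvt_uses_fixed_delivery(saved_lvt);
}
```

With these assumptions, a saved value of 0x10000 is not restored, while a value the BIOS programmed for SMI delivery (for example `LVT_MASKED | LVT_DELIVERY_SMI`) is.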
  3. 13 May, 2011 1 commit
  4. 21 Apr, 2011 1 commit
  5. 20 Apr, 2011 1 commit
  6. 01 Apr, 2011 1 commit
  7. 29 Mar, 2011 1 commit
  8. 24 Mar, 2011 1 commit
    • x86: Use syscore_ops instead of sysdev classes and sysdevs · f3c6ea1b
      Committed by Rafael J. Wysocki
      Some subsystems in the x86 tree need to carry out suspend/resume and
      shutdown operations with one CPU on-line and interrupts disabled and
      they define sysdev classes and sysdevs or sysdev drivers for this
      purpose.  This leads to unnecessarily complicated code and excessive
      memory usage, so switch them to using struct syscore_ops objects for
      this purpose instead.
      
      Generally, there are three categories of subsystems that use
      sysdevs for implementing PM operations: (1) subsystems whose
      suspend/resume callbacks ignore their arguments entirely (the
      majority), (2) subsystems whose suspend/resume callbacks use their
      struct sys_device argument, but don't really need to do that,
      because they can be implemented differently in an arguably simpler
      way (io_apic.c), and (3) subsystems whose suspend/resume callbacks
      use their struct sys_device argument, but the value of that argument
      is always the same and could be ignored (microcode_core.c).  In all
      of these cases the subsystems in question may be readily converted to
      using struct syscore_ops objects for power management and shutdown.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
  9. 22 Mar, 2011 1 commit
    • ACPI, APEI, Add ERST record ID cache · 885b976f
      Committed by Huang Ying
      The APEI ERST firmware interface and implementation were not designed
      with multiple users in mind.  For example, if there are four records
      in storage with IDs 1, 2, 3 and 4, and two ERST readers enumerate the
      records via GET_NEXT_RECORD_ID as follows,
      
      reader 1		reader 2
      1
      			2
      3
      			4
      -1
      			-1
      
      where -1 signals there is no more record ID.
      
      Reader 1 has no chance to check records 2 and 4, while reader 2 has no
      chance to check records 1 and 3.  And any other GET_NEXT_RECORD_ID will
      return -1, that is, other readers will have no chance to check any
      record even if they are not cleared by anyone.
      
      This makes raw GET_NEXT_RECORD_ID unsuitable for use by multiple
      users.
      
      To solve the issue, an in-memory ERST record ID cache is designed and
      implemented.  When enumerating record IDs, the ID returned by
      GET_NEXT_RECORD_ID is added to the cache in addition to being returned
      to the caller.  So other readers can check the cache to get all record
      IDs available.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
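The record ID cache idea from the commit above can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel implementation: `fake_erst`, `record_id_cache` and both functions are hypothetical names, the firmware is replaced by an array with one shared cursor, and the cache has a fixed capacity.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_MORE_RECORDS ((int64_t)-1)
#define CACHE_CAPACITY  16

/* Stand-in for the firmware: one cursor shared by every caller. */
struct fake_erst {
    const int64_t *ids;
    size_t count;
    size_t cursor;
};

/* Models raw GET_NEXT_RECORD_ID: each ID is handed out exactly once. */
static int64_t fake_get_next_record_id(struct fake_erst *fw)
{
    if (fw->cursor >= fw->count)
        return NO_MORE_RECORDS;
    return fw->ids[fw->cursor++];
}

/* In-memory cache of every ID the firmware has returned so far. */
struct record_id_cache {
    int64_t ids[CACHE_CAPACITY];
    size_t len;
};

/*
 * Each reader keeps its own position (*pos) into the cache. The firmware
 * is asked for a new ID only when a reader runs past the cached entries.
 */
static int64_t cached_next_record_id(struct fake_erst *fw,
                                     struct record_id_cache *cache,
                                     size_t *pos)
{
    if (*pos >= cache->len) {
        int64_t id;

        if (cache->len >= CACHE_CAPACITY)
            return NO_MORE_RECORDS;
        id = fake_get_next_record_id(fw);
        if (id == NO_MORE_RECORDS)
            return NO_MORE_RECORDS;
        cache->ids[cache->len++] = id;
    }
    return cache->ids[(*pos)++];
}
```

Two readers that interleave their calls each see 1, 2, 3, 4 and then -1, instead of splitting the records between them as in the table in the commit message.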
  10. 18 Mar, 2011 1 commit
  11. 26 Jan, 2011 1 commit
    • x86: Move llc_shared_map out of cpu_info · b3d7336d
      Committed by Yinghai Lu
      cpu_info is already a per_cpu variable, so we can take llc_shared_map
      out of cpu_info and declare it as a per_cpu variable directly.
      
      Later references can then be simple and direct instead of
      having to find cpu_info first.
      
      This also makes smp_store_cpu_info() much simpler, since it avoids
      the save and restore trick.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Cc: Alok N Kataria <akataria@vmware.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Hans J. Koch <hjk@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <4D3A16E8.5020608@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 21 Jan, 2011 1 commit
  13. 07 Jan, 2011 2 commits
    • x86, NMI: Remove DIE_NMI_IPI · c410b830
      Committed by Don Zickus
      With priorities in place and no one really understanding the difference between
      DIE_NMI and DIE_NMI_IPI, just remove DIE_NMI_IPI and convert everyone to DIE_NMI.
      
      This also simplifies default_do_nmi() a little bit.  Instead of calling the
      die_notifier in both the if and else part, just pull it out and call it before
      the if-statement.  This has the side benefit of avoiding a call to the ioport
      to see if there is an external NMI sitting around until after the (more frequent)
      internal NMIs are dealt with.
      Patch-Inspired-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-5-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, NMI: Add priorities to handlers · 166d7514
      Committed by Don Zickus
      In order to consolidate the NMI die_chain events, we need to setup the priorities
      for the die notifiers.
      
      I started by defining a bunch of common priorities that can be used by the
      notifier blocks.  Then I modified the notifier blocks to use the newly created
      priorities.
      
      Now that the priorities are straightened out, it should be easier to remove the
      event DIE_NMI_IPI.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-4-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
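The effect of giving notifier blocks priorities can be illustrated with a small user-space model of a priority-ordered handler chain. Everything here is hypothetical (the struct, the return codes, the two sample handlers); it only shows the general technique of keeping the chain sorted so that higher-priority handlers run first, and it is not the kernel's notifier code.

```c
#include <stddef.h>

enum { HANDLER_DONE = 0, HANDLER_STOP = 1 };

struct nmi_handler {
    int priority;               /* larger value runs earlier */
    int (*fn)(int event);
    struct nmi_handler *next;
};

/* Insert so the list stays sorted by descending priority. */
static void chain_register(struct nmi_handler **head, struct nmi_handler *h)
{
    while (*head && (*head)->priority >= h->priority)
        head = &(*head)->next;
    h->next = *head;
    *head = h;
}

/* Call handlers in list order until one asks to stop. */
static int chain_call(struct nmi_handler *head, int event)
{
    for (; head; head = head->next)
        if (head->fn(event) == HANDLER_STOP)
            return HANDLER_STOP;
    return HANDLER_DONE;
}

/* Sample handlers that record the order in which they were called. */
static int call_log[4];
static size_t call_count;

static int low_priority_handler(int event)
{
    (void)event;
    call_log[call_count++] = 1;
    return HANDLER_DONE;
}

static int high_priority_handler(int event)
{
    (void)event;
    call_log[call_count++] = 2;
    return HANDLER_DONE;
}
```

Registering the low-priority handler first and the high-priority one second still results in the high-priority handler being called first, which is the property a single consolidated event relies on.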
  14. 04 Jan, 2011 1 commit
  15. 30 Dec, 2010 2 commits
  16. 26 Oct, 2010 4 commits
  17. 20 Oct, 2010 1 commit
    • apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets · 27afdf20
      Committed by Robert Richter
      We want the BIOS to set up the EILVT APIC registers. The offsets
      were hardcoded and BIOS settings were overwritten by the OS.
      Now, the subsystems for MCE threshold and IBS determine the LVT
      offset from the registers the BIOS has set up. If the BIOS setup
      is buggy on a family 10h system, a workaround enables IBS. If
      the OS determines an invalid register setup, a "[Firmware Bug]:
      " error message is reported.
      
      We need this change also for upcoming cpu families.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      LKML-Reference: <1286360874-1471-3-git-send-email-robert.richter@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 15 Oct, 2010 1 commit
    • llseek: automatically add .llseek fop · 6038f373
      Committed by Arnd Bergmann
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
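The selection rules listed at the top of the semantic patch can be restated as an ordinary C decision function. This is a simplified sketch: `fops_facts`, `llseek_choice` and `pick_llseek` are hypothetical names, and the facts that Coccinelle derives by matching source code are reduced here to plain boolean inputs.

```c
#include <stdbool.h>

enum llseek_choice {
    KEEP_EXISTING,      /* .llseek already set: leave it alone */
    USE_NO_LLSEEK,      /* open path calls nonseekable_open */
    USE_SEQ_LSEEK,      /* .read is seq_read */
    USE_DEFAULT_LLSEEK, /* readdir present, or read/write touch f_pos */
    USE_NOOP_LLSEEK     /* nothing looks at the file offset */
};

/* Facts about one struct file_operations instance (illustrative only). */
struct fops_facts {
    bool has_llseek;
    bool open_is_nonseekable;
    bool read_is_seq_read;
    bool has_readdir;
    bool read_touches_pos;
    bool write_touches_pos;
};

/* Apply the rules in the same order as the semantic patch lists them. */
static enum llseek_choice pick_llseek(const struct fops_facts *f)
{
    if (f->has_llseek)
        return KEEP_EXISTING;
    if (f->open_is_nonseekable)
        return USE_NO_LLSEEK;
    if (f->read_is_seq_read)
        return USE_SEQ_LSEEK;
    if (f->has_readdir || f->read_touches_pos || f->write_touches_pos)
        return USE_DEFAULT_LLSEEK;
    return USE_NOOP_LLSEEK;
}
```

An instance with no read or write callback at all falls through to `USE_NOOP_LLSEEK`, matching the last rule of the semantic patch.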
  19. 11 Oct, 2010 1 commit
    • x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order · 6dcbfe4f
      Committed by Borislav Petkov
      This fixes possible cases of not collecting valid error info in
      the MCE error thresholding groups on F10h hardware.
      
      The current code contains a subtle problem of checking only the
      Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM
      thresholding group) in its first iteration and breaking out if
      the bit is cleared.
      
      But (!), this MSR contains an offset value, BlkPtr[31:24], which
      points to the remaining MSRs in this thresholding group which
      might contain valid information too. But if we bail out only
      after we checked the valid bit in the first MSR and not the
      block pointer too, we miss that other information.
      
      The thing is, MC4_MISC0[BlkPtr] is not predicated on
      MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked
      prior to iterating over the MCi_MISCj thresholding group,
      irrespective of the MC4_MISC0[Valid] setting.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
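The ordering problem described above can be shown with a tiny model of the first register in the group. The names are hypothetical, and the position of the Valid bit (bit 63 of the 64-bit MSR value) is an assumption made for this sketch; only the BlkPtr[31:24] field comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* BlkPtr occupies bits 31:24 of the register, per the commit message. */
static uint32_t misc_blkptr(uint64_t misc0)
{
    return (uint32_t)((misc0 >> 24) & 0xFFu);
}

/* Valid bit position (bit 63) is an assumption for this sketch. */
static bool misc_valid(uint64_t misc)
{
    return ((misc >> 63) & 1u) != 0;
}

/* Old order: give up on the whole group when the Valid bit is clear. */
static bool group_continues_old(uint64_t misc0)
{
    return misc_valid(misc0) && misc_blkptr(misc0) != 0;
}

/* Fixed order: follow BlkPtr irrespective of the Valid bit. */
static bool group_continues_fixed(uint64_t misc0)
{
    return misc_blkptr(misc0) != 0;
}
```

For a register value with Valid clear but a non-zero BlkPtr, the old check stops early while the fixed check goes on to the remaining registers of the group.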
  20. 08 Oct, 2010 1 commit
  21. 06 Sep, 2010 1 commit
    • therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal' · 592091c0
      Committed by Jin Dongming
      In unexpected_thermal_interrupt(), "LVT TMR interrupt" is used
      in error message.
      
      I don't think TMR is a suitable abbreviation for thermal.
        1.TMR has been used in IA32 Architectures Software Developer's
          Manual, and is the abbreviation for Trigger Mode Register.
        2.There is no standard abbreviation "TMR" defined for thermal
          in IA32 Architectures Software Developer's Manual.
        3.Though we could understand it as Thermal Monitor Register, it is
          easy to be misunderstood as a *TIMER* interrupt also.
      
      I think this patch will fix it.
      Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
      Reviewed-by: Jean Delvare <khali@linux-fr.org>
      Cc: Brown Len <len.brown@intel.com>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      LKML-Reference: <4C7C492D.5020704@np.css.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  22. 05 Sep, 2010 1 commit
    • x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks · 1389298f
      Committed by Andreas Herrmann
      kobject_add_internal failed for threshold_bank2 with -EEXIST,
      don't try to register things with the same name in the same
      directory:
      
        Pid: 1, comm: swapper Tainted: G        W  2.6.31 #1
        Call Trace:
        [<ffffffff81161b07>] ? kobject_add_internal+0x156/0x180
        [<ffffffff81161cc0>] ? kobject_add+0x66/0x6b
        [<ffffffff81161793>] ? kobject_init+0x42/0x82
        [<ffffffff81161cf9>] ? kobject_create_and_add+0x34/0x63
        [<ffffffff81393963>] ? threshold_create_bank+0x14f/0x259
        [<ffffffff8139310a>] ? mce_create_device+0x8d/0x1b8
        [<ffffffff81646497>] ? threshold_init_device+0x3f/0x80
        [<ffffffff81646458>] ? threshold_init_device+0x0/0x80
        [<ffffffff81009050>] ? do_one_initcall+0x4f/0x143
        [<ffffffff816413a0>] ? kernel_init+0x14c/0x1a2
        [<ffffffff8100c8da>] ? child_rip+0xa/0x20
        [<ffffffff81641254>] ? kernel_init+0x0/0x1a2
        [<ffffffff8100c8d0>] ? child_rip+0x0/0x20
        kobject_create_and_add: kobject_add error: -17
      
      (Probably the for_each_cpu loop should be entirely removed.)
      Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      LKML-Reference: <20100827092006.GB5348@loge.amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 21 Aug, 2010 1 commit
  24. 09 Aug, 2010 1 commit
  25. 04 Aug, 2010 2 commits
    • x86, hwmon: Package Level Thermal/Power: power limit · 0199114c
      Committed by Fenghua Yu
      Power limit notification feature is published in Intel 64 and IA-32
      Architectures SDMV Vol 3A 14.5.6 Power Limit Notification.
      
      It is implemented first on Intel Sandy Bridge platform.
      
      The patch handles notification interrupt. Interrupt handler dumps power limit
      information in log_buf, logs the event in mce log, and increases the event
      counters (core_power_limit and package_power_limit). Upper level applications
      could use the data to detect system health or diagnose functionality/performance
      issues.
      
      In the future, the event could be handled in a more fancy way.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      LKML-Reference: <1280448826-12004-5-git-send-email-fenghua.yu@intel.com>
      Reviewed-by: Len Brown <len.brown@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, hwmon: Package Level Thermal/Power: thermal throttling handler · 55d435a2
      Committed by Fenghua Yu
      Add package level thermal throttle interrupt support. The interrupt handler
      increases package level thermal throttle count. It also logs the event in MCE
      log.
      
      The package level thermal throttle interrupt happens across threads in a
      package. Each thread handles the interrupt individually. User level application
      is supposed to retrieve correct event count and log based on package/thread
      topology. This is the same situation for core level interrupt handler. In the
      future, interrupt may be reported only per package or per core.
      
      core_throttle_count and package_throttle_count are used for user interface.
      Previously only throttle_count is used for core throttle count. If you think
      new core_throttle_count name breaks user interface, I can change this part.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      LKML-Reference: <1280448826-12004-4-git-send-email-fenghua.yu@intel.com>
      Reviewed-by: Len Brown <len.brown@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  26. 03 Aug, 2010 1 commit
  27. 15 Jun, 2010 1 commit
    • mce: convert to rcu_dereference_index_check() · ec8c27e0
      Committed by Paul E. McKenney
      The mce processing applies rcu_dereference_check() to integers used as
      array indices.  This patch therefore moves mce to the new RCU API
      rcu_dereference_index_check() that avoids the sparse processing that
      would otherwise result in compiler errors.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
  28. 11 Jun, 2010 3 commits
  29. 28 May, 2010 1 commit
  30. 20 May, 2010 2 commits
    • ACPI, APEI, Use ERST for persistent storage of MCE · 482908b4
      Committed by Huang Ying
      Traditionally, a fatal MCE causes Linux to print the error log to the
      console and then reboot. Because MCE registers preserve their content
      after a warm reboot, the hardware error can be logged to disk or network
      after reboot. But the system may fail to warm reboot, and then the
      hardware error log may be lost. ERST can help here. By saving the
      hardware error log into flash via ERST before panicking, the hardware
      error log can be retrieved from flash after the system boots
      successfully again.
      
      The fatal MCE processing procedure with ERST involved is as follows:
      
      - Hardware detect error, MCE raised
      - MCE read MCE registers, check error severity (fatal), prepare error record
      - Write MCE error record into flash via ERST
      - Go panic, then trigger system reboot
      - System reboot, /sbin/mcelog run, it reads /dev/mcelog to check flash
        for error record of previous boot via ERST, and output and clear
        them if available
      - /sbin/mcelog logs error records into disk or network
      
      ERST only accepts the CPER record format, but there is no pre-defined
      CPER section that can accommodate all the information in struct mce,
      so a customized section type is defined to hold struct mce inside a
      CPER record as an error section.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • ACPI, APEI, Generic Hardware Error Source memory error support · d334a491
      Committed by Huang Ying
      Generic Hardware Error Source provides a way to report platform
      hardware errors (such as those from the chipset). It works in so-called
      "Firmware First" mode, that is, hardware errors are reported to
      firmware first, then reported to Linux by firmware. This way, some
      non-standard hardware error registers or non-standard hardware link
      can be checked by firmware to produce more valuable hardware error
      information for Linux.
      
      Currently, only the SCI notification type and memory errors are supported. More
      notification types and hardware error types will be added later. These
      memory errors are reported to user space through /dev/mcelog via
      faking a corrected Machine Check, so that the error memory page can be
      offlined by /sbin/mcelog if the error count for one page is beyond the
      threshold.
      
      On some machines, Machine Check cannot report the physical address for
      some corrected memory errors, but GHES can do that. So this simplified
      GHES is implemented first.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>