  1. 15 Sep, 2009 · 2 commits
  2. 17 Aug, 2009 · 2 commits
    • x86, mce: Don't initialize MCEs on unknown CPUs · e412cd25
      Committed by Ingo Molnar
      An older test-box started hanging at the following point during
      bootup:
      
       [    0.022996] Mount-cache hash table entries: 512
       [    0.024996] Initializing cgroup subsys debug
       [    0.025996] Initializing cgroup subsys cpuacct
       [    0.026995] Initializing cgroup subsys devices
       [    0.027995] Initializing cgroup subsys freezer
       [    0.028995] mce: CPU supports 5 MCE banks
      
      I've bisected it down to commit 4efc0670 ("x86, mce: use 64bit
      machine check code on 32bit"), which utilizes the MCE code on
      32-bit systems too.
      
      The problem is caused by this detail in my config:
      
        # CONFIG_CPU_SUP_INTEL is not set
      
      This disables the quirks in mce_cpu_quirks() but still enables
      MCE support - which then hangs due to the missing quirk
      workaround needed on this CPU:
      
      	if (c->x86 == 6 && c->x86_model < 0x1A && banks > 0)
      		mce_banks[0].init = 0;
      
      The safe solution is to not initialize MCEs if we don't know which
      CPU we are running on (or if that CPU's support code got disabled
      in the config).
      
      Also be a bit more defensive on 32-bit systems: don't do a
      boot-time dump of pending MCEs, not just on the specific system
      that we found a problem with (Pentium M), but on earlier ones as
      well.
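      A minimal C sketch of the two decisions described above, under
      stated assumptions. First, MCE setup is refused when the vendor is
      unknown; we assume this is how a CPU looks once its vendor support
      code is configured out. Second, boot-time logging defaults to off
      for Intel family 6 up to model 13, reading "earlier ones" as "lower
      model numbers". `struct cpu_id`, `mce_setup_ok()` and
      `want_boot_log()` are hypothetical names, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vendor codes; a vendor whose support code is configured
 * out is assumed to show up as VENDOR_UNKNOWN. */
enum vendor { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD };

/* Hypothetical, trimmed-down CPU identification. */
struct cpu_id {
    enum vendor vendor;
    unsigned family;
    unsigned model;
};

/* Only set up machine checks on CPUs whose quirks are known. */
static bool mce_setup_ok(const struct cpu_id *c)
{
    return c->vendor != VENDOR_UNKNOWN;
}

/* bootlog: -1 = not specified, 0 = forced off, 1 = "mce=bootlog".
 * Returns whether MCEs found in the banks at boot should be logged. */
static bool want_boot_log(const struct cpu_id *c, int bootlog)
{
    if (bootlog >= 0)
        return bootlog != 0;          /* an explicit choice always wins */
    if (c->vendor == VENDOR_INTEL && c->family == 6 && c->model <= 13)
        return false;                 /* Pentium M (model 13) and earlier */
    return true;
}
```

      The tristate keeps "the user said nothing" separate from "the user
      said off", so a CPU-specific default never overrides an explicit
      boot option.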
      
      This problem is probably uncommon, and disabling CPU support is
      rare, but being more defensive in code that we turned on for a
      wide range of CPUs is still prudent.
      
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      LKML-Reference: Message-ID: <4A88E3E4.40506@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs · c7f6fa44
      Committed by Bartlomiej Zolnierkiewicz
      On my legacy Pentium M laptop (Acer Extensa 2900) I get a bogus MCE on
      a cold boot with CONFIG_X86_NEW_MCE enabled (shown here after decoding
      it with mcelog):
      
      MCE 0
      HARDWARE ERROR. This is *NOT* a software problem!
      Please contact your hardware vendor
      CPU 0 BANK 1 MCG status:
      MCi status:
      Error overflow
      Uncorrected error
      Error enabled
      Processor context corrupt
      MCA: Data CACHE Level-1 UNKNOWN Error
      STATUS f200000000000195 MCGSTATUS 0
      
      [ The other STATUS values observed: f2000000000001b5 (... UNKNOWN error)
        and f200000000000115 (... READ Error).
      
        To verify that this is not a CONFIG_X86_NEW_MCE bug, I also modified
        the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump
        the content of the STATUS MSR before it is cleared during
        initialization. ]
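      The STATUS values above can be decoded with a small C sketch. It
      reads the architectural MCi_STATUS flag bits (VAL 63, OVER 62,
      UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57) and the cache-hierarchy
      form of the MCA error code (000F 0001 RRRR TTLL in bits 15:0). The
      bit layout follows the Intel SDM machine-check chapter.
      `struct cache_err` and `decode_cache_err()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL   (1ULL << 63)  /* register contents valid */
#define MCI_STATUS_OVER  (1ULL << 62)  /* error overflow */
#define MCI_STATUS_UC    (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)  /* error enabled */
#define MCI_STATUS_MISCV (1ULL << 59)  /* MCi_MISC valid */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* MCi_ADDR valid */
#define MCI_STATUS_PCC   (1ULL << 57)  /* processor context corrupt */

struct cache_err {
    bool is_cache;   /* MCA code has the 000F 0001 RRRR TTLL form */
    unsigned rrrr;   /* request type: 0..8 defined (1 = generic read); others undefined */
    unsigned tt;     /* transaction type: 0 = instruction, 1 = data, 2 = generic */
    unsigned ll;     /* level: 0 = L0, 1 = L1, 2 = L2, 3 = generic */
};

static struct cache_err decode_cache_err(uint64_t status)
{
    unsigned code = (unsigned)(status & 0xffff);  /* MCA error code */
    struct cache_err e;

    e.is_cache = (code & 0xef00) == 0x0100;       /* ignore filter bit 12 */
    e.rrrr = (code >> 4) & 0xf;
    e.tt = (code >> 2) & 0x3;
    e.ll = code & 0x3;
    return e;
}
```

      With this decoding, 0xf200000000000195 has VAL, OVER, UC, EN and PCC
      set. Its level-1 data-cache code carries request type 9, which the
      SDM leaves undefined. That agrees with mcelog's "Data CACHE Level-1
      UNKNOWN Error". The ...115 variant carries request type 1, a generic
      read.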
      
      Since the bogus MCE results in a kernel taint (which in turn disables
      lockdep support), don't log boot MCEs on Pentium M (model == 13) CPUs
      by default (the "mce=bootlog" boot parameter can be used to get the
      old behavior).
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 22 Jul, 2009 · 1 commit
  4. 09 Jul, 2009 · 1 commit
  5. 26 Jun, 2009 · 1 commit
  6. 18 Jun, 2009 · 3 commits
  7. 17 Jun, 2009 · 9 commits
    • x86: mce: Handle banks == 0 case in K7 quirk · 203abd67
      Committed by Andi Kleen
      Vegard Nossum reported:
      
      > I get an MCE-related crash like this in latest linus tree:
      >
      > [    0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
      > [    0.116396] CPU: L2 Cache: 512K (64 bytes/line)
      > [    0.120570] mce: CPU supports 0 MCE banks
      > [    0.124870] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      > [    0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
      > [    0.128001] PGD 0
      > [    0.128001] Thread overran stack, or stack corrupted
      > [    0.128001] Oops: 0002 [#1] PREEMPT SMP
      > [    0.128001] last sysfs file:
      > [    0.128001] CPU 0
      > [    0.128001] Modules linked in:
      > [    0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
      > [    0.128001] RIP: 0010:[<ffffffff813b98ad>]  [<ffffffff813b98ad>] mcheck_init+0x278/0x320
      > [    0.128001] RSP: 0018:ffffffff81595e38  EFLAGS: 00000246
      > [    0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
      > [    0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
      > [    0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
      > [    0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
      > [    0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
      > [    0.128001] FS:  0000000000000000(0000) GS:ffff880002288000(0000) knlGS:0000000000000000
      > [    0.128001] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > [    0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
      > [    0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > [    0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
      > [    0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffffff8152a4a0)
      > [    0.128001] Stack:
      > [    0.128001]  0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f914
      > [    0.128001]  ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b869c
      > [    0.128001]  5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddbe6e
      > [    0.128001] Call Trace:
      > [    0.128001]  [<ffffffff813b869c>] identify_cpu+0x331/0x392
      > [    0.128001]  [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
      > [    0.128001]  [<ffffffff815a14ac>] check_bugs+0x1c/0x60
      > [    0.128001]  [<ffffffff8159c075>] start_kernel+0x403/0x46e
      > [    0.128001]  [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
      > [    0.128001]  [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
      > [    0.128001]  [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71
      
      This happens on QEMU, which reports MCA capability but no banks.
      Without this patch there is a buffer overrun and a boot oops, because
      the code tries to initialize element 0 of a zero-length kmalloc()
      buffer.
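      The failure mode can be reproduced in miniature. An array is
      allocated for zero banks, and element 0 is then written
      unconditionally. A zero-size allocation may return either NULL or
      a pointer that must not be dereferenced, so that write is out of
      bounds. This is a minimal C sketch of the guarded quirk.
      `struct mce_bank`, `alloc_banks()` and `apply_k7_quirk()` are
      hypothetical, and the K7 check is reduced to "family 6".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-bank control state. */
struct mce_bank {
    uint64_t ctl;   /* value to write to MCi_CTL; 0 disables the bank */
};

/* Allocates 'banks' entries, all enabled. With banks == 0 the result may be
 * NULL or a non-NULL pointer that must never be dereferenced. */
static struct mce_bank *alloc_banks(unsigned banks)
{
    struct mce_bank *b = calloc(banks, sizeof(*b));

    for (unsigned i = 0; b && i < banks; i++)
        b[i].ctl = ~0ULL;
    return b;
}

/* K7-style quirk: disable bank 0, but only if there is a bank 0 at all. */
static void apply_k7_quirk(unsigned family, struct mce_bank *b, unsigned banks)
{
    if (family == 6 && banks > 0)
        b[0].ctl = 0;
}
```

      Without the `banks > 0` test, the QEMU case (MCA present, zero
      banks) writes through a pointer to a zero-length allocation.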
      Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <20090615125200.GD31969@one.firstfloor.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mce: make mce_disabled boolean · c6978369
      Committed by Hidetoshi Seto
      On 32-bit, mce_disabled is a tristate variable [1, 0, -1], while
      the 64-bit version is boolean [0, 1].
      This patch makes mce_disabled always boolean and uses mce_p5_enabled
      to indicate the third state instead.
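      The representation change can be sketched in a few lines of C. This
      assumes, for illustration only, that 1 meant "disabled", 0 meant
      "default" and -1 meant "explicitly enabled, including the P5
      support". The changelog itself only says that mce_p5_enabled now
      carries the third state. `struct mce_flags` and `from_tristate()`
      are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical holder for the two flags that replace the tristate. */
struct mce_flags {
    bool mce_disabled;     /* machine check support switched off */
    bool mce_p5_enabled;   /* assumed old "-1" state: P5 support explicitly on */
};

/* Maps an old 32-bit tristate value (1, 0 or -1) onto the new booleans. */
static struct mce_flags from_tristate(int old)
{
    struct mce_flags f = { old == 1, old == -1 };
    return f;
}
```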
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: unify mce.h · 9e55e44e
      Committed by Hidetoshi Seto
      There are 2 headers:
      	arch/x86/include/asm/mce.h
      	arch/x86/kernel/cpu/mcheck/mce.h
      and the latter, small header contains:
      	#include <asm/mce.h>
      
      This patch moves all contents of the latter header into the former
      and fixes all files that use the latter to include the former instead.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: sysfs entries for new mce options · 9af43b54
      Committed by Hidetoshi Seto
      Add a sysfs interface for admins who want to tweak these options
      without rebooting the system.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: rename static variables around trigger · 1020bcbc
      Committed by Hidetoshi Seto
      "trigger" is not a straightforward name for a variable that holds the
      name of the user mode helper program that is triggered by machine
      check events.
      
      This patch renames this variable and its relatives to more
      recognizable names.
      
      	trigger		=> mce_helper
      	trigger_argv	=> mce_helper_argv
      	notify_user	=> mce_need_notify
      
      No functional changes.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: add __read_mostly · 4e5b3e69
      Committed by Hidetoshi Seto
      Add __read_mostly to data written during setup.
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: cleanup mce_start() · 7fb06fc9
      Committed by Hidetoshi Seto
      Simplify the interface of mce_start():
      
      -       no_way_out = mce_start(no_way_out, &order);
      +       order = mce_start(&no_way_out);
      
      Now the Monarch and the Subjects share the same exit (return) in the usual path.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: don't init timer if !mce_available · 33edbf02
      Committed by Hidetoshi Seto
      In mce_cpu_restart, mce_init_timer is called unconditionally.
      If !mce_available (e.g. MCE is disabled), there is no useful work
      for the timer, so stop running it.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: fix a race condition about mce_callin and no_way_out · 184e1fdf
      Committed by Huang Ying
      If one CPU has no_way_out == 1, all other CPUs should have
      no_way_out == 1 too. But although global_nwo is read after
      mce_callin, global_nwo is also updated after mce_callin. So one CPU
      may read global_nwo before another CPU updates it, leaving
      no_way_out == 1 on some CPUs and no_way_out == 0 on others.
      
      This patch fixes this race condition by moving the mce_callin update
      after the global_nwo update, with an smp_wmb() in between. An
      smp_rmb() is added between the corresponding reads too.
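      A user-space C11 sketch of the corrected ordering. Here
      `memory_order_release` and `memory_order_acquire` stand in for
      smp_wmb() and smp_rmb(). Each CPU publishes its verdict in
      global_nwo before incrementing mce_callin. Readers look at
      global_nwo only after they observe the final call-in count. The
      variable names mirror the changelog, and the helper loosely follows
      the `order = mce_start(&no_way_out)` shape from commit 7fb06fc9.
      `mce_start_sketch()` and its `ncpus` parameter are hypothetical, and
      unlike the kernel the busy-wait has no timeout.

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space stand-ins for the kernel variables named in the changelog. */
static atomic_int global_nwo;   /* set if any CPU saw a "no way out" error */
static atomic_int mce_callin;   /* number of CPUs that entered the handler */

/* Returns this CPU's arrival order (1-based) and, once all 'ncpus' CPUs have
 * called in, replaces *no_way_out with the global verdict. */
static int mce_start_sketch(int *no_way_out, int ncpus)
{
    /* 1. Publish our own verdict first ... */
    if (*no_way_out)
        atomic_fetch_or_explicit(&global_nwo, 1, memory_order_relaxed);

    /* 2. ... then announce arrival. Release ordering plays the role of
     *    smp_wmb(): the global_nwo update is visible before the new count. */
    int order = atomic_fetch_add_explicit(&mce_callin, 1, memory_order_release) + 1;

    /* 3. Wait for everyone. Acquire ordering plays the role of smp_rmb():
     *    global_nwo is read only after the final count has been observed. */
    while (atomic_load_explicit(&mce_callin, memory_order_acquire) < ncpus)
        ;
    *no_way_out = atomic_load_explicit(&global_nwo, memory_order_relaxed);
    return order;
}
```

      If steps 1 and 2 were swapped, as in the race the changelog
      describes, a CPU could see the full count before another CPU's
      global_nwo update became visible.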
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
  8. 11 Jun, 2009 · 2 commits
    • x86, mce: Add boot options for corrected errors · 62fdac59
      Committed by Hidetoshi Seto
      This patch introduces three boot options (no_cmci, dont_log_ce
      and ignore_ce) to control handling for corrected errors.
      
      The "mce=no_cmci" boot option disables the CMCI feature.
      
      Since CMCI is a new feature, having a boot control to disable it
      will help if the hardware is misbehaving.
      
      The "mce=dont_log_ce" boot option disables logging for corrected
      errors. All reported corrected errors will be cleared silently.
      This option will be useful if you never care about corrected
      errors.
      
      The "mce=ignore_ce" boot option disables features for corrected
      errors, i.e. the polling timer and CMCI. Corrected events are not
      cleared but kept in the bank MSRs.
      
      Usually this is not recommended; however, it can help if there is
      a conflict with the BIOS or with hardware monitoring applications
      that clear corrected events in the banks instead of the OS.
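      The three options can be modeled with a small C sketch. It parses
      the text after "mce=" and derives what happens to corrected errors.
      `struct ce_opts`, `parse_ce_option()` and the helper functions are
      hypothetical; the kernel's real "mce=" handler also accepts other
      options.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical option state for corrected-error handling. */
struct ce_opts {
    bool cmci_disabled;   /* mce=no_cmci */
    bool dont_log_ce;     /* mce=dont_log_ce */
    bool ignore_ce;       /* mce=ignore_ce */
};

/* Returns true if 'arg' (the text after "mce=") was one of the three options. */
static bool parse_ce_option(const char *arg, struct ce_opts *o)
{
    if (strcmp(arg, "no_cmci") == 0)
        o->cmci_disabled = true;
    else if (strcmp(arg, "dont_log_ce") == 0)
        o->dont_log_ce = true;
    else if (strcmp(arg, "ignore_ce") == 0)
        o->ignore_ce = true;
    else
        return false;
    return true;
}

/* ignore_ce switches off both corrected-error mechanisms. */
static bool use_poll_timer(const struct ce_opts *o) { return !o->ignore_ce; }
static bool use_cmci(const struct ce_opts *o) { return !o->ignore_ce && !o->cmci_disabled; }

enum ce_action { CE_LOG_AND_CLEAR, CE_CLEAR_SILENTLY, CE_LEAVE_IN_BANK };

/* What happens to a corrected error found in a bank. */
static enum ce_action corrected_error_action(const struct ce_opts *o)
{
    if (o->ignore_ce)
        return CE_LEAVE_IN_BANK;     /* left for the BIOS or other tools */
    if (o->dont_log_ce)
        return CE_CLEAR_SILENTLY;
    return CE_LOG_AND_CLEAR;
}
```

      The difference between dont_log_ce and ignore_ce is whether the
      bank is still cleared. Only ignore_ce leaves events in the MSRs for
      another agent to pick up.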
      
      [ A trivial cleanup (space -> tab) of the documentation is included too. ]
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <4A30ACDF.5030408@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mce: Fix mce printing · 77e26cca
      Committed by Hidetoshi Seto
      This patch:
      
       - Adds print_mce_head() instead of the "first" flag
       - Makes the header always be printed
       - Stops double printing of corrected errors
      
      [ This portion originates from Huang Ying's patch ]
      
      Originally-From: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      LKML-Reference: <4A30AC83.5010708@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 04 Jun, 2009 · 18 commits
    • x86, mce: support action-optional machine checks · 9b1beaf2
      Committed by Andi Kleen
      Newer Intel CPUs support a new class of machine checks called recoverable
      action optional.
      
      Action Optional means that the CPU detected some form of corruption in
      the background and tells the OS about it using a machine check
      exception. The OS can then take appropriate action, like killing the
      process with the corrupted data or logging the event properly to disk.
      
      This is done by the new generic high level memory failure handler added
      in an earlier patch. The high level handler takes the address of the
      failed memory and does the appropriate action, like killing the process.
      
      In this version of the patch the high level handler is stubbed out
      with a weak function to not create a direct dependency on the hwpoison
      branch.
      
      The high level handler cannot be called directly from the machine check
      exception, though, because it has to run in a defined process context to
      be able to sleep when taking VM locks (it is not expected to sleep for a
      long time, only in some exceptional cases like lock contention).
      
      Thus the MCE handler has to queue a work item for process context,
      trigger process context and then call the high level handler from there.
      
      This patch adds two paths to process context: through a per thread kernel
      exit notify_user() callback or through a high priority work item.
      The first runs when the process exits back to user space, the other when
      it goes to sleep and there is no higher priority process.
      
      The machine check handler will schedule both, and whichever runs first
      will grab the event. This is done because quick reaction to this
      event is critical to avoid a potentially more fatal machine check
      when the corruption is consumed.
      
      There is a simple lockless ring buffer to queue the corrupted
      addresses between the exception handler and the process context handler.
      Then in process context it just calls the high level VM code with
      the corrupted PFNs.
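      The hand-off between the exception handler and process context can
      be illustrated with a generic single-producer/single-consumer ring
      in C11. The producer (exception handler) never blocks and drops the
      entry when the ring is full. The consumer (process context) drains
      it in FIFO order. This is not the kernel's code: per the v2 note,
      the kernel version is drained with preemption disabled.
      `struct pfn_ring`, `RING_SIZE`, `ring_add()` and `ring_get()` are
      hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 16u   /* hypothetical capacity; one slot is always left empty */

struct pfn_ring {
    atomic_uint head;               /* next slot to read (consumer owns it) */
    atomic_uint tail;               /* next slot to write (producer owns it) */
    unsigned long pfns[RING_SIZE];
};

/* Producer side (exception handler): never blocks; returns false when full. */
static bool ring_add(struct pfn_ring *r, unsigned long pfn)
{
    unsigned t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned next = (t + 1) % RING_SIZE;

    if (next == atomic_load_explicit(&r->head, memory_order_acquire))
        return false;                       /* full: drop the entry */
    r->pfns[t] = pfn;
    /* Release: the slot contents become visible before the new tail. */
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return true;
}

/* Consumer side (process context): returns false when empty. */
static bool ring_get(struct pfn_ring *r, unsigned long *pfn)
{
    unsigned h = atomic_load_explicit(&r->head, memory_order_relaxed);

    if (h == atomic_load_explicit(&r->tail, memory_order_acquire))
        return false;                       /* empty */
    *pfn = r->pfns[h];
    /* Release: we are done with the slot before the producer may reuse it. */
    atomic_store_explicit(&r->head, (h + 1) % RING_SIZE, memory_order_release);
    return true;
}
```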
      
      The patch adds the code required to extract the failed address from
      the CPU's machine check registers. It doesn't try to handle all
      possible cases -- the specification has 6 different ways to specify
      the memory address -- but only the linear address.
      
      Most of the required checking has been already done earlier in the
      mce_severity rule checking engine.  Following the Intel
      recommendations Action Optional errors are only enabled for known
      situations (encoded in MCACODs). The errors are ignored otherwise,
      because they are action optional.
      
      v2: Improve comment, disable preemption while processing ring buffer
          (reported by Ying Huang)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: rename mce_notify_user to mce_notify_irq · 9ff36ee9
      Committed by Andi Kleen
      Rename the mce_notify_user function to mce_notify_irq. The next
      patch will split the wakeup handling of interrupt context
      and of process context and it's better to give it a clearer
      name for this.
      
      Contains a fix from Ying Huang
      
      [ Impact: cleanup ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: implement new status bits · ed7290d0
      Committed by Andi Kleen
      The x86 architecture recently added some new machine check status bits:
      S (Signalled) and AR (Action Required). S allows the kernel to check
      whether a specific event caused an exception or was just logged through CMCI.
      AR allows the kernel to decide if an event needs immediate action
      or can be delayed or ignored.
      
      Implement support for these new status bits. mce_severity() uses
      the new bits to grade the machine check correctly and decide what
      to do. The exception handler uses AR to decide to kill or not.
      The S bit is used to separate events between the poll/CMCI handler
      and the exception handler.
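      A deliberately simplified C grading function shows how S and AR
      split uncorrected errors. It ignores the MCACOD matching, RIPV/EIPV
      handling and user/kernel distinction that the real mce_severity()
      rules deal with. The bit positions (S = 56, AR = 55) follow the
      Intel SDM. `enum sev` and `grade()` are hypothetical, and the `ser`
      flag stands for the CPU's software-error-recovery capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)
#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_PCC (1ULL << 57)
#define MCI_STATUS_S   (1ULL << 56)  /* Signalled via a machine check exception */
#define MCI_STATUS_AR  (1ULL << 55)  /* Action Required */

enum sev { SEV_NONE, SEV_KEEP, SEV_UCNA, SEV_AO, SEV_AR, SEV_PANIC };

/* Grossly simplified grading of one bank's status. */
static enum sev grade(uint64_t status, bool ser)
{
    if (!(status & MCI_STATUS_VAL))
        return SEV_NONE;
    if (!(status & MCI_STATUS_UC))
        return SEV_KEEP;         /* corrected: log it and keep running */
    if (status & MCI_STATUS_PCC)
        return SEV_PANIC;        /* processor context corrupt */
    if (!ser)
        return SEV_PANIC;        /* classical UC */
    if (!(status & MCI_STATUS_S))
        return SEV_UCNA;         /* not signalled: left to poll/CMCI */
    if (status & MCI_STATUS_AR)
        return SEV_AR;           /* must act before returning */
    return SEV_AO;               /* action optional: may be deferred */
}
```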
      
      Classical UC always leads to a panic. That was true before anyway,
      because the existing CPUs always set PCC along with it.
      
      Also corrects the rules for whether to kill in user or kernel context
      and for how to handle a missing RIPV.
      
      The machine check handler largely uses the mce-severity grading
      engine now instead of making its own decisions. This means the logic
      is centralized in one place.  This is useful because it has to be
      evaluated multiple times.
      
      v2: Some rule fixes; Add AO events
      Fix RIPV, RIPV|EIPV order (Ying Huang)
      Fix UCNA with AR=1 message (Ying Huang)
      Add comment about panicking in m_c_p.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: print header/footer only once for multiple MCEs · 86503560
      Committed by Andi Kleen
      When multiple MCEs are printed, print the "HARDWARE ERROR" header
      and "This is not a software error" footer only once. This
      makes the output much more compact with many CPUs.
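      The printing rule can be sketched in C: the header goes out once,
      then one line per record, then the footer once. The header and
      footer strings are taken from the mcelog output quoted earlier in
      this log. `struct mce_rec` and `format_mces()` are hypothetical,
      and the sketch writes into a buffer instead of printk().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down record; the real struct mce has many more fields. */
struct mce_rec {
    int cpu;
    int bank;
    unsigned long long status;
};

/* Formats 'n' records into 'buf': the header once, one line per record,
 * then the footer once. Output is truncated if 'len' is too small. */
static void format_mces(char *buf, size_t len, const struct mce_rec *m, int n)
{
    size_t off = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    if (n <= 0)
        return;
    off += (size_t)snprintf(buf, len, "HARDWARE ERROR\n");
    for (int i = 0; i < n && off < len; i++)
        off += (size_t)snprintf(buf + off, len - off, "CPU %d BANK %d STATUS %llx\n",
                                m[i].cpu, m[i].bank, m[i].status);
    if (off < len)
        snprintf(buf + off, len - off, "This is not a software problem!\n");
}
```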
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: default to panic timeout for machine checks · 29b0f591
      Committed by Andi Kleen
      Fatal machine checks can be logged to disk after boot, but only if
      the system did a warm reboot. That's unfortunately difficult with the
      default panic behaviour, which waits forever, so the admin has to
      press the power button because modern systems usually lack a reset
      button. Power cycling clears the machine checks in the registers and
      makes it impossible to log them.
      
      This patch changes the default for machine check panic to always
      reboot after 30s. Then the mce can be successfully logged after
      reboot.
      
      I believe this will improve machine check experience for any
      system running the X server.
      
      This depends on successful boot logging of MCEs. This currently
      only works on Intel systems; on AMD there are quite a lot of systems
      around which leave junk in the machine check registers after boot,
      so it's disabled there. These systems will continue to default
      to an endlessly waiting panic.
      
      v2: Only force panic timeout when it's shorter (H.Seto)
      v3: Only force timeout when there is no timeout
      (based on comment H.Seto)
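      The v3 rule can be captured in a few lines of C. A reboot timeout is
      filled in only when none is configured, and only where boot logging
      is trusted. The 30-second value comes from the changelog.
      `mce_panic_timeout()` and `MCE_PANIC_REBOOT_SECS` are hypothetical
      names for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MCE_PANIC_REBOOT_SECS 30   /* "always reboot after 30s" */

/* panic_timeout: current setting in seconds (0 = wait forever).
 * bootlog_works: whether MCEs left in the banks are logged on the next boot.
 * Returns the timeout to use for a fatal machine check. */
static int mce_panic_timeout(int panic_timeout, bool bootlog_works)
{
    if (panic_timeout == 0 && bootlog_works)
        return MCE_PANIC_REBOOT_SECS;   /* only fill in a missing timeout */
    return panic_timeout;               /* never override an admin's choice */
}
```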
      
      [ Fix changelog - HS ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: improve mce_get_rip · 1b2797dc
      Committed by Huang Ying
      Assume IP on the stack is valid when either EIPV or RIPV are set.
      This influences whether the machine check exception handler decides
      to return or panic.
      
      This fixes a test case in the mce-test suite and is more compliant
      with the specification.
      
      This currently only makes a difference in an artificial testing
      scenario with the mce-test test suite.
      
      In addition, do not force EIPV to be valid when the exact register
      MSRs are used, and keep trusting the CS value on the stack even if
      the MSR is available.
      
      [AK: combination of patches from Huang Ying and Hidetoshi Seto, with
      new description by me]
      [add some description, no code changed - HS]
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: make non Monarch panic message "Fatal machine check" too · ac960375
      Committed by Andi Kleen
      ... instead of "Machine check". This is for consistency with the Monarch
      panic message.
      
      Based on a report from Ying Huang.
      
      v2: But add a descriptive postfix so that the test suite can distinguish them.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: switch x86 machine check handler to Monarch election. · 3c079792
      Committed by Andi Kleen
      On Intel platforms machine check exceptions are always broadcast to
      all CPUs.  This patch makes the machine check handler synchronize all
      these machine checks, elect a Monarch to handle the event and collect
      the worst event from all CPUs and then process it first.
      
      This has some advantages:
      
      - When there is a truly data corrupting error the system panics as
        quickly as possible. This improves containment of corrupted
        data and makes sure the corrupted data never hits stable storage.
      
      - The panics are synchronized and do not reenter the panic code
        on multiple CPUs (which currently does not handle this well).
      
      - All the errors are reported. Currently it often happens that
        another CPU happens to do the panic first, but reports useless
        information (empty machine check) because the real error
        happened on another CPU which came in later.
        This is a big advantage on Nehalem, where the 8 threads per CPU
        often lead to the wrong CPU winning the race and dumping
        useless information on a machine check. The problem also occurs
        in a less severe form on older CPUs.
      
      - The system can detect when no CPUs detected a machine check
        and shut down the system. This can happen when one CPU is so
        badly hung that it cannot process a machine check anymore
        or when some external agent wants to stop the system by
        asserting the machine check pin.  This follows Intel hardware
        recommendations.
      
      - This matches the recommended error model by the CPU designers.
      
      - The events can be output in true severity order
      
      - When a panic happens on another CPU, it makes sure it is actually
        able to process the stop IPI by enabling interrupts.
      
      The code is extremely careful to handle timeouts while waiting
      for other CPUs. It can't rely on the normal timing mechanisms
      (jiffies, ktime_get) because of its asynchronous/lockless nature,
      so it implements its own timeouts using ndelay() and a "SPINUNIT".
      
      The timeout is configurable. By default it waits for up to one
      second for the other CPUs. This can also be disabled.
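      The timeout scheme can be shown as a C sketch that uses no clocks
      at all. A nanosecond budget shrinks by one SPINUNIT per unsuccessful
      poll, and waiting stops when the budget runs out. The value
      `SPINUNIT = 100` is an assumption. `spin_delay_ns()` is a
      placeholder for ndelay(). `wait_for_cpus()` and the toy `countdown`
      predicate are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPINUNIT 100   /* assumed ns of delay per poll */

/* Placeholder for the kernel's ndelay(); a real port would busy-wait here. */
static void spin_delay_ns(unsigned ns) { (void)ns; }

/* Consumes one SPINUNIT of budget; returns true once the budget is exhausted. */
static bool timed_out(int64_t *budget_ns)
{
    if (*budget_ns < SPINUNIT)
        return true;
    *budget_ns -= SPINUNIT;
    return false;
}

/* Polls 'arrived(ctx)' until it reports all CPUs or the budget runs out.
 * Returns true if everyone arrived in time. */
static bool wait_for_cpus(bool (*arrived)(void *), void *ctx, int64_t budget_ns)
{
    while (!arrived(ctx)) {
        if (timed_out(&budget_ns))
            return false;
        spin_delay_ns(SPINUNIT);
    }
    return true;
}

/* Toy predicate for trying it out: "all CPUs arrived" after N polls. */
struct countdown { int polls_left; };
static bool countdown_done(void *p)
{
    struct countdown *c = p;
    return c->polls_left-- <= 0;
}
```

      Counting down a budget keeps the loop independent of jiffies and
      ktime_get(), at the cost of the timeout being approximate.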
      
      From some informal testing, AMD systems do not seem to broadcast
      machine checks, so right now it's always disabled by default on
      non-Intel CPUs and on very old Intel systems.
      
      Includes fixes from Ying Huang
      Fixed a "ecception" in a comment (H.Seto)
      Moved global_nwo reset later based on suggestion from H.Seto
      v2: Avoid duplicate messages
      
      [ Impact: feature, fixes long standing problems. ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: implement panic synchronization · f94b61c2
      Committed by Andi Kleen
      In some circumstances multiple CPUs can enter mce_panic() in parallel.
      This gives quite confused output because they will all dump the same
      machine check buffer.
      
      The other problem is that they would all panic in parallel, but not
      process each other's shutdown IPIs because interrupts are disabled.
      
      Detect this situation early on in mce_panic(). The first CPU
      entering will do the panic; the others will just wait to be killed.
      
      For paranoia reasons, in case the other CPU dies during the MCE, I added
      a 5-second timeout. If it expires, each CPU will panic on its own again.
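      The "first CPU does the panic" election can be done with a single
      atomic increment. The caller that sees the old value 0 wins. This
      C11 sketch covers only the election; the waiting and the 5-second
      fallback are left out. `mce_panicked` and `mce_claim_panic()` are
      hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter of CPUs that have entered the panic path. */
static atomic_int mce_panicked;

/* Returns true for exactly one caller: the CPU that should print and panic.
 * Every other caller should wait (with a timeout) to be stopped. */
static bool mce_claim_panic(void)
{
    return atomic_fetch_add(&mce_panicked, 1) == 0;
}
```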
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: implement bootstrapping for machine check wakeups · ccc3c319
      Committed by Andi Kleen
      Machine checks support waking up the mcelog daemon quickly.
      
      The original wake up code for this was pretty ugly, relying on
      an idle notifier and a special process flag. The reason it did
      it this way is that the machine check handler is not subject
      to normal interrupt locking rules so it's not safe
      to call wake_up().  Instead it set a process flag
      and then either did the wakeup in the syscall return
      or in the idle notifier.
      
      This patch adds a new "bootstrapping" method as a replacement.
      
      The idea is that the handler checks if it's in a state where
      it is unsafe to call wake_up(). If it's safe, it calls it directly.
      When it's not safe -- that is, it interrupted a critical
      section with interrupts disabled -- it uses a new "self IPI" to trigger
      an IPI to its own CPU. This can be done safely because IPI
      triggers are atomic with some care. The IPI is raised
      once the interrupts are reenabled and can then safely call
      wake_up().
      
      When APICs are disabled the event is just queued and will be picked up
      eventually by the next polling timer. I think that's a reasonable
      compromise, since it should only happen quite rarely.
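      The three outcomes described above can be written as a small C
      decision function. It assumes the "is it safe" test reduces to
      whether interrupts were enabled in the interrupted context.
      `enum wake_path` and `choose_wake_path()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum wake_path {
    WAKE_DIRECT,     /* safe context: call wake_up() right away */
    WAKE_SELF_IPI,   /* unsafe: send an IPI to this CPU, handled once IRQs are on */
    WAKE_BY_POLL     /* no usable APIC: leave the event for the next poll timer */
};

static enum wake_path choose_wake_path(bool irqs_were_enabled, bool apic_usable)
{
    if (irqs_were_enabled)
        return WAKE_DIRECT;
    if (apic_usable)
        return WAKE_SELF_IPI;
    return WAKE_BY_POLL;
}
```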
      
      Contains fixes from Ying Huang.
      
      [ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: check early in exception handler if panic is needed · bd19a5e6
      Committed by Andi Kleen
      The exception handler should behave differently if the exception is
      fatal versus one that can be returned from. In the first case it should
      never clear any registers, because these need to be preserved
      for logging after the next boot. Otherwise it should clear them
      on each CPU step by step so that other CPUs sharing the same bank don't
      see duplicate events; if it didn't, we would risk reporting events
      multiple times on any CPUs that have shared machine check banks. This
      is a common problem on Intel Nehalem, which has both SMT (two
      CPU threads sharing banks) and shared machine check banks in the uncore.
      
      Determine early in a special pass if any event requires a panic.
      This uses the mce_severity() function added earlier.
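      The two-pass idea can be sketched in C. Pass 1 looks at every bank
      before touching any of them. Pass 2 logs every valid event, but
      clears the banks only when the handler will return. Treating "valid
      + uncorrected + PCC" as fatal is a simplification standing in for
      mce_severity(). `any_fatal()` and `scan_banks()` are hypothetical,
      and the status array stands in for the MCi_STATUS MSRs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)
#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_PCC (1ULL << 57)

/* Pass 1: decide about a panic before any bank is modified. */
static bool any_fatal(const uint64_t *status, int nbanks)
{
    for (int i = 0; i < nbanks; i++) {
        uint64_t s = status[i];
        if ((s & MCI_STATUS_VAL) && (s & MCI_STATUS_UC) && (s & MCI_STATUS_PCC))
            return true;
    }
    return false;
}

/* Pass 2: log every valid event (corrected ones too) and clear the banks
 * only if we are going to return; on a panic they are kept for the next boot. */
static int scan_banks(uint64_t *status, int nbanks, bool *fatal)
{
    int logged = 0;

    *fatal = any_fatal(status, nbanks);
    for (int i = 0; i < nbanks; i++) {
        if (!(status[i] & MCI_STATUS_VAL))
            continue;
        logged++;               /* stand-in for logging the record */
        if (!*fatal)
            status[i] = 0;      /* stand-in for writing 0 to MCi_STATUS */
    }
    return logged;
}
```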
      
      This is needed for the next patch.
      
      Also fixes a problem together with an earlier patch
      that corrected events weren't logged on a fatal MCE.
      
      [ Impact: Feature ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: remove TSC print heuristic · a0189c70
      Committed by Andi Kleen
      Previously mce_panic used a simple heuristic to avoid printing
      old, so far unreported machine check events on an MCE panic. This worked
      by comparing the TSC value at the start of the machine check handler
      with the event time stamp and only printing newer ones.
      
      This has a couple of issues, in particular on systems where the TSC
      is not fully synchronized between CPUs it could lose events or print
      old ones.
      
      It is also problematic with full system synchronization as
      added by the next patch.
      
      Remove the TSC heuristic and instead replace it with a simple heuristic
      to print corrected errors first and after that uncorrected errors
      and finally the worst machine check as determined by the machine
      check handler.
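      The replacement ordering can be sketched in C. Corrected errors are
      printed first, then uncorrected ones, and the handler's chosen worst
      record comes last. `struct rec` and `print_order()` are
      hypothetical; the function only computes indices, so it can be
      checked without printing anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimal record: only what the ordering needs. */
struct rec { bool uc; };

/* Fills 'order' with indices of 'recs' in print order: corrected first,
 * then uncorrected, with the handler's chosen 'worst' record last
 * (pass worst = -1 if there is none). Returns the number of indices written. */
static int print_order(const struct rec *recs, int n, int worst, int *order)
{
    int k = 0;

    for (int pass = 0; pass < 2; pass++)       /* 0: corrected, 1: uncorrected */
        for (int i = 0; i < n; i++)
            if (i != worst && recs[i].uc == (pass == 1))
                order[k++] = i;
    if (worst >= 0 && worst < n)
        order[k++] = worst;
    return k;
}
```

      Because the worst event is printed last, it stays next to the panic
      message. No timestamps are needed, which avoids the TSC
      synchronization problem.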
      
      This simplifies the code because there is no need to pass the
      original TSC value around.
      
      Contains fixes from Ying Huang
      
      [ Impact: bug fix, cleanup ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Ying Huang <ying.huang@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: log corrected errors when panicing · de8a84d8
      Committed by Andi Kleen
      Normally the machine check handler ignores corrected errors and leaves
      them to machine_check_poll(). But when panicking, machine_check_poll()
      won't run, so log all errors.
      
      Note: this can still miss some cases until the "early no way out"
      patch later is applied too.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: extend struct mce user interface with more information. · 8ee08347
      Committed by Andi Kleen
      Experience has shown that struct mce, which is used to pass a machine
      check to the user space daemon, currently has a few limitations. Also,
      some data which is useful to print at panic level is missing.
      
      This patch addresses most of them. The same information is also
      printed out together with mce panic.
      
      struct mce can be painlessly extended in a compatible way, the mcelog
      user space code just ignores additional fields with a warning.
      
      - It doesn't provide a wall time timestamp. There have been a few
        complaints about that. Fix that by adding a 64bit time_t.
      
      - It doesn't provide the exact CPU identification. This makes
        it awkward for mcelog to decode the event correctly, especially
        when there are variations in the supported MCE codes on different
        CPU models or when mcelog is running on a different host after a panic.
        Previously the administrator had to specify the correct CPU
        when mcelog ran on a different host, but with more variation
        in machine checks now it's better to auto detect that.
        It's also useful for more detailed analysis of CPU events.
        Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead.
      
      - Socket ID and initial APIC ID are useful to report because they
        allow to identify the failing CPU in some (not all) cases.
        This is also especially useful for the panic situation.
        This addresses one of the complaints from Thomas Gleixner earlier.
      
      - The MCG capabilities MSR needs to be reported for some advanced
        error processing in mcelog.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: support more than 256 CPUs in struct mce · d620c67f
      Committed by Andi Kleen
      The old struct mce had a limitation to 256 CPUs. But x86 Linux supports
      more than that now with x2apic. Add a new field extcpu to report the
      extended number.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: store record length into memory struct mce anchor · f6fb0ac0
      Committed by Andi Kleen
      This makes it easier for tools that want to extract the mcelog out of
      crash images or memory dumps to adapt to changing struct mce size.
      The length field replaces padding, so it's fully compatible.
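      A C sketch shows how a dump reader could use a stored record length.
      It steps through the entries by `recordlen`, copies only the prefix
      it understands, and zero-fills fields the dump does not contain.
      This works whether the dumping kernel's struct mce is larger or
      smaller than the tool's. `struct rec_v1` and `read_record()` are
      hypothetical; the real anchor and record layouts are not reproduced
      here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fields this (older) tool knows about; newer kernels may append more. */
struct rec_v1 {
    uint64_t status;
    uint64_t addr;
};

/* Copies record 'i' out of a raw dump whose records are 'recordlen' bytes each.
 * Unknown trailing fields are skipped; fields the dump lacks are zeroed. */
static void read_record(const unsigned char *entries, size_t recordlen,
                        size_t i, struct rec_v1 *out)
{
    size_t n = recordlen < sizeof(*out) ? recordlen : sizeof(*out);

    memset(out, 0, sizeof(*out));
    memcpy(out, entries + i * recordlen, n);
}
```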
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: add MCE poll count to /proc/interrupts · ca84f696
      Committed by Andi Kleen
      Keep a count of the machine check polls (or CMCI events) in
      /proc/interrupts.
      
      Andi needs this for debugging, but it's also useful in general
      to see what's going on in the kernel.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: add machine check exception count in /proc/interrupts · 01ca79f1
      Committed by Andi Kleen
      Useful for debugging, but it's also good general policy
      to have a counter for all special interrupts there. This makes it easier
      to diagnose where a CPU is spending its time.
      
      [ Impact: feature, debugging tool ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  10. 29 May, 2009 · 1 commit
    • x86, mce: trivial clean up for mce.c · 14a02530
      Committed by Hidetoshi Seto
      This fixes the following checkpatch warnings:
      
      WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
      +#include <asm/uaccess.h>
      
      WARNING: Use #include <linux/smp.h> instead of <asm/smp.h>
      +#include <asm/smp.h>
      
      WARNING: line over 80 characters
      +                               set_bit(MCE_OVERFLOW, (unsigned long *)&mcelog.flags);
      
      WARNING: braces {} are not necessary for any arm of this statement
      +       if (mce_notify_user()) {
      [...]
      +       } else {
      [...]
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>