1. 13 5月, 2008 1 次提交
    • V
      x86: remove 6 bank limitation in 64 bit MCE reporting code · 8edc5cc5
      Venki Pallipadi 提交于
      Eliminate the 6 bank restriction in 64 bit mce reporting code. This
      restriction is artificial (due to static creation of sysfs files) and 32
      bit code does not have any such restriction.
      
      This change helps in reporting the details of machine checks on a
      machine check exception with errors in bank 6 and above on CPUs that
      support those banks. Without the patch, machine check errors in those
      banks are not reported.
      
      We still have 128 (MCE_EXTENDED_BANK) bank restriction instead of max
      256 supported in hardware. That is not changed in the patch below as it
      will have some user level mcelog utility dependency, with bank 128 being
      used for thermal reporting currently.
      
      The patch below does not create sysfs control (bankNctl) for banks
      higher than 6 as well. That needs some pre-cleanup in /sysfs mce layout,
      removal of per cpu /sysfs entries for bankctl as they are really global
      system level control today. That change will follow. This basic change
      is critical to report the detailed errors on banks higher than 6.
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8edc5cc5
  2. 26 4月, 2008 1 次提交
    • J
      x86-64: extend MCE CPU quirk handling · 911f6a7b
      Jan Beulich 提交于
      At least on my Barcelona, I see MCE log entries after cold boot caused
      by BIOS not properly clearing the respective registers. Therefore, this
      patch extends the workaround to families 0x10 and 0x11 (the latter just
      for completeness, I have nothing to verify this against).
      At the same time, provide a way to make these entries visible via the
      'mce=bootlog' command line option even on these machines.
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      911f6a7b
  3. 20 4月, 2008 2 次提交
  4. 17 4月, 2008 5 次提交
  5. 10 2月, 2008 1 次提交
  6. 30 1月, 2008 15 次提交
  7. 28 1月, 2008 1 次提交
    • G
      x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c · 404aae5d
      Greg Kroah-Hartman 提交于
      This problem is due to the kobject rework recently done in this file.
      
      The mce_amd_64.c code uses some wierd forward calls to back out of the
      recursive way the code creates kobjects.  Because of this, we need to
      verify that we have really created a kobject before calling
      kobject_uevent().
      
      Many thanks to Yinghai Lu <yhlu.kernel@gmail.com> for reporting the
      problem and testing.
      
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: Jacob Shin <jacob.shin@amd.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      404aae5d
  8. 25 1月, 2008 4 次提交
  9. 17 11月, 2007 1 次提交
    • A
      x86: fix cpu-hotplug regression · 90367556
      Andreas Herrmann 提交于
      Commit d435d862
      ("cpu hotplug: mce: fix cpu hotplug error handling")
      changed the error handling in mce_cpu_callback.
      
      In cases where not all CPUs are brought up during
      boot (e.g. using maxcpus and additional_cpus parameters)
      mce_cpu_callback now returns NOTFIY_BAD because
      for such CPUs cpu_data is not completely filled when
      the notifier is called. Thus mce_create_device fails right
      at its beginning:
      
              if (!mce_available(&cpu_data[cpu]))
                      return -EIO;
      
      As a quick fix I suggest to check boot_cpu_data for MCE.
      
      To reproduce this regression:
      
      (1) boot with maxcpus=2 addtional_cpus=2 on a 4 CPU x86-64 system
      (2) # echo 1 >/sys/devices/system/cpu/cpu2/online
        -bash: echo: write error: Invalid argument
      
      dmesg shows:
      
      _cpu_up: attempt to bring up CPU 2 failed
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      90367556
  10. 15 11月, 2007 1 次提交
    • A
      x86: don't call mce_create_device on CPU_UP_PREPARE · bae19fe0
      Andreas Herrmann 提交于
      Fix regression introduced with d435d862
      ("cpu hotplug: mce: fix cpu hotplug error handling").
      
      A CPU which was not brought up during boot (using maxcpus and
      additional_cpus parameters) couldn't be onlined anymore.  For such a CPU it
      seemed that MCE was not supported during CPU_UP_PREPARE-time which caused
      mce_cpu_callback to return NOTIFY_BAD to notifier_call_chain.  To fix this
      we:
      
       - call mce_create_device for CPU_ONLINE event (instead of CPU_UP_PREPARE),
       - avoid mce_remove_device() for the CPU that is not correctly initialized
         by mce_create_device() failure,
       - make mce_cpu_callback always return NOTIFY_OK for CPU_ONLINE event.
         Because CPU_ONLINE callback return value is always ignored.
      
      [akinobu.mita@gmail.com: avoid mce_remove_device() for not initialized device]
      [akinobu.mita@gmail.com: make mce_cpu_callback always return NOTIFY_OK]
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bae19fe0
  11. 24 10月, 2007 4 次提交
  12. 19 10月, 2007 1 次提交
  13. 18 10月, 2007 2 次提交
    • J
      x86: expand /proc/interrupts to include missing vectors, v2 · 38e760a1
      Joe Korty 提交于
      Add missing IRQs and IRQ descriptions to /proc/interrupts.
      
      /proc/interrupts is most useful when it displays every IRQ vector in use by
      the system, not just those somebody thought would be interesting.
      
      This patch inserts the following vector displays to the i386 and x86_64
      platforms, as appropriate:
      
      	rescheduling interrupts
      	TLB flush interrupts
      	function call interrupts
      	thermal event interrupts
      	threshold interrupts
      	spurious interrupts
      
      A threshold interrupt occurs when ECC memory correction is occuring at too
      high a frequency.  Thresholds are used by the ECC hardware as occasional
      ECC failures are part of normal operation, but long sequences of ECC
      failures usually indicate a memory chip that is about to fail.
      
      Thermal event interrupts occur when a temperature threshold has been
      exceeded for some CPU chip.  IIRC, a thermal interrupt is also generated
      when the temperature drops back to a normal level.
      
      A spurious interrupt is an interrupt that was raised then lowered by the
      device before it could be fully processed by the APIC.  Hence the apic sees
      the interrupt but does not know what device it came from.  For this case
      the APIC hardware will assume a vector of 0xff.
      
      Rescheduling, call, and TLB flush interrupts are sent from one CPU to
      another per the needs of the OS.  Typically, their statistics would be used
      to discover if an interrupt flood of the given type has been occuring.
      
      AK: merged v2 and v4 which had some more tweaks
      AK: replace Local interrupts with Local timer interrupts
      AK: Fixed description of interrupt types.
      
      [ tglx: arch/x86 adaptation ]
      [ mingo: small cleanup ]
      Signed-off-by: NJoe Korty <joe.korty@ccur.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Tim Hockin <thockin@hockin.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      38e760a1
    • S
      i386: Fix section mismatch · 25d1b516
      Satyam Sharma 提交于
      Fix bugzilla #8679
      
      WARNING: arch/i386/kernel/built-in.o(.data+0x2148): Section mismatch: reference
      to .init.text: (between 'thermal_throttle_cpu_notifier' and 'mtrr_mutex')
      
      comes because struct notifier_block thermal_throttle_cpu_notifier in
      arch/i386/kernel/cpu/mcheck/therm_throt.c goes in .data section but the
      notifier callback function itself has been marked __cpuinit which becomes
      __init == .init.text when HOTPLUG_CPU=n.  The warning is bogus because the
      callback will never be called out if HOTPLUG_CPU=n in the first place (as
      one can see from kernel/cpu.c, the cpu_chain itself is __cpuinitdata :-)
      
      So, let's mark thermal_throttle_cpu_notifier as __cpuinitdata to fix
      the section mismatch warning.
      
      [ tglx: arch/x86 adaptation ]
      Signed-off-by: NSatyam Sharma <satyam@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      25d1b516
  14. 11 10月, 2007 1 次提交