1. 25 2月, 2009 7 次提交
    • A
      x86, mce, cmci: add CMCI support · 88ccbedd
      Andi Kleen 提交于
      Impact: Major new feature
      
      Intel CMCI (Corrected Machine Check Interrupt) is a new
      feature on Nehalem CPUs. It allows the CPU to trigger
      interrupts on corrected events, which allows faster
      reaction to them instead of with the traditional
      polling timer.
      
      Also use CMCI to discover shared banks. Machine check banks
      can be shared by CPU threads or even cores. Using the CMCI enable
      bit it is possible to detect the fact that another CPU already
      saw a specific bank. Use this to assign shared banks only
      to one CPU to avoid reporting duplicated events.
      
      On CPU hot unplug bank sharing is re discovered. This is done
      using a thread that cycles through all the CPUs.
      
      To avoid races between the poller and CMCI we only poll
      for banks that are not CMCI capable and only check CMCI
      owned banks on a interrupt.
      
      The shared banks ownership information is currently only used for
      CMCI interrupts, not polled banks.
      
      The sharing discovery code follows the algorithm recommended in the
      IA32 SDM Vol3a 14.5.2.1
      
      The CMCI interrupt handler just calls the machine check poller to
      pick up the machine check event that caused the interrupt.
      
      I decided not to implement a separate threshold event like
      the AMD version has, because the threshold is always one currently
      and adding another event didn't seem to add any value.
      
      Some code inspired by Yunhong Jiang's Xen implementation,
      which was in term inspired by a earlier CMCI implementation
      by me.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      88ccbedd
    • A
      x86, mce, cmci: define MSR names and fields for new CMCI registers · 03195c6b
      Andi Kleen 提交于
      Impact: New register definitions only
      
      CMCI means support for raising an interrupt on a corrected machine
      check event instead of having to poll for it. It's a new feature in
      Intel Nehalem CPUs available on some machine check banks.
      
      For details see the IA32 SDM Vol3a 14.5
      
      Define the registers for it as a preparation for further patches.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      03195c6b
    • A
      x86, mce, cmci: use polled banks bitmap in machine check poller · ee031c31
      Andi Kleen 提交于
      Define a per cpu bitmap that contains the banks polled by the machine
      check poller. This is needed for the CMCI code in the next patches
      to be able to disable polling on specific banks.
      
      The bank by default contains all banks, so there is no behaviour
      change. Only future code will remove some banks from the polling
      set.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      ee031c31
    • A
      x86, mce: replace machine check events logged interval with ratelimit · 8457c84d
      Andi Kleen 提交于
      Impact: behavior change, use common code
      
      Use a standard leaky bucket ratelimit for the machine check
      warning print interval instead of waiting every check_interval.
      Also decrease the limit to twice per minute.
      This interacts better with threshold interrupts because
      they can happen more often than check_interval.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      8457c84d
    • A
      x86, mce, cmci: avoid potential reentry of threshold interrupt · f9695df4
      Andi Kleen 提交于
      Impact: minor bugfix
      
      The threshold handler on AMD (and soon on Intel) could be theoretically
      reentered by the hardware. This could lead to corrupted events
      because the machine check poll code assumes it is not reentered.
      
      Move the APIC ACK to the end of the interrupt handler to let
      the hardware avoid that.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      f9695df4
    • A
      x86, mce, cmci: factor out threshold interrupt handler · b2762686
      Andi Kleen 提交于
      Impact: cleanup; preparation for feature
      
      The mce_amd_64 code has an own private MC threshold vector with an own
      interrupt handler. Since Intel needs a similar handler
      it makes sense to share the vector because both can not
      be active at the same time.
      
      I factored the common APIC handler code into a separate file which can
      be used by both the Intel or AMD MC code.
      
      This is needed for the next patch which adds an Intel specific
      CMCI handler.
      
      This patch should be a nop for AMD, it just moves some code
      around.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      b2762686
    • A
      x86, mce, cmci: export MAX_NR_BANKS · 41fdff32
      Andi Kleen 提交于
      Impact: Cleanup (code movement)
      
      Move MAX_NR_BANKS into mce.h because it's needed there
      for followup patches.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      41fdff32
  2. 24 2月, 2009 2 次提交
  3. 23 2月, 2009 21 次提交
  4. 22 2月, 2009 10 次提交