1. 10 8月, 2012 1 次提交
    • T
      x86/mce: Make cmci_discover() quiet · 4670a300
      Tony Luck 提交于
      cmci_discover() works out which machine check banks support CMCI, and
      which of those are shared by multiple logical processors. It uses this
      information to ensure that exactly one cpu is designated the owner of
      each bank so that when interrupts are broadcast to multiple cpus, only one
      of them will look in a shared bank to log the error and clear the bank.
      
      At boot time cmci_discover() performs this task silently. But during
      certain cpu hotplug operations it prints out a set of summary lines
      like this:
      
      CPU 35 MCA banks CMCI:0 CMCI:1 CMCI:3 CMCI:5 CMCI:6 CMCI:7 CMCI:8 CMCI:9 CMCI:10 CMCI:11
      CPU 1 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 39 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 38 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 32 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 37 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 36 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 34 MCA banks CMCI:0 CMCI:1 CMCI:3
      
      The value of these messages seems very low. A user might painstakingly
      cross-check against the data sheet for a processor to ensure that all
      CMCI supported banks are correctly reported, but this seems improbable.
      If users really wanted to do this, we should print the information at
      boot time too.
      
      Remove the messages.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      4670a300
  2. 04 8月, 2012 4 次提交
  3. 20 7月, 2012 2 次提交
  4. 12 7月, 2012 1 次提交
    • T
      x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults · 6751ed65
      Tony Luck 提交于
      In commit dad1743e ("x86/mce: Only restart instruction after machine
      check recovery if it is safe") we fixed mce_notify_process() to force a
      signal to the current process if it was not restartable (RIPV bit not
      set in MCG_STATUS). But doing it here means that the process doesn't
      get told the virtual address of the fault via siginfo_t->si_addr. This
      would prevent application level recovery from the fault.
      
      Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so
      that we will provide the right information with the signal.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
      Cc: stable@kernel.org    # 3.4+
      6751ed65
  5. 07 6月, 2012 8 次提交
  6. 06 6月, 2012 4 次提交
  7. 31 5月, 2012 1 次提交
  8. 24 5月, 2012 3 次提交
  9. 23 5月, 2012 1 次提交
  10. 15 5月, 2012 2 次提交
  11. 30 4月, 2012 3 次提交
  12. 21 4月, 2012 1 次提交
  13. 20 4月, 2012 1 次提交
    • T
      x86/mce: Avoid reading every machine check bank register twice. · 95022b8c
      Tony Luck 提交于
      Reading machine check bank registers is slow. There is a trend of
      increasing the number of banks, and the number of cores. The main section
      of do_machine_check() is a serialized section where each cpu in turn
      checks every bank. Even on a little two socket SandyBridge-EP system
      that multiplies out as:
      
      	2 sockets * 8 cores * 2 hyperthreads * 20 banks = 640 MSRs
      
      We already scan the banks in parallel in mce_no_way_out() to see if there
      is a fatal error anywhere in the system. If we build a cache of VALID
      bits during this scan, we can avoid uselessly re-reading banks that have
      no data. Note that this cache is only a hint. If the valid bit is set in a
      shared bank, all cpus that share that bank will see it during the parallel
      scan, but the first to find it in the sequential scan will (usually) clear
      the bank.
      Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      95022b8c
  14. 29 3月, 2012 1 次提交
  15. 07 3月, 2012 1 次提交
    • S
      x86, mce: Fix rcu splat in drain_mce_log_buffer() · b11e3d78
      Srivatsa S. Bhat 提交于
      While booting, the following message is seen:
      
      [   21.665087] ===============================
      [   21.669439] [ INFO: suspicious RCU usage. ]
      [   21.673798] 3.2.0-0.0.0.28.36b5ec9-default #2 Not tainted
      [   21.681353] -------------------------------
      [   21.685864] arch/x86/kernel/cpu/mcheck/mce.c:194 suspicious rcu_dereference_index_check() usage!
      [   21.695013]
      [   21.695014] other info that might help us debug this:
      [   21.695016]
      [   21.703488]
      [   21.703489] rcu_scheduler_active = 1, debug_locks = 1
      [   21.710426] 3 locks held by modprobe/2139:
      [   21.714754]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff8133afd3>] __driver_attach+0x53/0xa0
      [   21.725020]  #1:
      [   21.725323] ioatdma: Intel(R) QuickData Technology Driver 4.00
      [   21.733206]  (&__lockdep_no_validate__){......}, at: [<ffffffff8133afe1>] __driver_attach+0x61/0xa0
      [   21.743015]  #2:  (i7core_edac_lock){+.+.+.}, at: [<ffffffffa01cfa5f>] i7core_probe+0x1f/0x5c0 [i7core_edac]
      [   21.753708]
      [   21.753709] stack backtrace:
      [   21.758429] Pid: 2139, comm: modprobe Not tainted 3.2.0-0.0.0.28.36b5ec9-default #2
      [   21.768253] Call Trace:
      [   21.770838]  [<ffffffff810977cd>] lockdep_rcu_suspicious+0xcd/0x100
      [   21.777366]  [<ffffffff8101aa41>] drain_mcelog_buffer+0x191/0x1b0
      [   21.783715]  [<ffffffff8101aa78>] mce_register_decode_chain+0x18/0x20
      [   21.790430]  [<ffffffffa01cf8db>] i7core_register_mci+0x2fb/0x3e4 [i7core_edac]
      [   21.798003]  [<ffffffffa01cfb14>] i7core_probe+0xd4/0x5c0 [i7core_edac]
      [   21.804809]  [<ffffffff8129566b>] local_pci_probe+0x5b/0xe0
      [   21.810631]  [<ffffffff812957c9>] __pci_device_probe+0xd9/0xe0
      [   21.816650]  [<ffffffff813362e4>] ? get_device+0x14/0x20
      [   21.822178]  [<ffffffff81296916>] pci_device_probe+0x36/0x60
      [   21.828061]  [<ffffffff8133ac8a>] really_probe+0x7a/0x2b0
      [   21.833676]  [<ffffffff8133af23>] driver_probe_device+0x63/0xc0
      [   21.839868]  [<ffffffff8133b01b>] __driver_attach+0x9b/0xa0
      [   21.845718]  [<ffffffff8133af80>] ? driver_probe_device+0xc0/0xc0
      [   21.852027]  [<ffffffff81339168>] bus_for_each_dev+0x68/0x90
      [   21.857876]  [<ffffffff8133aa3c>] driver_attach+0x1c/0x20
      [   21.863462]  [<ffffffff8133a64d>] bus_add_driver+0x16d/0x2b0
      [   21.869377]  [<ffffffff8133b6dc>] driver_register+0x7c/0x160
      [   21.875220]  [<ffffffff81296bda>] __pci_register_driver+0x6a/0xf0
      [   21.881494]  [<ffffffffa01fe000>] ? 0xffffffffa01fdfff
      [   21.886846]  [<ffffffffa01fe047>] i7core_init+0x47/0x1000 [i7core_edac]
      [   21.893737]  [<ffffffff810001ce>] do_one_initcall+0x3e/0x180
      [   21.899670]  [<ffffffff810a9b95>] sys_init_module+0xc5/0x220
      [   21.905542]  [<ffffffff8149bc39>] system_call_fastpath+0x16/0x1b
      
      Fix this by using ACCESS_ONCE() instead of rcu_dereference_check_mce()
      over mcelog.next. Since the access to each entry is controlled by the
      ->finished field, ACCESS_ONCE() should work just fine. An rcu_dereference
      is unnecessary here.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      b11e3d78
  16. 23 2月, 2012 2 次提交
  17. 22 2月, 2012 1 次提交
  18. 27 1月, 2012 1 次提交
  19. 17 1月, 2012 1 次提交
  20. 14 1月, 2012 1 次提交