1. 20 11月, 2014 2 次提交
    • C
      x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll · fa92c586
      Chen Yucong 提交于
      Uncorrected no action required (UCNA) - is a uncorrected recoverable
      machine check error that is not signaled via a machine check exception
      and, instead, is reported to system software as a corrected machine
      check error. UCNA errors indicate that some data in the system is
      corrupted, but the data has not been consumed and the processor state
      is valid and you may continue execution on this processor. UCNA errors
      require no action from system software to continue execution. Note that
      UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
      (MCG_SER_P) is set.
                                                     -- Intel SDM Volume 3B
      
      Deferred errors are errors that cannot be corrected by hardware, but
      do not cause an immediate interruption in program flow, loss of data
      integrity, or corruption of processor state. These errors indicate
      that data has been corrupted but not consumed. Hardware writes information
      to the status and address registers in the corresponding bank that
      identifies the source of the error if deferred errors are enabled for
      logging. Deferred errors are not reported via machine check exceptions;
      they can be seen by polling the MCi_STATUS registers.
                                                      -- AMD64 APM Volume 2
      
      Above two items, both UCNA and Deferred errors belong to detected
      errors, but they can't be corrected by hardware, and this is very
      similar to Software Recoverable Action Optional (SRAO) errors.
      Therefore, we can take some actions that have been used for handling
      SRAO errors to handle UCNA and Deferred errors.
      Acked-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NChen Yucong <slaoub@gmail.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      fa92c586
    • C
      x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error · e3480271
      Chen Yucong 提交于
      Until now, the mce_severity mechanism can only identify the severity
      of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
      out DEFERRED error for AMD platform.
      
      This patch extends the mce_severity mechanism for handling
      UCNA/DEFERRED error. In order to do this, the patch introduces a new
      severity level - MCE_UCNA/DEFERRED_SEVERITY.
      
      In addition, mce_severity is specific to machine check exception,
      and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
      mechanism in non-exception context, the patch also introduces a new
      argument (is_excp) for mce_severity. `is_excp' is used to explicitly
      specify the calling context of mce_severity.
      Reviewed-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Signed-off-by: NChen Yucong <slaoub@gmail.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      e3480271
  2. 27 8月, 2014 1 次提交
    • C
      x86: Replace __get_cpu_var uses · 89cbc767
      Christoph Lameter 提交于
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      89cbc767
  3. 09 8月, 2014 1 次提交
  4. 22 7月, 2014 1 次提交
  5. 24 6月, 2014 1 次提交
  6. 23 6月, 2014 1 次提交
  7. 05 6月, 2014 1 次提交
  8. 31 5月, 2014 2 次提交
  9. 17 4月, 2014 1 次提交
  10. 29 3月, 2014 1 次提交
    • C
      x86, CMCI: Add proper detection of end of CMCI storms · 27f6c573
      Chen, Gong 提交于
      When CMCI storm persists for a long time(at least beyond predefined
      threshold. It's 30 seconds for now), we can watch CMCI storm is
      detected immediately after it subsides.
      
      ...
      Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode
      Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode
      Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode
      Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode
      ...
      
      The problem is that our logic that determines that the storm has
      ended is incorrect. We announce the end, re-enable interrupts and
      realize that the storm is still going on, so we switch back to
      polling mode. Rinse, repeat.
      
      When a storm happens we disable signaling of errors via CMCI and begin
      polling machine check banks instead. If we find any logged errors,
      then we need to set a per-cpu flag so that our per-cpu tests that
      check whether the storm is ongoing will see that errors are still
      being logged independently of whether mce_notify_irq() says that the
      error has been fully processed.
      
      cmci_clear() is not the right tool to disable a bank. It disables the
      interrupt for the bank as desired, but it also clears the bit for
      this bank in "mce_banks_owned" so we will skip the bank when polling
      (so we fail to see that the storm continues because we stop looking).
      New cmci_storm_disable_banks() just disables the interrupt while
      allowing polling to continue.
      Reported-by: NWilliam Dauchy <wdauchy@gmail.com>
      Signed-off-by: NChen, Gong <gong.chen@linux.intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      27f6c573
  11. 20 3月, 2014 1 次提交
    • S
      x86, mce: Fix CPU hotplug callback registration · 82a8f131
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the mce code in x86 by using this latter form of callback registration.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      82a8f131
  12. 12 1月, 2014 1 次提交
    • B
      x86, mce: Fix mce_start_timer semantics · 4f75d841
      Borislav Petkov 提交于
      So mce_start_timer() has a 'cpu' argument which is supposed to mean to
      start a timer on that cpu. However, the code currently starts a timer on
      the *current* cpu the function runs on and causes the sanity-check in
      mce_timer_fn to fire:
      
      	WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn
      
      because it is running on the wrong cpu.
      
      This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining
      all the cpus in succession.
      
      Then, we were fiddling with the CMCI storm settings when starting the
      timer whereas there's no need for that - if there's storm happening
      on this newly restarted cpu, we're going to be in normal CMCI mode
      initially and then when the CMCI interrupt starts firing, we're going to
      go to the polling mode with the timer real soon.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Tested-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Reviewed-by: NChen, Gong <gong.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
      4f75d841
  13. 30 11月, 2013 1 次提交
  14. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  15. 09 7月, 2013 1 次提交
    • N
      mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC · c3d1fb56
      Naveen N. Rao 提交于
      The Corrected Machine Check structure (CMC) in HEST has a flag which can be
      set by the firmware to indicate to the OS that it prefers to process the
      corrected error events first. In this scenario, the OS is expected to not
      monitor for corrected errors (through CMCI/polling). Instead, the firmware
      notifies the OS on corrected error events through GHES.
      
      Linux already has support for GHES. This patch adds support for parsing CMC
      structure and to disable CMCI/polling if the firmware first flag is set.
      
      Further, the list of machine check bank structures at the end of CMC is used
      to determine which MCA banks function in FF mode, so that we continue to
      monitor error events on the other banks.
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      c3d1fb56
  16. 26 6月, 2013 1 次提交
  17. 03 4月, 2013 1 次提交
    • S
      x86/mce: Rework cmci_rediscover() to play well with CPU hotplug · 7a0c819d
      Srivatsa S. Bhat 提交于
      Dave Jones reports that offlining a CPU leads to this trace:
      
      numa_remove_cpu cpu 1 node 0: mask now 0,2-3
      smpboot: CPU 1 is now offline
      BUG: using smp_processor_id() in preemptible [00000000] code:
      cpu-offline.sh/10591
      caller is cmci_rediscover+0x6a/0xe0
      Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
      Call Trace:
       [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
       [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
       [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
       [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
       [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
       [<ffffffff8104c2e3>] cpu_notify+0x23/0x50
       [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
       [<ffffffff815ef082>] _cpu_down+0x302/0x350
       [<ffffffff815ef106>] cpu_down+0x36/0x50
       [<ffffffff815f1c9d>] store_online+0x8d/0xd0
       [<ffffffff813edc48>] dev_attr_store+0x18/0x30
       [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
       [<ffffffff811adfb2>] vfs_write+0xa2/0x170
       [<ffffffff811ae16c>] sys_write+0x4c/0xa0
       [<ffffffff81613019>] system_call_fastpath+0x16/0x1b
      
      However, a look at cmci_rediscover shows that it can be simplified quite
      a bit, apart from solving the above issue. It invokes functions that
      take spin locks with interrupts disabled, and hence it can run in atomic
      context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
      is already dead and out of the cpu_online_mask. So take these points into
      account and simplify the code, and thereby also fix the above issue.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      7a0c819d
  18. 21 1月, 2013 1 次提交
  19. 29 12月, 2012 1 次提交
    • T
      x86/mce: don't use [delayed_]work_pending() · 4d899be5
      Tejun Heo 提交于
      There's no need to test whether a (delayed) work item in pending
      before queueing, flushing or cancelling it.  Most uses are unnecessary
      and quite a few of them are buggy.
      
      Remove unnecessary pending tests from x86/mce.  Only compile tested.
      
      v2: Local var work removed from mce_schedule_work() as suggested by
          Borislav.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NBorislav Petkov <bp@alien8.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac@vger.kernel.org
      4d899be5
  20. 26 10月, 2012 4 次提交
  21. 19 10月, 2012 1 次提交
  22. 28 9月, 2012 1 次提交
  23. 10 8月, 2012 1 次提交
    • C
      x86/mce: Add CMCI poll mode · 55babd8f
      Chen Gong 提交于
      On Intel systems corrected machine check interrupts (CMCI) may be sent to
      multiple logical processors; possibly to all processors on the affected
      socket (SDM Volume 3B "15.5.1 CMCI Local APIC Interface").  This means
      that a persistent error (such as a stuck bit in ECC memory) may cause
      a storm of interrupts that greatly hinders or prevents forward progress
      (probably on many processors).
      
      To solve this we keep track of the rate at which each processor sees
      CMCI. If we exceed a threshold, we disable CMCI delivery and switch to
      polling the machine check banks. If the storm subsides (none of the
      affected processors see any more errors for a complete poll interval) we
      re-enable CMCI.
      
      [Tony: Added console messages when storm begins/ends and increased storm
      threshold from 5 to 15 so we have a few more logged entries before we
      disable interrupts and start dropping reports]
      Signed-off-by: NChen Gong <gong.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NChen Gong <gong.chen@linux.intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      55babd8f
  24. 04 8月, 2012 2 次提交
  25. 26 7月, 2012 1 次提交
  26. 20 7月, 2012 1 次提交
  27. 12 7月, 2012 1 次提交
    • T
      x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults · 6751ed65
      Tony Luck 提交于
      In commit dad1743e ("x86/mce: Only restart instruction after machine
      check recovery if it is safe") we fixed mce_notify_process() to force a
      signal to the current process if it was not restartable (RIPV bit not
      set in MCG_STATUS). But doing it here means that the process doesn't
      get told the virtual address of the fault via siginfo_t->si_addr. This
      would prevent application level recovery from the fault.
      
      Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so
      that we will provide the right information with the signal.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
      Cc: stable@kernel.org    # 3.4+
      6751ed65
  28. 06 6月, 2012 4 次提交
  29. 31 5月, 2012 1 次提交
  30. 24 5月, 2012 1 次提交
  31. 23 5月, 2012 1 次提交