1. 08 12月, 2014 1 次提交
  2. 20 11月, 2014 2 次提交
    • C
      x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll · fa92c586
      Chen Yucong 提交于
      Uncorrected no action required (UCNA) - is a uncorrected recoverable
      machine check error that is not signaled via a machine check exception
      and, instead, is reported to system software as a corrected machine
      check error. UCNA errors indicate that some data in the system is
      corrupted, but the data has not been consumed and the processor state
      is valid and you may continue execution on this processor. UCNA errors
      require no action from system software to continue execution. Note that
      UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
      (MCG_SER_P) is set.
                                                     -- Intel SDM Volume 3B
      
      Deferred errors are errors that cannot be corrected by hardware, but
      do not cause an immediate interruption in program flow, loss of data
      integrity, or corruption of processor state. These errors indicate
      that data has been corrupted but not consumed. Hardware writes information
      to the status and address registers in the corresponding bank that
      identifies the source of the error if deferred errors are enabled for
      logging. Deferred errors are not reported via machine check exceptions;
      they can be seen by polling the MCi_STATUS registers.
                                                      -- AMD64 APM Volume 2
      
      Above two items, both UCNA and Deferred errors belong to detected
      errors, but they can't be corrected by hardware, and this is very
      similar to Software Recoverable Action Optional (SRAO) errors.
      Therefore, we can take some actions that have been used for handling
      SRAO errors to handle UCNA and Deferred errors.
      Acked-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NChen Yucong <slaoub@gmail.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      fa92c586
    • C
      x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error · e3480271
      Chen Yucong 提交于
      Until now, the mce_severity mechanism can only identify the severity
      of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
      out DEFERRED error for AMD platform.
      
      This patch extends the mce_severity mechanism for handling
      UCNA/DEFERRED error. In order to do this, the patch introduces a new
      severity level - MCE_UCNA/DEFERRED_SEVERITY.
      
      In addition, mce_severity is specific to machine check exception,
      and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
      mechanism in non-exception context, the patch also introduces a new
      argument (is_excp) for mce_severity. `is_excp' is used to explicitly
      specify the calling context of mce_severity.
      Reviewed-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Signed-off-by: NChen Yucong <slaoub@gmail.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      e3480271
  3. 01 11月, 2014 1 次提交
  4. 22 10月, 2014 4 次提交
  5. 19 9月, 2014 1 次提交
  6. 27 8月, 2014 1 次提交
    • C
      x86: Replace __get_cpu_var uses · 89cbc767
      Christoph Lameter 提交于
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      89cbc767
  7. 09 8月, 2014 1 次提交
  8. 06 8月, 2014 1 次提交
    • T
      x86: MCE: Add raw_lock conversion again · ed5c41d3
      Thomas Gleixner 提交于
      Commit ea431643 ("x86/mce: Fix CMCI preemption bugs") breaks RT by
      the completely unrelated conversion of the cmci_discover_lock to a
      regular (non raw) spinlock.  This lock was annotated in commit
      59d958d2 ("locking, x86: mce: Annotate cmci_discover_lock as raw")
      with a proper explanation why.
      
      The argument for converting the lock back to a regular spinlock was:
      
       - it does percpu ops without disabling preemption. Preemption is not
         disabled due to the mistaken use of a raw spinlock.
      
      Which is complete nonsense.  The raw_spinlock is disabling preemption in
      the same way as a regular spinlock.  In mainline spinlock maps to
      raw_spinlock, in RT spinlock becomes a "sleeping" lock.
      
      raw_spinlock has on RT exactly the same semantics as in mainline.  And
      because this lock is taken in non preemptible context it must be raw on
      RT.
      
      Undo the locking brainfart.
      Reported-by: NClark Williams <williams@redhat.com>
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed5c41d3
  9. 22 7月, 2014 1 次提交
  10. 24 6月, 2014 1 次提交
  11. 23 6月, 2014 1 次提交
  12. 05 6月, 2014 1 次提交
  13. 31 5月, 2014 2 次提交
  14. 06 5月, 2014 1 次提交
  15. 17 4月, 2014 1 次提交
  16. 29 3月, 2014 1 次提交
    • C
      x86, CMCI: Add proper detection of end of CMCI storms · 27f6c573
      Chen, Gong 提交于
      When CMCI storm persists for a long time(at least beyond predefined
      threshold. It's 30 seconds for now), we can watch CMCI storm is
      detected immediately after it subsides.
      
      ...
      Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode
      Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode
      Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode
      Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode
      ...
      
      The problem is that our logic that determines that the storm has
      ended is incorrect. We announce the end, re-enable interrupts and
      realize that the storm is still going on, so we switch back to
      polling mode. Rinse, repeat.
      
      When a storm happens we disable signaling of errors via CMCI and begin
      polling machine check banks instead. If we find any logged errors,
      then we need to set a per-cpu flag so that our per-cpu tests that
      check whether the storm is ongoing will see that errors are still
      being logged independently of whether mce_notify_irq() says that the
      error has been fully processed.
      
      cmci_clear() is not the right tool to disable a bank. It disables the
      interrupt for the bank as desired, but it also clears the bit for
      this bank in "mce_banks_owned" so we will skip the bank when polling
      (so we fail to see that the storm continues because we stop looking).
      New cmci_storm_disable_banks() just disables the interrupt while
      allowing polling to continue.
      Reported-by: NWilliam Dauchy <wdauchy@gmail.com>
      Signed-off-by: NChen, Gong <gong.chen@linux.intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      27f6c573
  17. 20 3月, 2014 3 次提交
    • S
      x86, therm_throt.c: Remove unused therm_cpu_lock · 7b7139d4
      Srivatsa S. Bhat 提交于
      After fixing the CPU hotplug callback registration code, the callbacks
      invoked for each online CPU, during the initialization phase in
      thermal_throttle_init_device(), can no longer race with the actual CPU
      hotplug notifier callbacks (in thermal_throttle_cpu_callback). Hence the
      therm_cpu_lock is unnecessary now. Remove it.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      7b7139d4
    • S
      x86, therm_throt.c: Fix CPU hotplug callback registration · 4e6192bb
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the thermal throttle code in x86 by using this latter form of callback
      registration.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4e6192bb
    • S
      x86, mce: Fix CPU hotplug callback registration · 82a8f131
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the mce code in x86 by using this latter form of callback registration.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      82a8f131
  18. 12 1月, 2014 1 次提交
    • B
      x86, mce: Fix mce_start_timer semantics · 4f75d841
      Borislav Petkov 提交于
      So mce_start_timer() has a 'cpu' argument which is supposed to mean to
      start a timer on that cpu. However, the code currently starts a timer on
      the *current* cpu the function runs on and causes the sanity-check in
      mce_timer_fn to fire:
      
      	WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn
      
      because it is running on the wrong cpu.
      
      This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining
      all the cpus in succession.
      
      Then, we were fiddling with the CMCI storm settings when starting the
      timer whereas there's no need for that - if there's storm happening
      on this newly restarted cpu, we're going to be in normal CMCI mode
      initially and then when the CMCI interrupt starts firing, we're going to
      go to the polling mode with the timer real soon.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Tested-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Reviewed-by: NChen, Gong <gong.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
      4f75d841
  19. 07 1月, 2014 1 次提交
  20. 21 12月, 2013 1 次提交
  21. 30 11月, 2013 1 次提交
  22. 24 10月, 2013 1 次提交
  23. 30 7月, 2013 1 次提交
  24. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  25. 09 7月, 2013 1 次提交
    • N
      mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC · c3d1fb56
      Naveen N. Rao 提交于
      The Corrected Machine Check structure (CMC) in HEST has a flag which can be
      set by the firmware to indicate to the OS that it prefers to process the
      corrected error events first. In this scenario, the OS is expected to not
      monitor for corrected errors (through CMCI/polling). Instead, the firmware
      notifies the OS on corrected error events through GHES.
      
      Linux already has support for GHES. This patch adds support for parsing CMC
      structure and to disable CMCI/polling if the firmware first flag is set.
      
      Further, the list of machine check bank structures at the end of CMC is used
      to determine which MCA banks function in FF mode, so that we continue to
      monitor error events on the other banks.
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      c3d1fb56
  26. 28 6月, 2013 1 次提交
  27. 26 6月, 2013 1 次提交
  28. 21 6月, 2013 2 次提交
    • S
      x86, trace: Add irq vector tracepoints · cf910e83
      Seiji Aguchi 提交于
      [Purpose of this patch]
      
      As Vaibhav explained in the thread below, tracepoints for irq vectors
      are useful.
      
      http://www.spinics.net/lists/mm-commits/msg85707.html
      
      <snip>
      The current interrupt traces from irq_handler_entry and irq_handler_exit
      provide when an interrupt is handled.  They provide good data about when
      the system has switched to kernel space and how it affects the currently
      running processes.
      
      There are some IRQ vectors which trigger the system into kernel space,
      which are not handled in generic IRQ handlers.  Tracing such events gives
      us the information about IRQ interaction with other system events.
      
      The trace also tells where the system is spending its time.  We want to
      know which cores are handling interrupts and how they are affecting other
      processes in the system.  Also, the trace provides information about when
      the cores are idle and which interrupts are changing that state.
      <snip>
      
      On the other hand, my usecase is tracing just local timer event and
      getting a value of instruction pointer.
      
      I suggested to add an argument local timer event to get instruction pointer before.
      But there is another way to get it with external module like systemtap.
      So, I don't need to add any argument to irq vector tracepoints now.
      
      [Patch Description]
      
      Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
      But there is an above use case to trace specific irq_vector rather than tracing all events.
      In this case, we are concerned about overhead due to unwanted events.
      
      So, add following tracepoints instead of introducing irq_vector_entry/exit.
      so that we can enable them independently.
         - local_timer_vector
         - reschedule_vector
         - call_function_vector
         - call_function_single_vector
         - irq_work_entry_vector
         - error_apic_vector
         - thermal_apic_vector
         - threshold_apic_vector
         - spurious_apic_vector
         - x86_platform_ipi_vector
      
      Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
      makes a zero when tracepoints are disabled. Detailed explanations are as follows.
       - Create trace irq handlers with entering_irq()/exiting_irq().
       - Create a new IDT, trace_idt_table, at boot time by adding a logic to
         _set_gate(). It is just a copy of original idt table.
       - Register the new handlers for tracpoints to the new IDT by introducing
         macros to alloc_intr_gate() called at registering time of irq_vector handlers.
       - Add checking, whether irq vector tracing is on/off, into load_current_idt().
         This has to be done below debug checking for these reasons.
         - Switching to debug IDT may be kicked while tracing is enabled.
         - On the other hands, switching to trace IDT is kicked only when debugging
           is disabled.
      
      In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
      used for other purposes.
      Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      cf910e83
    • S
      x86, trace: Introduce entering/exiting_irq() · eddc0e92
      Seiji Aguchi 提交于
      When implementing tracepoints in interrupt handers, if the tracepoints are
      simply added in the performance sensitive path of interrupt handers,
      it may cause potential performance problem due to the time penalty.
      
      To solve the problem, an idea is to prepare non-trace/trace irq handers and
      switch their IDTs at the enabling/disabling time.
      
      So, let's introduce entering_irq()/exiting_irq() for pre/post-
      processing of each irq handler.
      
      A way to use them is as follows.
      
      Non-trace irq handler:
      smp_irq_handler()
      {
      	entering_irq();		/* pre-processing of this handler */
      	__smp_irq_handler();	/*
      				 * common logic between non-trace and trace handlers
      				 * in a vector.
      				 */
      	exiting_irq();		/* post-processing of this handler */
      
      }
      
      Trace irq_handler:
      smp_trace_irq_handler()
      {
      	entering_irq();		/* pre-processing of this handler */
      	trace_irq_entry();	/* tracepoint for irq entry */
      	__smp_irq_handler();	/*
      				 * common logic between non-trace and trace handlers
      				 * in a vector.
      				 */
      	trace_irq_exit();	/* tracepoint for irq exit */
      	exiting_irq();		/* post-processing of this handler */
      
      }
      
      If tracepoints can place outside entering_irq()/exiting_irq() as follows,
      it looks cleaner.
      
      smp_trace_irq_handler()
      {
      	trace_irq_entry();
      	smp_irq_handler();
      	trace_irq_exit();
      }
      
      But it doesn't work.
      The problem is with irq_enter/exit() being called. They must be called before
      trace_irq_enter/exit(),  because of the rcu_irq_enter() must be called before
      any tracepoints are used, as tracepoints use  rcu to synchronize.
      
      As a possible alternative, we may be able to call irq_enter() first as follows
      if irq_enter() can nest.
      
      smp_trace_irq_hander()
      {
      	irq_entry();
      	trace_irq_entry();
      	smp_irq_handler();
      	trace_irq_exit();
      	irq_exit();
      }
      
      But it doesn't work, either.
      If irq_enter() is nested, it may have a time penalty because it has to check if it
      was already called or not. The time penalty is not desired in performance sensitive
      paths even if it is tiny.
      Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      eddc0e92
  29. 15 6月, 2013 2 次提交
  30. 13 6月, 2013 1 次提交
  31. 05 6月, 2013 1 次提交