1. 15 Jul, 2013 (1 commit)
    • x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Committed by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  The fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      and are flagged as __cpuinit -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
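      
      As an illustration only (the function name below is made up, not a hunk
      from the patch), the mechanical change is just dropping the annotation
      from each definition:
      
      	/* Before: placed in the discardable .cpuinit text section */
      	static int __cpuinit example_cpu_setup(unsigned int cpu)
      	{
      		return 0;
      	}
      
      	/* After: a normal function, kept resident like any other code */
      	static int example_cpu_setup(unsigned int cpu)
      	{
      		return 0;
      	}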
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  2. 28 Jun, 2013 (1 commit)
  3. 26 Jun, 2013 (1 commit)
  4. 21 Jun, 2013 (2 commits)
    • x86, trace: Add irq vector tracepoints · cf910e83
      Committed by Seiji Aguchi
      [Purpose of this patch]
      
      As Vaibhav explained in the thread below, tracepoints for irq vectors
      are useful.
      
      http://www.spinics.net/lists/mm-commits/msg85707.html
      
      <snip>
      The current interrupt traces from irq_handler_entry and irq_handler_exit
      provide when an interrupt is handled.  They provide good data about when
      the system has switched to kernel space and how it affects the currently
      running processes.
      
      There are some IRQ vectors which trigger the system into kernel space,
      which are not handled in generic IRQ handlers.  Tracing such events gives
      us the information about IRQ interaction with other system events.
      
      The trace also tells where the system is spending its time.  We want to
      know which cores are handling interrupts and how they are affecting other
      processes in the system.  Also, the trace provides information about when
      the cores are idle and which interrupts are changing that state.
      <snip>
      
      On the other hand, my usecase is tracing just local timer event and
      getting a value of instruction pointer.
      
      I previously suggested adding an argument to the local timer event to get the
      instruction pointer, but it can also be obtained with an external module such as
      systemtap, so there is no need to add any argument to the irq vector tracepoints now.
      
      [Patch Description]
      
      Vaibhav's patch shared a single pair of tracepoints, irq_vector_entry/irq_vector_exit,
      across all events. But the use case above calls for tracing a specific irq vector
      rather than tracing all events, and in that case we are concerned about overhead
      due to unwanted events.
      
      So, add the following tracepoints instead of introducing irq_vector_entry/exit,
      so that each of them can be enabled independently:
         - local_timer_vector
         - reschedule_vector
         - call_function_vector
         - call_function_single_vector
         - irq_work_entry_vector
         - error_apic_vector
         - thermal_apic_vector
         - threshold_apic_vector
         - spurious_apic_vector
         - x86_platform_ipi_vector
      
      Also, introduce logic that switches the IDT at enable/disable time so that the time
      penalty is zero while the tracepoints are disabled. Detailed explanations are as follows.
       - Create trace irq handlers with entering_irq()/exiting_irq().
       - Create a new IDT, trace_idt_table, at boot time by adding logic to
         _set_gate(). It is just a copy of the original IDT.
       - Register the new tracepoint handlers in the new IDT by introducing
         macros for alloc_intr_gate(), which is called when the irq_vector handlers
         are registered.
       - Add a check of whether irq vector tracing is on/off to load_current_idt().
         This has to be done below the debug check for these reasons:
         - Switching to the debug IDT may be kicked off while tracing is enabled.
         - On the other hand, switching to the trace IDT is kicked off only when
           debugging is disabled.
      
      In addition, the new IDT is created only when CONFIG_TRACING is enabled, to avoid
      it being used for other purposes.
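      
      As a sketch of the resulting pattern (based on the description above; the
      exact handler and tracepoint names in the patch may differ), a trace-enabled
      local timer handler would look roughly like:
      
      	void smp_trace_apic_timer_interrupt(struct pt_regs *regs)
      	{
      		entering_irq();
      		trace_local_timer_entry(LOCAL_TIMER_VECTOR);
      		local_apic_timer_interrupt();	/* body shared with the non-trace handler */
      		trace_local_timer_exit(LOCAL_TIMER_VECTOR);
      		exiting_irq();
      	}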
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • x86, trace: Introduce entering/exiting_irq() · eddc0e92
      Committed by Seiji Aguchi
      When implementing tracepoints in interrupt handlers, if the tracepoints are
      simply added in the performance sensitive path of the interrupt handlers,
      they may cause a performance problem due to the time penalty.
      
      To solve the problem, the idea is to prepare non-trace and trace irq handlers
      and switch between their IDTs at enable/disable time.
      
      So, let's introduce entering_irq()/exiting_irq() for pre- and post-
      processing of each irq handler.
      
      A way to use them is as follows.
      
      Non-trace irq handler:
      smp_irq_handler()
      {
      	entering_irq();		/* pre-processing of this handler */
      	__smp_irq_handler();	/*
      				 * common logic between non-trace and trace handlers
      				 * in a vector.
      				 */
      	exiting_irq();		/* post-processing of this handler */
      
      }
      
      Trace irq_handler:
      smp_trace_irq_handler()
      {
      	entering_irq();		/* pre-processing of this handler */
      	trace_irq_entry();	/* tracepoint for irq entry */
      	__smp_irq_handler();	/*
      				 * common logic between non-trace and trace handlers
      				 * in a vector.
      				 */
      	trace_irq_exit();	/* tracepoint for irq exit */
      	exiting_irq();		/* post-processing of this handler */
      
      }
      
      If the tracepoints could be placed outside entering_irq()/exiting_irq() as follows,
      it would look cleaner.
      
      smp_trace_irq_handler()
      {
      	trace_irq_entry();
      	smp_irq_handler();
      	trace_irq_exit();
      }
      
      But it doesn't work.
      The problem is with irq_enter/exit() being called. They must be called before
      trace_irq_enter/exit(), because rcu_irq_enter() must be called before
      any tracepoints are used, as tracepoints use RCU to synchronize.
      
      As a possible alternative, we may be able to call irq_enter() first as follows
      if irq_enter() can nest.
      
      smp_trace_irq_handler()
      {
      	irq_entry();
      	trace_irq_entry();
      	smp_irq_handler();
      	trace_irq_exit();
      	irq_exit();
      }
      
      But it doesn't work, either.
      If irq_enter() is nested, it may have a time penalty because it has to check if it
      was already called or not. The time penalty is not desired in performance sensitive
      paths even if it is tiny.
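      
      For reference, the helpers are intended to be thin wrappers; roughly (a
      sketch of the idea, not necessarily the exact definitions in the patch):
      
      	static inline void entering_irq(void)
      	{
      		irq_enter();	/* must run before any tracepoint (rcu_irq_enter) */
      		exit_idle();	/* x86 idle-notifier bookkeeping */
      	}
      
      	static inline void exiting_irq(void)
      	{
      		irq_exit();
      	}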
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
  5. 15 Jun, 2013 (2 commits)
  6. 13 Jun, 2013 (1 commit)
  7. 05 Jun, 2013 (1 commit)
  8. 03 Apr, 2013 (1 commit)
    • x86/mce: Rework cmci_rediscover() to play well with CPU hotplug · 7a0c819d
      Committed by Srivatsa S. Bhat
      Dave Jones reports that offlining a CPU leads to this trace:
      
      numa_remove_cpu cpu 1 node 0: mask now 0,2-3
      smpboot: CPU 1 is now offline
      BUG: using smp_processor_id() in preemptible [00000000] code:
      cpu-offline.sh/10591
      caller is cmci_rediscover+0x6a/0xe0
      Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
      Call Trace:
       [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
       [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
       [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
       [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
       [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
       [<ffffffff8104c2e3>] cpu_notify+0x23/0x50
       [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
       [<ffffffff815ef082>] _cpu_down+0x302/0x350
       [<ffffffff815ef106>] cpu_down+0x36/0x50
       [<ffffffff815f1c9d>] store_online+0x8d/0xd0
       [<ffffffff813edc48>] dev_attr_store+0x18/0x30
       [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
       [<ffffffff811adfb2>] vfs_write+0xa2/0x170
       [<ffffffff811ae16c>] sys_write+0x4c/0xa0
       [<ffffffff81613019>] system_call_fastpath+0x16/0x1b
      
      However, a look at cmci_rediscover shows that it can be simplified quite
      a bit, apart from solving the above issue. It invokes functions that
      take spin locks with interrupts disabled, and hence it can run in atomic
      context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
      is already dead and out of the cpu_online_mask. So take these points into
      account and simplify the code, and thereby also fix the above issue.
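      
      The simplified shape is roughly the following (a sketch of the idea; helper
      names are illustrative and may not match the patch exactly):
      
      	static void cmci_rediscover_work_func(void *arg)
      	{
      		int banks;
      
      		/* Recheck banks in case CPUs don't all have the same ones */
      		if (cmci_supported(&banks))
      			cmci_discover(banks);
      	}
      
      	void cmci_rediscover(void)
      	{
      		int banks;
      
      		if (!cmci_supported(&banks))
      			return;
      
      		/* The dying CPU is already out of cpu_online_mask here */
      		on_each_cpu(cmci_rediscover_work_func, NULL, 1);
      	}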
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  9. 22 Mar, 2013 (2 commits)
  10. 21 Jan, 2013 (1 commit)
  11. 29 Dec, 2012 (1 commit)
    • x86/mce: don't use [delayed_]work_pending() · 4d899be5
      Committed by Tejun Heo
      There's no need to test whether a (delayed) work item is pending
      before queueing, flushing or cancelling it.  Most uses are unnecessary
      and quite a few of them are buggy.
      
      Remove unnecessary pending tests from x86/mce.  Only compile tested.
      
      v2: Local var work removed from mce_schedule_work() as suggested by
          Borislav.
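      
      An illustrative example of the pattern being removed (not a specific hunk
      from the patch): schedule_work() is already a no-op for an item that is
      pending, so the guard buys nothing and can race.
      
      	/* Before: redundant pending check */
      	if (!work_pending(&mce_work))
      		schedule_work(&mce_work);
      
      	/* After */
      	schedule_work(&mce_work);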
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac@vger.kernel.org
  12. 31 Oct, 2012 (1 commit)
    • x86/mce: Do not change worker's running cpu in cmci_rediscover(). · 85b97637
      Committed by Tang Chen
      cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's
      running cpu and migrate itself to the destination cpu. But worker processes are not
      allowed to be migrated. If current is a worker, the worker will be migrated to
      another cpu, but the corresponding worker_pool is still on the original cpu.
      
      In this case, the following BUG_ON in try_to_wake_up_local() will be triggered:
      BUG_ON(rq != this_rq());
      
      This causes a kernel panic. The call trace looks like the following:
      
      [ 6155.451107] ------------[ cut here ]------------
      [ 6155.452019] kernel BUG at kernel/sched/core.c:1654!
      ......
      [ 6155.452019] RIP: 0010:[<ffffffff810add15>]  [<ffffffff810add15>] try_to_wake_up_local+0x115/0x130
      ......
      [ 6155.452019] Call Trace:
      [ 6155.452019]  [<ffffffff8166fc14>] __schedule+0x764/0x880
      [ 6155.452019]  [<ffffffff81670059>] schedule+0x29/0x70
      [ 6155.452019]  [<ffffffff8166de65>] schedule_timeout+0x235/0x2d0
      [ 6155.452019]  [<ffffffff810db57d>] ? mark_held_locks+0x8d/0x140
      [ 6155.452019]  [<ffffffff810dd463>] ? __lock_release+0x133/0x1a0
      [ 6155.452019]  [<ffffffff81671c50>] ? _raw_spin_unlock_irq+0x30/0x50
      [ 6155.452019]  [<ffffffff810db8f5>] ? trace_hardirqs_on_caller+0x105/0x190
      [ 6155.452019]  [<ffffffff8166fefb>] wait_for_common+0x12b/0x180
      [ 6155.452019]  [<ffffffff810b0b30>] ? try_to_wake_up+0x2f0/0x2f0
      [ 6155.452019]  [<ffffffff8167002d>] wait_for_completion+0x1d/0x20
      [ 6155.452019]  [<ffffffff8110008a>] stop_one_cpu+0x8a/0xc0
      [ 6155.452019]  [<ffffffff810abd40>] ? __migrate_task+0x1a0/0x1a0
      [ 6155.452019]  [<ffffffff810a6ab8>] ? complete+0x28/0x60
      [ 6155.452019]  [<ffffffff810b0fd8>] set_cpus_allowed_ptr+0x128/0x130
      [ 6155.452019]  [<ffffffff81036785>] cmci_rediscover+0xf5/0x140
      [ 6155.452019]  [<ffffffff816643c0>] mce_cpu_callback+0x18d/0x19d
      [ 6155.452019]  [<ffffffff81676187>] notifier_call_chain+0x67/0x150
      [ 6155.452019]  [<ffffffff810a03de>] __raw_notifier_call_chain+0xe/0x10
      [ 6155.452019]  [<ffffffff81070470>] __cpu_notify+0x20/0x40
      [ 6155.452019]  [<ffffffff810704a5>] cpu_notify_nofail+0x15/0x30
      [ 6155.452019]  [<ffffffff81655182>] _cpu_down+0x262/0x2e0
      [ 6155.452019]  [<ffffffff81655236>] cpu_down+0x36/0x50
      [ 6155.452019]  [<ffffffff813d3eaa>] acpi_processor_remove+0x50/0x11e
      [ 6155.452019]  [<ffffffff813a6978>] acpi_device_remove+0x90/0xb2
      [ 6155.452019]  [<ffffffff8143cbec>] __device_release_driver+0x7c/0xf0
      [ 6155.452019]  [<ffffffff8143cd6f>] device_release_driver+0x2f/0x50
      [ 6155.452019]  [<ffffffff813a7870>] acpi_bus_remove+0x32/0x6d
      [ 6155.452019]  [<ffffffff813a7932>] acpi_bus_trim+0x87/0xee
      [ 6155.452019]  [<ffffffff813a7a21>] acpi_bus_hot_remove_device+0x88/0x16b
      [ 6155.452019]  [<ffffffff813a33ee>] acpi_os_execute_deferred+0x27/0x34
      [ 6155.452019]  [<ffffffff81090589>] process_one_work+0x219/0x680
      [ 6155.452019]  [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
      [ 6155.452019]  [<ffffffff813a33c7>] ? acpi_os_wait_events_complete+0x23/0x23
      [ 6155.452019]  [<ffffffff810923be>] worker_thread+0x12e/0x320
      [ 6155.452019]  [<ffffffff81092290>] ? manage_workers+0x110/0x110
      [ 6155.452019]  [<ffffffff81098396>] kthread+0xc6/0xd0
      [ 6155.452019]  [<ffffffff8167c4c4>] kernel_thread_helper+0x4/0x10
      [ 6155.452019]  [<ffffffff81671f30>] ? retint_restore_args+0x13/0x13
      [ 6155.452019]  [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
      [ 6155.452019]  [<ffffffff8167c4c0>] ? gs_change+0x13/0x13
      
      This patch removes the set_cpus_allowed_ptr() call, and puts the cmci rediscover
      jobs onto all the other cpus using system_wq. This may introduce some delay for
      the jobs.
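      
      A rough sketch of the described approach (names and structure are assumptions
      for illustration, not the exact patch):
      
      	static DEFINE_PER_CPU(struct work_struct, cmci_rediscover_work);
      
      	static void cmci_rediscover_work_func(struct work_struct *wk)
      	{
      		int banks;
      
      		if (cmci_supported(&banks))
      			cmci_discover(banks);
      	}
      
      	void cmci_rediscover(int dying)
      	{
      		int cpu;
      
      		/* Queue the job on each remaining online cpu via system_wq instead
      		 * of migrating the current task with set_cpus_allowed_ptr(). */
      		for_each_online_cpu(cpu) {
      			if (cpu == dying)
      				continue;
      			INIT_WORK(&per_cpu(cmci_rediscover_work, cpu),
      				  cmci_rediscover_work_func);
      			schedule_work_on(cpu, &per_cpu(cmci_rediscover_work, cpu));
      		}
      	}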
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  13. 30 Oct, 2012 (1 commit)
  14. 26 Oct, 2012 (4 commits)
  15. 19 Oct, 2012 (1 commit)
  16. 18 Oct, 2012 (1 commit)
  17. 28 Sep, 2012 (1 commit)
  18. 10 Aug, 2012 (2 commits)
    • x86/mce: Add CMCI poll mode · 55babd8f
      Committed by Chen Gong
      On Intel systems corrected machine check interrupts (CMCI) may be sent to
      multiple logical processors; possibly to all processors on the affected
      socket (SDM Volume 3B "15.5.1 CMCI Local APIC Interface").  This means
      that a persistent error (such as a stuck bit in ECC memory) may cause
      a storm of interrupts that greatly hinders or prevents forward progress
      (probably on many processors).
      
      To solve this we keep track of the rate at which each processor sees
      CMCI. If we exceed a threshold, we disable CMCI delivery and switch to
      polling the machine check banks. If the storm subsides (none of the
      affected processors see any more errors for a complete poll interval) we
      re-enable CMCI.
      
      [Tony: Added console messages when storm begins/ends and increased storm
      threshold from 5 to 15 so we have a few more logged entries before we
      disable interrupts and start dropping reports]
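      
      A minimal sketch of the per-CPU storm accounting described above (constant
      and variable names are illustrative assumptions):
      
      	#define CMCI_STORM_INTERVAL	(HZ)
      	#define CMCI_STORM_THRESHOLD	15
      
      	static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
      	static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
      
      	/* Called from the CMCI handler: true means this CPU should disable
      	 * CMCI and fall back to polling because a storm is in progress. */
      	static bool cmci_storm_detect(void)
      	{
      		unsigned int cnt = __this_cpu_read(cmci_storm_cnt);
      		unsigned long ts = __this_cpu_read(cmci_time_stamp);
      
      		if (time_before_eq(jiffies, ts + CMCI_STORM_INTERVAL)) {
      			cnt++;
      		} else {
      			cnt = 1;
      			__this_cpu_write(cmci_time_stamp, jiffies);
      		}
      		__this_cpu_write(cmci_storm_cnt, cnt);
      
      		return cnt >= CMCI_STORM_THRESHOLD;
      	}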
      Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Chen Gong <gong.chen@linux.intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • x86/mce: Make cmci_discover() quiet · 4670a300
      Committed by Tony Luck
      cmci_discover() works out which machine check banks support CMCI, and
      which of those are shared by multiple logical processors. It uses this
      information to ensure that exactly one cpu is designated the owner of
      each bank so that when interrupts are broadcast to multiple cpus, only one
      of them will look in a shared bank to log the error and clear the bank.
      
      At boot time cmci_discover() performs this task silently. But during
      certain cpu hotplug operations it prints out a set of summary lines
      like this:
      
      CPU 35 MCA banks CMCI:0 CMCI:1 CMCI:3 CMCI:5 CMCI:6 CMCI:7 CMCI:8 CMCI:9 CMCI:10 CMCI:11
      CPU 1 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 39 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 38 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 32 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 37 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 36 MCA banks CMCI:0 CMCI:1 CMCI:3
      CPU 34 MCA banks CMCI:0 CMCI:1 CMCI:3
      
      The value of these messages seems very low. A user might painstakingly
      cross-check against the data sheet for a processor to ensure that all
      CMCI supported banks are correctly reported, but this seems improbable.
      If users really wanted to do this, we should print the information at
      boot time too.
      
      Remove the messages.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  19. 04 Aug, 2012 (4 commits)
  20. 26 Jul, 2012 (2 commits)
  21. 20 Jul, 2012 (2 commits)
  22. 12 Jul, 2012 (1 commit)
    • x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults · 6751ed65
      Committed by Tony Luck
      In commit dad1743e ("x86/mce: Only restart instruction after machine
      check recovery if it is safe") we fixed mce_notify_process() to force a
      signal to the current process if it was not restartable (RIPV bit not
      set in MCG_STATUS). But doing it here means that the process doesn't
      get told the virtual address of the fault via siginfo_t->si_addr. This
      would prevent application level recovery from the fault.
      
      Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so
      that we will provide the right information with the signal.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: stable@kernel.org    # 3.4+
  23. 07 Jun, 2012 (6 commits)
    • x86, MCE, AMD: Update copyrights and boilerplate · 11122570
      Committed by Borislav Petkov
      Jacob is doing something else now so add myself as the loser who
      provides support.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    • x86, MCE, AMD: Give proper names to the thresholding banks · 336d335a
      Committed by Borislav Petkov
      Having the banks numbered is ok but having real names which mean
      something to the user makes a lot more sense:
      
       /sys/devices/system/machinecheck/machinecheck0/
       |-- bank0
       |-- bank1
       |-- bank2
       |-- bank3
       |-- bank4
       |-- bank5
       |-- bank6
       |-- check_interval
       |-- cmci_disabled
       |-- combined_unit
       |   |-- combined_unit
       |       |-- error_count
       |       |-- threshold_limit
       |-- dont_log_ce
       |-- execution_unit
       |   |-- execution_unit
       |       |-- error_count
       |       |-- threshold_limit
       |-- ignore_ce
       |-- insn_fetch
       |   |-- insn_fetch
       |       |-- error_count
       |       |-- threshold_limit
       |-- load_store
       |   |-- load_store
       |       |-- error_count
       |       |-- threshold_limit
       |-- monarch_timeout
       |-- northbridge
       |   |-- dram
       |   |   |-- error_count
       |   |   |-- interrupt_enable
       |   |   |-- threshold_limit
       |   |-- ht_links
       |   |   |-- error_count
       |   |   |-- interrupt_enable
       |   |   |-- threshold_limit
       |   |-- l3_cache
       |       |-- error_count
       |       |-- interrupt_enable
       |       |-- threshold_limit
      ...
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    • x86, MCE, AMD: Make error_count read only · 6e927361
      Committed by Borislav Petkov
      Until now, writing to error_count caused the code to reset the
      thresholding bank to the current thresholding limit and start counting
      errors from the beginning.
      
      This is misleading and unclear; the same effect can be achieved by writing
      the old thresholding limit into ->threshold_limit.
      
      Make error_count read-only with the functionality to show the current
      error count.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    • x86, MCE, AMD: Cleanup reading of error_count · 2c9c42fa
      Committed by Borislav Petkov
      We have rdmsr_on_cpu() now, so remove the locally defined solution in favor
      of the generic one.
      
      No functionality change.
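      
      For context, rdmsr_on_cpu() reads an MSR on a specific CPU without any
      open-coded smp_call_function_single() plumbing; illustrative usage (the
      bank's MSR address is taken from the driver's per-block bookkeeping):
      
      	u32 lo, hi;
      
      	rdmsr_on_cpu(cpu, b->address, &lo, &hi);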
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    • x86, MCE, AMD: Print decimal thresholding values · 18c20f37
      Committed by Borislav Petkov
      If one sets the threshold limit, say to 25:
      
      $ echo 25 > machinecheck0/threshold_bank4/misc0/threshold_limit
      
      and then reads it back again, it gives
      
      $ cat machinecheck0/threshold_bank4/misc0/threshold_limit
      19
      
      which is actually 0x19 but we don't know that.
      
      Make all output decimal.
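      
      The fix amounts to formatting the stored value in decimal in the sysfs show
      path, along these lines (a sketch; the driver's show helpers are macro-generated):
      
      	static ssize_t show_threshold_limit(struct threshold_block *b, char *buf)
      	{
      		/* base 10, so the value matches what the user wrote */
      		return sprintf(buf, "%lu\n", (unsigned long)b->threshold_limit);
      	}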
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    • x86, MCE, AMD: Move shared bank to node descriptor · 019f34fc
      Committed by Borislav Petkov
      Well, instead of having a real bank 4 on the BSP of each node and
      symlinks on the remaining cores, we push it up into the amd_northbridge
      descriptor, which now contains a pointer to the northbridge bank 4,
      because the bank is one per northbridge and, as such, belongs in the NB
      descriptor anyway.
      
      Each time we hotplug CPUs, we use the northbridge pointer to copy the
      shared bank into the per-CPU array of threshold_banks pointers, or
      destroy it when the last CPU on the node goes offline, or create it when
      the first comes online.
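      
      Conceptually, the shared bank now hangs off the northbridge descriptor, roughly
      like this (a sketch of the data-structure relationship; field and helper names
      are my reading, not quoted from the patch):
      
      	struct amd_northbridge {
      		struct pci_dev *misc;
      		struct pci_dev *link;
      		struct amd_l3_cache l3_cache;
      		struct threshold_bank *bank4;	/* shared MC bank, one per node */
      	};
      
      	/* CPU online: point this CPU's slot at the node's shared bank */
      	per_cpu(threshold_banks, cpu)[4] =
      		node_to_amd_nb(amd_get_nb_id(cpu))->bank4;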
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>