1. 21 Jun 2013 (3 commits)
• trace,x86: Move creation of irq tracepoints from apic.c to irq.c · 83ab8514
Authored by Steven Rostedt (Red Hat)
When CONFIG_X86_LOCAL_APIC is not set, apic.c is not compiled, so the
irq tracepoints are never created via the CREATE_TRACE_POINTS macro and
the build fails with:
      
        LD      init/built-in.o
      arch/x86/built-in.o: In function `trace_x86_platform_ipi_entry':
      linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_entry'
      arch/x86/built-in.o: In function `trace_x86_platform_ipi_exit':
      linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_exit'
      arch/x86/built-in.o: In function `trace_irq_work_entry':
      linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_entry'
      arch/x86/built-in.o: In function `trace_irq_work_exit':
      linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_exit'
      arch/x86/built-in.o:(__jump_table+0x8): undefined reference to `__tracepoint_x86_platform_ipi_entry'
      arch/x86/built-in.o:(__jump_table+0x14): undefined reference to `__tracepoint_x86_platform_ipi_exit'
      arch/x86/built-in.o:(__jump_table+0x20): undefined reference to `__tracepoint_irq_work_entry'
      arch/x86/built-in.o:(__jump_table+0x2c): undefined reference to `__tracepoint_irq_work_exit'
      make[1]: *** [vmlinux] Error 1
      make: *** [sub-make] Error 2
      
      As irq.c is always compiled for x86, it is a more appropriate location
      to create the irq tracepoints.
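
A minimal sketch of the mechanism, assuming the usual CREATE_TRACE_POINTS idiom
(the exact hunk is not reproduced here): the tracepoints are instantiated in the
translation unit that defines the macro before including the trace header, so
that unit must be one that is always built.

	/* arch/x86/kernel/irq.c -- always built on x86 */
	#define CREATE_TRACE_POINTS
	#include <asm/trace/irq_vectors.h>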
      
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      83ab8514
• x86, trace: Add irq vector tracepoints · cf910e83
Authored by Seiji Aguchi
      [Purpose of this patch]
      
      As Vaibhav explained in the thread below, tracepoints for irq vectors
      are useful.
      
      http://www.spinics.net/lists/mm-commits/msg85707.html
      
      <snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
show when an interrupt is handled.  They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.
      
There are some IRQ vectors which switch the system into kernel space
but are not handled by the generic IRQ handlers.  Tracing such events gives
us information about IRQ interaction with other system events.
      
      The trace also tells where the system is spending its time.  We want to
      know which cores are handling interrupts and how they are affecting other
      processes in the system.  Also, the trace provides information about when
      the cores are idle and which interrupts are changing that state.
      <snip>
      
On the other hand, my use case is tracing just the local timer event and
getting the value of the instruction pointer.

I previously suggested adding an instruction-pointer argument to the local timer event.
But the value can also be obtained with an external module such as systemtap,
so I don't need to add any argument to the irq vector tracepoints now.
      
      [Patch Description]
      
Vaibhav's patch shared a single pair of tracepoints, irq_vector_entry/irq_vector_exit, across all events.
But the use case above calls for tracing a specific irq vector rather than all events,
and in that case we are concerned about the overhead of unwanted events.

So, instead of introducing irq_vector_entry/exit, add the following tracepoints
so that each can be enabled independently (a sketch of the event definitions follows the list):
         - local_timer_vector
         - reschedule_vector
         - call_function_vector
         - call_function_single_vector
         - irq_work_entry_vector
         - error_apic_vector
         - thermal_apic_vector
         - threshold_apic_vector
         - spurious_apic_vector
         - x86_platform_ipi_vector
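
A hedged sketch of how such per-vector tracepoints are typically declared (an
approximation of the irq_vectors trace header, not a verbatim copy; one event
class is shared, and each vector gets independently switchable entry/exit events):

	DECLARE_EVENT_CLASS(x86_irq_vector,
		TP_PROTO(int vector),
		TP_ARGS(vector),
		TP_STRUCT__entry(
			__field(int, vector)
		),
		TP_fast_assign(
			__entry->vector = vector;
		),
		TP_printk("vector=%d", __entry->vector)
	);

	#define DEFINE_IRQ_VECTOR_EVENT(name)		\
	DEFINE_EVENT(x86_irq_vector, name##_entry,	\
		TP_PROTO(int vector),			\
		TP_ARGS(vector));			\
	DEFINE_EVENT(x86_irq_vector, name##_exit,	\
		TP_PROTO(int vector),			\
		TP_ARGS(vector));

	DEFINE_IRQ_VECTOR_EVENT(local_timer);
	DEFINE_IRQ_VECTOR_EVENT(reschedule);
	/* ... one pair per vector in the list above ... */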
      
Also, introduce logic to switch the IDT at enable/disable time so that the time
penalty is zero while the tracepoints are disabled. In detail:
 - Create trace irq handlers with entering_irq()/exiting_irq().
 - Create a new IDT, trace_idt_table, at boot time by adding logic to
   _set_gate(). It is simply a copy of the original IDT.
 - Register the new tracepoint handlers in the new IDT by introducing
   macros around alloc_intr_gate(), called when the irq_vector handlers are registered.
 - Add a check for whether irq vector tracing is on or off to load_current_idt()
   (see the sketch below). This check has to come after the debug check because:
   - switching to the debug IDT may be triggered while tracing is enabled, whereas
   - switching to the trace IDT is triggered only when debugging is disabled.
      
      In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
      used for other purposes.
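
A hedged sketch of the load_current_idt() ordering described above (helper names
approximate the patch, not verbatim):

	static inline void load_current_idt(void)
	{
		if (is_debug_idt_enabled())		/* debug IDT takes priority ... */
			load_debug_idt();
		else if (is_trace_idt_enabled())	/* ... then the trace IDT */
			load_trace_idt();
		else
			load_idt((const struct desc_ptr *)&idt_descr);
	}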
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      cf910e83
• x86, trace: Introduce entering/exiting_irq() · eddc0e92
Authored by Seiji Aguchi
When implementing tracepoints in interrupt handlers, simply adding them
to the performance-sensitive path of each handler may cause a performance
problem due to the time penalty.
      
To solve the problem, the idea is to prepare non-trace and trace variants of
the irq handlers and switch IDTs when tracing is enabled or disabled.
      
      So, let's introduce entering_irq()/exiting_irq() for pre/post-
      processing of each irq handler.
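
A hedged sketch of what these helpers boil down to, based on the x86 APIC code of
that era (treat it as an approximation rather than the exact patch):

	static inline void entering_irq(void)
	{
		irq_enter();		/* includes rcu_irq_enter() */
		exit_idle();
	}

	static inline void exiting_irq(void)
	{
		irq_exit();
	}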
      
      A way to use them is as follows.
      
      Non-trace irq handler:
      smp_irq_handler()
      {
      	entering_irq();		/* pre-processing of this handler */
      	__smp_irq_handler();	/*
      				 * common logic between non-trace and trace handlers
      				 * in a vector.
      				 */
      	exiting_irq();		/* post-processing of this handler */
      
      }
      
      Trace irq_handler:
      smp_trace_irq_handler()
      {
      	entering_irq();		/* pre-processing of this handler */
      	trace_irq_entry();	/* tracepoint for irq entry */
      	__smp_irq_handler();	/*
      				 * common logic between non-trace and trace handlers
      				 * in a vector.
      				 */
      	trace_irq_exit();	/* tracepoint for irq exit */
      	exiting_irq();		/* post-processing of this handler */
      
      }
      
If the tracepoints could be placed outside entering_irq()/exiting_irq() as follows,
it would look cleaner.
      
      smp_trace_irq_handler()
      {
      	trace_irq_entry();
      	smp_irq_handler();
      	trace_irq_exit();
      }
      
But that doesn't work.
The problem is the placement of irq_enter()/irq_exit(): irq_enter() must be called
before trace_irq_entry() (and trace_irq_exit() must fire before irq_exit()), because
rcu_irq_enter() has to run before any tracepoint is used, as tracepoints use RCU to
synchronize.
      
      As a possible alternative, we may be able to call irq_enter() first as follows
      if irq_enter() can nest.
      
smp_trace_irq_handler()
{
	irq_enter();
      	trace_irq_entry();
      	smp_irq_handler();
      	trace_irq_exit();
      	irq_exit();
      }
      
But that doesn't work either.
If irq_enter() is nested, it incurs a time penalty because it has to check whether
it was already called. That penalty is not desired in performance-sensitive paths,
even if it is tiny.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      eddc0e92
2. 20 Feb 2013 (1 commit)
3. 28 Jan 2013 (2 commits)
4. 02 Nov 2012 (1 commit)
5. 19 Sep 2012 (1 commit)
6. 16 Jul 2012 (1 commit)
7. 14 Jun 2012 (3 commits)
8. 08 Jun 2012 (2 commits)
• x86/apic: Make cpu_mask_to_apicid() operations check cpu_online_mask · 4988a40c
Authored by Alexander Gordeev
Currently, cpu_mask_to_apicid() must not be passed a cpumask that
contains an offline CPU; otherwise some apic drivers (e.g. x2apic)
might try to access non-existent per-cpu variables. In that regard the
cpu_mask_to_apicid() and cpu_mask_to_apicid_and() operations are
inconsistent.
      
This fix makes the two operations independent of their callers: they
now always return an apicid only for online CPUs. As a result, the
meaning and implementations of cpu_mask_to_apicid() and
cpu_mask_to_apicid_and() become straightforward.
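
A minimal sketch of the resulting pattern (shown here together with the
error-code interface from the companion commit below; names approximate the
default implementation):

	int default_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
					   const struct cpumask *andmask,
					   unsigned int *apicid)
	{
		unsigned int cpu;

		/* Only ever derive an APIC ID from an online CPU. */
		for_each_cpu_and(cpu, cpumask, andmask) {
			if (cpumask_test_cpu(cpu, cpu_online_mask))
				break;
		}

		if (cpu >= nr_cpu_ids)
			return -EINVAL;

		*apicid = per_cpu(x86_cpu_to_apicid, cpu);
		return 0;
	}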
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131624.GG4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4988a40c
• x86/apic: Make cpu_mask_to_apicid() operations return error code · ff164324
Authored by Alexander Gordeev
The current cpu_mask_to_apicid() and cpu_mask_to_apicid_and()
implementations have a few shortcomings:
      
1. The value returned by cpu_mask_to_apicid() is written to
hardware registers unconditionally. Should BAD_APICID ever be
returned, it will be written to the hardware too. But the value of
BAD_APICID is not universal across all hardware in all modes and
might cause unexpected results, e.g. interrupts might get routed
to CPUs that are not configured to receive them.
      
2. Because the value of BAD_APICID is not universal, it is
counter-intuitive to return it for hardware where it does not
make sense (e.g. x2apic).
      
3. The cpu_mask_to_apicid_and() operation is thought of as a
complement to cpu_mask_to_apicid() that merely applies an AND mask
on top of the cpumask being passed. Yet, as a consequence of commit
18374d89, the two operations are inconsistent in that:
  cpu_mask_to_apicid() must not be passed an offline CPU in the cpumask, while
  cpu_mask_to_apicid_and() must not fail and return BAD_APICID.
These limitations are impossible to discern just by looking at
the operations' prototypes.
      
Most of these shortcomings are resolved by returning an error
code instead of BAD_APICID. As a result, faults are reported
back early, rather than leaving the possibility of unexpected
behaviour (as in case 1).
      
The only exception is the setup_timer_IRQ0_pin() routine. Although
obviously at odds with this fix, its existing behaviour is
preserved so as not to break the fragile check_timer(); it would be
better addressed in a separate fix.
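
A hedged sketch of the interface change described above (struct-member layout
approximates arch/x86/include/asm/apic.h of that era):

	/* before: a magic value (BAD_APICID) signals failure */
	unsigned int (*cpu_mask_to_apicid)(const struct cpumask *cpumask);

	/* after: failure is an error code, the apicid becomes an output parameter */
	int (*cpu_mask_to_apicid)(const struct cpumask *cpumask,
				  unsigned int *apicid);
	int (*cpu_mask_to_apicid_and)(const struct cpumask *cpumask,
				      const struct cpumask *andmask,
				      unsigned int *apicid);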
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131559.GF4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ff164324
9. 06 Jun 2012 (1 commit)
10. 07 May 2012 (5 commits)
11. 19 Apr 2012 (1 commit)
12. 29 Mar 2012 (1 commit)
• x86/apic/amd: Be more verbose about LVT offset assignments · 8abc3122
Authored by Robert Richter
Add information about LVT offset assignments in order to better debug firmware
bugs related to them. See the following examples.
      
       # dmesg | grep -i 'offset\|ibs'
       LVT offset 0 assigned for vector 0xf9
       [Firmware Bug]: cpu 0, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
       [Firmware Bug]: cpu 0, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
       Failed to setup IBS, -22
      
In this case the BIOS assigns both the MCE (0xf9) and IBS (0x400)
vectors to LVT offset 0, which is why the second APIC setup (IBS)
failed.
      
With a correct setup you get:
      
       # dmesg | grep -i 'offset\|ibs'
       LVT offset 0 assigned for vector 0xf9
       LVT offset 1 assigned for vector 0x400
       IBS: LVT offset 1 assigned
       perf: AMD IBS detected (0x00000007)
       oprofile: AMD IBS detected (0x00000007)
      
Note: the vector value also encodes the message type, so that NMIs
(0x400) can be handled as well. In the firmware bug message the format is
the same as that of the APIC500 register and additionally includes the
mask bit (bit 16).
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8abc3122
13. 24 Dec 2011 (2 commits)
14. 18 Dec 2011 (1 commit)
15. 14 Dec 2011 (1 commit)
• x86: Add per-cpu stat counter for APIC ICR read tries · 346b46be
Authored by Fernando Luis Vázquez Cao
      In the IPI delivery slow path (NMI delivery) we retry the ICR
      read to check for delivery completion a limited number of times.
      
[ The reason for the limited retries is that in some of the places
  where this is used (cpu boot, kdump, etc.) IPI delivery might not
  succeed (due to a firmware bug or system crash, for example),
  and in such a case it is better to give up and resume
  execution of other code. ]
      
      This patch adds a new entry to /proc/interrupts, RTR, which
      tells user space the number of times we retried the ICR read in
      the IPI delivery slow path.
      
This should give some insight into how well the APIC
message delivery hardware is working: if the counts are way
too large then we are hitting a (very) slow path far too
often.
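
A hedged sketch of the counting pattern described above (the loop approximates
the ICR wait path in arch/x86/kernel/apic/apic.c; treat names as illustrative):

	u32 safe_apic_wait_icr_idle(void)
	{
		u32 send_status;
		int timeout = 0;

		do {
			send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
			if (!send_status)
				break;
			inc_irq_stat(icr_read_retry_count);	/* new per-cpu counter */
			udelay(100);
		} while (timeout++ < 1000);

		return send_status;
	}

The per-cpu counts are then summed up and shown as the RTR row of /proc/interrupts.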
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Cc: Jörn Engel <joern@logfs.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/n/tip-vzsp20lo2xdzh5f70g0eis2s@git.kernel.org
      [ extended the changelog ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
      346b46be
16. 12 Dec 2011 (1 commit)
• x86: Call idle notifier after irq_enter() · 98ad1cc1
Authored by Frederic Weisbecker
Interrupts notify the idle-exit state before calling irq_enter(),
but the notifier code calls rcu_read_lock(), which is not allowed
while RCU is in an extended quiescent state. We need to wait for
irq_enter() -> rcu_idle_exit() to be called before notifying,
otherwise this results in a grumpy RCU:
      
      [    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
      [    0.099991] Hardware name: AMD690VM-FMH
      [    0.099991] Modules linked in:
      [    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
      [    0.099991] Call Trace:
      [    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
      [    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
      [    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
      [    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
      [    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
      [    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
      [    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
      [    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
      [    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
      [    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
      [    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
      [    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
      [    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
      [    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
      [    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
      [    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
      [    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
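
A hedged sketch of the resulting ordering in an APIC interrupt handler
(simplified; APIC ack and irq-regs handling omitted):

	void smp_apic_timer_interrupt(struct pt_regs *regs)
	{
		irq_enter();	/* pulls RCU out of its extended quiescent state */
		exit_idle();	/* the idle-exit notifier may now use rcu_read_lock() */
		local_apic_timer_interrupt();
		irq_exit();
	}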
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Henroid <andrew.d.henroid@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      98ad1cc1
17. 10 Nov 2011 (1 commit)
18. 21 Sep 2011 (2 commits)
19. 27 Jul 2011 (1 commit)
20. 13 Jul 2011 (1 commit)
21. 09 Jul 2011 (1 commit)
• x86, boot: Wait for boot cpu to show up if nr_cpus limit is about to hit · 14cb6dcf
Authored by Vivek Goyal
nr_cpus allows one to specify the number of possible cpus in the system.
The current assumption seems to be that the first cpu to show up is the
boot cpu, and this assumption breaks in the kdump scenario, where we can
be booting on a non-boot cpu with nr_cpus=1.

It might happen that the first cpu we parse is not the cpu we boot on,
so the boot cpu is ignored later. The code does recognize this anomaly
and forcibly sets the boot cpu in the physical cpu map with the following
warning:
      
      if (!physid_isset(hard_smp_processor_id(), phys_cpu_present_map)) {
              printk(KERN_WARNING
                      "weird, boot CPU (#%d) not listed by the BIOS.\n",
                      hard_smp_processor_id());
      
              physid_set(hard_smp_processor_id(), phys_cpu_present_map);
      }
      
This patch waits for the boot cpu to show up and starts ignoring cpus
once we have hit (nr_cpus - 1) cpus. Effectively, we are explicitly
reserving one slot out of nr_cpus for the boot cpu.
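
A hedged sketch of the reservation logic described above, condensed from
generic_processor_info() (variable names approximate the patch):

	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
					      phys_cpu_present_map);

	/*
	 * Until the boot cpu has shown up, accept at most nr_cpu_ids - 1
	 * other cpus so that one slot stays reserved for it.
	 */
	if (!boot_cpu_detected && num_processors >= nr_cpu_ids - 1 &&
	    apicid != boot_cpu_physical_apicid) {
		pr_warning("NR_CPUS/possible_cpus limit almost reached. "
			   "Keeping one slot for boot cpu. Processor 0x%x ignored.\n",
			   apicid);
		return;
	}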
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20110708171926.GF2930@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      14cb6dcf
22. 09 Jun 2011 (2 commits)
23. 30 May 2011 (1 commit)
24. 20 May 2011 (1 commit)
25. 02 May 2011 (2 commits)
• x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optional · 84914ed0
Authored by Tejun Heo
NUMAQ is the only meaningful user of this callback, and
setup_local_APIC() is its only callsite.  Stop torturing everyone else by
making the callback optional and removing all the boilerplate
implementations and assignments.
Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      84914ed0
• x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC() · c4b90c11
Authored by Tejun Heo
Some x86-32 NUMA implementations (NUMAQ) don't initialize the apicid ->
node mapping using set_apicid_to_node() during NUMA init but implement
a custom apic->x86_32_numa_cpu_node() instead.
      
This patch automatically initializes the default apicid -> node mapping
table from apic->x86_32_numa_cpu_node() in setup_local_APIC(), so that
the mapping table is in sync with the actual mapping.
      
As the table isn't used by the custom implementations, this doesn't make
any difference at this point.  It is in preparation for unifying
numa_cpu_node() between x86-32 and x86-64.
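
A hedged sketch of the hunk this adds to setup_local_APIC() (an approximation of
the 32-bit code of that era, not a verbatim copy):

	#ifdef CONFIG_X86_32
		/*
		 * Some NUMA implementations (NUMAQ) don't fill the apicid -> node
		 * mapping during NUMA init; keep the default table in sync with the
		 * custom callback when one is provided.
		 */
		if (apic->x86_32_numa_cpu_node)
			set_apicid_to_node(early_per_cpu(x86_cpu_to_apicid, cpu),
					   apic->x86_32_numa_cpu_node(cpu));
	#endif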
Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      c4b90c11
26. 20 Apr 2011 (1 commit)
• x86, apic: Print verbose error interrupt reason on apic=debug · 2b398bd9
Authored by Youquan Song
      End users worry about the error interrupt printout we generate
      currently:
      
      	pr_debug("APIC error on CPU%d: %02x(%02x)\n",
      	smp_processor_id(), v , v1);
      
      ... and would like to know the reason why error interrupts are generated.
      
      This patch prints out more detailed debug information.
      
Another practical problem is that dynamic debug is not yet initialized
when the APIC initializes, so pr_debug() will not output the error-interrupt
debug information during bootup. This patch therefore uses
apic_printk(APIC_DEBUG, ...), so that the apic=debug boot option prints
verbose error interrupts during bootup.
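
A hedged sketch of the reporting pattern described above (the reason strings
follow the APIC ESR bit layout; treat the exact wording and loop as illustrative):

	static const char * const error_interrupt_reason[] = {
		"Send CS error",		/* APIC Error Bit 0 */
		"Receive CS error",		/* APIC Error Bit 1 */
		"Send accept error",		/* APIC Error Bit 2 */
		"Receive accept error",		/* APIC Error Bit 3 */
		"Redirectable IPI",		/* APIC Error Bit 4 */
		"Send illegal vector",		/* APIC Error Bit 5 */
		"Received illegal vector",	/* APIC Error Bit 6 */
		"Illegal register address",	/* APIC Error Bit 7 */
	};

	apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)",
		    smp_processor_id(), v, v1);
	for (i = 0; i < 8; i++)
		if (v1 & (1 << i))
			apic_printk(APIC_DEBUG, KERN_CONT " : %s",
				    error_interrupt_reason[i]);
	apic_printk(APIC_DEBUG, KERN_CONT "\n");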
Signed-off-by: Youquan Song <youquan.song@intel.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: hpa@linux.intel.com
      Cc: suresh.b.siddha@intel.com
      Cc: yong.y.wang@linux.intel.com
      Cc: jbaron@redhat.com
      Cc: trenn@suse.de
      Cc: kent.liu@intel.com
      Cc: chaohong.guo@intel.com
Link: http://lkml.kernel.org/r/1302762968-24380-2-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2b398bd9