1. 23 November 2009 (6 commits)
    • perf_events: Update the context time on exit · 5e942bb3
      Committed by Peter Zijlstra
      It appeared we did call update_event_times() on exit, but we
      failed to update the context time, which renders the former
      moot.
      
      Locking is a bit iffy: we call update_event_times() under
      ctx->mutex instead of ctx->lock; the next patch fixes this.
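
      A minimal sketch of the ordering the fix establishes (the helpers
      exist in kernel/perf_event.c of this era; the exact call site is
      illustrative):

        /*
         * Refresh the context clock first; otherwise the per-event
         * time updates below are computed against a stale ctx->time
         * and are effectively no-ops.
         */
        update_context_time(child_ctx);

        list_for_each_entry(event, &child_ctx->event_list, event_entry)
                update_event_times(event);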
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091123103819.764207355@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Disable events when we detach them · 2e2af50b
      Committed by Peter Zijlstra
      If we leave the event in STATE_INACTIVE, any read of the event
      after the detach will increase the running count but not the
      enabled count and cause funny scaling artefacts.
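
      A minimal sketch of the idea, assuming the detach path in
      kernel/perf_event.c (the state names are the real enum values,
      the exact location is illustrative):

        /*
         * Stop the clocks when the event is detached: a detached
         * INACTIVE event read later keeps increasing the running
         * count but not the enabled count.
         */
        if (event->state > PERF_EVENT_STATE_OFF)
                event->state = PERF_EVENT_STATE_OFF;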
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091123103819.689055515@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Fix style nits · 6c2bfcbe
      Committed by Peter Zijlstra
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091123103819.613427378@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Undo copy/paste damage · a66a3052
      Committed by Peter Zijlstra
      We had two almost identical functions; avoid the duplication.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <20091123103819.537537928@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Optimize the swcounter hotpath · a4234bfc
      Committed by Ingo Molnar
      The structure init creates a big memcpy, which shows
      up prominently in perf annotate output:
      
                :      ffffffff810a859d <__perf_sw_event>:
           1.68 :      ffffffff810a859d:       55                      push   %rbp
           1.69 :      ffffffff810a859e:       41 89 fa                mov    %edi,%r10d
           0.01 :      ffffffff810a85a1:       49 89 c9                mov    %rcx,%r9
           0.00 :      ffffffff810a85a4:       31 c0                   xor    %eax,%eax
           1.71 :      ffffffff810a85a6:       b9 16 00 00 00          mov    $0x16,%ecx
           0.00 :      ffffffff810a85ab:       48 89 e5                mov    %rsp,%rbp
           0.00 :      ffffffff810a85ae:       48 83 ec 60             sub    $0x60,%rsp
           1.52 :      ffffffff810a85b2:       48 8d 7d a0             lea    -0x60(%rbp),%rdi
          85.20 :      ffffffff810a85b6:       f3 ab                   rep stos %eax,%es:(%rdi)
      
      None of the callees depends on the structure being pre-initialized,
      so only initialize ->addr. This gets rid of the memcpy overhead.
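
      A minimal sketch of the change in __perf_sw_event(), with only
      the field the callees actually read being set up (the surrounding
      code is elided):

        struct perf_sample_data data;

        /*
         * No designated-initializer zero-fill of the whole (large)
         * structure; that zero-fill is what generated the rep stos
         * above. Only ->addr is needed by the callees.
         */
        data.addr = addr;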
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf events: Do not generate function trace entries in perf code · 6e3d8330
      Committed by Ingo Molnar
      Decreases perf overhead when function tracing is enabled,
      by about 50%.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 22 November 2009 (5 commits)
    • perf_events: Fix modular build · 645e8cc0
      Committed by Ingo Molnar
      Fix:
      
        ERROR: "perf_swevent_put_recursion_context" [fs/ext4/ext4.ko] undefined!
        ERROR: "perf_swevent_get_recursion_context" [fs/ext4/ext4.ko] undefined!
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1258864015-10579-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_event: Remove redundant zero fill · 96b02d78
      Committed by Márton Németh
      The buffer is first zeroed out by memset(). Then strncpy() is
      used to fill the content. The strncpy() function also pads the
      string to the end of the specified length, which is redundant,
      and it does not ensure that the string is properly
      NUL-terminated. Use strlcpy() instead (see the C sketch after
      the semantic patch below).
      
      The semantic match that finds this kind of pattern is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression buffer;
      expression size;
      expression str;
      @@
      	memset(buffer, 0, size);
      	...
      -	strncpy(
      +	strlcpy(
      	buffer, str, sizeof(buffer)
      	);
      @@
      expression buffer;
      expression size;
      expression str;
      @@
      	memset(&buffer, 0, size);
      	...
      -	strncpy(
      +	strlcpy(
      	&buffer, str, sizeof(buffer));
      @@
      expression buffer;
      identifier field;
      expression size;
      expression str;
      @@
      	memset(buffer, 0, size);
      	...
      -	strncpy(
      +	strlcpy(
      	buffer->field, str, sizeof(buffer->field)
      	);
      @@
      expression buffer;
      identifier field;
      expression size;
      expression str;
      @@
      	memset(&buffer, 0, size);
      	...
      -	strncpy(
      +	strlcpy(
      	buffer.field, str, sizeof(buffer.field));
      // </smpl>
      
      On strncpy() vs strlcpy() see
      http://www.gratisoft.us/todd/papers/strlcpy.html.
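
      A minimal C sketch of the pattern being fixed (buffer and source
      names are illustrative):

        char comm[16];

        /* Before: memset() already zeroed the buffer, strncpy() pads
         * it again and may still leave it without a terminating NUL. */
        memset(comm, 0, sizeof(comm));
        strncpy(comm, src, sizeof(comm));

        /* After: strlcpy() copies at most sizeof(comm) - 1 bytes and
         * always NUL-terminates, so the extra padding pass goes away. */
        strlcpy(comm, src, sizeof(comm));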
      Signed-off-by: Márton Németh <nm127@freemail.hu>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: cocci@diku.dk
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <4B086547.5040100@freemail.hu>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • hw-breakpoints: Remove x86 specific headers from core file · b3a75542
      Committed by Frederic Weisbecker
      Remove asm/processor.h and asm/debugreg.h as these headers are
      not used anymore in the hw-breakpoints core file.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      LKML-Reference: <1258863695-10464-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: Forget about the NMI buffer for syscall events · 28889bf9
      Committed by Frederic Weisbecker
      We are never in an NMI context when we commit a syscall trace to
      perf. So just forget about the nmi buffer there.
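
      A hypothetical sketch of the shape of the change (the buffer
      names are illustrative, not the actual variables):

        /*
         * Before:  buf = in_nmi() ? perf_nmi_buf : perf_trace_buf;
         * After: syscall tracepoints never fire in NMI context, so
         * the NMI-safe buffer is not needed here.
         */
        buf = perf_trace_buf;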
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1258863695-10464-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: Use the perf recursion protection from trace event · ce71b9df
      Committed by Frederic Weisbecker
      When we commit a trace to perf, we first check if we are
      recursing in the same buffer so that we don't mess up the buffer
      with a recursing trace. But later on, we do the same check from
      perf to avoid commit recursion. The recursion check is desired
      early, before we touch the buffer, but we want to do this check
      only once.
      
      Then export the recursion protection from perf and use it from
      the trace events before submitting a trace.
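
      A minimal sketch of the resulting usage pattern in a trace-event
      commit path (these two helpers are the ones exported by perf):

        int rctx;

        rctx = perf_swevent_get_recursion_context();
        if (rctx < 0)
                return;         /* already committing on this context */

        /* ... fill the per-cpu buffer and hand it to perf ... */

        perf_swevent_put_recursion_context(rctx);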
      
      v2: Put appropriate Reported-by tag
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1258864015-10579-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 21 November 2009 (15 commits)
  4. 16 November 2009 (1 commit)
    • perf_event: Optimize perf_output_lock() · 559fdc3c
      Committed by Peter Zijlstra
      The purpose of perf_output_{un,}lock() is to:
      
       1) avoid publishing incomplete data
          [ possible when publishing a head that is ahead of an entry
            that is still being written ]
      
       2) guarantee fwd progress
          [ a simple refcount on pending writers doesn't need to drop to
            0; making it so would end up implementing something like the
            forced quiescent states of RCU ]
      
      To satisfy the above without undue complexity it serializes
      between CPUs; this means that a pending writer can only be the
      same cpu in a nested context, and since (under normal operation)
      a cpu always makes progress we're good -- provided the head is
      only published when the bottom-most writer completes.
      
      Now we don't need to disable IRQs in order to serialize between
      CPUs; disabling preemption ought to be sufficient, especially
      since we already deal with nesting due to NMIs.
      
      This avoids potentially expensive (and needless) local IRQ
      disable/enable ops.
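
      A rough sketch of the intent, assuming the perf_output_lock()
      helper of this era (the body is illustrative, not the full
      implementation):

        static void perf_output_lock(struct perf_output_handle *handle)
        {
                /*
                 * Serializing against other CPUs only needs preemption
                 * disabled; same-CPU nesting (IRQ/NMI) is handled by
                 * the nesting accounting, so no local_irq_save()/
                 * restore() pair is needed.
                 */
                preempt_disable();

                /* ... take the per-buffer writer lock, note nesting ... */
        }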
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1258373161.26714.254.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 13 November 2009 (1 commit)
  6. 10 November 2009 (2 commits)
  7. 08 November 2009 (7 commits)
    • ksym_tracer: Remove KSYM_SELFTEST_ENTRY · 30ff21e3
      Committed by Li Zefan
      The macro used to be used in both trace_selftest.c and
      trace_ksym.c, but is no longer, so remove it from the header file.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • hw-breakpoints: Arbitrate access to pmu following registers constraints · ba1c813a
      Committed by Frederic Weisbecker
      Allow or refuse to build a counter using the breakpoint pmu,
      following the given constraints.
      
      We keep track of the pmu users by using three per cpu variables:
      
      - nr_cpu_bp_pinned stores the number of pinned cpu breakpoint counters
        in the given cpu

      - nr_bp_flexible stores the number of non-pinned breakpoint counters
        in the given cpu.
      
      - task_bp_pinned stores the number of pinned task breakpoints in a cpu
      
      The latter is not a simple counter but gathers the number of tasks that
      have n pinned breakpoints.
      Considering HBP_NUM the number of available breakpoint address
      registers:
         task_bp_pinned[0] is the number of tasks having 1 breakpoint
         task_bp_pinned[1] is the number of tasks having 2 breakpoints
         [...]
         task_bp_pinned[HBP_NUM - 1] is the number of tasks having the
         maximum number of registers (HBP_NUM).
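
      A sketch of how that histogram can be collapsed into the
      "max(per_cpu(task_bp_pinned, cpu))" term used below (helper name
      illustrative; task_bp_pinned is assumed to be a per-cpu array of
      HBP_NUM counters as described above):

        static unsigned int max_task_bp_pinned(int cpu)
        {
                unsigned int *tsk_pinned = per_cpu(task_bp_pinned, cpu);
                int i;

                /* Highest non-empty bucket == largest number of pinned
                 * breakpoints held by any single task on this cpu. */
                for (i = HBP_NUM - 1; i >= 0; i--) {
                        if (tsk_pinned[i] > 0)
                                return i + 1;
                }

                return 0;
        }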
      
      When a breakpoint counter is created and wants an access to the pmu,
      we evaluate the following constraints:
      
      == Non-pinned counter ==
      
      - If attached to a single cpu, check:
      
          (per_cpu(nr_bp_flexible, cpu) || (per_cpu(nr_cpu_bp_pinned, cpu)
               + max(per_cpu(task_bp_pinned, cpu)))) < HBP_NUM
      
             -> If there are already non-pinned counters in this cpu, it
                means there is already a free slot for them.
                Otherwise, we check that the maximum number of per-task
                breakpoints (for this cpu) plus the number of per-cpu
                breakpoints (for this cpu) doesn't cover every register.
      
      - If attached to every cpus, check:
      
          (per_cpu(nr_bp_flexible, *) || (max(per_cpu(nr_cpu_bp_pinned, *))
                 + max(per_cpu(task_bp_pinned, *)))) < HBP_NUM
      
             -> This is roughly the same, except we check the number of
                per-cpu bps for every cpu and we keep the max one. Same
                for the per-task breakpoints.
      
      == Pinned counter ==
      
      - If attached to a single cpu, check:
      
             ((per_cpu(nr_bp_flexible, cpu) > 1)
                  + per_cpu(nr_cpu_bp_pinned, cpu)
                  + max(per_cpu(task_bp_pinned, cpu))) < HBP_NUM
      
             -> Same checks as before. But now the nr_bp_flexible, if any,
                must keep at least one register (or flexible breakpoints
                will never be fed).
      
      - If attached to every cpus, check:
      
            ((per_cpu(nr_bp_flexible, *) > 1)
                 + max(per_cpu(nr_cpu_bp_pinned, *))
                 + max(per_cpu(task_bp_pinned, *))) < HBP_NUM
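
      A sketch of the single-cpu pinned check above as code (function
      name illustrative; it reuses max_task_bp_pinned() from the
      previous sketch):

        static int bp_pinned_slot_available(int cpu)
        {
                unsigned int used;

                used = (per_cpu(nr_bp_flexible, cpu) > 1)
                       + per_cpu(nr_cpu_bp_pinned, cpu)
                       + max_task_bp_pinned(cpu);

                return used < HBP_NUM;
        }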
      
      Changes in v2:
      
      - Counter -> event rename
      
      Changes in v5:
      
      - Fix unreleased non-pinned task-bound-only counters. We only released
        it in the first cpu. (Thanks to Paul Mackerras for reporting that)
      
      Changes in v6:
      
      - Currently, event scheduling is done in this order: cpu context
        pinned + cpu context non-pinned + task context pinned + task context
        non-pinned events. Then our current constraints are right theoretically
        but not in practice, because non-pinned counters may be scheduled
        before we can apply every possible pinned counter. So consider
        non-pinned counters as pinned for now.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
    • hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Committed by Frederic Weisbecker
      This patch rebases the implementation of the breakpoints API on
      top of perf event instances.

      Each breakpoint is now a perf event that handles the
      register scheduling, thread/cpu attachment, etc.
      
      The new layering is now made as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons of this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoint counters (see the sketch
        below)
      - Ptrace breakpoint setting remains tricky and still needs some
        per-thread breakpoint references.
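
      A minimal user-space sketch of that new ABI: opening a write
      watchpoint on an address as a perf event. The field and constant
      names are the ABI ones introduced by this work; error handling is
      omitted and the helper itself is illustrative.

        #include <stdint.h>
        #include <string.h>
        #include <unistd.h>
        #include <sys/types.h>
        #include <sys/syscall.h>
        #include <linux/perf_event.h>
        #include <linux/hw_breakpoint.h>

        static int open_watchpoint(void *addr, pid_t pid)
        {
                struct perf_event_attr attr;

                memset(&attr, 0, sizeof(attr));
                attr.size          = sizeof(attr);
                attr.type          = PERF_TYPE_BREAKPOINT;
                attr.bp_type       = HW_BREAKPOINT_W;
                attr.bp_addr       = (uintptr_t)addr;
                attr.bp_len        = HW_BREAKPOINT_LEN_4;
                attr.sample_period = 1;

                /* The returned fd counts hits on the watched address. */
                return syscall(__NR_perf_event_open, &attr, pid,
                               -1 /* any cpu */, -1 /* no group */, 0);
        }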
      
      Todo (in the order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event" rename
      - The ptrace regression has been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle off case: when breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD no longer covers every case of running
        breakpoints and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
    • sched: Use root_task_group_empty only with FAIR_GROUP_SCHED · e9036b36
      Committed by Cyrill Gorcunov
      root_task_group_empty is used only with FAIR_GROUP_SCHED
      so if we use other scheduler options we get:
      
        kernel/sched.c:314: warning: 'root_task_group_empty' defined but not used
      
      So move CONFIG_FAIR_GROUP_SCHED up so that it covers
      root_task_group_empty().
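
      A sketch of the resulting shape (the function body shown is
      illustrative):

        #ifdef CONFIG_FAIR_GROUP_SCHED
        /* Only defined where it is actually used, which silences the
         * "defined but not used" warning for other scheduler configs. */
        static int root_task_group_empty(void)
        {
                return list_empty(&root_task_group.children);
        }
        #endif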
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20091026192414.GB5321@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fix kernel-doc function parameter name · 968c8645
      Committed by Randy Dunlap
      Fix variable name in sched.c kernel-doc notation.
      
      Fixes this DocBook warning:
      
        Warning(kernel/sched.c:2008): No description found for parameter 'p'
        Warning(kernel/sched.c:2008): Excess function parameter 'k'
         description in 'kthread_bind'
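
      A sketch of the corrected kernel-doc header (wording abridged and
      illustrative; the point is that the description now uses @p, the
      actual parameter name, instead of the stale @k):

        /**
         * kthread_bind - bind a just-created kthread to a cpu.
         * @p: thread created by kthread_create().
         * @cpu: cpu (might not be online, must be possible) for @p to run on.
         */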
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <4AF4B1BC.8020604@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing, perf_events: Protect the buffer from recursion in perf · 444a2a3b
      Committed by Frederic Weisbecker
      While tracing using events with perf, if one enables the
      lockdep:lock_acquire event, it will infect every other perf
      trace event.
      
      Basically, you can enable whatever set of trace events through
      perf but if this event is part of the set, the only result we
      can get is a long list of lock_acquire events of rcu read lock,
      and only that.
      
      This is because of a recursion inside perf.
      
      1) When a trace event is triggered, it will fill a per cpu
         buffer and submit it to perf.
      
      2) Perf will commit this event but will also protect some data
         using rcu_read_lock
      
      3) A recursion appears: rcu_read_lock triggers a lock_acquire
         event that will fill the per cpu event and then submit the
         buffer to perf.
      
      4) Perf detects a recursion and ignores it
      
      5) Perf continues its work on the previous event, but its buffer
         has been overwritten by the lock_acquire event; it has thus
         been turned into a lock_acquire event of rcu read lock
      
      Such scenario also happens with lock_release with
      rcu_read_unlock().
      
      We could turn the rcu_read_lock() into __rcu_read_lock() to drop
      the lock debugging from the perf fast path, but that would make
      us lose the rcu debugging and it wouldn't prevent other possible
      kinds of recursion from perf in the future.
      
      This patch adds a recursion protection based on a counter on the
      perf trace per cpu buffers to solve the problem.
      
      -v2: Fixed lost whitespace, added reviewed-by tag
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1257477185-7838-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • genirq: try_one_irq() must be called with irq disabled · e7e7e0c0
      Committed by Yong Zhang
      Prarit reported:
      =================================
      [ INFO: inconsistent lock state ]
      2.6.32-rc5 #1
      ---------------------------------
      inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
      swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
       (&irq_desc_lock_class){?.-...}, at: [<ffffffff810c264e>] try_one_irq+0x32/0x138
      {IN-HARDIRQ-W} state was registered at:
       [<ffffffff81095160>] __lock_acquire+0x2fc/0xd5d
       [<ffffffff81095cb4>] lock_acquire+0xf3/0x12d
       [<ffffffff814cdadd>] _spin_lock+0x40/0x89
       [<ffffffff810c3389>] handle_level_irq+0x30/0x105
       [<ffffffff81014e0e>] handle_irq+0x95/0xb7
       [<ffffffff810141bd>] do_IRQ+0x6a/0xe0
       [<ffffffff81012813>] ret_from_intr+0x0/0x16
      irq event stamp: 195096
      hardirqs last  enabled at (195096): [<ffffffff814cd7f7>] _spin_unlock_irq+0x3a/0x5c
      hardirqs last disabled at (195095): [<ffffffff814cdbdd>] _spin_lock_irq+0x29/0x95
      softirqs last  enabled at (195088): [<ffffffff81068c92>] __do_softirq+0x1c1/0x1ef
      softirqs last disabled at (195093): [<ffffffff8101304c>] call_softirq+0x1c/0x30
      
      other info that might help us debug this:
      1 lock held by swapper/0:
       #0:  (kernel/irq/spurious.c:21){+.-...}, at: [<ffffffff81070cf2>]
      run_timer_softirq+0x1a9/0x315
      
      stack backtrace:
      Pid: 0, comm: swapper Not tainted 2.6.32-rc5 #1
      Call Trace:
       <IRQ>  [<ffffffff81093e94>] valid_state+0x187/0x1ae
       [<ffffffff81093fe4>] mark_lock+0x129/0x253
       [<ffffffff810951d4>] __lock_acquire+0x370/0xd5d
       [<ffffffff81095cb4>] lock_acquire+0xf3/0x12d
       [<ffffffff814cdadd>] _spin_lock+0x40/0x89
       [<ffffffff810c264e>] try_one_irq+0x32/0x138
       [<ffffffff810c2795>] poll_all_shared_irqs+0x41/0x6d
       [<ffffffff810c27dd>] poll_spurious_irqs+0x1c/0x49
       [<ffffffff81070d82>] run_timer_softirq+0x239/0x315
       [<ffffffff81068bd3>] __do_softirq+0x102/0x1ef
       [<ffffffff8101304c>] call_softirq+0x1c/0x30
       [<ffffffff81014b65>] do_softirq+0x59/0xca
       [<ffffffff810686ad>] irq_exit+0x58/0xae
       [<ffffffff81029b84>] smp_apic_timer_interrupt+0x94/0xba
       [<ffffffff81012a33>] apic_timer_interrupt+0x13/0x20
      
      The reason is that try_one_irq() is called from hardirq context with
      interrupts disabled and from softirq context (poll_all_shared_irqs())
      with interrupts enabled.
      
      Disable interrupts before calling it from poll_all_shared_irqs().
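
      A minimal sketch of the shape of the fix in poll_all_shared_irqs()
      (the surrounding loop over irq descriptors is elided):

        local_irq_disable();
        try_one_irq(i, desc);
        local_irq_enable();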
      Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
      LKML-Reference: <1257563773-4620-1-git-send-email-yong.zhang0@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  8. 04 November 2009 (3 commits)