1. 04 4月, 2010 1 次提交
    • F
      perf: Fetch hot regs from the template caller · 6cc8a7c1
      Frederic Weisbecker 提交于
      Trace events can be defined from a template using
      DECLARE_EVENT_CLASS/DEFINE_EVENT or directly with TRACE_EVENT.
      
      In both cases we have a template tracepoint handler, used to
      record the trace, to which we pass our ftrace event instance.
      
      In the function level, if the class is named "foo" and the event
      is named "blah", we have the following chain of calls:
      
      perf_trace_blah() -> perf_trace_templ_foo()
      
      In the case we have several events sharing the class "blah",
      we'll have multiple users of perf_trace_templ_foo(), and it
      won't be inlined by the compiler. This is usually what happens
      with the DECLARE_EVENT_CLASS/DEFINE_EVENT based definition.
      
      But if perf_trace_blah() is the only caller of perf_trace_templ_foo()
      there are fair chances that it will be inlined.
      
      The problem is that we fetch the regs from perf_trace_templ_foo()
      after we rewinded the frame pointer to the second caller, we want
      to reach the caller of perf_trace_blah() to get the right source
      of the event. And we do this by always assuming that
      perf_trace_templ_foo() is not inlined. But as shown above this
      is not always true. And if it is inlined we miss the first caller,
      losing the most important level of precision.
      
      We get:
      	    61.31%       ls  [kernel.kallsyms]  [k] do_softirq
                               |
                               --- do_softirq
                                   irq_exit
                                   do_IRQ
                                   common_interrupt
                                  |
                                  |--25.00%-- tty_buffer_request_room
      
      Instead of:
      	    61.31%       ls  [kernel.kallsyms]  [k] __do_softirq
                               |
                               --- __do_softirq
                                   do_softirq
                                   irq_exit
                                   do_IRQ
                                   common_interrupt
                                  |
                                  |--25.00%-- tty_buffer_request_room
      
      To fix this, we fetch the regs from perf_trace_blah() rather than
      perf_trace_templ_foo() so that we don't have to deal with inlining
      surprises.
      
      That also bring us the advantage of having the true source of the
      event even if we don't have frame pointers.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      6cc8a7c1
  2. 10 3月, 2010 2 次提交
    • F
      perf: Drop the obsolete profile naming for trace events · 97d5a220
      Frederic Weisbecker 提交于
      Drop the obsolete "profile" naming used by perf for trace events.
      Perf can now do more than simple events counting, so generalize
      the API naming.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      97d5a220
    • F
      perf: Take a hot regs snapshot for trace events · c530665c
      Frederic Weisbecker 提交于
      We are taking a wrong regs snapshot when a trace event triggers.
      Either we use get_irq_regs(), which gives us the interrupted
      registers if we are in an interrupt, or we use task_pt_regs()
      which gives us the state before we entered the kernel, assuming
      we are lucky enough to be no kernel thread, in which case
      task_pt_regs() returns the initial set of regs when the kernel
      thread was started.
      
      What we want is different. We need a hot snapshot of the regs,
      so that we can get the instruction pointer to record in the
      sample, the frame pointer for the callchain, and some other
      things.
      
      Let's use the new perf_fetch_caller_regs() for that.
      
      Comparison with perf record -e lock: -R -a -f -g
      Before:
      
              perf  [kernel]                   [k] __do_softirq
                     |
                     --- __do_softirq
                        |
                        |--55.16%-- __open
                        |
                         --44.84%-- __write_nocancel
      
      After:
      
                  perf  [kernel]           [k] perf_tp_event
                     |
                     --- perf_tp_event
                        |
                        |--41.07%-- lock_acquire
                        |          |
                        |          |--39.36%-- _raw_spin_lock
                        |          |          |
                        |          |          |--7.81%-- hrtimer_interrupt
                        |          |          |          smp_apic_timer_interrupt
                        |          |          |          apic_timer_interrupt
      
      The old case was producing unreliable callchains. Now having
      right frame and instruction pointers, we have the trace we
      want.
      
      Also syscalls and kprobe events already have the right regs,
      let's use them instead of wasting a retrieval.
      
      v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Archs <linux-arch@vger.kernel.org>
      c530665c
  3. 04 3月, 2010 1 次提交
  4. 25 2月, 2010 1 次提交
    • J
      tracing: Fix ftrace_event_call alignment for use with gcc 4.5 · 86c38a31
      Jeff Mahoney 提交于
      GCC 4.5 introduces behavior that forces the alignment of structures to
       use the largest possible value. The default value is 32 bytes, so if
       some structures are defined with a 4-byte alignment and others aren't
       declared with an alignment constraint at all - it will align at 32-bytes.
      
       For things like the ftrace events, this results in a non-standard array.
       When initializing the ftrace subsystem, we traverse the _ftrace_events
       section and call the initialization callback for each event. When the
       structures are misaligned, we could be treating another part of the
       structure (or the zeroed out space between them) as a function pointer.
      
       This patch forces the alignment for all the ftrace_event_call structures
       to 4 bytes.
      
       Without this patch, the kernel fails to boot very early when built with
       gcc 4.5.
      
       It's trivial to check the alignment of the members of the array, so it
       might be worthwhile to add something to the build system to do that
       automatically. Unfortunately, that only covers this case. I've asked one
       of the gcc developers about adding a warning when this condition is seen.
      
      Cc: stable@kernel.org
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      LKML-Reference: <4B85770B.6010901@suse.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      86c38a31
  5. 16 2月, 2010 1 次提交
    • S
      tracing: Add notrace to TRACE_EVENT implementation functions · 83f0d539
      Steven Rostedt 提交于
      The functions used to implement the TRACE_EVENT macro show up in
      function tracing. This is considered a distraction, and these should
      not be displayed. For example:
      
           <idle>-0     [000]    57.202149: task_of <-update_stats_wait_end
           <idle>-0     [000]    57.202149: ftrace_raw_event_sched_stat_wait <-update_stats_wait_end
           <idle>-0     [000]    57.202150: ftrace_raw_event_id_sched_stat_template <-ftrace_raw_event_sched_stat_wait
           <idle>-0     [000]    57.202150: sched_stat_wait: comm=sshd pid=2735 delay=19207 [ns]
      
      The "ftrace_raw_event_*" traces are just the utility functions used
      by TRACE_EVENT tracepoints.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Requested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      83f0d539
  6. 29 1月, 2010 1 次提交
    • X
      perf: Factorize trace events raw sample buffer operations · 430ad5a6
      Xiao Guangrong 提交于
      Introduce ftrace_perf_buf_prepare() and ftrace_perf_buf_submit() to
      gather the common code that operates on raw events sampling buffer.
      This cleans up redundant code between regular trace events, syscall
      events and kprobe events.
      
      Changelog v1->v2:
      - Rename function name as per Masami and Frederic's suggestion
      - Add __kprobes for ftrace_perf_buf_prepare() and make
        ftrace_perf_buf_submit() inline as per Masami's suggestion
      - Export ftrace_perf_buf_prepare since modules will use it
      Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4B60E92D.9000808@cn.fujitsu.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      430ad5a6
  7. 07 1月, 2010 2 次提交
    • L
      tracing: Remove show_format and related macros from TRACE_EVENT · 0fa0edaf
      Lai Jiangshan 提交于
      The previous patches added the use of print_fmt string and changes
      the trace_define_field() function to also create the fields and
      format output for the event format files.
      
         text	   data	    bss	    dec	    hex	filename
      5857201	1355780	9336808	16549789	 fc879d	vmlinux
      5884589	1351684	9337896	16574169	 fce6d9	vmlinux-orig
      
      The above shows the size of the vmlinux after this patch set
      compared to the vmlinux-orig which is before the patch set.
      
      This saves us 27k on text, 1k on bss and adds just 4k of data.
      
      The total savings of 24k in size.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B273D4D.40604@cn.fujitsu.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0fa0edaf
    • L
      tracing: Add print_fmt field · 509e760c
      Lai Jiangshan 提交于
      This is part of a patch set that removes the show_format method
      in the ftrace event macros.
      
      The print_fmt field is added to hold the string that shows
      the print_fmt in the event format files. This patch only adds
      the field but it is currently not used. Later patches will use
      this field to enable us to remove the show_format field
      and function.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B273D3E.2000704@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      509e760c
  8. 30 12月, 2009 1 次提交
  9. 28 12月, 2009 1 次提交
    • L
      perf events: Remove CONFIG_EVENT_PROFILE · 07b139c8
      Li Zefan 提交于
      Quoted from Ingo:
      
      | This reminds me - i think we should eliminate CONFIG_EVENT_PROFILE -
      | it's an unnecessary Kconfig complication. If both PERF_EVENTS and
      | EVENT_TRACING is enabled we should expose generic tracepoints.
      |
      | Nor is it limited to event 'profiling', so it has become a misnomer as
      | well.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <4B2F1557.2050705@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      07b139c8
  10. 14 12月, 2009 4 次提交
  11. 13 12月, 2009 1 次提交
  12. 26 11月, 2009 1 次提交
    • I
      events: Rename TRACE_EVENT_TEMPLATE() to DECLARE_EVENT_CLASS() · 091ad365
      Ingo Molnar 提交于
      It is not quite obvious at first sight what TRACE_EVENT_TEMPLATE
      does: does it define an event as well beyond defining a template?
      
      To clarify this, rename it to DECLARE_EVENT_CLASS, which follows
      the various 'DECLARE_*()' idioms we already have in the kernel:
      
        DECLARE_EVENT_CLASS(class)
      
          DEFINE_EVENT(class, event1)
          DEFINE_EVENT(class, event2)
          DEFINE_EVENT(class, event3)
      
      To complete this logic we should also rename TRACE_EVENT() to:
      
        DEFINE_SINGLE_EVENT(single_event)
      
      ... but in a more quiet moment of the kernel cycle.
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4B0E286A.2000405@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      091ad365
  13. 25 11月, 2009 2 次提交
    • S
      tracing: Create new DEFINE_EVENT_PRINT · e5bc9721
      Steven Rostedt 提交于
      After creating the TRACE_EVENT_TEMPLATE I started to look at other
      trace points to see what duplication was made. I noticed that there
      are several trace points where they are almost identical except for
      the name and the output format. Since TRACE_EVENT_TEMPLATE was successful
      in bringing down the size of trace events, I added a DEFINE_EVENT_PRINT.
      
      DEFINE_EVENT_PRINT is used just like DEFINE_EVENT is. That is, the
      DEFINE_EVENT_PRINT also uses a TRACE_EVENT_TEMPLATE, but it allows the
      developer to overwrite the print format. If there are two or more
      TRACE_EVENTS that are identical except for the name and print, then
      they can be converted to use a TRACE_EVENT_TEMPLATE. Since the
      TRACE_EVENT_TEMPLATE already does the print output, the first trace event
      would have its print format held in the TRACE_EVENT_TEMPLATE and
      be defined with a DEFINE_EVENT. The rest will use the DEFINE_EVENT_PRINT
      and override the print format.
      
      Converting the sched trace points to both DEFINE_EVENT and
      DEFINE_EVENT_PRINT. Five were converted to DEFINE_EVENT and two were
      converted to DEFINE_EVENT_PRINT.
      
      I was able to get the following:
      
      $ size kernel/sched.o-*
         text	   data	    bss	    dec	    hex	filename
        79299	   6776	   2520	  88595	  15a13	kernel/sched.o-notrace
       101941	  11896	   2584	 116421	  1c6c5	kernel/sched.o-templ
       104779	  11896	   2584	 119259	  1d1db	kernel/sched.o-trace
      
      sched.o-notrace is the scheduler compiled with no trace points.
      sched.o-templ is with the use of DEFINE_EVENT and DEFINE_EVENT_PRINT
      sched.o-trace is the current trace events.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e5bc9721
    • S
      tracing: Create new TRACE_EVENT_TEMPLATE · ff038f5c
      Steven Rostedt 提交于
      There are some places in the kernel that define several tracepoints and
      they are all identical besides the name. The code to enable, disable and
      record is created for every trace point even if most of the code is
      identical.
      
      This patch adds TRACE_EVENT_TEMPLATE that lets the developer create
      a template TRACE_EVENT and create trace points with DEFINE_EVENT, which
      is based off of a given template. Each trace point used by this
      will share most of the code, and bring down the size of the kernel
      when there are several duplicate events.
      
      Usage is:
      
      TRACE_EVENT_TEMPLATE(name, proto, args, tstruct, assign, print);
      
      Which would be the same as defining a normal TRACE_EVENT.
      
      To create the trace events that the trace points will use:
      
      DEFINE_EVENT(template, name, proto, args) is done. The template
      is the name of the TRACE_EVENT_TEMPLATE to use. The name is the
      name of the trace point. The parameters proto and args must be the same
      as the proto and args of the template. If they are not the same,
      then a compile error will result. I tried hard removing this duplication
      but the C preprocessor is not powerful enough (or my CPP magic
      experience points is not at a high enough level) to not need them.
      
      A lot of trace events are coming in with new XFS development. Most of
      the trace points are identical except for the name. The following shows
      the advantage of having TRACE_EVENT_TEMPLATE:
      
      $ size fs/xfs/xfs.o.*
          text          data     bss     dec     hex filename
        452114          2788    3520  458422   6feb6 fs/xfs/xfs.o.old
        638482         38116    3744  680342   a6196 fs/xfs/xfs.o.template
        996954         38116    4480 1039550   fdcbe fs/xfs/xfs.o.trace
      
      xfs.o.old is without any tracepoints.
      xfs.o.template uses the new TRACE_EVENT_TEMPLATE.
      xfs.o.trace uses the current TRACE_EVENT macros.
      Requested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ff038f5c
  14. 23 11月, 2009 1 次提交
  15. 22 11月, 2009 1 次提交
    • F
      tracing: Use the perf recursion protection from trace event · ce71b9df
      Frederic Weisbecker 提交于
      When we commit a trace to perf, we first check if we are
      recursing in the same buffer so that we don't mess-up the buffer
      with a recursing trace. But later on, we do the same check from
      perf to avoid commit recursion. The recursion check is desired
      early before we touch the buffer but we want to do this check
      only once.
      
      Then export the recursion protection from perf and use it from
      the trace events before submitting a trace.
      
      v2: Put appropriate Reported-by tag
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1258864015-10579-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ce71b9df
  16. 14 11月, 2009 1 次提交
    • J
      tracing: Fix event format export · 811cb50b
      Johannes Berg 提交于
      For some reason the export of the event print format to userspace
      uses '#fmt' which breaks if the format string is anything but a plain
      string, for example if it is built with macros then the macro names
      are exported instead of their contents.
      
      Use
      	"\"%s\"", fmt
      instead of
      	"%s", #fmt
      to export the string and not the way it is built.
      
      For example, in net/mac80211/driver-trace.h for the trace event drv_start
      there is:
      
              TP_printk(
                      LOCAL_PR_FMT, LOCAL_PR_ARG
              )
      
      Which use to produce:
      
       print fmt: LOCAL_PR_FMT, REC->wiphy_name
      
      Now produces:
      
       print fmt: "%s", REC->wiphy_name
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      LKML-Reference: <20091113224009.GB23942@elte.hu>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      811cb50b
  17. 08 11月, 2009 1 次提交
    • F
      tracing, perf_events: Protect the buffer from recursion in perf · 444a2a3b
      Frederic Weisbecker 提交于
      While tracing using events with perf, if one enables the
      lockdep:lock_acquire event, it will infect every other perf
      trace events.
      
      Basically, you can enable whatever set of trace events through
      perf but if this event is part of the set, the only result we
      can get is a long list of lock_acquire events of rcu read lock,
      and only that.
      
      This is because of a recursion inside perf.
      
      1) When a trace event is triggered, it will fill a per cpu
         buffer and submit it to perf.
      
      2) Perf will commit this event but will also protect some data
         using rcu_read_lock
      
      3) A recursion appears: rcu_read_lock triggers a lock_acquire
         event that will fill the per cpu event and then submit the
         buffer to perf.
      
      4) Perf detects a recursion and ignores it
      
      5) Perf continues its work on the previous event, but its buffer
         has been overwritten by the lock_acquire event, it has then
         been turned into a lock_acquire event of rcu read lock
      
      Such scenario also happens with lock_release with
      rcu_read_unlock().
      
      We could turn the rcu_read_lock() into __rcu_read_lock() to drop
      the lock debugging from perf fast path, but that would make us
      lose the rcu debugging and that doesn't prevent from other
      possible kind of recursion from perf in the future.
      
      This patch adds a recursion protection based on a counter on the
      perf trace per cpu buffers to solve the problem.
      
      -v2: Fixed lost whitespace, added reviewed-by tag
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1257477185-7838-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      444a2a3b
  18. 06 10月, 2009 1 次提交
  19. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  20. 18 9月, 2009 2 次提交
    • F
      tracing: Allocate the ftrace event profile buffer dynamically · 20ab4425
      Frederic Weisbecker 提交于
      Currently the trace event profile buffer is allocated in the stack. But
      this may be too much for the stack, as the events can have large
      statically defined field size and can also grow with dynamic arrays.
      
      Allocate two per cpu buffer for all profiled events. The first cpu
      buffer is used to host every non-nmi context traces. It is protected
      by disabling the interrupts while writing and committing the trace.
      
      The second buffer is reserved for nmi. So that there is no race between
      them and the first buffer.
      
      The whole write/commit section is rcu protected because we release
      these buffers while deactivating the last profiling trace event.
      
      v2: Move the buffers from trace_event to be global, as pointed by
          Steven Rostedt.
      
      v3: Fix the syscall events to handle the profiling buffer races
          by disabling interrupts, now that the buffers are globals.
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      20ab4425
    • F
      tracing: Factorize the events profile accounting · e5e25cf4
      Frederic Weisbecker 提交于
      Factorize the events enabling accounting in a common tracing core
      helper. This reduces the size of the profile_enable() and
      profile_disable() callbacks for each trace events.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      e5e25cf4
  21. 14 9月, 2009 2 次提交
  22. 05 9月, 2009 1 次提交
    • S
      tracing: pass around ring buffer instead of tracer · e77405ad
      Steven Rostedt 提交于
      The latency tracers (irqsoff and wakeup) can swap trace buffers
      on the fly. If an event is happening and has reserved data on one of
      the buffers, and the latency tracer swaps the global buffer with the
      max buffer, the result is that the event may commit the data to the
      wrong buffer.
      
      This patch changes the API to the trace recording to be recieve the
      buffer that was used to reserve a commit. Then this buffer can be passed
      in to the commit.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e77405ad
  23. 31 8月, 2009 1 次提交
    • L
      tracing/filters: Defer pred allocation · 8e254c1d
      Li Zefan 提交于
      init_preds() allocates about 5392 bytes of memory (on x86_32) for
      a TRACE_EVENT. With my config, at system boot total memory occupied
      is:
      
      	5392 * (642 + 15) == 3459KB
      
      642 == cat available_events | wc -l
      15 == number of dirs in events/ftrace
      
      That's quite a lot, so we'd better defer memory allocation util
      it's needed, that's when filter is used.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      LKML-Reference: <4A9B8EA5.6020700@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8e254c1d
  24. 28 8月, 2009 1 次提交
    • F
      tracing: Fix double CPP substitution in TRACE_EVENT_FN · 0dd7b747
      Frederic Weisbecker 提交于
      TRACE_EVENT_FN relays on TRACE_EVENT by reprocessing its parameters
      into the ftrace events CPP macro. This leads to a double substitution
      in some cases.
      
      For example, a bad consequence is a format always prefixed by
      "%s, %s\n" for every TRACE_EVENT_FN based events.
      
      Eg:
      	cat /debug/tracing/events/syscalls/sys_enter/format
      	[...]
      	print fmt: "%s, %s\n", "\"NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)\"",\
      	"REC->id, REC->args[0], REC->args[1], REC->args[2], REC->args[3],\
      	REC->args[4], REC->args[5]"
      
      This creates a failure in post-processing tools such as perf trace or
      trace-cmd.
      
      Then drop this double substitution and replace it by a new __cpparg()
      macro that relays CPP arguments containing commas.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Josh Stone <jistone@redhat.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1251413406-6704-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0dd7b747
  25. 27 8月, 2009 1 次提交
    • M
      tracing: Ftrace dynamic ftrace_event_call support · bd1a5c84
      Masami Hiramatsu 提交于
      Add dynamic ftrace_event_call support to ftrace. Trace engines can add
      new ftrace_event_call to ftrace on the fly. Each operator function of
      the call takes an ftrace_event_call data structure as an argument,
      because these functions may be shared among several ftrace_event_calls.
      
      Changes from v13:
       - Define remove_subsystem_dir() always (revirt a2ca5e03), because
         trace_remove_event_call() uses it.
       - Modify syscall tracer because of ftrace_event_call change.
      
      [fweisbec@gmail.com: Fixed conflict against latest tracing/core]
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Przemysław Pawełczyk <przemyslaw@pawelczyk.it>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      LKML-Reference: <20090813203453.31965.71901.stgit@localhost.localdomain>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      bd1a5c84
  26. 26 8月, 2009 2 次提交
    • L
      tracing/filters: Add __field_ext() to TRACE_EVENT · 43b51ead
      Li Zefan 提交于
      Add __field_ext(), so a field can be assigned to a specific
      filter_type, which matches a corresponding filter function.
      
      For example, a later patch will allow this:
      	__field_ext(const char *, str, FILTER_PTR_STR);
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A7B9272.60507095@cn.fujitsu.com>
      
      [
        Fixed a -1 to FILTER_OTHER
        Forward ported to latest kernel.
      ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      43b51ead
    • J
      tracing: Move tracepoint callbacks from declaration to definition · 97419875
      Josh Stone 提交于
      It's not strictly correct for the tracepoint reg/unreg callbacks to
      occur when a client is hooking up, because the actual tracepoint may not
      be present yet.  This happens to be fine for syscall, since that's in
      the core kernel, but it would cause problems for tracepoints defined in
      a module that hasn't been loaded yet.  It also means the reg/unreg has
      to be EXPORTed for any modules to use the tracepoint (as in SystemTap).
      
      This patch removes DECLARE_TRACE_WITH_CALLBACK, and instead introduces
      DEFINE_TRACE_FN which stores the callbacks in struct tracepoint.  The
      callbacks are used now when the active state of the tracepoint changes
      in set_tracepoint & disable_tracepoint.
      
      This also introduces TRACE_EVENT_FN, so ftrace events can also provide
      registration callbacks if needed.
      Signed-off-by: NJosh Stone <jistone@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      LKML-Reference: <1251150194-1713-4-git-send-email-jistone@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      97419875
  27. 19 8月, 2009 2 次提交
  28. 12 8月, 2009 2 次提交
    • F
      tracing: Add ftrace event call parameter to its field descriptor handler · e8f9f4d7
      Frederic Weisbecker 提交于
      Add the struct ftrace_event_call as a parameter of its show_format()
      callback. This way we can use it from the syscall trace events to
      retrieve the syscall name from the ftrace event call parameter and
      describe its fields using the syscalls metadata.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      e8f9f4d7
    • J
      tracing: Add ftrace_event_call void * 'data' field · 69fd4f0e
      Jason Baron 提交于
      add an optional void * pointer to 'ftrace_event_call' that is
      passed in for regfunc and unregfunc.
      
      This prepares for syscall tracepoints creation by passing the name of
      the syscall we want to trace and then retrieve its number through our
      arch syscall table.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      69fd4f0e
  29. 10 8月, 2009 1 次提交
    • F
      perf_counter: Zero dead bytes from ftrace raw samples size alignment · 1853db0e
      Frederic Weisbecker 提交于
      After aligning the ftrace raw samples, there are dead bytes storing
      random data from the stack. We don't want to leak these to userspace,
      then zero these out.
      
      Before:
      
      	0x2de88 [0x50]: event: 9
      	.
      	. ... raw event: size 80 bytes
      	.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
      	.  0010:  68 01 00 00 68 01 00 00 2c 00 00 00 00 00 00 00  h...h...,......
      	.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
      	.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
      	.  0040:  68 01 00 00 40 7f 46 81 ff ff ff ff 00 10 1b 7f  h...@.F........
                                                            ^  ^  ^  ^
                                                               Leak
      
      After:
      
      	0x2d318 [0x50]: event: 9
      	.
      	. ... raw event: size 80 bytes
      	.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
      	.  0010:  68 01 00 00 68 01 00 00 68 14 00 00 00 00 00 00  h...h...h......
      	.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
      	.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
      	.  0040:  68 01 00 00 a0 80 46 81 ff ff ff ff 00 00 00 00  h.....F........
                                                            ^  ^  ^  ^
      							 Fixed
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1249915116-5210-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      1853db0e