1. 13 December 2009, 1 commit
  2. 26 November 2009, 1 commit
    • events: Rename TRACE_EVENT_TEMPLATE() to DECLARE_EVENT_CLASS() · 091ad365
      Committed by Ingo Molnar
      It is not quite obvious at first sight what TRACE_EVENT_TEMPLATE
      does: does it define an event in addition to defining a template?
      
      To clarify this, rename it to DECLARE_EVENT_CLASS, which follows
      the various 'DECLARE_*()' idioms we already have in the kernel:
      
        DECLARE_EVENT_CLASS(class)
      
          DEFINE_EVENT(class, event1)
          DEFINE_EVENT(class, event2)
          DEFINE_EVENT(class, event3)
      
      To complete this logic we should also rename TRACE_EVENT() to:
      
        DEFINE_SINGLE_EVENT(single_event)
      
      ... but in a more quiet moment of the kernel cycle.
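      
      Spelled out with concrete names, the class/event split above would
      look roughly like this (a minimal sketch; the class, event and field
      names are made up for illustration):
      
        DECLARE_EVENT_CLASS(foo_class,
            TP_PROTO(int id, unsigned long val),
            TP_ARGS(id, val),
            TP_STRUCT__entry(
                __field(int, id)
                __field(unsigned long, val)
            ),
            TP_fast_assign(
                __entry->id = id;
                __entry->val = val;
            ),
            TP_printk("id=%d val=%lu", __entry->id, __entry->val)
        );
      
        /* Each event reuses the class; only the event name differs. */
        DEFINE_EVENT(foo_class, foo_event1,
            TP_PROTO(int id, unsigned long val),
            TP_ARGS(id, val));
      
        DEFINE_EVENT(foo_class, foo_event2,
            TP_PROTO(int id, unsigned long val),
            TP_ARGS(id, val));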
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4B0E286A.2000405@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      091ad365
  3. 25 November 2009, 2 commits
    • tracing: Create new DEFINE_EVENT_PRINT · e5bc9721
      Committed by Steven Rostedt
      After creating the TRACE_EVENT_TEMPLATE I started to look at other
      trace points to see what duplication existed. I noticed that there
      are several trace points that are almost identical except for the
      name and the output format. Since TRACE_EVENT_TEMPLATE was successful
      in bringing down the size of trace events, I added DEFINE_EVENT_PRINT.
      
      DEFINE_EVENT_PRINT is used just like DEFINE_EVENT. That is,
      DEFINE_EVENT_PRINT also uses a TRACE_EVENT_TEMPLATE, but it allows the
      developer to override the print format. If there are two or more
      TRACE_EVENTs that are identical except for the name and print format,
      then they can be converted to use a TRACE_EVENT_TEMPLATE. Since the
      TRACE_EVENT_TEMPLATE already does the print output, the first trace event
      would have its print format held in the TRACE_EVENT_TEMPLATE and
      be defined with a DEFINE_EVENT. The rest use DEFINE_EVENT_PRINT
      and override the print format.
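      
      For reference, a minimal sketch of that pairing (assuming a template
      named foo_template has already been declared with
      TP_PROTO(int id, unsigned long val); all names are illustrative):
      
        /* Uses the template's print format as-is: */
        DEFINE_EVENT(foo_template, foo_first,
            TP_PROTO(int id, unsigned long val),
            TP_ARGS(id, val));
      
        /* Same class of event, but overrides the print format: */
        DEFINE_EVENT_PRINT(foo_template, foo_second,
            TP_PROTO(int id, unsigned long val),
            TP_ARGS(id, val),
            TP_printk("second: id=%d val=0x%lx", __entry->id, __entry->val));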
      
      I converted the sched trace points to both DEFINE_EVENT and
      DEFINE_EVENT_PRINT: five were converted to DEFINE_EVENT and two
      were converted to DEFINE_EVENT_PRINT.
      
      I was able to get the following:
      
      $ size kernel/sched.o-*
         text	   data	    bss	    dec	    hex	filename
        79299	   6776	   2520	  88595	  15a13	kernel/sched.o-notrace
       101941	  11896	   2584	 116421	  1c6c5	kernel/sched.o-templ
       104779	  11896	   2584	 119259	  1d1db	kernel/sched.o-trace
      
      sched.o-notrace is the scheduler compiled with no trace points.
      sched.o-templ is built using DEFINE_EVENT and DEFINE_EVENT_PRINT.
      sched.o-trace is built with the current trace events.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e5bc9721
    • tracing: Create new TRACE_EVENT_TEMPLATE · ff038f5c
      Committed by Steven Rostedt
      There are some places in the kernel that define several tracepoints and
      they are all identical besides the name. The code to enable, disable and
      record is created for every trace point even if most of the code is
      identical.
      
      This patch adds TRACE_EVENT_TEMPLATE, which lets the developer create
      a template TRACE_EVENT and then create trace points with DEFINE_EVENT,
      based on the given template. Each trace point defined this way
      shares most of the code, bringing down the size of the kernel
      when there are several duplicate events.
      
      Usage is:
      
      TRACE_EVENT_TEMPLATE(name, proto, args, tstruct, assign, print);
      
      Which would be the same as defining a normal TRACE_EVENT.
      
      To create the trace events that the trace points will use:
      
      DEFINE_EVENT(template, name, proto, args) is done. The template
      is the name of the TRACE_EVENT_TEMPLATE to use. The name is the
      name of the trace point. The parameters proto and args must be the
      same as the proto and args of the template; if they are not, a
      compile error will result. I tried hard to remove this duplication,
      but the C preprocessor is not powerful enough (or my CPP magic
      experience points are not at a high enough level) to avoid needing them.
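      
      To illustrate that constraint (a sketch with made-up names; only the
      proto/args mismatch is the point):
      
        TRACE_EVENT_TEMPLATE(bar_template,
            TP_PROTO(int id),
            TP_ARGS(id),
            TP_STRUCT__entry(__field(int, id)),
            TP_fast_assign(__entry->id = id;),
            TP_printk("id=%d", __entry->id));
      
        /* OK: proto and args repeat the template's exactly. */
        DEFINE_EVENT(bar_template, bar_one,
            TP_PROTO(int id),
            TP_ARGS(id));
      
        /* Would not build: the proto differs from the template's.
         * DEFINE_EVENT(bar_template, bar_two,
         *     TP_PROTO(long id),
         *     TP_ARGS(id));
         */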
      
      A lot of trace events are coming in with new XFS development. Most of
      the trace points are identical except for the name. The following shows
      the advantage of having TRACE_EVENT_TEMPLATE:
      
      $ size fs/xfs/xfs.o.*
          text          data     bss     dec     hex filename
        452114          2788    3520  458422   6feb6 fs/xfs/xfs.o.old
        638482         38116    3744  680342   a6196 fs/xfs/xfs.o.template
        996954         38116    4480 1039550   fdcbe fs/xfs/xfs.o.trace
      
      xfs.o.old is without any tracepoints.
      xfs.o.template uses the new TRACE_EVENT_TEMPLATE.
      xfs.o.trace uses the current TRACE_EVENT macros.
      Requested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      ff038f5c
  4. 23 November 2009, 1 commit
  5. 22 November 2009, 1 commit
    • tracing: Use the perf recursion protection from trace event · ce71b9df
      Committed by Frederic Weisbecker
      When we commit a trace to perf, we first check whether we are
      recursing in the same buffer, so that we don't mess up the buffer
      with a recursing trace. But later on, we do the same check from
      perf to avoid commit recursion. The recursion check is desired
      early, before we touch the buffer, but we want to do this check
      only once.
      
      So export the recursion protection from perf and use it from
      the trace events before submitting a trace.
      
      v2: Put appropriate Reported-by tag
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1258864015-10579-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ce71b9df
  6. 14 November 2009, 1 commit
    • tracing: Fix event format export · 811cb50b
      Committed by Johannes Berg
      For some reason the export of the event print format to userspace
      uses '#fmt', which breaks if the format string is anything but a
      plain string; for example, if it is built with macros then the
      macro names are exported instead of their contents.
      
      Use
      	"\"%s\"", fmt
      instead of
      	"%s", #fmt
      to export the string and not the way it is built.
      
      For example, in net/mac80211/driver-trace.h for the trace event drv_start
      there is:
      
              TP_printk(
                      LOCAL_PR_FMT, LOCAL_PR_ARG
              )
      
      Which used to produce:
      
       print fmt: LOCAL_PR_FMT, REC->wiphy_name
      
      Now produces:
      
       print fmt: "%s", REC->wiphy_name
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      LKML-Reference: <20091113224009.GB23942@elte.hu>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      811cb50b
  7. 08 November 2009, 1 commit
    • tracing, perf_events: Protect the buffer from recursion in perf · 444a2a3b
      Committed by Frederic Weisbecker
      While tracing using events with perf, if one enables the
      lockdep:lock_acquire event, it will infect every other perf
      trace event.
      
      Basically, you can enable whatever set of trace events through
      perf, but if this event is part of the set, the only result we
      can get is a long list of lock_acquire events for the rcu read
      lock, and only that.
      
      This is because of a recursion inside perf.
      
      1) When a trace event is triggered, it will fill a per cpu
         buffer and submit it to perf.
      
      2) Perf will commit this event but will also protect some data
         using rcu_read_lock
      
      3) A recursion appears: rcu_read_lock triggers a lock_acquire
         event that will fill the per cpu event and then submit the
         buffer to perf.
      
      4) Perf detects a recursion and ignores it
      
      5) Perf continues its work on the previous event, but its buffer
         has been overwritten by the lock_acquire event; it has thus
         been turned into a lock_acquire event of the rcu read lock.
      
      Such scenario also happens with lock_release with
      rcu_read_unlock().
      
      We could turn the rcu_read_lock() into __rcu_read_lock() to drop
      the lock debugging from the perf fast path, but that would make us
      lose the rcu debugging, and it doesn't prevent other possible
      kinds of recursion from perf in the future.
      
      This patch adds a recursion protection based on a counter on the
      perf trace per cpu buffers to solve the problem.
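      
      The idea, as a minimal sketch (illustrative only; these are not the
      patch's actual symbol names):
      
        #include <linux/percpu.h>
      
        /* One recursion counter per CPU guarding the perf trace buffer. */
        static DEFINE_PER_CPU(int, perf_trace_recursion);
      
        /* Caller has already disabled preemption/interrupts. */
        static int perf_trace_recursion_enter(void)
        {
            if (__get_cpu_var(perf_trace_recursion)) {
                /* Already committing a trace on this CPU: drop the event. */
                return -1;
            }
            __get_cpu_var(perf_trace_recursion)++;
            return 0;
        }
      
        static void perf_trace_recursion_exit(void)
        {
            __get_cpu_var(perf_trace_recursion)--;
        }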
      
      -v2: Fixed lost whitespace, added reviewed-by tag
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1257477185-7838-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      444a2a3b
  8. 06 October 2009, 1 commit
  9. 21 September 2009, 1 commit
    • perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Committed by Ingo Molnar
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out of
      its initial role of counting hardware events, and has become (and
      is becoming) a much broader generic event enumeration, reporting,
      logging, monitoring and analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in all, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names (in an ABI-compatible fashion).
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cdd6c482
  10. 18 September 2009, 2 commits
    • tracing: Allocate the ftrace event profile buffer dynamically · 20ab4425
      Committed by Frederic Weisbecker
      Currently the trace event profile buffer is allocated on the stack.
      But this may be too much for the stack, as the events can have
      large statically defined field sizes and can also grow with dynamic
      arrays.
      
      Allocate two per-cpu buffers for all profiled events. The first
      buffer is used to host traces from every non-NMI context. It is
      protected by disabling interrupts while writing and committing the
      trace.
      
      The second buffer is reserved for NMI context, so that there is no
      race between it and the first buffer.
      
      The whole write/commit section is RCU-protected because we release
      these buffers while deactivating the last profiled trace event.
      
      v2: Move the buffers from trace_event to be global, as pointed by
          Steven Rostedt.
      
      v3: Fix the syscall events to handle the profiling buffer races
          by disabling interrupts, now that the buffers are globals.
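      
      A rough sketch of the buffer scheme described above (illustrative
      names and sizes; not the patch's exact code):
      
        #include <linux/percpu.h>
        #include <linux/errno.h>
      
        #define PROFILE_BUF_SIZE 8192                /* illustrative */
        typedef char profile_buf_t[PROFILE_BUF_SIZE];
      
        /* One buffer for every non-NMI context (writers disable irqs), */
        static profile_buf_t *trace_profile_buf;
        /* and a separate one for NMI context, so the two never race.   */
        static profile_buf_t *trace_profile_buf_nmi;
      
        /* Called when the first profiled event is enabled; the buffers
         * are released again (under RCU) when the last one is disabled. */
        static int trace_profile_alloc(void)
        {
            trace_profile_buf = (profile_buf_t *)alloc_percpu(profile_buf_t);
            if (!trace_profile_buf)
                return -ENOMEM;
      
            trace_profile_buf_nmi = (profile_buf_t *)alloc_percpu(profile_buf_t);
            if (!trace_profile_buf_nmi) {
                free_percpu(trace_profile_buf);
                trace_profile_buf = NULL;
                return -ENOMEM;
            }
            return 0;
        }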
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      20ab4425
    • tracing: Factorize the events profile accounting · e5e25cf4
      Committed by Frederic Weisbecker
      Factorize the event-enabling accounting into a common tracing core
      helper. This reduces the size of the profile_enable() and
      profile_disable() callbacks for each trace event.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      e5e25cf4
  11. 14 September 2009, 2 commits
  12. 05 September 2009, 1 commit
    • tracing: pass around ring buffer instead of tracer · e77405ad
      Committed by Steven Rostedt
      The latency tracers (irqsoff and wakeup) can swap trace buffers
      on the fly. If an event is happening and has reserved data on one of
      the buffers, and the latency tracer swaps the global buffer with the
      max buffer, the result is that the event may commit the data to the
      wrong buffer.
      
      This patch changes the trace recording API to receive the buffer
      that was used to reserve the commit. That buffer can then be
      passed in to the commit.
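      
      The resulting calling pattern, as a rough sketch (the reserve/commit
      helper names and the entry type are made up; only the pairing of the
      returned buffer with the commit is the point):
      
        static void example_record(void)
        {
            struct ring_buffer *buffer;
            struct ring_buffer_event *event;
            struct example_entry *entry;
      
            /* The reserve step now also reports which buffer was used... */
            event = example_trace_reserve(&buffer, sizeof(*entry));
            if (!event)
                return;
      
            entry = ring_buffer_event_data(event);
            entry->value = 1;
      
            /* ...and exactly that buffer is handed back at commit time. */
            example_trace_commit(buffer, event);
        }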
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e77405ad
  13. 31 August 2009, 1 commit
    • tracing/filters: Defer pred allocation · 8e254c1d
      Committed by Li Zefan
      init_preds() allocates about 5392 bytes of memory (on x86_32) for
      a TRACE_EVENT. With my config, at system boot total memory occupied
      is:
      
      	5392 * (642 + 15) == 3459KB
      
      642 == cat available_events | wc -l
      15 == number of dirs in events/ftrace
      
      That's quite a lot, so we'd better defer the memory allocation
      until it's needed, that is, when a filter is used.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      LKML-Reference: <4A9B8EA5.6020700@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8e254c1d
  14. 28 August 2009, 1 commit
    • tracing: Fix double CPP substitution in TRACE_EVENT_FN · 0dd7b747
      Committed by Frederic Weisbecker
      TRACE_EVENT_FN relies on TRACE_EVENT by reprocessing its parameters
      into the ftrace events CPP macro. This leads to a double substitution
      in some cases.
      
      For example, one bad consequence is a format always prefixed by
      "%s, %s\n" for every TRACE_EVENT_FN based event.
      
      Eg:
      	cat /debug/tracing/events/syscalls/sys_enter/format
      	[...]
      	print fmt: "%s, %s\n", "\"NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)\"",\
      	"REC->id, REC->args[0], REC->args[1], REC->args[2], REC->args[3],\
      	REC->args[4], REC->args[5]"
      
      This creates a failure in post-processing tools such as perf trace or
      trace-cmd.
      
      So drop this double substitution and replace it with a new
      __cpparg() macro that relays CPP arguments containing commas.
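      
      For illustration, the trick is that a variadic helper can relay an
      argument list so its embedded commas survive another level of macro
      expansion (a standalone sketch of the idea; the patch's actual
      definition and call sites may differ):
      
        #include <stdio.h>
      
        /* Relay a CPP argument list, preserving any commas it contains. */
        #define __cpparg(arg...) arg
      
        /* Without the wrapper, the commas would split the list into
         * several macro parameters; wrapped, it is passed as one unit. */
        #define CALL_PRINTF(args) printf(args)
      
        int main(void)
        {
            CALL_PRINTF(__cpparg("%d + %d = %d\n", 1, 2, 1 + 2));
            return 0;
        }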
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Josh Stone <jistone@redhat.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1251413406-6704-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0dd7b747
  15. 27 August 2009, 1 commit
    • tracing: Ftrace dynamic ftrace_event_call support · bd1a5c84
      Committed by Masami Hiramatsu
      Add dynamic ftrace_event_call support to ftrace. Trace engines can add
      new ftrace_event_call to ftrace on the fly. Each operator function of
      the call takes an ftrace_event_call data structure as an argument,
      because these functions may be shared among several ftrace_event_calls.
      
      Changes from v13:
       - Define remove_subsystem_dir() always (revert a2ca5e03), because
         trace_remove_event_call() uses it.
       - Modify the syscall tracer because of the ftrace_event_call change.
      
      [fweisbec@gmail.com: Fixed conflict against latest tracing/core]
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Przemysław Pawełczyk <przemyslaw@pawelczyk.it>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      LKML-Reference: <20090813203453.31965.71901.stgit@localhost.localdomain>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      bd1a5c84
  16. 26 August 2009, 2 commits
    • tracing/filters: Add __field_ext() to TRACE_EVENT · 43b51ead
      Committed by Li Zefan
      Add __field_ext(), so a field can be assigned to a specific
      filter_type, which matches a corresponding filter function.
      
      For example, a later patch will allow this:
      	__field_ext(const char *, str, FILTER_PTR_STR);
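      
      Conceptually, the existing __field() then becomes shorthand for
      __field_ext() with the default filter type (a sketch of the
      relationship, not necessarily the exact header text):
      
        #define __field(type, item)    __field_ext(type, item, FILTER_OTHER)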
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A7B9272.60507095@cn.fujitsu.com>
      
      [
        Fixed a -1 to FILTER_OTHER
        Forward ported to latest kernel.
      ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      43b51ead
    • tracing: Move tracepoint callbacks from declaration to definition · 97419875
      Committed by Josh Stone
      It's not strictly correct for the tracepoint reg/unreg callbacks to
      occur when a client is hooking up, because the actual tracepoint may not
      be present yet.  This happens to be fine for syscall, since that's in
      the core kernel, but it would cause problems for tracepoints defined in
      a module that hasn't been loaded yet.  It also means the reg/unreg has
      to be EXPORTed for any modules to use the tracepoint (as in SystemTap).
      
      This patch removes DECLARE_TRACE_WITH_CALLBACK, and instead introduces
      DEFINE_TRACE_FN which stores the callbacks in struct tracepoint.  The
      callbacks are used now when the active state of the tracepoint changes
      in set_tracepoint & disable_tracepoint.
      
      This also introduces TRACE_EVENT_FN, so ftrace events can also provide
      registration callbacks if needed.
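      
      For reference, a sketch of the TRACE_EVENT_FN shape (the event name,
      field and reg/unreg helper prototypes below are assumptions for
      illustration only):
      
        int my_subsys_reg(void);
        void my_subsys_unreg(void);
      
        TRACE_EVENT_FN(my_subsys_event,
            TP_PROTO(int id),
            TP_ARGS(id),
            TP_STRUCT__entry(
                __field(int, id)
            ),
            TP_fast_assign(
                __entry->id = id;
            ),
            TP_printk("id=%d", __entry->id),
            my_subsys_reg, my_subsys_unreg
        );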
      Signed-off-by: Josh Stone <jistone@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      LKML-Reference: <1251150194-1713-4-git-send-email-jistone@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      97419875
  17. 19 August 2009, 2 commits
  18. 12 August 2009, 2 commits
    • tracing: Add ftrace event call parameter to its field descriptor handler · e8f9f4d7
      Committed by Frederic Weisbecker
      Add the struct ftrace_event_call as a parameter of its show_format()
      callback. This way we can use it from the syscall trace events to
      retrieve the syscall name from the ftrace event call parameter and
      describe its fields using the syscalls metadata.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      e8f9f4d7
    • tracing: Add ftrace_event_call void * 'data' field · 69fd4f0e
      Committed by Jason Baron
      add an optional void * pointer to 'ftrace_event_call' that is
      passed in for regfunc and unregfunc.
      
      This prepares for syscall tracepoints creation by passing the name of
      the syscall we want to trace and then retrieve its number through our
      arch syscall table.
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      69fd4f0e
  19. 10 August 2009, 3 commits
    • perf_counter: Zero dead bytes from ftrace raw samples size alignment · 1853db0e
      Committed by Frederic Weisbecker
      After aligning the ftrace raw samples, there are dead bytes storing
      random data from the stack. We don't want to leak these to
      userspace, so zero them out.
      
      Before:
      
      	0x2de88 [0x50]: event: 9
      	.
      	. ... raw event: size 80 bytes
      	.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
      	.  0010:  68 01 00 00 68 01 00 00 2c 00 00 00 00 00 00 00  h...h...,......
      	.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
      	.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
      	.  0040:  68 01 00 00 40 7f 46 81 ff ff ff ff 00 10 1b 7f  h...@.F........
                                                            ^  ^  ^  ^
                                                               Leak
      
      After:
      
      	0x2d318 [0x50]: event: 9
      	.
      	. ... raw event: size 80 bytes
      	.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
      	.  0010:  68 01 00 00 68 01 00 00 68 14 00 00 00 00 00 00  h...h...h......
      	.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
      	.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
      	.  0040:  68 01 00 00 a0 80 46 81 ff ff ff ff 00 00 00 00  h.....F........
                                                            ^  ^  ^  ^
      							 Fixed
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1249915116-5210-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1853db0e
    • perf_counter: Subtract the buffer size field from the event record size · 304703ab
      Committed by Frederic Weisbecker
      We compute the perf raw sample size by aligning the raw ftrace
      event size plus the buffer size field itself. We do that instead
      of aligning only the perf raw sample size, so that we can save
      some space in some cases.
      
      But this buffer size field is not stored in the perf raw sample,
      so we must subtract its size from the buffer once we have computed
      the alignment; otherwise we may get a useless u32 field in the
      buffer.
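      
      In other words, the computation looks roughly like this (a sketch:
      ENTRY_SIZE stands for the raw ftrace event size, and the alignment
      is to a u64 boundary):
      
        int size;
      
        /* The u32 size field takes part in the alignment computation,
         * but is then subtracted since it is not stored in the sample. */
        size = ALIGN(sizeof(u32) + ENTRY_SIZE, sizeof(u64)) - sizeof(u32);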
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090810141129.GA5124@nowhere>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      304703ab
    • perf_counter: Correct PERF_SAMPLE_RAW output · a044560c
      Committed by Peter Zijlstra
      PERF_SAMPLE_* output switches should unconditionally output the
      correct format, as they are the only way to unambiguously parse
      the PERF_EVENT_SAMPLE data.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249896447.17467.74.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a044560c
  20. 09 August 2009, 2 commits
    • perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Committed by Frederic Weisbecker
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests ftrace events record sampling. In this case
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint binary record to the
      perfcounter event buffer, as a sample.
      
      Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy from the profiling callback brings
         some cost even when no such sampling is wanted; this needs
         to be fixed in the future. For that we need instant access
         to the perf counter attribute. This is a matter of a flag
         to add in the struct ftrace_event.
      
       - Take care of event recursion! Don't ever try to record
         a lock event, for example: some locking is used in the
         profiling fast path and leads to tracing recursion. That
         will be fixed using raw spinlocks or recursion protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f413cdb8
    • perf_counter, ftrace: Fix perf_counter integration · 3a659305
      Committed by Peter Zijlstra
      Adds a possible second part to the assign argument of TRACE_EVENT().
      
        TP_perf_assign(
      	__perf_count(foo);
      	__perf_addr(bar);
        )
      
      Which, when specified, make the swcounter increment with @foo
      instead of the usual 1, and report @bar for PERF_SAMPLE_ADDR (the
      data address associated with the event) when this triggers a
      counter overflow.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3a659305
  21. 20 July 2009, 2 commits
  22. 11 June 2009, 1 commit
    • tracing: do not translate event helper macros in print format · 6ff9a64d
      Committed by Steven Rostedt
      By moving the macro that creates the print format code above the
      defining of the event macro helpers (__get_str, __print_symbolic,
      and __get_dynamic_array), we get a little cleaner print format.
      
      Instead of:
      
        (char *)((void *)REC + REC->__data_loc_name)
      
      we get:
      
         __get_str(name)
      
      Instead of:
      
         ({ static const struct trace_print_flags symbols[] = { { HI_SOFTIRQ, "HI" }, {
      
      we get:
      
         __print_symbolic(REC->vec, { HI_SOFTIRQ, "HI" }, {
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      6ff9a64d
  23. 03 June 2009, 1 commit
    • tracing: fix multiple use of __print_flags and __print_symbolic · 56d8bd3f
      Committed by Steven Whitehouse
      Here is an updated patch to include the extra call to
      trace_seq_init() as requested. This is against the latest
      -tip tree and fixes the use of multiple __print_flags
      and __print_symbolic in a single tracer. Also tested
      to ensure it's working now:
      
      mount.gfs2-2534  [000]   235.850587: gfs2_glock_queue: 8.7 glock 1:2 dequeue PR
      mount.gfs2-2534  [000]   235.850591: gfs2_demote_rq: 8.7 glock 1:0 demote EX to NL flags:DI
      mount.gfs2-2534  [000]   235.850591: gfs2_glock_queue: 8.7 glock 1:0 dequeue EX
      glock_workqueue-2529  [000]   235.850666: gfs2_glock_state_change: 8.7 glock 1:0 state EX => NL tgt:NL dmt:NL flags:lDpI
      glock_workqueue-2529  [000]   235.850672: gfs2_glock_put: 8.7 glock 1:0 state NL => IV flags:I
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      LKML-Reference: <1244037123.29604.603.camel@localhost.localdomain>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      56d8bd3f
  24. 02 June 2009, 3 commits
    • tracing/events: introduce __dynamic_array() · 7fcb7c47
      Committed by Li Zefan
      __string() is limited:
      
        - it's a char array, but we may want to define arrays of other types
        - a source string should be available, but we may only know the string size
      
      We introduce __dynamic_array() to break those limitations, and __string()
      becomes a wrapper around it. As a side effect, __get_str() can now be used
      in TP_fast_assign, not only in TP_printk.
      
      Take XFS for example, we have the string length in the dirent, but the
      string itself is not NULL-terminated, so __dynamic_array() can be used:
      
      TRACE_EVENT(xfs_dir2,
      	TP_PROTO(struct xfs_da_args *args),
      	TP_ARGS(args),
      
      	TP_STRUCT__entry(
      		__field(int, namelen)
      		__dynamic_array(char, name, args->namelen + 1)
      		...
      	),
      
      	TP_fast_assign(
      		char *name = __get_str(name);
      
      		if (args->namelen)
      			memcpy(name, args->name, args->namelen);
      		name[args->namelen] = '\0';
      
      		__entry->namelen = args->namelen;
      	),
      
      	TP_printk("name %.*s namelen %d",
      		  __entry->namelen ? __get_str(name) : NULL
      		  __entry->namelen)
      );
      
      [ Impact: allow defining dynamic size arrays ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2384D2.3080403@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      7fcb7c47
    • tracing/events: put TP_fast_assign into braces · a9c1c3ab
      Committed by Li Zefan
      Currently TP_fast_assign has a limitation that we can't define local
      variables in it.
      
      Here's one use case when we introduce __dynamic_array():
      
      TP_fast_assign(
      	type *p = __get_dynamic_array(item);
      
      	foo(p);
      	bar(p);
      ),
      
      [ Impact: allow defining local variables in TP_fast_assign ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2384B1.90100@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      a9c1c3ab
    • tracing/events: fix a typo in __string() format output · 6e25db44
      Committed by Li Zefan
      "tsize" should be "\tsize". Also remove the space before "__str_loc".
      
      Before:
       # cat tracing/events/irq/irq_handler_entry/format
              ...
              field:int irq;  offset:12;      size:4;
              field: __str_loc name;  offset:16;tsize:2;
              ...
      
      After:
       # cat tracing/events/irq/irq_handler_entry/format
      	...
              field:int irq;  offset:12;      size:4;
              field:__str_loc name;   offset:16;      size:2;
      	...
      
      [ Impact: standardize __string field description in events format file ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      6e25db44
  25. 28 May 2009, 1 commit
  26. 27 May 2009, 2 commits
    • tracing: add __print_symbolic to trace events · 0f4fc29d
      Committed by Steven Rostedt
      This patch adds __print_symbolic, which is similar to __print_flags
      but works for an enumeration type instead. That is, there is only a
      one-to-one mapping between the values and the symbols. When a match
      is made, it is printed; otherwise the hex value is output.
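      
      A minimal sketch of the usage inside TP_printk() (the field name is
      illustrative; the softirq symbols mirror the example quoted elsewhere
      in this log):
      
        TP_printk("vec=%s",
            __print_symbolic(__entry->vec,
                { HI_SOFTIRQ,     "HI" },
                { TIMER_SOFTIRQ,  "TIMER" },
                { NET_TX_SOFTIRQ, "NET_TX" }))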
      
      [ Impact: add interface for showing symbol names in events ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      0f4fc29d
    • S
      tracing: add __print_flags for events · be74b73a
      Steven Rostedt 提交于
      Developers have been asking for the ability in the ftrace event tracer
      to display names of bits in a flags variable.
      
      Instead of printing out c2, it would be easier to read FOO|BAR|GOO,
      assuming that FOO is bit 1, BAR is bit 6 and GOO is bit 7.
      
      Some examples where this would be useful are the state flags in a
      context switch, kmalloc flags, and even permission flags when
      accessing files.
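      
      A minimal sketch of the usage, reusing the FOO/BAR/GOO bits from the
      example above (the field name and flag values are illustrative):
      
        TP_printk("flags=%s",
            __print_flags(__entry->flags, "|",
                { 0x02, "FOO" },    /* bit 1 */
                { 0x40, "BAR" },    /* bit 6 */
                { 0x80, "GOO" }))   /* bit 7 */
      
      With __entry->flags == 0xc2 this prints FOO|BAR|GOO instead of the
      raw hex value.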
      
      [
        v2 changes include:
      
        Frederic Weisbecker's idea of using a mask instead of bits,
        thus we can output GFP_KERNEL instead of GFP_WAIT|GFP_IO|GFP_FS.
      
        Li Zefan's idea of allowing the caller of __print_flags to add their
        own delimiter (or no delimiter) where we can get for file permissions
        rwx instead of r|w|x.
      ]
      
      [
        v3 changes:
      
         Christoph Hellwig's idea of using an array instead of va_args.
      ]
      
      [ Impact: better displaying of flags in trace output ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      be74b73a
  27. 26 May 2009, 1 commit