1. 06 5月, 2009 1 次提交
    • S
      ring-buffer: add benchmark and tester · 5092dbc9
      Steven Rostedt 提交于
      This patch adds code that can benchmark the ring buffer as well as
      test it. This code can be compiled into the kernel (not recommended)
      or as a module.
      
      A separate ring buffer is used to not interfer with other users, like
      ftrace. It creates a producer and a consumer (option to disable creation
      of the consumer) and will run for 10 seconds, then sleep for 10 seconds
      and then repeat.
      
      While running, the producer will write 10 byte loads into the ring
      buffer with just putting in the current CPU number. The reader will
      continually try to read the buffer. The reader will alternate from reading
      the buffer via event by event, or by full pages.
      
      The output is a pr_info, thus it will fill up the syslogs.
      
        Starting ring buffer hammer
        End ring buffer hammer
        Time:     9000349 (usecs)
        Overruns: 12578640
        Read:     5358440  (by events)
        Entries:  0
        Total:    17937080
        Missed:   0
        Hit:      17937080
        Entries per millisec: 1993
        501 ns per entry
        Sleeping for 10 secs
        Starting ring buffer hammer
        End ring buffer hammer
        Time:     9936350 (usecs)
        Overruns: 0
        Read:     28146644  (by pages)
        Entries:  74
        Total:    28146718
        Missed:   0
        Hit:      28146718
        Entries per millisec: 2832
        353 ns per entry
        Sleeping for 10 secs
      
      Time:      is the time the test ran
      Overruns:  the number of events that were overwritten and not read
      Read:      the number of events read (either by pages or events)
      Entries:   the number of entries left in the buffer
                       (the by pages will only read full pages)
      Total:     Entries + Read + Overruns
      Missed:    the number of entries that failed to write
      Hit:       the number of entries that were written
      
      The above example shows that it takes ~353 nanosecs per entry when
      there is a reader, reading by pages (and no overruns)
      
      The event by event reader slowed the producer down to 501 nanosecs.
      
      [ Impact: see how changes to the ring buffer affect stability and performance ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5092dbc9
  2. 15 4月, 2009 2 次提交
    • S
      tracing/events: move the ftrace event tracing code to core · f42c85e7
      Steven Rostedt 提交于
      This patch moves the ftrace creation into include/trace/ftrace.h and
      simplifies the work of developers in adding new tracepoints.
      Just the act of creating the trace points in include/trace and including
      define_trace.h will create the events in the debugfs/tracing/events
      directory.
      
      This patch removes the need of include/trace/trace_events.h
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f42c85e7
    • I
      tracing: make the trace clocks available generally · 56449f43
      Ingo Molnar 提交于
      Jeremy Fitzhardinge reported this build failure:
      
       LD	 .tmp_vmlinux1
       arch/x86/kernel/built-in.o: In function `ds_take_timestamp':
       git/linux/arch/x86/kernel/ds.c:1380: undefined reference to `trace_clock_global'
       git/linux/arch/x86/kernel/ds.c:1380: undefined reference to `trace_clock_global'
      
      Which is due to !CONFIG_TRACING && CONFIG_X86_DS=y.
      
      Expose the trace clock code to CONFIG_X86_DS as well.
      
      [ Unfortunately librarizing doesnt work well - ancient architectures
        with no raw_local_irq_save() primitive break the build. ]
      Reported-by: NJeremy Fitzhardinge <jeremy@goop.org>
      LKML-Reference: <49E4413F.7070700@goop.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      56449f43
  3. 14 4月, 2009 1 次提交
    • T
      tracing/infrastructure: separate event tracer from event support · 5f77a88b
      Tom Zanussi 提交于
      Add a new config option, CONFIG_EVENT_TRACING that gets selected
      when CONFIG_TRACING is selected and adds everything needed by the stuff
      in trace_export - basically all the event tracing support needed by e.g.
      bprint, minus the actual events, which are only included if
      CONFIG_EVENT_TRACER is selected.
      
      So CONFIG_EVENT_TRACER can be used to turn on or off the generated events
      (what I think of as the 'event tracer'), while CONFIG_EVENT_TRACING turns
      on or off the base event tracing support used by both the event tracer and
      the other things such as bprint that can't be configured out.
      Signed-off-by: NTom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: fweisbec@gmail.com
      LKML-Reference: <1239178441.10295.34.camel@tropicana>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5f77a88b
  4. 23 3月, 2009 1 次提交
    • T
      tracing: add per-event filtering · 7ce7e424
      Tom Zanussi 提交于
      This patch adds per-event filtering to the event tracing subsystem.
      
      It adds a 'filter' debugfs file to each event directory.  This file can
      be written to to set filters; reading from it will display the current
      set of filters set for that event.
      
      Basically, any field listed in the 'format' file for an event can be
      filtered on (including strings, but not yet other array types) using
      either matching ('==') or non-matching ('!=') 'predicates'.  A
      'predicate' can be either a single expression:
      
       # echo pid != 0 > filter
      
       # cat filter
       pid != 0
      
      or a compound expression of up to 8 sub-expressions combined using '&&'
      or '||':
      
       # echo comm == Xorg > filter
       # echo "&& sig != 29" > filter
      
       # cat filter
       comm == Xorg
       && sig != 29
      
      Only events having field values matching an expression will be available
      in the trace output; non-matching events are discarded.
      
      Note that a compound expression is built up by echoing each
      sub-expression separately - it's not the most efficient way to do
      things, but it keeps the parser simple and assumes that compound
      expressions will be relatively uncommon.  In any case, a subsequent
      patch introducing a way to set filters for entire subsystems should
      mitigate any need to do this for lots of events.
      
      Setting a filter without an '&&' or '||' clears the previous filter
      completely and sets the filter to the new expression:
      
       # cat filter
       comm == Xorg
       && sig != 29
      
       # echo comm != Xorg
      
       # cat filter
       comm != Xorg
      
      To clear a filter, echo 0 to the filter file:
      
       # echo 0 > filter
       # cat filter
       none
      
      The limit of 8 predicates for a compound expression is arbitrary - for
      efficiency, it's implemented as an array of pointers to predicates, and
      8 seemed more than enough for any filter...
      Signed-off-by: NTom Zanussi <tzanussi@gmail.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1237710665.7703.48.camel@charm-linux>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7ce7e424
  5. 20 3月, 2009 1 次提交
  6. 13 3月, 2009 1 次提交
  7. 07 3月, 2009 2 次提交
    • F
      tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() · 769b0441
      Frederic Weisbecker 提交于
      Impact: faster and lighter tracing
      
      Now that we have trace_bprintk() which is faster and consume lesser
      memory than trace_printk() and has the same purpose, we can now drop
      the old implementation in favour of the binary one from trace_bprintk(),
      which means we move all the implementation of trace_bprintk() to
      trace_printk(), so the Api doesn't change except that we must now use
      trace_seq_bprintk() to print the TRACE_PRINT entries.
      
      Some changes result of this:
      
      - Previously, trace_bprintk depended of a single tracer and couldn't
        work without. This tracer has been dropped and the whole implementation
        of trace_printk() (like the module formats management) is now integrated
        in the tracing core (comes with CONFIG_TRACING), though we keep the file
        trace_printk (previously trace_bprintk.c) where we can find the module
        management. Thus we don't overflow trace.c
      
      - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.
      
      - change a bit trace_printk/trace_vprintk macros to support non-builtin formats
        constants, and fix 'const' qualifiers warnings. But this is all transparent for
        developers.
      
      - etc...
      
      V2:
      
      - Rebase against last changes
      - Fix mispell on the changelog
      
      V3:
      
      - Rebase against last changes (moving trace_printk() to kernel.h)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      769b0441
    • L
      tracing: infrastructure for supporting binary record · 1427cdf0
      Lai Jiangshan 提交于
      Impact: save on memory for tracing
      
      Current tracers are typically using a struct(like struct ftrace_entry,
      struct ctx_switch_entry, struct special_entr etc...)to record a binary
      event. These structs can only record a their own kind of events.
      A new kind of tracer need a new struct and a lot of code too handle it.
      
      So we need a generic binary record for events. This infrastructure
      is for this purpose.
      
      [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
      tracing as reported by Steven Rostedt]
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1427cdf0
  8. 06 3月, 2009 1 次提交
    • S
      tracing: add format files for ftrace default entries · 770cb243
      Steven Rostedt 提交于
      Impact: allow user apps to read binary format of basic ftrace entries
      
      Currently, only defined raw events export their formats so a binary
      reader can parse them. There's no reason that the default ftrace entries
      can't export their formats.
      
      This patch adds a subsystem called "ftrace" in the events directory
      that includes the ftrace entries for basic ftrace recorded items.
      
      These only have three files in the events directory:
      
       type             : printf
       available_types  : printf
       format           : format for the event entry
      
      For example:
      
       # cat /debug/tracing/events/ftrace/wakeup/format
      name: wakeup
      ID: 3
      format:
              field:unsigned char type;       offset:0;       size:1;
              field:unsigned char flags;      offset:1;       size:1;
              field:unsigned char preempt_count;      offset:2;       size:1;
              field:int pid;  offset:4;       size:4;
              field:int tgid; offset:8;       size:4;
      
              field:unsigned int prev_pid;    offset:12;      size:4;
              field:unsigned char prev_prio;  offset:16;      size:1;
              field:unsigned char prev_state; offset:17;      size:1;
              field:unsigned int next_pid;    offset:20;      size:4;
              field:unsigned char next_prio;  offset:24;      size:1;
              field:unsigned char next_state; offset:25;      size:1;
              field:unsigned int next_cpu;    offset:28;      size:4;
      
      print fmt: "%u:%u:%u  ==+ %u:%u:%u [%03u]"
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      770cb243
  9. 27 2月, 2009 1 次提交
  10. 25 2月, 2009 2 次提交
    • S
      tracing: add schedule events to event trace · f3fe8e4a
      Steven Rostedt 提交于
      This patch changes the trace/sched.h to use the DECLARE_TRACE_FMT
      such that they are automatically registered with the event tracer.
      
      And it also adds the tracing sched headers to kernel/trace/events.c
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      f3fe8e4a
    • S
      tracing: add event trace infrastructure · b77e38aa
      Steven Rostedt 提交于
      This patch creates the event tracing infrastructure of ftrace.
      It will create the files:
      
       /debug/tracing/available_events
       /debug/tracing/set_event
      
      The available_events will list the trace points that have been
      registered with the event tracer.
      
      set_events will allow the user to enable or disable an event hook.
      
      example:
      
       # echo sched_wakeup > /debug/tracing/set_event
      
      Will enable the sched_wakeup event (if it is registered).
      
       # echo "!sched_wakeup" >> /debug/tracing/set_event
      
      Will disable the sched_wakeup event (and only that event).
      
       # echo > /debug/tracing/set_event
      
      Will disable all events (notice the '>')
      
       # cat /debug/tracing/available_events > /debug/tracing/set_event
      
      Will enable all registered event hooks.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      b77e38aa
  11. 09 2月, 2009 1 次提交
  12. 14 1月, 2009 1 次提交
    • F
      tracing: add a new workqueue tracer · e1d8aa9f
      Frederic Weisbecker 提交于
      Impact: new tracer
      
      The workqueue tracer provides some statistical informations
      about each cpu workqueue thread such as the number of the
      works inserted and executed since their creation. It can help
      to evaluate the amount of work each of them have to perform.
      For example it can help a developer to decide whether he should
      choose a per cpu workqueue instead of a singlethreaded one.
      
      It only traces statistical informations for now but it will probably later
      provide event tracing too.
      
      Such a tracer could help too, and be improved, to help rt priority sorted
      workqueue development.
      
      To have a snapshot of the workqueues state at any time, just do
      
      cat /debugfs/tracing/trace_stat/workqueues
      
      Ie:
      
        1    125        125       reiserfs/1
        1      0          0       scsi_tgtd/1
        1      0          0       aio/1
        1      0          0       ata/1
        1    114        114       kblockd/1
        1      0          0       kintegrityd/1
        1   2147       2147       events/1
      
        0      0          0       kpsmoused
        0    105        105       reiserfs/0
        0      0          0       scsi_tgtd/0
        0      0          0       aio/0
        0      0          0       ata_aux
        0      0          0       ata/0
        0      0          0       cqueue
        0      0          0       kacpi_notify
        0      0          0       kacpid
        0    149        149       kblockd/0
        0      0          0       kintegrityd/0
        0   1000       1000       khelper
        0   2270       2270       events/0
      
      Changes in V2:
      
      _ Drop the static array based on NR_CPU and dynamically allocate the stat array
        with num_possible_cpus() and other cpu mask facilities....
      _ Trace workqueue insertion at a bit lower level (insert_work instead of queue_work) to handle
        even the workqueue barriers.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e1d8aa9f
  13. 30 12月, 2008 1 次提交
    • F
      tracing/kmemtrace: normalize the raw tracer event to the unified tracing API · 36994e58
      Frederic Weisbecker 提交于
      Impact: new tracer plugin
      
      This patch adapts kmemtrace raw events tracing to the unified tracing API.
      
      To enable and use this tracer, just do the following:
      
       echo kmemtrace > /debugfs/tracing/current_tracer
       cat /debugfs/tracing/trace
      
      You will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
      type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      
      That was to stay backward compatible with the format output produced in
      inux/tracepoint.h.
      
      This is the default ouput, but note that I tried something else.
      
      If you change an option:
      
      echo kmem_minimalistic > /debugfs/trace_options
      
      and then cat /debugfs/trace, you will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
         -      C                            0xffff88007c088780          file_free_rcu
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
         +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
         +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
         +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
         +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
         +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp
      
      Yeah I shall confess kmem_minimalistic should be: kmem_alternative.
      
      Whatever, I find it more readable but this a personal opinion of course.
      We can drop it if you want.
      
      On the ALLOC/FREE column, + means an allocation and - a free.
      
      On the type column, you have K = kmalloc, C = cache, P = page
      
      I would like the flags to be GFP_* strings but that would not be easy to not
      break the column with strings....
      
      About the node...it seems to always be -1. I don't know why but that shouldn't
      be difficult to find.
      
      I moved linux/tracepoint.h to trace/tracepoint.h as well. I think that would
      be more easy to find the tracer headers if they are all in their common
      directory.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      36994e58
  14. 29 12月, 2008 2 次提交
    • F
      tracing/ftrace: provide the base infrastructure for histogram tracing · dbd0b4b3
      Frederic Weisbecker 提交于
      Impact: extend the tracing API
      
      The goal of this patch is to normalize and make more easy the
      implementation of statistical (histogram) tracing.
      
      It implements a trace_stat file into the /debugfs/tracing directory where
      one can print a one-shot output of statistics/histogram entries.
      
      A tracer has to provide two basic iterator callbacks:
      
        stat_start() => the first entry
        stat_next(prev, idx) => the next one.
      
      Note that it is adapted for arrays or hash tables or lists.... since it
      provides a pointer to the previous entry and the current index of the
      iterator.
      
      These two callbacks are called to get a snapshot of the statistics at each
      opening of the trace_stat file because. The values are so updated between
      two "cat trace_stat". And the tracer is free to lock its datas during the
      iteration to keep consistent values.
      
      Since it is almost always interesting to sort statisticals values to
      address the problems by priority, this infrastructure provides a "sorting"
      of the stat entries too if desired. A tracer has just to provide a
      stat_cmp callback to compare two entries and the stat tracing
      infrastructure will build a sorted list of the given entries.
      
      A last callback, called stat_headers, can be implemented by a tracer to
      output headers on its trace.
      
      If one of these callbacks is changed on runtime, it just have to signal it
      to the stat tracing API by calling the init_tracer_stat() helper.
      
      Changes in V2:
      
      - Fix a memory leak if the user opens multiple times the trace_stat file
        without closing it. Now we always free our list before rebuilding it.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dbd0b4b3
    • S
      ftrace: set up trace event hash infrastructure · f0868d1e
      Steven Rostedt 提交于
      Impact: simplify/generalize/refactor trace.c
      
      The trace.c file is becoming more difficult to maintain due to the
      growing number of events. There is several formats that an event may
      be printed. This patch sets up the infrastructure of an event hash to
      allow for events to register how they should be printed.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f0868d1e
  15. 12 12月, 2008 1 次提交
  16. 26 11月, 2008 3 次提交
  17. 13 11月, 2008 2 次提交
  18. 12 11月, 2008 2 次提交
    • S
      tracing: likely/unlikely branch annotation tracer · 52f232cb
      Steven Rostedt 提交于
      Impact: new likely/unlikely branch tracer
      
      This patch adds a way to record the instances of the likely() and unlikely()
      branch condition annotations.
      
      When "unlikely" is set in /debugfs/tracing/iter_ctrl the unlikely conditions
      will be added to any of the ftrace tracers. The change takes effect when
      a new tracer is passed into the current_tracer file.
      
      For example:
      
       bash-3471  [003]   357.014755: [INCORRECT] sched_info_dequeued:sched_stats.h:177
       bash-3471  [003]   357.014756: [correct] update_curr:sched_fair.c:489
       bash-3471  [003]   357.014758: [correct] calc_delta_fair:sched_fair.c:411
       bash-3471  [003]   357.014759: [correct] account_group_exec_runtime:sched_stats.h:356
       bash-3471  [003]   357.014761: [correct] update_curr:sched_fair.c:489
       bash-3471  [003]   357.014763: [INCORRECT] calc_delta_fair:sched_fair.c:411
       bash-3471  [003]   357.014765: [correct] calc_delta_mine:sched.c:1279
      
      Which shows the normal tracer heading, as well as whether the condition was
      correct "[correct]" or was mistaken "[INCORRECT]", followed by the function,
      file name and line number.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      52f232cb
    • S
      tracing: profile likely and unlikely annotations · 1f0d69a9
      Steven Rostedt 提交于
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. When the condition is hit, the hit and misses of that condition
      are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see by the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
      that were more than 25%.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, where I picked up
        the following ideas from:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feed back on this patch.
      
      (*) Not ever unlikely is recorded, those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f0d69a9
  19. 11 11月, 2008 1 次提交
    • F
      tracing: add a tracer to catch execution time of kernel functions · 15e6cb36
      Frederic Weisbecker 提交于
      Impact: add new tracing plugin which can trace full (entry+exit) function calls
      
      This tracer uses the low level function return ftrace plugin to
      measure the execution time of the kernel functions.
      
      The first field is the caller of the function, the second is the
      measured function, and the last one is the execution time in
      nanoseconds.
      
      - v3:
      
      - HAVE_FUNCTION_RET_TRACER have been added. Each arch that support ftrace return
        should enable it.
      - ftrace_return_stub becomes ftrace_stub.
      - CONFIG_FUNCTION_RET_TRACER depends now on CONFIG_FUNCTION_TRACER
      - Return traces printing can be used for other tracers on trace.c
      - Adapt to the new tracing API (no more ctrl_update callback)
      - Correct the check of "disabled" during insertion.
      - Minor changes...
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      15e6cb36
  20. 21 10月, 2008 1 次提交
  21. 14 10月, 2008 4 次提交
    • S
      tracing: unified trace buffer · 7a8e76a3
      Steven Rostedt 提交于
      This is a unified tracing buffer that implements a ring buffer that
      hopefully everyone will eventually be able to use.
      
      The events recorded into the buffer have the following structure:
      
        struct ring_buffer_event {
      	u32 type:2, len:3, time_delta:27;
      	u32 array[];
        };
      
      The minimum size of an event is 8 bytes. All events are 4 byte
      aligned inside the buffer.
      
      There are 4 types (all internal use for the ring buffer, only
      the data type is exported to the interface users).
      
       RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
      	of a buffer page.
      
       RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
      	is greater than the 27 bit delta can hold. We add another
      	32 bits, and record that in its own event (8 byte size).
      
       RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
      	help keep the buffer timestamps in sync.
      
      RINGBUF_TYPE_DATA: The event actually holds user data.
      
      The "len" field is only three bits. Since the data must be
      4 byte aligned, this field is shifted left by 2, giving a
      max length of 28 bytes. If the data load is greater than 28
      bytes, the first array field holds the full length of the
      data load and the len field is set to zero.
      
      Example, data size of 7 bytes:
      
      	type = RINGBUF_TYPE_DATA
      	len = 2
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0..1]: <7 bytes of data> <1 byte empty>
      
      This event is saved in 12 bytes of the buffer.
      
      An event with 82 bytes of data:
      
      	type = RINGBUF_TYPE_DATA
      	len = 0
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0]: 84 (Note the alignment)
      	array[1..14]: <82 bytes of data> <2 bytes empty>
      
      The above event is saved in 92 bytes (if my math is correct).
      82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
      
      Do not reference the above event struct directly. Use the following
      functions to gain access to the event table, since the
      ring_buffer_event structure may change in the future.
      
      ring_buffer_event_length(event): get the length of the event.
      	This is the size of the memory used to record this
      	event, and not the size of the data pay load.
      
      ring_buffer_time_delta(event): get the time delta of the event
      	This returns the delta time stamp since the last event.
      	Note: Even though this is in the header, there should
      		be no reason to access this directly, accept
      		for debugging.
      
      ring_buffer_event_data(event): get the data from the event
      	This is the function to use to get the actual data
      	from the event. Note, it is only a pointer to the
      	data inside the buffer. This data must be copied to
      	another location otherwise you risk it being written
      	over in the buffer.
      
      ring_buffer_lock: A way to lock the entire buffer.
      ring_buffer_unlock: unlock the buffer.
      
      ring_buffer_alloc: create a new ring buffer. Can choose between
      	overwrite or consumer/producer mode. Overwrite will
      	overwrite old data, where as consumer producer will
      	throw away new data if the consumer catches up with the
      	producer.  The consumer/producer is the default.
      
      ring_buffer_free: free the ring buffer.
      
      ring_buffer_resize: resize the buffer. Changes the size of each cpu
      	buffer. Note, it is up to the caller to provide that
      	the buffer is not being used while this is happening.
      	This requirement may go away but do not count on it.
      
      ring_buffer_lock_reserve: locks the ring buffer and allocates an
      	entry on the buffer to write to.
      ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
      	the buffer.
      
      ring_buffer_write: writes some data into the ring buffer.
      
      ring_buffer_peek: Look at a next item in the cpu buffer.
      ring_buffer_consume: get the next item in the cpu buffer and
      	consume it. That is, this function increments the head
      	pointer.
      
      ring_buffer_read_start: Start an iterator of a cpu buffer.
      	For now, this disables the cpu buffer, until you issue
      	a finish. This is just because we do not want the iterator
      	to be overwritten. This restriction may change in the future.
      	But note, this is used for static reading of a buffer which
      	is usually done "after" a trace. Live readings would want
      	to use the ring_buffer_consume above, which will not
      	disable the ring buffer.
      
      ring_buffer_read_finish: Finishes the read iterator and reenables
      	the ring buffer.
      
      ring_buffer_iter_peek: Look at the next item in the cpu iterator.
      ring_buffer_read: Read the iterator and increment it.
      ring_buffer_iter_reset: Reset the iterator to point to the beginning
      	of the cpu buffer.
      ring_buffer_iter_empty: Returns true if the iterator is at the end
      	of the cpu buffer.
      
      ring_buffer_size: returns the size in bytes of each cpu buffer.
      	Note, the real size is this times the number of CPUs.
      
      ring_buffer_reset_cpu: Sets the cpu buffer to empty
      ring_buffer_reset: sets all cpu buffers to empty
      
      ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
      	cpu buffer of another buffer. This is handy when you
      	want to take a snap shot of a running trace on just one
      	cpu. Having a backup buffer, to swap with facilitates this.
      	Ftrace max latencies use this.
      
      ring_buffer_empty: Returns true if the ring buffer is empty.
      ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
      
      ring_buffer_record_disable: disable all cpu buffers (read only)
      ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
      ring_buffer_record_enable: enable all cpu buffers.
      ring_buffer_record_enabl_cpu: enable a single cpu buffer.
      
      ring_buffer_entries: The number of entries in a ring buffer.
      ring_buffer_overruns: The number of entries removed due to writing wrap.
      
      ring_buffer_time_stamp: Get the time stamp used by the ring buffer
      ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
      	into nanosecs.
      
      I still need to implement the GTOD feature. But we need support from
      the cpu frequency infrastructure.  But this can be done at a later
      time without affecting the ring buffer interface.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7a8e76a3
    • F
      tracing/ftrace: give an entry on the config for boot tracer · 1f5c2abb
      Frédéric Weisbecker 提交于
      Bring the entry to choose the boot tracer on the kernel config.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f5c2abb
    • S
      ftrace: add nop tracer · fb1b6d8b
      Steven Noonan 提交于
      A no-op tracer which can serve two purposes:
      
       1. A template for development of a new tracer.
       2. A convenient way to see ftrace_printk() calls without
          an irrelevant trace making the output messy.
      
      [ mingo@elte.hu: resolved conflicts ]
      Signed-off-by: NSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fb1b6d8b
    • S
      ftrace: add stack tracer · e5a81b62
      Steven Rostedt 提交于
      This is another tracer using the ftrace infrastructure, that examines
      at each function call the size of the stack. If the stack use is greater
      than the previous max it is recorded.
      
      You can always see (and set) the max stack size seen. By setting it
      to zero will start the recording again. The backtrace is also available.
      
      For example:
      
      # cat /debug/tracing/stack_max_size
      1856
      
      # cat /debug/tracing/stack_trace
      [<c027764d>] stack_trace_call+0x8f/0x101
      [<c021b966>] ftrace_call+0x5/0x8
      [<c02553cc>] clocksource_get_next+0x12/0x48
      [<c02542a5>] update_wall_time+0x538/0x6d1
      [<c0245913>] do_timer+0x23/0xb0
      [<c0257657>] tick_do_update_jiffies64+0xd9/0xf1
      [<c02576b9>] tick_sched_timer+0x4a/0xad
      [<c0250fe6>] __run_hrtimer+0x3e/0x75
      [<c02518ed>] hrtimer_interrupt+0xf1/0x154
      [<c022c870>] smp_apic_timer_interrupt+0x71/0x84
      [<c021b7e9>] apic_timer_interrupt+0x2d/0x34
      [<c0238597>] finish_task_switch+0x29/0xa0
      [<c05abd13>] schedule+0x765/0x7be
      [<c05abfca>] schedule_timeout+0x1b/0x90
      [<c05ab4d4>] wait_for_common+0xab/0x101
      [<c05ab5ac>] wait_for_completion+0x12/0x14
      [<c033cfc3>] blk_execute_rq+0x84/0x99
      [<c0402470>] scsi_execute+0xc2/0x105
      [<c040250a>] scsi_execute_req+0x57/0x7f
      [<c043afe0>] sr_test_unit_ready+0x3e/0x97
      [<c043bbd6>] sr_media_change+0x43/0x205
      [<c046b59f>] media_changed+0x48/0x77
      [<c046b5ff>] cdrom_media_changed+0x31/0x37
      [<c043b091>] sr_block_media_changed+0x16/0x18
      [<c02b9e69>] check_disk_change+0x1b/0x63
      [<c046f4c3>] cdrom_open+0x7a1/0x806
      [<c043b148>] sr_block_open+0x78/0x8d
      [<c02ba4c0>] do_open+0x90/0x257
      [<c02ba869>] blkdev_open+0x2d/0x56
      [<c0296a1f>] __dentry_open+0x14d/0x23c
      [<c0296b32>] nameidata_to_filp+0x24/0x38
      [<c02a1c68>] do_filp_open+0x347/0x626
      [<c02967ef>] do_sys_open+0x47/0xbc
      [<c02968b0>] sys_open+0x23/0x2b
      [<c021aadd>] sysenter_do_call+0x12/0x26
      
      I've tested this on both x86_64 and i386.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e5a81b62
  22. 24 5月, 2008 8 次提交
    • P
      ftrace: add mmiotrace plugin · f984b51e
      Pekka Paalanen 提交于
      On Sat, 22 Mar 2008 13:07:47 +0100
      Ingo Molnar <mingo@elte.hu> wrote:
      
      > > > i'd suggest the following: pull x86.git and sched-devel.git into a
      > > > single tree [the two will combine without rejects]. Then try to add a
      > > > kernel/tracing/trace_mmiotrace.c ftrace plugin. The trace_sysprof.c
      > > > plugin might be a good example.
      > >
      > > I did this and now I have mmiotrace enabled/disabled via the tracing
      > > framework (what do we call this, since ftrace is one of the tracers?).
      >
      > cool! could you send the patches for that? (even if they are not fully
      > functional yet)
      
      Patch attached in the end. Nice to see how much code disappeared. I tried
      to mark all the features I had to break with XXX-comments.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      f984b51e
    • I
      ftrace: add sysprof plugin · f06c3810
      Ingo Molnar 提交于
      very first baby version.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      f06c3810
    • S
      ftrace: fix dynamic ftrace selftest · d05cdb25
      Steven Rostedt 提交于
      With the adding of the configuration changes in the Makefile to prevent
      tracing of functions in the ftrace code, all tracing of all the ftrace
      code has been removed. Unfortunately, one of the selftests, relied on
      a function to be traced. With the new change, the function was no longer
      traced and the test failed.
      
      This patch separates out the test function into its own file so that
      we can add the "-pg" flag to the compilation of that function and the
      adding of the mcount call to that function.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      d05cdb25
    • I
      ftrace: disable -pg for the tracer itself · b53dde9d
      Ingo Molnar 提交于
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      b53dde9d
    • S
      ftrace: trace preempt off critical timings · 6cd8a4bb
      Steven Rostedt 提交于
      Add preempt off timings. A lot of kernel core code is taken from the RT patch
      latency trace that was written by Ingo Molnar.
      
      This adds "preemptoff" and "preemptirqsoff" to /debugfs/tracing/available_tracers
      
      Now instead of just tracing irqs off, preemption off can be selected
      to be recorded.
      
      When this is selected, it shares the same files as irqs off timings.
      One can either trace preemption off, irqs off, or one or the other off.
      
      By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording
      of preempt off only is performed. "irqsoff" will only record the time
      irqs are disabled, but "preemptirqsoff" will take the total time irqs
      or preemption are disabled. Runtime switching of these options is now
      supported by simpling echoing in the appropriate trace name into
      /debugfs/tracing/current_tracer.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6cd8a4bb
    • S
      ftrace: trace irq disabled critical timings · 81d68a96
      Steven Rostedt 提交于
      This patch adds latency tracing for critical timings
      (how long interrupts are disabled for).
      
       "irqsoff" is added to /debugfs/tracing/available_tracers
      
      Note:
        tracing_max_latency
          also holds the max latency for irqsoff (in usecs).
         (default to large number so one must start latency tracing)
      
        tracing_thresh
          threshold (in usecs) to always print out if irqs off
          is detected to be longer than stated here.
          If irq_thresh is non-zero, then max_irq_latency
          is ignored.
      
      Here's an example of a trace with ftrace_enabled = 0
      
      =======
      preemption latency trace v1.1.5 on 2.6.24-rc7
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      --------------------------------------------------------------------
       latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
          -----------------
          | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
          -----------------
       => started at: _spin_lock_irqsave+0x2a/0xb7
       => ended at:   _spin_unlock_irqrestore+0x32/0x5f
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
       swapper-0     1d.s3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
       swapper-0     1d.s3  100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
       swapper-0     1d.s3  100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)
      
      vim:ft=help
      =======
      
      And this is a trace with ftrace_enabled == 1
      
      =======
      preemption latency trace v1.1.5 on 2.6.24-rc7
      --------------------------------------------------------------------
       latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
          -----------------
          | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
          -----------------
       => started at: _spin_lock_irqsave+0x2a/0xb7
       => ended at:   _spin_unlock_irqrestore+0x32/0x5f
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
       swapper-0     1dNs3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
       swapper-0     1dNs3   46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000])
       swapper-0     1dNs3   46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000])
       swapper-0     1dNs3   46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000])
       swapper-0     1dNs3   47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000])
       swapper-0     1dNs3   47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47)
       swapper-0     1dNs3   97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50)
       swapper-0     1dNs3   98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000])
       swapper-0     1dNs3   99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000])
       swapper-0     1dNs3  101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000])
       swapper-0     1dNs3  102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
       swapper-0     1dNs3  102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)
      
      vim:ft=help
      =======
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      81d68a96
    • S
      ftrace: tracer for scheduler wakeup latency · 352ad25a
      Steven Rostedt 提交于
      This patch adds the tracer that tracks the wakeup latency of the
      highest priority waking task.
      
        "wakeup" is added to /debugfs/tracing/available_tracers
      
      Also added to /debugfs/tracing
      
        tracing_max_latency
           holds the current max latency for the wakeup
      
        wakeup_thresh
           if set to other than zero, a log will be recorded
           for every wakeup that takes longer than the number
           entered in here (usecs for all counters)
           (deletes previous trace)
      
      Examples:
      
        (with ftrace_enabled = 0)
      
      ============
      preemption latency trace v1.1.5 on 2.6.24-rc8
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      --------------------------------------------------------------------
       latency: 26 us, #2/2, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
          -----------------
          | task: migration/0-3 (uid:0 nice:-5 policy:1 rt_prio:99)
          -----------------
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
         quilt-8551  0d..3    0us+: wake_up_process+0x15/0x17 <ffffffff80233e80> (sched_exec+0xc9/0x100 <ffffffff80235343>)
         quilt-8551  0d..4   26us : sched_switch_callback+0x73/0x81 <ffffffff80338d2f> (schedule+0x483/0x6d5 <ffffffff8048b3ee>)
      
      vim:ft=help
      ============
      
        (with ftrace_enabled = 1)
      
      ============
      preemption latency trace v1.1.5 on 2.6.24-rc8
      --------------------------------------------------------------------
       latency: 36 us, #45/45, CPU#0 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
          -----------------
          | task: migration/1-5 (uid:0 nice:-5 policy:1 rt_prio:99)
          -----------------
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
          bash-10653 1d..3    0us : wake_up_process+0x15/0x17 <ffffffff80233e80> (sched_exec+0xc9/0x100 <ffffffff80235343>)
          bash-10653 1d..3    1us : try_to_wake_up+0x271/0x2e7 <ffffffff80233dcf> (sub_preempt_count+0xc/0x7a <ffffffff8023309e>)
          bash-10653 1d..2    2us : try_to_wake_up+0x296/0x2e7 <ffffffff80233df4> (update_rq_clock+0x9/0x20 <ffffffff802303f3>)
          bash-10653 1d..2    2us : update_rq_clock+0x1e/0x20 <ffffffff80230408> (__update_rq_clock+0xc/0x90 <ffffffff80230366>)
          bash-10653 1d..2    3us : __update_rq_clock+0x1b/0x90 <ffffffff80230375> (sched_clock+0x9/0x29 <ffffffff80214529>)
          bash-10653 1d..2    4us : try_to_wake_up+0x2a6/0x2e7 <ffffffff80233e04> (activate_task+0xc/0x3f <ffffffff8022ffca>)
          bash-10653 1d..2    4us : activate_task+0x2d/0x3f <ffffffff8022ffeb> (enqueue_task+0xe/0x66 <ffffffff8022ff66>)
          bash-10653 1d..2    5us : enqueue_task+0x5b/0x66 <ffffffff8022ffb3> (enqueue_task_rt+0x9/0x3c <ffffffff80233351>)
          bash-10653 1d..2    6us : try_to_wake_up+0x2ba/0x2e7 <ffffffff80233e18> (check_preempt_wakeup+0x12/0x99 <ffffffff80234f84>)
      [...]
          bash-10653 1d..5   33us : tracing_record_cmdline+0xcf/0xd4 <ffffffff80338aad> (_spin_unlock+0x9/0x33 <ffffffff8048d3ec>)
          bash-10653 1d..5   34us : _spin_unlock+0x19/0x33 <ffffffff8048d3fc> (sub_preempt_count+0xc/0x7a <ffffffff8023309e>)
          bash-10653 1d..4   35us : wakeup_sched_switch+0x65/0x2ff <ffffffff80339f66> (_spin_lock_irqsave+0xc/0xa9 <ffffffff8048d08b>)
          bash-10653 1d..4   35us : _spin_lock_irqsave+0x19/0xa9 <ffffffff8048d098> (add_preempt_count+0xe/0x77 <ffffffff8023311a>)
          bash-10653 1d..4   36us : sched_switch_callback+0x73/0x81 <ffffffff80338d2f> (schedule+0x483/0x6d5 <ffffffff8048b3ee>)
      
      vim:ft=help
      ============
      
      The [...] was added here to not waste your email box space.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      352ad25a
    • S
      ftrace: add tracing of context switches · 35e8e302
      Steven Rostedt 提交于
      This patch adds context switch tracing, of the format of:
      
                        _------=> CPU#
                       / _-----=> irqs-off
                      | / _----=> need-resched
                      || / _---=> hardirq/softirq
                      ||| / _--=> preempt-depth
                      |||| /
                      |||||     delay
          cmd     pid ||||| time  |      pid:prio:state
             \   /    |||||   \   |      /
        swapper-0     1d..3    137us+:  0:140:R --> 2912:120
           sshd-2912  1d..3    216us+:  2912:120:S --> 0:140
        swapper-0     1d..3    261us+:  0:140:R --> 2912:120
           bash-2920  0d..3    267us+:  2920:120:S --> 0:140
           sshd-2912  1d..3    330us!:  2912:120:S --> 0:140
        swapper-0     1d..3   2389us+:  0:140:R --> 2847:120
       yum-upda-2847  1d..3   2411us!:  2847:120:S --> 0:140
        swapper-0     0d..3  11089us+:  0:140:R --> 3139:120
       gdm-bina-3139  0d..3  11113us!:  3139:120:S --> 0:140
        swapper-0     1d..3 102328us+:  0:140:R --> 2847:120
       yum-upda-2847  1d..3 102348us!:  2847:120:S --> 0:140
      
       "sched_switch" is added to /debugfs/tracing/available_tracers
      
      [ Eugene Teo <eugeneteo@kernel.sg: remove unused tracing_sched_switch_enabled ]
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      35e8e302