1. 11 Feb, 2009 1 commit
    • ring_buffer: pahole struct ring_buffer · 00f62f61
      Arnaldo Carvalho de Melo authored
      While fixing some bugs in pahole (built-in.o files were not being
      processed due to relocation problems) I found out about these packable
      structures:
      
      $ pahole --packable kernel/trace/ring_buffer.o  | grep ring
      ring_buffer	72	64	8
      ring_buffer_per_cpu	112	104	8
      
      If we take a look at the current layout of struct ring_buffer we can see
      that it has two 4-byte holes.
      
      $ pahole -C ring_buffer kernel/trace/ring_buffer.o
      struct ring_buffer {
      	unsigned int               pages;           /*     0     4 */
      	unsigned int               flags;           /*     4     4 */
      	int                        cpus;            /*     8     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	cpumask_var_t              cpumask;         /*    16     8 */
      	atomic_t                   record_disabled; /*    24     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct mutex               mutex;           /*    32    32 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	struct ring_buffer_per_cpu * * buffers;     /*    64     8 */
      
      	/* size: 72, cachelines: 2, members: 7 */
      	/* sum members: 64, holes: 2, sum holes: 8 */
      	/* last cacheline: 8 bytes */
      };
      
      So, if I ask pahole to reorganize it:
      
      $ pahole -C ring_buffer --reorganize kernel/trace/ring_buffer.o
      
      struct ring_buffer {
      	unsigned int               pages;           /*     0     4 */
      	unsigned int               flags;           /*     4     4 */
      	int                        cpus;            /*     8     4 */
      	atomic_t                   record_disabled; /*    12     4 */
      	cpumask_var_t              cpumask;         /*    16     8 */
      	struct mutex               mutex;           /*    24    32 */
      	struct ring_buffer_per_cpu * * buffers;     /*    56     8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      
      	/* size: 64, cachelines: 1, members: 7 */
      };   /* saved 8 bytes and 1 cacheline! */
      
      The struct now fits in a single 64-byte cacheline.
      
      To see what it did:
      
      $ pahole -C ring_buffer --reorganize --show_reorg_steps \
      	kernel/trace/ring_buffer.o | grep \/
      /* Moving 'record_disabled' from after 'cpumask' to after 'cpus' */
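      
      For illustration only (not part of the patch), the effect of the move
      can be reproduced in userspace. This is a minimal sketch: the demo_*
      types are hypothetical stand-ins for the kernel types, sized to match
      the pahole output above on an LP64 target.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel types; sizes in comments assume an LP64 target. */
typedef struct { int counter; } demo_atomic_t;   /* 4 bytes, like atomic_t */
typedef unsigned long *demo_cpumask_var_t;       /* 8 bytes (a pointer) */
struct demo_mutex { long opaque[4]; };           /* 32 bytes, 8-byte aligned */
struct demo_per_cpu;                             /* opaque, only pointed to */

/* Original member order: 4-byte holes after cpus and record_disabled. */
struct demo_rb_before {
	unsigned int          pages;
	unsigned int          flags;
	int                   cpus;
	demo_cpumask_var_t    cpumask;
	demo_atomic_t         record_disabled;
	struct demo_mutex     mutex;
	struct demo_per_cpu **buffers;
};

/* record_disabled moved to directly after cpus, as pahole suggested. */
struct demo_rb_after {
	unsigned int          pages;
	unsigned int          flags;
	int                   cpus;
	demo_atomic_t         record_disabled;
	demo_cpumask_var_t    cpumask;
	struct demo_mutex     mutex;
	struct demo_per_cpu **buffers;
};

/* Bytes saved by the reordering (8 on LP64, possibly 0 on other ABIs). */
size_t demo_bytes_saved(void)
{
	assert(sizeof(struct demo_rb_after) <= sizeof(struct demo_rb_before));
	return sizeof(struct demo_rb_before) - sizeof(struct demo_rb_after);
}
```

      Comparing offsetof() and sizeof() for the two layouts shows the same
      holes that pahole reports, without needing debug info.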
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 10 Feb, 2009 3 commits
  3. 09 Feb, 2009 6 commits
  4. 08 Feb, 2009 4 commits
    • trace: trivial fixes in comment typos. · 57794a9d
      Wenji Huang authored
      Impact: clean up
      
      Fixed several typos in the comments.
      Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ring-buffer: use generic version of in_nmi · a81bd80a
      Steven Rostedt authored
      Impact: clean up
      
      Now that a generic in_nmi is available, this patch removes the
      special code in the ring_buffer and uses the generic in_nmi
      version instead.
      
      With this change, I was also able to rename the "arch_ftrace_nmi_enter"
      back to "ftrace_nmi_enter" and remove the code from the ring buffer.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ring-buffer: add NMI protection for spinlocks · 78d904b4
      Steven Rostedt authored
      Impact: prevent deadlock in NMI
      
      The ring buffers are not yet totally lockless with writing to
      the buffer. When a writer crosses a page, it grabs a per cpu spinlock
      to protect against a reader. The spinlocks taken by a writer are not
      to protect against other writers, since a writer can only write to
      its own per cpu buffer. The spinlocks protect against readers that
      can touch any cpu buffer. The writers are made to be reentrant
      with the spinlocks disabling interrupts.
      
      The problem arises when an NMI writes to the buffer, and that write
      crosses a page boundary. If it grabs a spinlock, it can be racing
      with another writer (since disabling interrupts does not protect
      against NMIs) or with a reader on the same CPU. Luckily, most
      users are not reentrant, which protects against this issue. But if a
      user of the ring buffer becomes reentrant (which the ring buffers
      do allow) and the NMI also writes to the ring buffer, then we risk
      a deadlock.
      
      This patch moves the ftrace_nmi_enter called by nmi_enter() to the
      ring buffer code. It renames the current ftrace_nmi_enter, which is
      used by arch-specific code, to arch_ftrace_nmi_enter and updates
      the Kconfig to handle it.
      
      When an NMI is called, it will set a per cpu variable in the ring buffer
      code and will clear it when the NMI exits. If a write to the ring buffer
      crosses page boundaries inside an NMI, a trylock is used on the spin
      lock instead. If the spinlock fails to be acquired, then the entry
      is discarded.
      
      This bug appeared in the ftrace work in the RT tree, where event tracing
      is reentrant. This workaround solved the deadlocks that appeared there.
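      
      The trylock-in-NMI idea above can be sketched in userspace as follows.
      This is not the kernel code: the demo_* names, the single simulated
      CPU, and the C11 atomic_flag lock are all assumptions made for
      illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: one simulated CPU, one flag, one lock. */
atomic_flag demo_reader_lock = ATOMIC_FLAG_INIT;
bool demo_in_nmi;               /* set on NMI entry, cleared on NMI exit */
unsigned long demo_dropped;     /* entries discarded because trylock failed */

void demo_nmi_enter(void) { demo_in_nmi = true; }

void demo_nmi_exit(void)
{
	assert(demo_in_nmi);    /* exit must pair with an enter */
	demo_in_nmi = false;
}

bool demo_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&demo_reader_lock,
						  memory_order_acquire);
}

void demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_reader_lock, memory_order_release);
}

/*
 * Called when a write crosses a page boundary.
 * Returns false if the entry had to be discarded.
 */
bool demo_cross_page(void)
{
	if (demo_in_nmi) {
		/* Spinning here could deadlock against the interrupted holder. */
		if (!demo_trylock()) {
			demo_dropped++;
			return false;
		}
	} else {
		while (!demo_trylock())
			;       /* normal context: safe to spin */
	}
	/* ... advance to the next page here ... */
	demo_unlock();
	return true;
}
```

      The design trades a possibly lost entry for the guarantee that NMI
      context never spins on a lock the interrupted code may hold.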
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • trace: remove deprecated entry->cpu · 1830b52d
      Steven Rostedt authored
      Impact: fix to prevent developers from using entry->cpu
      
      With the new ring buffer infrastructure, the cpu for the entry is
      implied by which CPU buffer it is on.
      
      The original code used to record the current cpu in the generic
      entry header, where it could be retrieved as entry->cpu. When the
      ring buffer was introduced, the users were converted to use the
      cpu number of the cpu ring buffer in use (this is passed
      to the tracers by the iterator: iter->cpu).
      
      Unfortunately, the cpu item in the entry structure was never removed.
      This allowed developers to use it instead of the proper iter->cpu,
      unknowingly reading an uninitialized variable. This was not the fault
      of the developers, since it would seem like the logical place to
      retrieve the cpu identifier.
      
      This patch removes the cpu item from the entry structure and fixes
      all the users that should have been using iter->cpu.
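      
      As a rough userspace sketch of why the entry needs no cpu field: the
      iterator knows which per-cpu buffer it pulled the entry from. All
      demo_* names and sizes below are hypothetical and do not mirror the
      kernel's structures.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 2
#define DEMO_PER_CPU 4

/* Note: no cpu field in the entry itself. */
struct demo_entry { unsigned long long ts; int pid; };

struct demo_cpu_buffer {
	struct demo_entry entries[DEMO_PER_CPU];
	size_t count;   /* entries written */
	size_t read;    /* entries already consumed */
};

struct demo_iter {
	struct demo_cpu_buffer *buffers;  /* array of DEMO_NR_CPUS buffers */
	int cpu;                          /* CPU of the entry last returned */
};

/* Return the oldest unread entry across all CPUs and record its CPU. */
const struct demo_entry *demo_next(struct demo_iter *iter)
{
	const struct demo_entry *best = NULL;
	int best_cpu = -1;

	assert(iter != NULL && iter->buffers != NULL);
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		struct demo_cpu_buffer *b = &iter->buffers[cpu];

		if (b->read == b->count)
			continue;
		if (!best || b->entries[b->read].ts < best->ts) {
			best = &b->entries[b->read];
			best_cpu = cpu;
		}
	}
	if (!best)
		return NULL;
	iter->buffers[best_cpu].read++;
	iter->cpu = best_cpu;   /* the cpu comes from the buffer, not the entry */
	return best;
}
```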
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  5. 06 Feb, 2009 3 commits
    • trace: Call tracing_reset_online_cpus before tracer->init() · b6f11df2
      Arnaldo Carvalho de Melo authored
      Impact: cleanup
      
      This makes things easier for ftrace plugin writers, since the call
      was open-coded in the existing plugins.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: Introduce trace_buffer_{lock_reserve,unlock_commit} · 51a763dd
      Arnaldo Carvalho de Melo authored
      Impact: new API
      
      These new functions do what previously was being open coded, reducing
      the number of details ftrace plugin writers have to worry about.
      
      It also standardizes the handling of stacktrace, userstacktrace and
      other trace options we may introduce in the future.
      
      With this patch, for instance, the blk tracer (and some others already
      in the tree) can use the "userstacktrace" /d/tracing/trace_options
      facility.
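      
      The reserve/commit pairing described above might look roughly like
      this in a userspace mock. It is a sketch under assumptions: the
      demo_* helpers, the fixed-size slot buffer, and the option bits are
      invented here and are not the signatures introduced by the patch.

```c
#include <assert.h>
#include <stddef.h>

enum {
	DEMO_OPT_STACKTRACE     = 1u << 0,
	DEMO_OPT_USERSTACKTRACE = 1u << 1,
};

struct demo_entry { unsigned char type; int pid; unsigned long payload; };

struct demo_buffer {
	struct demo_entry slots[8];
	size_t reserved;                /* slots handed out to writers */
	size_t committed;               /* slots visible to readers */
	unsigned int stack_dumps;       /* stand-in for kernel stack traces */
	unsigned int user_stack_dumps;  /* stand-in for user stack traces */
};

/* Reserve a slot and fill the common header once, in one place. */
struct demo_entry *demo_buffer_lock_reserve(struct demo_buffer *b,
					    unsigned char type, int pid)
{
	struct demo_entry *e;

	if (b->reserved == sizeof(b->slots) / sizeof(b->slots[0]))
		return NULL;
	e = &b->slots[b->reserved++];
	e->type = type;
	e->pid = pid;
	return e;
}

/* Commit the entry and apply the trace options uniformly. */
void demo_buffer_unlock_commit(struct demo_buffer *b, unsigned int opts)
{
	assert(b->committed < b->reserved);  /* commit must follow a reserve */
	b->committed = b->reserved;
	if (opts & DEMO_OPT_STACKTRACE)
		b->stack_dumps++;
	if (opts & DEMO_OPT_USERSTACKTRACE)
		b->user_stack_dumps++;
}

/* A plugin only fills in its own payload between the two calls. */
int demo_trace_value(struct demo_buffer *b, unsigned int opts, unsigned long v)
{
	struct demo_entry *e = demo_buffer_lock_reserve(b, 1, 42);

	if (!e)
		return -1;
	e->payload = v;
	demo_buffer_unlock_commit(b, opts);
	return 0;
}
```

      Because option handling lives in the commit helper, every caller
      picks up a new option without any change of its own.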
      
      $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
      linux-2.6-tip/kernel/trace/trace.c:
        trace_vprintk              |   -5
        trace_graph_return         |  -22
        trace_graph_entry          |  -26
        trace_function             |  -45
        __ftrace_trace_stack       |  -27
        ftrace_trace_userstack     |  -29
        tracing_sched_switch_trace |  -66
        tracing_stop               |   +1
        trace_seq_to_user          |   -1
        ftrace_trace_special       |  -63
        ftrace_special             |   +1
        tracing_sched_wakeup_trace |  -70
        tracing_reset_online_cpus  |   -1
       13 functions changed, 2 bytes added, 355 bytes removed, diff: -353
      
      linux-2.6-tip/block/blktrace.c:
        __blk_add_trace |  -58
       1 function changed, 58 bytes removed, diff: -58
      
      linux-2.6-tip/kernel/trace/trace.c:
        trace_buffer_lock_reserve  |  +88
        trace_buffer_unlock_commit |  +86
       2 functions changed, 174 bytes added, diff: +174
      
      /tmp/vmlinux.after:
       16 functions changed, 176 bytes added, 413 bytes removed, diff: -237
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring_buffer: remove unused flags parameter · 0a987751
      Arnaldo Carvalho de Melo authored
      Impact: API change, cleanup
      
      From ring_buffer_{lock_reserve,unlock_commit}.
      
      $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
      linux-2.6-tip/kernel/trace/trace.c:
        trace_vprintk              |  -14
        trace_graph_return         |  -14
        trace_graph_entry          |  -10
        trace_function             |   -8
        __ftrace_trace_stack       |   -8
        ftrace_trace_userstack     |   -8
        tracing_sched_switch_trace |   -8
        ftrace_trace_special       |  -12
        tracing_sched_wakeup_trace |   -8
       9 functions changed, 90 bytes removed, diff: -90
      
      linux-2.6-tip/block/blktrace.c:
        __blk_add_trace |   -1
       1 function changed, 1 bytes removed, diff: -1
      
      /tmp/vmlinux.after:
       10 functions changed, 91 bytes removed, diff: -91
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 05 Feb, 2009 6 commits
  7. 04 Feb, 2009 1 commit
  8. 03 Feb, 2009 4 commits
    • trace: Change struct trace_event callbacks parameter list · 2c9b238e
      Arnaldo Carvalho de Melo authored
      Impact: API change
      
      The trace_seq and trace_entry are in trace_iterator, which has
      more fields that may be needed by tracers, so just pass the
      trace_iterator, as is already the case for struct tracer->print_line.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • trace: better manage the context info for events · c4a8e8be
      Frederic Weisbecker authored
      Impact: make trace_event more convenient for tracers
      
      All tracers (for the moment) that use the struct trace_event want to
      have the context info printed before their own output: the pid/cmdline,
      cpu, and timestamp.
      
      But other tracers that implement their own trace_event callbacks
      will not necessarily need this information, or may want to format
      it differently.
      
      This patch adds a new default-enabled trace option,
      TRACE_ITER_CONTEXT_INFO. When it is disabled through:
      
      echo nocontext-info > /debugfs/tracing/trace_options
      
      the pid, cpu and timestamp headers will not be printed.
      
      For example, with the sched_switch tracer and context-info (default):
      
           bash-2935 [001] 100.356561: 2935:120:S ==> [001]  0:140:R <idle>
         <idle>-0    [000] 100.412804:    0:140:R   + [000] 11:115:S events/0
         <idle>-0    [000] 100.412816:    0:140:R ==> [000] 11:115:R events/0
       events/0-11   [000] 100.412829:   11:115:S ==> [000]  0:140:R <idle>
      
      Without context-info:
      
       2935:120:S ==> [001]  0:140:R <idle>
          0:140:R   + [000] 11:115:S events/0
          0:140:R ==> [000] 11:115:R events/0
         11:115:S ==> [000]  0:140:R <idle>
      
      A tracer can disable it at runtime by clearing the bit
      TRACE_ITER_CONTEXT_INFO in trace_flags.
      
      The print routines were renamed to trace_print_context and
      trace_print_lat_context, so that they can be used by tracers if they
      want to use them for one of the trace_event callbacks.
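      
      As an illustration of how one flag bit can gate the context prefix,
      here is a minimal userspace sketch. The DEMO_ITER_CONTEXT_INFO bit
      value, demo_trace_flags and demo_print_line() are hypothetical; only
      the output shape follows the sched_switch example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_ITER_CONTEXT_INFO (1u << 0)  /* bit value is an assumption */

unsigned int demo_trace_flags = DEMO_ITER_CONTEXT_INFO;  /* default-enabled */

/* Write one trace line; returns its length, or -1 if it did not fit. */
int demo_print_line(char *out, size_t len, const char *comm, int pid,
		    int cpu, unsigned long secs, unsigned long usecs,
		    const char *event)
{
	int n = 0, m;

	assert(out != NULL && len > 0);
	out[0] = '\0';
	if (demo_trace_flags & DEMO_ITER_CONTEXT_INFO) {
		/* Context prefix: comm-pid, [cpu], seconds.microseconds */
		n = snprintf(out, len, "%s-%d [%03d] %lu.%06lu: ",
			     comm, pid, cpu, secs, usecs);
		if (n < 0 || (size_t)n >= len)
			return -1;
	}
	/* The tracer's own output always follows. */
	m = snprintf(out + n, len - (size_t)n, "%s", event);
	if (m < 0 || (size_t)m >= len - (size_t)n)
		return -1;
	return n + m;
}
```

      Clearing the bit in demo_trace_flags before the call leaves only
      the tracer's own text in the output.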
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • trace: let boot trace be chosen by command line · 79fb0768
      Steven Rostedt authored
      Now that we have a working ftrace=<tracer> function, make the boot
      tracer get activated by it. This way we can turn it on or off without
      recompiling the kernel, as well as keeping the selftests on. The
      selftests are disabled whenever a default tracer starts running.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • trace: fix default boot up tracer · b2821ae6
      Steven Rostedt authored
      Peter Zijlstra started the work on starting a default tracer
      at bootup. This patch finishes it.
      
      Now if you add 'ftrace=<tracer>' to the command line, when that tracer
      is registered on bootup, that tracer is selected and starts tracing.
      
      Note that all selftests for tracers registered after this tracer
      are disabled. This prevents the selftests from disturbing the running
      tracer, or the running tracer from disturbing the selftests.
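      
      The behaviour described above can be mocked in userspace as below.
      This is only a sketch: the demo_* functions and the hand-rolled
      command-line scan are assumptions, not the kernel's boot parameter
      handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

char demo_default_tracer[32];             /* name given via ftrace=<tracer> */
const char *demo_current_tracer = "nop";
bool demo_selftests_disabled;

/* Pick up "ftrace=<tracer>" from a space-separated command line. */
void demo_parse_cmdline(const char *cmdline)
{
	const char *key = "ftrace=";
	const char *p = cmdline;

	assert(cmdline != NULL);
	while ((p = strstr(p, key)) != NULL) {
		/* Only accept the key at the start of a word. */
		if (p == cmdline || p[-1] == ' ') {
			const char *val = p + strlen(key);
			size_t n = strcspn(val, " ");

			if (n >= sizeof(demo_default_tracer))
				n = sizeof(demo_default_tracer) - 1;
			memcpy(demo_default_tracer, val, n);
			demo_default_tracer[n] = '\0';
			return;
		}
		p++;
	}
}

/* Register a tracer; returns true if its selftest would run. */
bool demo_register_tracer(const char *name)
{
	bool run_selftest = !demo_selftests_disabled;

	if (demo_default_tracer[0] != '\0' &&
	    strcmp(name, demo_default_tracer) == 0) {
		demo_current_tracer = name;      /* selected, starts tracing */
		demo_selftests_disabled = true;  /* later tracers skip selftests */
		demo_default_tracer[0] = '\0';
	}
	return run_selftest;
}
```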
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 30 Jan, 2009 1 commit
  10. 29 Jan, 2009 2 commits
  11. 26 Jan, 2009 3 commits
    • blktrace: add ftrace plugin · c71a8961
      Arnaldo Carvalho de Melo authored
      Impact: New way of using the blktrace infrastructure
      
      This drops the requirement of userspace utilities to use the blktrace
      facility.
      
      Configuration is done through sysfs, adding a "trace" directory to the
      partition directory where blktrace can be enabled for the associated
      request_queue.
      
      The same filters present in the IOCTL interface are present as sysfs
      device attributes.
      
      The /sys/block/sdX/sdXN/trace/enable file allows tracing without any
      filters.
      
      The other files in this directory: pid, act_mask, start_lba and end_lba
      can be used with the same meaning as with the IOCTL interface.
      
      Using the sysfs interface only sets up the request_queue->blk_trace
      fields; tracing only takes place when the "blk" tracer is selected
      via the ftrace interface, as in the following example:
      
      To see the trace, one can use the /d/tracing/trace file or the
      /d/tracing/trace_pipe file, with semantics defined in the ftrace
      documentation in Documentation/ftrace.txt.
      
      [root@f10-1 ~]# cat /t/trace
             kjournald-305   [000]  3046.491224:   8,1    A WBS 6367 + 8 <- (8,1) 6304
             kjournald-305   [000]  3046.491227:   8,1    Q   R 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491236:   8,1    G  RB 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491239:   8,1    P  NS [kjournald]
             kjournald-305   [000]  3046.491242:   8,1    I RBS 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491251:   8,1    D  WB 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491610:   8,1    U  WS [kjournald] 1
                <idle>-0     [000]  3046.511914:   8,1    C  RS 6367 + 8 [6367]
      [root@f10-1 ~]#
      
      The default line context (prefix) format is the one described in the ftrace
      documentation, with the blktrace specific bits using its existing format,
      described in blkparse(8).
      
      If one wants to have the classic blktrace formatting, this is possible by
      using:
      
      [root@f10-1 ~]# echo blk_classic > /t/trace_options
      [root@f10-1 ~]# cat /t/trace
        8,1    0  3046.491224   305  A WBS 6367 + 8 <- (8,1) 6304
        8,1    0  3046.491227   305  Q   R 6367 + 8 [kjournald]
        8,1    0  3046.491236   305  G  RB 6367 + 8 [kjournald]
        8,1    0  3046.491239   305  P  NS [kjournald]
        8,1    0  3046.491242   305  I RBS 6367 + 8 [kjournald]
        8,1    0  3046.491251   305  D  WB 6367 + 8 [kjournald]
        8,1    0  3046.491610   305  U  WS [kjournald] 1
        8,1    0  3046.511914     0  C  RS 6367 + 8 [6367]
      [root@f10-1 ~]#
      
      Using the ftrace standard format allows more flexibility, such
      as the ability to ask for backtraces via trace_options:
      
      [root@f10-1 ~]# echo noblk_classic > /t/trace_options
      [root@f10-1 ~]# echo stacktrace > /t/trace_options
      
      [root@f10-1 ~]# cat /t/trace
             kjournald-305   [000]  3318.826779:   8,1    A WBS 6375 + 8 <- (8,1) 6312
             kjournald-305   [000]  3318.826782:
       <= submit_bio
       <= submit_bh
       <= sync_dirty_buffer
       <= journal_commit_transaction
       <= kjournald
       <= kthread
       <= child_rip
             kjournald-305   [000]  3318.826836:   8,1    Q   R 6375 + 8 [kjournald]
             kjournald-305   [000]  3318.826837:
       <= generic_make_request
       <= submit_bio
       <= submit_bh
       <= sync_dirty_buffer
       <= journal_commit_transaction
       <= kjournald
       <= kthread
      
      Please read the ftrace documentation to use additional, standardized
      tracing filters such as /d/tracing/trace_cpumask, etc.
      
      See also /d/tracing/trace_mark to add comments to the trace stream,
      which is equivalent to the /d/block/sdaN/msg interface.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: add ftrace_vprintk · 9011262a
      Arnaldo Carvalho de Melo authored
      Impact: new helper function
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kmemtrace: fix printk format warnings · cc2f6d90
      Randy Dunlap authored
      Fix kmemtrace printk warnings:
      
        kernel/trace/kmemtrace.c:142: warning: format '%4ld' expects type 'long int', but argument 3 has type 'size_t'
        kernel/trace/kmemtrace.c:147: warning: format '%4ld' expects type 'long int', but argument 3 has type 'size_t'
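      
      For reference, a warning of this kind goes away when the conversion
      matches the argument type. The sketch below uses the standard C99
      `z` length modifier for size_t; the exact specifier chosen by the
      patch is not shown here, and demo_format_size() is a made-up helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a size_t with a minimum field width of 4.
 * "%4ld" expects a long; "%4zu" matches size_t on every ABI.
 */
int demo_format_size(char *out, size_t len, size_t bytes)
{
	assert(out != NULL);
	return snprintf(out, len, "%4zu", bytes);
}
```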
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 23 Jan, 2009 3 commits
    • tracing/function-graph-tracer: various fixes and features · 9005f3eb
      Frederic Weisbecker authored
      This patch brings various bugfixes:
      
      - Drop the first irrelevant task switch at the very beginning of a trace.
      
      - Drop the OVERHEAD word from the headers; the DURATION word is sufficient
        and will not overlap other columns.
      
      - Make the headers line up with their respective columns whatever
        options are selected.
      
      E.g., with the default options:
      
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
      
        1)   0.646 us    |                    }
        1)               |                    mem_cgroup_del_lru_list() {
        1)   0.624 us    |                      lookup_page_cgroup();
        1)   1.970 us    |                    }
      
       echo funcgraph-proc > trace_options
      
       # tracer: function_graph
       #
       # CPU  TASK/PID        DURATION                  FUNCTION CALLS
       # |    |    |           |   |                     |   |   |   |
      
        0)   bash-2937    |   0.895 us    |                }
        0)   bash-2937    |   0.888 us    |                __rcu_read_unlock();
        0)   bash-2937    |   0.864 us    |                conv_uni_to_pc();
        0)   bash-2937    |   1.015 us    |                __rcu_read_lock();
      
       echo nofuncgraph-cpu > trace_options
       echo nofuncgraph-proc > trace_options
      
       # tracer: function_graph
       #
       #   DURATION                  FUNCTION CALLS
       #    |   |                     |   |   |   |
      
         3.752 us    |                  native_pud_val();
         0.616 us    |                  native_pud_val();
         0.624 us    |                  native_pmd_val();
      
      As for features, one can now disable the duration (this also hides the
      overhead, for convenience and because one doesn't need the overhead
      without the duration):
      
       echo nofuncgraph-duration > trace_options
      
       # tracer: function_graph
       #
       #                FUNCTION CALLS
       #                |   |   |   |
      
                 cap_vm_enough_memory() {
                   __vm_enough_memory() {
                     vm_acct_memory();
                   }
                 }
               }
      
      And finally, an option to print the absolute time:
      
       //Restart from default options
       echo funcgraph-abstime > trace_options
      
       # tracer: function_graph
       #
       #      TIME       CPU  DURATION                  FUNCTION CALLS
       #       |         |     |   |                     |   |   |   |
      
         261.339774 |   1) + 42.823 us   |    }
         261.339775 |   1)   1.045 us    |    _spin_lock_irq();
         261.339777 |   1)   0.940 us    |    _spin_lock_irqsave();
         261.339778 |   1)   0.752 us    |    _spin_unlock_irqrestore();
         261.339780 |   1)   0.857 us    |    _spin_unlock_irq();
         261.339782 |   1)               |    flush_to_ldisc() {
         261.339783 |   1)               |      tty_ldisc_ref() {
         261.339783 |   1)               |        tty_ldisc_try() {
         261.339784 |   1)   1.075 us    |          _spin_lock_irqsave();
         261.339786 |   1)   0.842 us    |          _spin_unlock_irqrestore();
         261.339788 |   1)   4.211 us    |        }
         261.339788 |   1)   5.662 us    |      }
      
      The format is seconds.usecs.
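      
      A minimal sketch of producing that seconds.usecs form from a
      nanosecond timestamp, assuming the timestamp is a 64-bit nanosecond
      count. demo_format_abstime() and the field widths are illustrative,
      not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a nanosecond timestamp as seconds.usecs, dropping sub-usec digits. */
int demo_format_abstime(char *out, size_t len, unsigned long long ns)
{
	unsigned long long usecs_total = ns / 1000ULL;
	unsigned long long secs = usecs_total / 1000000ULL;
	unsigned long usecs = (unsigned long)(usecs_total % 1000000ULL);

	assert(out != NULL);
	return snprintf(out, len, "%5llu.%06lu", secs, usecs);
}
```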
      
      I guess no one needs nanosecond precision here; the main goal is to have
      an overview of the general timings of events, and to see the place where
      the trace switches from one cpu to another.
      
      E.g.:
      
         274.874760 |   1)   0.676 us    |      _spin_unlock();
         274.874762 |   1)   0.609 us    |      native_load_sp0();
         274.874763 |   1)   0.602 us    |      native_load_tls();
         274.878739 |   0)   0.722 us    |                  }
         274.878740 |   0)   0.714 us    |                  native_pmd_val();
         274.878741 |   0)   0.730 us    |                  native_pmd_val();
      
      Here there is a gap of roughly 4000 usecs when we switch cpus.
      
      Changes in V2:
      
      - Completely fix the first pointless task switch.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • trace: fix logic to start/stop counting · b06a8301
      Steven Rostedt authored
      The logic in the tracing_start/stop code prevents the WARN_ON
      from ever detecting if a start/stop pair was mismatched.
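      
      The commit text does not show the code, so the following is only a
      hypothetical userspace sketch of a nesting stop counter in which an
      unmatched start can actually be detected. demo_warned stands in for
      the WARN_ON, and all demo_* names are invented.

```c
#include <assert.h>
#include <stdbool.h>

int demo_stop_count;            /* number of outstanding stops */
int demo_warned;                /* stand-in for WARN_ON firing */
bool demo_tracing_enabled = true;

void demo_tracing_stop(void)
{
	assert(demo_stop_count >= 0);
	if (demo_stop_count++ == 0)
		demo_tracing_enabled = false;   /* first stop disables tracing */
}

void demo_tracing_start(void)
{
	/* Decrement first, so a start without a stop goes negative. */
	if (--demo_stop_count) {
		if (demo_stop_count < 0) {
			demo_warned++;          /* mismatched start/stop pair */
			demo_stop_count = 0;
		}
		return;                         /* still stopped, or mismatch */
	}
	demo_tracing_enabled = true;            /* last stop was undone */
}
```

      Testing the counter for non-zero before decrementing would never let
      it go negative, which is one way such a check can become dead code.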
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • trace: remove internal irqsoff disabling for trace output · 94523e81
      Steven Rostedt authored
      Impact: cleanup of duplicate features
      
      The trace output disables the ring buffer and prevents tracing from
      occurring. The code in irqsoff that does the same thing is no longer
      needed. This patch removes it.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 22 Jan, 2009 3 commits