1. 12 11月, 2008 7 次提交
    • S
      tracing: profile likely and unlikely annotations · 1f0d69a9
      Steven Rostedt 提交于
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. When the condition is hit, the hit and misses of that condition
      are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see by the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
      that were more than 25%.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, where I picked up
        the following ideas from:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feed back on this patch.
      
      (*) Not ever unlikely is recorded, those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f0d69a9
    • S
      ring-buffer: fix deadlock from reader_lock in read_start · 642edba5
      Steven Rostedt 提交于
      Impact: deadlock fix in ring_buffer_read_start
      
      The ring_buffer_iter_reset was called from ring_buffer_read_start
      where both grabbed the reader_lock.
      
      This patch separates out the internals of ring_buffer_iter_reset
      to its own function so that both APIs may grab the reader_lock.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      642edba5
    • F
      tracing/fastboot: Use the ring-buffer timestamp for initcall entries · 74239072
      Frederic Weisbecker 提交于
      Impact: Split the boot tracer entries in two parts: call and return
      
      Now that we are using the sched tracer from the boot tracer, we want
      to use the same timestamp than the ring-buffer to have consistent time
      captures between sched events and initcall events.
      
      So we get rid of the old time capture by the boot tracer and split the
      initcall events in two parts: call and return. This way we have the
      ring buffer timestamp of both.
      
      An example trace:
      
      [   27.904149584] calling  net_ns_init+0x0/0x1c0 @ 1
      [   27.904429624] initcall net_ns_init+0x0/0x1c0 returned 0 after 0 msecs
      [   27.904575926] calling  reboot_init+0x0/0x20 @ 1
      [   27.904655399] initcall reboot_init+0x0/0x20 returned 0 after 0 msecs
      [   27.904800228] calling  sysctl_init+0x0/0x30 @ 1
      [   27.905142914] initcall sysctl_init+0x0/0x30 returned 0 after 0 msecs
      [   27.905287211] calling  ksysfs_init+0x0/0xb0 @ 1
       ##### CPU 0 buffer started ####
                  init-1     [000]    27.905395:      1:120:R   + [001]    11:115:S
       ##### CPU 1 buffer started ####
                <idle>-0     [001]    27.905425:      0:140:R ==> [001]    11:115:R
                  init-1     [000]    27.905426:      1:120:D ==> [000]     0:140:R
                <idle>-0     [000]    27.905431:      0:140:R   + [000]     4:115:S
                <idle>-0     [000]    27.905451:      0:140:R ==> [000]     4:115:R
           ksoftirqd/0-4     [000]    27.905456:      4:115:S ==> [000]     0:140:R
                 udevd-11    [001]    27.905458:     11:115:R   + [001]    14:115:R
                <idle>-0     [000]    27.905459:      0:140:R   + [000]     4:115:S
                <idle>-0     [000]    27.905462:      0:140:R ==> [000]     4:115:R
                 udevd-11    [001]    27.905462:     11:115:R ==> [001]    14:115:R
           ksoftirqd/0-4     [000]    27.905467:      4:115:S ==> [000]     0:140:R
                <idle>-0     [000]    27.905470:      0:140:R   + [000]     4:115:S
                <idle>-0     [000]    27.905473:      0:140:R ==> [000]     4:115:R
           ksoftirqd/0-4     [000]    27.905476:      4:115:S ==> [000]     0:140:R
                <idle>-0     [000]    27.905479:      0:140:R   + [000]     4:115:S
                <idle>-0     [000]    27.905482:      0:140:R ==> [000]     4:115:R
           ksoftirqd/0-4     [000]    27.905486:      4:115:S ==> [000]     0:140:R
                 udevd-14    [001]    27.905499:     14:120:X ==> [001]    11:115:R
                 udevd-11    [001]    27.905506:     11:115:R   + [000]     1:120:D
                <idle>-0     [000]    27.905515:      0:140:R ==> [000]     1:120:R
                 udevd-11    [001]    27.905517:     11:115:S ==> [001]     0:140:R
      [   27.905557107] initcall ksysfs_init+0x0/0xb0 returned 0 after 3906 msecs
      [   27.905705736] calling  init_jiffies_clocksource+0x0/0x10 @ 1
      [   27.905779239] initcall init_jiffies_clocksource+0x0/0x10 returned 0 after 0 msecs
      [   27.906769814] calling  pm_init+0x0/0x30 @ 1
      [   27.906853627] initcall pm_init+0x0/0x30 returned 0 after 0 msecs
      [   27.906997803] calling  pm_disk_init+0x0/0x20 @ 1
      [   27.907076946] initcall pm_disk_init+0x0/0x20 returned 0 after 0 msecs
      [   27.907222556] calling  swsusp_header_init+0x0/0x30 @ 1
      [   27.907294325] initcall swsusp_header_init+0x0/0x30 returned 0 after 0 msecs
      [   27.907439620] calling  stop_machine_init+0x0/0x50 @ 1
                  init-1     [000]    27.907485:      1:120:R   + [000]     2:115:S
                  init-1     [000]    27.907490:      1:120:D ==> [000]     2:115:R
              kthreadd-2     [000]    27.907507:      2:115:R   + [001]    15:115:R
                <idle>-0     [001]    27.907517:      0:140:R ==> [001]    15:115:R
              kthreadd-2     [000]    27.907517:      2:115:D ==> [000]     0:140:R
                <idle>-0     [000]    27.907521:      0:140:R   + [000]     4:115:S
                <idle>-0     [000]    27.907524:      0:140:R ==> [000]     4:115:R
                 udevd-15    [001]    27.907527:     15:115:D   + [000]     2:115:D
           ksoftirqd/0-4     [000]    27.907537:      4:115:S ==> [000]     2:115:R
                 udevd-15    [001]    27.907537:     15:115:D ==> [001]     0:140:R
              kthreadd-2     [000]    27.907546:      2:115:R   + [000]     1:120:D
              kthreadd-2     [000]    27.907550:      2:115:S ==> [000]     1:120:R
                  init-1     [000]    27.907584:      1:120:R   + [000]    15:  0:D
                  init-1     [000]    27.907589:      1:120:R   + [000]     2:115:S
                  init-1     [000]    27.907593:      1:120:D ==> [000]    15:  0:R
                 udevd-15    [000]    27.907601:     15:  0:S ==> [000]     2:115:R
       ##### CPU 0 buffer started ####
              kthreadd-2     [000]    27.907616:      2:115:R   + [001]    16:115:R
       ##### CPU 1 buffer started ####
                <idle>-0     [001]    27.907620:      0:140:R ==> [001]    16:115:R
              kthreadd-2     [000]    27.907621:      2:115:D ==> [000]     0:140:R
                 udevd-16    [001]    27.907625:     16:115:D   + [000]     2:115:D
                <idle>-0     [000]    27.907628:      0:140:R   + [000]     4:115:S
                 udevd-16    [001]    27.907629:     16:115:D ==> [001]     0:140:R
                <idle>-0     [000]    27.907631:      0:140:R ==> [000]     4:115:R
           ksoftirqd/0-4     [000]    27.907636:      4:115:S ==> [000]     2:115:R
              kthreadd-2     [000]    27.907644:      2:115:R   + [000]     1:120:D
              kthreadd-2     [000]    27.907647:      2:115:S ==> [000]     1:120:R
                  init-1     [000]    27.907657:      1:120:R   + [001]    16:  0:D
                <idle>-0     [001]    27.907666:      0:140:R ==> [001]    16:  0:R
      [   27.907703862] initcall stop_machine_init+0x0/0x50 returned 0 after 0 msecs
      [   27.907850704] calling  filelock_init+0x0/0x30 @ 1
      [   27.907926573] initcall filelock_init+0x0/0x30 returned 0 after 0 msecs
      [   27.908071327] calling  init_script_binfmt+0x0/0x10 @ 1
      [   27.908165195] initcall init_script_binfmt+0x0/0x10 returned 0 after 0 msecs
      [   27.908309461] calling  init_elf_binfmt+0x0/0x10 @ 1
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      74239072
    • F
      tracing/fastboot: move boot tracer structs and funcs into their own header. · 3f5ec136
      Frederic Weisbecker 提交于
      Impact: Cleanups on the boot tracer and ftrace
      
      This patch bring some cleanups about the boot tracer headers. The
      functions and structures of this tracer have nothing related to ftrace
      and should have so their own header file.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3f5ec136
    • S
      ring-buffer: clean up warn ons · 3e89c7bb
      Steven Rostedt 提交于
      Impact: Restructure WARN_ONs in ring_buffer.c
      
      The current WARN_ON macros in ring_buffer.c are quite ugly.
      
      This patch cleans them up and uses a single RB_WARN_ON that returns
      the value of the condition. This allows the caller to abort the
      function if the condition is true.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3e89c7bb
    • S
      ring-buffer: buffer record on/off switch · a3583244
      Steven Rostedt 提交于
      Impact: enable/disable ring buffer recording API added
      
      Several kernel developers have requested that there be a way to stop
      recording into the ring buffers with a simple switch that can also
      be enabled from userspace. This patch addes a new kernel API to the
      ring buffers called:
      
       tracing_on()
       tracing_off()
      
      When tracing_off() is called, all ring buffers will not be able to record
      into their buffers.
      
      tracing_on() will enable the ring buffers again.
      
      These two act like an on/off switch. That is, there is no counting of the
      number of times tracing_off or tracing_on has been called.
      
      A new file is added to the debugfs/tracing directory called
      
        tracing_on
      
      This allows for userspace applications to also flip the switch.
      
        echo 0 > debugfs/tracing/tracing_on
      
      disables the tracing.
      
        echo 1 > /debugfs/tracing/tracing_on
      
      enables it.
      
      Note, this does not disable or enable any tracers. It only sets or clears
      a flag that needs to be set in order for the ring buffers to write to
      their buffers. It is a global flag, and affects all ring buffers.
      
      The buffers start out with tracing_on enabled.
      
      There are now three flags that control recording into the buffers:
      
       tracing_on: which affects all ring buffer tracers.
      
       buffer->record_disabled: which affects an allocated buffer, which may be set
           if an anomaly is detected, and tracing is disabled.
      
       cpu_buffer->record_disabled: which is set by tracing_stop() or if an
           anomaly is detected. tracing_start can not reenable this if
           an anomaly occurred.
      
      The userspace debugfs/tracing/tracing_enabled is implemented with
      tracing_stop() but the user space code can not enable it if the kernel
      called tracing_stop().
      
      Userspace can enable the tracing_on even if the kernel disabled it.
      It is just a switch used to stop tracing if a condition was hit.
      tracing_on is not for protecting critical areas in the kernel nor is
      it for stopping tracing if an anomaly occurred. This is because userspace
      can reenable it at any time.
      
      Side effect: With this patch, I discovered a dead variable in ftrace.c
        called tracing_on. This patch removes it.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      a3583244
    • S
      ring-buffer: add reader lock · f83c9d0f
      Steven Rostedt 提交于
      Impact: serialize reader accesses to individual CPU ring buffers
      
      The code in the ring buffer expects only one reader at a time, but currently
      it puts that requirement on the caller. This is not strong enough, and this
      patch adds a "reader_lock" that serializes the access to the reader API
      of the ring buffer.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f83c9d0f
  2. 11 11月, 2008 5 次提交
    • F
      tracing: add a tracer to catch execution time of kernel functions · 15e6cb36
      Frederic Weisbecker 提交于
      Impact: add new tracing plugin which can trace full (entry+exit) function calls
      
      This tracer uses the low level function return ftrace plugin to
      measure the execution time of the kernel functions.
      
      The first field is the caller of the function, the second is the
      measured function, and the last one is the execution time in
      nanoseconds.
      
      - v3:
      
      - HAVE_FUNCTION_RET_TRACER have been added. Each arch that support ftrace return
        should enable it.
      - ftrace_return_stub becomes ftrace_stub.
      - CONFIG_FUNCTION_RET_TRACER depends now on CONFIG_FUNCTION_TRACER
      - Return traces printing can be used for other tracers on trace.c
      - Adapt to the new tracing API (no more ctrl_update callback)
      - Correct the check of "disabled" during insertion.
      - Minor changes...
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      15e6cb36
    • S
      ring-buffer: replace most bug ons with warn on and disable buffer · f536aafc
      Steven Rostedt 提交于
      This patch replaces most of the BUG_ONs in the ring_buffer code with
      RB_WARN_ON variants. It adds some more variants as needed for the
      replacement. This lets the buffer die nicely and still warn the user.
      
      One BUG_ON remains in the code, and that is because it detects a
      bad pointer passed in by the calling function, and not a bug by
      the ring buffer code itself.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f536aafc
    • S
      ftrace: prevent ftrace_special from recursion · 5aa1ba6a
      Steven Rostedt 提交于
      Impact: stop ftrace_special from recursion
      
      The ftrace_special is used to help debug areas of the kernel.
      Because of this, if it is put in certain locations, the fact that
      it allows recursion can become a problem if the kernel developer
      using does not realize that.
      
      This patch changes ftrace_special to not allow recursion into itself
      to make it more robust.
      
      It also changes from preempt disable interrupts disable to prevent
      any loss of trace entries.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5aa1ba6a
    • S
      ring-buffer: prevent infinite looping on time stamping · 4143c5cb
      Steven Rostedt 提交于
      Impact: removal of unnecessary looping
      
      The lockless part of the ring buffer allows for reentry into the code
      from interrupts. A timestamp is taken, a test is preformed and if it
      detects that an interrupt occurred that did tracing, it tries again.
      
      The problem arises if the timestamp code itself causes a trace.
      The detection will detect this and loop again. The difference between
      this and an interrupt doing tracing, is that this will fail every time,
      and cause an infinite loop.
      
      Currently, we test if the loop happens 1000 times, and if so, it will
      produce a warning and disable the ring buffer.
      
      The problem with this approach is that it makes it difficult to perform
      some types of tracing (tracing the timestamp code itself).
      
      Each trace entry has a delta timestamp from the previous entry.
      If a trace entry is reserved but and interrupt occurs and traces before
      the previous entry is commited, the delta timestamp for that entry will
      be zero. This actually makes sense in terms of tracing, because the
      interrupt entry happened before the preempted entry was commited, so
      one may consider the two happening at the same time. The order is
      still preserved in the buffer.
      
      With this idea, instead of trying to get a new timestamp if an interrupt
      made it in between the timestamp and the test, the entry could simply
      make the delta zero and continue. This will prevent interrupts or
      tracers in the timer code from causing the above loop.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      4143c5cb
    • S
      ftrace: disable tracing on resize · bf5e6519
      Steven Rostedt 提交于
      Impact: fix for bug on resize
      
      This patch addresses the bug found here:
      
       http://bugzilla.kernel.org/show_bug.cgi?id=11996
      
      When ftrace converted to the new unified trace buffer, the resizing of
      the buffer was not protected as much as it was originally. If tracing
      is performed while the resize occurs, then the buffer can be corrupted.
      
      This patch disables all ftrace buffer modifications before a resize
      takes place.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      bf5e6519
  3. 08 11月, 2008 9 次提交
    • S
      ftrace: display start of CPU buffer in trace output · a309720c
      Steven Rostedt 提交于
      Impact: change in trace output
      
      Because the trace buffers are per cpu ring buffers, the start of
      the trace can be confusing. If one CPU is very active at the
      end of the trace, its history will not go as far back as the
      other CPU traces.  This means that output for a particular CPU
      may not appear for the first part of a trace.
      
      To help annotate what is happening, and to prevent any more
      confusion, this patch adds a line that annotates the start of
      a CPU buffer output.
      
      For example:
      
             automount-3495  [001]   184.596443: dnotify_parent <-vfs_write
      [...]
             automount-3495  [001]   184.596449: dput <-path_put
             automount-3496  [002]   184.596450: down_read_trylock <-do_page_fault
      [...]
                 sshd-3497  [001]   184.597069: up_read <-do_page_fault
                <idle>-0     [000]   184.597074: __exit_idle <-exit_idle
      [...]
             automount-3496  [002]   184.597257: filemap_fault <-__do_fault
                <idle>-0     [003]   184.597261: exit_idle <-smp_apic_timer_interrupt
      
      Note, parsers of a trace output should always ignore any lines that
      start with a '#'.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a309720c
    • S
      ftrace: force pass of preemptoff selftest · 769c48eb
      Steven Rostedt 提交于
      Impact: preemptoff not tested in selftest
      
      Due to the BKL not being preemptable anymore, the selftest of the
      preemptoff code can not be tested. It requires that it is called
      with preemption enabled, but since the BKL is held, that is no
      longer the case.
      
      This patch simply skips those tests if it detects that the context
      is not preemptable. The following will now show up in the tests:
      
      Testing tracer preemptoff: can not test ... force PASSED
      Testing tracer preemptirqsoff: can not test ... force PASSED
      
      When the BKL is removed, or it becomes preemptable once again, then
      the tests will be performed.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      769c48eb
    • S
      ftrace: remove trace array ctrl · c76f0694
      Steven Rostedt 提交于
      Impact: remove obsolete variable in trace_array structure
      
      With the new start / stop method of ftrace, the ctrl variable
      in the trace_array structure is now obsolete. Remove it.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c76f0694
    • S
      ftrace: remove ctrl_update method · bbf5b1a0
      Steven Rostedt 提交于
      Impact: Remove the ctrl_update tracer method
      
      With the new quick start/stop method of tracing, the ctrl_update
      method is out of date.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bbf5b1a0
    • S
      ftrace: enable trace_printk by default · 49833fc2
      Steven Rostedt 提交于
      Impact: have the ftrace_printk enabled on startup
      
      It is confusing to have to "echo trace_printk > /debug/tracing/iter_ctrl"
      after adding ftrace_printk in the kernel.
      
      Currently the trace_printk is set to off by default. ftrace_printk
      should only be in open kernel code when used for debugging, and thus
      it should be enabled by default.
      
      It may also be used to record data within a tracer, but those ftrace_printks
      should be within wrappers that are either enabled by trace_points or
      have a variable protecting the code path from being entered when the
      tracer is disabled.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      49833fc2
    • S
      ftrace: irqsoff tracer incorrect reset · 45193170
      Steven Rostedt 提交于
      Impact: fix to irqsoff tracer output
      
      In converting to the new start / stop ftrace handling, the irqsoff
      tracer start called the irqsoff reset function. irqsoff tracer is
      not the same as the other traces, and it resets the buffers while
      searching for the longest latency.
      
      The reset that the irqsoff stop method calls disables the function
      tracing. That means that, by starting the tracer, the function
      tracer is disabled incorrectly.
      
      This patch simply removes the call to reset which keeps the function
      tracing enabled. Reset is not needed for the irqsoff stop method.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      45193170
    • S
      ftrace: fix sched_switch API · e168e051
      Steven Rostedt 提交于
      Impact: fix for sched_switch that broke dynamic ftrace startup
      
      The commit: tracing/fastboot: use sched switch tracer from boot tracer
      broke the API of the sched_switch trace. The use of the
      tracing_start/stop_cmdline record is for only recording the cmdline,
      NOT recording the schedule switches themselves.
      
      Seeing that the boot tracer broke the API to do something that it
      wanted, this patch adds a new interface for the API while
      puting back the original interface of the old API.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e168e051
    • S
      ftrace: fix boot trace sched startup · 75f5c47d
      Steven Rostedt 提交于
      Impact: boot tracer startup modified
      
      The boot tracer calls into some of the schedule tracing private functions
      that should not be exported. This patch cleans it up, and makes
      way for further changes in the ftrace infrastructure.
      
      This patch adds a api to assign a tracer array to the schedule
      context switch tracer.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      75f5c47d
    • S
      ftrace: fix set_ftrace_filter · 0183fb1c
      Steven Rostedt 提交于
      Impact: fix of output of set_ftrace_filter
      
      Commit ftrace: do not show freed records in available_filter_functions
      
      Removed a bit too much from the set_ftrace_filter code, where we now see
      all functions in the set_ftrace_filter file even when we set a filter.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0183fb1c
  4. 06 11月, 2008 4 次提交
    • S
      ring-buffer: convert to raw spinlocks · 3e03fb7f
      Steven Rostedt 提交于
      Impact: no lockdep debugging of ring buffer
      
      The problem with running lockdep on the ring buffer is that the
      ring buffer is the core infrastructure of ftrace. What happens is
      that the tracer will start tracing the lockdep code while lockdep
      is testing the ring buffers locks.  This can cause lockdep to
      fail due to testing cases that have not fully finished their
      locking transition.
      
      This patch converts the spin locks used by the ring buffer back
      into raw spin locks which lockdep does not check.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3e03fb7f
    • S
      ftrace: restructure tracing start/stop infrastructure · 9036990d
      Steven Rostedt 提交于
      Impact: change where tracing is started up and stopped
      
      Currently, when a new tracer is selected via echo'ing a tracer name into
      the current_tracer file, the startup is only done if tracing_enabled is
      set to one. If tracing_enabled is changed to zero (by echo'ing 0 into
      the tracing_enabled file) a full shutdown is performed.
      
      The full startup and shutdown of a tracer can be expensive and the
      user can lose out traces when echo'ing in 0 to the tracing_enabled file,
      because the process takes too long. There can also be places that
      the user would like to start and stop the tracer several times and
      doing the full startup and shutdown of a tracer might be too expensive.
      
      This patch performs the full startup and shutdown when a tracer is
      selected. It also adds a way to do a quick start or stop of a tracer.
      The quick version is just a flag that prevents the tracing from
      taking place, but the overhead of the code is still there.
      
      For example, the startup of a tracer may enable tracepoints, or enable
      the function tracer.  The stop and start will just set a flag to
      have the tracer ignore the calls when the tracepoint or function trace
      is called.  The overhead of the tracer may still be present when
      the tracer is stopped, but no tracing will occur. Setting the tracer
      to the 'nop' tracer (or any other tracer) will perform the shutdown
      of the tracer which will disable the tracepoint or disable the
      function tracer.
      
      The tracing_enabled file will simply start or stop tracing.
      
      This change is all internal. The end result for the user should be the same
      as before. If tracing_enabled is not set, no trace will happen.
      If tracing_enabled is set, then the trace will happen. The tracing_enabled
      variable is static between tracers. Enabling  tracing_enabled and
      going to another tracer will keep tracing_enabled enabled. Same
      is true with disabling tracing_enabled.
      
      This patch will now provide a fast start/stop method to the users
      for enabling or disabling tracing.
      
      Note: There were two methods to the struct tracer that were never
       used: The methods start and stop. These were to be used as a hook
       to the reading of the trace output, but ended up not being
       necessary. These two methods are now used to enable the start
       and stop of each tracer, in case the tracer needs to do more than
       just not write into the buffer. For example, the irqsoff tracer
       must stop recording max latencies when tracing is stopped.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9036990d
    • S
      ftrace: soft tracing stop and start · 0f048701
      Steven Rostedt 提交于
      Impact: add way to quickly start stop tracing from the kernel
      
      This patch adds a soft stop and start to the trace. This simply
      disables function tracing via the ftrace_disabled flag, and
      disables the trace buffers to prevent recording. The tracing
      code may still be executed, but the trace will not be recorded.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0f048701
    • S
      ftrace: add quick function trace stop · 60a7ecf4
      Steven Rostedt 提交于
      Impact: quick start and stop of function tracer
      
      This patch adds a way to disable the function tracer quickly without
      the need to run kstop_machine. It adds a new variable called
      function_trace_stop which will stop the calls to functions from mcount
      when set.  This is just an on/off switch and does not handle recursion
      like preempt_disable().
      
      It's main purpose is to help other tracers/debuggers start and stop tracing
      fuctions without the need to call kstop_machine.
      
      The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs
      that implement the testing of the function_trace_stop in the mcount
      arch dependent code. Otherwise, the test is done in the C code.
      
      x86 is the only arch at the moment that supports this.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60a7ecf4
  5. 05 11月, 2008 7 次提交
  6. 04 11月, 2008 3 次提交
    • S
      ftrace: function tracer with irqs disabled · b2a866f9
      Steven Rostedt 提交于
      Impact: disable interrupts during trace entry creation (as opposed to preempt)
      
      To help with performance, I set the ftracer to not disable interrupts,
      and only to disable preemption. If an interrupt occurred, it would not
      be traced, because the function tracer protects itself from recursion.
      This may be faster, but the trace output might miss some traces.
      
      This patch makes the fuction trace disable interrupts, but it also
      adds a runtime feature to disable preemption instead. It does this by
      having two different tracer functions. When the function tracer is
      enabled, it will check to see which version is requested (irqs disabled
      or preemption disabled). Then it will use the corresponding function
      as the tracer.
      
      Irq disabling is the default behaviour, but if the user wants better
      performance, with the chance of missing traces, then they can choose
      the preempt disabled version.
      
      Running hackbench 3 times with the irqs disabled and 3 times with
      the preempt disabled function tracer yielded:
      
      tracing type       times            entries recorded
      ------------      --------          ----------------
      irq disabled      43.393            166433066
                        43.282            166172618
                        43.298            166256704
      
      preempt disabled  38.969            159871710
                        38.943            159972935
                        39.325            161056510
      
      Average:
      
         irqs disabled:  43.324           166287462
      preempt disabled:  39.079           160300385
      
       preempt is 10.8 percent faster than irqs disabled.
      
      I wrote a patch to count function trace recursion and reran hackbench.
      
      With irq disabled: 1,150 times the function tracer did not trace due to
        recursion.
      with preempt disabled: 5,117,718 times.
      
      The thousand times with irq disabled could be due to NMIs, or simply a case
      where it called a function that was not protected by notrace.
      
      But we also see that a large amount of the trace is lost with the
      preempt version.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b2a866f9
    • S
      ftrace: insert in the ftrace_preempt_disable()/enable() functions · 182e9f5f
      Steven Rostedt 提交于
      Impact: use new, consolidated APIs in ftrace plugins
      
      This patch replaces the schedule safe preempt disable code with the
      ftrace_preempt_disable() and ftrace_preempt_enable() safe functions.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      182e9f5f
    • S
      ftrace: introduce ftrace_preempt_disable()/enable() · 8f0a056f
      Steven Rostedt 提交于
      Impact: add new ftrace-plugin internal APIs
      
      Parts of the tracer needs to be careful about schedule recursion.
      If the NEED_RESCHED flag is set, a preempt_enable will call schedule.
      Inside the schedule function, the NEED_RESCHED flag is cleared.
      
      The problem arises when a trace happens in the schedule function but before
      NEED_RESCHED is cleared. The race is as follows:
      
      schedule()
        >> tracer called
      
          trace_function()
             preempt_disable()
             [ record trace ]
             preempt_enable()  <<- here's the issue.
      
               [check NEED_RESCHED]
                schedule()
                [ Repeat the above, over and over again ]
      
      The naive approach is simply to use preempt_enable_no_schedule instead.
      The problem with that approach is that, although we solve the schedule
      recursion issue, we now might lose a preemption check when not in the
      schedule function.
      
        trace_function()
          preempt_disable()
          [ record trace ]
          [Interrupt comes in and sets NEED_RESCHED]
          preempt_enable_no_resched()
          [continue without scheduling]
      
      The way ftrace handles this problem is with the following approach:
      
      	int resched;
      
      	resched = need_resched();
      	preempt_disable_notrace();
      	[record trace]
      	if (resched)
      		preempt_enable_no_sched_notrace();
      	else
      		preempt_enable_notrace();
      
      This may seem like the opposite of what we want. If resched is set
      then we call the "no_sched" version??  The reason we do this is because
      if NEED_RESCHED is set before we disable preemption, there's two reasons
      for that:
      
        1) we are in an atomic code path
        2) we are already on our way to the schedule function, and maybe even
           in the schedule function, but have yet to clear the flag.
      
      Both the above cases we do not want to schedule.
      
      This solution has already been implemented within the ftrace infrastructure.
      But the problem is that it has been implemented several times. This patch
      encapsulates this code to two nice functions.
      
        resched = ftrace_preempt_disable();
        [ record trace]
        ftrace_preempt_enable(resched);
      
      This way the tracers do not need to worry about getting it right.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8f0a056f
  7. 03 11月, 2008 4 次提交
  8. 31 10月, 2008 1 次提交