1. 09 Feb 2009 (1 commit)
  2. 08 Feb 2009 (1 commit)
    • ring-buffer: add NMI protection for spinlocks · 78d904b4
      Authored by Steven Rostedt
      Impact: prevent deadlock in NMI
      
      The ring buffers are not yet totally lockless for writing to
      the buffer. When a writer crosses a page, it grabs a per cpu spinlock
      to protect against a reader. The spinlocks taken by a writer are not
      there to protect against other writers, since a writer can only write to
      its own per cpu buffer. The spinlocks protect against readers that
      can touch any cpu buffer. The writers are made to be reentrant,
      with the spinlocks disabling interrupts.
      
      The problem arises when an NMI writes to the buffer, and that write
      crosses a page boundary. If it grabs a spinlock, it can be racing
      with another writer (since disabling interrupts does not protect
      against NMIs) or with a reader on the same CPU. Luckily, most of the
      users are not reentrant and so protect against this issue. But if a
      user of the ring buffer becomes reentrant (which the ring buffers do
      allow) and the NMI also writes to the ring buffer, then we risk
      a deadlock.
      
      This patch moves the ftrace_nmi_enter called by nmi_enter() to the
      ring buffer code. The ftrace_nmi_enter that is used by arch specific
      code is renamed to arch_ftrace_nmi_enter, and the Kconfig is updated
      to handle it.
      
      When an NMI is called, it will set a per cpu variable in the ring buffer
      code and will clear it when the NMI exits. If a write to the ring buffer
      crosses page boundaries inside an NMI, a trylock is used on the spin
      lock instead. If the spinlock fails to be acquired, then the entry
      is discarded.
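      The NMI-side handling boils down to a per cpu flag plus a trylock path;
      a minimal sketch with illustrative names (these are not the actual kernel
      symbols introduced by the patch):
      
        #include <linux/percpu.h>
        #include <linux/spinlock.h>
      
        static DEFINE_PER_CPU(int, rb_in_nmi);   /* set on NMI entry, cleared on exit */
      
        void rb_nmi_enter(void) { this_cpu_write(rb_in_nmi, 1); }
        void rb_nmi_exit(void)  { this_cpu_write(rb_in_nmi, 0); }
      
        /* Taken by a writer only when its write crosses a page boundary. */
        static int rb_lock_page_cross(raw_spinlock_t *lock)
        {
                if (this_cpu_read(rb_in_nmi))
                        /* In NMI context never spin: try once; on failure the
                           caller discards the entry instead of deadlocking. */
                        return raw_spin_trylock(lock);
      
                raw_spin_lock(lock);
                return 1;       /* non-NMI context: the lock is always acquired */
        }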
      
      This bug appeared in the ftrace work in the RT tree, where event tracing
      is reentrant. This workaround solved the deadlocks that appeared there.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      78d904b4
  3. 03 Feb 2009 (1 commit)
  4. 14 Jan 2009 (1 commit)
    • tracing: add a new workqueue tracer · e1d8aa9f
      Authored by Frederic Weisbecker
      Impact: new tracer
      
      The workqueue tracer provides some statistical information
      about each cpu workqueue thread, such as the number of
      works inserted and executed since their creation. It can help
      to evaluate the amount of work each of them has to perform.
      For example, it can help a developer decide whether to
      choose a per cpu workqueue instead of a singlethreaded one.
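      The per cpu bookkeeping can be pictured roughly like the sketch below
      (illustrative names, not the exact structures of this patch); the tracer
      hooks the insertion and execution paths to bump the counters shown in the
      snapshot further down:
      
        #include <linux/list.h>
        #include <linux/sched.h>
        #include <linux/workqueue.h>
      
        struct cpu_workqueue_stats {
                struct list_head        list;           /* one node per cpu workqueue thread */
                int                     cpu;
                pid_t                   pid;            /* pid of the workqueue thread */
                unsigned int            inserted;       /* works queued since creation */
                unsigned int            executed;       /* works run since creation */
        };
      
        /* Hypothetical lookup of the stats node for a given workqueue thread. */
        static struct cpu_workqueue_stats *lookup_stats(struct task_struct *wq_thread);
      
        /* Probe attached around insert_work(): counts every queued work,
           including workqueue barriers. */
        static void probe_workqueue_insertion(struct task_struct *wq_thread,
                                              struct work_struct *work)
        {
                lookup_stats(wq_thread)->inserted++;
        }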
      
      It only traces statistical information for now, but it will probably later
      provide event tracing too.
      
      Such a tracer could also help, and be improved further, to support the
      development of rt priority sorted workqueues.
      
      To have a snapshot of the workqueues state at any time, just do
      
      cat /debugfs/tracing/trace_stat/workqueues
      
      For example:
      
        1    125        125       reiserfs/1
        1      0          0       scsi_tgtd/1
        1      0          0       aio/1
        1      0          0       ata/1
        1    114        114       kblockd/1
        1      0          0       kintegrityd/1
        1   2147       2147       events/1
      
        0      0          0       kpsmoused
        0    105        105       reiserfs/0
        0      0          0       scsi_tgtd/0
        0      0          0       aio/0
        0      0          0       ata_aux
        0      0          0       ata/0
        0      0          0       cqueue
        0      0          0       kacpi_notify
        0      0          0       kacpid
        0    149        149       kblockd/0
        0      0          0       kintegrityd/0
        0   1000       1000       khelper
        0   2270       2270       events/0
      
      Changes in V2:
      
      - Drop the static array based on NR_CPUS and dynamically allocate the stat array
        with num_possible_cpus() and other cpu mask facilities.
      - Trace workqueue insertion at a slightly lower level (insert_work instead of
        queue_work) to also handle the workqueue barriers.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e1d8aa9f
  5. 11 Jan 2009 (1 commit)
    • trace: mmiotrace to the tracer menu in Kconfig · fe6f90e5
      Authored by Pekka Paalanen
      Impact: cosmetic change in Kconfig menu layout
      
      This patch was originally suggested by Peter Zijlstra, but seems it
      was forgotten.
      
      CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
      directly under the Kernel hacking / debugging menu in the kernel
      configuration system. They were present only for x86 and x86_64.
      
      Other tracers that use the ftrace tracing framework are in their own
      sub-menu. This patch moves the mmiotrace configuration options there.
      Since the Kconfig file, where the tracer menu is, is not architecture
      specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
      x86/x86_64. CONFIG_MMIOTRACE now depends on it.
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      fe6f90e5
  6. 06 Jan 2009 (1 commit)
  7. 30 Dec 2008 (2 commits)
    • tracing/kmemtrace: export kmemtrace_mark_alloc_node() / kmemtrace_mark_free() · 3fd4bc01
      Authored by Ingo Molnar
      Impact: build fix
      
      Also fix up Kconfig dependencies and include files.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3fd4bc01
    • tracing/kmemtrace: normalize the raw tracer event to the unified tracing API · 36994e58
      Authored by Frederic Weisbecker
      Impact: new tracer plugin
      
      This patch adapts kmemtrace raw events tracing to the unified tracing API.
      
      To enable and use this tracer, just do the following:
      
       echo kmemtrace > /debugfs/tracing/current_tracer
       cat /debugfs/tracing/trace
      
      You will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
      type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      
      That was to stay backward compatible with the format output produced in
      linux/tracepoint.h.
      
      This is the default output, but note that I tried something else.
      
      If you change an option:
      
      echo kmem_minimalistic > /debugfs/trace_options
      
      and then cat /debugfs/trace, you will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
         -      C                            0xffff88007c088780          file_free_rcu
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
         +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
         +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
         +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
         +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
         +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp
      
      Yeah I shall confess kmem_minimalistic should be: kmem_alternative.
      
      Whatever, I find it more readable, but this is a personal opinion of course.
      We can drop it if you want.
      
      On the ALLOC/FREE column, + means an allocation and - a free.
      
      On the type column, you have K = kmalloc, C = cache, P = page
      
      I would like the flags to be GFP_* strings, but that would not be easy to do
      without breaking the column layout.
      
      About the node...it seems to always be -1. I don't know why but that shouldn't
      be difficult to find.
      
      I moved linux/tracepoint.h to trace/tracepoint.h as well. I think it would
      be easier to find the tracer headers if they are all in their common
      directory.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      36994e58
  8. 18 Dec 2008 (1 commit)
    • trace: add a way to enable or disable the stack tracer · f38f1d2a
      Authored by Steven Rostedt
      Impact: enhancement to stack tracer
      
      The stack tracer currently is either on when configured in or
      off when it is not. It cannot be disabled when it is configured on
      (other than by disabling the function tracer that it uses).
      
      This patch adds a way to enable or disable the stack tracer at
      run time. It defaults to off at boot, but the kernel parameter 'stacktrace'
      has been added to enable it on bootup.
      
      A new sysctl, "kernel.stack_tracer_enabled", has been added to let
      the user enable or disable the stack tracer at run time.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f38f1d2a
  9. 12 Dec 2008 (1 commit)
  10. 03 Dec 2008 (1 commit)
  11. 26 Nov 2008 (3 commits)
  12. 23 Nov 2008 (3 commits)
    • tracing/stack-tracer: introduce CONFIG_USER_STACKTRACE_SUPPORT · 8d26487f
      Authored by Török Edwin
      Impact: cleanup
      
      User stack tracing is currently implemented only for x86, but it is not x86 specific.
      
      Introduce a generic config flag that is currently enabled only for x86.
      When other arches implement it, they will have to
      select USER_STACKTRACE_SUPPORT.
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8d26487f
    • trace: profile all if conditionals · 2bcd521a
      Authored by Steven Rostedt
      Impact: feature to profile if statements
      
      This patch adds a branch profiler for all if () statements.
      The results will be found in:
      
        /debugfs/tracing/profile_branch
      
      For example:
      
         miss      hit    %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
             0        1 100 x86_64_start_reservations      head64.c             127
             0        1 100 copy_bootdata                  head64.c             69
             1        0   0 x86_64_start_kernel            head64.c             111
            32        0   0 set_intr_gate                  desc.h               319
             1        0   0 reserve_ebda_region            head.c               51
             1        0   0 reserve_ebda_region            head.c               47
             0        1 100 reserve_ebda_region            head.c               42
             0        0   X maxcpus                        main.c               165
      
      Miss means the branch was not taken. Hit means the branch was taken.
      The percent is the percentage of times the branch was taken.
      
      This adds a significant amount of overhead and should only be used
      by those analyzing their system.
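      The way such an all-branches profiler can be wired up is by redefining
      if () itself so that every non-constant condition is evaluated through a
      counting helper; a simplified sketch (the kernel's real macro lives in
      compiler.h and stores the records in a dedicated section that
      profile_branch walks):
      
        struct branch_stat {
                const char      *func;
                const char      *file;
                unsigned int    line;
                unsigned long   hit;            /* times the branch was taken */
                unsigned long   miss;           /* times it was not taken */
        };
      
        #define __profile_cond(cond)                                    \
                (__builtin_constant_p(cond) ? !!(cond) : ({             \
                        static struct branch_stat __stat = {            \
                                .func = __func__,                       \
                                .file = __FILE__,                       \
                                .line = __LINE__,                       \
                        };                                              \
                        int __r = !!(cond);                             \
                        __r ? __stat.hit++ : __stat.miss++;             \
                        __r;                                            \
                }))
      
        /* Every if () in the kernel now funnels through the helper above. */
        #define if(cond) if (__profile_cond(cond))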
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2bcd521a
    • trace: consolidate unlikely and likely profiler · 45b79749
      Authored by Steven Rostedt
      Impact: cleanup to make one profiler for the likely and unlikely tracers
      
      The likely and unlikely profiler prints out the file and line numbers
      of the annotated branches that it is profiling. It shows the number
      of times it was correct or incorrect in its guess. Having two
      different files or sections for that matter to tell us if it was a
      likely or unlikely is pretty pointless. We really only care if
      it was correct or not.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      45b79749
  13. 16 Nov 2008 (1 commit)
    • tracing/function-return-tracer: support for dynamic ftrace on function return tracer · e7d3737e
      Authored by Frederic Weisbecker
      This patch adds support for dynamic tracing on the function return tracer.
      The whole difference with normal dynamic function tracing is that we don't need
      to hook on a particular callback. The only thing we want is to dynamically nop
      or set the calls to ftrace_caller (which is ftrace_return_caller here).
      
      Some security checks ensure that we are not trying to launch dynamic tracing for
      return tracing while normal function tracing is already running.
      
      An example of trace with getnstimeofday set as a filter:
      
      ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e7d3737e
  14. 13 Nov 2008 (1 commit)
  15. 12 Nov 2008 (2 commits)
    • tracing: likely/unlikely branch annotation tracer · 52f232cb
      Authored by Steven Rostedt
      Impact: new likely/unlikely branch tracer
      
      This patch adds a way to record the instances of the likely() and unlikely()
      branch condition annotations.
      
      When "unlikely" is set in /debugfs/tracing/iter_ctrl the unlikely conditions
      will be added to any of the ftrace tracers. The change takes effect when
      a new tracer is passed into the current_tracer file.
      
      For example:
      
       bash-3471  [003]   357.014755: [INCORRECT] sched_info_dequeued:sched_stats.h:177
       bash-3471  [003]   357.014756: [correct] update_curr:sched_fair.c:489
       bash-3471  [003]   357.014758: [correct] calc_delta_fair:sched_fair.c:411
       bash-3471  [003]   357.014759: [correct] account_group_exec_runtime:sched_stats.h:356
       bash-3471  [003]   357.014761: [correct] update_curr:sched_fair.c:489
       bash-3471  [003]   357.014763: [INCORRECT] calc_delta_fair:sched_fair.c:411
       bash-3471  [003]   357.014765: [correct] calc_delta_mine:sched.c:1279
      
      Which shows the normal tracer heading, as well as whether the condition was
      correct "[correct]" or was mistaken "[INCORRECT]", followed by the function,
      file name and line number.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      52f232cb
    • tracing: profile likely and unlikely annotations · 1f0d69a9
      Authored by Steven Rostedt
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. When the condition is hit, the hits and misses of that condition
      are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
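      Conceptually, each annotation expands to a wrapper that evaluates the
      condition once and also records whether the guess matched; a simplified
      sketch with illustrative names (the kernel's real version places the
      records in a section that the profile_likely/profile_unlikely files walk):
      
        struct likeliness_stat {
                const char      *func;
                const char      *file;
                unsigned int    line;
                unsigned long   correct;
                unsigned long   incorrect;
        };
      
        #define __branch_check__(x, expect)                             \
                ({                                                      \
                        static struct likeliness_stat __stat = {        \
                                .func = __func__,                       \
                                .file = __FILE__,                       \
                                .line = __LINE__,                       \
                        };                                              \
                        int __r = !!(x);                                \
                        if (__r == (expect))                            \
                                __stat.correct++;                       \
                        else                                            \
                                __stat.incorrect++;                     \
                        __r;                                            \
                })
      
        /* Compile-time constant conditions are skipped entirely. */
        #define likely(x)       (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
        #define unlikely(x)     (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))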
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see by the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
      that were more than 25%.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, from which I picked up
        the following ideas:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feedback on this patch.
      
      (*) Not every unlikely is recorded; those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1f0d69a9
  16. 11 Nov 2008 (1 commit)
    • tracing: add a tracer to catch execution time of kernel functions · 15e6cb36
      Authored by Frederic Weisbecker
      Impact: add new tracing plugin which can trace full (entry+exit) function calls
      
      This tracer uses the low level function return ftrace plugin to
      measure the execution time of the kernel functions.
      
      The first field is the caller of the function, the second is the
      measured function, and the last one is the execution time in
      nanoseconds.
      
      - v3:
      
      - HAVE_FUNCTION_RET_TRACER has been added. Each arch that supports ftrace return
        tracing should enable it.
      - ftrace_return_stub becomes ftrace_stub.
      - CONFIG_FUNCTION_RET_TRACER now depends on CONFIG_FUNCTION_TRACER
      - Return trace printing can be used by other tracers in trace.c
      - Adapt to the new tracing API (no more ctrl_update callback)
      - Correct the check of "disabled" during insertion.
      - Minor changes...
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      15e6cb36
  17. 06 Nov 2008 (1 commit)
    • ftrace: add quick function trace stop · 60a7ecf4
      Authored by Steven Rostedt
      Impact: quick start and stop of function tracer
      
      This patch adds a way to disable the function tracer quickly without
      the need to run kstop_machine. It adds a new variable called
      function_trace_stop which will stop the calls to functions from mcount
      when set.  This is just an on/off switch and does not handle recursion
      like preempt_disable().
      
      Its main purpose is to help other tracers/debuggers start and stop tracing
      functions without the need to call kstop_machine.
      
      The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs
      that implement the testing of the function_trace_stop in the mcount
      arch dependent code. Otherwise, the test is done in the C code.
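      For archs without HAVE_FUNCTION_TRACE_MCOUNT_TEST, the C-side test is
      conceptually a thin wrapper around the installed callback, along these
      lines (a sketch; the callback name below is illustrative):
      
        /* Quick on/off switch checked on every mcount call. */
        extern int function_trace_stop;
      
        /* Whatever trace function is currently registered (illustrative name). */
        extern void installed_trace_function(unsigned long ip, unsigned long parent_ip);
      
        static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
        {
                if (function_trace_stop)
                        return;         /* tracing is stopped: do nothing */
      
                installed_trace_function(ip, parent_ip);
        }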
      
      x86 is the only arch at the moment that supports this.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      60a7ecf4
  18. 03 Nov 2008 (1 commit)
  19. 30 Oct 2008 (1 commit)
    • ftrace: fix trace_nop config select · f3384b28
      Authored by Steven Rostedt
      Impact: build fix on non-function-tracing architectures
      
      The trace_nop is the tracer that is defined when no tracer is set in
      the ftrace infrastructure.
      
      The trace_nop was mistakenly selected by HAVE_FTRACE due to the confusion
      between ftrace infrastructure and the ftrace function tracer (which has
      been solved by renaming the function tracer).
      
      This patch changes the select to the appropriate TRACING option.
      
      This patch should fix compile errors on architectures that do not define
      FUNCTION_TRACER.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f3384b28
  20. 27 Oct 2008 (1 commit)
    • tracing/ftrace: make boot tracer select the sched_switch tracer · ea31e72d
      Authored by Frédéric Weisbecker
      Impact: build fix
      
      If the boot tracer is selected but the sched_switch tracer is not,
      there will be a build failure:
      
       kernel/built-in.o: In function `boot_trace_init':
       trace_boot.c:(.text+0x5ee38): undefined reference to `sched_switch_trace'
       kernel/built-in.o: In function `disable_boot_trace':
       (.text+0x5eee1): undefined reference to `tracing_stop_cmdline_record'
       kernel/built-in.o: In function `enable_boot_trace':
       (.text+0x5ef11): undefined reference to `tracing_start_cmdline_record'
      
      This patch fixes it.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ea31e72d
  21. 22 Oct 2008 (1 commit)
  22. 21 Oct 2008 (2 commits)
  23. 14 Oct 2008 (11 commits)
    • tracing/fastboot: improve help text · 98d9c66a
      Authored by Ingo Molnar
      Improve the help text of the boot tracer.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      98d9c66a
    • tracing/stacktrace: improve help text · 4519d9e5
      Authored by Ingo Molnar
      Improve the help text that is displayed for CONFIG_STACK_TRACER.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4519d9e5
    • tracing: unified trace buffer · 7a8e76a3
      Authored by Steven Rostedt
      This is a unified tracing buffer that implements a ring buffer that
      hopefully everyone will eventually be able to use.
      
      The events recorded into the buffer have the following structure:
      
        struct ring_buffer_event {
      	u32 type:2, len:3, time_delta:27;
      	u32 array[];
        };
      
      The minimum size of an event is 8 bytes. All events are 4 byte
      aligned inside the buffer.
      
      There are 4 types (all internal use for the ring buffer, only
      the data type is exported to the interface users).
      
       RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
      	of a buffer page.
      
       RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
      	is greater than the 27 bit delta can hold. We add another
      	32 bits, and record that in its own event (8 byte size).
      
       RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
      	help keep the buffer timestamps in sync.
      
      RINGBUF_TYPE_DATA: The event actually holds user data.
      
      The "len" field is only three bits. Since the data must be
      4 byte aligned, this field is shifted left by 2, giving a
      max length of 28 bytes. If the data load is greater than 28
      bytes, the first array field holds the full length of the
      data load and the len field is set to zero.
      
      Example, data size of 7 bytes:
      
      	type = RINGBUF_TYPE_DATA
      	len = 2
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0..1]: <7 bytes of data> <1 byte empty>
      
      This event is saved in 12 bytes of the buffer.
      
      An event with 82 bytes of data:
      
      	type = RINGBUF_TYPE_DATA
      	len = 0
      	time_delta: <time-stamp> - <prev_event-time-stamp>
      	array[0]: 84 (Note the alignment)
      	array[1..14]: <82 bytes of data> <2 bytes empty>
      
      The above event is saved in 92 bytes (if my math is correct).
      82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
      
      Do not reference the above event struct directly. Use the following
      functions to gain access to the event table, since the
      ring_buffer_event structure may change in the future.
      
      ring_buffer_event_length(event): get the length of the event.
      	This is the size of the memory used to record this
      	event, and not the size of the data pay load.
      
      ring_buffer_time_delta(event): get the time delta of the event
      	This returns the delta time stamp since the last event.
      	Note: Even though this is in the header, there should
      		be no reason to access this directly, except
      		for debugging.
      
      ring_buffer_event_data(event): get the data from the event
      	This is the function to use to get the actual data
      	from the event. Note, it is only a pointer to the
      	data inside the buffer. This data must be copied to
      	another location otherwise you risk it being written
      	over in the buffer.
      
      ring_buffer_lock: A way to lock the entire buffer.
      ring_buffer_unlock: unlock the buffer.
      
      ring_buffer_alloc: create a new ring buffer. Can choose between
      	overwrite or consumer/producer mode. Overwrite will
      	overwrite old data, whereas consumer/producer will
      	throw away new data if the consumer catches up with the
      	producer.  The consumer/producer is the default.
      
      ring_buffer_free: free the ring buffer.
      
      ring_buffer_resize: resize the buffer. Changes the size of each cpu
      	buffer. Note, it is up to the caller to ensure that
      	the buffer is not being used while this is happening.
      	This requirement may go away but do not count on it.
      
      ring_buffer_lock_reserve: locks the ring buffer and allocates an
      	entry on the buffer to write to.
      ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
      	the buffer.
      
      ring_buffer_write: writes some data into the ring buffer.
      
      ring_buffer_peek: Look at a next item in the cpu buffer.
      ring_buffer_consume: get the next item in the cpu buffer and
      	consume it. That is, this function increments the head
      	pointer.
      
      ring_buffer_read_start: Start an iterator of a cpu buffer.
      	For now, this disables the cpu buffer, until you issue
      	a finish. This is just because we do not want the iterator
      	to be overwritten. This restriction may change in the future.
      	But note, this is used for static reading of a buffer which
      	is usually done "after" a trace. Live readings would want
      	to use the ring_buffer_consume above, which will not
      	disable the ring buffer.
      
      ring_buffer_read_finish: Finishes the read iterator and reenables
      	the ring buffer.
      
      ring_buffer_iter_peek: Look at the next item in the cpu iterator.
      ring_buffer_read: Read the iterator and increment it.
      ring_buffer_iter_reset: Reset the iterator to point to the beginning
      	of the cpu buffer.
      ring_buffer_iter_empty: Returns true if the iterator is at the end
      	of the cpu buffer.
      
      ring_buffer_size: returns the size in bytes of each cpu buffer.
      	Note, the real size is this times the number of CPUs.
      
      ring_buffer_reset_cpu: Sets the cpu buffer to empty
      ring_buffer_reset: sets all cpu buffers to empty
      
      ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
      	cpu buffer of another buffer. This is handy when you
      	want to take a snapshot of a running trace on just one
      	cpu. Having a backup buffer to swap with facilitates this.
      	Ftrace max latencies use this.
      
      ring_buffer_empty: Returns true if the ring buffer is empty.
      ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
      
      ring_buffer_record_disable: disable all cpu buffers (read only)
      ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
      ring_buffer_record_enable: enable all cpu buffers.
      ring_buffer_record_enable_cpu: enable a single cpu buffer.
      
      ring_buffer_entries: The number of entries in a ring buffer.
      ring_buffer_overruns: The number of entries removed due to writing wrap.
      
      ring_buffer_time_stamp: Get the time stamp used by the ring buffer
      ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
      	into nanosecs.
      
      I still need to implement the GTOD feature. But we need support from
      the cpu frequency infrastructure.  But this can be done at a later
      time without affecting the ring buffer interface.
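      A rough usage sketch of the write and read paths described above (the
      argument lists here are illustrative; see the header for the exact
      signatures):
      
        #include <linux/ring_buffer.h>
        #include <linux/smp.h>
      
        static void ring_buffer_example(void)
        {
                struct ring_buffer *buffer;
                struct ring_buffer_event *event;
                int *payload;
      
                /* ~1MB per cpu, consumer/producer (non-overwrite) mode */
                buffer = ring_buffer_alloc(1 << 20, 0);
      
                /* Write path: reserve space, fill the payload, commit. */
                event = ring_buffer_lock_reserve(buffer, sizeof(*payload));
                if (event) {
                        payload = ring_buffer_event_data(event);
                        *payload = 42;
                        ring_buffer_unlock_commit(buffer, event);
                }
      
                /* Read path: consume the next entry of this cpu's buffer. */
                event = ring_buffer_consume(buffer, smp_processor_id(), NULL);
                if (event)
                        payload = ring_buffer_event_data(event);  /* copy it out before it can be overwritten */
      
                ring_buffer_free(buffer);
        }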
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7a8e76a3
    • ftrace/fastboot: disable tracers self-tests when boot tracer is selected · 3ce2b920
      Authored by Frédéric Weisbecker
      The tracing engine resets the ring buffer, and the tracers touch it
      too during self-tests. These self-tests happen while tracers are registering
      and interfere with boot tracing, which is logging initcalls.
      
      We have to disable the tracing self-tests if the boot tracer is selected.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3ce2b920
    • tracing/ftrace: give an entry on the config for boot tracer · 1f5c2abb
      Authored by Frédéric Weisbecker
      Add an entry to the kernel config for choosing the boot tracer.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1f5c2abb
    • tracing/ftrace: tracing engine depends on Nop Tracer · 2a3a4f66
      Authored by Frédéric Weisbecker
      Now that the nop tracer is used as the default tracer, replacing
      the "none" tracer, the tracing engine depends on it.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2a3a4f66
    • ftrace: add nop tracer · fb1b6d8b
      Authored by Steven Noonan
      A no-op tracer which can serve two purposes:
      
       1. A template for development of a new tracer.
       2. A convenient way to see ftrace_printk() calls without
          an irrelevant trace making the output messy.
      
      [ mingo@elte.hu: resolved conflicts ]
      Signed-off-by: Steven Noonan <steven@uplinklabs.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      fb1b6d8b
    • ftrace: make it depend on DEBUG_KERNEL · d3ee6d99
      Authored by Ingo Molnar
      Make most of the tracers depend on DEBUG_KERNEL - that's their intended
      purpose. (Most distributions have DEBUG_KERNEL enabled anyway, so this is
      not a practical limitation - but it simplifies the tracing menu in the
      normal case.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d3ee6d99
    • stack tracer: depends on DEBUG_KERNEL · 2ff01c6a
      Authored by Ingo Molnar
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2ff01c6a
    • ftrace: add stack tracer · e5a81b62
      Authored by Steven Rostedt
      This is another tracer using the ftrace infrastructure. It examines
      the size of the stack at each function call. If the stack use is greater
      than the previous max, it is recorded.
      
      You can always see (and set) the max stack size seen. Setting it
      to zero will start the recording again. The backtrace is also available.
      
      For example:
      
      # cat /debug/tracing/stack_max_size
      1856
      
      # cat /debug/tracing/stack_trace
      [<c027764d>] stack_trace_call+0x8f/0x101
      [<c021b966>] ftrace_call+0x5/0x8
      [<c02553cc>] clocksource_get_next+0x12/0x48
      [<c02542a5>] update_wall_time+0x538/0x6d1
      [<c0245913>] do_timer+0x23/0xb0
      [<c0257657>] tick_do_update_jiffies64+0xd9/0xf1
      [<c02576b9>] tick_sched_timer+0x4a/0xad
      [<c0250fe6>] __run_hrtimer+0x3e/0x75
      [<c02518ed>] hrtimer_interrupt+0xf1/0x154
      [<c022c870>] smp_apic_timer_interrupt+0x71/0x84
      [<c021b7e9>] apic_timer_interrupt+0x2d/0x34
      [<c0238597>] finish_task_switch+0x29/0xa0
      [<c05abd13>] schedule+0x765/0x7be
      [<c05abfca>] schedule_timeout+0x1b/0x90
      [<c05ab4d4>] wait_for_common+0xab/0x101
      [<c05ab5ac>] wait_for_completion+0x12/0x14
      [<c033cfc3>] blk_execute_rq+0x84/0x99
      [<c0402470>] scsi_execute+0xc2/0x105
      [<c040250a>] scsi_execute_req+0x57/0x7f
      [<c043afe0>] sr_test_unit_ready+0x3e/0x97
      [<c043bbd6>] sr_media_change+0x43/0x205
      [<c046b59f>] media_changed+0x48/0x77
      [<c046b5ff>] cdrom_media_changed+0x31/0x37
      [<c043b091>] sr_block_media_changed+0x16/0x18
      [<c02b9e69>] check_disk_change+0x1b/0x63
      [<c046f4c3>] cdrom_open+0x7a1/0x806
      [<c043b148>] sr_block_open+0x78/0x8d
      [<c02ba4c0>] do_open+0x90/0x257
      [<c02ba869>] blkdev_open+0x2d/0x56
      [<c0296a1f>] __dentry_open+0x14d/0x23c
      [<c0296b32>] nameidata_to_filp+0x24/0x38
      [<c02a1c68>] do_filp_open+0x347/0x626
      [<c02967ef>] do_sys_open+0x47/0xbc
      [<c02968b0>] sys_open+0x23/0x2b
      [<c021aadd>] sysenter_do_call+0x12/0x26
      
      I've tested this on both x86_64 and i386.
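      The core of the check is small; a sketch with illustrative names (the
      real stack_trace_call also walks the frames to produce the per-function
      backtrace shown above):
      
        #include <linux/thread_info.h>
      
        static unsigned long max_stack_size;            /* exposed via stack_max_size */
      
        static void save_max_stack_backtrace(void);     /* illustrative: record the backtrace */
      
        /* Called from the function tracer on every traced call (sketch). */
        static void check_stack_depth(unsigned long sp)
        {
                unsigned long this_size;
      
                /* The stack grows down inside a THREAD_SIZE-aligned area, so
                   usage = THREAD_SIZE - offset of sp within that area. */
                this_size = THREAD_SIZE - (sp & (THREAD_SIZE - 1));
      
                if (this_size > max_stack_size) {
                        max_stack_size = this_size;
                        save_max_stack_backtrace();
                }
        }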
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e5a81b62
    • ftrace: create __mcount_loc section · 8da3821b
      Authored by Steven Rostedt
      This patch creates a section in the kernel called "__mcount_loc".
      This will hold a list of pointers to the mcount relocation for
      each call site of mcount.
      
      For example:
      
      objdump -dr init/main.o
      [...]
      Disassembly of section .text:
      
      0000000000000000 <do_one_initcall>:
         0:   55                      push   %rbp
      [...]
      000000000000017b <init_post>:
       17b:   55                      push   %rbp
       17c:   48 89 e5                mov    %rsp,%rbp
       17f:   53                      push   %rbx
       180:   48 83 ec 08             sub    $0x8,%rsp
       184:   e8 00 00 00 00          callq  189 <init_post+0xe>
                              185: R_X86_64_PC32      mcount+0xfffffffffffffffc
      [...]
      
      We will add a section to point to each function call.
      
         .section __mcount_loc,"a",@progbits
      [...]
         .quad .text + 0x185
      [...]
      
      The offset of the mcount call site in init_post is an offset from
      the start of the section, and not the start of the function init_post.
      The mcount relocation is at the call site 0x185 from the start of the
      .text section.
      
        .text + 0x185  == init_post + 0xa
      
      We need a way to add this __mcount_loc section in a way that we do not
      lose the relocations after final link.  The .text section here will
      be attached to all other .text sections after final link and the
      offsets will be meaningless.  We need to keep track of where these
      .text sections are.
      
      To do this, we use the start of the first function in the section,
      do_one_initcall.  We can make a tmp.s file with this function as a reference
      to the start of the .text section.
      
         .section __mcount_loc,"a",@progbits
      [...]
         .quad do_one_initcall + 0x185
      [...]
      
      Then we can compile the tmp.s into a tmp.o
      
        gcc -c tmp.s -o tmp.o
      
      And link it into back into main.o.
      
        ld -r main.o tmp.o -o tmp_main.o
        mv tmp_main.o main.o
      
      But we have a problem.  What happens if the first function in a section
      is not exported and is a static function? The linker will not let
      the tmp.o use it.  This case exists in main.o as well.
      
      Disassembly of section .init.text:
      
      0000000000000000 <set_reset_devices>:
         0:   55                      push   %rbp
         1:   48 89 e5                mov    %rsp,%rbp
         4:   e8 00 00 00 00          callq  9 <set_reset_devices+0x9>
                              5: R_X86_64_PC32        mcount+0xfffffffffffffffc
      
      The first function in .init.text is a static function.
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 t set_reset_devices
      
      The lowercase 't' means that set_reset_devices is local and is not exported.
      If we simply try to link the tmp.o with set_reset_devices, we end
      up with two symbols: one local and one global.
      
       .section __mcount_loc,"a",@progbits
       .quad set_reset_devices + 0x10
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 t set_reset_devices
                       U set_reset_devices
      
      We still have an undefined reference to set_reset_devices, and if we try
      to compile the kernel, we will end up with an undefined reference to
      set_reset_devices, or even worse, it could be exported someplace else,
      and then we will have a reference to the wrong location.
      
      To handle this case, we make an intermediate step using objcopy.
      We convert set_reset_devices into a global exported symbol before linking
      it with tmp.o and set it back afterwards.
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 T set_reset_devices
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 T set_reset_devices
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 t set_reset_devices
      
      Now we have a section in main.o called __mcount_loc that we can place
      somewhere in the kernel using vmlinux.ld.S, and access to convert
      all these locations that call mcount into nops before starting SMP,
      thus eliminating the need to do this with kstop_machine.
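      At boot the kernel can then walk the collected section between
      linker-provided start/stop symbols and patch every recorded call site;
      a sketch (the patching helper below is illustrative):
      
        #include <linux/init.h>
      
        /* Bracketing symbols provided by vmlinux.lds.S around __mcount_loc. */
        extern unsigned long __start_mcount_loc[];
        extern unsigned long __stop_mcount_loc[];
      
        static void nop_mcount_call_site(unsigned long ip);    /* illustrative */
      
        static void __init convert_mcount_call_sites(void)
        {
                unsigned long *p;
      
                for (p = __start_mcount_loc; p < __stop_mcount_loc; p++) {
                        /* *p is the address of one "call mcount" instruction;
                           turn it into a nop (or record it for later patching). */
                        nop_mcount_call_site(*p);
                }
        }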
      
      Note: a well documented perl script (scripts/recordmcount.pl) is used
      to do all this in one location.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8da3821b