1. 16 8月, 2009 1 次提交
    • I
      perf: Enable more compiler warnings · 83a0944f
      Ingo Molnar 提交于
      Related to a shadowed variable bug fix Valdis Kletnieks noticed
      that perf does not get built with -Wshadow, which could have
      helped us avoid the bug.
      
      So enable -Wshadow and also enable the following warnings on
      perf builds, in addition to the already enabled -Wall -Wextra
      -std=gnu99 warnings:
      
       -Wcast-align
       -Wformat=2
       -Wshadow
       -Winit-self
       -Wpacked
       -Wredundant-decls
       -Wstack-protector
       -Wstrict-aliasing=3
       -Wswitch-default
       -Wswitch-enum
       -Wno-system-headers
       -Wundef
       -Wvolatile-register-var
       -Wwrite-strings
       -Wbad-function-cast
       -Wmissing-declarations
       -Wmissing-prototypes
       -Wnested-externs
       -Wold-style-definition
       -Wstrict-prototypes
       -Wdeclaration-after-statement
      
      And change/fix the perf code to build cleanly under GCC 4.3.2.
      
      The list of warnings enablement is rather arbitrary: it's based
      on my (quick) reading of the GCC manpages and trying them on
      perf.
      
      I categorized the warnings based on individually enabling them
      and looking whether they trigger something in the perf build.
      If i liked those warnings (i.e. if they trigger for something
      that arguably could be improved) i enabled the warning.
      
      If the warnings seemed to come from language laywers spamming
      the build with tons of nuisance warnings i generally kept them
      off. Most of the sign conversion related warnings were in
      this category. (A second patch enabling some of the sign
      warnings might be welcome - sign bugs can be nasty.)
      
      I also kept warnings that seem to make sense from their manpage
      description and which produced no actual warnings on our code
      base. These warnings might still be turned off if they end up
      being a nuisance.
      
      I also left out a few warnings that are not supported in older
      compilers.
      
      [ Note that these changes might break the build on older
        compilers i did not test, or on non-x86 architectures that
        produce different warnings, so more testing would be welcome. ]
      
      Reported-by: Valdis.Kletnieks@vt.edu
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      83a0944f
  2. 09 8月, 2009 3 次提交
    • F
      perf tools: callchain: Fix bad rounding of minimum rate · c0a8865e
      Frederic Weisbecker 提交于
      Sometimes we get callchain branches that have a rate under the
      limit given by the user.
      
      Say you launched:
      
       perf record -f -g -a ./hackbench 10
       perf report -g fractal,10.0
      
      And you got:
      
      2.33%       hackbench  [kernel]                  [k] _spin_lock_irqsave
                      |
                      |--78.57%-- remove_wait_queue
                      |          poll_freewait
                      |          do_sys_poll
                      |          sys_poll
                      |          sysenter_dispatch
                      |          0xf7ffa430
                      |          0x1ffadea3c
                      |
                      |--7.14%-- __up_read
                      |          up_read
                      |          do_page_fault
                      |          page_fault
                      |          0xf7ffa430
                      |          0xa0df710000000a
                      ...
      
      It is abnormal to get a 7.14% branch whereas we passed a 10%
      filter.
      
      The problem is that we round down the minimum threshold. This
      happens mostly when we have very low number of events. If the
      total amount of your branch is 4 and you have a subranch of 3
      events, filtering to 90% will be computed like follows:
      
        limit = 4 * 0.9;
      
      The result is about 3.6, but the cast to integer will round
      down to 3. It means that our filter is actually of 75%
      
      We must then explicitly round up the minimum threshold.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: acme@redhat.com
      Cc: peterz@infradead.org
      Cc: efault@gmx.de
      LKML-Reference: <20090809024235.GA10146@nowhere>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c0a8865e
    • F
      perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains · b0efe213
      Frederic Weisbecker 提交于
      When the callchain tree comes to insert an empty backtrace, it
      raises a spurious warning about the fact we are inserting an
      empty. This is spurious because the radix tree assumes it did
      something wrong to reach this state. But it didn't, we just met
      an empty callchain that has to be ignored.
      
      This happens occasionally with certain types of call-chain
      recordings. If it happens it's a big nuisance as perf report
      output starts with thousands of warning lines.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1249690585-9145-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b0efe213
    • F
      perf tools: Fix call-chain cumul hit based sub-total (fractal mode) · 1953287b
      Frederic Weisbecker 提交于
      The callchain fractal mode builds each new total hits in a new
      branch of profiling by using the parent's hits of the current
      branch plus the hits of the children.
      
      This is wrong, the total hits of a branch should be made of the
      sum of every children hits, we must ignore the parent hits in
      this scope.
      
      This patch also fixes another mistake with the hit counting.
      
      Now the rates are correct.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1953287b
  3. 05 7月, 2009 2 次提交
    • F
      perf report: Add "Fractal" mode output - support callchains with relative overhead rate · 805d127d
      Frederic Weisbecker 提交于
      The current callchain displays the overhead rates as absolute:
      relative to the total overhead.
      
      This patch provides relative overhead percentage, in which each
      branch of the callchain tree is a independant instrumentated object.
      
      This provides a 'fractal' view of the call-chain profile: each
      sub-graph looks like a profile in itself - relative to its parent.
      
      You can produce such output by using the "fractal" mode
      that you can abbreviate via f, fr, fra, frac, etc...
      
      ./perf report -s sym -c fractal
      
      Example:
      
           8.46%  [k] copy_user_generic_string
                      |
                      |--52.01%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |          |--97.20%-- sys_pread64
                      |          |          system_call_fastpath
                      |          |          pread64
                      |          |
                      |           --2.81%-- sys_read
                      |                     system_call_fastpath
                      |                     __read
                      |
                      |--39.85%-- generic_file_buffered_write
                      |          __generic_file_aio_write_nolock
                      |          generic_file_aio_write
                      |          do_sync_write
                      |          reiserfs_file_write
                      |          vfs_write
                      |          |
                      |          |--97.05%-- sys_pwrite64
                      |          |          system_call_fastpath
                      |          |          __pwrite64
                      |          |
                      |           --2.95%-- sys_write
                      |                     system_call_fastpath
                      |                     __write_nocancel
      [...]
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      805d127d
    • F
      perf_counter tools: callchains: Manage the cumul hits on the fly · e05b876c
      Frederic Weisbecker 提交于
      The cumul hits are the number of hits of every childs of a node
      plus the hits of the current nodes, required for percentage
      computing of a branch.
      
      Theses numbers are calculated during the sorting of the branches of
      the callchain tree using a depth first postfix traversal, so that
      cumulative hits are propagated in the right order.
      
      But if we plan to implement percentages relative to the parent and not
      absolute percentages (relative to the whole overhead), we need to know
      the cumulative hits of the parent before computing the children
      because the relative minimum acceptable number of entries (ie: minimum
      rate against the cumulative hits from the parent) is the basis to
      filter the children against a given rate.
      
      Then we need to handle the cumul hits on the fly to prepare the
      implementation of relative overhead rates.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246772361-9960-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e05b876c
  4. 03 7月, 2009 3 次提交
    • F
      perf_counter tools: Set the minimum percent for callchains to be displayed · c20ab37e
      Frederic Weisbecker 提交于
      Callchains output may become a burden on a trace because even
      rarely hit site are exposed. This can be too much information.
      
      Let the user set a threshold as a minimum percent of hits using
      the new pattern for the -c option:
      
          -c mode,min_percent
      
      Example:
      
      $ perf report -s sym -c flat,4
      
           8.25%  [k] copy_user_generic_string
                   4.19%
                      copy_user_generic_string
                      generic_file_aio_read
                      do_sync_read
                      vfs_read
                      sys_pread64
                      system_call_fastpath
                      pread64
      
           5.39%  [k] search_by_key
           4.63%  0x00000000009e0a
           2.36%  [k] memcpy_c
      [...]
      
      $ perf report -s sym -c graph,2
      
           8.25%  [k] copy_user_generic_string
                      |
                      |--4.31%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |           --4.19%-- sys_pread64
                      |                     system_call_fastpath
                      |                     pread64
                      |
                       --3.24%-- generic_file_buffered_write
                                 __generic_file_aio_write_nolock
                                 generic_file_aio_write
                                 do_sync_write
                                 reiserfs_file_write
                                 vfs_write
                                 |
                                  --3.14%-- sys_pwrite64
                                            system_call_fastpath
                                            __pwrite64
      
           5.39%  [k] search_by_key
                      |
                       --2.23%-- reiserfs_update_sd_size
      
           4.63%  0x00000000009e0a
      
           2.36%  [k] memcpy_c
      [...]
      
      You can also omit it and it will default to 0.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246558475-10624-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c20ab37e
    • F
      perf report: Add support for callchain graph output · 4eb3e478
      Frederic Weisbecker 提交于
      Currently, the printing of callchains is done in a single
      vertical level, this is the "flat" mode:
      
      8.25%  [k] copy_user_generic_string
                   4.19%
                      copy_user_generic_string
                      generic_file_aio_read
                      do_sync_read
                      vfs_read
                      sys_pread64
                      system_call_fastpath
                      pread64
      
      This patch introduces a new "graph" mode which provides a
      hierarchical output of factorized paths recursively sorted:
      
       8.25%  [k] copy_user_generic_string
                      |
                      |--4.31%-- generic_file_aio_read
                      |          do_sync_read
                      |          vfs_read
                      |          |
                      |          |--4.19%-- sys_pread64
                      |          |          system_call_fastpath
                      |          |          pread64
                      |          |
                      |           --0.12%-- sys_read
                      |                     system_call_fastpath
                      |                     __read
                      |
                      |--3.24%-- generic_file_buffered_write
                      |          __generic_file_aio_write_nolock
                      |          generic_file_aio_write
                      |          do_sync_write
                      |          reiserfs_file_write
                      |          vfs_write
                      |          |
                      |          |--3.14%-- sys_pwrite64
                      |          |          system_call_fastpath
                      |          |          __pwrite64
                      |          |
                      |           --0.10%-- sys_write
      [...]
      
      The command line has then changed.
      
      By providing the -c option, the callchain will output in the
      flat mode by default.
      
      But you can override it:
      
          perf report -c graph
      
      or
      
          perf report -c flat
      
      You can also pass the abreviated mode:
      
          perf report -c g
      
      or
      
          perf report -c gra
      
      will both make use of the graph mode.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246550301-8954-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4eb3e478
    • F
      perf_counter tools: Create new chain_for_each_child() iterator · 14f4654c
      Frederic Weisbecker 提交于
      Iterating through children of a node in the callchain tree
      shows something that may be quite confusing at a first glance.
      The head is the children field of the parent and the list nodes
      are in the brothers field of the children.
      
      This is because the childs are linked to the parent as a list
      of "brothers" using the "children" list of the parent as a
      head:
      
        ---------------
       | Parent (head) |-------------------------------------
        ---------------                                      |
           |                                                 |
        children                                             |
           |                                                 |
        -----------               -----------                |
       | 1st child |---brother---| 2nd child |---brother-----
        -----------               -----------
      
      This makes the following strange pattern often occuring:
      
       list_for_each_entry(child, &parent->children, brothers) {
              // do something with children
       }
      
      Abstract it to chain_for_each_child() to factorize and simplify
      this pattern.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246550301-8954-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      14f4654c
  5. 01 7月, 2009 4 次提交
    • I
      perf_counter tools: Add more warnings and fix/annotate them · f37a291c
      Ingo Molnar 提交于
      Enable -Wextra. This found a few real bugs plus a number
      of signed/unsigned type mismatches/uncleanlinesses. It
      also required a few annotations
      
      All things considered it was still worth it so lets try with
      this enabled for now.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f37a291c
    • F
      perf_counter tools: Various fixes for callchains · deac911c
      Frederic Weisbecker 提交于
      The symbol resolving has of course revealed some bugs in the
      callchain tree handling. This patch fixes some of them,
      including:
      
      - inherit the children from the parents while splitting a node
      - fix list range moving
      - fix indexes setting in callchains
      - create a child on the current node if the path doesn't match in
        the existent children (was only done on the root)
      - compare using symbols when possible so that we can match a function
        using any ip inside by referring to its start address.
      
      The practical effects are:
      
      - remove double callchains
      - fix upside down or any random order of callchains
      - fix wrong paths
      - fix bad hits and percentage accounts
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246419315-9968-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      deac911c
    • F
      perf_counter tools: Resolve symbols in callchains · 4424961a
      Frederic Weisbecker 提交于
      This patch resolves the names, when possible, of each ip
      present in the callchains while using the -c option with perf
      report.
      
      Example:
      
      5.40%  [k] __d_lookup
                   5.37%
                      perf_callchain
                      perf_counter_overflow
                      intel_pmu_handle_irq
                      perf_counter_nmi_handler
                      notifier_call_chain
                      atomic_notifier_call_chain
                      notify_die
                      do_nmi
                      nmi
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      sys_faccessat
                      sys_access
                      system_call_fastpath
                      0x7fb609846f77
      
                   0.01%
                      perf_callchain
                      perf_counter_overflow
                      intel_pmu_handle_irq
                      perf_counter_nmi_handler
                      notifier_call_chain
                      atomic_notifier_call_chain
                      notify_die
                      do_nmi
                      nmi
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      sys_faccessat
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246419315-9968-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4424961a
    • F
      perf_counter tools: Fix storage size allocation of callchain list · 9198aa77
      Frederic Weisbecker 提交于
      Fix a confusion while giving the size of a callchain list
      during its allocation. We are using the wrong structure size.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246419315-9968-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9198aa77
  6. 26 6月, 2009 1 次提交
    • F
      perf_counter tools: Prepare a small callchain framework · 8cb76d99
      Frederic Weisbecker 提交于
      We plan to display the callchains depending on some user-configurable
      parameters.
      
      To gather the callchains stats from the recorded stream in a fast way,
      this patch introduces an ad hoc radix tree adapted for callchains and also
      a rbtree to sort these callchains once we have gathered every events
      from the stream.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1246026481-8314-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8cb76d99