1. 28 Nov, 2011 7 commits
  2. 08 Oct, 2011 2 commits
    • A
      perf tools: Make --no-asm-raw the default · 64c6f0c7
      Arnaldo Carvalho de Melo committed
      And add the annotation output knobs to all the tools that have
      integrated annotation (top, report).
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-gnlob67mke6sji2kf4nstp7m@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      64c6f0c7
    • S
      perf tools: Make perf.data more self-descriptive (v8) · fbe96f29
      Stephane Eranian committed
      The goal of this patch is to include more information about the host
      environment into the perf.data so it is more self-descriptive. Over time,
      profiles are captured on various machines and it becomes hard to track
      what was recorded, on what machine and when.
      
      This patch provides a way to solve this by extending the perf.data file
      with basic information about the host machine. To add those extensions,
      we leverage the feature bits capabilities of the perf.data format.  The
      change is backward compatible with existing perf.data files.
      
      We define the following useful new extensions:
       - HEADER_HOSTNAME: the hostname
       - HEADER_OSRELEASE: the kernel release number
       - HEADER_ARCH: the hw architecture
       - HEADER_CPUDESC: generic CPU description
       - HEADER_NRCPUS: number of online/avail cpus
       - HEADER_CMDLINE: perf command line
       - HEADER_VERSION: perf version
       - HEADER_TOPOLOGY: cpu topology
       - HEADER_EVENT_DESC: full event description (attrs)
       - HEADER_CPUID: easy-to-parse low level CPU identification
      
      The small granularity for the entries is to make it easier to extend
      without breaking backward compatibility. Many entries are provided as
      ASCII strings.
      
      Perf report/script have been modified to print the basic information as
      easy-to-parse ASCII strings. Extended information about CPU and NUMA
      topology may be requested with the -I option.
      
      Thanks to David Ahern for reviewing and testing the many versions of
      this patch.
      
       $ perf report --stdio
       # ========
       # captured on : Mon Sep 26 15:22:14 2011
       # hostname : quad
       # os release : 3.1.0-rc4-tip
       # perf version : 3.1.0-rc4
       # arch : x86_64
       # nrcpus online : 4
       # nrcpus avail : 4
       # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
       # cpuid : GenuineIntel,6,15,11
       # total memory : 8105360 kB
       # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
       # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
       # HEADER_CPU_TOPOLOGY info available, use -I to display
       # HEADER_NUMA_TOPOLOGY info available, use -I to display
       # ========
       #
       ...
      
       $ perf report --stdio -I
       # ========
       # captured on : Mon Sep 26 15:22:14 2011
       # hostname : quad
       # os release : 3.1.0-rc4-tip
       # perf version : 3.1.0-rc4
       # arch : x86_64
       # nrcpus online : 4
       # nrcpus avail : 4
       # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
       # cpuid : GenuineIntel,6,15,11
       # total memory : 8105360 kB
       # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
       # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
       # sibling cores   : 0-3
       # sibling threads : 0
       # sibling threads : 1
       # sibling threads : 2
       # sibling threads : 3
       # node0 meminfo  : total = 8320608 kB, free = 7571024 kB
       # node0 cpu list : 0-3
       # ========
       #
       ...
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/20110930134040.GA5575@quad
      Signed-off-by: Stephane Eranian <eranian@google.com>
      [ committer notes: Use --show-info in the tools as was in the docs, rename
        perf_header_fprintf_info to perf_file_section__fprintf_info, fixup
        conflict with f69b64f7 "perf: Support setting the disassembler style" ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      fbe96f29
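The "# key : value" lines printed between the "# ========" markers are described above as easy to parse. A minimal Python sketch of such a parser is shown here. The function name parse_perf_header and the dict-of-lists return shape are illustrative assumptions, not anything provided by perf.

```python
def parse_perf_header(text):
    """Collect '# key : value' lines found between the '# ========' markers.

    Values are kept in lists because some keys (for example
    'sibling threads') may appear more than once.
    """
    info = {}
    inside = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ===="):
            if inside:
                break          # closing marker: header block is complete
            inside = True      # opening marker
            continue
        if not inside or not line.startswith("#"):
            continue
        # Split on the first ' : ' so colons inside values (times) survive.
        key, sep, value = line[1:].partition(" : ")
        if sep:
            info.setdefault(key.strip(), []).append(value.strip())
    return info
```

Lines without a " : " separator, such as the "info available, use -I to display" hints, are skipped by this sketch.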
  3. 07 Oct, 2011 3 commits
  4. 30 Sep, 2011 2 commits
  5. 03 Aug, 2011 1 commit
  6. 05 Jul, 2011 1 commit
    • A
      perf report/annotate/script: Add option to specify a CPU range · 5d67be97
      Anton Blanchard committed
      Add an option to perf report/annotate/script to specify which
      CPUs to operate on. This enables us to take a single system wide
      profile and analyse each CPU (or group of CPUs) in isolation.
      
      This was useful when profiling a multiprocess workload where the
      bottleneck was on one CPU but this was hidden in the overall
      profile. Per process and per thread breakdowns didn't help
      because multiple processes were running on each CPU and no
      single process consumed an entire CPU.
      
      The patch converts the list of CPUs returned by cpu_map__new
      into a bitmap for fast lookup. I wanted to use -C to be
      consistent with perf top/record/stat, but unfortunately perf
      report already uses -C <comms>.
      
       v2: Incorporate suggestions from David Ahern:
      	- Added -c to perf script
      	- Check that SAMPLE_CPU is set when -c is used
      	- Update documentation
      
       v3: Create perf_session__cpu_bitmap()
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Link: http://lkml.kernel.org/r/20110704215750.11647eb9@kryten
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5d67be97
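The conversion of a CPU list into a bitmap for fast lookup can be sketched as follows. This is a Python illustration only: the helper names cpu_list_to_bitmap and sample_selected are hypothetical, and the "0-3,7" list syntax is assumed to match what cpu_map__new accepts.

```python
def cpu_list_to_bitmap(cpu_list):
    """Turn a list such as '0-3,7' into an integer with one bit per CPU."""
    bitmap = 0
    for part in cpu_list.split(","):
        first, _, last = part.strip().partition("-")
        lo = int(first)
        hi = int(last) if last else lo   # a single CPU is a range of one
        if hi < lo:
            raise ValueError("descending CPU range: %r" % part)
        for cpu in range(lo, hi + 1):
            bitmap |= 1 << cpu
    return bitmap


def sample_selected(bitmap, cpu):
    """Constant-time membership test used when filtering each sample."""
    return bool((bitmap >> cpu) & 1)
```

With the bitmap built once up front, filtering each sample costs a shift and a mask instead of a walk over the CPU list.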
  7. 30 Jun, 2011 2 commits
    • F
      perf tools: Only display parent field if explicitly sorted · cb1955b8
      Frederic Weisbecker committed
      We don't need to display the parent field if the parent
      sorting machinery is only used for parent filtering
      (as in "-p foo").
      
      However if parent filtering is used in combination with
      explicit parent sorting (-s parent), we want to
      display it.
      
      Result with:
      
        perf report -p kernel_thread -s parent
      
      Before:
      
       # Overhead  Parent symbol
       # ........  .............
       #
           0.07%
                  |
                  --- ioread8
                      ata_sff_check_status
                      ata_sff_tf_load
                      ata_sff_qc_issue
                      ata_bmdma_qc_issue
                      ata_qc_issue
                      ata_scsi_translate
                      ata_scsi_queuecmd
                      scsi_dispatch_cmd
                      scsi_request_fn
                      __blk_run_queue
                      __make_request
                      generic_make_request
                      submit_bio
                      submit_bh
                      journal_submit_commit_record
                      jbd2_journal_commit_transaction
                      kjournald2
                      kthread
                      kernel_thread_helpe
      
      After:
      
       # Overhead  Parent symbol
       # ........  .............
       #
           0.07%  kernel_thread_helper
                  |
                  --- ioread8
                      ata_sff_check_status
                      ata_sff_tf_load
                      ata_sff_qc_issue
                      ata_bmdma_qc_issue
                      ata_qc_issue
                      ata_scsi_translate
                      ata_scsi_queuecmd
                      scsi_dispatch_cmd
                      scsi_request_fn
                      __blk_run_queue
                      __make_request
                      generic_make_request
                      submit_bio
                      submit_bh
                      journal_submit_commit_record
                      jbd2_journal_commit_transaction
                      kjournald2
                      kthread
                      kernel_thread_helper
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Sam Liao <phyomh@gmail.com>
      cb1955b8
    • S
      perf tools: Add inverted call graph report support. · d797fdc5
      Sam Liao committed
      Add a "caller/callee" option to support an inverted butterfly report.
      In the inverted report (with the caller option), the call graph starts
      from the callee's ancestor. Users can use this view to find a system's
      performance bottlenecks from a sysprof-like view. Using this option
      with a specified sort order such as pid gives a high-level view of
      call graph statistics.
      
      Also add "-G" alias for inverted call graph.
      Signed-off-by: Sam Liao <phyomh@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      d797fdc5
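The difference between the callee and caller orderings can be illustrated with a small Python sketch that merges callchains into a tree. The function build_call_tree and its nested-dict layout are assumptions made for illustration, and chains are assumed to be recorded innermost frame first.

```python
def build_call_tree(callchains, order="callee"):
    """Merge callchains into a tree of {'hits', 'children'} nodes.

    Each chain lists frames innermost first. With order='caller' the
    chain is reversed, so the tree is rooted at the outermost ancestor.
    """
    root = {}
    for chain in callchains:
        frames = chain if order == "callee" else list(reversed(chain))
        node = root
        for frame in frames:
            entry = node.setdefault(frame, {"hits": 0, "children": {}})
            entry["hits"] += 1
            node = entry["children"]
    return root
```

In the caller ordering, chains that share an ancestor such as kthread collapse under one root, which is what gives the top-down, sysprof-like view.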
  8. 28 May, 2011 1 commit
  9. 26 May, 2011 1 commit
    • A
      perf symbols: Handle /proc/sys/kernel/kptr_restrict · ec80fde7
      Arnaldo Carvalho de Melo committed
      Perf uses /proc/modules to figure out where kernel modules are loaded.
      
      With the advent of kptr_restrict, non root users get zeroes for all module
      start addresses.
      
      So check if kptr_restrict is non zero and don't generate the synthetic
      PERF_RECORD_MMAP events for them.
      
      Warn the user about it in perf record and in perf report.
      
      In perf report the reference relocation symbol being zero means that
      kptr_restrict was set, thus /proc/kallsyms has only zeroed addresses, so don't
      use it to fixup symbol addresses when using a valid kallsyms (in the buildid
      cache) or vmlinux (in the vmlinux path) build-id located automatically or
      specified by the user.
      
      Provide an explanation about it in 'perf report' if kernel samples were taken,
      checking if a suitable vmlinux or kallsyms was found/specified.
      
      A restricted /proc/kallsyms doesn't go to the buildid cache anymore.
      
      Example:
      
       [acme@emilia ~]$ perf record -F 100000 sleep 1
      
       WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check
       /proc/sys/kernel/kptr_restrict.
      
       Samples in kernel functions may not be resolved if a suitable vmlinux file is
       not found in the buildid cache or in the vmlinux path.
      
       Samples in kernel modules won't be resolved at all.
      
       If some relocation was applied (e.g. kexec) symbols may be misresolved even
       with a suitable vmlinux or kallsyms file.
      
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.005 MB perf.data (~231 samples) ]
       [acme@emilia ~]$
      
       [acme@emilia ~]$ perf report --stdio
       Kernel address maps (/proc/{kallsyms,modules}) were restricted,
       check /proc/sys/kernel/kptr_restrict before running 'perf record'.
      
       If some relocation was applied (e.g. kexec) symbols may be misresolved.
      
       Samples in kernel modules can't be resolved as well.
      
       # Events: 13  cycles
       #
       # Overhead  Command      Shared Object                 Symbol
       # ........  .......  .................  .....................
       #
          20.24%    sleep  [kernel.kallsyms]  [k] page_fault
          20.04%    sleep  [kernel.kallsyms]  [k] filemap_fault
          19.78%    sleep  [kernel.kallsyms]  [k] __lru_cache_add
          19.69%    sleep  ld-2.12.so         [.] memcpy
          14.71%    sleep  [kernel.kallsyms]  [k] dput
           4.70%    sleep  [kernel.kallsyms]  [k] flush_signal_handlers
           0.73%    sleep  [kernel.kallsyms]  [k] perf_event_comm
           0.11%    sleep  [kernel.kallsyms]  [k] native_write_msr_safe
      
       #
       # (For a higher level overview, try: perf report --sort comm,dso)
       #
       [acme@emilia ~]$
      
      This is because it found a suitable vmlinux (build-id checked) in
      /lib/modules/2.6.39-rc7+/build/vmlinux (use -v in perf report to see the long
      file name).
      
      If we remove that file from the vmlinux path:
      
       [root@emilia ~]# mv /lib/modules/2.6.39-rc7+/build/vmlinux \
      		     /lib/modules/2.6.39-rc7+/build/vmlinux.OFF
       [acme@emilia ~]$ perf report --stdio
       [kernel.kallsyms] with build id 57298cdbe0131f6871667ec0eaab4804dcf6f562
       not found, continuing without symbols
      
       Kernel address maps (/proc/{kallsyms,modules}) were restricted, check
       /proc/sys/kernel/kptr_restrict before running 'perf record'.
      
       As no suitable kallsyms nor vmlinux was found, kernel samples can't be
       resolved.
      
       Samples in kernel modules can't be resolved as well.
      
       # Events: 13  cycles
       #
       # Overhead  Command      Shared Object  Symbol
       # ........  .......  .................  ......
       #
          80.31%    sleep  [kernel.kallsyms]  [k] 0xffffffff8103425a
          19.69%    sleep  ld-2.12.so         [.] memcpy
      
       #
       # (For a higher level overview, try: perf report --sort comm,dso)
       #
       [acme@emilia ~]$
      Reported-by: Stephane Eranian <eranian@google.com>
      Suggested-by: David Miller <davem@davemloft.net>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Link: http://lkml.kernel.org/n/tip-mt512joaxxbhhp1odop04yit@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ec80fde7
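The check described above, reading /proc/sys/kernel/kptr_restrict and treating a non-zero value as restricted, can be sketched in Python as below. The helper name kptr_restricted is hypothetical, and treating a missing or unreadable file as "not restricted" is an assumption of this sketch; it does not model any special handling for root.

```python
def kptr_restricted(path="/proc/sys/kernel/kptr_restrict"):
    """Return True when the sysctl file holds a non-zero value."""
    try:
        with open(path) as f:
            return int(f.read().strip()) != 0
    except (OSError, ValueError):
        # Assumption: no readable value means the restriction is absent.
        return False
```

A caller could use the result to decide whether to skip the synthetic module MMAP events and print the warning shown in the example session.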
  10. 24 Mar, 2011 1 commit
    • A
      perf session: Pass evsel in event_ops->sample() · 9e69c210
      Arnaldo Carvalho de Melo committed
      Resolve the sample->id to an evsel, since the most advanced tools,
      report and annotate, need it, and the others will too when they evolve
      to properly support multi-event perf.data files.
      
      Good also because it does an extra validation, checking that the ID is
      valid when present. When that is not the case, the overhead is just a
      branch + function call (perf_evlist__id2evsel).
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9e69c210
  11. 10 Mar, 2011 1 commit
    • A
      perf session: Use evlist/evsel for managing perf.data attributes · a91e5431
      Arnaldo Carvalho de Melo committed
      So that we can reuse things like the id to attr lookup routine
      (perf_evlist__id2evsel) that uses a hash table instead of the linear
      lookup done in the older perf_header_attr routines, etc.
      
      Also to make evsels/evlist a more pervasive API, simplifying use of the
      emerging perf lib.
      
      Cc: Arun Sharma <arun@sharma-home.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a91e5431
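The contrast between the older linear lookup and a hash-based id-to-evsel lookup can be sketched in Python as follows. The Evsel class and both helper names are illustrative stand-ins, not the perf data structures themselves.

```python
class Evsel:
    """Illustrative stand-in for an event selector with its sample ids."""

    def __init__(self, name, ids):
        self.name = name
        self.ids = list(ids)


def linear_lookup(evsels, sample_id):
    """Older style: scan every evsel and every id until a match is found."""
    for evsel in evsels:
        if sample_id in evsel.ids:
            return evsel
    return None


def build_id_index(evsels):
    """Hash-table style: map each id to its evsel once, then look up in O(1)."""
    return {sample_id: evsel for evsel in evsels for sample_id in evsel.ids}
```

A miss in the index (index.get(sample_id) returning None) doubles as a validity check on the id carried by a sample.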
  12. 07 Mar, 2011 2 commits
    • A
      perf report tui: Improve multi event session support · 7f0030b2
      Arnaldo Carvalho de Melo committed
      When multiple events were used in 'perf record', allow the user to
      choose which one is wanted before showing the per event histograms.
      
      Annotations will be performed on the chosen event.
      
      Allow going back and forth from event to event quickly using just the
      arrow keys and enter.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: William Cohen <wcohen@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7f0030b2
    • A
      perf tools: Improve support for sessions with multiple events · e248de33
      Arnaldo Carvalho de Melo committed
      By creating a perf_evlist out of the attributes in the perf.data file
      header, so that we can use evlists and evsels when reading recorded
      sessions in addition to when we record sessions.
      
      More work is needed to let tools allow the user to select which
      events are wanted when browsing sessions, be it just one or a subset of
      them, aggregated or shown at the same time but with different
      indications on the UI to allow seeing workloads through different views at
      the same time.
      
      But the overall goal/trend is to more uniformly use evsels and evlists.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e248de33
  13. 18 Feb, 2011 1 commit
  14. 11 Feb, 2011 1 commit
    • A
      perf report: Fix initialization of annotate symbol priv area · 0849327d
      Arnaldo Carvalho de Melo committed
      We only allocate it when in TUI mode. In --stdio mode unconditionally
      initializing this area leads to memory corruption.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0849327d
  15. 09 Feb, 2011 1 commit
    • A
      perf annotate: Move locking to struct annotation · ce6f4fab
      Arnaldo Carvalho de Melo committed
      Since we'll need it when implementing the live annotate TUI browser.
      
      This also simplifies things a bit by having the list head for the source
      code in the dynamically allocated part of struct annotation. That
      way we don't have to pass it around; it can be found from the struct
      symbol that is passed everywhere.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ce6f4fab
  16. 05 Feb, 2011 2 commits
    • A
      perf annotate: Support multiple histograms in annotation · 2f525d01
      Arnaldo Carvalho de Melo committed
      The perf annotate tool continues aggregating everything on just one
      histogram, but to support the top model add support for one histogram
      per evsel in the evlist.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2f525d01
    • A
      perf annotate: Move annotate functions to util/ · 78f7defe
      Arnaldo Carvalho de Melo committed
      They will be used by perf top, so that we have just one set of routines
      to do annotation.
      
      Rename "struct sym_priv" to "struct annotation", etc, to clarify this
      code a bit.
      
      Rename "struct sym_ext" to "struct source_line", to give it a meaningful
      name that clarifies that it is the result of an addr2line call, sorted
      by the percentage that one particular source code line appeared in the
      annotation.
      
      And since we're moving things around also rename 'sym_hist->ip' to
      'sym_hist->addr' as we want to do data structure annotation at some
      point.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      78f7defe
  17. 01 Feb, 2011 1 commit
  18. 30 Jan, 2011 2 commits
  19. 23 Jan, 2011 3 commits
    • F
      perf callchain: Rename register_callchain_param into callchain_register_param · 16537f13
      Frederic Weisbecker committed
      To make the callchain API naming more consistent.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294977121-5700-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      16537f13
    • F
      perf callchain: Feed callchains into a cursor · 1b3a0e95
      Frederic Weisbecker committed
      The callchains are fed with an array of a fixed size.
      As a result we iterate over each callchain three times:
      
      - 1st to resolve symbols
      - 2nd to filter out context boundaries
      - 3rd for the insertion into the tree
      
      This also involves some pairs of memory allocation/deallocation
      every time we insert a callchain, for the filtered-out array of
      addresses and for the array of symbols that comes along.
      
      Instead, feed the callchains through a linked list with persistent
      allocations. This brings several advantages:
      
      - Merge the 1st and 2nd iterations into one. That was possible before,
      but in a way that would involve allocating an array slightly larger
      than necessary because we don't know in advance the number of context
      boundaries to filter out.
      
      - Far fewer allocations/deallocations. The linked list keeps
      persistent empty entries for the next usages and is extendable at
      will.
      
      - Makes it easier for multiple sources of callchains to feed a
      stacktrace together. This is meant to pave the way for cfi based
      callchains wherein traditional frame pointer based kernel
      stacktraces will precede cfi based user ones, producing an overall
      callchain whose size is hard to predict. This requirement
      makes the static array obsolete and makes a linked list based
      iterator a much more flexible fit.
      
      Basic testing on a big perf file containing callchains (~ 176 MB)
      has shown a throughput gain of about 11% with perf report.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294977121-5700-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1b3a0e95
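The cursor idea, a linked list whose nodes persist across callchains so that resolving and filtering happen in a single pass without per-chain allocation, can be sketched in Python as below. The names CallchainCursor and fill_cursor, the resolve callback and the is_context_marker predicate are all hypothetical; the sketch illustrates the technique and is not the perf implementation.

```python
class _Node:
    __slots__ = ("ip", "sym", "next")

    def __init__(self):
        self.ip = None
        self.sym = None
        self.next = None


class CallchainCursor:
    """Linked list whose nodes are kept and reused from one chain to the next."""

    def __init__(self):
        self.first = None     # head of the persistent list
        self.tail = None      # last node used by the current chain
        self.nr = 0           # entries in the current chain
        self.allocated = 0    # nodes ever created (for illustration)

    def reset(self):
        # Forget the current chain but keep every node for reuse.
        self.nr = 0
        self.tail = None

    def append(self, ip, sym):
        node = self.first if self.tail is None else self.tail.next
        if node is None:
            # Out of spare nodes: extend the list by one.
            node = _Node()
            self.allocated += 1
            if self.tail is None:
                self.first = node
            else:
                self.tail.next = node
        node.ip, node.sym = ip, sym
        self.tail = node
        self.nr += 1

    def __iter__(self):
        node = self.first
        for _ in range(self.nr):
            yield node.ip, node.sym
            node = node.next


def fill_cursor(cursor, ips, resolve, is_context_marker):
    """Resolve symbols and drop context boundaries in one pass."""
    cursor.reset()
    for ip in ips:
        if is_context_marker(ip):
            continue
        cursor.append(ip, resolve(ip))
```

Because the list grows on demand, the number of entries does not need to be known in advance, which is the property the commit message relies on for mixing several callchain sources.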
    • A
      perf tools: Fix 64 bit integer format strings · 9486aa38
      Arnaldo Carvalho de Melo committed
      Using %L[uxd] has issues in some architectures, like on ppc64.  Fix it
      by making our 64 bit integers typedefs of stdint.h types and using
      PRI[ux]64 like, for instance, git does.
      
      Reported by Denis Kirjanov, who provided a patch for one case; I went
      and changed all cases.
      Reported-by: Denis Kirjanov <dkirjanov@kernel.org>
      Tested-by: Denis Kirjanov <dkirjanov@kernel.org>
      LKML-Reference: <20110120093246.GA8031@hera.kernel.org>
      Cc: Denis Kirjanov <dkirjanov@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pingtian Han <phan@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9486aa38
  20. 22 Dec, 2010 3 commits
    • D
      perf symbols: Add symfs option for off-box analysis using specified tree · ec5761ea
      David Ahern committed
      The symfs argument allows analysis of a perf.data file using a locally accessible
      filesystem tree with debug symbols - e.g., tree created during image builds,
      sshfs mount, loop mounted KVM disk images, USB keys, initrds, etc. Anything
      with an OS tree can be analyzed from anywhere without the need to populate a
      local data store with build-ids.
      
      Committer notes:
      
      o Fixed up symfs="/" variants handling.
      
      o prefixed DSO__ORIG_GUEST_KMODULE case with symfs too, avoiding use of files
        outside the symfs directory.
      
      LKML-Reference: <1291926427-28846-1-git-send-email-daahern@cisco.com>
      Signed-off-by: David Ahern <daahern@cisco.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ec5761ea
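The effect of a symfs root on symbol file lookup amounts to prefixing every recorded DSO path with the chosen directory. A minimal Python sketch follows; the helper name symfs_path and the example paths are hypothetical, and an empty or "/" root is assumed to leave paths unchanged.

```python
import posixpath


def symfs_path(symfs, dso_path):
    """Map a path recorded on the target onto a local symfs tree."""
    if not symfs:
        return dso_path
    # Drop the leading '/' so join() nests the path under the symfs root
    # instead of discarding the root.
    return posixpath.join(symfs, dso_path.lstrip("/"))
```

For example, symfs_path("/mnt/target", "/lib/libc.so.6") yields "/mnt/target/lib/libc.so.6", so files outside the symfs directory are never consulted.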
    • I
      perf record,report,annotate,diff: Process events in order · eac23d1c
      Ian Munsie committed
      This patch changes perf report to ask for the ID info on all events by
      default if recording from multiple CPUs.
      
      Perf report, annotate and diff will now process the events in order if
      the kernel is able to provide timestamps on all events. This ensures
      that events such as COMM and MMAP which are necessary to correctly
      interpret samples are processed prior to those samples so that they are
      attributed correctly.
      
      Before:
       # perf record ./cachetest
       # perf report
      
       # Events: 6K cycles
       #
       # Overhead  Command      Shared Object                           Symbol
       # ........  .......  .................  ...............................
       #
           74.11%    :3259  [unknown]          [k] 0x4a6c
            1.50%  cachetest  ld-2.11.2.so       [.] 0x1777c
            1.46%    :3259  [kernel.kallsyms]  [k] .perf_event_mmap_ctx
            1.25%    :3259  [kernel.kallsyms]  [k] restore
            0.74%    :3259  [kernel.kallsyms]  [k] ._raw_spin_lock
            0.71%    :3259  [kernel.kallsyms]  [k] .filemap_fault
            0.66%    :3259  [kernel.kallsyms]  [k] .memset
            0.54%  cachetest  [kernel.kallsyms]  [k] .sha_transform
            0.54%    :3259  [kernel.kallsyms]  [k] .copy_4K_page
            0.54%    :3259  [kernel.kallsyms]  [k] .find_get_page
            0.52%    :3259  [kernel.kallsyms]  [k] .trace_hardirqs_off
            0.50%    :3259  [kernel.kallsyms]  [k] .__do_fault
      <SNIP>
      
      After:
       # perf report
      
       # Events: 6K cycles
       #
       # Overhead  Command      Shared Object                           Symbol
       # ........  .......  .................  ...............................
       #
           44.28%  cachetest  cachetest          [.] sumArrayNaive
           22.53%  cachetest  cachetest          [.] sumArrayOptimal
            6.59%  cachetest  ld-2.11.2.so       [.] 0x1777c
            2.13%  cachetest  [unknown]          [k] 0x340
            1.46%  cachetest  [kernel.kallsyms]  [k] .perf_event_mmap_ctx
            1.25%  cachetest  [kernel.kallsyms]  [k] restore
            0.74%  cachetest  [kernel.kallsyms]  [k] ._raw_spin_lock
            0.71%  cachetest  [kernel.kallsyms]  [k] .filemap_fault
            0.66%  cachetest  [kernel.kallsyms]  [k] .memset
            0.54%  cachetest  [kernel.kallsyms]  [k] .copy_4K_page
            0.54%  cachetest  [kernel.kallsyms]  [k] .find_get_page
            0.54%  cachetest  [kernel.kallsyms]  [k] .sha_transform
            0.52%  cachetest  [kernel.kallsyms]  [k] .trace_hardirqs_off
            0.50%  cachetest  [kernel.kallsyms]  [k] .__do_fault
      <SNIP>
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <1291872833-839-1-git-send-email-imunsie@au1.ibm.com>
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      eac23d1c
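Why ordering matters can be shown with a toy Python model: a sample can only be attributed to a command name if the COMM event for its pid has already been processed. The event dictionaries and both function names below are assumptions made for illustration.

```python
def attribute_samples(events):
    """Return the command name each SAMPLE would be attributed to."""
    comms = {}
    attributed = []
    for event in events:
        if event["type"] == "COMM":
            comms[event["pid"]] = event["comm"]
        elif event["type"] == "SAMPLE":
            # Unknown pids fall back to the ':<pid>' placeholder.
            attributed.append(comms.get(event["pid"], ":%d" % event["pid"]))
    return attributed


def in_time_order(events):
    """Stable sort by timestamp, so COMM/MMAP precede the samples they explain."""
    return sorted(events, key=lambda event: event["time"])
```

Processing the events in file order reproduces the ":3259" placeholder seen in the "Before" output, while processing them in timestamp order attributes the sample to cachetest.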
    • I
      perf session: Fallback to unordered processing if no sample_id_all · 21ef97f0
      Ian Munsie committed
      If we are running the new perf on an old kernel without support for
      sample_id_all, we should fall back to the old unordered processing of
      events. If we didn't, then we would *always* process events without
      timestamps out of order, whether or not we hit a reordering race. In
      other words, instead of there being a chance of not attributing samples
      correctly, we would guarantee that samples would not be attributed.
      
      While processing all events without timestamps before events with
      timestamps may seem like an intuitive solution, it falls down as
      PERF_RECORD_EXIT events would also be processed before any samples.
      Even with a workaround for that case, samples before/after an exec would
      not be attributed correctly.
      
      This patch allows commands to indicate whether they need to fall back to
      unordered processing, so that commands that do not care about timestamps
      on every event will not be affected. If we do fall back, this will print
      out a warning if report -D was invoked.
      
      This patch adds the test in perf_session__new so that we only need to
      test once per session. Commands that do not use an event_ops (such as
      record and top) can simply pass NULL in its place.
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <1291951882-sup-6069@au1.ibm.com>
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      21ef97f0
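The fallback decision can be sketched in Python as a single check made once per session. The function name choose_processing_mode and the warning text are hypothetical, and the sketch does not model when the warning is actually printed.

```python
def choose_processing_mode(wants_ordered, sample_id_all):
    """Return (ordered, warning) for a session.

    wants_ordered: the command asked for timestamp-ordered processing.
    sample_id_all: the recording kernel supplied ID info on every event.
    """
    if wants_ordered and not sample_id_all:
        # Ordering by timestamp is only safe when every event carries one.
        return False, "no sample_id_all support, processing events unordered"
    return wants_ordered, None
```

Commands that never asked for ordering are unaffected, which matches the intent that only tools that care about timestamps on every event see the fallback.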
  21. 09 Dec, 2010 1 commit
  22. 05 Dec, 2010 1 commit
    • A
      perf session: Parse sample earlier · 640c03ce
      Arnaldo Carvalho de Melo committed
      At perf_session__process_event, so that we reduce the number of lines in each
      tool sample processing routine that now receives a sample_data pointer already
      parsed.
      
      This will also be useful in the next patch, where we'll allow sampling the
      identity fields in MMAP, FORK, EXIT, etc., when it will be possible to see (cpu,
      timestamp) for every event.
      
      Also validate callchains in perf_session__process_event, i.e. as early as
      possible, and keep a counter of the number of events discarded due to invalid
      callchains, warning the user about it if it happens.
      
      There is an assumption that was kept that all events have the same sample_type,
      that will be dealt with in the future, when this preexisting limitation will be
      removed.
      Tested-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ian Munsie <imunsie@au1.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephane Eranian <eranian@google.com>
      LKML-Reference: <1291318772-30880-4-git-send-email-acme@infradead.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      640c03ce
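Validating callchains as early as possible and counting the discarded events can be sketched in Python as follows. The event layout and the validity rule used here (the advertised entry count must match the entries present) are assumptions for illustration; the commit does not spell out the exact check.

```python
def process_events(events, handle_sample):
    """Validate each callchain before dispatch; return the number dropped."""
    discarded = 0
    for event in events:
        chain = event.get("callchain")
        # Assumed rule: 'nr' must agree with the entries actually present.
        if chain is not None and chain["nr"] != len(chain["ips"]):
            discarded += 1
            continue
        handle_sample(event)
    return discarded
```

A non-zero return value is what a tool would use to warn the user that some events were skipped because of invalid callchains.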