1. 08 Oct, 2011 9 commits
    • perf hists browser: Update the browser.nr_entries after the timer · be83f5ed
      Committed by Arnaldo Carvalho de Melo
      Previously the hist_browser dealt with a static tree of entries, now it
      needs to update the nr_entries in the browser after the timer runs.
      
      A better solution will come when we move to using another thread for the
      collapse_resort, etc, but for now this is ok.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-9eno2iq55sjr4iyo899buzaw@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hists browser: Fix TAB/UNTAB use with multiple events · 7d16320e
      Committed by Arnaldo Carvalho de Melo
      When requesting multiple events, say:
      
        # perf top -e instructions -e cycles -e cache-misses
      
      The first screen lets the user choose what to see first, then to switch
      one can either use the left key to get back to the event menu or simply
      use TAB to go to the next and shift+TAB to go to the previous.
      
      When using TAB/UNTAB the call to perf_evlist__set_selected(event) was
      missing, fix it.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-3xqqh3fwmt914gg43frey14y@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hists browser: Don't offer symbol actions when symbols not on --sort · 724c9c9f
      Committed by Arnaldo Carvalho de Melo
      Removing all the entries that only apply to symbols from the menu.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-7bap0cy2fxtorlj5hgsp48m1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate browser: Use -> to navigate on assembly lines · 234a5375
      Committed by Arnaldo Carvalho de Melo
      And add better explanations when the line isn't actionable, such as
      non-assembly lines and other instructions.
      
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-375n844b5wra7lgq08ou153j@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Fix broken number of samples for perf report -n · e39622ce
      Committed by Stephane Eranian
      The perf report -n option was broken because it was not reporting the
      correct number of samples depending on the sorting mode. By default,
      samples are sorted by comm,dso,sym. That means that samples for the same
      command (binary) get collapsed.
      
      The hists__collapse_insert_entry() had a bug whereby it was aggregating
      the number of events observed (periods) but not the number of samples.
      Consequently, the number of samples reported could be below reality. The
      percentage remained correct because it is based on the periods.
      
      This patch fixes the problem by also aggregating the number of samples.
      Here is an example:
      
      $ perf report -n --stdio
          12.38%        842     pong  [kernel.kallsyms]     [k] __lock_acquire
      
      Here pong (a ctxsw stress test) is the only program running,
      and thus it is the only one responsible for the lock_acquire samples.
      
      If we change the sorting mode:
      
      $ perf report -n --stdio --sort=sym
          12.38%       1732  [k] __lock_acquire
      
      The actual number of samples is shown.
      
      With the fix:
      
      $ perf report -n --stdio
          12.38%       1732     pong  [kernel.kallsyms]     [k] __lock_acquire
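As a rough illustration of the fix described above, the sketch below merges two entries that share a collapse key and sums both the period and the sample count. The `struct entry` type and `entry_collapse()` are hypothetical simplifications for this example, not perf's actual `hist_entry` or `hists__collapse_insert_entry()`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a histogram entry. */
struct entry {
	const char *comm;   /* sort key used when collapsing */
	uint64_t period;    /* sum of the sample periods */
	uint32_t nr_events; /* number of samples */
};

/*
 * Fold src into dst when both share the collapse key.
 * Returns 1 if the entries were merged, 0 otherwise.
 */
int entry_collapse(struct entry *dst, const struct entry *src)
{
	if (strcmp(dst->comm, src->comm) != 0)
		return 0;
	dst->period += src->period;
	dst->nr_events += src->nr_events; /* the aggregation the patch adds */
	return 1;
}
```

If one entry held the 842 samples from the example and the other held 890 (a split assumed here purely for illustration), the merged entry would report 1732 samples.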
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20111003093815.GA6393@quad
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf top: Use the TUI interface by default · 8b1bfdbd
      Committed by Arnaldo Carvalho de Melo
      To disable it either:
      
      1. Make sure newt-devel is not installed when building it
      
      2. Use 'perf top --stdio' just like with report
      
      3. Edit your ~/.perfconfig or system wide config and have this there:
      
      [tui]
      	top = off
      
      But you shouldn't, since the TUI is so much more powerful, has
      integration with annotation, and is where lots more interesting features
      will be developed, so if something annoys you (the colors?) just let me
      know and I'll do my best to make it pleasant as a default.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-cy2tn4uj1t7c3aqss5l25of5@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate browser: Allow navigation to called functions · 34958544
      Committed by Arnaldo Carvalho de Melo
      I.e. when in the annotate TUI window, if Enter is pressed over an
      assembly line with a 'callq' it will try to open another TUI window with
      that symbol.
      
      This is just a proof of concept and works only on x86_64, more work is
      needed to support kernel modules, userland, other arches, etc, but
      should already be useful as-is.
      
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-opyvskw5na3qdmkv8vxi3zbr@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf top: Add callgraph support · 19d4ac3c
      Committed by Arnaldo Carvalho de Melo
      Just like in 'perf report', but live.
      
      Still needs to decay the callchains, but already somewhat useful as-is.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-cj3rmaf5jpsvi3v0tf7t4uvp@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf top: Reuse the 'report' hist_entry/hists classes · ab81f3fd
      Committed by Arnaldo Carvalho de Melo
      This actually fixes several problems we had in the old 'perf top':
      
      1. Unresolved symbols were not shown, a limitation that came from the old
         "KernelTop" codebase; to solve it we would need to make changes
         that would make sym_entry have most of the hist_entry fields.
      2. It was using the number of samples, not the sum of sample->period.
      
      And brings the --sort code that allows us to have all the views in
      'perf report', for instance:
      
      [root@emilia ~]# perf top --sort dso
      PerfTop: 5903 irqs/sec kernel:77.5% exact: 0.0% [1000Hz cycles], (all, 8 CPUs)
      ------------------------------------------------------------------------------
      
          31.59%  libcrypto.so.1.0.0
          21.55%  [kernel]
          18.57%  libpython2.6.so.1.0
           7.04%  libc-2.12.so
           6.99%  _backend_agg.so
           4.72%  sshd
           1.48%  multiarray.so
           1.39%  libfreetype.so.6.3.22
           1.37%  perf
           0.71%  libgobject-2.0.so.0.2200.5
           0.53%  [tg3]
           0.48%  libglib-2.0.so.0.2200.5
           0.44%  libstdc++.so.6.0.13
           0.40%  libcairo.so.2.10800.8
           0.38%  libm-2.12.so
           0.34%  umath.so
           0.30%  libgdk-x11-2.0.so.0.1800.9
           0.22%  libpthread-2.12.so
           0.20%  libgtk-x11-2.0.so.0.1800.9
           0.20%  librt-2.12.so
           0.15%  _path.so
           0.13%  libpango-1.0.so.0.2800.1
           0.11%  libatlas.so.3.0
           0.09%  ft2font.so
           0.09%  libpangoft2-1.0.so.0.2800.1
           0.08%  libX11.so.6.3.0
           0.07%  [vdso]
           0.06%  cyclictest
      ^C
      
      All the filter lists can be used as well: --dsos, --comms, --symbols,
      etc.
      
      The 'perf report' TUI is also reused, making it possible to apply all the
      zoom operations, do annotation, etc.
      
      This change will allow multiple simplifications in the symbol system as
      well, that will be detailed in upcoming changesets.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-xzaaldxq7zhqrrxdxjifk1mh@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 07 Oct, 2011 5 commits
  3. 30 Sep, 2011 17 commits
  4. 24 Sep, 2011 9 commits
    • perf python: Add missing perf_event__parse_sample 'swapped' parm · 2b022a82
      Committed by Arnaldo Carvalho de Melo
      Problem introduced in 936be503, which missed one perf_event__parse_sample
      user, the python binding.
      
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-ja4phms9618ggi657plyuch2@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add support for disabling -Werror via WERROR=0 · 9e59e099
      Committed by Darren Hart
      GCC often introduces new warnings with lots of false positives -
      breaking -Werror builds. WERROR=0 allows one to build perf without much
      fuss - while still encouraging people to send patches to avoid the fuss
      of having to type WERROR=0.
      
      Bisecting back to commits that produce a (mostly harmless) warning on
      some compilers is more difficult. With WERROR=0 one could bisect without
      worrying about harmless warnings.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Link: http://lkml.kernel.org/r/eac06c7cc4920e5d4830417d466161fb26c7359c.1315514559.git.dvhart@linux.intel.com
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf top: Fix userspace sample addr map offset · af52aafa
      Committed by Arnaldo Carvalho de Melo
      The 'perf top' tool came from the kernel where we had each DSO (vmlinux,
      modules) loaded just once at a time.
      
      But userspace may have DSOs loaded at multiple addresses (shared
      libraries), requiring that we use the just resolved map instead of the
      first one found.
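The idea can be sketched as follows: look up the mapping that actually covers the sample address, then convert the address to a DSO-relative offset using that mapping. The `struct map` layout, `find_map()` and `map_ip()` below are simplified assumptions made for this example, not perf's real definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mapping record. */
struct map {
	uint64_t start; /* first address covered */
	uint64_t end;   /* one past the last address covered */
	uint64_t pgoff; /* offset of the mapping inside the DSO */
};

/* Return the mapping that covers addr, or NULL if none does. */
const struct map *find_map(const struct map *maps, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= maps[i].start && addr < maps[i].end)
			return &maps[i];
	return NULL;
}

/* Convert a sample address into a DSO-relative offset via its mapping. */
uint64_t map_ip(const struct map *m, uint64_t addr)
{
	return addr - m->start + m->pgoff;
}
```

When the same library is mapped at two different base addresses, applying the first mapping to an address that belongs to the second would produce a meaningless offset, which is the failure mode the commit message describes.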
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-ag53wz0yllpgers0n2w7hchp@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Fix issue with binaries using 16-bytes buildids (v2) · be96ea8f
      Committed by Stephane Eranian
      Buildid can vary in size. According to the man page of ld, buildid can
      be 160 bits (sha1) or 128 bits (md5, uuid). Perf assumes buildid size of
      20 bytes (160 bits) regardless. When dealing with md5 buildids, it would
      thus read more than needed and that would cause mismatches and samples
      without symbols.
      
      This patch fixes this by taking into account the actual buildid size as
      encoded in the section header. The leftover bytes are also cleared.
      
      This second version fixes a minor issue with the memset() base position.
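A minimal sketch of the approach, assuming a fixed 20-byte destination buffer: copy only as many bytes as the note reports, then zero whatever remains. `BUILD_ID_SIZE` and `build_id_copy()` are names chosen for this example and are not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

#define BUILD_ID_SIZE 20 /* room for the largest (sha1) build-id */

/*
 * Copy a build-id of desc_size bytes into a BUILD_ID_SIZE buffer and
 * zero the leftover bytes. Returns the number of bytes copied.
 */
size_t build_id_copy(unsigned char out[BUILD_ID_SIZE],
		     const unsigned char *desc, size_t desc_size)
{
	size_t sz = desc_size < BUILD_ID_SIZE ? desc_size : BUILD_ID_SIZE;

	memcpy(out, desc, sz);
	/* Clear from the end of the copied bytes, not from the buffer start. */
	memset(out + sz, 0, BUILD_ID_SIZE - sz);
	return sz;
}
```

With a 16-byte (md5 or uuid) build-id, the last four bytes of the buffer end up as zeros instead of whatever followed the descriptor in the file.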
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@gmail.com>
      Link: http://lkml.kernel.org/r/4cc1af3c.8ee7d80a.5a28.ffff868e@mx.google.com
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tool: Fix endianness handling of u32 data in samples · 936be503
      Committed by David Ahern
      Currently, analyzing PPC data files on x86 the cpu field is always 0 and
      the tid and pid are backwards. For example, analyzing a PPC file on PPC
      the pid/tid fields show:
      
              rsyslogd  1210/1212
      
      and analyzing the same PPC file using an x86 perf binary shows:
      
              rsyslogd  1212/1210
      
      The problem is that the swap_op method for samples is
      perf_event__all64_swap which assumes all elements in the sample_data
      struct are u64s. cpu, tid and pid are u32s and need to be handled
      individually. Given that the swap is done before the sample is parsed,
      the simplest solution is to undo the 64-bit swap of those elements when
      the sample is parsed and do the proper swap.
      
      The RAW data field is generic and perf cannot have programmatic knowledge
      of how to treat that data. Instead a warning is given to the user.
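The undo-then-redo step can be sketched as below: a 64-bit slot that really holds two u32 fields has already been through a blanket 64-bit swap, so the swap is reversed and each half is swapped on its own. The union name, the helper names and the portable swap routines are assumptions for this example, not the code from the patch.

```c
#include <stdint.h>

/* Portable byte-swap helpers. */
uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) |
	       swap32((uint32_t)(v >> 32));
}

/* One 64-bit slot of a sample that actually carries two u32 fields. */
union u64_swap {
	uint64_t val64;
	uint32_t val32[2];
};

/*
 * slot has already been swapped as a single u64. Undo that swap, then
 * swap each 32-bit half separately so both fields end up in host byte
 * order and in their original positions.
 */
void fixup_u32_pair(uint64_t slot, uint32_t *first, uint32_t *second)
{
	union u64_swap u;

	u.val64 = swap64(slot);       /* undo the 64-bit swap */
	*first = swap32(u.val32[0]);  /* redo it as two 32-bit swaps */
	*second = swap32(u.val32[1]);
}
```

Reading the two halves straight after the 64-bit swap yields the fields in reversed order, which matches the 1212/1210 symptom shown above; after the fixup they come back as 1210/1212.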
      
      Thanks to Anton Blanchard for providing a data file for a multi-CPU
      PPC system so I could verify the fix for the CPU fields.
      
      v3 -> v4:
      - fixed use of WARN_ONCE
      
      v2 -> v3:
      - used WARN_ONCE for message regarding raw data
      - removed struct wrapper around union
      - fixed whitespace issues
      
      v1 -> v2:
      - added a union for undoing the byte-swap on u64 and redoing swap on
        u32's to address compiler errors (see git commit 65014ab3)
      
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1315321946-16993-1-git-send-email-dsahern@gmail.com
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sort: Fix symbol sort output by separating unresolved samples by type · 6bb8f311
      Committed by Anton Blanchard
      I took a profile that suggested 60% of total CPU time was in the
      hypervisor:
      
      ...
          60.20%  [H] 0x33d43c
           4.43%  [k] ._spin_lock_irqsave
           1.07%  [k] ._spin_lock
      
      Using perf stat to get the user/kernel/hypervisor breakdown contradicted
      this.
      
      The problem is we merge all unresolved samples into the one unknown
      bucket. If we add a comparison by sample type to sort__sym_cmp we get the
      real picture:
      
      ...
          57.11%  [.] 0x80fbf63c
           4.43%  [k] ._spin_lock_irqsave
           1.07%  [k] ._spin_lock
           0.65%  [H] 0x33d43c
      
      So it was almost all userspace, not hypervisor as the initial profile
      suggested.
      
      I found another issue while adding this. Symbol sorting sometimes shows
      multiple entries for the unknown bucket:
      
      ...
          16.65%  [.] 0x6cd3a8
           7.25%  [.] 0x422460
           5.37%  [.] yylex
           4.79%  [.] malloc
           4.78%  [.] _int_malloc
           4.03%  [.] _int_free
           3.95%  [.] hash_source_code_string
           2.82%  [.] 0x532908
           2.64%  [.] 0x36b538
           0.94%  [H] 0x8000000000e132a4
           0.82%  [H] 0x800000000000e8b0
      
      This happens because we aren't consistent with our sorting. On
      one hand we check to see if both symbols match and for two unresolved
      samples sym is NULL so we match:
      
              if (left->ms.sym == right->ms.sym)
                      return 0;
      
      On the other hand we use sample IP for unresolved samples when
      comparing against a symbol:
      
             ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
             ip_r = right->ms.sym ? right->ms.sym->start : right->ip;
      
      This means unresolved samples end up spread across the rbtree and we
      can't merge them all.
      
      If we use cmp_null all unresolved samples will end up in the one bucket
      and the output makes more sense:
      
      ...
          39.12%  [.] 0x36b538
           5.37%  [.] yylex
           4.79%  [.] malloc
           4.78%  [.] _int_malloc
           4.03%  [.] _int_free
           3.95%  [.] hash_source_code_string
           2.26%  [H] 0x800000000000e8b0
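A consistent comparison along these lines might look like the sketch below: unresolved samples are compared by type only, never by raw IP, and a resolved symbol is never treated as equal to an unresolved one. The structs and `sym_cmp()` are simplified assumptions for this example, and the `cmp_null()` body is one plausible reading of the helper named above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for a symbol and a histogram entry. */
struct sym {
	uint64_t start;
};

struct entry {
	const struct sym *sym; /* NULL when the sample is unresolved */
	char level;            /* sample type: '.', 'k', 'H', ... */
	uint64_t ip;           /* raw sample address */
};

/* Order NULL before non-NULL; two NULLs compare equal. */
int64_t cmp_null(const void *l, const void *r)
{
	if (!l && !r)
		return 0;
	return !l ? -1 : 1;
}

int64_t sym_cmp(const struct entry *left, const struct entry *right)
{
	/* Both unresolved: one bucket per sample type, the IP is ignored. */
	if (!left->sym && !right->sym)
		return (int64_t)right->level - (int64_t)left->level;
	/* Exactly one side unresolved. */
	if (!left->sym || !right->sym)
		return cmp_null(left->sym, right->sym);
	if (left->sym->start == right->sym->start)
		return 0;
	return left->sym->start < right->sym->start ? 1 : -1;
}
```

Because the raw IP never takes part in the comparison, two unresolved userspace samples at different addresses compare equal and can be merged, while an unresolved hypervisor sample stays in its own bucket.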
      
      Acked-by: Eric B Munson <emunson@mgebm.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ian Munsie <imunsie@au1.ibm.com>
      Link: http://lkml.kernel.org/r/20110831115145.4f598ab2@kryten
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Synthesize anonymous mmap events · 6a0e55d8
      Committed by Anton Blanchard
      perf_event__synthesize_mmap_events does not create anonymous mmap events
      even though the kernel does. As a result an already running application
      with dynamically created code will not get profiled - all samples end up
      in the unknown bucket.
      
      This patch skips any entries with '[' in the name to avoid adding events
      for special regions (eg the vsyscall page). All other executable mmaps
      are assumed to be anonymous and an event is synthesized.
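The filtering rule can be sketched as a classifier for a single `/proc/<pid>/maps` line. The enum, the function name and the `sscanf()` format are assumptions made for this example; the real code in `perf_event__synthesize_mmap_events` may parse the file differently.

```c
#include <stdio.h>
#include <string.h>

enum map_kind { KIND_SKIP, KIND_FILE, KIND_ANON };

/*
 * Classify one /proc/<pid>/maps line. Only executable mappings matter;
 * names containing '[' (e.g. [vdso]) are skipped, and any other
 * executable mapping that is not a file path is treated as anonymous.
 */
enum map_kind classify_maps_line(const char *line)
{
	unsigned long long start, end, pgoff;
	char prot[5] = "";
	char name[256] = "";
	int n;

	n = sscanf(line, "%llx-%llx %4s %llx %*x:%*x %*u %255s",
		   &start, &end, prot, &pgoff, name);
	if (n < 4)
		return KIND_SKIP; /* malformed line */
	if (prot[2] != 'x')
		return KIND_SKIP; /* not executable */
	if (strchr(name, '['))
		return KIND_SKIP; /* special region */
	return name[0] == '/' ? KIND_FILE : KIND_ANON;
}
```

A caller would synthesize an mmap event for both `KIND_FILE` and `KIND_ANON` lines, so that samples from dynamically generated code in an already running process have a mapping to land in.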
      
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Link: http://lkml.kernel.org/r/20110830091506.60b51fe8@kryten
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Create events initially disabled and enable after init · 764e16a3
      Committed by David Ahern
      perf-record currently creates events enabled. When doing a system wide
      collection (-a arg) this causes data collection for perf's
      initialization activities -- eg., perf_event__synthesize_threads().
      
      For some events (e.g., context switch S/W event or tracepoints like
      syscalls) perf's initialization causes a lot of events to be captured,
      frequently generating "Check IO/CPU overload!" warnings on larger
      systems (e.g., 2 socket, quad core, hyperthreading).
      
      perf's initialization phase can be skipped by creating events
      disabled and then enabling them once the initialization is done.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1314289075-14706-1-git-send-email-dsahern@gmail.com
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Add some heuristics for choosing the best duplicate symbol · 694bf407
      Committed by Anton Blanchard
      Try and pick the best symbol based on a few heuristics:
      
      -  Prefer a non weak symbol over a weak one
      -  Prefer a global symbol over a non global one
      -  Prefer a symbol with fewer underscores (idea taken from kallsyms.c)
      -  If all else fails, choose the symbol with the longest name
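The four rules above can be sketched as a single preference function applied in order. The `struct sym_info` record and the function names are hypothetical, and counting only leading underscores is an assumption about what "fewer underscores" means here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol record. */
struct sym_info {
	const char *name;
	bool weak;
	bool global;
};

/* Assumption: only leading underscores are counted. */
static size_t leading_underscores(const char *s)
{
	size_t n = 0;

	while (s[n] == '_')
		n++;
	return n;
}

/* Returns true if a should be preferred over b. */
bool prefer_symbol(const struct sym_info *a, const struct sym_info *b)
{
	size_t ua, ub;

	if (a->weak != b->weak)
		return !a->weak;   /* non-weak beats weak */
	if (a->global != b->global)
		return a->global;  /* global beats non-global */
	ua = leading_underscores(a->name);
	ub = leading_underscores(b->name);
	if (ua != ub)
		return ua < ub;    /* fewer underscores wins */
	return strlen(a->name) >= strlen(b->name); /* longest name wins */
}
```

Each rule only applies when the earlier ones tie, so a weak symbol loses to a non-weak one regardless of how its name looks.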
      
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110824065243.161953371@samba.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>