1. 13 Apr, 2017 2 commits
  2. 12 Apr, 2017 9 commits
  3. 11 Apr, 2017 3 commits
    • T
      perf pmu: Refactor wordwrap() with ltrim() · aa4beb10
      Taeung Song committed
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1491575061-704-5-git-send-email-treeze.taeung@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      aa4beb10
    • J
      perf evsel: Return exact sub event which failed with EPERM for wildcards · 32ccb130
      Jin Yao committed
      The kernel has a special check for a specific irq_vectors trace event.
      
      TRACE_EVENT_PERF_PERM(irq_work_exit,
      	is_sampling_event(p_event) ? -EPERM : 0);
      
      'perf record' fails when this irq_vectors event is among the requested
      events, for example when a wildcard is used:
      
        root@skl:/tmp# perf record -a -e irq_vectors:* sleep 2
        Error:
        You may not have permission to collect system-wide stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        which controls use of the performance events system by
        unprivileged users (without CAP_SYS_ADMIN).
      
        The current value is 2:
      
          -1: Allow use of (almost) all events by all users
        >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
        >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
        >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
      
        To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
      
              kernel.perf_event_paranoid = -1
      
      This patch prints out the exact sub event that failed with EPERM for
      wildcards to help in understanding what went wrong when this event is
      present:
      
      After the patch:
      
        root@skl:/tmp# perf record -a -e irq_vectors:* sleep 2
        Error:
        No permission to enable irq_vectors:irq_work_exit event.
      
        You may not have permission to collect system-wide stats.
        ......
      
      Committer notes:
      
      So we have a lot of irq_vectors events:
      
        [root@jouet ~]# perf list irq_vectors:*
      
        List of pre-defined events (to be used in -e):
      
          irq_vectors:call_function_entry                    [Tracepoint event]
          irq_vectors:call_function_exit                     [Tracepoint event]
          irq_vectors:call_function_single_entry             [Tracepoint event]
          irq_vectors:call_function_single_exit              [Tracepoint event]
          irq_vectors:deferred_error_apic_entry              [Tracepoint event]
          irq_vectors:deferred_error_apic_exit               [Tracepoint event]
          irq_vectors:error_apic_entry                       [Tracepoint event]
          irq_vectors:error_apic_exit                        [Tracepoint event]
          irq_vectors:irq_work_entry                         [Tracepoint event]
          irq_vectors:irq_work_exit                          [Tracepoint event]
          irq_vectors:local_timer_entry                      [Tracepoint event]
          irq_vectors:local_timer_exit                       [Tracepoint event]
          irq_vectors:reschedule_entry                       [Tracepoint event]
          irq_vectors:reschedule_exit                        [Tracepoint event]
          irq_vectors:spurious_apic_entry                    [Tracepoint event]
          irq_vectors:spurious_apic_exit                     [Tracepoint event]
          irq_vectors:thermal_apic_entry                     [Tracepoint event]
          irq_vectors:thermal_apic_exit                      [Tracepoint event]
          irq_vectors:threshold_apic_entry                   [Tracepoint event]
          irq_vectors:threshold_apic_exit                    [Tracepoint event]
          irq_vectors:x86_platform_ipi_entry                 [Tracepoint event]
          irq_vectors:x86_platform_ipi_exit                  [Tracepoint event]
        #
      
      And some may be sampled:
      
        [root@jouet ~]# perf record -e irq_vectors:local* sleep 20s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.020 MB perf.data (2 samples) ]
        [root@jouet ~]# perf report -D | egrep 'stats:|events:'
        Aggregated stats:
                   TOTAL events:        155
                    MMAP events:        144
                    COMM events:          2
                    EXIT events:          1
                  SAMPLE events:          2
                   MMAP2 events:          4
          FINISHED_ROUND events:          1
               TIME_CONV events:          1
        irq_vectors:local_timer_entry stats:
                   TOTAL events:          1
                  SAMPLE events:          1
        irq_vectors:local_timer_exit stats:
                   TOTAL events:          1
                  SAMPLE events:          1
        [root@jouet ~]#
      
      But, as shown in the tracepoint definition at the start of this message,
      some, like "irq_vectors:irq_work_exit", may not be sampled, just counted,
      i.e. if we try to sample, as when using 'perf record', we get an error:
      
        [root@jouet ~]# perf record -e irq_vectors:irq_work_exit
        Error:
        You may not have permission to collect system-wide stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
      <SNIP>
      
      The error message is misleading. This patch helps by pointing out
      which event caused the error, but the message itself still needs
      improvement: we need a way to check whether a tracepoint is
      counting-only, like this one, in which case all we can do is count it
      with 'perf stat', at most printing the delta using interval printing, as in:
      
         [root@jouet ~]# perf stat -I 5000 -e irq_vectors:irq_work_*
        #           time             counts unit events
             5.000168871                  0      irq_vectors:irq_work_entry
             5.000168871                  0      irq_vectors:irq_work_exit
            10.000676730                  0      irq_vectors:irq_work_entry
            10.000676730                  0      irq_vectors:irq_work_exit
            15.001122415                  0      irq_vectors:irq_work_entry
            15.001122415                  0      irq_vectors:irq_work_exit
            20.001298051                  0      irq_vectors:irq_work_entry
            20.001298051                  0      irq_vectors:irq_work_exit
            25.001485020                  1      irq_vectors:irq_work_entry
            25.001485020                  1      irq_vectors:irq_work_exit
            30.001658706                  0      irq_vectors:irq_work_entry
            30.001658706                  0      irq_vectors:irq_work_exit
        ^C    32.045711878                  0      irq_vectors:irq_work_entry
            32.045711878                  0      irq_vectors:irq_work_exit
      
        [root@jouet ~]#
      
      But at least, when we use a wildcard, this patch helps a bit.
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1491566932-503-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      32ccb130
    • A
      perf callchains: Switch from strtok() to strtok_r() when parsing options · dadafc31
      Arnaldo Carvalho de Melo committed
      Trying to keep everything reentrant.
      
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/n/tip-rdce0p2k9e1b4qnrb8ki9mtf@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      dadafc31
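As background for the reentrancy point: strtok() keeps its scan position in hidden static state, while strtok_r() stores it in a pointer owned by the caller, so independent tokenizing loops cannot disturb each other. The following is only a minimal sketch and not the perf code; the helper name count_tokens() and the comma-separated option string are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from perf): count the comma-separated tokens in
 * a writable option string. strtok_r() keeps its position in 'saveptr',
 * which the caller owns, instead of in hidden static state like strtok().
 * Note that the input string is modified in place.
 */
static size_t count_tokens(char *opts)
{
	char *saveptr = NULL;
	size_t n = 0;

	for (char *tok = strtok_r(opts, ",", &saveptr); tok != NULL;
	     tok = strtok_r(NULL, ",", &saveptr))
		n++;

	return n;
}
```

Because each call site has its own saveptr, a second tokenizing loop could run inside the body of this one without corrupting it, which is not possible with plain strtok().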
  4. 07 Apr, 2017 1 commit
  5. 05 Apr, 2017 1 commit
    • T
      perf annotate: Fix missing number of samples for source_line_samples · 99094a5e
      Taeung Song committed
      The '--show-total-period' option works fine without the '-l' option. But
      when running 'perf annotate --stdio -l --show-total-period', the number
      of samples is always shown as zero ('0').
      
      Before:
          $ perf annotate --stdio -l --show-total-period
      ...
             0 :        400816:       push   %rbp
             0 :        400817:       mov    %rsp,%rbp
             0 :        40081a:       mov    %edi,-0x24(%rbp)
             0 :        40081d:       mov    %rsi,-0x30(%rbp)
             0 :        400821:       mov    -0x24(%rbp),%eax
             0 :        400824:       mov    -0x30(%rbp),%rdx
             0 :        400828:       mov    (%rdx),%esi
             0 :        40082a:       mov    $0x0,%edx
      ...
      
      The reason is that the number of samples in source_line_samples was
      never set, so set it properly.
      
      After:
          $ perf annotate --stdio -l --show-total-period
      ...
             3 :        400816:       push   %rbp
             4 :        400817:       mov    %rsp,%rbp
             0 :        40081a:       mov    %edi,-0x24(%rbp)
             0 :        40081d:       mov    %rsi,-0x30(%rbp)
             1 :        400821:       mov    -0x24(%rbp),%eax
             2 :        400824:       mov    -0x30(%rbp),%rdx
             0 :        400828:       mov    (%rdx),%esi
             1 :        40082a:       mov    $0x0,%edx
      ...
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin Liska <mliska@suse.cz>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 0c4a5bce ("perf annotate: Display total number of samples with --show-total-period")
      Link: http://lkml.kernel.org/r/1490703125-13643-1-git-send-email-treeze.taeung@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      99094a5e
  6. 04 Apr, 2017 2 commits
  7. 31 Mar, 2017 1 commit
  8. 30 Mar, 2017 1 commit
  9. 29 Mar, 2017 1 commit
    • J
      perf report: Drop cycles 0 for LBR print · c1dfcfad
      Jin Yao committed
      Some platforms, for example Broadwell, do not support cycles for LBR,
      yet perf always prints cycles:0, which is unnecessary.
      
      The patch refactors the LBR info print code and drops the cycles:0.
      
      For example: perf report --branch-history --no-children --stdio
      
      On Broadwell:
      --0.91%--__random_r random_r.c:394 (iterations:2)
                __random_r random_r.c:360 (predicted:0.0%)
                __random_r random_r.c:380 (predicted:0.0%)
                __random_r random_r.c:357
      
      On Skylake:
      --1.07%--main div.c:39 (predicted:52.4% cycles:1 iterations:17)
                main div.c:44 (predicted:52.4% cycles:1)
                main div.c:42 (cycles:2)
                compute_flag div.c:28 (cycles:2)
                compute_flag div.c:27 (cycles:1)
                rand rand.c:28 (cycles:1)
                rand rand.c:28 (cycles:1)
                __random random.c:298 (cycles:1)
                __random random.c:297 (cycles:1)
                __random random.c:295 (cycles:1)
                __random random.c:295 (cycles:1)
                __random random.c:295 (cycles:1)
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/1489046786-10061-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c1dfcfad
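The idea of leaving out cycles when the value is zero can be sketched as below. This is an illustration only, not the refactored perf code: the struct branch_info_sketch, its has_predicted flag, and the helpers append() and format_branch_info() are all hypothetical, and the rule for when perf shows "predicted" is not modelled here (the caller decides).

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-branch data; not perf's actual structure. */
struct branch_info_sketch {
	int has_predicted;		/* assumption: caller decides whether to show it */
	double predicted_pct;
	unsigned long cycles;		/* 0 when the platform reports no LBR cycles */
	unsigned long iterations;	/* 0 when there is nothing to report */
};

/* Append formatted text at offset 'len'; returns the new length (truncating). */
static size_t append(char *buf, size_t size, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (len >= size)
		return len;
	va_start(ap, fmt);
	n = vsnprintf(buf + len, size - len, fmt, ap);
	va_end(ap);
	if (n < 0)
		return len;
	return len + (size_t)n < size ? len + (size_t)n : size - 1;
}

/* Build "(predicted:P% cycles:C iterations:I)", skipping fields that are unset. */
static size_t format_branch_info(const struct branch_info_sketch *bi,
				 char *buf, size_t size)
{
	const char *sep = "(";
	size_t len = 0;

	if (size > 0)
		buf[0] = '\0';
	if (bi->has_predicted) {
		len = append(buf, size, len, "%spredicted:%.1f%%", sep, bi->predicted_pct);
		sep = " ";
	}
	if (bi->cycles) {	/* the point of the change: no "cycles:0" */
		len = append(buf, size, len, "%scycles:%lu", sep, bi->cycles);
		sep = " ";
	}
	if (bi->iterations)
		len = append(buf, size, len, "%siterations:%lu", sep, bi->iterations);
	if (len > 0)
		len = append(buf, size, len, ")");
	return len;
}
```

With all fields unset the function produces an empty string, so a branch entry with nothing to report gets no parentheses at all.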
  10. 28 Mar, 2017 9 commits
    • R
      perf/sdt/x86: Move OP parser to tools/perf/arch/x86/ · d451a205
      Ravi Bangoria committed
      An SDT marker argument is in N@OP format. N is the size of the argument and
      OP is the actual assembly operand. OP is an arch-dependent component, hence
      its parsing logic should also be placed under tools/perf/arch/.
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
      Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170328094754.3156-3-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d451a205
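The arch-independent half of the N@OP format, splitting the size from the operand, might look like the sketch below. It is not the perf parser: split_sdt_arg() is a hypothetical helper, and the operand strings used with it are assumed x86 examples. Interpreting OP itself is the arch-specific part this commit moves, and it is not attempted here.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from perf): split an SDT argument of the form
 * "N@OP" into its size N and a pointer to the operand text OP.
 * Returns 0 on success, -1 if the string does not have that shape.
 */
static int split_sdt_arg(const char *arg, long *size, const char **op)
{
	char *end;
	long n;

	errno = 0;
	n = strtol(arg, &end, 10);
	/* Need at least one digit, then '@', then a non-empty operand. */
	if (errno != 0 || end == arg || *end != '@' || end[1] == '\0')
		return -1;

	*size = n;
	*op = end + 1;	/* points into 'arg'; the operand is not copied */
	return 0;
}
```

Keeping this split generic and handing only the OP string to per-architecture code matches the separation the commit message describes.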
    • A
      perf tools: Remove support for command aliases · c6867701
      Arnaldo Carvalho de Melo committed
      This came from 'git', but isn't documented anywhere in
      tools/perf/Documentation/, looks like baggage we can do without, ditch
      it.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-e7uwkn60t4hmlnwj99ba4t2s@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c6867701
    • T
      perf utils: Readlink /proc/self/exe to find the perf binary · 55f77128
      Tommi Rantala committed
      Simplification: it is easier to open /proc/self/exe than /proc/$pid/exe.
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170322130624.21881-7-tommi.t.rantala@nokia.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      55f77128
    • T
      perf utils: Null terminate buf in read_ftrace_printk() · d4b364df
      Tommi Rantala committed
      Ensure that the string that we read from the data file is null terminated.
      
      Valgrind was complaining:
      
        ==31357== Invalid read of size 1
        ==31357==    at 0x4EC8C1: __strtok_r_1c (string2.h:200)
        ==31357==    by 0x4EC8C1: parse_ftrace_printk (trace-event-parse.c:161)
        ==31357==    by 0x4F82A8: read_ftrace_printk (trace-event-read.c:204)
        ==31357==    by 0x4F82A8: trace_report (trace-event-read.c:468)
        ==31357==    by 0x4CD552: process_tracing_data (header.c:1576)
        ==31357==    by 0x4D3397: perf_file_section__process (header.c:2705)
        ==31357==    by 0x4D3397: perf_header__process_sections (header.c:2488)
        ==31357==    by 0x4D3397: perf_session__read_header (header.c:2925)
        ==31357==    by 0x4E71E2: perf_session__open (session.c:32)
        ==31357==    by 0x4E71E2: perf_session__new (session.c:139)
        ==31357==    by 0x429F5D: cmd_annotate (builtin-annotate.c:472)
        ==31357==    by 0x497150: run_builtin (perf.c:359)
        ==31357==    by 0x428CE0: handle_internal_command (perf.c:421)
        ==31357==    by 0x428CE0: run_argv (perf.c:467)
        ==31357==    by 0x428CE0: main (perf.c:614)
        ==31357==  Address 0x8ac0efb is 0 bytes after a block of size 1,963 alloc'd
        ==31357==    at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
        ==31357==    by 0x4F827B: read_ftrace_printk (trace-event-read.c:195)
        ==31357==    by 0x4F827B: trace_report (trace-event-read.c:468)
        ==31357==    by 0x4CD552: process_tracing_data (header.c:1576)
        ==31357==    by 0x4D3397: perf_file_section__process (header.c:2705)
        ==31357==    by 0x4D3397: perf_header__process_sections (header.c:2488)
        ==31357==    by 0x4D3397: perf_session__read_header (header.c:2925)
        ==31357==    by 0x4E71E2: perf_session__open (session.c:32)
        ==31357==    by 0x4E71E2: perf_session__new (session.c:139)
        ==31357==    by 0x429F5D: cmd_annotate (builtin-annotate.c:472)
        ==31357==    by 0x497150: run_builtin (perf.c:359)
        ==31357==    by 0x428CE0: handle_internal_command (perf.c:421)
        ==31357==    by 0x428CE0: run_argv (perf.c:467)
        ==31357==    by 0x428CE0: main (perf.c:614)
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170322130624.21881-6-tommi.t.rantala@nokia.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d4b364df
    • T
      perf utils: use sizeof(buf) - 1 in readlink() call · b7126ef7
      Tommi Rantala committed
      Ensure that we have space for the null byte in buf.
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170322130624.21881-5-tommi.t.rantala@nokia.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b7126ef7
    • T
      perf buildid: Do not assume that readlink() returns a null terminated string · 5a234211
      Tommi Rantala committed
      Valgrind was complaining:
      
        $ valgrind ./perf list >/dev/null
        ==11643== Memcheck, a memory error detector
        ==11643== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
        ==11643== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
        ==11643== Command: ./perf list
        ==11643==
        ==11643== Conditional jump or move depends on uninitialised value(s)
        ==11643==    at 0x4C30620: rindex (vg_replace_strmem.c:199)
        ==11643==    by 0x49DAA9: build_id_cache__origname (build-id.c:198)
        ==11643==    by 0x49E1C7: build_id_cache__valid_id (build-id.c:222)
        ==11643==    by 0x49E1C7: build_id_cache__list_all (build-id.c:507)
        ==11643==    by 0x4B9C8F: print_sdt_events (parse-events.c:2067)
        ==11643==    by 0x4BB0B3: print_events (parse-events.c:2313)
        ==11643==    by 0x439501: cmd_list (builtin-list.c:53)
        ==11643==    by 0x497150: run_builtin (perf.c:359)
        ==11643==    by 0x428CE0: handle_internal_command (perf.c:421)
        ==11643==    by 0x428CE0: run_argv (perf.c:467)
        ==11643==    by 0x428CE0: main (perf.c:614)
        [...]
      
      Additionally, a zero length result from readlink() is not very interesting.
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170322130624.21881-3-tommi.t.rantala@nokia.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5a234211
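The readlink() fixes in this series share one pattern: readlink() does not append a terminating null byte, so the caller has to reserve room for it and write it. A minimal sketch of that pattern follows; readlink_str() is a hypothetical wrapper, not the function perf uses, and treating an empty result as an error is an assumption taken from the remark above that a zero-length result is not interesting.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper (not perf's code): read a symlink target into buf
 * and always null-terminate it. Returns the target length, or -1 on
 * error, on an empty result, or when size is 0.
 */
static ssize_t readlink_str(const char *path, char *buf, size_t size)
{
	ssize_t len;

	if (size == 0)
		return -1;

	/* Pass size - 1 so that one byte stays free for the terminator. */
	len = readlink(path, buf, size - 1);
	if (len <= 0)
		return -1;

	buf[len] = '\0';	/* readlink() itself never writes this byte */
	return len;
}
```

Note that readlink() truncates silently when the buffer is too small, so a caller that needs the full target would have to compare the returned length against size - 1 and retry with a larger buffer.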
    • T
      perf buildid: Do not update SDT cache with null filename · 2ccc2202
      Tommi Rantala committed
      Valgrind was complaining:
      
        ==2633== Syscall param open(filename) points to unaddressable byte(s)
        ==2633==    at 0x5281CC0: __open_nocancel (syscall-template.S:84)
        ==2633==    by 0x537D38: open (fcntl2.h:53)
        ==2633==    by 0x537D38: get_sdt_note_list (symbol-elf.c:2017)
        ==2633==    by 0x5396FD: probe_cache__scan_sdt (probe-file.c:700)
        ==2633==    by 0x49EA2C: build_id_cache__add_sdt_cache (build-id.c:625)
        ==2633==    by 0x49EA2C: build_id_cache__add_s (build-id.c:697)
        ==2633==    by 0x49EE72: build_id_cache__add_b (build-id.c:717)
        ==2633==    by 0x49EE72: dso__cache_build_id (build-id.c:782)
        ==2633==    by 0x49F190: __dsos__cache_build_ids (build-id.c:793)
        ==2633==    by 0x49F190: machine__cache_build_ids (build-id.c:801)
        ==2633==    by 0x49F190: perf_session__cache_build_ids (build-id.c:815)
        ==2633==    by 0x4CD4F2: write_build_id (header.c:165)
        ==2633==    by 0x4D26F7: do_write_feat (header.c:2296)
        ==2633==    by 0x4D26F7: perf_header__adds_write (header.c:2335)
        ==2633==    by 0x4D26F7: perf_session__write_header (header.c:2414)
        ==2633==    by 0x43B324: __cmd_record (builtin-record.c:1154)
        ==2633==    by 0x43B324: cmd_record (builtin-record.c:1839)
        ==2633==    by 0x455A07: __cmd_record (builtin-kmem.c:1868)
        ==2633==    by 0x455A07: cmd_kmem (builtin-kmem.c:1944)
        ==2633==    by 0x497150: run_builtin (perf.c:359)
        ==2633==    by 0x428CE0: handle_internal_command (perf.c:421)
        ==2633==    by 0x428CE0: run_argv (perf.c:467)
        ==2633==    by 0x428CE0: main (perf.c:614)
        ==2633==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tommi Rantala <tommi.t.rantala@nokia.com>
      Link: http://lkml.kernel.org/r/20170322130624.21881-2-tommi.t.rantala@nokia.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2ccc2202
    • T
      perf annotate: Fix a bug of division by zero when calculating percent · 2e933b12
      Taeung Song committed
      Currently perf-annotate with --print-line can print
      -nan(0x8000000000000) because of division by zero when calculating
      percent. The division by zero happens when a sum of samples is zero in
      symbol__get_source_line(), so fix it.
      
      For example:
      
      After running 'perf record' like below,
      
          $ perf record -e "{cycles,page-faults,branch-misses}" ./a.out
      
      Before:
      
          $ perf annotate --stdio -l
      
        Sorted summary for file /home/taeung/workspace/a.out
        ----------------------------------------------
      
         32.89    -nan    7.04 a.c:38
         25.14    -nan    0.00 a.c:34
         16.26    -nan   56.34 a.c:31
         15.88    -nan    1.41 a.c:37
          5.67    -nan    0.00 a.c:39
          1.13    -nan   35.21 a.c:26
          0.95    -nan    0.00 a.c:44
          0.57    -nan    0.00 a.c:32
         Percent                 |      Source code & Disassembly of a.out for cycles (529 samples)
        -----------------------------------------------------------------------------------------
                               :
        ...
      
         a.c:26    0.57    -nan    4.23 :         40081a:       mov    %edi,-0x24(%rbp)
         a.c:26    0.00    -nan    9.86 :         40081d:       mov    %rsi,-0x30(%rbp)
      
        ...
      
      With this fix, calculating the percent is skipped when the sum of
      samples is zero (e.g. for 'page-faults').
      
      After:
      
          $ perf annotate --stdio -l
      
        Sorted summary for file /home/taeung/workspace/a.out
        ----------------------------------------------
      
         32.89    0.00    7.04 a.c:38
         25.14    0.00    0.00 a.c:34
         16.26    0.00   56.34 a.c:31
         15.88    0.00    1.41 a.c:37
          5.67    0.00    0.00 a.c:39
          1.13    0.00   35.21 a.c:26
          0.95    0.00    0.00 a.c:44
          0.57    0.00    0.00 a.c:32
         Percent                 |      Source code & Disassembly of old for cycles (529 samples)
        -----------------------------------------------------------------------------------------
                               :
        ...
      
        a.c:26    0.57    0.00    4.23 :         40081a:       mov    %edi,-0x24(%rbp)
        a.c:26    0.00    0.00    9.86 :         40081d:       mov    %rsi,-0x30(%rbp)
      
        ...
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1490598638-13947-3-git-send-email-treeze.taeung@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2e933b12
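In floating point, 0.0 / 0.0 does not trap; it yields NaN, which is what printf() renders as "nan" or "-nan" in the "Before" output. The guard the message describes can be sketched as follows. percent_of() is a hypothetical helper, not the code in symbol__get_source_line().

```c
/*
 * Hypothetical helper (not from perf): share of 'samples' in 'total' as a
 * percentage. When 'total' is zero the division is skipped and 0.0 is
 * returned, instead of computing 0.0 / 0.0, which would be NaN.
 */
static double percent_of(unsigned long long samples, unsigned long long total)
{
	if (total == 0)
		return 0.0;

	return 100.0 * (double)samples / (double)total;
}
```

For an event such as 'page-faults' that recorded no samples, every line then reports 0.00, as in the "After" output.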
    • T
      perf annotate: Fix a bug following symbolic link of a build-id file · 6ebd2547
      Taeung Song committed
      Reading the link name from a build-id file is wrong, because the
      build-id file is no longer a symbolic link; its build-id directory
      is the symbolic link. Fix it.
      
      For example, if the build-id file name obtained from
      dso__build_id_filename() is as below,
      
        /root/.debug/.build-id/4f/75c7d197c951659d1c1b8b5fd49bcdf8f3f8b1/elf
      
      then to correctly read the link name of the build-id, use the build-id
      directory path, which is a symbolic link, instead of the file name above:
      
        /root/.debug/.build-id/4f/75c7d197c951659d1c1b8b5fd49bcdf8f3f8b1
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1490598638-13947-2-git-send-email-treeze.taeung@gmail.com
      Fixes: 01412261 ("perf buildid-cache: Use path/to/bin/buildid/elf instead of path/to/bin/buildid")
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6ebd2547
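Deriving the directory path from the file name amounts to dropping the last path component ("elf" in the example above) before calling readlink(). A minimal sketch, assuming a hypothetical helper build_id_dir() that is not part of perf:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from perf): copy 'filename' without its last
 * path component into 'dir'. Returns 0 on success, -1 if there is no
 * parent directory to return or 'dir' is too small.
 */
static int build_id_dir(const char *filename, char *dir, size_t size)
{
	const char *slash = strrchr(filename, '/');
	size_t len;

	if (slash == NULL || slash == filename)
		return -1;

	len = (size_t)(slash - filename);
	if (len >= size)
		return -1;

	memcpy(dir, filename, len);
	dir[len] = '\0';
	return 0;
}
```

The resulting path is the one that would be handed to readlink(), since the directory is the symbolic link, not the "elf" file inside it.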
  11. 27 Mar, 2017 6 commits
    • M
      perf report: Enable sorting by srcline as key · 5dfa210e
      Milian Wolff committed
      Often it is interesting to know how costly a given source line is in
      total. Previously, one had to build these sums manually based on all
      addresses that pointed to the same source line. This patch introduces
      srcline as a sort key, which will do the aggregation for us.
      
      Paired with the recent addition of showing inline frames, this makes
      perf report much more useful for many C++ work loads.
      
      The following shows the new feature in action. First, let's show the
      status quo output when we sort by address. The result contains many hist
      entries that generate the same output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g address
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--60.31%--hypot +20
                  |          |          |
                  |          |          |--8.52%--__hypot_finite +273
                  |          |          |
                  |          |          |--7.32%--__hypot_finite +411
      ...
                   --35.34%--_start +4194346
                             __libc_start_main +241
                             |
                             |--6.65%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--2.70%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                             |
                             |--1.69%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      
      With this patch and `-g srcline` we instead get the following output:
      
        ~~~~~~~~~~~~~~~~
        $ perf report --stdio --inline -g srcline
        # Children      Self  Command       Shared Object        Symbol
        # ........  ........  ............  ...................  .........................................
        #
            99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
                  |
                  |--64.55%--main complex:655
                  |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                  |          /usr/include/c++/6.3.1/complex:664 (inline)
                  |          |
                  |          |--64.02%--hypot
                  |          |          |
                  |          |           --59.81%--__hypot_finite
                  |          |
                  |           --0.53%--cabs
                  |
                   --35.34%--_start
                             __libc_start_main
                             |
                             |--12.48%--main random.tcc:3326
                             |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                             |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
        ...
        ~~~~~~~~~~~~~~~~
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5dfa210e
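The aggregation that the srcline sort key performs, summing the cost of all addresses that map to the same source line, can be illustrated with the sketch below. It is not how perf's hist entries work internally: the structs sample_sketch and line_total, the helper aggregate_by_srcline(), and the linear search are all assumptions chosen to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input: one entry per sampled address. */
struct sample_sketch {
	const char *srcline;	/* e.g. "random.tcc:3326" */
	unsigned long count;	/* samples attributed to that address */
};

/* Hypothetical output: one entry per distinct source line. */
struct line_total {
	const char *srcline;
	unsigned long total;
};

/*
 * Sum sample counts per source line, in order of first appearance.
 * Returns the number of distinct lines, or -1 if 'out' is too small.
 */
static int aggregate_by_srcline(const struct sample_sketch *s, size_t n,
				struct line_total *out, size_t max_out)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* Look for an existing bucket for this source line. */
		for (j = 0; j < used; j++)
			if (strcmp(out[j].srcline, s[i].srcline) == 0)
				break;

		if (j == used) {	/* first time this line is seen */
			if (used == max_out)
				return -1;
			out[used].srcline = s[i].srcline;
			out[used].total = 0;
			used++;
		}
		out[j].total += s[i].count;
	}

	return (int)used;
}
```

Sorting by address keeps one entry per address, which is why the first listing above repeats "main random.tcc:3326" several times; keying on the srcline string folds those entries into one.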
    • J
      perf report: Show inline stack for browser mode · 0d3eb0b7
      Jin Yao committed
      If the address belongs to an inlined function, the source information
      back to the first non-inlined function will be printed.
      
      For example:
      
      1. Show inlined function name
         perf report -g function --inline
      
      -    0.69%     0.00%  inline   ld-2.23.so           [.] dl_main
         - dl_main
              0.56% _dl_relocate_object
               _dl_relocate_object (inline)
               elf_dynamic_do_Rela (inline)
      
      2. Show the file/line information
         perf report -g address --inline
      
      -    0.69%     0.00%  inline   ld-2.23.so           [.] _dl_start
           _dl_start rtld.c:307
            /build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline)
         + _dl_sysdep_start dl-sysdep.c:250
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/1490474069-15823-6-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0d3eb0b7
    • J
      perf report: Introduce --inline option · f3a60646
      Jin Yao authored
      Looking up the inline stack for callgraph addresses takes some time, so
      add a new "--inline" option that lets the user decide whether to enable
      this feature.
      
        --inline:
      
        If a callgraph address belongs to an inlined function, the inline stack
        will be printed. Each entry is the inline function name or file/line.
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/1490474069-15823-4-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f3a60646
    • J
      perf report: Find the inline stack for a given address · a64489c5
      Jin Yao authored
      It would be useful for perf to support a mode that queries the inline
      stack for a given callgraph address. This would make it easier to find
      the right source location in code that does a lot of inlining.
      
      srcline.c already contains the code that translates an address to
      filename:line_nr. This patch extends that function so that it can also
      return the inline stack.
      
      It introduces inline_list, which stores the result for each inlined
      function (filename:line_nr and funcname).
      
      If the BFD library is not supported, the result is only filename:line_nr.
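
The shape of such a list can be illustrated with a minimal sketch. Everything below (struct inline_entry, inline_stack_append(), inline_stack_free()) is a hypothetical illustration of a per-address list holding filename:line_nr and funcname, not the actual definitions from this patch:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: one inlined frame (source position plus function name). */
struct inline_entry {
	char *filename;
	unsigned int line_nr;
	char *funcname;			/* NULL when no function name is known */
	struct inline_entry *next;
};

/* Heap-allocated copy of s, or NULL if s is NULL or allocation fails. */
static char *dup_string(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Append one frame at the tail; returns 0 on success, -1 on failure. */
int inline_stack_append(struct inline_entry **head, const char *filename,
			unsigned int line_nr, const char *funcname)
{
	struct inline_entry *entry = calloc(1, sizeof(*entry));

	if (!entry)
		return -1;

	entry->filename = dup_string(filename);
	entry->funcname = dup_string(funcname);
	entry->line_nr = line_nr;
	if (!entry->filename || (funcname && !entry->funcname)) {
		free(entry->filename);
		free(entry->funcname);
		free(entry);
		return -1;
	}

	/* Walk to the tail so frames keep the order they were added in. */
	while (*head)
		head = &(*head)->next;
	*head = entry;
	return 0;
}

/* Release every node together with the strings it owns. */
void inline_stack_free(struct inline_entry *head)
{
	while (head) {
		struct inline_entry *next = head->next;

		free(head->filename);
		free(head->funcname);
		free(head);
		head = next;
	}
}
```

An entry whose funcname is NULL corresponds to the case where only filename:line_nr is available.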
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/1490474069-15823-3-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a64489c5
    • J
      perf report: Refactor common code in srcline.c · 5580338d
      Jin Yao authored
      Factor dso__name() and filename_split() out of existing code, because
      this code will be used in several places in the next patch.
      
      filename_split() may also fix a potential memory leak in the existing
      code. In the existing addr2line():
      
              sep = strchr(filename, ':');
              if (sep) {
                      *sep++ = '\0';
                      *file = filename;
                      *line_nr = strtoul(sep, NULL, 0);
                      ret = 1;
              }
      
      out:
              pclose(fp);
              return ret;
      
      If sep is NULL, filename is neither freed nor returned via file.
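
One way to close that path is to make buffer ownership explicit. The following is a minimal sketch under that assumption; split_file_line() and parse_srcline() are hypothetical names for illustration and are not the filename_split() added by this patch:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: split "file:line" in place.
 * Returns 1 and hands the buffer to *file on success.
 * Returns 0 on failure; the caller then still owns the buffer.
 */
int split_file_line(char *filename, char **file, unsigned int *line_nr)
{
	char *sep = strchr(filename, ':');

	if (!sep)
		return 0;

	*sep++ = '\0';
	*file = filename;
	*line_nr = (unsigned int)strtoul(sep, NULL, 0);
	return 1;
}

/* Caller side: free the buffer whenever the split fails, so nothing leaks. */
int parse_srcline(const char *text, char **file, unsigned int *line_nr)
{
	size_t len = strlen(text) + 1;
	char *copy = malloc(len);

	if (!copy)
		return 0;
	memcpy(copy, text, len);

	if (!split_file_line(copy, file, line_nr)) {
		free(copy);	/* the branch with no ':' no longer leaks */
		return 0;
	}
	return 1;
}
```

On success the caller owns *file and is expected to free() it. On failure nothing is left allocated.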
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/1490474069-15823-2-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5580338d
    • A
      perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms() · c3a0bbc7
      Adrian Hunter authored
      Address filtering with kernel symbols incorrectly resulted in the error
      "Cannot determine size of symbol" because the no_size logic was
      inverted.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Andi Kleen <ak@linux.intel.com>
      Cc: stable@vger.kernel.org # v4.9+
      Link: http://lkml.kernel.org/r/1490357752-27942-1-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c3a0bbc7
  12. 23 Mar, 2017 4 commits
    • A
      perf list: Move extra details printing to new option · bf874fcf
      Andi Kleen authored
      Move the printing of perf expressions and internal events to a new,
      clearer --details flag instead of lumping it together with other debug
      options in --debug. This makes the flag's purpose clearer.
      
      Before
      
        perf list --debug
        ...
        unc_m_power_critical_throttle_cycles
               [Cycles all ranks are in critical thermal throttle. Unit: uncore_imc]
                uncore_imc_2/event=0x86/  MetricName: power_critical_throttle_cycles % MetricExpr: (unc_m_power_critical_throttle_cycles / unc_m_clockticks) * 100.
      
      After
      
        perf list --details
        ...
        unc_m_power_critical_throttle_cycles
               [Cycles all ranks are in critical thermal throttle. Unit: uncore_imc]
                uncore_imc_2/event=0x86/  MetricName: power_critical_throttle_cycles % MetricExpr: (unc_m_power_critical_throttle_cycles / unc_m_clockticks) * 100.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/20170320201711.14142-14-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bf874fcf
    • A
      perf pmu: Add support for MetricName JSON attribute · 96284814
      Andi Kleen authored
      Add support for a new JSON event attribute, MetricName, that names a
      MetricExpr for better output in perf stat.
      
      If the event has no MetricName, the normal event name is used to
      describe the metric instead.
      
      Before
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
                 time unc_p_freq_max_os_cycles
           1.000149775     15.7
           2.000344807     19.3
           3.000502544     16.7
           4.000640656      6.6
           5.000779955      9.9
      
      After
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
                 time freq_max_os_cycles %
           1.000149775     15.7
           2.000344807     19.3
           3.000502544     16.7
           4.000640656      6.6
           5.000779955      9.9
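
The fallback described above amounts to a simple selection between two strings. A minimal sketch, in which metric_label() is a hypothetical helper and not a function from the perf sources:

```c
#include <stddef.h>

/* Hypothetical sketch: choose the column label for a metric. */
const char *metric_label(const char *metric_name, const char *event_name)
{
	/* Prefer MetricName when the JSON event defines a non-empty one. */
	if (metric_name && metric_name[0] != '\0')
		return metric_name;

	/* Otherwise fall back to the normal event name. */
	return event_name;
}
```

With a MetricName of "freq_max_os_cycles %" the helper returns that string, as in the "After" header. With no MetricName it returns the event name, as in the "Before" header.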
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-13-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      96284814
    • A
      perf list: Support printing MetricExpr with --debug · 7f372a63
      Andi Kleen authored
      Output the MetricExpr in perf list when --debug is specified, so that
      the user can check the formula.
      
      Before:
      
        % perf list
          ...
          unc_m_power_channel_ppd
               [Cycles where DRAM ranks are in power down (CKE) mode. Derived from unc_m_power_channel_ppd. Unit:
                uncore_imc]
                uncore_imc_2/event=0x85/
      
      After:
      
        % perf list --debug
          ...
          unc_m_power_channel_ppd
               [Cycles where DRAM ranks are in power down (CKE) mode. Derived from unc_m_power_channel_ppd. Unit:
                uncore_imc]
                Perf: uncore_imc_2/event=0x85/ MetricExpr: (unc_m_power_channel_ppd / unc_m_clockticks) * 100.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-12-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7f372a63
    • A
      perf stat: Output JSON MetricExpr metric · 37932c18
      Andi Kleen authored
      Add generic infrastructure to perf stat to output ratios for
      "MetricExpr" entries in the event lists. Many events are more useful as
      ratios than in raw form, typically some count in relation to total
      ticks.
      
      Transfer the MetricExpr information from the alias to the evsel.
      
      We mark the events that need to be collected for MetricExpr, and also
      link the events using them with a pointer. The code is careful to always
      prefer the right event in the same group to minimize multiplexing
      errors. At the moment only a single relation is supported.
      
      Then add an rblist to the stat shadow code that remembers stats based on
      the cpu and context.
      
      Finally, update, retrieve, and print these values similarly to the
      existing hardcoded perf metrics. We use the simple expression parser
      added earlier to evaluate the expression.
      
      Normally we just output the result without further commentary, but for
      --metric-only this would lead to empty columns. So for this case use the
      original event as description.
      
      There is no attempt to automatically add the MetricExpr event if it is
      missing. Instead we suggest it to the user, because the tool does not
      have enough information to reliably construct a group that is
      guaranteed to schedule.
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}'
             1.000147889        800,085,181      unc_p_clockticks
             1.000147889         93,126,241      unc_p_freq_max_os_cycles  #     11.6
             2.000448381        800,218,217      unc_p_clockticks
             2.000448381        142,516,095      unc_p_freq_max_os_cycles  #     17.8
             3.000639852        800,243,057      unc_p_clockticks
             3.000639852        162,292,689      unc_p_freq_max_os_cycles  #     20.3
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
        #    time         freq_max_os_cycles %
             1.000127077      0.9
             2.000301436      0.7
             3.000456379      0.0
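
The value printed after the # sign in the first run above can be reproduced by hand. A minimal sketch, assuming the MetricExpr for unc_p_freq_max_os_cycles has the form (count / clockticks) * 100, like the uncore_imc expressions shown by perf list; metric_percent() is a hypothetical helper, not a perf function:

```c
/* Hypothetical helper: an event count relative to clockticks, in percent. */
double metric_percent(double count, double clockticks)
{
	/* Avoid dividing by zero when the reference event did not count. */
	if (clockticks == 0.0)
		return 0.0;

	return count / clockticks * 100.0;
}
```

For the first interval, metric_percent(93126241, 800085181) evaluates to roughly 11.6, which agrees with the ratio shown next to unc_p_freq_max_os_cycles.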
      
      v2: Change from DivideBy to MetricExpr
      v3: Use expr__ prefix.  Support more than one other event.
      v4: Update description
      v5: Only print warning message once for multiple PMUs.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      37932c18