1. 22 Jan, 2019 · 1 commit
  2. 04 Jan, 2019 · 1 commit
    • perf report: Fix wrong iteration count in --branch-history · a3366db0
      Jin Yao committed
      By calculating the removed loops, we can get the iteration count.
      
      But the iteration count could be reported incorrectly, with
      impossibly high values.
      
      That's because the previous code used the number of removed LBR
      entries as the iteration count. Fix this by incrementing the
      iteration count each time a loop is detected.
      
      When chains are matched, their iteration counts are added up, so
      the average value has to be computed when printing.
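
      The counting scheme above can be sketched in C. This is a simplified
      illustration, not perf's code: struct iter_stats and its helpers are
      hypothetical, and the sketch assumes the printed iter value is the
      summed count averaged over the merged chains, while avg_cycles divides
      the summed cycles by the summed iterations.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-entry loop statistics; the names are illustrative and
 * do not mirror perf's internal structures.
 */
struct iter_stats {
	uint64_t iter_count;  /* loops detected, summed over merged chains */
	uint64_t iter_cycles; /* cycles spent in those loops */
	uint64_t nr_chains;   /* number of chains merged into this entry */
};

/* Called once per detected loop: count the loop, not the LBR entries it removed. */
static void iter_stats__add_loop(struct iter_stats *s, uint64_t loop_cycles)
{
	s->iter_count++;
	s->iter_cycles += loop_cycles;
}

/* Matching chains only sum their counters; nothing is averaged yet. */
static void iter_stats__merge(struct iter_stats *dst, const struct iter_stats *src)
{
	dst->iter_count += src->iter_count;
	dst->iter_cycles += src->iter_cycles;
	dst->nr_chains += src->nr_chains;
}

/* Averages are computed only at print time (integer division). */
static uint64_t iter_stats__avg_iter(const struct iter_stats *s)
{
	return s->nr_chains ? s->iter_count / s->nr_chains : 0;
}

static uint64_t iter_stats__avg_cycles(const struct iter_stats *s)
{
	return s->iter_count ? s->iter_cycles / s->iter_count : 0;
}
```

      Keeping raw sums until print time means merging never has to
      re-weight earlier averages.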
      
      For example,
      
        $ perf report --branch-history --stdio --no-children
      
      Before:
      
        ---f2 +0
           |
           |--33.62%--f1 +9 (cycles:1)
           |          f1 +0
           |          main +22 (cycles:1)
           |          main +17
           |          main +38 (cycles:1)
           |          main +27
           |          f1 +26 (cycles:1)
           |          f1 +24
           |          f2 +27 (cycles:7)
           |          f2 +0
           |          f1 +19 (cycles:1)
           |          f1 +14
           |          f2 +27 (cycles:11)
           |          f2 +0
           |          f1 +9 (cycles:1 iter:2968 avg_cycles:3)
           |          f1 +0
           |          main +22 (cycles:1 iter:2968 avg_cycles:3)
           |          main +17
           |          main +38 (cycles:1 iter:2968 avg_cycles:3)
      
      2968 is an impossibly high iteration count, and avg_cycles is too small.
      
      After:
      
        ---f2 +0
           |
           |--33.62%--f1 +9 (cycles:1)
           |          f1 +0
           |          main +22 (cycles:1)
           |          main +17
           |          main +38 (cycles:1)
           |          main +27
           |          f1 +26 (cycles:1)
           |          f1 +24
           |          f2 +27 (cycles:7)
           |          f2 +0
           |          f1 +19 (cycles:1)
           |          f1 +14
           |          f2 +27 (cycles:11)
           |          f2 +0
           |          f1 +9 (cycles:1 iter:1 avg_cycles:23)
           |          f1 +0
           |          main +22 (cycles:1 iter:1 avg_cycles:23)
           |          main +17
           |          main +38 (cycles:1 iter:1 avg_cycles:23)
      
      avg_cycles:23 is the average number of cycles for this iteration.
      
      Fixes: c4ee0625 ("perf report: Calculate the average cycles of iterations")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1546582230-17507-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 18 Dec, 2018 · 3 commits
  4. 31 Oct, 2018 · 2 commits
    • perf tools: Don't clone maps from parent when synthesizing forks · 4f8f382e
      David Miller committed
      When synthesizing FORK events, we are trying to create thread objects
      for the already running tasks on the machine.
      
      Normally, for a kernel FORK event, we want to clone the parent's maps
      because that is what the kernel just did.
      
      But when synthesizing, this should not be done.  If we do, we end up
      with overlapping maps as we process the synthesized MMAP2 events that
      get delivered shortly thereafter.
      
      Use the FORK event misc flags in an internal way to signal this
      situation, so we can elide the map clone when appropriate.
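
      A minimal sketch of the idea, assuming a reserved bit in the event's
      misc field marks synthesized FORKs. The flag name, its value, and
      struct fork_event are hypothetical; the commit does not say which bit
      perf actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical internal-only bit in the header misc field. */
#define MISC_FORK_SYNTHESIZED ((uint16_t)(1u << 13))

/* Stripped-down, hypothetical stand-in for a FORK event. */
struct fork_event {
	uint16_t misc; /* header misc flags */
	int pid, ppid;
};

/* Set by the synthesizer, which builds FORKs for already-running tasks. */
static void fork_event__mark_synthesized(struct fork_event *ev)
{
	ev->misc |= MISC_FORK_SYNTHESIZED;
}

/*
 * A real kernel FORK means the kernel just duplicated the parent's
 * mappings, so cloning them is correct. A synthesized FORK is followed
 * by MMAP2 events for the child, so cloning would create overlaps.
 */
static bool fork_event__clone_parent_maps(const struct fork_event *ev)
{
	return !(ev->misc & MISC_FORK_SYNTHESIZED);
}
```
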
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Joe Mario <jmario@redhat.com>
      Link: http://lkml.kernel.org/r/20181030.222404.2085088822877051075.davem@davemloft.net
      [ Added comment about flag use in machine__process_fork_event(),
        use ternary op in thread__clone_map_groups() as suggested by Jiri ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc} · e9024d51
      David S. Miller committed
      When processing with 'perf report -g caller', which is the default, we
      ended up reversing the callchain entries received from the kernel, but
      simply reversing them throws away the information that says from which
      point onwards the addresses are for userspace, kernel, guest kernel,
      guest user, or hypervisor.
      
      The idea is that if we are walking backwards, for each cluster of
      non-cpumode entries we first have to scan backwards for the next
      cpumode entry and use it for the whole cluster.
      
      This seems silly and more expensive than it needs to be, but it is
      enough for an initial fix.
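
      The backwards scan can be sketched as follows. This is a hypothetical,
      self-contained illustration rather than perf's implementation. It
      assumes the callchain is a flat u64 array in which each PERF_CONTEXT_*
      marker precedes the addresses it applies to. walk_caller_order() and
      its output arrays are invented for the example; the constants copy the
      PERF_CONTEXT_* values from the perf_event uapi header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror PERF_CONTEXT_* from <linux/perf_event.h>. */
#define CTX_KERNEL ((uint64_t)-128)
#define CTX_USER   ((uint64_t)-512)
#define CTX_MAX    ((uint64_t)-4095) /* anything >= this is a marker */

/*
 * Walk ips[0..n) from the end (caller order) and emit every real address
 * together with the marker that governs it. On entering a new cluster,
 * scan backwards for its marker. Addresses with no marker get context 0.
 * Returns the number of addresses emitted.
 */
static size_t walk_caller_order(const uint64_t *ips, size_t n,
				uint64_t *out_ip, uint64_t *out_ctx)
{
	size_t nr = 0;
	uint64_t ctx = 0; /* 0: context of the current cluster not yet known */

	for (size_t i = n; i-- > 0; ) {
		uint64_t ip = ips[i];

		if (ip >= CTX_MAX) {
			/* Marker: the cluster we were walking ends here. */
			ctx = 0;
			continue;
		}
		if (ctx == 0) {
			/* First entry of a new cluster: find its marker. */
			for (size_t j = i; j-- > 0; ) {
				if (ips[j] >= CTX_MAX) {
					ctx = ips[j];
					break;
				}
			}
		}
		out_ip[nr] = ip;
		out_ctx[nr] = ctx;
		nr++;
	}
	return nr;
}
```

      For the chain {PERF_CONTEXT_KERNEL, k1, k2, PERF_CONTEXT_USER, u1, u2},
      the walk yields u2 and u1 tagged as user, then k2 and k1 tagged as
      kernel, which a plain reversal of the array would lose.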
      
      The code here is really complicated because it is intimately intertwined
      with the LBR and branch handling, as well as with this callchain order.
      Further fixes will be needed to properly take the cpumode into account
      in those cases.
      
      Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
      end of most callchains shows up at the top of the histogram because
      every callchain contains it and with ORDER_CALLER it is the first entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Souvik Banerjee <souvik1997@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: stable@vger.kernel.org # 4.19
      Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 05 Oct, 2018 · 1 commit
  6. 28 Sep, 2018 · 1 commit
    • perf report: Don't try to map ip to invalid map · ff4ce288
      Milian Wolff committed
      Fixes a crash when the report encounters an address that could not be
      associated with an mmaped region:
      
        #0  0x00005555557bdc4a in callchain_srcline (ip=<error reading variable: Cannot access memory at address 0x38>, sym=0x0, map=0x0) at util/machine.c:2329
        #1  unwind_entry (entry=entry@entry=0x7fffffff9180, arg=arg@entry=0x7ffff5642498) at util/machine.c:2329
        #2  0x00005555558370af in entry (arg=0x7ffff5642498, cb=0x5555557bdb50 <unwind_entry>, thread=<optimized out>, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
        #3  get_entries (ui=ui@entry=0x7fffffff9620, cb=0x5555557bdb50 <unwind_entry>, arg=0x7ffff5642498, max_stack=<optimized out>) at util/unwind-libunwind-local.c:703
        #4  0x0000555555837192 in _unwind__get_entries (cb=<optimized out>, arg=<optimized out>, thread=<optimized out>, data=<optimized out>, max_stack=<optimized out>) at util/unwind-libunwind-local.c:725
        #5  0x00005555557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fffffff9830, evsel=0x555555c7b3b0, cursor=0x7ffff5642498, thread=0x555555c7f6f0) at util/machine.c:2351
        #6  thread__resolve_callchain (thread=0x555555c7f6f0, cursor=0x7ffff5642498, evsel=0x555555c7b3b0, sample=0x7fffffff9830, parent=0x7fffffff97b8, root_al=0x7fffffff9750, max_stack=127) at util/machine.c:2378
        #7  0x00005555557ba4ee in sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fffffff97b8, evsel=<optimized out>, al=al@entry=0x7fffffff9750,
            max_stack=<optimized out>) at util/callchain.c:1085
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Tested-by: Sandipan Das <sandipan@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 2a9d5050 ("perf script: Show correct offsets for DWARF-based unwinding")
      Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 20 Aug, 2018 · 1 commit
  8. 25 Jul, 2018 · 4 commits
    • perf machine: Use last_match threads cache only in single thread mode · b57334b9
      Jiri Olsa committed
      There's an issue with using threads::last_match in multithreaded mode,
      which is enabled during perf top's synthesizing phase. It might crash
      with the following assertion:
      
        perf: ...include/linux/refcount.h:109: refcount_inc:
              Assertion `!(!refcount_inc_not_zero(r))' failed.
      
      The gdb backtrace looks like this:
      
        0x00007ffff50839fb in raise () from /lib64/libc.so.6
        (gdb)
        #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
        #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
        #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
        #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
        #4  0x0000000000535ff9 in refcount_inc (r=0x7fffe8009a70)
            at ...include/linux/refcount.h:109
        #5  0x0000000000536771 in thread__get (thread=0x7fffe8009a40)
            at util/thread.c:115
        #6  0x0000000000523cd0 in ____machine__findnew_thread (machine=0xbfde38,
            threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:432
        #7  0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
            pid=2, tid=2) at util/machine.c:489
        #8  0x0000000000523f24 in machine__findnew_thread (machine=0xbfde38,
            pid=2, tid=2) at util/machine.c:499
        #9  0x0000000000526fbe in machine__process_fork_event (machine=0xbfde38,
        ...
      
      The failing assertion is this one:
      
        REFCOUNT_WARN(!refcount_inc_not_zero(r), ...
      
      The problem is that we don't serialize access to threads::last_match.
      We serialize access to the threads tree, but not to
      threads::last_match. Both the locked and unlocked paths use that data
      and can set it. In multithreaded mode we can end up with an invalid
      object in the thread__get() call, as in the following race between
      these paths:
      
        thread 1
          ...
          machine__findnew_thread
            down_write(&threads->lock);
            __machine__findnew_thread
              ____machine__findnew_thread
                th = threads->last_match;
                if (th->tid == tid) {
                  thread__get
      
        thread 2
          ...
          machine__find_thread
            down_read(&threads->lock);
            __machine__findnew_thread
              ____machine__findnew_thread
                th = threads->last_match;
                if (th->tid == tid) {
                  thread__get
      
        thread 3
          ...
          machine__process_fork_event
            machine__remove_thread
              __machine__remove_thread
                threads->last_match = NULL
                thread__put
            thread__put
      
      Threads 1 and 2 might get a stale last_match before thread 3 clears
      it. Threads 1 and 2 then race with thread 3's thread__put() and might
      trigger the refcnt == 0 assertion above.
      
      This patch disables the last_match cache in multithreaded mode. The
      cache was originally meant for single-threaded scenarios, where it's
      common to have multiple sequential searches for the same thread.
      
      In multithreaded mode this does not make sense, because top's threads
      process different /proc entries, so the 'struct threads' object is
      queried for various threads. Moreover, we'd need to add more locks to
      make it work.
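
      A minimal sketch of gating the cache on single-thread mode. The helper
      names follow threads__get_last_match() and threads__set_last_match()
      from the two refactoring commits listed below, but the signatures, the
      structs, the singlethreaded field, and the plain refcnt++ standing in
      for thread__get() are simplified assumptions, not perf's real types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for perf's struct thread and struct threads. */
struct thread {
	int tid;
	int refcnt;
};

struct threads {
	struct thread *last_match; /* one-entry cache of the last lookup */
	bool singlethreaded;       /* false while top synthesizes in parallel */
};

/* Cache read: only trusted when no other thread can race with us. */
static struct thread *threads__get_last_match(struct threads *threads, int tid)
{
	struct thread *th;

	if (!threads->singlethreaded)
		return NULL;

	th = threads->last_match;
	if (th != NULL) {
		if (th->tid == tid) {
			th->refcnt++; /* stands in for thread__get() */
			return th;
		}
		threads->last_match = NULL; /* mismatch: drop the entry */
	}
	return NULL;
}

/* Cache write: skipped entirely in multithreaded mode. */
static void threads__set_last_match(struct threads *threads, struct thread *th)
{
	if (threads->singlethreaded)
		threads->last_match = th;
}
```

      With singlethreaded cleared, both helpers leave the cache alone, so
      callers always fall back to the locked tree search.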
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20180719143345.12963-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf machine: Add threads__set_last_match function · 67fda0f3
      Jiri Olsa committed
      Move setting the threads::last_match cache into a separate
      threads__set_last_match() function. This will be useful in the
      following patch.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20180719143345.12963-3-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf machine: Add threads__get_last_match function · f8b2ebb5
      Jiri Olsa committed
      Move the threads::last_match cache read/check into a separate
      threads__get_last_match() function. This will be useful in the
      following patch.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20180719143345.12963-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Show correct offsets for DWARF-based unwinding · 2a9d5050
      Sandipan Das committed
      When perf data is recorded with the dwarf call-graph option, the
      callchain shown by 'perf script' still shows the binary offsets of the
      userspace symbols instead of their virtual addresses. Since the symbol
      offset calculation is based on using the virtual address as the ip, we
      see incorrect offsets as well.
      
      The use of virtual addresses affects the ability to find the line
      number in the source file that an address maps to, as described in
      commit 67540759 ("perf unwind: Use addr_location::addr instead of ip
      for entries").
      
      This has also been addressed by temporarily converting the virtual
      address to the corresponding binary offset so that it can be mapped
      to the source line number correctly.
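
      The conversion between the two address spaces can be sketched with the
      usual start/pgoff arithmetic. This is a simplification under stated
      assumptions: struct map here is a hypothetical three-field type, and
      real perf maps also handle cases (kernel maps, relocated symbols) that
      this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal map: [start, end) is mapped from file offset pgoff. */
struct map {
	uint64_t start;
	uint64_t end;
	uint64_t pgoff;
};

/* Virtual address -> offset within the mapped binary (what srcline lookup needs). */
static uint64_t map_ip(const struct map *map, uint64_t ip)
{
	return ip - map->start + map->pgoff;
}

/* Binary offset -> virtual address (what the callchain should display). */
static uint64_t unmap_ip(const struct map *map, uint64_t rip)
{
	return rip + map->start - map->pgoff;
}
```

      The virtual address stays the value that is stored and printed; the
      binary offset is derived only for the duration of the source line
      lookup, and unmap_ip() recovers the original address.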
      
      This is a follow-up for commit 19610184 ("perf script: Show
      virtual addresses instead of offsets").
      
      This can be verified on a powerpc64le system running Fedora 27 as
      shown below:
      
        # perf probe -x /usr/lib64/libc-2.26.so -a inet_pton
        # perf record -e probe_libc:inet_pton --call-graph=dwarf ping -6 -c 1 ::1
      
      Before:
      
        # perf report --stdio --no-children -s sym,srcline -g address
      
        # Samples: 1  of event 'probe_libc:inet_pton'
        # Event count (approx.): 1
        #
        # Overhead  Symbol                Source:Line
        # ........  ....................  ...........
        #
           100.00%  [.] __GI___inet_pton  inet_pton.c
                    |
                    ---gaih_inet getaddrinfo.c:537 (inlined)
                       __GI_getaddrinfo getaddrinfo.c:2304 (inlined)
                       main ping.c:519
                       generic_start_main libc-start.c:308 (inlined)
                       __libc_start_main libc-start.c:102
        ...
      
        # perf script -F comm,ip,sym,symoff,srcline,dso
      
        ping
                          15af28 __GI___inet_pton+0xffff000099160008 (/usr/lib64/libc-2.26.so)
          libc-2.26.so[ffff80004ca0af28]
                          10fa53 gaih_inet+0xffff000099160f43
          libc-2.26.so[ffff80004c9bfa53] (inlined)
                          1105b3 __GI_getaddrinfo+0xffff000099160163
          libc-2.26.so[ffff80004c9c05b3] (inlined)
                            2d6f main+0xfffffffd9f1003df (/usr/bin/ping)
          ping[fffffffecf882d6f]
                           2369f generic_start_main+0xffff00009916013f
          libc-2.26.so[ffff80004c8d369f] (inlined)
                           23897 __libc_start_main+0xffff0000991600b7 (/usr/lib64/libc-2.26.so)
          libc-2.26.so[ffff80004c8d3897]
      
      After:
      
        # perf report --stdio --no-children -s sym,srcline -g address
      
        # Samples: 1  of event 'probe_libc:inet_pton'
        # Event count (approx.): 1
        #
        # Overhead  Symbol                Source:Line
        # ........  ....................  ...........
        #
           100.00%  [.] __GI___inet_pton  inet_pton.c
                    |
                    ---gaih_inet.constprop.7 getaddrinfo.c:537
                       getaddrinfo getaddrinfo.c:2304
                       main ping.c:519
                       generic_start_main.isra.0 libc-start.c:308
                       __libc_start_main libc-start.c:102
        ...
      
        # perf script -F comm,ip,sym,symoff,srcline,dso
      
        ping
                    7fffb38aaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so)
          inet_pton.c:68
                    7fffb385fa53 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so)
          getaddrinfo.c:537
                    7fffb38605b3 getaddrinfo+0x163 (/usr/lib64/libc-2.26.so)
          getaddrinfo.c:2304
                       130782d6f main+0x3df (/usr/bin/ping)
          ping.c:519
                    7fffb377369f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so)
          libc-start.c:308
                    7fffb3773897 __libc_start_main+0xb7 (/usr/lib64/libc-2.26.so)
          libc-start.c:102
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Fixes: 67540759 ("perf unwind: Use addr_location::addr instead of ip for entries")
      Link: http://lkml.kernel.org/r/20180703120555.32971-1-sandipan@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 23 May, 2018 · 2 commits
  10. 22 May, 2018 · 3 commits
  11. 19 May, 2018 · 2 commits
  12. 18 May, 2018 · 1 commit
    • perf script: Show virtual addresses instead of offsets · 19610184
      Sandipan Das committed
      When perf data is recorded with the call-graph option enabled, the
      callchain shown by perf script shows the binary offsets of the symbols
      as the ip. This is incorrect for kernel symbols as the ip values are
      always off by a fixed offset depending on the architecture. If the
      offsets from the start of the symbols are printed, they are also
      incorrect for both kernel and userspace symbols.
      
      Without the call-graph option, the callchain shows the virtual addresses
      of the symbols rather than their binary offsets. The offsets printed in
      this case are also correct.
      
      This fixes the inconsistency in perf script's output.
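
      The inconsistency comes down to which address space the ip is in when
      the symbol offset is computed. A minimal sketch, using a hypothetical
      struct symbol and symbol__offset() helper: both operands must be
      virtual addresses, otherwise the unsigned subtraction wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal symbol whose start is a virtual address. */
struct symbol {
	uint64_t start;
};

/*
 * Offset of ip from the start of its symbol. ip and sym->start must be
 * in the same address space; mixing a binary offset with a virtual start
 * makes this unsigned subtraction wrap to a huge value.
 */
static uint64_t symbol__offset(const struct symbol *sym, uint64_t ip)
{
	return ip - sym->start;
}
```

      With sys_write starting at 0xc0000000004025a0 (from the
      /proc/kallsyms listing below), symbol__offset() returns 0x10 for the
      virtual ip 0xc0000000004025b0, but 0x4000000000000010 if it is
      handed the file-relative value 0x4025b0 instead.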
      
      This can be verified on a powerpc64le system running Fedora 27 as
      follows:
      
        # cat /proc/kallsyms | grep sys_write
        ...
        c0000000004025a0 T sys_write
        c0000000004025a0 T __se_sys_write
        ...
      
        # perf probe -a sys_write
      
      Before applying this patch:
      
        # perf record -e probe:sys_write -g ~/test
        # perf script -F ip,sym,symoff
      
                          4125b0 sys_write+0x8000000000008010
                           1b9e0 system_call+0x8000000000008058
                          118234 __GI___libc_write+0xffff0000f52c0024
                           92c74 _IO_file_write@@GLIBC_2.17+0xffff0000f52c0044
                        5afbfd8a [unknown]
                           91a60 new_do_write+0xffff0000f52c0090
                           94638 _IO_do_write@@GLIBC_2.17+0xffff0000f52c0038
                           94bbc _IO_file_overflow@@GLIBC_2.17+0xffff0000f52c014c
                           95a24 __overflow+0xffff0000f52c0064
                           84548 _IO_puts+0xffff0000f52c0218
                             440 main+0xffffffffe0000020
                           236a0 generic_start_main.isra.0+0xffff0000f52c0140
                           23898 __libc_start_main+0xffff0000f52c00b8
                               0 [unknown]
        ...
      
        # perf record -e probe:sys_write ~/test
        # perf script -F ip,sym,symoff
      
        c0000000004025b0 sys_write+0x10
        ...
      
      After applying this patch:
      
        # perf record -e probe:sys_write -g ~/test
        # perf script -F ip,sym,symoff
      
                c0000000004025b0 sys_write+0x10
                c00000000000b9e0 system_call+0x58
                    7fffb70d8234 __GI___libc_write+0x24
                    7fffb7052c74 _IO_file_write@@GLIBC_2.17+0x44
                        5afc1818 [unknown]
                    7fffb7051a60 new_do_write+0x90
                    7fffb7054638 _IO_do_write@@GLIBC_2.17+0x38
                    7fffb7054bbc _IO_file_overflow@@GLIBC_2.17+0x14c
                    7fffb7055a24 __overflow+0x64
                    7fffb7044548 _IO_puts+0x218
                        10000440 main+0x20
                    7fffb6fe36a0 generic_start_main.isra.0+0x140
                    7fffb6fe3898 __libc_start_main+0xb8
                               0 [unknown]
        ...
      
        # perf record -e probe:sys_write ~/test
        # perf script -F ip,sym,symoff
      
        c0000000004025b0 sys_write+0x10
        ...
      Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20180517063326.6319-1-sandipan@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  13. 30 Apr, 2018 · 1 commit
  14. 27 Apr, 2018 · 9 commits
  15. 23 Apr, 2018 · 1 commit
  16. 17 Mar, 2018 · 1 commit
  17. 08 Mar, 2018 · 1 commit
  18. 19 Feb, 2018 · 1 commit
  19. 17 Feb, 2018 · 4 commits