1. 15 Jun, 2017 1 commit
    • A
      perf evsel: Fix probing of precise_ip level for default cycles event · 7a1ac110
      Arnaldo Carvalho de Melo committed
      Commit 18e7a45a ("perf/x86: Reject non sampling events with
      precise_ip") made sys_perf_event_open() return -EINVAL for an attribute
      with (attr.precise_ip > 0 && attr.sample_period == 0), which is exactly
      what is passed by the routine used to probe the max precise level when
      no events were passed to 'perf record' or 'perf top', i.e.:
      
      	perf_evsel__new_cycles()
      		perf_event_attr__set_max_precise_ip()
      
      The x86 code, in x86_pmu_hw_config(), which is called all the way from
      sys_perf_event_open() did, starting with the aforementioned commit:
      
                      /* There's no sense in having PEBS for non sampling events: */
                      if (!is_sampling_event(event))
                              return -EINVAL;
      
      Which makes it fail for cycles:ppp, cycles:pp and cycles:p, always using
      just the non precise cycles variant.
      
      To make sure that this is the case, I tested it, before this patch,
      with:
      
        # perf probe -L x86_pmu_hw_config
        <x86_pmu_hw_config@/home/acme/git/linux/arch/x86/events/core.c:0>
              0  int x86_pmu_hw_config(struct perf_event *event)
              1  {
              2         if (event->attr.precise_ip) {
      <SNIP>
             17                 if (event->attr.precise_ip > precise)
             18                         return -EOPNOTSUPP;
      
                                /* There's no sense in having PEBS for non sampling events: */
             21                 if (!is_sampling_event(event))
             22                         return -EINVAL;
                        }
      <SNIP>
        # perf probe x86_pmu_hw_config:22
        Added new events:
          probe:x86_pmu_hw_config (on x86_pmu_hw_config:22)
          probe:x86_pmu_hw_config_1 (on x86_pmu_hw_config:22)
      
        You can now use it in all perf tools, such as:
      
              perf record -e probe:x86_pmu_hw_config_1 -aR sleep 1
      
        # perf trace -e perf_event_open,probe:x86_pmu_hw_config*/max-stack=16/ perf record usleep 1
           0.000 ( 0.015 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.015 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.000 ( 0.021 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.023 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.025 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.023 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.028 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.030 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.028 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
          41.018 ( 0.012 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8b5dd0, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.065 ( 0.011 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.080 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.103 ( 0.010 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
          41.115 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
          41.122 ( 0.004 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
          41.128 ( 0.008 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
        #
      
      I.e. that return -EINVAL in x86_pmu_hw_config() is hit three times.
      
      So fix it by just setting attr.sample_period to a non-zero value while probing.
      
      Now, after this patch:
      
        # perf trace --max-stack=2 -e perf_event_open,probe:x86_pmu_hw_config* perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
           0.000 ( 0.017 ms): perf/8469 perf_event_open(attr_uptr: 0x7ffe36c27d10, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_open_cloexec_flag (/home/acme/bin/perf)
           0.050 ( 0.031 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.092 ( 0.040 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.143 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, cpu: -1, group_fd: -1           ) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
           0.161 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.171 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.180 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.190 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
        [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
        #
      
      The probe one called from perf_event_attr__set_max_precise_ip() works
      the first time, with attr.precise_ip = 3, with the next ones being the
      per cpu ones for the cycles:ppp event.
      
      And here is the text from a report and alternative proposed patch by
      Thomas-Mich Richter:
      
       ---
      
      On s390 the counter and sampling facility do not support a precise IP
      skid level and sometimes return EOPNOTSUPP when structure member
      precise_ip in struct perf_event_attr is not set to zero.
      
      On s390 the command 'perf record -- true' fails with error EOPNOTSUPP.  This
      happens only when no events are specified on command line.
      
      The functions called are
      ...
        --> perf_evlist__add_default
            --> perf_evsel__new_cycles
                --> perf_event_attr__set_max_precise_ip
      
      The last function determines the value of structure member precise_ip by
      invoking the perf_event_open() system call and checking the return code.
      The first successful open is the value for precise_ip.
      
      However the value is determined without setting member sample_period and
      indicates no sampling.
      
      On s390 the counter facility and sampling facility are different.  The
      above procedure determines a precise_ip value of 3 using the counter
      facility. Later it uses the sampling facility with a value of 3 and
      fails with EOPNOTSUPP.
      
       ---
      
      v2: Older compilers (e.g. gcc 4.4.7) don't support referencing members
          of unnamed union members in the container struct initialization, so
          move from:
      
      	struct perf_event_attr attr = {
      		...
      		.sample_period = 1,
      	};
      
      to right after it as:
      
      	struct perf_event_attr attr = {
      		...
      	};
      
      	attr.sample_period = 1;
      
      v3: We need to reset .sample_period to 0 to let the users of
      perf_evsel__new_cycles() properly set up attr.sample_period or
      attr.sample_freq. Reported by Ingo Molnar.
      Reported-and-Acked-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 18e7a45a ("perf/x86: Reject non sampling events with precise_ip")
      Link: http://lkml.kernel.org/n/tip-yv6nnkl7tzqocrm0hl3x7vf1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
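The probing fix described in the commit above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the perf source: `struct fake_attr`, `open_fn`, `fake_x86_open()` and `probe_max_precise_ip()` are hypothetical names, the callback stands in for sys_perf_event_open() so the sketch runs without PMU access, and the hardware limit of 2 is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct perf_event_attr. */
struct fake_attr {
	unsigned long long sample_period;
	unsigned int precise_ip;
};

/* Stand-in for sys_perf_event_open(): returns 0 on success or -errno. */
typedef int (*open_fn)(const struct fake_attr *attr);

/*
 * Models the x86 check quoted in the commit message: a precise event
 * that is not a sampling event is rejected with -EINVAL. 'hw_max' is an
 * assumed hardware limit used only for this illustration.
 */
static int fake_x86_open(const struct fake_attr *attr)
{
	const unsigned int hw_max = 2;

	if (attr->precise_ip) {
		if (attr->precise_ip > hw_max)
			return -EOPNOTSUPP;
		if (attr->sample_period == 0)
			return -EINVAL;
	}
	return 0;
}

/*
 * Probe from the highest precise level down to 0 as a sampling event,
 * then reset sample_period so the caller can configure
 * sample_period or sample_freq itself (the v3 note).
 */
static void probe_max_precise_ip(struct fake_attr *attr, open_fn try_open)
{
	assert(attr != NULL && try_open != NULL);

	attr->sample_period = 1;	/* sampling event while probing */
	attr->precise_ip = 3;
	while (attr->precise_ip > 0 && try_open(attr) != 0)
		attr->precise_ip--;
	attr->sample_period = 0;	/* left for the caller to set up */
}
```

Without the `sample_period = 1` line every non-zero level would be rejected by the model, and the loop would always fall back to the non-precise variant, which is the behaviour the commit reports.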
  2. 09 Jun, 2017 9 commits
  3. 08 Jun, 2017 6 commits
  4. 06 Jun, 2017 7 commits
    • M
      perf report: Ensure the perf DSO mapping matches what libdw sees · 2538b9e2
      Milian Wolff committed
      In some situations the libdw unwinder stopped working properly.  I.e.
      with libunwind we see:
      
      ~~~~~
      heaptrack_gui  2228 135073.400112:     641314 cycles:
      	            e8ed _dl_fixup (/usr/lib/ld-2.25.so)
      	           15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so)
      	           ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0)
      	           608f3 _GLOBAL__sub_I_kdynamicjobtracker.cpp (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0)
      	            f199 call_init.part.0 (/usr/lib/ld-2.25.so)
      	            f2a5 _dl_init (/usr/lib/ld-2.25.so)
      	             db9 _dl_start_user (/usr/lib/ld-2.25.so)
      ~~~~~
      
      But with libdw and without this patch this sample is not properly
      unwound:
      
      ~~~~~
      heaptrack_gui  2228 135073.400112:     641314 cycles:
      	            e8ed _dl_fixup (/usr/lib/ld-2.25.so)
      	           15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so)
      	           ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0)
      ~~~~~
      
      Debug output showed me that libdw found a module for the last frame
      address, but it thinks it belongs to /usr/lib/ld-2.25.so. This patch
      double-checks what libdw sees and what perf knows. If the mappings
      mismatch, we now report the elf known to perf. This fixes the situation
      above, and the libdw unwinder produces the same stack as libunwind.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170602143753.16907-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
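The double-check described in the commit above might look roughly like the following sketch. The commit message only says that what libdw sees is compared with what perf knows; comparing start addresses is an assumption made here, and `struct unwinder_module`, `struct perf_map`, `module_matches_map()` and `path_for_unwind()` are hypothetical names rather than libdw or perf APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a module as the unwinder library reports it. */
struct unwinder_module {
	uint64_t start;		/* load address the unwinder believes in */
	const char *path;
};

/* Hypothetical view of the mapping perf recorded for the address. */
struct perf_map {
	uint64_t start;
	const char *dso_path;
};

/*
 * Return true when the unwinder's module can be trusted for this map.
 * A NULL module or a differing start address means the ELF file known
 * to perf should be reported instead.
 */
static bool module_matches_map(const struct unwinder_module *mod,
			       const struct perf_map *map)
{
	assert(map != NULL);
	return mod != NULL && mod->start == map->start;
}

/* Pick the path that should back the address being unwound. */
static const char *path_for_unwind(const struct unwinder_module *mod,
				   const struct perf_map *map)
{
	return module_matches_map(mod, map) ? mod->path : map->dso_path;
}
```

In the failing sample from the commit message, the stale module would be the one pointing at /usr/lib/ld-2.25.so, and the sketch would fall back to the library path perf recorded.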
    • M
      perf report: Include partial stacks unwound with libdw · 5ea0416f
      Milian Wolff committed
      So far the whole stack was thrown away when any error occurred before
      the maximum stack depth was unwound. This is actually a very common
      scenario though. The stacks that got unwound so far are still
      interesting. This removes a large chunk of differences when comparing
      perf script output for libunwind and libdw perf unwinding.
      
      E.g. with libunwind:
      
      ~~~~~
      heaptrack_gui  2228 135073.388524:     479408 cycles:
              ffffffff811749ed perf_iterate_ctx ([kernel.kallsyms])
              ffffffff81181662 perf_event_mmap ([kernel.kallsyms])
              ffffffff811cf5ed mmap_region ([kernel.kallsyms])
              ffffffff811cfe6b do_mmap ([kernel.kallsyms])
              ffffffff811b0dca vm_mmap_pgoff ([kernel.kallsyms])
              ffffffff811cdb0c sys_mmap_pgoff ([kernel.kallsyms])
              ffffffff81033acb sys_mmap ([kernel.kallsyms])
              ffffffff81631d37 entry_SYSCALL_64_fastpath ([kernel.kallsyms])
                         192ca mmap64 (/usr/lib/ld-2.25.so)
                          59a9 _dl_map_object_from_fd (/usr/lib/ld-2.25.so)
                          83d0 _dl_map_object (/usr/lib/ld-2.25.so)
                          cda1 openaux (/usr/lib/ld-2.25.so)
                         1834f _dl_catch_error (/usr/lib/ld-2.25.so)
                          cfe2 _dl_map_object_deps (/usr/lib/ld-2.25.so)
                          3481 dl_main (/usr/lib/ld-2.25.so)
                         17387 _dl_sysdep_start (/usr/lib/ld-2.25.so)
                          4d37 _dl_start (/usr/lib/ld-2.25.so)
                           d87 _start (/usr/lib/ld-2.25.so)
      
      heaptrack_gui  2228 135073.388677:     611329 cycles:
                         1a3e0 strcmp (/usr/lib/ld-2.25.so)
                          82b2 _dl_map_object (/usr/lib/ld-2.25.so)
                          cda1 openaux (/usr/lib/ld-2.25.so)
                         1834f _dl_catch_error (/usr/lib/ld-2.25.so)
                          cfe2 _dl_map_object_deps (/usr/lib/ld-2.25.so)
                          3481 dl_main (/usr/lib/ld-2.25.so)
                         17387 _dl_sysdep_start (/usr/lib/ld-2.25.so)
                          4d37 _dl_start (/usr/lib/ld-2.25.so)
                           d87 _start (/usr/lib/ld-2.25.so)
      ~~~~~
      
      With libdw without this patch:
      
      ~~~~~
      heaptrack_gui  2228 135073.388524:     479408 cycles:
              ffffffff811749ed perf_iterate_ctx ([kernel.kallsyms])
              ffffffff81181662 perf_event_mmap ([kernel.kallsyms])
              ffffffff811cf5ed mmap_region ([kernel.kallsyms])
              ffffffff811cfe6b do_mmap ([kernel.kallsyms])
              ffffffff811b0dca vm_mmap_pgoff ([kernel.kallsyms])
              ffffffff811cdb0c sys_mmap_pgoff ([kernel.kallsyms])
              ffffffff81033acb sys_mmap ([kernel.kallsyms])
              ffffffff81631d37 entry_SYSCALL_64_fastpath ([kernel.kallsyms])
      
      heaptrack_gui  2228 135073.388677:     611329 cycles:
      ~~~~~
      
      With this patch applied, the libdw unwinder will produce the same
      output as the libunwind unwinder.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170601210021.20046-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
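The idea of keeping a partially unwound stack, as described in the commit above, can be illustrated with a small loop. This is a sketch only: `next_frame_fn`, `unwind_partial()` and `demo_frames()` are hypothetical names, and the frame source is a stand-in for the real libdw callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame source: returns 0 and fills *ip, or -1 on error. */
typedef int (*next_frame_fn)(void *state, uint64_t *ip);

/*
 * Collect up to max_depth frames. An unwind error stops the walk, but
 * the frames gathered so far are kept instead of being discarded.
 * Returns the number of frames stored in ips[].
 */
static size_t unwind_partial(next_frame_fn next, void *state,
			     uint64_t *ips, size_t max_depth)
{
	size_t n = 0;

	assert(next != NULL && ips != NULL);

	while (n < max_depth) {
		uint64_t ip;

		if (next(state, &ip) != 0)
			break;	/* error: keep what we have */
		ips[n++] = ip;
	}
	return n;
}

/* Demo source: yields *remaining frames, then reports an error. */
static int demo_frames(void *state, uint64_t *ip)
{
	int *remaining = state;

	if (*remaining <= 0)
		return -1;
	*ip = 0x1000u + (uint64_t)*remaining;
	(*remaining)--;
	return 0;
}
```

A caller that previously treated any error as "no stack" would instead use the returned count and print the frames it did get.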
    • K
      perf annotate: Add missing powerpc triplet · 6db47fde
      Kim Phillips committed
      On an Ubuntu xenial system, 'perf annotate' says to install powerpc
      objdump on a system that already has binutils-powerpc-linux-gnu
      installed.  Make perf aware of the missing triplet for the
      powerpc-linux-gnu target.
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20170529142754.7fbfb1152fd8f2663de0ea70@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf test: Disable breakpoint signal tests for powerpc · 598762cf
      Jiri Olsa committed
      The following tests are failing on powerpc:
      
        # perf test break
        18: Breakpoint overflow signal handler  : FAILED!
        19: Breakpoint overflow sampling        : FAILED!
      
      The powerpc kernel so far does not have support to even create
      instruction breakpoints using the perf event interface, so those tests
      fail early in the config phase.
      
      I added a '->is_supported()' callback to the test struct to be able to
      disable specific tests. It seems better than putting ifdefs directly into
      the test array.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170601205450.GA398@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
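An optional '->is_supported()' callback of the kind the commit above describes could be shaped like this. It is a sketch under assumptions: the struct layout, the `TEST_*` values, `run_test()` and the helper names are hypothetical and do not reproduce the perf test framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TEST_OK = 0, TEST_FAIL = -1, TEST_SKIP = -2 };

struct test {
	const char *desc;
	int (*func)(void);
	bool (*is_supported)(void);	/* optional; NULL means supported */
};

/* Run a test unless its callback says the platform cannot support it. */
static int run_test(const struct test *t)
{
	assert(t != NULL && t->func != NULL);

	if (t->is_supported && !t->is_supported())
		return TEST_SKIP;
	return t->func();
}

/* The architecture check lives in one function, not in the test array. */
static bool bp_is_supported(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
	return false;
#else
	return true;
#endif
}

/* Demo helpers used to exercise both paths on any architecture. */
static bool never_supported(void) { return false; }
static int always_passes(void) { return TEST_OK; }
```

Keeping the callback optional means existing entries in the test array need no change; only the tests that must be gated grow an extra member.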
    • N
      perf symbols: Use correct filename for compressed modules in build-id cache · a09935b8
      Namhyung Kim committed
      decompress_kmodule() decompresses kernel modules in order to load
      symbols from them.  In the DSO_BINARY_TYPE__BUILD_ID_CACHE case, it needs
      the full file path to extract the file extension to determine the
      decompression method.  But overwriting 'name' will make the
      decompression fail since it might point to a non-existing old file.
      
      Instead, use dso->long_name for having the correct extension and use the
      real filename to decompress.
      
      In the DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP case, both names should
      be the same.  This allows resolving symbols in the old modules.
      
      Before:
      
        $ perf report -i perf.data.old | grep scsi_mod
           0.00%  cc1      [scsi_mod]    [k] 0x0000000000004aa6
           0.00%  as       [scsi_mod]    [k] 0x00000000000099e1
           0.00%  cc1      [scsi_mod]    [k] 0x0000000000009830
           0.00%  cc1      [scsi_mod]    [k] 0x0000000000001b8f
      
      After:
      
           0.00%  cc1      [scsi_mod]    [k] scsi_handle_queue_ramp_up
           0.00%  as       [scsi_mod]    [k] scsi_sg_alloc
           0.00%  cc1      [scsi_mod]    [k] scsi_setup_cmnd
           0.00%  cc1      [scsi_mod]    [k] scsi_get_command
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170531120105.21731-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
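Why the extension has to come from the long name rather than from the file actually opened can be seen in a small sketch. `kmod_comp_ext()` is a hypothetical helper, not a perf function, and the cache-style path in the usage note is only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the compression suffix ("gz", "xz", ...) of a module file
 * name, or NULL when the last path component has no suffix after ".ko".
 */
static const char *kmod_comp_ext(const char *filename)
{
	const char *base, *ko;

	assert(filename != NULL);

	/* Only look at the last path component. */
	base = strrchr(filename, '/');
	base = base ? base + 1 : filename;

	ko = strstr(base, ".ko.");
	return ko ? ko + strlen(".ko.") : NULL;
}
```

Called on the long name from the commit message, /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz, the helper yields "gz". Called on a cache-style file name without any suffix it yields NULL, so the decompression method has to be taken from the long name while the data is read from the real file.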
    • N
      perf symbols: Set module info when build-id event found · 6b335e8f
      Namhyung Kim committed
      Like machine__findnew_module_dso(), it should set necessary info for
      kernel modules to find symbol info from the file.  Factor out
      dso__set_module_info() to do it.
      
      This is needed for dso__needs_decompress() to detect such DSOs.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170531120105.21731-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • N
      perf header: Set proper module name when build-id event found · 1deec1bd
      Namhyung Kim committed
      When perf processes a build-id event, it creates DSOs with the build-id.
      But it didn't set the module short name (like '[module-name]') so when
      processing a kernel mmap event of the module, it cannot find the DSO as
      it only checks the short names.
      
      That leads perf to create the same DSO again without the build-id info, and
      it'll look up the system path even if the DSO is already in the build-id
      cache.  After the kernel was updated, perf cannot find the DSO and cannot
      show symbols in it anymore.
      
      You can see this if you have an old data file (w/ old kernel version):
      
        $ perf report -i perf.data.old -v |& grep scsi_mod
        build id event received for /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz : cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1
        Failed to open /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz, continuing without symbols
        ...
      
      The second message didn't show the build-id.  With this patch:
      
        $ perf report -i perf.data.old -v |& grep scsi_mod
        build id event received for /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz: cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1
        /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz with build id cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1 not found, continuing without symbols
        ...
      
      Now it shows the build-id but still cannot load the symbol table.  This
      is a different problem which will be fixed in the next patch.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170531120105.21731-1-namhyung@kernel.org
      [ Fix the build on older compilers (debian <= 8, fedora <= 21, etc) wrt kmod_path var init ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
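The '[module-name]' short name mentioned in the commit above can be derived from a module path along these lines. This is a simplified sketch: `module_short_name()` is a hypothetical helper, it matches the first ".ko" in the file name, and it does not attempt any other normalisation that perf may apply.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "[name]" from a module path such as ".../scsi_mod.ko.gz".
 * Returns 0 on success, -1 if the path does not look like a kernel
 * module or the buffer is too small.
 */
static int module_short_name(const char *path, char *buf, size_t size)
{
	const char *base, *ko;
	int len;

	assert(path != NULL && buf != NULL);

	/* Work on the last path component only. */
	base = strrchr(path, '/');
	base = base ? base + 1 : path;

	ko = strstr(base, ".ko");
	if (ko == NULL || ko == base)
		return -1;

	len = snprintf(buf, size, "[%.*s]", (int)(ko - base), base);
	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

For the path shown in the build-id event above, the result is "[scsi_mod]", which is the form the kernel mmap event lookup compares against.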
  5. 02 Jun, 2017 2 commits
    • A
      perf stat: Only print NMI watchdog hint when enabled · 918c7b06
      Andi Kleen committed
      Only print the NMI watchdog hint when that watchdog is actually enabled.
      
      This avoids printing these unnecessarily.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/n/tip-lnw7edxnqsphkmeew857wz1i@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • K
      perf annotate: Fix branch instruction with multiple operands · b13bbeee
      Kim Phillips committed
      'perf annotate' is dropping the cr* fields from branch instructions.
      
      Fix it by adding support to display branch instructions having
      multiple operands.
      
      Power Arch objdump of int_sqrt:
      
       20.36 | c0000000004d2694:   subf   r10,r10,r3
             | c0000000004d2698: v bgt    cr6,c0000000004d26a0 <int_sqrt+0x40>
        1.82 | c0000000004d269c:   mr     r3,r10
       29.18 | c0000000004d26a0:   mr     r10,r8
             | c0000000004d26a4: v bgt    cr7,c0000000004d26ac <int_sqrt+0x4c>
             | c0000000004d26a8:   mr     r10,r7
      
      Power Arch Before Patch:
      
       20.36 |       subf   r10,r10,r3
             |     v bgt    40
        1.82 |       mr     r3,r10
       29.18 | 40:   mr     r10,r8
             |     v bgt    4c
             |       mr     r10,r7
      
      Power Arch After patch:
      
       20.36 |       subf   r10,r10,r3
             |     v bgt    cr6,40
        1.82 |       mr     r3,r10
       29.18 | 40:   mr     r10,r8
             |     v bgt    cr7,4c
             |       mr     r10,r7
      
      Also support AArch64 conditional branch instructions, which can
      have up to three operands:
      
      Aarch64 Non-simplified (raw objdump) view:
      
             │ffff0000083cd11c: ↑ cbz    w0, ffff0000083cd100 <security_fil▒
      ...
        4.44 │ffff000│083cd134: ↓ tbnz   w0, #26, ffff0000083cd190 <securit▒
      ...
        1.37 │ffff000│083cd144: ↓ tbnz   w22, #5, ffff0000083cd1a4 <securit▒
             │ffff000│083cd148:   mov    w19, #0x20000                   //▒
        1.02 │ffff000│083cd14c: ↓ tbz    w22, #2, ffff0000083cd1ac <securit▒
      ...
        0.68 │ffff000└──3cd16c: ↑ cbnz   w0, ffff0000083cd120 <security_fil▒
      
      Aarch64 Simplified, before this patch:
      
             │    ↑ cbz    40
      ...
        4.44 │   │↓ tbnz   w0, #26, ffff0000083cd190 <security_file_permiss▒
      ...
        1.37 │   │↓ tbnz   w22, #5, ffff0000083cd1a4 <security_file_permiss▒
             │   │  mov    w19, #0x20000                   // #131072
        1.02 │   │↓ tbz    w22, #2, ffff0000083cd1ac <security_file_permiss▒
      ...
        0.68 │   └──cbnz   60
      
      The cbz operand is missing, and the tbz doesn't get simplified processing
      at all because the parsing function failed to match an address.
      
      Aarch64 Simplified, after this patch:
      
             │    ↑ cbz    w0, 40
      ...
        4.44 │   │↓ tbnz   w0, #26, d0
      ...
        1.37 │   │↓ tbnz   w22, #5, e4
             │   │  mov    w19, #0x20000                   // #131072
        1.02 │   │↓ tbz    w22, #2, ec
      ...
        0.68 │   └──cbnz   w0, 60
      Originally-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Reported-by: Anton Blanchard <anton@samba.org>
      Reported-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Link: http://lkml.kernel.org/r/20170601092959.f60d98912e8a1b66fd1e4c0e@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
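Handling branch instructions with more than one operand, as in the commit above, comes down to finding the target address among the operands. The sketch below takes the hexadecimal number after the last comma; that rule is an assumption chosen for brevity, not the parsing code from the patch, and `branch_target()` is a hypothetical name. It would misparse a symbol annotation that itself contains a comma.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the branch target from an objdump operand string. The target
 * is taken to be the hexadecimal number after the last comma, or at the
 * start of the string when there is no comma.
 */
static uint64_t branch_target(const char *operands)
{
	const char *last;

	assert(operands != NULL);

	last = strrchr(operands, ',');
	/* strtoull() skips leading blanks and stops at the first non-hex char. */
	return strtoull(last ? last + 1 : operands, NULL, 16);
}
```

With the operand strings from the listings above, "cr6,c0000000004d26a0 <int_sqrt+0x40>" and "w0, #26, ffff0000083cd190 <securit", the leading operands (cr6, w0, #26) are skipped and only the address is converted; the operands before the target can then be printed unchanged in the simplified view.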
  6. 01 Jun, 2017 1 commit
    • J
      perf trace: Add mmap alias for s390 · 54265664
      Jiri Olsa committed
      The s390 architecture maps sys_mmap (nr 90) into sys_old_mmap.  For this
      reason perf trace can't find the proper syscall event to get args format
      from and displays it wrongly as 'continued'.
      
      To fix that fill the "alias" field with "old_mmap" for trace's mmap record
      to get the correct translation.
      
      Before:
           0.042 ( 0.011 ms): vest/43052 fstat(statbuf: 0x3ffff89fd90                ) = 0
           0.042 ( 0.028 ms): vest/43052  ... [continued]: mmap()) = 0x3fffd6e2000
           0.072 ( 0.025 ms): vest/43052 read(buf: 0x3fffd6e2000, count: 4096        ) = 6
      
      After:
           0.045 ( 0.011 ms): fstat(statbuf: 0x3ffff8a0930                           ) = 0
           0.057 ( 0.018 ms): mmap(arg: 0x3ffff8a0858                                ) = 0x3fffd14a000
           0.076 ( 0.025 ms): read(buf: 0x3fffd14a000, count: 4096                   ) = 6
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170531113557.19175-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
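The "alias" lookup described in the commit above can be sketched as a two-step name resolution. The struct and function names here are hypothetical and simplified; the predicate stands in for "does a syscall event with this name exist on this architecture".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct syscall_fmt {
	const char *name;
	const char *alias;	/* alternative event name, may be NULL */
};

/* Stand-in for checking whether a syscall event with this name exists. */
typedef int (*event_exists_fn)(const char *name);

/*
 * Return the name whose event exists: the syscall name first, then its
 * alias. NULL when neither is available.
 */
static const char *resolve_event_name(const struct syscall_fmt *fmt,
				      event_exists_fn exists)
{
	assert(fmt != NULL && exists != NULL);

	if (exists(fmt->name))
		return fmt->name;
	if (fmt->alias && exists(fmt->alias))
		return fmt->alias;
	return NULL;
}

/* Models an s390-like system where only old_mmap is present. */
static int s390_like_exists(const char *name)
{
	return strcmp(name, "old_mmap") == 0;
}
```

With the alias filled in, the mmap entry resolves to "old_mmap" on the modelled system; without it the lookup fails, which corresponds to the '[continued]' output shown in the "Before" trace.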
  7. 27 May, 2017 2 commits
  8. 26 May, 2017 1 commit
  9. 24 May, 2017 8 commits
    • I
      tools/include: Sync kernel ABI headers with tooling headers · 6e30437b
      Ingo Molnar committed
      Sync (copy) the following v4.12 kernel headers to the tooling headers:
      
        arch/x86/include/asm/disabled-features.h:
        arch/x86/include/uapi/asm/kvm.h:
        arch/powerpc/include/uapi/asm/kvm.h:
        arch/s390/include/uapi/asm/kvm.h:
        arch/arm/include/uapi/asm/kvm.h:
        arch/arm64/include/uapi/asm/kvm.h:
      
         - 'struct kvm_sync_regs' got changed in an ABI-incompatible way,
           fortunately none of the (in-kernel) tooling relied on it
      
         - new KVM_DEV calls added
      
        arch/x86/include/asm/required-features.h:
      
         - 5-level paging hardware ABI detail added
      
        arch/x86/include/asm/cpufeatures.h:
      
         - new CPU feature added
      
        arch/x86/include/uapi/asm/vmx.h:
      
         - new VMX exit conditions
      
      None of the changes requires fixes in the tooling source code.
      
      This addresses the following warnings:
      
        Warning: include/uapi/linux/stat.h differs from kernel
        Warning: arch/x86/include/asm/disabled-features.h differs from kernel
        Warning: arch/x86/include/asm/required-features.h differs from kernel
        Warning: arch/x86/include/asm/cpufeatures.h differs from kernel
        Warning: arch/x86/include/uapi/asm/kvm.h differs from kernel
        Warning: arch/x86/include/uapi/asm/vmx.h differs from kernel
        Warning: arch/powerpc/include/uapi/asm/kvm.h differs from kernel
        Warning: arch/s390/include/uapi/asm/kvm.h differs from kernel
        Warning: arch/arm/include/uapi/asm/kvm.h differs from kernel
        Warning: arch/arm64/include/uapi/asm/kvm.h differs from kernel
      
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524065721.j2mlch6bgk5klgbc@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • N
      perf tools: Put caller above callee in --children mode · 7111ffff
      Namhyung Kim committed
      __hpp__sort_acc() sorts entries by callchain depth in order to put
      callers above callees in children mode.  But it assumed the callchain
      order was callee-first.  Now the default (for children) is caller-first,
      so the order of entries is reversed.
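
      A minimal sketch of a tie-break that takes the callchain order into
      account. The entry layout, the `pos` field and all names here are
      assumptions for illustration, not the actual __hpp__sort_acc() code.

```c
enum chain_order { ORDER_CALLEE, ORDER_CALLER };

struct entry {
	const char *sym;
	unsigned int children;  /* accumulated overhead, in units of 0.01% */
	int pos;                /* index of this symbol in the stored callchain */
};

/*
 * Return a negative value if a should be listed above b.
 * Entries with equal overhead are ordered so that the caller comes first,
 * which depends on whether the chain was stored caller-first or callee-first.
 */
static int cmp_children(const struct entry *a, const struct entry *b,
			enum chain_order order)
{
	if (a->children != b->children)
		return a->children > b->children ? -1 : 1;

	/* Caller-first chains: lower index is the caller. Callee-first: higher. */
	return order == ORDER_CALLER ? a->pos - b->pos : b->pos - a->pos;
}

/* Sample entries for a chain stored caller-first: _start, __libc_start_main, main. */
static const struct entry start_entry = { "_start", 9961, 0 };
static const struct entry libc_entry  = { "__libc_start_main", 9961, 1 };
static const struct entry main_entry  = { "main", 9954, 2 };
```

      Comparing `start_entry` with `libc_entry` under ORDER_CALLER puts
      _start first; applying the callee-first rule to the same data puts
      it second, which is the kind of inversion described here.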
      
      For example, consider the following case:
      
        $ perf report --no-children
        ...
        # Overhead  Command  Shared Object        Symbol
        # ........  .......  ...................  ..........................
        #
            99.44%  a.out    a.out                [.] main
                    |
                    ---main
                       __libc_start_main
                       _start
      
      Then children mode should show '_start' above '__libc_start_main' since
      it's the caller (parent) of __libc_start_main.  But it's reversed:
      
        # Children      Self  Command  Shared Object    Symbol
        # ........  ........  .......  ...............  .....................
        #
            99.61%     0.00%  a.out    libc-2.25.so     [.] __libc_start_main
            99.61%     0.00%  a.out    a.out            [.] _start
            99.54%    99.44%  a.out    a.out            [.] main
      
      This patch fixes it.
      
        # Children      Self  Command  Shared Object    Symbol
        # ........  ........  .......  ...............  .....................
        #
            99.61%     0.00%  a.out    a.out            [.] _start
            99.61%     0.00%  a.out    libc-2.25.so     [.] __libc_start_main
            99.54%    99.44%  a.out    a.out            [.] main
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-8-namhyung@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • M
      perf report: Do not drop last inlined frame · 4d53b9d5
      Milian Wolff committed
      The very last inlined frame, i.e. the one furthest away from the
      non-inlined frame, was silently dropped. This is apparent when
      comparing the output of `perf script` and `addr2line`:
      
      ~~~~~
        $ perf script --inline
        ...
        a.out 26722 80836.309329:      72425 cycles:
                           21561 __hypot_finite (/usr/lib/libm-2.25.so)
                            ace3 hypot (/usr/lib/libm-2.25.so)
                             a4a main (a.out)
                                 std::abs<double>
                                 std::_Norm_helper<true>::_S_do_it<double>
                                 std::norm<double>
                                 main
                           20510 __libc_start_main (/usr/lib/libc-2.25.so)
                             bd9 _start (a.out)
      
        $ addr2line -a -f -i -e /tmp/a.out a4a | c++filt
        0x0000000000000a4a
        std::__complex_abs(doublecomplex )
        /usr/include/c++/6.3.1/complex:589
        double std::abs<double>(std::complex<double> const&)
        /usr/include/c++/6.3.1/complex:597
        double std::_Norm_helper<true>::_S_do_it<double>(std::complex<double> const&)
        /usr/include/c++/6.3.1/complex:654
        double std::norm<double>(std::complex<double> const&)
        /usr/include/c++/6.3.1/complex:664
        main
        /tmp/inlining.cpp:14
      ~~~~~
      
      Note how `std::__complex_abs` is missing from the `perf script`
      output. The same frame is missing in `perf report`. The patch
      here fixes this issue, and the output becomes:
      
      ~~~~~
        a.out 26722 80836.309329:      72425 cycles:
                           21561 __hypot_finite (/usr/lib/libm-2.25.so)
                            ace3 hypot (/usr/lib/libm-2.25.so)
                             a4a main (a.out)
                                 std::__complex_abs
                                 std::abs<double>
                                 std::_Norm_helper<true>::_S_do_it<double>
                                 std::norm<double>
                                 main
                           20510 __libc_start_main (/usr/lib/libc-2.25.so)
                             bd9 _start (a.out)
      ~~~~~
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-7-namhyung@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • M
      perf report: Always honor callchain order for inlined nodes · 28071f51
      Milian Wolff committed
      So far, the inlined nodes were only reversed when perf was built
      against libbfd. If that was not available, the addr2line fallback
      code path was missing the inline_list__reverse call.
      
      Now we always add the nodes in the correct order within
      inline_list__append. This removes the need to reverse the list
      and also ensures that all callers construct the list in the right
      order.
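
      One way to picture "adding in the correct order" is an insert that
      either appends or prepends depending on the configured order, so no
      later reversal is needed. This is only a sketch: it assumes the
      resolver reports inlined frames innermost first, and the array-based
      list and every name in it are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_INLINES 16

enum chain_order { ORDER_CALLEE, ORDER_CALLER };

struct inline_list {
	const char *funcs[MAX_INLINES];
	size_t nr;
};

/* Add one resolved frame; returns 0 on success, -1 if the list is full. */
static int inline_list_add(struct inline_list *l, const char *func,
			   enum chain_order order)
{
	if (l->nr == MAX_INLINES)
		return -1;

	if (order == ORDER_CALLEE) {
		/* Append: the innermost frame stays at the front. */
		l->funcs[l->nr] = func;
	} else {
		/* Prepend: the outermost frame ends up at the front. */
		memmove(&l->funcs[1], &l->funcs[0], l->nr * sizeof(l->funcs[0]));
		l->funcs[0] = func;
	}
	l->nr++;
	return 0;
}

/* Feed a fixed innermost-first sequence and return the first stored name. */
static const char *first_stored(enum chain_order order)
{
	static const char *const resolved[] = {
		"std::abs<double>", "std::norm<double>", "main",
	};
	struct inline_list l = { .nr = 0 };
	size_t i;

	for (i = 0; i < sizeof(resolved) / sizeof(resolved[0]); i++)
		inline_list_add(&l, resolved[i], order);

	return l.funcs[0];
}
```

      Because the order is decided at insertion time, every caller of the
      add function gets the same result regardless of which backend
      produced the frames.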
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-6-namhyung@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • N
      perf script: Add --inline option for debugging · 325fbff5
      Namhyung Kim committed
      The --inline option is to show inlined functions in callchains.
      
      For example:
      
        $ perf script
        a.out  5644 11611.467597:     309961 cycles:u:
                           790 main (/home/namhyung/tmp/perf/a.out)
                         20511 __libc_start_main (/usr/lib/libc-2.25.so)
                           8ba _start (/home/namhyung/tmp/perf/a.out)
        ...
      
        $ perf script --inline
        a.out  5644 11611.467597:     309961 cycles:u:
                           790 main (/home/namhyung/tmp/perf/a.out)
                               std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()
                               std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
                               std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >
                               main
                         20511 __libc_start_main (/usr/lib/libc-2.25.so)
                           8ba _start (/home/namhyung/tmp/perf/a.out)
        ...
      Reviewed-and-tested-by: Milian Wolff <milian.wolff@kdab.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-5-namhyung@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • M
      perf report: Fix off-by-one for non-activation frames · 1982ad48
      Milian Wolff committed
      As the documentation for dwfl_frame_pc says, frames that
      are not activation frames need to have their program counter
      decremented by one to properly find the function of the caller.
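
      The adjustment itself is small and can be sketched as below.
      `lookup_addr` and its parameters are hypothetical names used for
      illustration; with elfutils the activation flag would come from
      dwfl_frame_pc().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the address to use when looking up the symbol or source line
 * of an unwound frame.
 *
 * is_activation is true when pc is the exact address being executed, and
 * false when pc is a return address, that is, the instruction after a call.
 * A return address may already belong to the next source line, so step
 * back by one to land inside the call instruction.
 */
static uint64_t lookup_addr(uint64_t pc, bool is_activation)
{
	if (!is_activation && pc > 0)
		return pc - 1;
	return pc;
}
```

      The result is only meant for symbol and line lookup; the original pc
      is left untouched for anything that needs the real return address.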
      
      This fixes many cases where perf report currently attributes
      the cost to the next line. For example, given code like this:
      
      ~~~~~~~~~~~~~~~
        #include <thread>
        #include <chrono>
      
        using namespace std;
      
        int main()
        {
          this_thread::sleep_for(chrono::milliseconds(1000));
          this_thread::sleep_for(chrono::milliseconds(100));
          this_thread::sleep_for(chrono::milliseconds(10));
      
          return 0;
        }
      ~~~~~~~~~~~~~~~
      
      Now compile and record it:
      
      ~~~~~~~~~~~~~~~
        g++ -std=c++11 -g -O2 test.cpp
        echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
        perf record \
          --event sched:sched_stat_sleep \
          --event sched:sched_process_exit \
          --event sched:sched_switch --call-graph=dwarf \
          --output perf.data.raw \
          ./a.out
        echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
        perf inject --sched-stat --input perf.data.raw --output perf.data
      ~~~~~~~~~~~~~~~
      
      Before this patch, the report clearly shows the off-by-one issue.
      Most notably, the last sleep invocation is incorrectly attributed
      to the "return 0;" line:
      
      ~~~~~~~~~~~~~~~
        Overhead  Source:Line
        ........  ...........
      
         100.00%  core.c:0
                  |
                  ---__schedule core.c:0
                     schedule
                     do_nanosleep hrtimer.c:0
                     hrtimer_nanosleep
                     sys_nanosleep
                     entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
                     __nanosleep_nocancel .:0
                     std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > thread:323
                     |
                     |--90.08%--main test.cpp:9
                     |          __libc_start_main
                     |          _start
                     |
                     |--9.01%--main test.cpp:10
                     |          __libc_start_main
                     |          _start
                     |
                      --0.91%--main test.cpp:13
                                __libc_start_main
                                _start
      ~~~~~~~~~~~~~~~
      
      With this patch here applied, the issue is fixed. The report becomes
      much more usable:
      
      ~~~~~~~~~~~~~~~
        Overhead  Source:Line
        ........  ...........
      
         100.00%  core.c:0
                  |
                  ---__schedule core.c:0
                     schedule
                     do_nanosleep hrtimer.c:0
                     hrtimer_nanosleep
                     sys_nanosleep
                     entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
                     __nanosleep_nocancel .:0
                     std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > thread:323
                     |
                     |--90.08%--main test.cpp:8
                     |          __libc_start_main
                     |          _start
                     |
                     |--9.01%--main test.cpp:9
                     |          __libc_start_main
                     |          _start
                     |
                      --0.91%--main test.cpp:10
                                __libc_start_main
                                _start
      ~~~~~~~~~~~~~~~
      
      Similarly it works for signal frames:
      
      ~~~~~~~~~~~~~~~
        __noinline void bar(void)
        {
          volatile long cnt = 0;
      
          for (cnt = 0; cnt < 100000000; cnt++);
        }
      
        __noinline void foo(void)
        {
          bar();
        }
      
        void sig_handler(int sig)
        {
          foo();
        }
      
        int main(void)
        {
          signal(SIGUSR1, sig_handler);
          raise(SIGUSR1);
      
          foo();
          return 0;
        }
      ~~~~~~~~~~~~~~~~
      
      Before, the report wrongly points to `signal.c:29` after raise():
      
      ~~~~~~~~~~~~~~~~
        $ perf report --stdio --no-children -g srcline -s srcline
        ...
         100.00%  signal.c:11
                  |
                  ---bar signal.c:11
                     |
                     |--50.49%--main signal.c:29
                     |          __libc_start_main
                     |          _start
                     |
                      --49.51%--0x33a8f
                                raise .:0
                                main signal.c:29
                                __libc_start_main
                                _start
      ~~~~~~~~~~~~~~~~
      
      With this patch in, the issue is fixed and we instead get:
      
      ~~~~~~~~~~~~~~~~
         100.00%  signal   signal            [.] bar
                  |
                  ---bar signal.c:11
                     |
                     |--50.49%--main signal.c:29
                     |          __libc_start_main
                     |          _start
                     |
                      --49.51%--0x33a8f
                                raise .:0
                                main signal.c:27
                                __libc_start_main
                                _start
      ~~~~~~~~~~~~~~~~
      
      Note how this patch fixes this issue for both unwinding methods, i.e.
      both dwfl and libunwind. The former case is straightforward thanks
      to dwfl_frame_pc(). For libunwind, we replace the functionality via
      unw_is_signal_frame() for any but the very first frame.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-4-namhyung@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • M
      perf report: Fix memory leak in addr2line when called by addr2inlines · b21cc978
      Milian Wolff committed
      When a filename was found in addr2line it was duplicated via strdup()
      but never freed. Now addr2inlines passes NULL for the filename and
      addr2line handles this gracefully.
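
      The pattern of an optional out parameter can be sketched as follows.
      This is an illustration under assumptions, not the srcline.c code:
      `lookup_line`, `dup_string` (a portable stand-in for strdup()) and
      the fixed result are all hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Portable stand-in for strdup(); the caller owns the returned copy. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Hypothetical resolver that always "finds" test.cpp:14.
 * Both out parameters are optional: nothing is allocated when file is NULL,
 * so a caller that does not want the name has nothing to free.
 * Returns 1 on success, 0 on allocation failure.
 */
static int lookup_line(char **file, unsigned int *line)
{
	if (file) {
		*file = dup_string("test.cpp");
		if (!*file)
			return 0;
	}
	if (line)
		*line = 14;
	return 1;
}

/* A caller that only needs the line number passes NULL for the file name. */
static unsigned int line_only(void)
{
	unsigned int line = 0;

	return lookup_line(NULL, &line) ? line : 0;
}
```

      A caller that does pass a non-NULL `file` remains responsible for
      calling free() on the result.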
      
      Detected by Valgrind:
      
        ==16331== 1,680 bytes in 21 blocks are definitely lost in loss record 148 of 220
        ==16331==    at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
        ==16331==    by 0x672FA69: strdup (in /usr/lib/libc-2.25.so)
        ==16331==    by 0x52769F: addr2line (srcline.c:256)
        ==16331==    by 0x52769F: addr2inlines (srcline.c:294)
        ==16331==    by 0x52769F: dso__parse_addr_inlines (srcline.c:502)
        ==16331==    by 0x574D7A: inline__fprintf (hist.c:41)
        ==16331==    by 0x574D7A: ipchain__fprintf_graph (hist.c:147)
        ==16331==    by 0x57518A: __callchain__fprintf_graph (hist.c:212)
        ==16331==    by 0x5753CF: callchain__fprintf_graph.constprop.6 (hist.c:337)
        ==16331==    by 0x57738E: hist_entry__fprintf (hist.c:628)
        ==16331==    by 0x57738E: hists__fprintf (hist.c:882)
        ==16331==    by 0x44A20F: perf_evlist__tty_browse_hists (builtin-report.c:399)
        ==16331==    by 0x44A20F: report__browse_hists (builtin-report.c:491)
        ==16331==    by 0x44A20F: __cmd_report (builtin-report.c:624)
        ==16331==    by 0x44A20F: cmd_report (builtin-report.c:1054)
        ==16331==    by 0x4A49CE: run_builtin (perf.c:296)
        ==16331==    by 0x4A4CC0: handle_internal_command (perf.c:348)
        ==16331==    by 0x434371: run_argv (perf.c:392)
        ==16331==    by 0x434371: main (perf.c:530)
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-3-namhyung@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • M
      perf report: Don't crash on invalid maps in `-g srcline` mode · 7d4df089
      Milian Wolff committed
      I just hit a segfault when doing `perf report -g srcline`.
      Valgrind pointed me at this code as the culprit:
      
        ==8359== Invalid read of size 8
        ==8359==    at 0x3096D9: map__rip_2objdump (map.c:430)
        ==8359==    by 0x2FC1A3: match_chain_srcline (callchain.c:645)
        ==8359==    by 0x2FC1A3: match_chain (callchain.c:700)
        ==8359==    by 0x2FC1A3: append_chain (callchain.c:895)
        ==8359==    by 0x2FC1A3: append_chain_children (callchain.c:846)
        ==8359==    by 0x2FF719: callchain_append (callchain.c:944)
        ==8359==    by 0x2FF719: hist_entry__append_callchain (callchain.c:1058)
        ==8359==    by 0x32FA06: iter_add_single_cumulative_entry (hist.c:908)
        ==8359==    by 0x33195C: hist_entry_iter__add (hist.c:1050)
        ==8359==    by 0x258F65: process_sample_event (builtin-report.c:204)
        ==8359==    by 0x30D60C: perf_session__deliver_event (session.c:1310)
        ==8359==    by 0x30D60C: ordered_events__deliver_event (session.c:119)
        ==8359==    by 0x310D12: __ordered_events__flush (ordered-events.c:210)
        ==8359==    by 0x310D12: ordered_events__flush.part.3 (ordered-events.c:277)
        ==8359==    by 0x30DD3C: perf_session__process_user_event (session.c:1349)
        ==8359==    by 0x30DD3C: perf_session__process_event (session.c:1475)
        ==8359==    by 0x30FC3C: __perf_session__process_events (session.c:1867)
        ==8359==    by 0x30FC3C: perf_session__process_events (session.c:1921)
        ==8359==    by 0x25A985: __cmd_report (builtin-report.c:575)
        ==8359==    by 0x25A985: cmd_report (builtin-report.c:1054)
        ==8359==    by 0x2B9A80: run_builtin (perf.c:296)
        ==8359==  Address 0x70 is not stack'd, malloc'd or (recently) free'd
      
      This patch fixes the issue.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      [ Remove dependency from another change ]
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-2-namhyung@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 19 May, 2017 1 commit
    • M
      selftests/powerpc: Fix TM resched DSCR test with some compilers · fe06fe86
      Michael Ellerman committed
      The tm-resched-dscr test has started failing sometimes, depending on
      what compiler it's built with, e.g.:
      
        test: tm_resched_dscr
        Check DSCR TM context switch: tm-resched-dscr: tm-resched-dscr.c:76: test_body: Assertion `rv' failed.
        !! child died by signal 6
      
      When it fails we see that the compiler doesn't initialise rv to 1 before
      entering the inline asm block. Although that's counterintuitive, it
      is allowed because we tell the compiler that the inline asm will write
      to rv (using "=r"), meaning the original value is irrelevant.
      
      Marking it as a read/write parameter would presumably work, but it seems
      simpler to fix it by setting the initial value of rv in the inline asm.
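
      For comparison, the read/write alternative mentioned above can be
      sketched with GCC/Clang extended asm. This is not the fix that was
      applied (which sets rv inside the asm); the function name is
      hypothetical and the empty template is used only to keep the example
      independent of the target architecture.

```c
/*
 * "+r" marks rv as both read and written by the asm, so the compiler has
 * to place the initial value (1) in the register before the asm runs.
 * With "=r" the operand is write-only and the initialisation could be
 * dropped, leaving rv undefined if the asm does not set it.
 */
static int keep_initial_value(void)
{
	int rv = 1;

	/* Empty template: no instructions are emitted, rv is left as it was. */
	__asm__ __volatile__("" : "+r"(rv));

	return rv;
}
```

      Setting the value inside the asm block, as this patch does, avoids
      relying on the constraint at all.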
      
      Fixes: 96d01610 ("powerpc: Correct DSCR during TM context switch")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Michael Neuling <mikey@neuling.org>
  11. 18 May, 2017 2 commits