1. 29 May, 2019 15 commits
    • perf python: Remove -fstack-protector-strong if clang doesn't have it · 7952fa3b
      Arnaldo Carvalho de Melo committed
      Some distros put -fstack-protector-strong in the compiler flags to be
      used to build python extensions, but then, the clang version in that
      distro doesn't know about that, only gcc does.
      
      Check if that is the case and remove it from the set of options used to
      build the python binding with clang.
      
      Case at hand:
      
      oraclelinux:7
      
        $ head -2 /etc/os-release
        NAME="Oracle Linux Server"
        VERSION="7.6"
        $ grep stack-protector /usr/lib64/python2.7/_sysconfigdata.py | head -1 | cut -c-120
       'CFLAGS': '-fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --para
        $
        gcc version 4.8.5 20150623 (Red Hat 4.8.5-36.0.1) (GCC)
        clang version 3.4.2 (tags/RELEASE_34/dot2-final)
      
        clang: error: unknown argument: '-fstack-protector-strong'
        clang: error: unknown argument: '-fstack-protector-strong'
        error: command 'clang' failed with exit status 1
        cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory
        make[2]: *** [/tmp/build/perf/python/perf.so] Error 1
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-brmp2415zxpbhz45etkgjoma@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
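The check described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from perf's setup.py: `filter_unsupported_flags` and `fake_clang_3_4` are hypothetical names, and the compiler probe is replaced by a stand-in predicate so the example runs without clang installed.

```python
def filter_unsupported_flags(cflags, is_supported):
    """Return cflags with every option the compiler rejects removed."""
    return [flag for flag in cflags if is_supported(flag)]


def fake_clang_3_4(flag):
    # Stand-in for a real compiler probe (assumption): pretend this clang
    # rejects only -fstack-protector-strong, as in the build log above.
    return flag != "-fstack-protector-strong"


# A shortened version of the CFLAGS string shown for oraclelinux:7.
cflags = "-fno-strict-aliasing -O2 -g -pipe -Wall -fexceptions -fstack-protector-strong".split()
filtered = filter_unsupported_flags(cflags, fake_clang_3_4)
print(" ".join(filtered))
```

In a real build the predicate would have to invoke the compiler with the candidate option and inspect its exit status; that part is deliberately left out here.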
    • perf machine: Return NULL instead of null-terminating /proc/version array · 34b65aff
      Donald Yandt committed
      Return NULL instead of null-terminating version char array when fgets
      fails due to end-of-file or error.
      Signed-off-by: Donald Yandt <donald.yandt@gmail.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Fixes: 30ba5b0e ("perf machine: Null-terminate version char array upon fgets(/proc/version) error")
      Link: http://lkml.kernel.org/r/20190528134128.30841-1-donald.yandt@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
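As a loose Python analogue of the idea in the commit above (the real change is in C and uses fgets), a reader can signal "nothing was read" with a sentinel instead of handing back an empty buffer. The function name `read_version_line` and the sample text are hypothetical.

```python
import io


def read_version_line(stream):
    """Return the first line of stream, or None if nothing could be read."""
    line = stream.readline()
    if not line:
        # readline() returns "" at end-of-file; report failure explicitly
        # rather than returning an empty string the caller might parse.
        return None
    return line.rstrip("\n")


print(read_version_line(io.StringIO("")))
print(read_version_line(io.StringIO("Linux version 5.2\n")))
```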
    • perf version: Append 12 git SHA chars to the version string · 80ec26d1
      Arnaldo Carvalho de Melo committed
      Bumping it from just 4:
      
      Before:
      
        $ perf -v
        perf version 5.2.rc1.g80978f
        $
      
      After:
      
        $ perf -v
        perf version 5.2.rc1.g80978fc864c5
        $
      Requested-by: Ingo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-p4yun2nxlo7eeeohyx5v4kw7@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
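The "After" output above is consistent with appending "g" plus the first 12 characters of the commit SHA to the base version. A minimal sketch of that string construction, assuming this layout; `version_string` is a hypothetical helper and only the first 12 SHA characters come from the example, the rest is placeholder padding.

```python
def version_string(base, sha, sha_chars=12):
    """Build '<base>.g<abbreviated sha>' from a full commit SHA."""
    return f"{base}.g{sha[:sha_chars]}"


# Only "80978fc864c5" is taken from the output above; the trailing zeros
# are a placeholder for the unknown remainder of the 40-character SHA.
sha = "80978fc864c5" + "0" * 28
print(version_string("5.2.rc1", sha))
```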
    • perf script: Remove superfluous BPF event titles · 8201787c
      Jiri Olsa committed
      There's no need to display the "ksymbol event with" text for the
      PERF_RECORD_KSYMBOL event or the "bpf event with" text for the
      PERF_RECORD_BPF_EVENT event.

      Remove both so the output matches how the other side-band events are
      displayed.
      
      Before:
      
        # perf script --show-bpf-events
        ...
        swapper     0 [000]     0.000000: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0ef971d len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
        swapper     0 [000]     0.000000: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 36
      
      After:
      
        # perf script --show-bpf-events
        ...
        swapper     0 [000]     0.000000: PERF_RECORD_KSYMBOL addr ffffffffc0ef971d len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
        swapper     0 [000]     0.000000: PERF_RECORD_BPF_EVENT type 1, flags 0, id 36
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190508132010.14512-12-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests: Add map_groups__merge_in test · 4f600bcf
      Jiri Olsa committed
      Add a test for the map_groups__merge_in function, which merges kcore
      maps into existing eBPF maps.
      
      Committer testing:
      
        # perf test merge
        59: map_groups__merge_in                                  : Ok
        # perf test -v merge
        59: map_groups__merge_in                                  :
        --- start ---
        test child forked, pid 8349
        test child finished with 0
        ---- end ----
        map_groups__merge_in: Ok
        #
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190508132010.14512-10-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Pad DSO name for --call-trace · 1c492422
      Jiri Olsa committed
      Pad the DSO name in --call-trace so the indentation is not thrown off
      by differing DSO name lengths, now that BPF code is displayed alongside
      kernel code.
      
        # perf-with-kcore record pt -e intel_pt//ku -- sleep 1
        # perf-core/perf-with-kcore script pt --call-trace
      
      Before:
      
         sleep 3660 [16] 57036.806464404: ([kernel.kallsyms])                      kretprobe_perf_func
         sleep 3660 [16] 57036.806464404: ([kernel.kallsyms])                          trace_call_bpf
         sleep 3660 [16] 57036.806464404: ([kernel.kallsyms])                              __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806464404: ([kernel.kallsyms])                                  __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806464725: (bpf_prog_da4fe6b3d2c29b25_trace_return)                                         bpf_get_current_pid_tgid
         sleep 3660 [16] 57036.806464725: (bpf_prog_da4fe6b3d2c29b25_trace_return)                                         bpf_ktime_get_ns
         sleep 3660 [16] 57036.806464725: ([kernel.kallsyms])                                          __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806464725: ([kernel.kallsyms])                                              __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806465045: (bpf_prog_da4fe6b3d2c29b25_trace_return)                                         __htab_map_lookup_elem
         sleep 3660 [16] 57036.806465366: ([kernel.kallsyms])                                          memcmp
         sleep 3660 [16] 57036.806465687: (bpf_prog_da4fe6b3d2c29b25_trace_return)                                         bpf_probe_read
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms])                                          probe_kernel_read
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms])                                              __check_object_size
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms])                                                  check_stack_object
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms])                                              copy_user_enhanced_fast_string
         sleep 3660 [16] 57036.806465687: (bpf_prog_da4fe6b3d2c29b25_trace_return)                                         bpf_probe_read
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms])                                          probe_kernel_read
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms])                                              __check_object_size
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms])                                                  check_stack_object
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms])                                              copy_user_enhanced_fast_string
         sleep 3660 [16] 57036.806466008: (bpf_prog_da4fe6b3d2c29b25_trace_return)                                         bpf_get_current_uid_gid
         sleep 3660 [16] 57036.806466008: ([kernel.kallsyms])                                          from_kgid
         sleep 3660 [16] 57036.806466008: ([kernel.kallsyms])                                          from_kuid
         sleep 3660 [16] 57036.806466008: (bpf_prog_da4fe6b3d2c29b25_trace_return)                                         bpf_perf_event_output
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms])                                          perf_event_output
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms])                                              perf_prepare_sample
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms])                                                  perf_misc_flags
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms])                                                      __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms])                                                          __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806466328: ([kvm])                                                      kvm_is_in_guest
         sleep 3660 [16] 57036.806466649: ([kernel.kallsyms])                                                  __perf_event_header__init_id.isra.0
         sleep 3660 [16] 57036.806466649: ([kernel.kallsyms])                                              perf_output_begin
      
      After:
      
         sleep 3660 [16] 57036.806464404: ([kernel.kallsyms]                      )     kretprobe_perf_func
         sleep 3660 [16] 57036.806464404: ([kernel.kallsyms]                      )         trace_call_bpf
         sleep 3660 [16] 57036.806464404: ([kernel.kallsyms]                      )             __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806464404: ([kernel.kallsyms]                      )                 __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806464725: (bpf_prog_da4fe6b3d2c29b25_trace_return )                     bpf_get_current_pid_tgid
         sleep 3660 [16] 57036.806464725: (bpf_prog_da4fe6b3d2c29b25_trace_return )                     bpf_ktime_get_ns
         sleep 3660 [16] 57036.806464725: ([kernel.kallsyms]                      )                         __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806464725: ([kernel.kallsyms]                      )                             __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806465045: (bpf_prog_da4fe6b3d2c29b25_trace_return )                     __htab_map_lookup_elem
         sleep 3660 [16] 57036.806465366: ([kernel.kallsyms]                      )                         memcmp
         sleep 3660 [16] 57036.806465687: (bpf_prog_da4fe6b3d2c29b25_trace_return )                     bpf_probe_read
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]                      )                         probe_kernel_read
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]                      )                             __check_object_size
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]                      )                                 check_stack_object
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]                      )                             copy_user_enhanced_fast_string
         sleep 3660 [16] 57036.806465687: (bpf_prog_da4fe6b3d2c29b25_trace_return )                     bpf_probe_read
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]                      )                         probe_kernel_read
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]                      )                             __check_object_size
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]                      )                                 check_stack_object
         sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]                      )                             copy_user_enhanced_fast_string
         sleep 3660 [16] 57036.806466008: (bpf_prog_da4fe6b3d2c29b25_trace_return )                     bpf_get_current_uid_gid
         sleep 3660 [16] 57036.806466008: ([kernel.kallsyms]                      )                         from_kgid
         sleep 3660 [16] 57036.806466008: ([kernel.kallsyms]                      )                         from_kuid
         sleep 3660 [16] 57036.806466008: (bpf_prog_da4fe6b3d2c29b25_trace_return )                     bpf_perf_event_output
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]                      )                         perf_event_output
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]                      )                             perf_prepare_sample
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]                      )                                 perf_misc_flags
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]                      )                                     __x86_indirect_thunk_rax
         sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]                      )                                         __x86_indirect_thunk_rax
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190508132010.14512-8-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
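The padding shown in the "After" output can be illustrated with a fixed-width, left-justified field. This is only a sketch: the commit does not say how perf chooses the width, so `format_dso_column` (a hypothetical helper) simply pads to the longest name plus one space.

```python
def format_dso_column(dso_names):
    """Left-justify every DSO name to a common width inside parentheses."""
    # Assumption: pad to the longest name plus one trailing space.
    width = max(len(name) for name in dso_names) + 1
    return ["(" + name.ljust(width) + ")" for name in dso_names]


names = ["[kernel.kallsyms]", "bpf_prog_da4fe6b3d2c29b25_trace_return", "[kvm]"]
columns = format_dso_column(names)
for column in columns:
    print(column)
```

Because every column now has the same length, the call-depth indentation that follows starts at the same offset regardless of which DSO a symbol belongs to.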
    • perf dso: Add BPF DSO read and size hooks · 6c398d72
      Jiri Olsa committed
      Add BPF related code into DSO reading paths to return size (bpf_size)
      and read the BPF code (bpf_read).
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190508132010.14512-5-jolsa@kernel.org
      [ Use uintptr_t when casting from u64 to u8 pointers ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
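The shape of the change described above, a generic read/size path that dispatches to either a file backend or a BPF backend, might look like the following. This is a conceptual sketch in Python rather than perf's C code; the `Dso` class, its `kind` tags, and the byte payloads are all hypothetical.

```python
class Dso:
    """Toy DSO whose data comes from a file or from BPF program code."""

    def __init__(self, kind, payload):
        self.kind = kind        # "file" or "bpf" (hypothetical tags)
        self.payload = payload  # bytes standing in for the real data source

    # Backend-specific helpers; in this sketch both read from self.payload.
    def _file_size(self):
        return len(self.payload)

    def _file_read(self, offset, length):
        return self.payload[offset:offset + length]

    def _bpf_size(self):
        return len(self.payload)

    def _bpf_read(self, offset, length):
        return self.payload[offset:offset + length]

    # Generic entry points that pick the backend.
    def data_size(self):
        return self._bpf_size() if self.kind == "bpf" else self._file_size()

    def data_read(self, offset, length):
        if self.kind == "bpf":
            return self._bpf_read(offset, length)
        return self._file_read(offset, length)


prog = Dso("bpf", bytes(range(16)))
print(prog.data_size(), prog.data_read(4, 4).hex())
```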
    • perf dso: Simplify dso_cache__read function · cacddfe7
      Jiri Olsa committed
      There's no need for the while loop now, also we can connect two (ret >
      0) condition legs together.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190508132010.14512-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf dso: Separate generic code in dso_cache__read · ea5db1bd
      Jiri Olsa committed
      Move the file specific code in the dso_cache__read function to a
      separate file_read function. I'll add BPF specific code in the following
      patches.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190508132010.14512-3-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf dso: Separate generic code in dso__data_file_size() · 5523769e
      Jiri Olsa committed
      Move the file-specific code in the dso__data_file_size function into a
      separate file_size function. I'll add BPF-specific code in the following
      patches.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190508132010.14512-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Remove const from thread read accessors · 7cb10a08
      Namhyung Kim committed
      The namespaces and comm fields of a thread are protected by rwsem and
      require write access for it.  So it ended up using a cast to remove
      the const qualifier.  Let's get rid of the const then.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Link: http://lkml.kernel.org/r/20190527061149.168640-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Preserve eBPF maps when loading kcore · fb5a88d4
      Jiri Olsa committed
      We need to preserve eBPF maps even if they are covered by kcore, because
      we need to access eBPF dso for source data.
      
      Add the map_groups__merge_in function to do that.  It merges a map into
      map_groups by splitting the new map within the existing map regions.
      Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190508132010.14512-9-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
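The splitting described in the commit above can be pictured with plain address intervals: the new map keeps only the pieces that do not overlap maps already present. The sketch below is an interval-level illustration under that reading, not perf's implementation; `merge_in` and the sample ranges are hypothetical.

```python
def merge_in(existing, new_start, new_end):
    """Return the parts of [new_start, new_end) not covered by existing.

    `existing` is a list of non-overlapping (start, end) half-open ranges
    that must be preserved, e.g. the eBPF maps already in the map group.
    """
    pieces = []
    cursor = new_start
    for start, end in sorted(existing):
        if end <= cursor:
            continue            # existing range lies entirely before cursor
        if start >= new_end:
            break               # existing range lies entirely after new map
        if start > cursor:
            pieces.append((cursor, start))  # gap before the existing range
        cursor = max(cursor, end)
        if cursor >= new_end:
            break
    if cursor < new_end:
        pieces.append((cursor, new_end))    # tail after the last overlap
    return pieces


# Two preserved ranges split the new range into three pieces.
print(merge_in([(20, 30), (50, 60)], 10, 70))
```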
    • perf machine: Keep zero in pgoff BPF map · 8529f2e6
      Jiri Olsa committed
      With pgoff set to zero, the map__map_ip function will return BPF
      addresses based from 0, which is what we need when we read the data from
      a BPF DSO.
      
      Also add BPF symbols with mapped IP addresses.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190508132010.14512-7-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
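To see why a zero pgoff yields addresses based from 0, assume the address translation has the form ip - start + pgoff. That formula is an assumption here (the commit only names map__map_ip), and the start address below is an illustrative value, not one taken from this commit.

```python
def map_ip(ip, start, pgoff=0):
    """Translate an absolute address into a map-relative one.

    Assumes the translation is ip - start + pgoff.
    """
    return ip - start + pgoff


start = 0xFFFFFFFFC0EF971D  # illustrative load address of a BPF program
print(hex(map_ip(start, start)))          # first byte of the program
print(hex(map_ip(start + 0x10, start)))   # 16 bytes into the program
```

With pgoff left at zero, the result is directly usable as an offset into the BPF program's code buffer.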
    • perf auxtrace: Fix itrace defaults for perf script · 355200e0
      Adrian Hunter committed
      Commit 4eb06815 ("perf script: Make itrace script default to all
      calls") does not work for the case when '--itrace' only is used, because
      default_no_sample is not being passed.
      
      Example:
      
       Before:
      
        $ perf record -e intel_pt/cyc/u ls
        $ perf script --itrace > cmp1.txt
        $ perf script --itrace=cepwx > cmp2.txt
        $ diff -sq cmp1.txt cmp2.txt
        Files cmp1.txt and cmp2.txt differ
      
       After:
      
        $ perf script --itrace > cmp1.txt
        $ perf script --itrace=cepwx > cmp2.txt
        $ diff -sq cmp1.txt cmp2.txt
        Files cmp1.txt and cmp2.txt are identical
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 4eb06815 ("perf script: Make itrace script default to all calls")
      Link: http://lkml.kernel.org/r/20190520113728.14389-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Fix itrace defaults for perf script · 26f19c2e
      Adrian Hunter committed
      Commit 4eb06815 ("perf script: Make itrace script default to all
      calls") does not work because 'use_browser' is being used to determine
      whether to default to periodic sampling (i.e. better for perf report).
      The result is that nothing but CBR events display for perf script when
      no --itrace option is specified.
      
      Fix by using 'default_no_sample' and 'inject' instead.
      
      Example:
      
       Before:
      
        $ perf record -e intel_pt/cyc/u ls
        $ perf script > cmp1.txt
        $ perf script --itrace=cepwx > cmp2.txt
        $ diff -sq cmp1.txt cmp2.txt
        Files cmp1.txt and cmp2.txt differ
      
       After:
      
        $ perf script > cmp1.txt
        $ perf script --itrace=cepwx > cmp2.txt
        $ diff -sq cmp1.txt cmp2.txt
        Files cmp1.txt and cmp2.txt are identical
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.20+
      Fixes: 90e457f7 ("perf tools: Add Intel PT support")
      Link: http://lkml.kernel.org/r/20190520113728.14389-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 28 May, 2019 4 commits
  3. 17 May, 2019 7 commits
    • perf stat: Support 'percore' event qualifier · 4fc4d8df
      Jin Yao committed
      With this patch, we can use the 'percore' event qualifier in perf-stat.
      
        root@skl:/tmp# perf stat -e cpu/event=0,umask=0x3,percore=1/,cpu/event=0,umask=0x3/ -a -A -I1000
          1.000773050 S0-C0   98,352,832 cpu/event=0,umask=0x3,percore=1/  (50.01%)
          1.000773050 S0-C1  103,763,057 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 S0-C2  196,776,995 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 S0-C3  176,493,779 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 CPU0    47,699,641 cpu/event=0,umask=0x3/            (50.02%)
          1.000773050 CPU1    49,052,451 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU2   102,771,422 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU3   100,784,662 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU4    43,171,342 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU5    54,152,158 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU6    93,618,410 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU7    74,477,589 cpu/event=0,umask=0x3/            (49.99%)
      
      In this example, we count the event 'ref-cycles' per-core and per-CPU in
      one perf stat command-line. From the output, we can see:
      
        S0-C0 = CPU0 + CPU4
        S0-C1 = CPU1 + CPU5
        S0-C2 = CPU2 + CPU6
        S0-C3 = CPU3 + CPU7
      
      So the result is expected (tiny difference is ignored).
      
      Note that the 'percore' event qualifier must be used with the '-A' option.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-4-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
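The per-core relationship stated in the commit above (S0-C0 = CPU0 + CPU4, and so on) can be checked by summing the per-CPU rows. The counts below are copied from the sample output; the helper `sum_per_core` and the `cpu % 4` mapping are illustrative and only encode the four equations given in the commit.

```python
from collections import defaultdict

# Per-CPU counts from the 'cpu/event=0,umask=0x3/' rows in the output above.
per_cpu = {
    0: 47_699_641, 1: 49_052_451, 2: 102_771_422, 3: 100_784_662,
    4: 43_171_342, 5: 54_152_158, 6: 93_618_410, 7: 74_477_589,
}


def sum_per_core(counts, cpu_to_core):
    """Add up per-CPU counts for every core."""
    totals = defaultdict(int)
    for cpu, value in counts.items():
        totals[cpu_to_core(cpu)] += value
    return dict(totals)


# Sibling threads are CPU n and CPU n+4 on this machine, per the commit.
per_core = sum_per_core(per_cpu, lambda cpu: cpu % 4)
for core, total in sorted(per_core.items()):
    print(f"S0-C{core} {total:,}")
```

The sums come out close to, but not exactly equal to, the 'percore=1' rows, which matches the commit's remark that a small difference is ignored.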
    • perf stat: Factor out aggregate counts printing · 40480a81
      Jin Yao committed
      Move the aggregate counts printing to a new function
      print_counter_aggrdata, which will be used in following patches.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-3-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add a 'percore' event qualifier · 064b4e82
      Jin Yao committed
      Add a 'percore' event qualifier, like cpu/event=0,umask=0x3,percore=1/,
      that sums up the event counts for both hardware threads in a core.
      
      We can already do this with --per-core, but it's often useful to do
      this together with other metrics that are collected per hardware thread.
      So we need to support this per-core counting at the event level.
      
      This can be implemented in only the user tool, no kernel support needed.
      
       v4:
       ---
       1. Add Arnaldo's patch which updates the documentation for
          this new qualifier.
       2. Rebase to latest perf/core branch
      
       v3:
       ---
       Simplify the code according to Jiri's comments.
       Before:
         "return term->val.percore ? true : false;"
       Now:
         "return term->val.percore;"
      
       v2:
       ---
       Change the qualifier name from 'coresum' to 'percore' according to
       comments from Jiri and Andi.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-2-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Fix sample timestamp wrt non-taken branches · 1b6599a9
      Adrian Hunter committed
      The sample timestamp is updated to ensure that the timestamp represents
      the time of the sample and not a branch that the decoder is still
      walking towards. The sample timestamp is updated when the decoder
      returns, but the decoder does not return for non-taken branches. Update
      the sample timestamp then also.
      
      Note that commit 3f04d98e ("perf intel-pt: Improve sample
      timestamp") was also a stable fix and appears, for example, in v4.4
      stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample
      timestamp").
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Fixes: 3f04d98e ("perf intel-pt: Improve sample timestamp")
      Link: http://lkml.kernel.org/r/20190510124143.27054-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Fix improved sample timestamp · 61b6e08d
      Adrian Hunter committed
      The decoder uses its current timestamp in samples. Usually that is a
      timestamp that has already passed, but in some cases it is a timestamp
      for a branch that the decoder is walking towards, and consequently
      hasn't reached.
      
      The intel_pt_sample_time() function decides which is which, but was not
      handling TNT packets exactly correctly.
      
      In the case of TNT, the timestamp applies to the first branch, so the
      decoder must first walk to that branch.
      
      That means intel_pt_sample_time() should return true for TNT, and this
      patch makes that change. However, if the first branch is a non-taken
      branch (i.e. a 'N'), then intel_pt_sample_time() needs to return false
      for subsequent taken branches in the same TNT packet.
      
      To handle that, introduce a new state INTEL_PT_STATE_TNT_CONT to
      distinguish the cases.
      
      Note that commit 3f04d98e ("perf intel-pt: Improve sample
      timestamp") was also a stable fix and appears, for example, in v4.4
      stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample
      timestamp").
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Fixes: 3f04d98e ("perf intel-pt: Improve sample timestamp")
      Link: http://lkml.kernel.org/r/20190510124143.27054-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Fix instructions sampling rate · 7ba8fa20
      Adrian Hunter committed
      The timestamp used to determine if an instruction sample is made is an
      estimate based on the number of instructions since the last known
      timestamp. A consequence is that it might go backwards, which results in
      extra samples. Change it so that a sample is only made when the
      timestamp goes forwards.
      
      Note this does not affect a sampling period of 0 or sampling periods
      specified as a count of instructions.
      
      Example:
      
       Before:
      
       $ perf script --itrace=i10us
       ls 13812 [003] 2167315.222583:       3270 instructions:u:      7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:      30902 instructions:u:      7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         10 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          8 instructions:u:      7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         14 instructions:u:      7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          6 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         14 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          4 instructions:u:      7fac71e2dab2 _dl_cache_libcmp+0xd2 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222728:      16423 instructions:u:      7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222734:      12731 instructions:u:      7fac71e27938 _dl_name_match_p+0x68 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ...
      
       After:
      
       $ perf script --itrace=i10us
       ls 13812 [003] 2167315.222583:       3270 instructions:u:      7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:      30902 instructions:u:      7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222728:      16479 instructions:u:      7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
       ...
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: f4aa0819 ("perf tools: Add Intel PT decoder")
      Link: http://lkml.kernel.org/r/20190510124143.27054-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7ba8fa20
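The effect of commit 7ba8fa20 can be illustrated with a small model. This is only a sketch, not the decoder's code: it assumes a sample is made whenever the estimated timestamp moves into a different period-sized bucket, and the function name `count_samples` and the bucket scheme are hypothetical.

```python
def count_samples(estimated_timestamps, period, forwards_only):
    """Count samples made for a sequence of estimated timestamps.

    Simplified model (an assumption, not the Intel PT decoder itself):
    a timestamp is mapped to a bucket of width `period`, and a sample is
    made when the bucket changes. With forwards_only=True a sample is made
    only when the bucket increases, which mirrors the fix described above.
    """
    samples = 0
    last_bucket = None
    for ts in estimated_timestamps:
        bucket = ts // period
        if last_bucket is None:
            last_bucket = bucket
            continue
        if forwards_only:
            changed = bucket > last_bucket
        else:
            changed = bucket != last_bucket
        if changed:
            samples += 1
            last_bucket = bucket
    return samples


# Estimates that jitter backwards across a period boundary.
estimates = [5, 12, 9, 11, 8, 13, 25]
print(count_samples(estimates, 10, forwards_only=False))
print(count_samples(estimates, 10, forwards_only=True))
```

In this model the backward jitter produces extra samples only when any bucket change triggers a sample, which matches the burst of tiny samples in the "Before" output.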
    • K
      perf parse-regs: Add generic support for arch__intr/user_reg_mask() · af785e75
      Kan Liang committed
      The register masks for use with intr and user sampling may differ on some
      platforms, e.g. Icelake.
      
      Add weak functions arch__intr_reg_mask() and arch__user_reg_mask() to
      return intr and user register mask respectively.
      
      Check mask before printing or comparing the register name.
      
      Generic code always returns PERF_REGS_MASK. No functional change.
      Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1557865174-56264-2-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      af785e75
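The idea in commit af785e75 of checking the mask before comparing register names can be sketched as follows. This is a hedged illustration in Python rather than the C code in perf: the register table, the bit positions, and the names `generic_intr_reg_mask`, `generic_user_reg_mask` and `parse_regs` here are all hypothetical stand-ins.

```python
# Hypothetical register table: (name, bit in the sample register mask).
SAMPLE_REGS = [("AX", 1 << 0), ("BX", 1 << 1), ("XMM0", 1 << 2)]

# Hypothetical generic mask: only AX and BX are available by default.
PERF_REGS_MASK = (1 << 0) | (1 << 1)


def generic_intr_reg_mask():
    """Default (overridable) mask for interrupt-time register sampling."""
    return PERF_REGS_MASK


def generic_user_reg_mask():
    """Default (overridable) mask for user-space register sampling."""
    return PERF_REGS_MASK


def parse_regs(spec, mask):
    """Turn a comma-separated register list into a bitmask.

    A register whose bit is not set in `mask` is skipped before its name
    is compared, so it is reported as unknown on that platform.
    """
    selected = 0
    for raw in spec.split(","):
        name = raw.strip()
        for reg_name, bit in SAMPLE_REGS:
            if not (mask & bit):
                continue  # check the mask before comparing the name
            if reg_name.lower() == name.lower():
                selected |= bit
                break
        else:
            raise ValueError(f'unknown register "{name}"')
    return selected


print(bin(parse_regs("ax,bx", generic_intr_reg_mask())))
```

A platform-specific override would simply return a wider mask, for example one that also includes the hypothetical XMM0 bit for interrupt sampling while leaving the user mask unchanged.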
  4. 16 May, 2019 14 commits
    • K
      perf parse-regs: Split parse_regs · aeea9062
      Kan Liang committed
      The available registers for --intr-regs and --user-regs may be different,
      e.g. XMM registers.
      
      Split parse_regs into two dedicated functions for --intr-regs and
      --user-regs respectively.
      
      Modify the warning message: "--user-regs=?" should be used to show the
      available registers for --user-regs.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1557865174-56264-1-git-send-email-kan.liang@linux.intel.com
      [ Changed docs as suggested by Ravi and agreed by Kan ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      aeea9062
    • A
      perf report: Implement perf.data record decompression · cb62c6f1
      Alexey Budankov committed
      zstd_init(, comp_level = 0) initializes only the decompression part of
      the API, which now consists of the zstd_decompress_stream() function.
      
      The perf.data PERF_RECORD_COMPRESSED records are decompressed using
      zstd_decompress_stream() function into a linked list of mmaped memory
      regions of mmap_comp_len size (struct decomp).
      
      After decompression of one COMPRESSED record its content is iterated and
      fetched for usual processing. The mmaped memory regions with
      decompressed events are kept in the linked list till the tool process
      termination.
      
      When dumping raw records (e.g., perf report -D --header) file offsets of
      events from compressed records are printed as zero.
      
      Committer notes:
      
      Since we now have support for processing PERF_RECORD_COMPRESSED, we see
      none of them in raw form, as we did in the previous patch's committer
      notes: they were decompressed into the usual
      PERF_RECORD_{FORK,MMAP,COMM,etc} records. We only see the stats for those
      PERF_RECORD_COMPRESSED events, and since I used the file generated in the
      committer notes for the previous patch, there they are, 2 compressed
      records:
      
        $ perf report --header-only | grep cmdline
        # cmdline : /home/acme/bin/perf record -z2 sleep 1
        $ perf report -D | grep COMPRESS
              COMPRESSED events:          2
              COMPRESSED events:          0
        $ perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 15  of event 'cycles:u'
        # Event count (approx.): 962227
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ...........................
        #
            46.99%  sleep    libc-2.28.so      [.] _dl_addr
            29.24%  sleep    [unknown]         [k] 0xffffffffaea00a67
            16.45%  sleep    libc-2.28.so      [.] __GI__IO_un_link.part.1
             5.92%  sleep    ld-2.28.so        [.] _dl_setup_hash
             1.40%  sleep    libc-2.28.so      [.] __nanosleep
             0.00%  sleep    [unknown]         [k] 0xffffffffaea00163
      
        #
        # (Tip: To see callchains in a more compact form: perf report -g folded)
        #
        $
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/304b0a59-942c-3fe1-da02-aa749f87108b@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cb62c6f1
    • A
      perf report: Add stub processing of compressed events for -D · 61a7773c
      Alexey Budankov committed
      Committer note:
      
      Split from a larger patch, this only dumps PERF_RECORD_COMPRESSED as
      unhandled, so that when we introduce the record part in the next patch,
      we don't see unhandled events when using 'perf report -D'.
      
      Changed it so that we dump the event if the handler is just a stub, i.e.
      for the case where we don't have ZSTD linked but we're processing a
      perf.data file generated by a tool with that linked.
      
      Also when failing to decompress we can't just dump the uncompressed
      event and return 0, we have to propagate the error.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/304b0a59-942c-3fe1-da02-aa749f87108b@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      61a7773c
    • A
      perf record: Implement compression for AIO trace streaming · ef781128
      Alexey Budankov committed
      Compression is implemented using the functions from zstd.c. The
      compression uses the mmap->aio.data[] buffers as the memory to operate
      on. If the Zstd streaming compression API fails for some reason, the data
      to be compressed are just copied into the memory buffers using plain
      memcpy().
      
      A compressed trace frame consists of an array of PERF_RECORD_COMPRESSED
      records. Each element of the array is no longer than PERF_SAMPLE_MAX_SIZE
      and consists of a perf_event_header followed by the compressed chunk
      that is decompressed at the loading stage.
      
      perf_mmap__aio_push() is replaced by perf_mmap__push(), which is now used
      in both the serial and AIO streaming cases. perf_mmap__push() is extended
      with positive return values to signify the absence of data ready for
      processing.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/77db2b2c-5d03-dbb0-aeac-c4dd92129ab9@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ef781128
    • A
      perf record: Implement compression for serial trace streaming · 5d7f4116
      Alexey Budankov committed
      Compression is implemented using the functions from zstd.c. The
      compression uses the mmap->data buffer as the memory to operate on.
      
      If the Zstd streaming compression API fails for some reason, the data to
      be compressed are just copied into the memory buffers using plain
      memcpy().
      
      A compressed trace frame consists of an array of PERF_RECORD_COMPRESSED
      records. Each element of the array is no longer than
      PERF_SAMPLE_MAX_SIZE and consists of a perf_event_header followed by the
      compressed chunk that is decompressed at the loading stage.
      
      Committer notes:
      
      Undo some unnecessary line breaks, remove some unnecessary () around
      zstd_data to then just get its address, and fix conflicts with
      BPF_PROG_INFO/BPF_BTF patchkits.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/744df43f-3932-2594-ddef-1e99a3cad03a@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5d7f4116
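The record framing described in commit 5d7f4116 (a header followed by a compressed chunk, each record bounded in size, with a plain copy as the fallback) might look roughly like the sketch below. Several things here are assumptions and not taken from the commit: zlib stands in for Zstd because it ships with Python, the header layout (`type`, `misc`, `size`) and the numeric values of `PERF_RECORD_COMPRESSED` and `PERF_SAMPLE_MAX_SIZE` are assumed, and `to_records`/`from_records` are hypothetical names.

```python
import os
import struct
import zlib

PERF_RECORD_COMPRESSED = 81       # assumed record type value
PERF_SAMPLE_MAX_SIZE = 1 << 16    # assumed upper bound on a record
HEADER = struct.Struct("<IHH")    # assumed header layout: type, misc, size
# Leave room for the header and keep the total within the 16-bit size field.
MAX_PAYLOAD = PERF_SAMPLE_MAX_SIZE - HEADER.size - 1


def to_records(data, compress=zlib.compress):
    """Compress `data` and frame it as a sequence of bounded records.

    If compression fails, fall back to returning the data unchanged,
    analogous to the plain memcpy() fallback in the commit message.
    """
    try:
        stream = compress(data)
    except Exception:
        return bytes(data)
    out = bytearray()
    for off in range(0, len(stream), MAX_PAYLOAD):
        chunk = stream[off:off + MAX_PAYLOAD]
        out += HEADER.pack(PERF_RECORD_COMPRESSED, 0, HEADER.size + len(chunk))
        out += chunk
    return bytes(out)


def from_records(buf):
    """Walk the records and stream each payload through the decompressor."""
    decomp = zlib.decompressobj()
    out = bytearray()
    off = 0
    while off < len(buf):
        rtype, _misc, size = HEADER.unpack_from(buf, off)
        if rtype != PERF_RECORD_COMPRESSED:
            raise ValueError("unexpected record type")
        out += decomp.decompress(buf[off + HEADER.size:off + size])
        off += size
    out += decomp.flush()
    return bytes(out)


# Round trip: incompressible input forces several records.
payload = os.urandom(200_000)
assert from_records(to_records(payload)) == payload
```

Because the compressed stream is cut at arbitrary byte offsets, the reader has to use a streaming decompressor that carries state across records, which is why the sketch uses `zlib.decompressobj()` and not one call per record.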
    • A
      perf tools: Introduce Zstd streaming based compression API · f24c1d75
      Alexey Budankov committed
      The implemented functions are based on the Zstd streaming compression
      API.
      
      The functions are used at runtime to compress data that comes from the
      mmaped kernel buffer. zstd_init() and zstd_fini() are used for
      initialization and finalization, to allocate and deallocate internal zstd
      objects. zstd_compress_stream_to_records() is used to convert parts of
      the mmaped kernel buffer into an array of PERF_RECORD_COMPRESSED records.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/18bf36f3-b85a-1fe2-dd83-10e0c6069568@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f24c1d75
    • A
      perf mmap: Implement dedicated memory buffer for data compression · 51255a8a
      Alexey Budankov committed
      Implemented an mmap data buffer that is used as the memory to operate
      on when compressing data in the serial trace streaming case.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/49b31321-0f70-392b-9a4f-649d3affe090@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      51255a8a
    • A
      perf record: Implement COMPRESSED event record and its attributes · 42e1fd80
      Alexey Budankov committed
      Implemented the PERF_RECORD_COMPRESSED event, related data types, header
      feature and functions to write, read and print feature attributes from
      the trace header section.
      
      comp_mmap_len preserves the size of the mmaped kernel buffer that was
      used during collection. The comp_mmap_len size is used at the loading
      stage as the size of the decomp buffer for decompression of COMPRESSED
      event content.
      
      Committer notes:
      
      Fixed up conflict with BPF_PROG_INFO and BPF_BTF header features.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/ebbaf031-8dda-3864-ebc6-7922d43ee515@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      42e1fd80
    • A
      perf session: Define 'bytes_transferred' and 'bytes_compressed' metrics · d3c8c08e
      Alexey Budankov committed
      Define 'bytes_transferred' and 'bytes_compressed' metrics to calculate
      the compression ratio at the end of data collection:
      
      	compression ratio = bytes_transferred / bytes_compressed
      
      The 'bytes_transferred' metric accumulates the number of bytes that were
      extracted from the mmaped kernel buffers for compression, while
      'bytes_compressed' accumulates the number of bytes that were produced
      by applying compression.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1d4bf499-cb03-26dc-6fc6-f14fec7622ce@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d3c8c08e
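As a worked example of the ratio defined in commit d3c8c08e, the sketch below accumulates the two counters over several pushes and divides them at the end. The class and method names are hypothetical, and the byte counts are made-up inputs chosen only to make the arithmetic easy to follow.

```python
class CompressionStats:
    """Accumulates the two metrics named in the commit message."""

    def __init__(self):
        self.bytes_transferred = 0  # bytes taken from the kernel buffers
        self.bytes_compressed = 0   # bytes left after compression

    def account(self, transferred, compressed):
        self.bytes_transferred += transferred
        self.bytes_compressed += compressed

    def ratio(self):
        # No ratio is defined if nothing was compressed.
        if self.bytes_compressed == 0:
            return None
        return self.bytes_transferred / self.bytes_compressed


stats = CompressionStats()
stats.account(transferred=3_000_000, compressed=800_000)
stats.account(transferred=1_000_000, compressed=200_000)
print(stats.ratio())  # 4_000_000 / 1_000_000
```

A ratio above 1 means the compressed output is smaller than the data taken from the kernel buffers.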
    • D
      perf machine: Null-terminate version char array upon fgets(/proc/version) error · 30ba5b0e
      Donald Yandt committed
      If fgets() fails due to any other error besides end-of-file, the version
      char array may not even be null-terminated.
      Signed-off-by: Donald Yandt <donald.yandt@gmail.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Fixes: a1645ce1 ("perf: 'perf kvm' tool for monitoring guest performance from host")
      Link: http://lkml.kernel.org/r/20190514110100.22019-1-donald.yandt@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      30ba5b0e
    • A
      perf tools x86: Add support for recording and printing XMM registers · ca138a7a
      Andi Kleen committed
      Icelake and later platforms support collecting XMM registers with PEBS
      events.
      
      Add support for 'perf script' to dump them, and support for the register
      parser in 'perf record -I=' ... to configure them.
      
      For now they are just printed in hex; we could potentially add other
      formats later.
      
      Committer testing:
      
      Before:
      
        # perf record -IXMM0
        Warning:
        unknown register XMM0, check man page or run 'perf record -I?'
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
        #
        # perf record -I?
        available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
        #
      
      After:
      
        # perf record -IXMM0
        Error:
        The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles).
        /bin/dmesg | grep -i perf may provide additional information.
      
        #
        # perf record -I?
        available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9 XMM10 XMM11 XMM12 XMM13 XMM14 XMM15
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -I, --intr-regs[=<any register>]
                                  sample selected machine registers on interrupt, use -I ? to list register names
        #
      
      More work is needed so that, when faced with such an error, perf warns
      the user that the register is not available on the running platform.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190506141926.13659-1-kan.liang@linux.intel.com
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ca138a7a
    • A
      perf parse-regs: Improve error output when faced with unknown register name · 4c1cf203
      Arnaldo Carvalho de Melo committed
      Add quotes around the register name and suggest using 'perf record -I?'
      to get the list of available registers.
      
      Before:
      
        # perf record -Idi,xmm20,xmm1
        Warning:
        unknown register xmm20, check man page
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -I, --intr-regs[=<any register>]
                                  sample selected machine registers on interrupt, use -I ? to list register names
        #
      
      After:
      
        # perf record -Idi,xmm20,xmm1
        Warning:
        unknown register "xmm20", check man page or run "perf record -I?"
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -I, --intr-regs[=<any register>]
                                  sample selected machine registers on interrupt, use -I ? to list register names
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lkml.kernel.org/n/tip-9a9hyuum8c0oggg86xd3sxc5@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4c1cf203
    • J
      perf tools: Speed up report for perf compiled with libunwind · 382619c0
      Jiri Olsa committed
      When compiled with libunwind, perf does some preparatory work when
      processing side-band events. This is not needed when report doesn't
      actually unwind DWARF callchains, so it's disabled with the
      dwarf_callchain_users bool.
      
      However, we can move that check to a higher level and skip more
      unneeded code during normal report processing, giving the following
      speedup on a kernel build profile:
      
      Before:
      
        $ perf record make -j40
        ...
        $ ll ../../perf.data
        -rw-------. 1 jolsa jolsa 461783932 Apr 26 09:11 perf.data
        $ perf stat -e cycles:u,instructions:u perf report -i perf.data > out
      
         Performance counter stats for 'perf report -i perf.data':
      
          78,669,920,155      cycles:u
          99,076,431,951      instructions:u            #    1.26  insn per cycle
      
            55.382823668 seconds time elapsed
      
            27.512341000 seconds user
            27.712871000 seconds sys
      
      After:
      
        $ perf stat -e cycles:u,instructions:u perf report -i perf.data > out
      
         Performance counter stats for 'perf report -i perf.data':
      
          59,626,798,904      cycles:u
          88,583,575,849      instructions:u            #    1.49  insn per cycle
      
            21.296935559 seconds time elapsed
      
            20.010191000 seconds user
             1.202935000 seconds sys
      
      The speedup is larger for profiles with many side-band events,
      because these trigger the libunwind preparatory code.
      
      This does not apply to perf compiled with libdw for DWARF unwinding,
      only to builds with libunwind.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190426073804.17238-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      382619c0
    • J
      perf annotate: Remove hist__account_cycles() from callback · bdd1666b
      Jin Yao committed
      The hist__account_cycles() function is executed when the
      hist_iter__branch_callback() is called.
      
      But that does not appear to be necessary: hist__account_cycles()
      already walks all branch entries.
      
      This patch moves hist__account_cycles() out of the callback, so the data
      processing is now much faster than before.
      
      The previous code had an issue: ch[offset].num++ (in
      __symbol__account_cycles) was executed repeatedly, since
      hist__account_cycles was called in each hist_iter__branch_callback, so
      the count in ch[offset].num was not correct (too big).
      
      With this patch, the issue is fixed, and we no longer need the
      "ch->reset >= ch->num / 2" check for too many overlaps (in
      annotation__count_and_fill); keeping it would hide some data.
      
      Now, we can try, for example:
      
        perf record -b ...
        perf annotate or perf report -s symbol
      
      The output should be the same before and after.
      
       v3:
       ---
       Fix the crash in stdio mode.
       Like the previous code, it needs to check ui__has_annotation()
       before calling hist__account_cycles().
      
       v2:
       ---
       1. Cover the similar perf report
       2. Remove the checking code "ch->reset >= ch->num / 2"
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1552684577-29041-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bdd1666b
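The over-counting described in commit bdd1666b can be reproduced with a toy model. This is a sketch under assumptions, not perf's code: the branch stack is reduced to a list of offsets, and `account_cycles`, `process_before` and `process_after` are hypothetical names standing in for the accounting function and the two call patterns.

```python
def account_cycles(counts, branch_entries):
    """Walk all branch entries once, incrementing a per-offset counter."""
    for offset in branch_entries:
        counts[offset] = counts.get(offset, 0) + 1


def process_before(branch_entries):
    """Old pattern: accounting is called from the per-entry callback."""
    counts = {}
    for _entry in branch_entries:
        # The callback runs once per entry, yet the accounting function
        # walks every entry again, so each counter is inflated.
        account_cycles(counts, branch_entries)
    return counts


def process_after(branch_entries):
    """New pattern: accounting is called once per sample, outside the loop."""
    counts = {}
    for _entry in branch_entries:
        pass  # per-entry callback work only
    account_cycles(counts, branch_entries)
    return counts


entries = [0x10, 0x20, 0x10]
print(process_before(entries))
print(process_after(entries))
```

In this model every counter is inflated by a factor equal to the number of branch entries in the sample, and the old pattern does quadratic work per sample, which is consistent with both the wrong counts and the speedup the commit message reports.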