  1. 16 Apr, 2020 13 commits
    • perf cs-etm: Implement ->evsel_is_auxtrace() callback · a58ab57c
      Adrian Hunter committed
      Implement ->evsel_is_auxtrace() callback.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-6-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm-spe: Implement ->evsel_is_auxtrace() callback · 508c71e3
      Adrian Hunter committed
      Implement ->evsel_is_auxtrace() callback.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-5-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-bts: Implement ->evsel_is_auxtrace() callback · 966246f5
      Adrian Hunter committed
      Implement ->evsel_is_auxtrace() callback.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Implement ->evsel_is_auxtrace() callback · 6b52bb07
      Adrian Hunter committed
      Implement ->evsel_is_auxtrace() callback.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf auxtrace: Add ->evsel_is_auxtrace() callback · 853f37d7
      Adrian Hunter committed
      Add ->evsel_is_auxtrace() callback to identify if a selected event
      is an AUX area event.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metrictroup: Split the metricgroup__add_metric function · 47352aba
      Kajol Jain committed
      This patch refactors the metricgroup__add_metric function, moving part of
      it into a new function, metricgroup__add_metric_param.  No logic change.
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-4-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf expr: Add expr_scanner_ctx object · 871f9f59
      Jiri Olsa committed
      Add the expr_scanner_ctx object to hold user data for the expr scanner.
      Currently it holds only start_token; Kajol Jain will use it to hold the
      24x7 runtime param.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-3-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf expr: Add expr_ prefix for parse_ctx and parse_id · aecce63e
      Jiri Olsa committed
      Add the expr_ prefix to parse_ctx and parse_id, to straighten out the
      expr* namespace.

      There's no functional change.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-2-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf synthetic-events: save 4kb from 2 stack frames · 04ed4ccb
      Ian Rogers committed
      Reuse an existing char buffer to avoid two PATH_MAX sized char buffers.
      
      Reduces stack frame sizes by 4kb.
      
      perf_event__synthesize_mmap_events before 'sub $0x45b8,%rsp' after
      'sub $0x35b8,%rsp'.
      
      perf_event__get_comm_ids before 'sub $0x2028,%rsp' after
      'sub $0x1028,%rsp'.
      
      The performance impact of this change is negligible.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200402154357.107873-4-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
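The stack-frame figures quoted in the commit message above can be checked with a little arithmetic. The following is only an illustrative sketch in Python (the names FRAMES and savings are made up for this example, not taken from perf): it subtracts the 'sub $imm,%rsp' operands and confirms that each of the two frames shrinks by 0x1000 bytes, which is one PATH_MAX-sized buffer assuming the usual Linux value of 4096.

```python
# Frame sizes copied from the 'sub $imm,%rsp' instructions quoted in the
# commit message: (before, after) for each function.
FRAMES = {
    "perf_event__synthesize_mmap_events": (0x45b8, 0x35b8),
    "perf_event__get_comm_ids": (0x2028, 0x1028),
}

PATH_MAX = 4096  # assumption: the usual Linux value of PATH_MAX


def savings(frames):
    """Return the bytes saved per function as {name: before - after}."""
    return {name: before - after for name, (before, after) in frames.items()}


if __name__ == "__main__":
    for name, saved in savings(FRAMES).items():
        print(f"{name}: {saved} bytes saved ({saved // PATH_MAX} x PATH_MAX)")
```

Each frame drops by exactly one buffer, which matches the "two PATH_MAX sized char buffers" wording of the commit.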
    • perf tools: Support CAP_PERFMON capability · 6b3e0e2e
      Alexey Budankov committed
      Extend error messages to mention the CAP_PERFMON capability as a
      substitute for the CAP_SYS_ADMIN capability in secure system performance
      monitoring and observability operations. Make
      perf_event_paranoid_check() and __cmd_ftrace() aware of the CAP_PERFMON
      capability.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required).

      For backward compatibility reasons, access to the perf_events subsystem
      remains open to CAP_SYS_ADMIN privileged processes, but CAP_SYS_ADMIN usage
      for secure perf_events monitoring is discouraged in favor of the
      CAP_PERFMON capability.
      
      Committer testing:
      
      Using a libcap with this patch:
      
        diff --git a/libcap/include/uapi/linux/capability.h b/libcap/include/uapi/linux/capability.h
        index 78b2fd4c8a95..89b5b0279b60 100644
        --- a/libcap/include/uapi/linux/capability.h
        +++ b/libcap/include/uapi/linux/capability.h
        @@ -366,8 +366,9 @@ struct vfs_ns_cap_data {
      
         #define CAP_AUDIT_READ       37
      
        +#define CAP_PERFMON	     38
      
        -#define CAP_LAST_CAP         CAP_AUDIT_READ
        +#define CAP_LAST_CAP         CAP_PERFMON
      
         #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
      
      Note that using '38' in place of 'cap_perfmon' works to some degree with
      an old libcap; it is only when cap_get_flag() is called, and libcap
      performs an error check based on the maximum capability value it knows
      about, that it fails.

      This makes determining the default for perf_event_attr.exclude_kernel
      fail, as perf can't determine whether CAP_PERFMON is in place.
      
      Using 'perf top -e cycles' avoids the default check and sets
      perf_event_attr.exclude_kernel to 1.
      
      As root, with a libcap supporting CAP_PERFMON:
      
        # groupadd perf_users
        # adduser perf -g perf_users
        # mkdir ~perf/bin
        # cp ~acme/bin/perf ~perf/bin/
        # chgrp perf_users ~perf/bin/perf
        # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" ~perf/bin/perf
        # getcap ~perf/bin/perf
        /home/perf/bin/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
        # ls -la ~perf/bin/perf
        -rwxr-xr-x. 1 root perf_users 16968552 Apr  9 13:10 /home/perf/bin/perf
      
      As the 'perf' user in the 'perf_users' group:
      
        $ perf top -a --stdio
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $
      
      Either add the cap_ipc_lock capability to the perf binary or reduce the
      ring buffer size to some smaller value:
      
        $ perf top -m10 -a --stdio
        rounding mmap pages size to 64K (16 pages)
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $ perf top -m4 -a --stdio
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $ perf top -m2 -a --stdio
         PerfTop: 762 irqs/sec  kernel:49.7%  exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 4 CPUs)
        ------------------------------------------------------------------------------------------------------
      
           9.83%  perf                [.] __symbols__insert
           8.58%  perf                [.] rb_next
           5.91%  [kernel]            [k] module_get_kallsym
           5.66%  [kernel]            [k] kallsyms_expand_symbol.constprop.0
           3.98%  libc-2.29.so        [.] __GI_____strtoull_l_internal
           3.66%  perf                [.] rb_insert_color
           2.34%  [kernel]            [k] vsnprintf
           2.30%  [kernel]            [k] string_nocheck
           2.16%  libc-2.29.so        [.] _IO_getdelim
           2.15%  [kernel]            [k] number
           2.13%  [kernel]            [k] format_decode
           1.58%  libc-2.29.so        [.] _IO_feof
           1.52%  libc-2.29.so        [.] __strcmp_avx2
           1.50%  perf                [.] rb_set_parent_color
           1.47%  libc-2.29.so        [.] __libc_calloc
           1.24%  [kernel]            [k] do_syscall_64
           1.17%  [kernel]            [k] __x86_indirect_thunk_rax
      
        $ perf record -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.552 MB perf.data (74 samples) ]
        $ perf evlist
        cycles
        $ perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        $ perf report | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 74  of event 'cycles'
        # Event count (approx.): 15694834
        #
        # Overhead  Command          Shared Object               Symbol
        # ........  ...............  ..........................  ......................................
        #
            19.62%  perf             [kernel.vmlinux]            [k] strnlen_user
            13.88%  swapper          [kernel.vmlinux]            [k] intel_idle
            13.83%  ksoftirqd/0      [kernel.vmlinux]            [k] pfifo_fast_dequeue
            13.51%  swapper          [kernel.vmlinux]            [k] kmem_cache_free
             6.31%  gnome-shell      [kernel.vmlinux]            [k] kmem_cache_free
             5.66%  kworker/u8:3+ix  [kernel.vmlinux]            [k] delay_tsc
             4.42%  perf             [kernel.vmlinux]            [k] __set_cpus_allowed_ptr
             3.45%  kworker/2:1-eve  [kernel.vmlinux]            [k] shmem_truncate_range
             2.29%  gnome-shell      libgobject-2.0.so.0.6000.7  [.] g_closure_ref
        $
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/a66d5648-2b8e-577e-e1f2-1d56c017ab5e@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
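To see whether a process actually holds CAP_PERFMON without depending on a libcap that knows the new name, one option is to read the capability bitmask the kernel exposes in /proc/<pid>/status. The sketch below is an illustration only, not how perf itself performs the check; the helper name has_capability is hypothetical, and the bit number 38 comes from the capability.h hunk quoted in the commit message above.

```python
CAP_PERFMON = 38  # bit number from the capability.h hunk in the commit message


def has_capability(status_text, cap, field="CapEff"):
    """Return True if bit `cap` is set in the `field` mask of a
    /proc/<pid>/status dump (the mask is printed as hexadecimal)."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            mask = int(line.split(":", 1)[1].strip(), 16)
            return bool(mask & (1 << cap))
    return False  # field not present in the text


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as status:
            print("CAP_PERFMON effective:",
                  has_capability(status.read(), CAP_PERFMON))
    except OSError as err:
        print("cannot read /proc/self/status:", err)
```

Because this reads the raw mask, it sidesteps the cap_get_flag() range check described in the committer notes, at the cost of being Linux-specific.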
    • perf annotate: Add basic support for bpf_image · 3c29d448
      Jiri Olsa committed
      Add the DSO_BINARY_TYPE__BPF_IMAGE dso binary type to recognize BPF
      images that carry trampoline or dispatcher.
      
      Upcoming patches will add support to read the image data, store it
      within the BPF feature in perf.data and display it for annotation
      purposes.
      
      Currently we only display the following message:
      
        # ./perf annotate bpf_trampoline_24456 --stdio
         Percent |      Source code & Disassembly of . for cycles (504  ...
        --------------------------------------------------------------- ...
                 :       to be implemented
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-16-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf machine: Set ksymbol dso as loaded on arrival · 7eddf7e7
      Jiri Olsa committed
      There's no special load action for ksymbol data in the map__load/dso__load
      path, where the kernel is getting loaded. For a bpf object, the load only
      gets confused with the kernel kallsyms/vmlinux load, which fails and could
      mess up the map.

      Disable any further load of the map for ksymbol-related dso/map.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-15-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Synthesize bpf_trampoline/dispatcher ksymbol event · 943930e4
      Jiri Olsa committed
      Synthesize bpf images (trampolines/dispatchers) on start, as ksymbol
      events from /proc/kallsyms. With this, perf can recognize samples from
      those images, and perf report and top show them correctly.

      The rest of the ksymbol handling is already in place from the bpf
      programs monitoring, so only the initial state was needed.
      
      perf report output:
      
        # Overhead  Command     Shared Object                  Symbol
      
          12.37%  test_progs  [kernel.vmlinux]                 [k] entry_SYSCALL_64
          11.80%  test_progs  [kernel.vmlinux]                 [k] syscall_return_via_sysret
           9.63%  test_progs  bpf_prog_bcf7977d3b93787c_prog2  [k] bpf_prog_bcf7977d3b93787c_prog2
           6.90%  test_progs  bpf_trampoline_24456             [k] bpf_trampoline_24456
           6.36%  test_progs  [kernel.vmlinux]                 [k] memcpy_erms
      
      Committer notes:
      
      Use scnprintf() instead of strncpy() to overcome this on fedora:32,
      rawhide and OpenMandriva Cooker:
      
          CC       /tmp/build/perf/util/bpf-event.o
        In file included from /usr/include/string.h:495,
                         from /git/linux/tools/lib/bpf/libbpf_common.h:12,
                         from /git/linux/tools/lib/bpf/bpf.h:31,
                         from util/bpf-event.c:4:
        In function 'strncpy',
            inlined from 'process_bpf_image' at util/bpf-event.c:323:2,
            inlined from 'kallsyms_process_symbol' at util/bpf-event.c:358:9:
        /usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation]
          106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-14-jolsa@kernel.org/
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
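As a rough illustration of the kind of scan the commit above describes, the Python sketch below picks BPF image symbols out of /proc/kallsyms-style text. It is not perf's C implementation. It assumes that such symbols are listed with a '[bpf]' module tag and that their names start with 'bpf_trampoline_' or 'bpf_dispatcher_'; only the 'bpf_trampoline_24456' name appears in the commit message, the dispatcher prefix and the function name bpf_images are assumptions made for the example.

```python
# Assumed name prefixes for BPF images; only a bpf_trampoline_ example
# appears in the commit message, the dispatcher prefix is an assumption.
BPF_IMAGE_PREFIXES = ("bpf_trampoline_", "bpf_dispatcher_")


def bpf_images(kallsyms_text):
    """Return [(address, name)] for BPF trampoline/dispatcher symbols found
    in /proc/kallsyms-style text ('addr type name [module]' per line)."""
    found = []
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Core kernel symbols have three fields; module-like entries add a
        # fourth '[module]' field. Only '[bpf]' entries are of interest here.
        if len(fields) != 4 or fields[3] != "[bpf]":
            continue
        addr, _sym_type, name, _module = fields
        if name.startswith(BPF_IMAGE_PREFIXES):
            found.append((int(addr, 16), name))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/kallsyms") as kallsyms:
            for addr, name in bpf_images(kallsyms.read()):
                print(f"{addr:#x} {name}")
    except OSError as err:
        print("cannot read /proc/kallsyms:", err)
```

Each match would correspond to one synthesized ksymbol event at startup; unprivileged readers may see all addresses as zero.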
  2. 14 Apr, 2020 2 commits
    • perf stat: Fix no metric header if --per-socket and --metric-only set · 8358f698
      Jin Yao committed
      We received a report that no metric header was displayed if --per-socket
      and --metric-only were both set.

      That makes it hard for scripts to parse the perf-stat output. This patch
      fixes the issue.
      
      Before:
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket
        ^C
         Performance counter stats for 'system wide':
      
        S0        8                  2.6
      
               2.215270071 seconds time elapsed
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000
        #           time socket cpus
             1.000411692 S0        8                  2.2
             2.001547952 S0        8                  3.4
             3.002446511 S0        8                  3.4
             4.003346157 S0        8                  4.0
             5.004245736 S0        8                  0.3
      
      After:
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket
        ^C
         Performance counter stats for 'system wide':
      
                                     CPI
        S0        8                  2.1
      
               1.813579830 seconds time elapsed
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000
        #           time socket cpus                  CPI
             1.000415122 S0        8                  3.2
             2.001630051 S0        8                  2.9
             3.002612278 S0        8                  4.3
             4.003523594 S0        8                  3.0
             5.004504256 S0        8                  3.7
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200331180226.25915-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
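To make the scripting problem from the commit above concrete, here is a minimal Python sketch of a parser for the interval output shown in its Before and After listings. The function name parse_interval_output is hypothetical and the approach (pairing each row with the '#' header by position) is just one way a script might work. With the fixed output every value has a column name; with the old output the row has one more field than the header, so nothing can be labelled.

```python
def parse_interval_output(text):
    """Parse 'perf stat --metric-only --per-socket -I ...' style output into
    a list of dicts keyed by the column names from the '#' header line.
    Rows whose field count does not match the header are skipped."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#").split()
            continue
        if header is None:
            continue
        fields = line.split()
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


if __name__ == "__main__":
    fixed = """\
#           time socket cpus                  CPI
     1.000415122 S0        8                  3.2
     2.001630051 S0        8                  2.9
"""
    for row in parse_interval_output(fixed):
        print(row["time"], row["socket"], row["CPI"])
```

Fed the pre-fix output, the same function returns an empty list, which is the failure mode the report describes.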
    • perf python: Check if clang supports -fno-semantic-interposition · 9a00df31
      Arnaldo Carvalho de Melo committed
      The set of C compiler options used by distros to build python bindings
      may include options that are unknown to clang. We already check for a
      variety of such options; add -fno-semantic-interposition to that mix.
      
      This fixes the build on, among others, Manjaro Linux:
      
          GEN      /tmp/build/perf/python/perf.so
        clang-9: error: unknown argument: '-fno-semantic-interposition'
        error: command 'clang' failed with exit status 1
        make: Leaving directory '/git/perf/tools/perf'
      
        [perfbuilder@602aed1c266d ~]$ gcc -v
        Using built-in specs.
        COLLECT_GCC=gcc
        COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper
        Target: x86_64-pc-linux-gnu
        Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pkgversion='Arch Linux 9.3.0-1' --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto gdc_include_dir=/usr/include/dlang/gdc
        Thread model: posix
        gcc version 9.3.0 (Arch Linux 9.3.0-1)
        [perfbuilder@602aed1c266d ~]$
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
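The general idea of the commit above, dropping distro-supplied options that the compiler in use does not understand, can be sketched as follows. This is an assumption-laden illustration rather than the code in perf's setup.py: the function strip_unsupported and the is_supported callback are hypothetical names, and how the callback decides (for instance by invoking the compiler on a test file) is left out.

```python
def strip_unsupported(cflags, candidates, is_supported):
    """Return `cflags` without any option from `candidates` for which
    is_supported(option) is false. Options are compared as whole
    whitespace-separated tokens."""
    unsupported = {opt for opt in candidates if not is_supported(opt)}
    return " ".join(tok for tok in cflags.split() if tok not in unsupported)


if __name__ == "__main__":
    # Hypothetical stand-in for a real probe of the compiler.
    def clang_has_option(option):
        return option != "-fno-semantic-interposition"

    flags = "-O2 -fno-semantic-interposition -g"
    print(strip_unsupported(flags, ["-fno-semantic-interposition"],
                            clang_has_option))
```

Keeping the probe behind a callback means the list of options to test can grow without touching the filtering logic.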
  3. 03 Apr, 2020 10 commits
    • perf python: Fix clang detection to strip out options passed in $CC · 9ff76cea
      Arnaldo Carvalho de Melo committed
      The clang check in the python setup.py file expected $CC to be just the
      name of the compiler, not the compiler + options, i.e. all options were
      expected to be passed in $CFLAGS. This makes it fail on systems where CC
      is set to, e.g.:
      
       "aarch64-linaro-linux-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot"
      
      Like this:
      
        $ python3
        >>> from subprocess import Popen
        >>> a = Popen(["aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot", "-v"])
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
            restore_signals, start_new_session)
          File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
            raise child_exception_type(errno_num, err_msg, err_filename)
        FileNotFoundError: [Errno 2] No such file or directory: 'aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot': 'aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot'
        >>>
      
      Make it more robust, covering this case, by passing cc.split()[0] as the
      first arg to Popen().

      Fixes: a7ffd416 ("perf python: Fix clang detection when using CC=clang-version")
      Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Tested-by: Daniel Díaz <daniel.diaz@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ilie Halip <ilie.halip@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20200401124037.GA12534@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
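The fix described above comes down to taking the first whitespace-separated word of $CC as the program to execute. A minimal Python sketch of that step follows; the wrapper name compiler_executable is made up for illustration, while cc.split()[0] is the expression named in the commit message.

```python
def compiler_executable(cc):
    """Return the program name from a $CC value that may carry options,
    e.g. 'gcc --sysroot=/some/path' gives 'gcc'."""
    return cc.split()[0]


if __name__ == "__main__":
    cc = ("aarch64-linaro-linux-gcc "
          "--sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot")
    # This is the value that would be used as the first element of the
    # argument list given to subprocess.Popen, instead of the whole string.
    print(compiler_executable(cc))
```

Passing the whole string as one list element is what produced the FileNotFoundError in the traceback, since the list form of Popen treats each element as a single argument and does no word splitting.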
    • perf script report: Fix SEGFAULT when using DWARF mode · 1a4025f0
      Andreas Gerstmayr committed
      When running perf script report with a Python script and a callgraph in
      DWARF mode, intr_regs->regs can be 0, which crashes the regs_map
      function.

      Add a check for this condition (the same check as in builtin-script.c:595).
      Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
      Tested-by: Kim Phillips <kim.phillips@amd.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Link: http://lore.kernel.org/lkml/20200402125417.422232-1-agerstmayr@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf events parser: Add missing Intel CPU events to parser · 47327f56
      Adrian Hunter committed
      perf list expects CPU events to be parseable by name, e.g.
      
          # perf list | grep el-capacity-read
            el-capacity-read OR cpu/el-capacity-read/          [Kernel PMU event]
      
      But the event parser does not recognize them that way, e.g.
      
          # perf test -v "Parse event"
          <SNIP>
          running test 54 'cycles//u'
          running test 55 'cycles:k'
          running test 0 'cpu/config=10,config1,config2=3,period=1000/u'
          running test 1 'cpu/config=1,name=krava/u,cpu/config=2/u'
          running test 2 'cpu/config=1,call-graph=fp,time,period=100000/,cpu/config=2,call-graph=no,time=0,period=2000/'
          running test 3 'cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2/ukp'
          -> cpu/event=0,umask=0x11/
          -> cpu/event=0,umask=0x13/
          -> cpu/event=0x54,umask=0x1/
          failed to parse event 'el-capacity-read:u,cpu/event=el-capacity-read/u', err 1, str 'parser error'
          event syntax error: 'el-capacity-read:u,cpu/event=el-capacity-read/u'
                                 \___ parser error test child finished with 1
          ---- end ----
          Parse event definition strings: FAILED!
      
      This happens because the parser splits names by '-' in order to deal
      with cache events. For example 'L1-dcache' is a token in
      parse-events.l which is matched to 'L1-dcache-load-miss' by the
      following rule:
      
          PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_event_config
      
      And so there is special handling for 2-part PMU names i.e.
      
          PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF sep_dc
      
      but no handling for 3-part names, which are instead added as tokens e.g.
      
          topdown-[a-z-]+
      
      While it would be possible to add a rule for 3-part names, that would
      not work if the first parts were also a valid PMU name e.g.
      'el-capacity-read' would be matched to 'el-capacity' before the parser
      reached the 3rd part.
      
      The parser would need significant change to rationalize all this, so
      instead fix for now by adding missing Intel CPU events with 3-part names
      to the event parser as tokens.
      
      Missing events were found by using:
      
          grep -r EVENT_ATTR_STR arch/x86/events/intel/core.c
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Link: http://lore.kernel.org/lkml/90c7ae07-c568-b6d3-f9c4-d0c1528a0610@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
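The difference between a name that is matched as one token and one that gets split at each '-' can be illustrated with the 'topdown-[a-z-]+' pattern quoted in the commit message above. The sketch below uses Python's re module as a stand-in for the flex scanner, which is an approximation: it only shows which names the pattern covers, not how parse-events.l or the grammar really behave. The event name 'topdown-total-slots' is an example chosen for illustration, and the helper names are hypothetical.

```python
import re

# Pattern copied from the commit message; Python's re is used here only as
# an approximation of the flex rule.
TOPDOWN_TOKEN = re.compile(r"topdown-[a-z-]+")


def is_single_token(name):
    """True if the whole event name is covered by the topdown token pattern."""
    return TOPDOWN_TOKEN.fullmatch(name) is not None


def hyphen_parts(name):
    """Split a name the way a '-' separated grammar rule would see it."""
    return name.split("-")


if __name__ == "__main__":
    for name in ("topdown-total-slots", "el-capacity-read"):
        print(name, is_single_token(name), hyphen_parts(name))
```

A name such as 'el-capacity-read' falls outside the pattern and breaks into three parts, which is the 3-part case the commit handles by adding the missing names as tokens.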
    • perf script: Allow --symbol to accept hexadecimal addresses · d2bedb78
      Stephane Eranian committed
      This patch extends the perf script --symbols option to filter on
      hexadecimal addresses in addition to symbol names. This makes it easier
      to handle cases where symbols are aliased.
      
      With this patch, it is possible to mix and match symbols and hexadecimal
      addresses using the --symbols option.
      
        $ perf script --symbols=noploop,0x4007a0
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325220802.15039-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d2bedb78
    • N
      perf record: Add --all-cgroups option · 8fb4b679
      Namhyung Kim authored
      The --all-cgroups option is to enable cgroup profiling support.  It
      tells kernel to record CGROUP events in the ring buffer so that perf
      report can identify task/cgroup association later.
      
        [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.042 MB perf.data (558 samples) ]
        [root@seventh ~]# perf report --stdio -s cgroup_id,cgroup,pid
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 558  of event 'cycles'
        # Event count (approx.): 458017341
        #
        # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
        # ........  .....................  ..........  ...............
        #
            33.15%  4/0xeffffffb           /sub           9615:looper0
            32.83%  4/0xf00002f5           /sub/cgrp2     9620:looper2
            32.79%  4/0xf00002f4           /sub/cgrp1     9619:looper1
             0.35%  4/0xf00002f5           /sub/cgrp2     9618:cgtest
             0.34%  4/0xf00002f4           /sub/cgrp1     9617:cgtest
             0.32%  4/0xeffffffb           /              9615:looper0
             0.11%  4/0xeffffffb           /sub           9617:cgtest
             0.10%  4/0xeffffffb           /sub           9618:cgtest
      
        #
        # (Tip: Sample related events with: perf record -e '{cycles,instructions}:S')
        #
        [root@seventh ~]#
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-8-namhyung@kernel.org
      Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org
      [ Extracted the HAVE_FILE_HANDLE from the followup patch ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8fb4b679
    • N
      perf record: Support synthesizing cgroup events · ab64069f
      Namhyung Kim authored
      Synthesize cgroup events by iterating cgroup filesystem directories.
      The cgroup event only saves the portion of cgroup path after the mount
      point and the cgroup id (which actually is a file handle).
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-7-namhyung@kernel.org
      Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org
      [ Extracted the HAVE_FILE_HANDLE from the followup patch, added missing __maybe_unused ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ab64069f
    • N
      perf report: Add 'cgroup' sort key · b629f3e9
      Namhyung Kim authored
      The cgroup sort key is to show cgroup membership of each task.
      Currently it shows full path in the cgroupfs (not relative to the root
      of cgroup namespace) since it'd be more intuitive IMHO.  Otherwise root
      cgroup in different namespaces will all show same name - "/".
      
      The cgroup sort key should come before cgroup_id otherwise
      sort_dimension__add() will match it to cgroup_id as it only matches with
      the given substring.
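
The ordering constraint can be illustrated with a small sketch. It assumes the lookup compares only the first `strlen(tok)` characters of each known key name, as the paragraph above describes; `first_prefix_match()` is a hypothetical stand-in, not the real sort_dimension__add().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical stand-in for a prefix-based key lookup: return the first
 * known name that the user-supplied token is a prefix of, or NULL.
 */
const char *first_prefix_match(const char *const *names, size_t n, const char *tok)
{
	size_t i;

	assert(names && tok);

	for (i = 0; i < n; i++) {
		/* Only strlen(tok) characters are compared, so "cgroup" also matches "cgroup_id". */
		if (!strncasecmp(tok, names[i], strlen(tok)))
			return names[i];
	}
	return NULL;
}
```

With the table ordered as { "cgroup", "cgroup_id" }, the token "cgroup" resolves to "cgroup". With the order reversed, the same token resolves to "cgroup_id", which is the mismatch the ordering avoids.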
      
      For example, it will look like the following.  Note that the record patch
      adding --all-cgroups will come later.
      
        $ perf record -a --namespace --all-cgroups  cgtest
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]
      
        $ perf report -s cgroup_id,cgroup,pid
        ...
        # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
        # ........  .....................  ..........  ...............
        #
            93.96%  0/0x0                  /                 0:swapper
             1.25%  3/0xeffffffb           /               278:looper0
             0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
             0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
             0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
             0.22%  3/0xeffffffb           /sub            278:looper0
             0.20%  3/0xeffffffb           /               280:cgtest
             0.15%  3/0xf0000163           /sub/cgrp3      285:looper3
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b629f3e9
    • N
      perf cgroup: Maintain cgroup hierarchy · d1277aa3
      Namhyung Kim authored
      Each cgroup is kept in the perf_env's cgroup_tree, sorted by cgroup
      id.  Hist entries that have a cgroup id can compare it directly, and
      later the id can be used to find the cgroup name using this tree.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d1277aa3
    • N
      perf tools: Basic support for CGROUP event · ba78c1c5
      Namhyung Kim authored
      Implement basic functionality to support cgroup tracking.  Each cgroup
      can be identified by inode number which can be read from userspace too.
      The actual cgroup processing will come in the later patch.
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      [ fix perf test failure on sampling parsing ]
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200325124536.2800725-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ba78c1c5
    • A
      perf python: Include rwsem.c in the python binding · 460c3ed9
      Arnaldo Carvalho de Melo authored
      We'll need it for the cgroup patches, and it's better to have it in a
      separate patch in case we need to later revert the cgroup patches.
      
      I.e. without this we have:
      
        [root@five ~]# perf test -v python
        19: 'import perf' in python                               :
        --- start ---
        test child forked, pid 148447
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        ImportError: /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so: undefined symbol: down_write
        test child finished with -1
        ---- end ----
        'import perf' in python: FAILED!
        [root@five ~]#
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200403123606.GC23243@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      460c3ed9
  4. 26 Mar, 2020 1 commit
  5. 24 Mar, 2020 10 commits
    • R
      perf dso: Fix dso comparison · 0d33b343
      Ravi Bangoria authored
      Perf gets dso details from two different sources: first, from buildid
      headers in perf.data, and second, from MMAP2 samples. A dso from the
      buildid header does not have dso_id detail, and a dso from MMAP2 samples
      does not have buildid information. If details of the same dso are
      present in both places, the filename is common.
      
      Previously, __dsos__findnew_link_by_longname_id() used to compare only
      long or short names, but Commit 0e3149f8 ("perf dso: Move dso_id
      from 'struct map' to 'struct dso'") also added a dso_id comparison.
      Because of that, perf now creates two different dso objects for the
      same file: one from the buildid header (with buildid but without dso_id)
      and a second from the MMAP2 sample (with dso_id but without buildid).
      
      This is causing issues with archive, buildid-list etc subcommands. Fix
      this by comparing dso_id only when it's present. And in case a dso is
      present in the 'dsos' list without dso_id, inject the dso_id detail as well.
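
The fix described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: the field layout of `struct dso_id` and the helper names `dso_id_is_empty()`, `dso_id_cmp()` and `dso_id_inject()` are chosen for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout; the field names are assumptions for this sketch. */
struct dso_id {
	uint32_t maj;
	uint32_t min;
	uint64_t ino;
	uint64_t ino_generation;
};

/* An id that is missing or all-zero carries no information to compare. */
static bool dso_id_is_empty(const struct dso_id *id)
{
	return !id || (!id->maj && !id->min && !id->ino && !id->ino_generation);
}

/* Returns 0 when the ids are equal or when either side has no id at all. */
int dso_id_cmp(const struct dso_id *a, const struct dso_id *b)
{
	if (dso_id_is_empty(a) || dso_id_is_empty(b))
		return 0;
	if (a->maj != b->maj)
		return a->maj < b->maj ? -1 : 1;
	if (a->min != b->min)
		return a->min < b->min ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	if (a->ino_generation != b->ino_generation)
		return a->ino_generation < b->ino_generation ? -1 : 1;
	return 0;
}

/* Fill in the id of an already known dso that was created without one. */
void dso_id_inject(struct dso_id *dst, const struct dso_id *src)
{
	assert(dst);
	if (dso_id_is_empty(dst) && !dso_id_is_empty(src))
		*dst = *src;
}
```

Treating an empty id as a wildcard lets the name comparison decide, so the dso created from the buildid header and the one seen later in an MMAP2 sample resolve to the same object.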
      
      Before:
      
        $ sudo ./perf buildid-list -H
        0000000000000000000000000000000000000000 /usr/bin/ls
        0000000000000000000000000000000000000000 /usr/lib64/ld-2.30.so
        0000000000000000000000000000000000000000 /usr/lib64/libc-2.30.so
      
        $ ./perf archive
        perf archive: no build-ids found
      
      After:
      
        $ ./perf buildid-list -H
        b6b1291d0cead046ed0fa5734037fa87a579adee /usr/bin/ls
        641f0c90cfa15779352f12c0ec3c7a2b2b6f41e8 /usr/lib64/ld-2.30.so
        675ace3ca07a0b863df01f461a7b0984c65c8b37 /usr/lib64/libc-2.30.so
      
        $ ./perf archive
        Now please run:
      
        $ tar xvf perf.data.tar.bz2 -C ~/.debug
      
        wherever you need to run 'perf report' on.
      
      Committer notes:
      
      Renamed is_empty_dso_id() to dso_id__empty() and inject_dso_id() to
      dso__inject_id() to keep namespacing consistent.
      
      Fixes: 0e3149f8 ("perf dso: Move dso_id from 'struct map' to 'struct dso'")
      Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20200324042424.68366-1-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0d33b343
    • C
      perf cpumap: Fix snprintf overflow check · d74b181a
      Christophe JAILLET authored
      'snprintf' returns the number of characters which would be generated for
      the given input.
      
      If the returned value is *greater than* or equal to the buffer size, it
      means that the output has been truncated.
      
      Fix the overflow test accordingly.
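
A minimal sketch of the corrected check. `format_node_path()` and its format string are hypothetical and only stand in for the kind of path formatting involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a sysfs-style path into buf and report
 * whether the whole string fitted.
 */
bool format_node_path(char *buf, size_t size, const char *mnt, int node)
{
	int ret;

	assert(buf && size > 0);

	ret = snprintf(buf, size, "%s/devices/system/node/node%d", mnt, node);

	/*
	 * ret is the length that would have been written, excluding the
	 * terminating NUL. ret == size therefore already means truncation,
	 * so the test must be '>=' rather than '>'.
	 */
	return ret >= 0 && (size_t)ret < size;
}
```

For example, "/sys/devices/system/node/node0" is 30 characters long, so it fits in a 31-byte buffer but is truncated in a 30-byte one, which a `ret > size` test would miss.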
      
      Fixes: 7780c25b ("perf tools: Allow ability to map cpus to nodes easily")
      Fixes: 92a7e127 ("perf cpumap: Add cpu__max_present_cpu()")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Suggested-by: David Laight <David.Laight@ACULAB.COM>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: He Zhe <zhe.he@windriver.com>
      Cc: Jan Stancek <jstancek@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-janitors@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20200324070319.10901-1-christophe.jaillet@wanadoo.fr
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d74b181a
    • J
      perf pmu: Make pmu_uncore_alias_match() public · 5b9a5000
      John Garry authored
      The perf pmu-events test will want to use pmu_uncore_alias_match(), so
      make it public.
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-7-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5b9a5000
    • J
      perf pmu: Add is_pmu_core() · d504fae9
      John Garry authored
      Add a function to decide whether a PMU is a core PMU.
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-6-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d504fae9
    • J
      perf pmu: Refactor pmu_add_cpu_aliases() · e45ad701
      John Garry authored
      Create pmu_add_cpu_aliases_map() from pmu_add_cpu_aliases(), so the caller
      can pass the map; the pmu-events test would use this since there would
      be no CPUID matching to a mapfile there.
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-4-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e45ad701
    • K
      perf metricgroup: Fix printing event names of metric group with multiple... · 58fc90fd
      Kajol Jain authored
      perf metricgroup: Fix printing event names of metric group with multiple events in case of overlapping events
      
      Commit f01642e4 ("perf metricgroup: Support multiple events for
      metricgroup") introduced support for multiple events in a metric group.
      But with the current upstream, metric event names are not printed
      properly when we run multiple metric groups with overlapping
      events.
      
      With the current upstream version, the issue with overlapping metric
      events is that we always start the comparison logic from the start of
      the event list.  So, events which already matched some metric group
      also take part in the comparison logic. Because of that, when we have
      overlapping events, we end up matching the current metric group's events
      with already matched ones.
      
      For example, on a skylake machine we have the metric events CoreIPC and
      Instructions. Both of them need the 'inst_retired.any' event value.  As
      the events in Instructions are a subset of the events in CoreIPC, they
      end up pointing to the same 'inst_retired.any' value.
      
      In skylake platform:
      
      command:# ./perf stat -M CoreIPC,Instructions  -C 0 sleep 1
      
       Performance counter stats for 'CPU(s) 0':
      
           1,254,992,790      inst_retired.any          # 1254992790.0
                                                          Instructions
                                                        #      1.3 CoreIPC
             977,172,805      cycles
           1,254,992,756      inst_retired.any
      
             1.000802596 seconds time elapsed
      
      command:# sudo ./perf stat -M UPI,IPC sleep 1
      
         Performance counter stats for 'sleep 1':
                 948,650      uops_retired.retire_slots
                 866,182      inst_retired.any          #      0.7 IPC
                 866,182      inst_retired.any
               1,175,671      cpu_clk_unhalted.thread
      
      The patch fixes the issue by adding a new bool pointer 'evlist_used' to
      keep track of events which already matched some group, by setting the
      corresponding entry to true.  So, we skip all used events in the list
      when we start the comparison logic.  The patch also changes the
      comparison logic: in case of a match miss, we discard the whole match
      and start again with the first event id in the metric event.
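
The matching strategy described above can be sketched as follows. This is a simplified illustration, not the code from the patch: events are plain strings, `find_metric_events()` is a hypothetical name, and `used` plays the role of 'evlist_used'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: find events matching ids[0..n_ids) in order, skipping
 * events already claimed by an earlier metric group. On success the indexes
 * are stored in out[] and the events are marked as used.
 */
int find_metric_events(const char *const *events, bool *used, size_t n_events,
		       const char *const *ids, size_t n_ids, size_t *out)
{
	size_t i, matched = 0;

	assert(events && used && ids && out && n_ids > 0);

	for (i = 0; i < n_events; i++) {
		if (used[i])
			continue;	/* already matched some other group */

		if (!strcmp(events[i], ids[matched])) {
			out[matched++] = i;
		} else {
			/* Match miss: discard the partial match, restart from the first id. */
			matched = 0;
			if (!strcmp(events[i], ids[0]))
				out[matched++] = i;
		}

		if (matched == n_ids)
			break;
	}

	if (matched != n_ids)
		return -1;

	for (i = 0; i < n_ids; i++)
		used[out[i]] = true;
	return 0;
}
```

With the event list { inst_retired.any, cycles, inst_retired.any }, a first lookup for CoreIPC claims events 0 and 1, so a later lookup for Instructions gets event 2 instead of reusing event 0.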
      
      With this patch:
      
      In skylake platform:
      
      command:# ./perf stat -M CoreIPC,Instructions  -C 0 sleep 1
      
       Performance counter stats for 'CPU(s) 0':
      
               3,348,415      inst_retired.any          #      0.3 CoreIPC
              11,779,026      cycles
               3,348,381      inst_retired.any          # 3348381.0
                                                          Instructions
      
             1.001649056 seconds time elapsed
      
      command:# ./perf stat -M UPI,IPC sleep 1
      
       Performance counter stats for 'sleep 1':
      
               1,023,148      uops_retired.retire_slots #      1.1 UPI
                 924,976      inst_retired.any
                 924,976      inst_retired.any          #      0.6 IPC
               1,489,414      cpu_clk_unhalted.thread
      
             1.003064672 seconds time elapsed
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200221101121.28920-1-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      58fc90fd
    • J
      perf stat: Align the output for interval aggregation mode · d13e9e41
      Jin Yao authored
      There is a slight misalignment in -A -I output.
      
      For example:
      
       # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
      
       #           time CPU                    counts unit events
            1.000440863 CPU0               1,068,388      cpu/event=cpu-cycles/
            1.000440863 CPU1                 875,954      cpu/event=cpu-cycles/
            1.000440863 CPU2               3,072,538      cpu/event=cpu-cycles/
            1.000440863 CPU3               4,026,870      cpu/event=cpu-cycles/
            1.000440863 CPU4               5,919,630      cpu/event=cpu-cycles/
            1.000440863 CPU5               2,714,260      cpu/event=cpu-cycles/
            1.000440863 CPU6               2,219,240      cpu/event=cpu-cycles/
            1.000440863 CPU7               1,299,232      cpu/event=cpu-cycles/
      
      The value of counts is not aligned with the column "counts" and
      the event name is not aligned with the column "events".
      
      With this patch, the output is,
      
       # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
      
       #           time CPU                    counts unit events
            1.000423009 CPU0                  997,421      cpu/event=cpu-cycles/
            1.000423009 CPU1                1,422,042      cpu/event=cpu-cycles/
            1.000423009 CPU2                  484,651      cpu/event=cpu-cycles/
            1.000423009 CPU3                  525,791      cpu/event=cpu-cycles/
            1.000423009 CPU4                1,370,100      cpu/event=cpu-cycles/
            1.000423009 CPU5                  442,072      cpu/event=cpu-cycles/
            1.000423009 CPU6                  205,643      cpu/event=cpu-cycles/
            1.000423009 CPU7                1,302,250      cpu/event=cpu-cycles/
      
      Now output is aligned.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200218071614.25736-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d13e9e41
    • J
      perf report: Support a new key to reload the browser · 5e3b810a
      Jin Yao authored
      Sometimes we may need to reload the browser to update the output since
      some options are changed.
      
      This patch creates a new key K_RELOAD. Once the __cmd_report() returns
      K_RELOAD, it would repeat the whole process, such as, read samples from
      data file, sort the data and display in the browser.
      
       v5:
       ---
       1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h.
       2. Skip setup_sorting() in repeat path if last key is K_RELOAD.
      
       v4:
       ---
       Need to quit in perf_evsel_menu__run if key is K_RELOAD.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5e3b810a
    • J
      perf report: Allow specifying event to be used as sort key in --group output · 429a5f9d
      Jin Yao authored
      When performing "perf report --group", it shows the event group
      information together. By default, the output is sorted by the first
      event in group.
      
      It would be nice for the user to be able to select any event for
      sorting. This patch introduces a new option "--group-sort-idx" to sort
      the output by the event at index n in the event group.
      
      For example,
      
      Before:
      
        # perf report --group --stdio
      
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
        # Event count (approx.): 6451235635
        #
        #                         Overhead  Command    Shared Object            Symbol
        # ................................  .........  .......................  ...................................
        #
            92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
             3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
             1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
             1.56%   0.01%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494ce
             1.56%   0.00%   0.00%   0.00%  mgen       [kernel.kallsyms]        [k] task_tick_fair
             0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
             0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
             0.00%   0.03%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] g_main_context_check
             0.00%   0.03%   0.00%   0.00%  swapper    [kernel.kallsyms]        [k] apic_timer_interrupt
             ...
      
      After:
      
        # perf report --group --stdio --group-sort-idx 3
      
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
        # Event count (approx.): 6451235635
        #
        #                         Overhead  Command    Shared Object            Symbol
        # ................................  .........  .......................  ...................................
        #
            92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
             0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
             3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
             0.00%   0.00%   0.00%   0.06%  swapper    [kernel.kallsyms]        [k] hrtimer_start_range_ns
             1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
             0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] update_curr
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] apic_timer_interrupt
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] native_apic_msr_eoi_write
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] __update_load_avg_se
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] scheduler_tick
      
      Now the output is sorted by the fourth event in group.
      
       v7:
       ---
       Rebase to latest perf/core, no other change.
      
       v4:
       ---
       1. Update Documentation/perf-report.txt to mention
          '--group-sort-idx' support multiple groups with different
          amount of events and it should be used on grouped events.
      
       2. Update __hpp__group_sort_idx(), just return when the
          idx is out of limit.
      
       3. Return failure on symbol_conf.group_sort_idx && !session->evlist->nr_groups.
          So now we don't need to use together with --group.
      
       v3:
       ---
       Refine the code in __hpp__group_sort_idx().
      
       Before:
         for (i = 1; i < nr_members; i++) {
              if (i == idx) {
                      ret = field_cmp(fields_a[i], fields_b[i]);
                      if (ret)
                              goto out;
              }
         }
      
       After:
         if (idx >= 1 && idx < nr_members) {
              ret = field_cmp(fields_a[idx], fields_b[idx]);
              if (ret)
                      goto out;
         }
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200220013616.19916-2-yao.jin@linux.intel.com
      [ Renamed pair_fields_alloc() to hist_entry__new_pair() and combined decl + assignment of vars ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      429a5f9d
    • J
      perf report: Support interactive annotation of code without symbols · 7b0a0dcb
      Jin Yao authored
      For perf report on stripped binaries it is currently impossible to do
      annotation. The annotation state is all tied to symbols, but there are
      either no symbols, or symbols are not covering all the code.
      
      We should support the annotation functionality even without symbols.
      
      This patch fakes a symbol whose name is the address string.
      After that, we just follow the existing annotation workflow.
      
      For example,
      
      1. perf report
      
        Overhead  Command  Shared Object     Symbol
          20.67%  div      libc-2.27.so      [.] __random_r
          17.29%  div      libc-2.27.so      [.] __random
          10.59%  div      div               [.] 0x0000000000000628
           9.25%  div      div               [.] 0x0000000000000612
           6.11%  div      div               [.] 0x0000000000000645
      
      2. Select the line of "10.59%  div      div               [.] 0x0000000000000628" and ENTER.
      
        Annotate 0x0000000000000628
        Zoom into div thread
        Zoom into div DSO (use the 'k' hotkey to zoom directly into the kernel)
        Browse map details
        Run scripts for samples of symbol [0x0000000000000628]
        Run scripts for all samples
        Switch to another data file in PWD
        Exit
      
      3. Select the "Annotate 0x0000000000000628" and ENTER.
      
      Percent│
             │
             │
             │     Disassembly of section .text:
             │
             │     0000000000000628 <.text+0x68>:
             │       divsd %xmm4,%xmm0
             │       divsd %xmm3,%xmm1
             │       movsd (%rsp),%xmm2
             │       addsd %xmm1,%xmm0
             │       addsd %xmm2,%xmm0
             │       movsd %xmm0,(%rsp)
      
      Now we can see the dump of object starting from 0x628.
      
       v5:
       ---
       Remove the hotkey 'a' implementation from this patch. It
       will be moved to a separate patch.
      
       v4:
       ---
       1. Support the hotkey 'a'. When we press 'a' on address,
          now it supports the annotation.
      
       2. Change the patch title from
          "Support interactive annotation of code without symbols" to
          "perf report: Support interactive annotation of code without symbols"
      
       v3:
       ---
       Keep just the ANNOTATION_DUMMY_LEN, and remove the
       opts->annotate_dummy_len since it's the "maybe in future
       we will provide" feature.
      
       v2:
       ---
       Fix a crash issue when annotating an address in "unknown" object.
      
       The steps to reproduce this issue:
      
       perf record -e cycles:u ls
       perf report
      
          75.29%  ls       ld-2.27.so        [.] do_lookup_x
          23.64%  ls       ld-2.27.so        [.] __GI___tunables_init
           1.04%  ls       [unknown]         [k] 0xffffffff85c01210
           0.03%  ls       ld-2.27.so        [.] _start
      
       When annotating 0xffffffff85c01210, the crash happens.
      
       v2 adds checking for ms->map in add_annotate_opt(). If the object is
       "unknown", ms->map is NULL.
      
      Committer notes:
      
      Renamed new_annotate_sym() to symbol__new_unresolved().
      
      Use PRIx64 to fix this issue in some 32-bit arches:
      
        ui/browsers/hists.c: In function 'symbol__new_unresolved':
        ui/browsers/hists.c:2474:38: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
          snprintf(name, sizeof(name), "%-#.*lx", BITS_PER_LONG / 4, addr);
                                        ~~~~~~^                      ~~~~
                                        %-#.*llx
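
The format fix described in the committer notes can be illustrated with a minimal sketch. It assumes a hypothetical helper `format_unresolved_name()` and a hypothetical constant `ADDR_HEX_DIGITS` standing in for `BITS_PER_LONG / 4` on a 64-bit build; neither is taken from the perf sources. `PRIx64` from `<inttypes.h>` expands to the right length modifier for a 64-bit value on both 32-bit and 64-bit targets, which is what avoids the `-Werror=format=` failure shown above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for BITS_PER_LONG / 4 on a 64-bit build. */
#define ADDR_HEX_DIGITS 16

/*
 * Hypothetical helper: write addr into buf as a "0x"-prefixed hex string
 * with at least ADDR_HEX_DIGITS digits. PRIx64 supplies "lx" or "llx"
 * as appropriate for the target, so the u64 argument always matches.
 */
static int format_unresolved_name(char *buf, size_t len, uint64_t addr)
{
	return snprintf(buf, len, "%-#.*" PRIx64, ADDR_HEX_DIGITS, addr);
}
```

Called with a 32-byte buffer and the address 0x628, this sketch produces the same zero-padded form that appears in the annotate menu entry.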
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200227043939.4403-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 23 Mar, 2020 3 commits
    • J
      perf report: Print al_addr when symbol is not found · 443bc639
      Jin Yao committed
      In branch mode, if the symbol is not found, perf report prints
      the address.
      
      For example, 0x0000555eee0365a0 in the output below.
      
        Overhead  Command  Source Shared Object  Source Symbol                            Target Symbol
          17.55%  div      libc-2.27.so          [.] __random                             [.] __random
           6.11%  div      div                   [.] 0x0000555eee0365a0                   [.] rand
           6.10%  div      libc-2.27.so          [.] rand                                 [.] 0x0000555eee036769
           5.80%  div      libc-2.27.so          [.] __random_r                           [.] __random
           5.72%  div      libc-2.27.so          [.] __random                             [.] __random_r
           5.62%  div      libc-2.27.so          [.] __random_r                           [.] __random_r
           5.38%  div      libc-2.27.so          [.] __random                             [.] rand
           4.56%  div      libc-2.27.so          [.] __random                             [.] __random
           4.49%  div      div                   [.] 0x0000555eee036779                   [.] 0x0000555eee0365ff
           4.25%  div      div                   [.] 0x0000555eee0365fa                   [.] 0x0000555eee036760
      
      But such a runtime address makes it hard to find the
      corresponding instructions in the binary, so this patch
      prints al_addr instead.
      
      With this patch, the output is:
      
        Overhead  Command  Source Shared Object  Source Symbol                            Target Symbol
          17.55%  div      libc-2.27.so          [.] __random                             [.] __random
           6.11%  div      div                   [.] 0x00000000000005a0                   [.] rand
           6.10%  div      libc-2.27.so          [.] rand                                 [.] 0x0000000000000769
           5.80%  div      libc-2.27.so          [.] __random_r                           [.] __random
           5.72%  div      libc-2.27.so          [.] __random                             [.] __random_r
           5.62%  div      libc-2.27.so          [.] __random_r                           [.] __random_r
           5.38%  div      libc-2.27.so          [.] __random                             [.] rand
           4.56%  div      libc-2.27.so          [.] __random                             [.] __random
           4.49%  div      div                   [.] 0x0000000000000779                   [.] 0x00000000000005ff
           4.25%  div      div                   [.] 0x00000000000005fa                   [.] 0x0000000000000760
      
      Now we can use objdump to dump the object starting from 0x5a0.
      
      For example,
      objdump -d --start-address 0x5a0 div
      
      00000000000005a0 <rand@plt>:
       5a0:   ff 25 2a 0a 20 00       jmpq   *0x200a2a(%rip)        # 200fd0 <__cxa_finalize@plt+0x200a20>
       5a6:   68 02 00 00 00          pushq  $0x2
       5ab:   e9 c0 ff ff ff          jmpq   570 <srand@plt-0x10>
       ...
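
The relationship between the two forms of address above can be sketched as follows. This is a simplified illustration, not the perf implementation: `struct mapping` and `object_relative_addr()` are hypothetical names, and the mapping start used in the usage note (0x555eee036000, with a file offset of 0) is an assumption inferred from the example addresses rather than a value reported in the commit.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one mapping of a DSO in a process. */
struct mapping {
	uint64_t start;  /* runtime address where the mapping begins */
	uint64_t pgoff;  /* file offset at which the mapping starts */
};

/*
 * Translate a sampled runtime address into an address relative to the
 * object, which is the kind of value objdump --start-address accepts.
 */
static uint64_t object_relative_addr(const struct mapping *m, uint64_t ip)
{
	return ip - m->start + m->pgoff;
}
```

Under those assumptions, the runtime address 0x0000555eee0365a0 maps to 0x5a0 and 0x0000555eee036769 maps to 0x769, matching the values printed with the patch applied.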
      
      Committer testing:
      
        [root@seventh ~]# perf record -a -b sleep 1
        [root@seventh ~]# perf report --header-only | grep cpudesc
        # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
        [root@seventh ~]# perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        [root@seventh ~]#
      
      Before:
      
        [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 2K of event 'cycles'
        # Event count (approx.): 2240
        #
        # Overhead  Command          Source Shared Object      Source Symbol           Target Symbol           Basic Block Cycles
        # ........  ...............  ........................  ......................  ......................  ..................
        #
             0.13%  systemd-journal  libc-2.29.so              [.] cfree@GLIBC_2.2.5   [.] _int_free           1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00007fe406465c82  [.] 0x00007fe406465d80  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00007fe406465ded  [.] 0x00007fe406465c30  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00007fe406465e4e  [.] 0x00007fe406465de0  1
             0.09%  systemd-journal  systemd-journald          [.] free@plt            [.] cfree@GLIBC_2.2.5   1
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           18
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           2
             0.04%  systemd          libsystemd-shared-241.so  [.] bus_resolve@plt     [.] bus_resolve         204
             0.04%  systemd          libsystemd-shared-241.so  [.] getpid_cached@plt   [.] getpid_cached       7
        [root@seventh ~]#
      
      After:
      
        [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 2K of event 'cycles'
        # Event count (approx.): 2240
        #
        # Overhead  Command          Source Shared Object      Source Symbol           Target Symbol           Basic Block Cycles
        # ........  ...............  ........................  ......................  ......................  ..................
        #
             0.13%  systemd-journal  libc-2.29.so              [.] cfree@GLIBC_2.2.5   [.] _int_free           1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00000000000f7c82  [.] 0x00000000000f7d80  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00000000000f7ded  [.] 0x00000000000f7c30  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00000000000f7e4e  [.] 0x00000000000f7de0  1
             0.09%  systemd-journal  systemd-journald          [.] free@plt            [.] cfree@GLIBC_2.2.5   1
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           18
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           2
             0.04%  systemd          libsystemd-shared-241.so  [.] bus_resolve@plt     [.] bus_resolve         204
             0.04%  systemd          libsystemd-shared-241.so  [.] getpid_cached@plt   [.] getpid_cached       7
        [root@seventh ~]#
      
      Let's use -v to get full paths and then try objdump on the unresolved address:
      
        [root@seventh ~]# perf report -v --stdio --dso libsystemd-shared-241.so |& grep libsystemd-shared-241.so | tail -1
           0.04% systemd-journal /usr/lib/systemd/libsystemd-shared-241.so 0x80c1a B [.] 0x0000000000080c1a 0x80a95 B [.] 0x0000000000080a95 61
        [root@seventh ~]#
      
        [root@seventh ~]# objdump -d --start-address 0x00000000000f7d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20
      
        /usr/lib/systemd/libsystemd-shared-241.so:     file format elf64-x86-64
      
        Disassembly of section .text:
      
        00000000000f7d80 <proc_cmdline_parse_given@@SD_SHARED+0x330>:
           f7d80:	41 39 11             	cmp    %edx,(%r9)
           f7d83:	0f 84 ff fe ff ff    	je     f7c88 <proc_cmdline_parse_given@@SD_SHARED+0x238>
           f7d89:	4c 8d 05 97 09 0c 00 	lea    0xc0997(%rip),%r8        # 1b8727 <utf8_skip_data@@SD_SHARED+0x3147>
           f7d90:	b9 49 00 00 00       	mov    $0x49,%ecx
           f7d95:	48 8d 15 c9 f5 0b 00 	lea    0xbf5c9(%rip),%rdx        # 1b7365 <utf8_skip_data@@SD_SHARED+0x1d85>
           f7d9c:	31 ff                	xor    %edi,%edi
           f7d9e:	48 8d 35 9b ff 0b 00 	lea    0xbff9b(%rip),%rsi        # 1b7d40 <utf8_skip_data@@SD_SHARED+0x2760>
           f7da5:	e8 a6 d6 f4 ff       	callq  45450 <log_assert_failed_realm@plt>
           f7daa:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
           f7db0:	41 56                	push   %r14
           f7db2:	41 55                	push   %r13
           f7db4:	41 54                	push   %r12
           f7db6:	55                   	push   %rbp
        [root@seventh ~]#
      
      If we tried the address reported before this patch:
      
        [root@seventh ~]# objdump -d --start-address 0x00007fe406465d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20
      
        /usr/lib/systemd/libsystemd-shared-241.so:     file format elf64-x86-64
      
        [root@seventh ~]#
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200227043939.4403-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • L
      perf symbols: Consolidate symbol fixup issue · 7eec00a7
      Leo Yan committed
      After copying an Arm64 perf archive with object files and a perf.data
      file to an x86 laptop, kernel symbol resolution in the x86 perf tool
      fails: it outputs 'unknown' for every symbol.
      
      The root cause is the function elf__needs_adjust_symbols(): the x86
      perf tool uses a weak version, while Arm64 (and powerpc) provide their
      own versions.  elf__needs_adjust_symbols() decides whether symbols need
      to be parsed with relative offset addresses; the weak function used by
      the x86 build does not check for the ELF type 'ET_DYN', so it returns
      the wrong result for Arm DSOs and their symbols cannot be parsed.
      
      DSO parsing should not depend on the architecture perf was built for;
      e.g. an x86 perf tool should be able to parse Arm and Arm64 DSOs, and
      vice versa.  Naveen N. Rao also confirmed that powerpc64 kernels are no
      longer built as ET_DYN and have changed to ET_EXEC.
      
      This patch removes the arch-specific functions for Arm64 and powerpc and
      makes elf__needs_adjust_symbols() a common function.
      
      The common elf__needs_adjust_symbols() checks an extra condition,
      'ET_DYN', for the ELF header type.  With this fix, Arm64 DSOs can be
      parsed properly with the x86 perf tool.
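
A minimal sketch of such an architecture-independent check is shown below. It is an illustration only: the function name `needs_adjust_symbols()` and the use of `Elf64_Ehdr` from `<elf.h>` are assumptions, as is the premise that the earlier weak version already accepted `ET_EXEC` and `ET_REL`; the commit message itself only states that a check for `ET_DYN` is added.

```c
#include <elf.h>
#include <stdbool.h>

/*
 * Hypothetical common check: decide from the ELF header type alone,
 * independent of the architecture the tool was built for.
 */
static bool needs_adjust_symbols(const Elf64_Ehdr *ehdr)
{
	return ehdr->e_type == ET_EXEC ||
	       ehdr->e_type == ET_REL ||
	       ehdr->e_type == ET_DYN;  /* the extra condition described above */
}
```

Because the decision depends only on `e_type`, the same code path handles a DSO from any architecture, which is the property the patch is after.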
      
      Before:
      
        # perf script
        main 3258 1 branches:                0 [unknown] ([unknown]) => ffff800010c4665c [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c46670 [unknown] ([kernel.kallsyms]) => ffff800010c4eaec [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eaec [unknown] ([kernel.kallsyms]) => ffff800010c4eb00 [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eb08 [unknown] ([kernel.kallsyms]) => ffff800010c4e780 [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4e7a0 [unknown] ([kernel.kallsyms]) => ffff800010c4eeac [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eebc [unknown] ([kernel.kallsyms]) => ffff800010c4ed80 [unknown] ([kernel.kallsyms])
      
      After:
      
        # perf script
        main 3258 1 branches:                0 [unknown] ([unknown]) => ffff800010c4665c coresight_timeout+0x54 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c46670 coresight_timeout+0x68 ([kernel.kallsyms]) => ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms]) => ffff800010c4eb00 etm4_enable_hw+0x3e0 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eb08 etm4_enable_hw+0x3e8 ([kernel.kallsyms]) => ffff800010c4e780 etm4_enable_hw+0x60 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4e7a0 etm4_enable_hw+0x80 ([kernel.kallsyms]) => ffff800010c4eeac etm4_enable+0x2d4 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eebc etm4_enable+0x2e4 ([kernel.kallsyms]) => ffff800010c4ed80 etm4_enable+0x1a8 ([kernel.kallsyms])
      
      v3: Changed to check for ET_DYN across all architectures.
      
      v2: Fixed Arm64 and powerpc native building.
      Reported-by: Mike Leach <mike.leach@linaro.org>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Allison Randal <allison@lohutok.net>
      Cc: Enrico Weigelt <info@metux.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20200306015759.10084-1-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • I
      perf parse-events: Fix 3 use after frees found with clang ASAN · d4953f7e
      Ian Rogers committed
      Reproducible with a clang ASAN build by running perf test, in
      particular 'Parse event definition strings'.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20200314170356.62914-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 18 Mar, 2020 1 commit