1. 15 Feb, 2019 1 commit
  2. 06 Feb, 2019 1 commit
  3. 22 Jan, 2019 3 commits
    • perf utils: Move perf_config using routines from color.c to separate object · 32e9136e
      Authored by Arnaldo Carvalho de Melo
      To untangle objects a bit more, avoiding rebuilding the color_fprintf
      routines when changes are made to the perf config headers.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lkml.kernel.org/n/tip-8qvu2ek26antm3a8jyl4ocbq@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Handle PERF_RECORD_BPF_EVENT · 45178a92
      Authored by Song Liu
      This patch adds basic handling of PERF_RECORD_BPF_EVENT.  Tracking of
      PERF_RECORD_BPF_EVENT is OFF by default. Option --bpf-event is added to
      turn it on.
      
      Committer notes:
      
      Add dummy machine__process_bpf_event() variant that returns zero for
      systems without HAVE_LIBBPF_SUPPORT, such as Alpine Linux, unbreaking
      the build in such systems.
      
      Remove the needless include <machine.h> from bpf-event.h, provide just
      forward declarations for the structs and unions in the parameters, to
      reduce compilation time and needless rebuilds when machine.h gets
      changed.
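      
      The two committer notes above describe a common header pattern. A
      minimal sketch, assuming a simplified two-parameter signature (the
      real prototype may take more arguments); only the HAVE_LIBBPF_SUPPORT
      macro and the machine__process_bpf_event() name come from the message:

```c
/* Forward declarations: enough for pointer parameters, so this header
 * does not have to include machine.h. */
struct machine;
union perf_event;

#ifdef HAVE_LIBBPF_SUPPORT
/* Real handler, implemented elsewhere when libbpf is available. */
int machine__process_bpf_event(struct machine *machine,
			       union perf_event *event);
#else
/* Dummy variant (simplified, assumed signature): without libbpf there is
 * nothing to process, so just report success. */
static inline int machine__process_bpf_event(struct machine *machine,
					     union perf_event *event)
{
	(void)machine;
	(void)event;
	return 0;
}
#endif
```

      Callers can then invoke the function unconditionally, with no #ifdef
      at the call site.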
      
      Committer testing:
      
      When running with:
      
       # perf record --bpf-event
      
      On an older kernel where PERF_RECORD_BPF_EVENT and PERF_RECORD_KSYMBOL
      are not present, we fall back to removing those two bits from
      perf_event_attr, allowing the tool to continue working on older kernels:
      
        perf_event_attr:
          size                             112
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          mmap                             1
          comm                             1
          freq                             1
          enable_on_exec                   1
          task                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
          mmap2                            1
          comm_exec                        1
          ksymbol                          1
          bpf_event                        1
        ------------------------------------------------------------
        sys_perf_event_open: pid 5779  cpu 0  group_fd -1  flags 0x8
        sys_perf_event_open failed, error -22
        switching off bpf_event
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          mmap                             1
          comm                             1
          freq                             1
          enable_on_exec                   1
          task                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
          mmap2                            1
          comm_exec                        1
          ksymbol                          1
        ------------------------------------------------------------
        sys_perf_event_open: pid 5779  cpu 0  group_fd -1  flags 0x8
        sys_perf_event_open failed, error -22
        switching off ksymbol
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          mmap                             1
          comm                             1
          freq                             1
          enable_on_exec                   1
          task                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
          mmap2                            1
          comm_exec                        1
        ------------------------------------------------------------
      
      And then proceeds to work without those two features.
      
      As passing --bpf-event is an explicit action performed by the user, perhaps we
      should emit a warning telling that the kernel has no such feature, but this can
      be done on top of this patch.
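      
      The fallback seen in the log above can be sketched as a retry loop.
      This is an illustration only: struct attr_bits, open_fn,
      old_kernel_open() and open_with_fallback() are hypothetical names,
      and the real tool operates on perf_event_attr and
      sys_perf_event_open rather than on these stand-ins.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for the two perf_event_attr bits. */
struct attr_bits {
	bool ksymbol;
	bool bpf_event;
};

/* Hypothetical opener: returns 0 on success or a negative errno value. */
typedef int (*open_fn)(const struct attr_bits *attr);

/* Simulates an older kernel that rejects either of the new bits. */
static int old_kernel_open(const struct attr_bits *attr)
{
	return (attr->ksymbol || attr->bpf_event) ? -EINVAL : 0;
}

/*
 * Try to open; on -EINVAL switch off the newest still-enabled feature and
 * retry, in the order shown in the log (bpf_event first, then ksymbol).
 */
static int open_with_fallback(struct attr_bits *attr, open_fn do_open)
{
	for (;;) {
		int err = do_open(attr);

		if (err != -EINVAL)
			return err;
		if (attr->bpf_event)
			attr->bpf_event = false;
		else if (attr->ksymbol)
			attr->ksymbol = false;
		else
			return err;	/* nothing left to switch off */
	}
}
```

      With both bits set and old_kernel_open() as the opener, the loop
      fails twice with -EINVAL (error -22 in the log) and succeeds on the
      third attempt with both bits cleared.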
      
      Now with a kernel that supports these events, start the 'record --bpf-event -a'
      and then run 'perf trace sleep 10000' that will use the BPF
      augmented_raw_syscalls.o prebuilt (for another kernel version even) and thus
      should generate PERF_RECORD_BPF_EVENT events:
      
        [root@quaco ~]# perf record -e dummy -a --bpf-event
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.713 MB perf.data ]
      
        [root@quaco ~]# bpftool prog
        13: cgroup_skb  tag 7be49e3934a125ba  gpl
        	loaded_at 2019-01-19T09:09:43-0300  uid 0
        	xlated 296B  jited 229B  memlock 4096B  map_ids 13,14
        14: cgroup_skb  tag 2a142ef67aaad174  gpl
        	loaded_at 2019-01-19T09:09:43-0300  uid 0
        	xlated 296B  jited 229B  memlock 4096B  map_ids 13,14
        15: cgroup_skb  tag 7be49e3934a125ba  gpl
        	loaded_at 2019-01-19T09:09:43-0300  uid 0
        	xlated 296B  jited 229B  memlock 4096B  map_ids 15,16
        16: cgroup_skb  tag 2a142ef67aaad174  gpl
        	loaded_at 2019-01-19T09:09:43-0300  uid 0
        	xlated 296B  jited 229B  memlock 4096B  map_ids 15,16
        17: cgroup_skb  tag 7be49e3934a125ba  gpl
        	loaded_at 2019-01-19T09:09:44-0300  uid 0
        	xlated 296B  jited 229B  memlock 4096B  map_ids 17,18
        18: cgroup_skb  tag 2a142ef67aaad174  gpl
        	loaded_at 2019-01-19T09:09:44-0300  uid 0
        	xlated 296B  jited 229B  memlock 4096B  map_ids 17,18
        21: cgroup_skb  tag 7be49e3934a125ba  gpl
        	loaded_at 2019-01-19T09:09:45-0300  uid 0
        	xlated 296B  jited 229B  memlock 4096B  map_ids 21,22
        22: cgroup_skb  tag 2a142ef67aaad174  gpl
        	loaded_at 2019-01-19T09:09:45-0300  uid 0
        	xlated 296B  jited 229B  memlock 4096B  map_ids 21,22
        31: tracepoint  name sys_enter  tag 12504ba9402f952f  gpl
        	loaded_at 2019-01-19T09:19:56-0300  uid 0
        	xlated 512B  jited 374B  memlock 4096B  map_ids 30,29,28
        32: tracepoint  name sys_exit  tag c1bd85c092d6e4aa  gpl
        	loaded_at 2019-01-19T09:19:56-0300  uid 0
        	xlated 256B  jited 191B  memlock 4096B  map_ids 30,29
        # perf report -D | grep PERF_RECORD_BPF_EVENT | nl
           1	0 55834574849 0x4fc8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13
           2	0 60129542145 0x5118 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14
           3	0 64424509441 0x5268 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15
           4	0 68719476737 0x53b8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16
           5	0 73014444033 0x5508 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17
           6	0 77309411329 0x5658 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18
           7	0 90194313217 0x57a8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21
           8	0 94489280513 0x58f8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22
           9	7 620922484360 0xb6390 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 29
          10	7 620922486018 0xb6410 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 29
          11	7 620922579199 0xb6490 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 30
          12	7 620922580240 0xb6510 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 30
          13	7 620922765207 0xb6598 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 31
          14	7 620922874543 0xb6620 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 32
        #
      
      There they are: the tracepoint BPF programs with ids 31 and 32, put in
      place by 'perf trace'.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@fb.com
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190117161521.1341602-7-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Display arch specific diagnostic counter sets, starting with s390 · 93115d32
      Authored by Thomas Richter
      On s390 the event bc000 (also named CF_DIAG) extracts the CPU
      Measurement Facility diagnostic counter sets and displays them as
      counter number and counter value pairs sorted by counter set number.
      
      Output:
       [root@s35lp76 perf]# ./perf report -D --stdio
      
       [00000000] Counterset:0 Counters:6
         Counter:000 Value:0x000000000085ec36 Counter:001 Value:0x0000000000796c94
         Counter:002 Value:0x0000000000005ada Counter:003 Value:0x0000000000092460
         Counter:004 Value:0x0000000000006073 Counter:005 Value:0x00000000001a9a73
       [0x000038] Counterset:1 Counters:2
         Counter:000 Value:0x000000000007c59f Counter:001 Value:0x000000000002fad6
       [0x000050] Counterset:2 Counters:16
         Counter:000 Value:000000000000000000 Counter:001 Value:000000000000000000
         Counter:002 Value:000000000000000000 Counter:003 Value:000000000000000000
         Counter:004 Value:000000000000000000 Counter:005 Value:000000000000000000
         Counter:006 Value:000000000000000000 Counter:007 Value:000000000000000000
         Counter:008 Value:000000000000000000 Counter:009 Value:000000000000000000
         Counter:010 Value:000000000000000000 Counter:011 Value:000000000000000000
         Counter:012 Value:000000000000000000 Counter:013 Value:000000000000000000
         Counter:014 Value:000000000000000000 Counter:015 Value:000000000000000000
       [0x0000d8] Counterset:3 Counters:128
         Counter:000 Value:0x000000000000020f Counter:001 Value:0x00000000000001d8
         Counter:002 Value:0x000000000000d7fa Counter:003 Value:0x000000000000008b
         ...
      
      The number in brackets is the offset into the raw data field of the
      sample.
      
      New functions trace_event_sample_raw__init() and s390_sample_raw() are
      introduced in the code path to enable interpretation on non-s390
      platforms. The raw data attached to event bc000 is generated only on
      the s390 platform. Correct display on other platforms requires correct
      endianness handling.
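      
      As an illustration of the endianness point (s390 is big-endian), a
      counter value can be read from the raw bytes in a way that does not
      depend on the byte order of the host running the report. This is a
      sketch, not the code from the patch; read_be64() is a hypothetical
      helper:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 64-bit big-endian value from a raw buffer at the given byte
 * offset, independent of the byte order of the host.
 */
static uint64_t read_be64(const unsigned char *raw, size_t offset)
{
	uint64_t value = 0;

	for (size_t i = 0; i < 8; i++)
		value = (value << 8) | raw[offset + i];
	return value;
}
```

      Fed the eight bytes 00 00 00 00 00 85 ec 36, it yields 0x85ec36 on
      both little-endian and big-endian hosts.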
      
      Committer notes:
      
      Added an init function that sets up an evlist function pointer to avoid
      repeated tests on evlist->env and calls to perf_env__name() that
      involve normalizing, etc, for each PERF_RECORD_SAMPLE.
      
      Removed needless __maybe_unused from the trace_event_raw()
      prototype in session.h, moved it to be a static function in evlist.
      
      The 'offset' variable is a size_t, not a u64, fix it to avoid this on
      some arches:
      
          CC       /tmp/build/perf/util/s390-sample-raw.o
        util/s390-sample-raw.c: In function 's390_cpumcfdg_testctr':
        util/s390-sample-raw.c:77:4: error: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'size_t' [-Werror=format=]
            pr_err("Invalid counter set entry at %#"  PRIx64 "\n",
            ^
        cc1: all warnings being treated as errors
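      
      A minimal sketch of the format-specifier rule behind this error: a
      size_t takes the 'z' length modifier, while PRIx64 is only correct
      for a 64-bit integer type. The helper names are hypothetical and
      snprintf() stands in for pr_err():

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* size_t takes the 'z' length modifier on every architecture. */
static int format_offset(char *buf, size_t len, size_t offset)
{
	return snprintf(buf, len, "Invalid counter set entry at %#zx", offset);
}

/* PRIx64 is the right choice when the argument really is a uint64_t. */
static int format_value(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "Value:%#018" PRIx64, value);
}
```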
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Link: https://lkml.kernel.org/r/9c856ac0-ef23-72b5-901d-a1f815508976@linux.ibm.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Link: https://lkml.kernel.org/n/tip-s3jhif06et9ug78qhclw41z1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 18 Dec, 2018 1 commit
    • perf tools: Support 'srccode' output · dd2e18e9
      Authored by Andi Kleen
      When looking at PT or brstackinsn traces with 'perf script' it can be
      very useful to see the source code. This adds a simple facility to print
      it with 'perf script', if the information is available through DWARF:
      
        % perf record ...
        % perf script -F insn,ip,sym,srccode
        ...
      
                  4004c6 main
        5               for (i = 0; i < 10000000; i++)
                   4004cd main
        5               for (i = 0; i < 10000000; i++)
                   4004c6 main
        5               for (i = 0; i < 10000000; i++)
                   4004cd main
        5               for (i = 0; i < 10000000; i++)
                   4004cd main
        5               for (i = 0; i < 10000000; i++)
                   4004cd main
        5               for (i = 0; i < 10000000; i++)
                   4004cd main
        5               for (i = 0; i < 10000000; i++)
                   4004cd main
        5               for (i = 0; i < 10000000; i++)
                   4004b3 main
        6                       v++;
      
        % perf record -b ...
        % perf script -F insn,ip,sym,srccode,brstackinsn
      
        ...
               main+22:
                0000000000400543        insn: e8 ca ff ff ff            # PRED
        |18                     f1();
                f1:
                0000000000400512        insn: 55
        |10       {
                0000000000400513        insn: 48 89 e5
                0000000000400516        insn: b8 00 00 00 00
        |11             f2();
                000000000040051b        insn: e8 d6 ff ff ff            # PRED
                f2:
                00000000004004f6        insn: 55
        |5        {
                00000000004004f7        insn: 48 89 e5
                00000000004004fa        insn: 8b 05 2c 0b 20 00
        |6              c = a / b;
                0000000000400500        insn: 8b 0d 2a 0b 20 00
                0000000000400506        insn: 99
                0000000000400507        insn: f7 f9
                0000000000400509        insn: 89 05 29 0b 20 00
                000000000040050f        insn: 90
        |7        }
                0000000000400510        insn: 5d
                0000000000400511        insn: c3                        # PRED
                f1+14:
                0000000000400520        insn: b8 00 00 00 00
        |12             f2();
                0000000000400525        insn: e8 cc ff ff ff            # PRED
                f2:
                00000000004004f6        insn: 55
        |5        {
                00000000004004f7        insn: 48 89 e5
                00000000004004fa        insn: 8b 05 2c 0b 20 00
        |6              c = a / b;
      
      Not currently supported for callchains; that would need some layout
      changes there.
      
      Committer notes:
      
      Fixed the build on Alpine Linux (3.4 .. 3.8) by addressing this
      warning:
      
        In file included from util/srccode.c:19:0:
        /usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
         #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
          ^~~~~~~
        cc1: all warnings being treated as errors
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181204001848.24769-1-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 20 Nov, 2018 1 commit
  6. 31 Aug, 2018 1 commit
  7. 03 Aug, 2018 1 commit
    • perf auxtrace: Support for perf report -D for s390 · b96e6615
      Authored by Thomas Richter
      Add initial support for s390 auxiliary traces using the CPU-Measurement
      Sampling Facility.
      
      Support and ignore PERF_RECORD_AUXTRACE_INFO records in the perf data
      file. Later patches will show the contents of the auxiliary traces.
      
      Set up the auxtrace queues and data structures for s390.  A raw dump of
      the perf.data file now does not show an error when an auxtrace event is
      encountered.
      
      Output before:
      
        [root@s35lp76 perf]# ./perf report -D -i perf.data.auxtrace
        0x128 [0x10]: failed to process type: 70
        Error:
        failed to process sample
      
        0x128 [0x10]: event: 70
        .
        . ... raw event: size 16 bytes
        .  0000:  00 00 00 46 00 00 00 10 00 00 00 00 00 00 00 00  ...F............
      
        0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 0
        [root@s35lp76 perf]#
      
      Output after:
      
         # ./perf report -D -i perf.data.auxtrace |fgrep PERF_RECORD_AUXTRACE
        0 0 0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 5
        0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000
      	   offset: 0  ref: 0  idx: 4  tid: -1  cpu: 4
        ....
      
      Additional notes about the underlying hardware and software
      implementation, provided by Hendrik Brueckner (see Link: below).
      
      =============================================================================
      
      The CPU-Measurement Facility (CPU-MF) provides a set of functions to obtain
      performance information on the mainframe.  Basically, it was introduced
      with System z10 years ago for the z/Architecture, that means, 64-bit.
      For Linux, there are two facilities of interest, counter facility and sampling
      facility.  The counter facility provides hardware counters for instructions,
      cycles, crypto-activities, and many more.
      
      The sampling facility is a hardware sampler that when started will write
      samples at a particular interval into a sampling buffer.  At some point,
      for example, if a sample block is full, it generates an interrupt to collect
      samples (while the sampler continues to run).
      
      A few years ago, I started to provide a perf PMU to use the counter
      and sampling facilities.  Recently, the device driver was updated to also
      "export" the sampling buffer into the AUX area.  Thomas now completed the
      related perf work to interpret and process these AUX data.
      
      If people are more interested in the sampling facility, they can have a
      look into:
      
      - The Load-Program-Parameter and the CPU-Measurement Facilities, SA23-2260-05
        http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a
      
      and to learn how to use it for Linux on Z, have a look at chapter 54,
      "Using the CPU-measurement facilities" in the:
      
      - Device Drivers, Features, and Commands, SC33-8411-34
        http://public.dhe.ibm.com/software/dw/linux390/docu/l416dd34.pdf
      
      =============================================================================
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20180803100758.GA28475@linux.ibm.com
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180802074622.13641-2-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 04 Jun, 2018 1 commit
  9. 16 May, 2018 1 commit
    • perf llvm-utils: Add bpf include path to clang command line · 1b16fffa
      Authored by Arnaldo Carvalho de Melo
      We'll start putting headers for helpers to be used in eBPF proggies in
      there:
      
        # perf trace -v --no-syscalls -e empty.c |& grep "llvm compiling command : "
        llvm compiling command : /usr/lib64/ccache/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41100   -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h  -I/home/acme/lib/include/perf/bpf -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.17.0-rc3-00034-gf4ef6a43/build -c /home/acme/bpf/empty.c -target bpf -O2 -o -
        #
      
      Notice the "-I/home/acme/lib/include/perf/bpf"
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-6xq94xro8xlb5s9urznh3f9k@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 17 Mar, 2018 1 commit
    • perf tools: Add mem2node object · 4acf6142
      Authored by Jiri Olsa
      Add a mem2node object to allow easy lookup of the node for a
      physical address.
      
      It has the following interface:
      
        int  mem2node__init(struct mem2node *map, struct perf_env *env);
        void mem2node__exit(struct mem2node *map);
        int  mem2node__node(struct mem2node *map, u64 addr);
      
      The mem2node__init() function initializes the object from the perf data
      file's MEM_TOPOLOGY feature data. Subsequent calls to mem2node__node()
      return the node number for a given physical address. The mem2node__exit()
      function frees the object.
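      
      To illustrate the kind of lookup such an object performs, here is a
      sketch that maps a physical address to a node with a binary search
      over sorted address ranges. It is not the mem2node implementation:
      struct phys_range and range_lookup_node() are hypothetical, and the
      data layout is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: physical range [start, end) owned by one node. */
struct phys_range {
	uint64_t start;
	uint64_t end;
	int node;
};

/*
 * Binary search over ranges that are sorted by start address and do not
 * overlap. Returns the node number, or -1 when no range contains addr.
 */
static int range_lookup_node(const struct phys_range *ranges, size_t nr,
			     uint64_t addr)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < ranges[mid].start)
			hi = mid;
		else if (addr >= ranges[mid].end)
			lo = mid + 1;
		else
			return ranges[mid].node;
	}
	return -1;
}
```

      A node may own several disjoint ranges, which is why the table is
      keyed by range rather than by node.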
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180309101442.9224-3-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 25 Jan, 2018 2 commits
  12. 23 Jan, 2018 1 commit
  13. 17 Jan, 2018 1 commit
    • perf tools: Add ARM Statistical Profiling Extensions (SPE) support · ffd3d18c
      Authored by Kim Phillips
      'perf record' and 'perf report --dump-raw-trace' are supported in this
      release.
      
      Example usage:
      
       # perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=10000
       # perf report --dump-raw-trace
      
      Note that the perf.data file is portable, so the report can be run on
      another architecture host if necessary.
      
      Output will contain raw SPE data and its textual representation, such
      as:
      
      0x5c8 [0x30]: PERF_RECORD_AUXTRACE size: 0x200000  offset: 0  ref: 0x1891ad0e  idx: 1  tid: 2227  cpu: 1
      .
      . ... ARM SPE data: size 2097152 bytes
      .  00000000:  49 00                                           LD
      .  00000002:  b2 c0 3b 29 0f 00 00 ff ff                      VA 0xffff00000f293bc0
      .  0000000b:  b3 c0 eb 24 fb 00 00 00 80                      PA 0xfb24ebc0 ns=1
      .  00000014:  9a 00 00                                        LAT 0 XLAT
      .  00000017:  42 16                                           EV RETIRED L1D-ACCESS TLB-ACCESS
      .  00000019:  b0 00 c4 15 08 00 00 ff ff                      PC 0xff00000815c400 el3 ns=1
      .  00000022:  98 00 00                                        LAT 0 TOT
      .  00000025:  71 36 6c 21 2c 09 00 00 00                      TS 39395093558
      .  0000002e:  49 00                                           LD
      .  00000030:  b2 80 3c 29 0f 00 00 ff ff                      VA 0xffff00000f293c80
      .  00000039:  b3 80 ec 24 fb 00 00 00 80                      PA 0xfb24ec80 ns=1
      .  00000042:  9a 00 00                                        LAT 0 XLAT
      .  00000045:  42 16                                           EV RETIRED L1D-ACCESS TLB-ACCESS
      .  00000047:  b0 f4 11 16 08 00 00 ff ff                      PC 0xff0000081611f4 el3 ns=1
      .  00000050:  98 00 00                                        LAT 0 TOT
      .  00000053:  71 36 6c 21 2c 09 00 00 00                      TS 39395093558
      .  0000005c:  48 00                                           INSN-OTHER
      .  0000005e:  42 02                                           EV RETIRED
      .  00000060:  b0 2c ef 7f 08 00 00 ff ff                      PC 0xff0000087fef2c el3 ns=1
      .  00000069:  98 00 00                                        LAT 0 TOT
      .  0000006c:  71 d1 6f 21 2c 09 00 00 00                      TS 39395094481
      ...
      
      Other release notes:
      
      - applies to acme's perf/{core,urgent} branches, likely elsewhere
      
      - Report is self-contained within the tool.
        Record requires enabling the kernel SPE driver by
        setting CONFIG_ARM_SPE_PMU.
      
      - The intel-bts implementation was used as a starting point; its
        min/default/max buffer sizes and power of 2 pages granularity need to be
        revisited for ARM SPE
      
      - Recording across multiple SPE clusters/domains not supported
      
      - Snapshot support (record -S), and conversion to native perf events
        (e.g., via 'perf inject --itrace'), are also not supported
      
      - Technically both cs-etm and spe can be used simultaneously, however
        disabled for simplicity in this release
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20180114132850.0b127434b704a26bad13268f@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 23 Oct, 2017 1 commit
  15. 22 Sep, 2017 1 commit
    • perf tools: Provide mutex wrappers for pthreads rwlocks · 0a7c74ea
      Authored by Arnaldo Carvalho de Melo
      Andi reported a performance drop in single threaded perf tools such as
      'perf script' due to the growing number of locks being put in place to
      allow for multithreaded tools, so wrap the POSIX threads rwlock routines
      with the names used for such kinds of locks in the Linux kernel and then
      allow for tools to ask for those locks to be used or not.
      
      I.e. a tool may have a multithreaded phase and then switch to single
      threaded, like the upcoming patches for the synthesizing of
      PERF_RECORD_{FORK,MMAP,etc} for pre-existing processes to then switch to
      single threaded mode in 'perf top'.
      
      The init routines will not be conditional, this way starting as single
      threaded to then move to multi threaded mode should be possible.
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20170404161739.GH12903@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  16. 13 Sep, 2017 1 commit
    • perf stat: Support JSON metrics in perf stat · b18f3e36
      Authored by Andi Kleen
      Add generic support for standalone metrics specified in JSON files to
      perf stat. A metric is a formula that uses multiple events to compute a
      higher level result (e.g. IPC).
      
      Previously, metrics were always tied to an event and automatically
      enabled with that event. Now change that so we can have standalone
      metrics. They are in the same JSON data structure as events, but don't
      have an event name.
      
      We also allow organizing the metrics into metric groups, which provides
      a shortcut for selecting several related metrics at once.
      
      Add a new -M / --metrics option to perf stat that adds the metrics or
      metric groups specified.
      
      Add the core code to manage and parse the metric groups. They are
      collected from the JSON data structures into a separate rblist.  When
      computing shadow values look for metrics in that list.  Then they are
      computed using the existing saved values infrastructure in stat-shadow.c
      
      The actual JSON metrics are in a separate pull request.
      
        % perf stat -M Summary --metric-only -a sleep 1
      
         Performance counter stats for 'system wide':
      
        Instructions   CLKS          CPU_Utilization  GFLOPs   SMT_2T_Utilization   Kernel_Utilization
        317614222.0    1392930775.0  0.0              0.0      0.2                  0.1
      
             1.001497549 seconds time elapsed
      
        % perf stat -M GFLOPs flops
      
         Performance counter stats for 'flops':
      
           3,999,541,471  fp_comp_ops_exe.sse_scalar_single #  1.2 GFLOPs   (66.65%)
                      14  fp_comp_ops_exe.sse_scalar_double                 (66.65%)
                       0  fp_comp_ops_exe.sse_packed_double                 (66.67%)
                       0  fp_comp_ops_exe.sse_packed_single                 (66.70%)
                       0  simd_fp_256.packed_double                         (66.70%)
                       0  simd_fp_256.packed_single                         (66.67%)
                       0  duration_time
      
             3.238372845 seconds time elapsed
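      
      To make "a metric is a formula over multiple events" concrete, here
      is a sketch of a standalone metric as a name plus a function of
      event counts. struct metric and ipc_formula() are hypothetical and
      do not reflect the JSON layout or the perf data structures; treating
      the CLKS column from the first example as the cycle count is also an
      assumption.

```c
/* Hypothetical standalone metric: a name plus a formula over counts. */
struct metric {
	const char *name;
	double (*formula)(const double *counts);
};

/* counts[0] = instructions, counts[1] = cycles; 0.0 if cycles is zero. */
static double ipc_formula(const double *counts)
{
	return counts[1] > 0.0 ? counts[0] / counts[1] : 0.0;
}

static const struct metric ipc_metric = { "IPC", ipc_formula };
```

      With the Instructions and CLKS values from the first example
      (317614222 and 1392930775), the formula gives roughly 0.23.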
      
      v2: Add missing header file
      v3: Move find_map to pmu.c
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  17. 22 Aug, 2017 1 commit
  18. 19 Jul, 2017 2 commits
  19. 26 Apr, 2017 1 commit
  20. 21 Apr, 2017 1 commit
  21. 20 Apr, 2017 1 commit
  22. 28 Mar, 2017 1 commit
  23. 23 Mar, 2017 1 commit
    • perf tools: Add a simple expression parser for JSON · 07516736
      Authored by Andi Kleen
      Add a simple expression parser good enough to parse JSON relation
      expressions. The parser is implemented using bison.
      
      This is just intended as a simple parser for internal usage in the
      event lists, not the beginning of a "perf scripting language".
      
      v2: Use expr__ prefix instead of expr_
          Support multiple free variables for parser
      
      Committer note:
      
      The v2 patch had:
      
        %define api.pure full
      
      in expr.y. That feature, introduced in bison 2.7, produces reentrant
      parsers that do not use global variables, but it would make tools/perf
      stop building with the bison versions shipped in older distros. Andi
      then realised that the other parsers (e.g. parse-events.y) were using:
      
        %pure-parser
      
      Which is present in older versions of bison and fits the bill.
      
      I added:
      
        CFLAGS_expr-bison.o += -DYYENABLE_NLS=0 -DYYLTYPE_IS_TRIVIAL=0 -w
      
      To finally make it build, copying what was there for pmu-bison.o,
      another parser.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-8-andi@firstfloor.org
      [ stdlib.h is needed in tests/expr.c for free() fixing build in systems such as ubuntu:16.04-x-s390 ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      07516736
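To illustrate the kind of expression the parser in commit 07516736 has to handle, here is a minimal sketch of an arithmetic evaluator with named free variables. It is not the perf implementation: the commit uses a bison grammar in expr.y, whereas this swaps in a hand-written recursive-descent parser. The names `tokenize` and `evaluate`, and the choice of supported operators (`+ - * /`, unary minus, parentheses), are assumptions made for the example.

```python
import re

# number | identifier (dots allowed, as in event names) | any single character
TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_][\w.]*)|(.))")


def tokenize(text):
    tokens = []
    for number, name, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif name:
            tokens.append(("id", name))
        elif op.strip():  # skip stray whitespace matches
            tokens.append(("op", op))
    return tokens


def evaluate(text, variables):
    """Evaluate `text`, looking up identifiers in the `variables` dict."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("end", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def primary():
        kind, value = advance()
        if kind == "num":
            return value
        if kind == "id":
            return float(variables[value])
        if (kind, value) == ("op", "("):
            result = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return result
        if (kind, value) == ("op", "-"):
            return -primary()
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        result = primary()
        while peek() in (("op", "*"), ("op", "/")):
            _, op = advance()
            rhs = primary()
            result = result * rhs if op == "*" else result / rhs
        return result

    def expr():
        result = term()
        while peek() in (("op", "+"), ("op", "-")):
            _, op = advance()
            rhs = term()
            result = result + rhs if op == "+" else result - rhs
        return result

    result = expr()
    if peek()[0] != "end":
        raise SyntaxError("trailing input")
    return result
```

For example, `evaluate("(a + b) / 2 * 3", {"a": 1, "b": 3})` walks the usual precedence levels: the parenthesised sum first, then the division and multiplication left to right.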
  24. 16 Mar, 2017 1 commit
    • A
      perf script: Add 'brstackinsn' for branch stacks · 48d02a1d
      Andi Kleen committed
      Implement printing instruction sequences as hex dump for branch stacks.
      
      This relies on the x86 instruction decoder used by the PT decoder to
      find the lengths of instructions to dump them individually.
      
      This is good enough for pattern matching.
      
      This makes it possible to study hot paths for individual samples,
      together with branch misprediction and cycle count / IPC information if
      available (on Skylake systems).
      
        % perf record -b ...
        % perf script -F brstackinsn
        ...
          read_hpet+67:
                ffffffff9905b843        insn: 74 ea                     # PRED
                ffffffff9905b82f        insn: 85 c9
                ffffffff9905b831        insn: 74 12
                ffffffff9905b833        insn: f3 90
                ffffffff9905b835        insn: 48 8b 0f
                ffffffff9905b838        insn: 48 89 ca
                ffffffff9905b83b        insn: 48 c1 ea 20
                ffffffff9905b83f        insn: 39 f2
                ffffffff9905b841        insn: 89 d0
                ffffffff9905b843        insn: 74 ea                     # PRED
      
      Only works when no special branch filters are specified.
      
      Occasionally the path does not reach up to the sample IP, as the LBRs
      may be frozen before executing a final jump. In this case we print a
      special message.
      
      The instruction dumper piggybacks on the existing infrastructure from
      the Intel PT decoder.
      
      An earlier iteration of this patch relied on a disassembler, but this
      version only uses the existing instruction decoder.
      
      Committer note:
      
      Added a hint about how to get suitable perf.data files for use with
      '-F brstackinsn':
      
        $ perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.018 MB perf.data (8 samples) ]
        $
        $ perf script -F brstackinsn
        Display of branch stack assembler requested, but non all-branch filter set
        Hint: run 'perf record -b ...'
        $
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Link: http://lkml.kernel.org/r/20170223234634.583-1-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      48d02a1d
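The per-instruction hex dump shown in commit 48d02a1d can be sketched as follows. This is an illustration, not the perf code: `dump_instructions` and `lengths_from_table` are hypothetical names, and the length callback is a stand-in that replays lengths read off the sample output, where perf would ask the x86 instruction decoder instead.

```python
def dump_instructions(start, code, insn_length):
    """Yield one 'address  insn: bytes' line per instruction in `code`.

    `insn_length(code, offset)` must return the length of the instruction
    at `offset`; a non-positive or overlong result stops the dump.
    """
    offset = 0
    while offset < len(code):
        length = insn_length(code, offset)
        if length <= 0 or offset + length > len(code):
            break
        chunk = code[offset:offset + length]
        yield f"{start + offset:016x}        insn: {chunk.hex(' ')}"
        offset += length


def lengths_from_table(lengths):
    # Hypothetical stand-in for a real decoder: replays known lengths.
    remaining = iter(lengths)
    return lambda code, offset: next(remaining, 0)


# Bytes and lengths taken from the read_hpet+67 sample in the commit message.
code = bytes.fromhex("85c9 7412 f390 488b0f 4889ca 48c1ea20 39f2 89d0 74ea")
lines = list(dump_instructions(0xFFFFFFFF9905B82F, code,
                               lengths_from_table([2, 2, 2, 3, 3, 4, 2, 2, 2])))
for line in lines:
    print(line)
```

Only the instruction lengths are needed to split the byte stream, which is why the commit can rely on a length decoder rather than a full disassembler.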
  25. 14 Mar, 2017 1 commit
    • H
      perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info · f3b3614a
      Hari Bathini committed
      Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
      by the kernel when fork, clone, setns or unshare are invoked. Also
      update the perf-record documentation to describe the new option for
      recording namespace events.
      
      Committer notes:
      
      Combined it with a later patch to allow printing it via 'perf report -D'
      and to be able to test the feature introduced in this patch. Also had to
      move perf_ns__name() here; it was introduced in another later patch.
      
      Also used PRIu64 and PRIx64 to fix the build in some environments wrt:
      
        util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
           ret  += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
                                               ^
      Testing it:
      
        # perf record --namespaces -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
        #
        # perf report -D
        <SNIP>
        3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
                      [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
                       4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
      
        0x1151e0 [0x30]: event: 9
        .
        . ... raw event: size 48 bytes
        .  0000:  09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00  ......0..q.h....
        .  0010:  a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00  .9...9...(.c....
        .  0020:  03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00  ................
        <SNIP>
              NAMESPACES events:          1
        <SNIP>
        #
      Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f3b3614a
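The raw event dump in the `perf report -D` output of commit f3b3614a can be read by hand. Every perf record starts with `struct perf_event_header` (a 32-bit type, a 16-bit misc field and a 16-bit size). The sketch below decodes the first eight bytes of the dumped event, assuming a little-endian recording machine; the variable names are chosen for the example.

```python
import struct

# First 8 bytes of the raw event at offset 0000 in the dump above.
raw_header = bytes.fromhex("09 00 00 00 02 00 30 00")

# struct perf_event_header { __u32 type; __u16 misc; __u16 size; }
# "<IHH" assumes the data was recorded on a little-endian machine.
event_type, misc, size = struct.unpack("<IHH", raw_header)

print(f"type={event_type} misc={misc:#x} size={size}")
```

The decoded values agree with the surrounding dump lines: `event: 9` and `raw event: size 48 bytes` (0x30).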
  26. 17 Jan, 2017 1 commit
  27. 06 Dec, 2016 1 commit
    • W
      perf clang: Add builtin clang support and test case · 00b86691
      Wang Nan committed
      Add basic clang support in clang.cpp and test__clang() testcase. The
      first testcase checks if builtin clang is able to generate LLVM IR.
      
      tests/clang.c is a proxy. The real test case resides in
      utils/c++/clang-test.cpp, is written in C++, and exports a C interface
      to the perf test subsystem.
      
      Test result:
      
         $ perf test -v clang
         51: builtin clang support                               :
         51.1: Test builtin clang compile C source to IR              :
         --- start ---
         test child forked, pid 13215
         test child finished with 0
         ---- end ----
         Test builtin clang support subtest 0: Ok
      
      Committer note:
      
      Make sure you've enabled CLANG and LLVM builtin support by setting
      the LIBCLANGLLVM variable on the make command line, e.g.:
      
        make LIBCLANGLLVM=1 O=/tmp/build/perf -C tools/perf install-bin
      
      Otherwise you'll get this when trying to do the 'perf test' call above:
      
        # perf test clang
        51: builtin clang support                      : Skip (not compiled in)
        #
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Joe Stringer <joe@ovn.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/20161126070354.141764-11-wangnan0@huawei.com
      [ Removed "Test" from descriptions, redundant and already removed from all the other entries ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      00b86691
  28. 02 Dec, 2016 1 commit
  29. 29 Nov, 2016 1 commit
    • W
      perf tools: Introduce perf hooks · a074865e
      Wang Nan committed
      Perf hooks allow hooking user code at perf events. They can be used for
      manipulating BPF maps, taking snapshots and reporting results. In this
      patch two perf hook points are introduced: record_start and record_end.
      
      To guard against buggy user actions, a SIGSEGV signal handler is
      introduced into 'perf record'. It turns off a perf hook if the hook
      causes a segfault and reports an error to help debugging.
      
      A test case for perf hook is introduced.
      
      Test result:
        $ ./buildperf/perf test -v hook
        50: Test perf hooks                                          :
        --- start ---
        test child forked, pid 10311
        SIGSEGV is observed as expected, try to recover.
        Fatal error (SEGFAULT) in perf hook 'test'
        test child finished with 0
        ---- end ----
        Test perf hooks: Ok
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Joe Stringer <joe@ovn.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/20161126070354.141764-5-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a074865e
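The hook mechanism described in commit a074865e (named hook points, and a faulting hook being switched off with an error report) can be sketched as a small registry. This is an analogy rather than the perf implementation: perf is written in C and recovers from SIGSEGV in a signal handler, while here a Python exception stands in for the fault. The class name `PerfHooks` and its methods are hypothetical; only the hook point names record_start and record_end come from the commit message.

```python
class PerfHooks:
    """Registry of user callbacks keyed by hook point (illustrative only)."""

    POINTS = ("record_start", "record_end")

    def __init__(self):
        self._hooks = {}
        self.errors = []

    def set_hook(self, point, func):
        if point not in self.POINTS:
            raise ValueError(f"unknown hook point {point!r}")
        self._hooks[point] = func

    def invoke(self, point):
        func = self._hooks.get(point)
        if func is None:
            return
        try:
            func()
        except Exception as exc:
            # Stand-in for the SIGSEGV recovery path: disable the faulty
            # hook and record an error instead of taking the tool down.
            del self._hooks[point]
            self.errors.append(
                f"Fatal error ({type(exc).__name__}) in perf hook '{point}'"
            )


calls = []


def broken_hook():
    calls.append("record_start")
    raise RuntimeError("simulated fault")


hooks = PerfHooks()
hooks.set_hook("record_start", broken_hook)
hooks.invoke("record_start")  # hook faults and is disabled
hooks.invoke("record_start")  # no-op: the hook was turned off
hooks.invoke("record_end")    # no-op: nothing registered
```

Disabling the hook after the first fault keeps one broken user callback from repeatedly interrupting a long recording session.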
  30. 24 Oct, 2016 1 commit
  31. 22 Sep, 2016 1 commit
  32. 09 Sep, 2016 1 commit
    • P
      perf annotate: Add branch stack / basic block · 70fbe057
      Peter Zijlstra committed
      I wanted to know the hottest path through a function and figured the
      branch-stack (LBR) information should be able to help out with that.
      
      The below uses the branch-stack to create basic blocks and generate
      statistics from them.
      
              from    to              branch_i
              * ----> *
                      |
                      | block
                      v
                      * ----> *
                      from    to      branch_i+1
      
      The blocks are broken down into non-overlapping ranges, while tracking
      if the start of each range is an entry point and/or the end of a range
      is a branch.
      
      Each block iterates all ranges it covers (while splitting where required
      to exactly match the block) and increments the 'coverage' count.
      
      For the range including the branch we increment the taken counter, as
      well as the pred counter if flags.predicted.
      
      Using these numbers, we can determine whether an instruction:
      
       - had coverage; given by:
      
              br->coverage / br->sym->max_coverage
      
         This metric ensures each symbol has a 100% spot, which reflects the
         observation that each symbol must have a most covered/hottest
         block.
      
       - is a branch target: br->is_target && br->start == addr
      
       - for targets, how much of a branch's coverage comes from it:
      
      	target->entry / branch->coverage
      
       - is a branch: br->is_branch && br->end == addr
      
       - for branches, how often it was taken:
      
              br->taken / br->coverage
      
         after all, all execution that didn't take the branch would have
         incremented the coverage and continued onward to a later branch.
      
       - for branches, how often it was predicted:
      
              br->pred / br->taken
      
      The coverage percentage is used to color the address and asm sections;
      for low (<1%) coverage we use NORMAL (uncolored), indicating that these
      instructions are not 'important'. For high coverage (>75%) we color the
      address RED.
      
      For each branch, we add an asm comment after the instruction with
      information on how often it was taken and predicted.
      
      The output looks like this (sans color, which does lose a lot of the
      information :/):
      
      $ perf record --branch-filter u,any -e cycles:p ./branches 27
      $ perf annotate branches
      
       Percent |	Source code & Disassembly of branches for cycles:pu (217 samples)
      ---------------------------------------------------------------------------------
               :	branches():
          0.00 :	  40057a:       push   %rbp
          0.00 :	  40057b:       mov    %rsp,%rbp
          0.00 :	  40057e:       sub    $0x20,%rsp
          0.00 :	  400582:       mov    %rdi,-0x18(%rbp)
          0.00 :	  400586:       mov    %rsi,-0x20(%rbp)
          0.00 :	  40058a:       mov    -0x18(%rbp),%rax
          0.00 :	  40058e:       mov    %rax,-0x10(%rbp)
          0.00 :	  400592:       movq   $0x0,-0x8(%rbp)
          0.00 :	  40059a:       jmpq   400656 <branches+0xdc>
          1.84 :	  40059f:       mov    -0x10(%rbp),%rax	# +100.00%
          3.23 :	  4005a3:       and    $0x1,%eax
          1.84 :	  4005a6:       test   %rax,%rax
          0.00 :	  4005a9:       je     4005bf <branches+0x45>	# -54.50% (p:42.00%)
          0.46 :	  4005ab:       mov    0x200bbe(%rip),%rax        # 601170 <acc>
         12.90 :	  4005b2:       add    $0x1,%rax
          2.30 :	  4005b6:       mov    %rax,0x200bb3(%rip)        # 601170 <acc>
          0.46 :	  4005bd:       jmp    4005d1 <branches+0x57>	# -100.00% (p:100.00%)
          0.92 :	  4005bf:       mov    0x200baa(%rip),%rax        # 601170 <acc>	# +49.54%
         13.82 :	  4005c6:       sub    $0x1,%rax
          0.46 :	  4005ca:       mov    %rax,0x200b9f(%rip)        # 601170 <acc>
          2.30 :	  4005d1:       mov    -0x10(%rbp),%rax	# +50.46%
          0.46 :	  4005d5:       mov    %rax,%rdi
          0.46 :	  4005d8:       callq  400526 <lfsr>	# -100.00% (p:100.00%)
          0.00 :	  4005dd:       mov    %rax,-0x10(%rbp)	# +100.00%
          0.92 :	  4005e1:       mov    -0x18(%rbp),%rax
          0.00 :	  4005e5:       and    $0x1,%eax
          0.00 :	  4005e8:       test   %rax,%rax
          0.00 :	  4005eb:       je     4005ff <branches+0x85>	# -100.00% (p:100.00%)
          0.00 :	  4005ed:       mov    0x200b7c(%rip),%rax        # 601170 <acc>
          0.00 :	  4005f4:       shr    $0x2,%rax
          0.00 :	  4005f8:       mov    %rax,0x200b71(%rip)        # 601170 <acc>
          0.00 :	  4005ff:       mov    -0x10(%rbp),%rax	# +100.00%
          7.37 :	  400603:       and    $0x1,%eax
          3.69 :	  400606:       test   %rax,%rax
          0.00 :	  400609:       jne    400612 <branches+0x98>	# -59.25% (p:42.99%)
          1.84 :	  40060b:       mov    $0x1,%eax
         14.29 :	  400610:       jmp    400617 <branches+0x9d>	# -100.00% (p:100.00%)
          1.38 :	  400612:       mov    $0x0,%eax	# +57.65%
         10.14 :	  400617:       test   %al,%al	# +42.35%
          0.00 :	  400619:       je     40062f <branches+0xb5>	# -57.65% (p:100.00%)
          0.46 :	  40061b:       mov    0x200b4e(%rip),%rax        # 601170 <acc>
          2.76 :	  400622:       sub    $0x1,%rax
          0.00 :	  400626:       mov    %rax,0x200b43(%rip)        # 601170 <acc>
          0.46 :	  40062d:       jmp    400641 <branches+0xc7>	# -100.00% (p:100.00%)
          0.92 :	  40062f:       mov    0x200b3a(%rip),%rax        # 601170 <acc>	# +56.13%
          2.30 :	  400636:       add    $0x1,%rax
          0.92 :	  40063a:       mov    %rax,0x200b2f(%rip)        # 601170 <acc>
          0.92 :	  400641:       mov    -0x10(%rbp),%rax	# +43.87%
          2.30 :	  400645:       mov    %rax,%rdi
          0.00 :	  400648:       callq  400526 <lfsr>	# -100.00% (p:100.00%)
          0.00 :	  40064d:       mov    %rax,-0x10(%rbp)	# +100.00%
          1.84 :	  400651:       addq   $0x1,-0x8(%rbp)
          0.92 :	  400656:       mov    -0x8(%rbp),%rax
          5.07 :	  40065a:       cmp    -0x20(%rbp),%rax
          0.00 :	  40065e:       jb     40059f <branches+0x25>	# -100.00% (p:100.00%)
          0.00 :	  400664:       nop
          0.00 :	  400665:       leaveq
          0.00 :	  400666:       retq
      
      (Note: the --branch-filter u,any was used to avoid spurious target and
      branch points due to interrupts/faults; they show up as very small -/+
      annotations on 'weird' locations)
      
      Committer note:
      
      Please take a look at:
      
        http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png
      
      To see the colors.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      [ Moved sym->max_coverage to 'struct annotate', aka symbol__annotate(sym) ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      70fbe057
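The range splitting and the ratios described in commit 70fbe057 can be sketched offline as follows. This is a simplified illustration, not the perf code: it takes all blocks at once instead of splitting ranges incrementally, the function name `analyse` and the dictionary layout are assumptions, and the 'entry' ratio is computed per range as an approximation of target->entry / branch->coverage. Each block is given as (start, end, predicted), where `end` is the address of the taken branch that terminates it.

```python
from bisect import bisect_left
from collections import Counter


def analyse(blocks):
    """Split blocks into non-overlapping ranges and compute per-range ratios."""
    blocks = list(blocks)
    # Cut points: every block start and the address just past every block end.
    cuts = sorted({s for s, _, _ in blocks} | {e + 1 for _, e, _ in blocks})
    coverage, entry, taken, pred = Counter(), Counter(), Counter(), Counter()

    for start, end, predicted in blocks:
        first = bisect_left(cuts, start)
        last = bisect_left(cuts, end + 1)
        for k in range(first, last):
            coverage[cuts[k]] += 1      # every range the block covers
        entry[start] += 1               # block start is an entry point
        taken[end] += 1                 # block end is a taken branch
        pred[end] += bool(predicted)    # ... that may have been predicted

    max_coverage = max(coverage.values())
    ranges = {}
    for k in range(len(cuts) - 1):
        start, end = cuts[k], cuts[k + 1] - 1
        cov = coverage[start]
        if cov == 0:
            continue  # gap between blocks, never executed in this data
        ranges[(start, end)] = {
            "coverage": cov / max_coverage,   # br->coverage / max_coverage
            "entry": entry[start] / cov,
            "taken": taken[end] / cov,        # br->taken / br->coverage
            "pred": pred[end] / taken[end] if taken[end] else None,
        }
    return ranges


# Synthetic data: three runs leave via the branch at 0x20, one falls through
# to the branch at 0x30.
example = [(0x10, 0x20, True), (0x10, 0x20, True),
           (0x10, 0x20, False), (0x10, 0x30, True)]
stats = analyse(example)
```

With this data the range 0x10..0x20 is the hottest (coverage 100%), its branch is taken 75% of the time, and the fall-through range 0x21..0x30 has 25% coverage.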
  33. 01 Sep, 2016 1 commit
  34. 28 Jul, 2016 1 commit
  35. 13 Jul, 2016 2 commits
    • D
      perf symbols: Add Rust demangling · cae15db7
      David Tolnay committed
      Rust demangling is another step after bfd demangling. Add a diagnosis to
      identify mangled Rust symbols based on the hash that the Rust mangler appends
      as the last path component, as well as other characteristics.  Add a demangler
      to reconstruct the original symbol.
      
      Committer notes:
      
      How I tested it:
      
      Enabled COPR on Fedora 24 and then installed the 'rust-binary' package,
      with it:
      
        $ cat src/main.rs
        fn main() {
            println!("Hello, world!");
        }
        $ cat Cargo.toml
        [package]
      
        name = "hello_world"
        version = "0.0.1"
        authors = [ "Arnaldo Carvalho de Melo <acme@kernel.org>" ]
      
        $ perf record cargo bench
         Compiling hello_world v0.0.1 (file:///home/acme/projects/hello_world)
           Running target/release/hello_world-d4b9dab4b2a47d75
      
        running 0 tests
      
        test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
      
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.096 MB perf.data (1457 samples) ]
        $
      
      Before this patch:
      
        $ perf report --stdio --dsos librbml-e8edd0fd.so
        # dso: librbml-e8edd0fd.so
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:u'
        # Event count (approx.): 979599126
        #
        # Overhead  Command  Symbol
        # ........  .......  .............................................................................................................
        #
             1.78%  rustc    [.] rbml::reader::maybe_get_doc::hb9d387df6024b15b
             1.50%  rustc    [.] _$LT$reader..DocsIterator$LT$$u27$a$GT$$u20$as$u20$std..iter..Iterator$GT$::next::hd9af9e60d79a35c8
             1.20%  rustc    [.] rbml::reader::doc_at::hc88107fba445af31
             0.46%  rustc    [.] _$LT$reader..TaggedDocsIterator$LT$$u27$a$GT$$u20$as$u20$std..iter..Iterator$GT$::next::h0cb40e696e4bb489
             0.35%  rustc    [.] rbml::reader::Decoder::_next_int::h66eef7825a398bc3
             0.29%  rustc    [.] rbml::reader::Decoder::_next_sub::h8e5266005580b836
             0.15%  rustc    [.] rbml::reader::get_doc::h094521c645459139
             0.14%  rustc    [.] _$LT$reader..Decoder$LT$$u27$doc$GT$$u20$as$u20$serialize..Decoder$GT$::read_u32::h0acea2fff9669327
             0.07%  rustc    [.] rbml::reader::Decoder::next_doc::h6714d469c9dfaf91
             0.07%  rustc    [.] _ZN4rbml6reader10doc_as_u6417h930b740aa94f1d3aE@plt
             0.06%  rustc    [.] _fini
        $
      
      After:
      
        $ perf report --stdio --dsos librbml-e8edd0fd.so
        # dso: librbml-e8edd0fd.so
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:u'
        # Event count (approx.): 979599126
        #
        # Overhead  Command  Symbol
        # ........  .......  .................................................................
        #
           1.78%  rustc    [.] rbml::reader::maybe_get_doc
           1.50%  rustc    [.] <reader::DocsIterator<'a> as std::iter::Iterator>::next
           1.20%  rustc    [.] rbml::reader::doc_at
           0.46%  rustc    [.] <reader::TaggedDocsIterator<'a> as std::iter::Iterator>::next
           0.35%  rustc    [.] rbml::reader::Decoder::_next_int
           0.29%  rustc    [.] rbml::reader::Decoder::_next_sub
           0.15%  rustc    [.] rbml::reader::get_doc
           0.14%  rustc    [.] <reader::Decoder<'doc> as serialize::Decoder>::read_u32
           0.07%  rustc    [.] rbml::reader::Decoder::next_doc
           0.07%  rustc    [.] _ZN4rbml6reader10doc_as_u6417h930b740aa94f1d3aE@plt
           0.06%  rustc    [.] _fini
        $
      Signed-off-by: David Tolnay <dtolnay@gmail.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/5780B7FA.3030602@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cae15db7
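The transformation visible in the before/after output of commit cae15db7 can be sketched in a few lines. This is a simplified illustration, not the C demangler added by the commit: it only recognises the trailing `::h` + 16 hex digit hash and the escapes that appear in the sample output (`$LT$`, `$GT$`, `$uXX$`, and `..` as a path separator), and the name `rust_demangle` is chosen for the example. The input is expected to be the symbol as it looks after the earlier bfd demangling step.

```python
import re

HASH_SUFFIX = re.compile(r"::h[0-9a-f]{16}$")
ESCAPE = re.compile(r"\$(LT|GT|u[0-9a-f]+)\$")


def rust_demangle(symbol):
    """Return a readable name, or `symbol` unchanged if it lacks a Rust hash."""
    if not HASH_SUFFIX.search(symbol):
        return symbol  # no hash component: treat as a non-Rust symbol

    name = HASH_SUFFIX.sub("", symbol)
    if name.startswith("_$"):
        name = name[1:]  # a leading '_' guards a name that starts with an escape

    def unescape(match):
        code = match.group(1)
        if code == "LT":
            return "<"
        if code == "GT":
            return ">"
        return chr(int(code[1:], 16))  # $u27$ -> "'", $u20$ -> " "

    name = ESCAPE.sub(unescape, name)
    return name.replace("..", "::")


mangled = ("_$LT$reader..DocsIterator$LT$$u27$a$GT$$u20$as$u20$"
           "std..iter..Iterator$GT$::next::hd9af9e60d79a35c8")
print(rust_demangle(mangled))
```

Symbols without the hash component, such as `_fini` or the `@plt` entry in the report, pass through untouched, which matches the "After" listing.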
    • A
      perf tools: Uninline scnprintf() and vscnprintf() · d0761e37
      Arnaldo Carvalho de Melo committed
      They were in tools/include/linux/kernel.h, requiring that it in turn
      included stdio.h, which is way too heavy.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-855h8olnkot9v0dajuee1lo3@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d0761e37