1. 18 8月, 2017 2 次提交
  2. 16 8月, 2017 1 次提交
    • W
      perf bpf: Fix endianness problem when loading parameters in prologue · db26984a
      Wang Nan 提交于
      Perf's BPF prologue generator unconditionally fetches 8 bytes for
      function parameters, which causes problems on big endian machines. Thomas
      gives a detailed analysis for this problem:
      
       http://lkml.kernel.org/r/968ebda5-abe4-8830-8d69-49f62529d151@linux.vnet.ibm.com
      
       ---- 8< ----
        I investigated perf test BPF for s390x and have a question regarding
        the 38.3 subtest (bpf-prologue test) which fails on s390x.
      
        When I turn on trace_printk in tests/bpf-script-test-prologue.c
        I see this output in /sys/kernel/debug/tracing/trace:
      
        [root@s8360047 perf]# cat /sys/kernel/debug/tracing/trace
        perf-30229 [000] d..2 170161.535791: : f_mode 2001d00000000 offset:0 orig:0
        perf-30229 [000] d..2 170161.535809: : f_mode 6001f00000000 offset:0 orig:0
        perf-30229 [000] d..2 170161.535815: : f_mode 6001f00000000 offset:1 orig:0
        perf-30229 [000] d..2 170161.535819: : f_mode 2001d00000000 offset:1 orig:0
        perf-30229 [000] d..2 170161.535822: : f_mode 2001d00000000 offset:2 orig:1
        perf-30229 [000] d..2 170161.535825: : f_mode 6001f00000000 offset:2 orig:1
        perf-30229 [000] d..2 170161.535828: : f_mode 6001f00000000 offset:3 orig:1
        perf-30229 [000] d..2 170161.535832: : f_mode 2001d00000000 offset:3 orig:1
        perf-30229 [000] d..2 170161.535835: : f_mode 2001d00000000 offset:4 orig:0
        perf-30229 [000] d..2 170161.535841: : f_mode 6001f00000000 offset:4 orig:0
      
        [...]
      
        There are 3 parameters the eBPF program tests/bpf-script-test-prologue.c
        accesses: f_mode (member of struct file at offset 140) offset and orig.  They
        are parameters of the lseek() system call triggered in this test case in
        function llseek_loop().
      
        What is really strange is the value of f_mode. It is an 8 byte value, whereas
        in the probe event it is defined as a 4 byte value.  The lower 4 bytes are all
        zero and do not belong to member f_mode.  The correct value should be 2001d for
        read-only and 6001f for read-write open mode.
      
        Here is the output of the 'perf test -vv bpf' trace:
        Try to find probe point from debuginfo.
        Matched function: null_lseek [2d9310d]
         Probe point found: null_lseek+0
        Searching 'file' variable in context.
        Converting variable file into trace event.
        converting f_mode in file
        f_mode type is unsigned int.
        Opening /sys/kernel/debug/tracing//README write=0
        Searching 'offset' variable in context.
        Converting variable offset into trace event.
        offset type is long long int.
        Searching 'orig' variable in context.
        Converting variable orig into trace event.
        orig type is int.
        Found 1 probe_trace_events.
        Opening /sys/kernel/debug/tracing//kprobe_events write=1
        Writing event: p:perf_bpf_probe/func _text+8794224 f_mode=+140(%r2):x32
       ---- 8< ----
      
      This patch parses the type of each argument and converts data from memory to
      expected type.
      
      Now the test runs successfully on 4.13.0-rc5:
      
        [root@s8360046 perf]# ./perf test  bpf
        38: BPF filter                                 :
        38.1: Basic BPF filtering                      : Ok
        38.2: BPF pinning                              : Ok
        38.3: BPF prologue generation                  : Ok
        38.4: BPF relocation checker                   : Ok
        [root@s8360046 perf]#
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20170815092159.31912-1-tmricht@linux.vnet.ibm.comSigned-off-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      db26984a
  3. 12 8月, 2017 4 次提交
    • T
      perf report: Fix module symbol adjustment for s390x · 4a084ecf
      Thomas Richter 提交于
      The 'perf report' tool does not display the addresses of kernel module
      symbols correctly.
      
      For example symbol qeth_send_ipa_cmd in kernel module qeth.ko has this
      relative address for function qeth_send_ipa_cmd():
      
        [root@s8360047 linux]# nm -g drivers/s390/net/qeth.ko | fgrep send_ipa_cmd
        0000000000013088 T qeth_send_ipa_cmd
      
      The module is loaded at address:
      
        [root@s8360047 linux]# cat /sys/module/qeth/sections/.text
        0x000003ff80296d20
        [root@s8360047 linux]#
      
      This should result in a start address of:
      
        0x13088 + 0x3ff80296d20 = 0x3ff802a9da8
      
      Using crash to verify the address on a live system:
      
        [root@s8360046 linux]# crash vmlinux
      
        crash 7.1.9++
        Copyright (C) 2002-2016  Red Hat, Inc.
        Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
      
        [...]
      
        crash> mod -s qeth drivers/s390/net/qeth.ko
             MODULE       NAME        SIZE  OBJECT FILE
             3ff8028d700  qeth      151552  drivers/s390/net/qeth.ko
        crash> sym qeth_send_ipa_cmd
        3ff802a9da8 (T) qeth_send_ipa_cmd [qeth] /root/linux/drivers/s390/net/qeth_core_main.c: 2944
        crash>
      
      Now perf report displays the address of symbol qeth_send_ipa_cmd:
      symbol__new:
      
        qeth_send_ipa_cmd 0x130f0-0x132ce
      
      There is a difference of 0x68 between the entry in the symbol table (see
      nm command above) and perf. The difference is from the offset the .text
      segment of qeth.ko:
      
        [root@s8360047 perf]# readelf -a drivers/s390/net/qeth.ko
        Section Headers:
        [Nr] Name              Type             Address           Offset
             Size              EntSize          Flags  Link  Info  Align
        [ 0]                   NULL             0000000000000000  00000000
             0000000000000000  0000000000000000           0     0     0
        [ 1] .note.gnu.build-i NOTE             0000000000000000  00000040
             0000000000000024  0000000000000000   A       0     0     4
        [ 2] .text             PROGBITS         0000000000000000  00000068
             000000000001c8a0  0000000000000000  AX       0     0     8
      
      As seen the .text segment has an offset of 0x68 with start address 0x0.
      Therefore 0x68 is added to the address of qeth_send_ipa_cmd and thus
      0x13088 + 0x68 = 0x130f0 is displayed.
      
      This is wrong, perf report needs to display the start address of symbol
      qeth_send_ipa_cmd at 0x13088 + qeth.ko.text section start address.
      
      The qeth.ko module .text start address is available in the qeth.ko DSO
      map. Just identify the kernel module symbols and correct the addresses.
      
      With the fix I see this correct address for symbol: symbol__new:
      qeth_send_ipa_cmd 0x3ff802a9da8-0x3ff802a9f86
      Signed-off-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      LPU-Reference: 20170803134902.47207-1-tmricht@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/n/tip-q8lktlpoxb5e3dj52u1s1rw4@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4a084ecf
    • T
      perf record: Fix wrong size in perf_record_mmap for last kernel module · 9ad4652b
      Thomas Richter 提交于
      During work on perf report for s390 I ran into the following issue:
      
      0 0x318 [0x78]: PERF_RECORD_MMAP -1/0:
              [0x3ff804d6990(0xfffffc007fb2966f) @ 0]:
              x /lib/modules/4.12.0perf1+/kernel/drivers/s390/net/qeth_l2.ko
      
      This is a PERF_RECORD_MMAP entry of the perf.data file with an invalid
      module size for qeth_l2.ko (the s390 ethernet device driver).
      
      Even a mainframe does not have 0xfffffc007fb2966f bytes of main memory.
      
      It turned out that this wrong size is created by the perf record
      command.  What happens is this function call sequence from
      __cmd_record():
      
        perf_session__new():
          perf_session__create_kernel_maps():
            machine__create_kernel_maps():
              machine__create_modules():   Creates map for all loaded kernel modules.
                modules__parse():   Reads /proc/modules and extracts module name and
                                    load address (1st and last column)
                  machine__create_module():   Called for every module found in /proc/modules.
                                    Creates a new map for every module found and enters
                                    module name and start address into the map. Since the
                                    module end address is unknown it is set to zero.
      
      This ends up with a kernel module map list sorted by module start
      addresses.  All module end addresses are zero.
      
      Last machine__create_kernel_maps() calls function map_groups__fixup_end().
      This function iterates through the maps and assigns each map entry's
      end address the successor map entry start address. The last entry of the
      map group has no successor, so ~0 is used as end to consume the remaining
      memory.
      
      Later __cmd_record calls function record__synthesize() which in turn calls
      perf_event__synthesize_kernel_mmap() and perf_event__synthesize_modules()
      to create PERF_REPORT_MMAP entries into the perf.data file.
      
      On s390 this results in the last module qeth_l2.ko
      (which has highest start address, see module table:
              [root@s8360047 perf]# cat /proc/modules
              qeth_l2 86016 1 - Live 0x000003ff804d6000
              qeth 266240 1 qeth_l2, Live 0x000003ff80296000
              ccwgroup 24576 1 qeth, Live 0x000003ff80218000
              vmur 36864 0 - Live 0x000003ff80182000
              qdio 143360 2 qeth_l2,qeth, Live 0x000003ff80002000
              [root@s8360047 perf]# )
      to be the last entry and its map has an end address of ~0.
      
      When the PERF_RECORD_MMAP entry is created for kernel module qeth_l2.ko
      its start address and length is written. The length is calculated in line:
          event->mmap.len   = pos->end - pos->start;
      and results in 0xffffffffffffffff - 0x3ff804d6990(*) = 0xfffffc007fb2966f
      
      (*) On s390 the module start address is actually determined by a __weak function
      named arch__fix_module_text_start() in machine__create_module().
      
      I think this improvable. We can use the module size (2nd column of /proc/modules)
      to get each loaded kernel module size and calculate its end address.
      Only for map entries which do not have a valid end address (end is still zero)
      we can use the heuristic we have now, that is use successor start address or ~0.
      Signed-off-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      LPU-Reference: 20170803134902.47207-2-tmricht@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/n/tip-nmoqij5b5vxx7rq2ckwu8iaj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9ad4652b
    • M
      perf srcline: Do not consider empty files as valid srclines · d964b1cd
      Milian Wolff 提交于
      Sometimes we get a non-null, but empty, string for the filename from
      bfd. This then results in srclines of the form ":0", which is different
      from the canonical SRCLINE_UNKNOWN in the form "??:0".  Set the file to
      NULL if it is empty to fix this.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170806212446.24925-14-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d964b1cd
    • M
      perf util: Take elf_name as const string in dso__demangle_sym · 80c345b2
      Milian Wolff 提交于
      The input string is not modified and thus can be passed in as a pointer
      to const data.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170806212446.24925-3-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      80c345b2
  4. 11 8月, 2017 2 次提交
  5. 29 7月, 2017 2 次提交
    • G
      perf data: Add mmap[2] events to CTF conversion · f9f6f2a9
      Geneviève Bastien 提交于
      This adds the mmap and mmap2 events to the CTF trace obtained from perf
      data.
      
      These events will allow CTF trace visualization tools like Trace Compass
      to automatically resolve the symbols of the callchain to the
      corresponding function or origin library.
      
      To include those events, one needs to convert with the --all option.
      Here follows an output of babeltrace:
      
        $ sudo perf data convert --all --to-ctf myctftrace
        $ babeltrace ./myctftrace
        [19:00:00.000000000] (+0.000000000) perf_mmap2: { cpu_id = 0 },
       { pid = 638, tid = 638, start = 0x7F54AE39E000, filename =
       "/usr/lib/ld-2.25.so" }
        [19:00:00.000000000] (+0.000000000) perf_mmap2: { cpu_id = 0 }, { pid =
       638, tid = 638, start = 0x7F54AE565000, filename =
       "/usr/lib/libudev.so.1.6.6" }
        [19:00:00.000000000] (+0.000000000) perf_mmap2: { cpu_id = 0 }, { pid =
       638, tid = 638, start = 0x7FFC093EA000, filename = "[vdso]" }
      Signed-off-by: NGeneviève Bastien <gbastien@versatic.net>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
      Cc: Julien Desfossez <jdesfossez@efficios.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170727181205.24843-2-gbastien@versatic.netSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f9f6f2a9
    • G
      perf data: Add callchain to CTF conversion · a3073c8e
      Geneviève Bastien 提交于
      The field perf_callchain, if available, is added to the sampling events
      during the CTF conversion. It is an array of u64 values.  The
      perf_callchain_size field contains the size of the array.
      
      It will allow the analysis of sampling data in trace visualization tools
      like Trace Compass. Possible analyses with those data: dynamic
      flamegraphs, correlation with other tracing data like a userspace trace.
      
      Here follows a babeltrace CTF output of a trace with callchain:
      
        $ babeltrace ./myctftrace
        [17:38:45.672760285] (+?.?????????) cycles:ppp: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF81063EE4, perf_tid = 25841, perf_pid = 25774, perf_period = 1, perf_callchain_size = 7, perf_callchain = [ [0] = 0xFFFFFFFFFFFFFF80, [1] = 0xFFFFFFFF81063EE4, [2] = 0xFFFFFFFF8100C770, [3] = 0xFFFFFFFF81006EC6, [4] = 0xFFFFFFFF8118245E, [5] = 0xFFFFFFFF810A9224, [6] = 0xFFFFFFFF8164A4C6 ] }
        [17:38:45.672777672] (+0.000017387) cycles:ppp: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF81063EE4, perf_tid = 25841, perf_pid = 25774, perf_period = 1, perf_callchain_size = 8, perf_callchain = [ [0] = 0xFFFFFFFFFFFFFF80, [1] = 0xFFFFFFFF81063EE4, [2] = 0xFFFFFFFF8100C770, [3] = 0xFFFFFFFF81006EC6, [4] = 0xFFFFFFFF8118245E, [5] = 0xFFFFFFFF810A9224, [6] = 0xFFFFFFFF8164A4C6, [7] = 0xFFFFFFFF8164ABAD ] }
        [17:38:45.672786700] (+0.000009028) cycles:ppp: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF81063EE4, perf_tid = 25841, perf_pid = 25774, perf_period = 70, perf_callchain_size = 3, perf_callchain = [ [0] = 0xFFFFFFFFFFFFFF80, [1] = 0xFFFFFFFF81063EE4, [2] = 0xFFFFFFFF8100C770 ] }
      Signed-off-by: NGeneviève Bastien <gbastien@versatic.net>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
      Cc: Julien Desfossez <jdesfossez@efficios.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170727181205.24843-1-gbastien@versatic.net
      [ Removed PERF_SAMPLE_CALLCHAIN from the TODO list, jolsa ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a3073c8e
  6. 28 7月, 2017 1 次提交
  7. 27 7月, 2017 5 次提交
  8. 26 7月, 2017 8 次提交
  9. 25 7月, 2017 3 次提交
    • J
      perf evsel: Add verbose output for sys_perf_event_open fallback · 2b04e0f8
      Jiri Olsa 提交于
      Adding info about what is being switched off in the sys_perf_event_open
      fallback.
      
      New output (notice the 'switching off' lines):
      
        $ perf stat -e '{cycles,instructions}' -vvv ls
        Using CPUID GenuineIntel-6-3D
        intel_pt default config: tsc
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
          disabled                         1
          inherit                          1
          enable_on_exec                   1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid 3591  cpu -1  group_fd -1  flags 0x8
        sys_perf_event_open failed, error -22
        switching off cloexec flag
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
          disabled                         1
          inherit                          1
          enable_on_exec                   1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid 3591  cpu -1  group_fd -1  flags 0
        sys_perf_event_open failed, error -22
        switching off exclude_guest, exclude_host
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
          disabled                         1
          inherit                          1
          enable_on_exec                   1
        ------------------------------------------------------------
        sys_perf_event_open: pid 3591  cpu -1  group_fd -1  flags 0
        sys_perf_event_open failed, error -22
        switching off sample_id_all
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
          disabled                         1
          inherit                          1
          enable_on_exec                   1
        ...
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170721121212.21414-2-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2b04e0f8
    • A
      perf cgroup: Fix refcount usage · cd8dd032
      Arnaldo Carvalho de Melo 提交于
      When converting from atomic_t to refcount_t we didn't follow the usual
      step of initializing it to one before taking any new reference, which
      trips over checking if taking a reference for a freed refcount_t, fix
      it.
      
      Brendan's report:
      
       ---
      It's 4.12-rc7, with node v4.4.1. I'm building 4.13-rc1 now, as I hit
      what I think is another unrelated perf bug and I'm starting to wonder
      what else is broken on that version:
      
      (root) /mnt/src/linux-4.12-rc7/tools/perf # ./perf record -F 99 -a -e
      cpu-clock --cgroup=docker/f9e9d5df065b14646e8a11edc837a13877fd90c171137b2ba3feb67a0201cb65
      -g
      perf: /mnt/src/linux-4.12-rc7/tools/include/linux/refcount.h:108:
      refcount_inc: Assertion `!(!refcount_inc_not_zero(r))' failed.
      Aborted
      
      that used to work...
       ---
      
      Testing it:
      
      Before:
      
        # perf stat -e cycles -C 0 --cgroup /
        perf: /home/acme/git/linux/tools/include/linux/refcount.h:108: refcount_inc: Assertion `!(!refcount_inc_not_zero(r))' failed.
        Aborted (core dumped)
        #
      
      After:
      
        # perf stat -e cycles -C 0 --cgroup /
      ^C
        Performance counter stats for 'CPU(s) 0':
      
             132,081,393      cycles                    /
      
             2.492942763 seconds time elapsed
      
        #
      Reported-by: NBrendan Gregg <brendan.d.gregg@gmail.com>
      Acked-by: NElena Reshetova <elena.reshetova@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Kees Kook <keescook@chromium.org>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sudeep Holla <Sudeep.Holla@arm.com>
      Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 79c5fe6d ("perf cgroup: Convert cgroup_sel.refcnt from atomic_t to refcount_t")
      Link: http://lkml.kernel.org/n/tip-l7ovfblq14ip2i08m1g0fkhv@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cd8dd032
    • T
      perf annotate stdio: Fix --show-total-period · 585d93c5
      Taeung Song 提交于
      We were showing the total number of samples, not the total period as
      asked by the user, fix it.
      Reported-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Link: http://lkml.kernel.org/n/tip-lh2nh89rtqn5x5vbfthw6qml@git.kernel.org
      Fixes: 0c4a5bce ("perf annotate: Display total number of samples with --show-total-period")
      [ split from a larger patch ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      585d93c5
  10. 21 7月, 2017 5 次提交
  11. 19 7月, 2017 7 次提交
    • J
      perf report: Show branch type in callchain entry · b851dd49
      Jin Yao 提交于
      Show branch type in callchain entry. The branch type is printed
      with other LBR information (such as cycles/abort/...).
      
      For example:
      
        perf record -g -j any,save_type
        perf report --branch-history --stdio --no-children
      
        38.50%  div.c:45                [.] main                    div
                |
                ---main div.c:42 (RET CROSS_2M cycles:2)
                   compute_flag div.c:28 (cycles:2)
                   compute_flag div.c:27 (RET CROSS_2M cycles:1)
                   rand rand.c:28 (cycles:1)
                   rand rand.c:28 (RET CROSS_2M cycles:1)
                   __random random.c:298 (cycles:1)
                   __random random.c:297 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (RET CROSS_2M cycles:9)
      
      Change log
      
      v6: Remove the branch_type_str() since it's moved to branch.c.
      
      v5: Rewrite the branch info print code in util/callchain.c.
      
      v4: Comparing to previous version, the major changes are:
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500379995-6449-8-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b851dd49
    • J
      perf report: Show branch type statistics for stdio mode · 2d78b189
      Jin Yao 提交于
      Show the branch type statistics at the end of perf report --stdio.
      
      For example:
      
        perf report --stdio
      
        COND_FWD:  28.5%
        COND_BWD:   9.4%
        CROSS_4K:   0.7%
        CROSS_2M:  14.1%
            COND:  37.9%
          UNCOND:   0.2%
             IND:   6.7%
            CALL:  26.5%
             RET:  28.7%
          SYSRET:   0.0%
      
        The branch types are:
      
         COND_FWD: conditional forward
         COND_BWD: conditional backward
             COND: conditional branch
           UNCOND: unconditional branch
              IND: indirect
             CALL: function call
           IND_CALL: indirect function call
              RET: function return
          SYSCALL: syscall
           SYSRET: syscall return
        COND_CALL: conditional function call
         COND_RET: conditional function return
      
      CROSS_4K and CROSS_2M:
      
      They are the metrics checking for branches cross 4K or 2MB pages.
      It's an approximate computing. We don't know if the area is 4K or
      2MB, so always compute both.
      
      To make the output simple, if a branch crosses 2M area, CROSS_4K
      will not be incremented.
      
      Change log
      
      v7: Since the common branch type definitions are changed, some
          tags/strings are updated accordingly.
      
      v6: Remove branch_type_stat_display() since it's moved to branch.c.
      
      v5: Remove the unnecessary sort__mode checking in
          hist_iter__branch_callback().
      
      v4: Comparing to previous version, the major changes are:
      
      Add the computing of JCC forward/JCC backward and cross page checking
      by using the from and to addresses.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500379995-6449-7-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2d78b189
    • J
      perf util: Create branch.c/.h for common branch functions · 992c7e92
      Jin Yao 提交于
      Create new util/branch.c and util/branch.h to contain the common branch
      functions. Such as:
      
      branch_type_count(): Count the numbers of branch types
      branch_type_name() : Return the name of branch type
      branch_type_stat_display(): Display branch type statistics info
      branch_type_str(): Construct the branch type string.
      
      The branch type is saved in branch_flags.
      
      Change log:
      
      v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
      
      v7: Since the common branch type name is changed (e.g. JCC->COND),
          this patch is performed the modification accordingly.
      
      v6: Move that multiline conditional code inside {} brackets.
          Move branch_type_stat_display() from builtin-report.c to
            branch.c.
          Move branch_type_str() from callchain.c to branch.c.
      
      v5: It's a new patch in v5 patch series.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500379995-6449-6-git-send-email-yao.jin@linux.intel.com
      [ Don't use 'index' and 'stat' as names for variables, it shadows global decls in older distros ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      992c7e92
    • J
      perf report: Refactor the branch info printing code · 8d51735f
      Jin Yao 提交于
      The branch info such as predicted/cycles/... are printed at the
      callchain entries.
      
      For example: perf report --branch-history --no-children --stdio
      
          --1.07%--main div.c:39 (predicted:52.4% cycles:1 iterations:17)
                    main div.c:44 (predicted:52.4% cycles:1)
                    main div.c:42 (cycles:2)
                    compute_flag div.c:28 (cycles:2)
                    compute_flag div.c:27 (cycles:1)
                    rand rand.c:28 (cycles:1)
                    rand rand.c:28 (cycles:1)
                    __random random.c:298 (cycles:1)
                    __random random.c:297 (cycles:1)
                    __random random.c:295 (cycles:1)
                    __random random.c:295 (cycles:1)
                    __random random.c:295 (cycles:1)
      
      But the current code is difficult to maintain and extend. This patch
      refactors the code for easy maintenance.
      
      Change log:
      
      v6: 1. Put the multiline condition code into {} brackets in
             counts_str_build()
      
          2. Keep the original display order, that is:
             predicted, abort, cycles, iterations
      
      v5: It's a new patch in v5 patch series.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500379995-6449-5-git-send-email-yao.jin@linux.intel.com
      [ Don't use 'index' as a name for a variable, it shadows a globa decl in older distros ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8d51735f
    • J
      perf record: Create a new option save_type in --branch-filter · 60f83fa6
      Jin Yao 提交于
      The option indicates the kernel to save branch type during sampling.
      
      One example:
      
        perf record -g --branch-filter any,save_type <command>
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1500379995-6449-4-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      60f83fa6
    • D
      perf header: Add event desc to pipe-mode header · f9ebdccf
      David Carrillo-Cisneros 提交于
      Add event descriptor to perf header output in pipe-mode.
      
      After this patch:
      
        $ perf record -e cycles sleep 1 | perf report --header
        # ========
        # captured on: Mon Jun  5 22:52:13 2017
        # ========
        #
        # hostname : lphh20
        # os release : 4.3.5-smp-801.43.0.0
        # perf version : 4.12.rc2.g439987
        # arch : x86_64
        # nrcpus online : 72
        # nrcpus avail : 72
        # cpudesc : Intel(R) Xeon(R) CPU E5-2696 v3 @ 2.30GHz
        # cpuid : GenuineIntel,6,63,2
        # total memory : 264134144 kB
        # cmdline : /root/perf record -e cycles sleep 1
        # event : name = cycles, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
        # CPU_TOPOLOGY info available, use -I to display
        # NUMA_TOPOLOGY info available, use -I to display
        # pmu mappings: intel_bts = 6, cpu = 4, msr = 49, uncore_cbox_10 = 36, uncore_cbox_11 = 37, uncore_cbox_12 = 38, uncore_cbox_13 = 39, uncore_cbox_14 = 40, uncore_cbox_15 = 41, uncore_cbox_16 = 42, uncore_cbox_17 = 43, software = 1, power = 7, uncore_irp = 24, uncore_pcu = 48, tracepoint = 2, uncore_imc_0 = 16, uncore_imc_1 = 17, uncore_imc_2 = 18, uncore_imc_3 = 19, uncore_imc_4 = 20, uncore_imc_5 = 21, uncore_imc_6 = 22, uncore_imc_7 = 23, uncore_qpi_0 = 8, uncore_qpi_1 = 9, uncore_cbox_0 = 26, uncore_cbox_1 = 27, uncore_cbox_2 = 28, uncore_cbox_3 = 29, uncore_cbox_4 = 30, uncore_cbox_5 = 31, uncore_cbox_6 = 32, uncore_cbox_7 = 33, uncore_cbox_8 = 34, uncore_cbox_9 = 35, uncore_r2pcie = 13, uncore_r3qpi_0 = 10, uncore_r3qpi_1 = 11, uncore_r3qpi_2 = 12, uncore_sbox_0 = 44, uncore_sbox_1 = 45, uncore_sbox_2 = 46, uncore_sbox_3 = 47, breakpoint = 5, uncore_ha_0 = 14, uncore_ha_1 = 15, uncore_ubox = 25
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB (null) ]
      
      Prior to this patch, event was not printed.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Simon Que <sque@chromium.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20170718042549.145161-17-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f9ebdccf
    • D
      perf tools: Add feature header record to pipe-mode · e9def1b2
      David Carrillo-Cisneros 提交于
      Add header record types to pipe-mode, reusing the functions
      used in file-mode and leveraging the new struct feat_fd.
      
      For alignment, check that synthesized events don't exceed
      pagesize.
      
      Add the perf_event__synthesize_feature event call back to
      process the new header records.
      
      Before this patch:
      
        $ perf record -o - -e cycles sleep 1 | perf report --stdio --header
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB - ]
        ...
      
      After this patch:
        $ perf record -o - -e cycles sleep 1 | perf report --stdio --header
        # ========
        # captured on: Mon May 22 16:33:43 2017
        # ========
        #
        # hostname : my_hostname
        # os release : 4.11.0-dbx-up_perf
        # perf version : 4.11.rc6.g6277c80
        # arch : x86_64
        # nrcpus online : 72
        # nrcpus avail : 72
        # cpudesc : Intel(R) Xeon(R) CPU E5-2696 v3 @ 2.30GHz
        # cpuid : GenuineIntel,6,63,2
        # total memory : 263457192 kB
        # cmdline : /root/perf record -o - -e cycles -c 100000 sleep 1
        # HEADER_CPU_TOPOLOGY info available, use -I to display
        # HEADER_NUMA_TOPOLOGY info available, use -I to display
        # pmu mappings: intel_bts = 6, uncore_imc_4 = 22, uncore_sbox_1 = 47, uncore_cbox_5 = 33, uncore_ha_0 = 16, uncore_cbox
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB - ]
        ...
      
      Support added for the subcommands: report, inject, annotate and script.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Simon Que <sque@chromium.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20170718042549.145161-16-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e9def1b2