1. 30 Aug, 2017 1 commit
    • perf report: Calculate the average cycles of iterations · c4ee0625
      Committed by Jin Yao
      The branch history code has a loop detection function. With this, we can
      get the number of iterations by calculating the removed loops.
      
      It would also be useful to know the average cycles per iteration.
      This patch adds up the cycles in the branch entries of the removed loops
      and saves the result to the next branch entry (e.g. branch entry A).
      
      Finally it will display the iteration number and average cycles at the
      "from" of branch entry A.
      
      For example:
      perf record -g -j any,save_type ./div
      perf report --branch-history --no-children --stdio
      
      --22.63%--main div.c:42 (RET CROSS_2M)
                compute_flag div.c:28 (cycles:2 iter:173115 avg_cycles:2)
                |
                 --10.73%--compute_flag div.c:27 (RET CROSS_2M)
                           rand rand.c:28 (cycles:1)
                           rand rand.c:28 (RET CROSS_2M)
                           __random random.c:298 (cycles:1)
                           __random random.c:297 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (RET CROSS_2M)
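      
      The accumulation described above can be sketched as follows. This is a
      minimal illustration rather than the patch itself: struct branch_entry,
      fold_removed_loop() and avg_cycles() are hypothetical names, and the
      average is assumed to be the summed cycles divided by the iteration count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified branch entry; not perf's real data structure. */
struct branch_entry {
	uint64_t iter_count;  /* number of removed loops (iterations) */
	uint64_t iter_cycles; /* cycles summed over all removed loops */
};

/*
 * Fold one removed loop into 'next' (branch entry A in the text):
 * add up the cycles of its branch entries and count one iteration.
 */
static void fold_removed_loop(struct branch_entry *next,
			      const uint64_t *removed_cycles, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		next->iter_cycles += removed_cycles[i];
	next->iter_count++;
}

/* Average cycles per iteration; 0 when no loop was removed. */
static uint64_t avg_cycles(const struct branch_entry *e)
{
	return e->iter_count ? e->iter_cycles / e->iter_count : 0;
}
```

      In this sketch the values shown as "iter:" and "avg_cycles:" would
      correspond to iter_count and avg_cycles() of entry A.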
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1502111115-18305-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 29 Aug, 2017 8 commits
    • perf symbols: Fix plt entry calculation for ARM and AARCH64 · b2f76050
      Committed by Li Bin
      On x86, the plt header size is the same as the plt entry size, and can be
      identified from the sh_entsize of the plt section header (shdr).
      
      But we can't assume that the sh_entsize of the plt shdr is always the
      plt entry size on all architectures, and the plt header size may differ
      from the plt entry size on some architectures.
      
      On ARM, the plt header size is 20 bytes and the plt entry size is 12
      bytes (not considering the FOUR_WORD_PLT case), according to the binutils
      implementation. The plt section is as follows:
      
      Disassembly of section .plt:
      000004a0 <__cxa_finalize@plt-0x14>:
       4a0:   e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
       4a4:   e59fe004        ldr     lr, [pc, #4]    ; 4b0 <_init+0x1c>
       4a8:   e08fe00e        add     lr, pc, lr
       4ac:   e5bef008        ldr     pc, [lr, #8]!
       4b0:   00008424        .word   0x00008424
      
      000004b4 <__cxa_finalize@plt>:
       4b4:   e28fc600        add     ip, pc, #0, 12
       4b8:   e28cca08        add     ip, ip, #8, 20  ; 0x8000
       4bc:   e5bcf424        ldr     pc, [ip, #1060]!        ; 0x424
      
      000004c0 <printf@plt>:
       4c0:   e28fc600        add     ip, pc, #0, 12
       4c4:   e28cca08        add     ip, ip, #8, 20  ; 0x8000
       4c8:   e5bcf41c        ldr     pc, [ip, #1052]!        ; 0x41c
      
      On AARCH64, the plt header size is 32 bytes and the plt entry size is 16
      bytes.  The plt section is as follows:
      
      Disassembly of section .plt:
      0000000000000560 <__cxa_finalize@plt-0x20>:
       560:   a9bf7bf0        stp     x16, x30, [sp,#-16]!
       564:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf8a8>
       568:   f944be11        ldr     x17, [x16,#2424]
       56c:   9125e210        add     x16, x16, #0x978
       570:   d61f0220        br      x17
       574:   d503201f        nop
       578:   d503201f        nop
       57c:   d503201f        nop
      
      0000000000000580 <__cxa_finalize@plt>:
       580:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf8a8>
       584:   f944c211        ldr     x17, [x16,#2432]
       588:   91260210        add     x16, x16, #0x980
       58c:   d61f0220        br      x17
      
      0000000000000590 <__gmon_start__@plt>:
       590:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf8a8>
       594:   f944c611        ldr     x17, [x16,#2440]
       598:   91262210        add     x16, x16, #0x988
       59c:   d61f0220        br      x17
      
      NOTES:
      
      In addition to ARM and AARCH64, other architectures, such as
      s390/alpha/mips/parisc/powerpc/sh/sparc/xtensa, also need to consider
      this issue.
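      
      The sizes quoted above can be turned into a small address calculation.
      This is a sketch under stated assumptions: enum plt_arch, struct
      plt_layout and both helpers are hypothetical names for illustration, and
      real code would decide based on the ELF header's machine type.

```c
#include <stdint.h>

/* Hypothetical architecture tags used only in this sketch. */
enum plt_arch { PLT_ARCH_OTHER, PLT_ARCH_ARM, PLT_ARCH_AARCH64 };

struct plt_layout {
	uint64_t header_size; /* bytes before the first plt entry */
	uint64_t entry_size;  /* bytes per plt entry */
};

/* Pick header and entry sizes; fall back to sh_entsize as on x86. */
static struct plt_layout plt_layout_for(enum plt_arch arch, uint64_t sh_entsize)
{
	struct plt_layout l;

	switch (arch) {
	case PLT_ARCH_ARM:
		l.header_size = 20;
		l.entry_size = 12;
		break;
	case PLT_ARCH_AARCH64:
		l.header_size = 32;
		l.entry_size = 16;
		break;
	default:
		l.header_size = sh_entsize;
		l.entry_size = sh_entsize;
		break;
	}
	return l;
}

/* Address of the idx-th plt entry (0-based) in a section starting at plt_start. */
static uint64_t plt_entry_addr(uint64_t plt_start, struct plt_layout l, uint64_t idx)
{
	return plt_start + l.header_size + idx * l.entry_size;
}
```

      With the ARM section above starting at 0x4a0, entry 0 lands on 0x4b4
      (__cxa_finalize@plt) and entry 1 on 0x4c0 (printf@plt); with the AARCH64
      section starting at 0x560, the entries land on 0x580 and 0x590.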
      Signed-off-by: Li Bin <huawei.libin@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
      Cc: David Tolnay <dtolnay@gmail.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: zhangmengting@huawei.com
      Link: http://lkml.kernel.org/r/1496622849-21877-1-git-send-email-huawei.libin@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Fix kprobe blacklist checking condition · 2c29461e
      Committed by Li Bin
      Since commit 9aaf5a5f ("perf probe: Check kprobes blacklist when
      adding new events"), 'perf probe' checks the blacklist of functions
      that cannot be probed.  But the checking condition is wrong: the
      end_addr of a symbol, which is the start_addr of the next symbol, must
      not be included.
      
      Committer notes:
      
      IOW make it match its kernel counterpart in kernel/kprobes.c:
      
        bool within_kprobe_blacklist(unsigned long addr)
      
      Each entry has as its end address not the last address of the symbol,
      but the first address _outside_ that symbol, which for adjacent functions
      is the first address of the next symbol, like these from
      kernel/trace/trace_probe.c:
      
      0xffffffffbd198df0-0xffffffffbd198e40	print_type_u8
      0xffffffffbd198e40-0xffffffffbd198e90	print_type_u16
      0xffffffffbd198e90-0xffffffffbd198ee0	print_type_u32
      0xffffffffbd198ee0-0xffffffffbd198f30	print_type_u64
      0xffffffffbd198f30-0xffffffffbd198f80	print_type_s8
      0xffffffffbd198f80-0xffffffffbd198fd0	print_type_s16
      0xffffffffbd198fd0-0xffffffffbd199020	print_type_s32
      0xffffffffbd199020-0xffffffffbd199070	print_type_s64
      0xffffffffbd199070-0xffffffffbd1990c0	print_type_x8
      0xffffffffbd1990c0-0xffffffffbd199110	print_type_x16
      0xffffffffbd199110-0xffffffffbd199160	print_type_x32
      0xffffffffbd199160-0xffffffffbd1991b0	print_type_x64
      
      But not always:
      
      0xffffffffbd1997b0-0xffffffffbd1997c0	fetch_kernel_stack_address (kernel/trace/trace_probe.c)
      0xffffffffbd1c57f0-0xffffffffbd1c58b0	__context_tracking_enter   (kernel/context_tracking.c)
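      
      The corrected condition amounts to a half-open range check. The sketch
      below is illustrative only: struct blacklist_entry and within_entry()
      are hypothetical names, not the identifiers used by perf or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical blacklist entry covering the half-open range [start, end). */
struct blacklist_entry {
	uint64_t start; /* first address of the symbol */
	uint64_t end;   /* first address outside the symbol */
};

/* An address is blacklisted only if start <= addr < end. */
static bool within_entry(const struct blacklist_entry *e, uint64_t addr)
{
	return addr >= e->start && addr < e->end;
}
```

      With the ranges listed above, 0xffffffffbd198e40 falls in print_type_u16
      but not in print_type_u8, although it is the end address of the latter.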
      Signed-off-by: Li Bin <huawei.libin@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: zhangmengting@huawei.com
      Fixes: 9aaf5a5f ("perf probe: Check kprobes blacklist when adding new events")
      Link: http://lkml.kernel.org/r/1504011443-7269-1-git-send-email-huawei.libin@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Robustify detection of clang binary · 3866058e
      Committed by David Carrillo-Cisneros
      Prior to this patch, make scripts tested for CLANG with ifeq ($(CC),
      clang), failing to detect CLANG binaries with different names. Fix it by
      testing for the existence of the __clang__ macro in the list of
      compiler-defined macros.
      Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20170827075442.108534-5-davidcc@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Group stat values on global event id · 9933183e
      Committed by Jiri Olsa
      There's no big value in displaying counts for every event ID, of which
      there is one per CPU. Instead, display the sum for the whole event.
      
        $ perf record -c 100000 -e cycles:u -s test
        $ perf report -T
      
      Before:
        #  PID   TID  cycles:u  cycles:u  cycles:u  cycles:u  ... [20 more columns of 'cycles:u']
          3339  3339         0         0         0         0
          3340  3340         0         0         0         0
          3341  3341         0         0         0         0
          3342  3342         0         0         0         0
      
      Now:
        #  PID   TID  cycles:u
          3339  3339     19678
          3340  3340     18744
          3341  3341     17335
          3342  3342     26414
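      
      The grouping boils down to adding the per-CPU counts of one event into
      a single value per thread. A minimal sketch, assuming a hypothetical
      helper name and a plain array of per-CPU counts rather than perf's own
      data structures:

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the counts reported for each per-CPU event ID of a single event. */
static uint64_t sum_event_counts(const uint64_t *per_cpu_counts, size_t nr_cpus)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nr_cpus; i++)
		total += per_cpu_counts[i];
	return total;
}
```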
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-10-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf values: Zero value buffers · a1834fc9
      Committed by Jiri Olsa
      We need to make sure the array of value pointers is zero-initialized,
      because we pass those pointers to realloc later on, and an uninitialized
      non-zero value will cause an allocation error and aborted execution.
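      
      The pattern can be illustrated with a small sketch. The struct and
      function names here are hypothetical, not the ones in perf's values
      code; the point is only that realloc() on a NULL pointer behaves like
      malloc(), while realloc() on a garbage pointer is undefined behavior.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table of per-row value buffers. */
struct value_table {
	int nr_rows;
	uint64_t **rows;
};

static int value_table_init(struct value_table *t, int nr_rows)
{
	/* calloc zero-fills the array, so every row pointer starts out as NULL. */
	t->rows = calloc(nr_rows, sizeof(*t->rows));
	if (!t->rows)
		return -1;
	t->nr_rows = nr_rows;
	return 0;
}

static int value_table_grow_row(struct value_table *t, int row, size_t nr_values)
{
	/* Safe for a fresh row: realloc(NULL, n) is equivalent to malloc(n). */
	uint64_t *tmp = realloc(t->rows[row], nr_values * sizeof(*tmp));

	if (!tmp)
		return -1;
	t->rows[row] = tmp;
	return 0;
}

static void value_table_exit(struct value_table *t)
{
	for (int i = 0; i < t->nr_rows; i++)
		free(t->rows[i]); /* free(NULL) is a no-op */
	free(t->rows);
	t->rows = NULL;
	t->nr_rows = 0;
}
```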
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-9-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf values: Fix allocation check · f4ef3b7c
      Committed by Jiri Olsa
      Bail out if the allocation failed, not the other way round.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-8-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf values: Fix thread index bug · 64eed1de
      Committed by Jiri Olsa
      We are taking the wrong index (+1) for the first thread, which leaves
      the thread with index 0 unused and uninitialized.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-7-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Add dump_read function · dac7f6b7
      Committed by Jiri Olsa
      Add a dump_read function to gather all the dump output of the read
      function. Also output the enabled and running times and the id, if
      enabled (3 new lines with '...' prefix below).
      
        $ perf record -s ...
        $ perf report -D
      
        958358311769 0x91f8 [0x40]: PERF_RECORD_READ: 3339 3339 cycles:u 0
        ... time enabled : 958358313731
        ... time running : 958358313731
        ... id           : 80
      
      Committer note:
      
      Do not use 'read' as a variable name as it breaks the build on older
      systems, such as RHEL6:
      
          CC       /tmp/build/perf/util/session.o
        cc1: warnings being treated as errors
        util/session.c: In function 'dump_read':
        util/session.c:1132: error: declaration of 'read' shadows a global declaration
        /usr/include/bits/unistd.h:35: error: shadowed declaration is here
        mv: cannot stat `/tmp/build/perf/util/.session.o.tmp': No such file or directory
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-6-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 28 Aug, 2017 2 commits
  4. 23 Aug, 2017 1 commit
  5. 22 Aug, 2017 8 commits
  6. 18 Aug, 2017 6 commits
  7. 16 Aug, 2017 1 commit
    • perf bpf: Fix endianness problem when loading parameters in prologue · db26984a
      Committed by Wang Nan
      Perf's BPF prologue generator unconditionally fetches 8 bytes for
      function parameters, which causes problems on big endian machines. Thomas
      gives a detailed analysis of this problem:
      
       http://lkml.kernel.org/r/968ebda5-abe4-8830-8d69-49f62529d151@linux.vnet.ibm.com
      
       ---- 8< ----
        I investigated perf test BPF for s390x and have a question regarding
        the 38.3 subtest (bpf-prologue test) which fails on s390x.
      
        When I turn on trace_printk in tests/bpf-script-test-prologue.c
        I see this output in /sys/kernel/debug/tracing/trace:
      
        [root@s8360047 perf]# cat /sys/kernel/debug/tracing/trace
        perf-30229 [000] d..2 170161.535791: : f_mode 2001d00000000 offset:0 orig:0
        perf-30229 [000] d..2 170161.535809: : f_mode 6001f00000000 offset:0 orig:0
        perf-30229 [000] d..2 170161.535815: : f_mode 6001f00000000 offset:1 orig:0
        perf-30229 [000] d..2 170161.535819: : f_mode 2001d00000000 offset:1 orig:0
        perf-30229 [000] d..2 170161.535822: : f_mode 2001d00000000 offset:2 orig:1
        perf-30229 [000] d..2 170161.535825: : f_mode 6001f00000000 offset:2 orig:1
        perf-30229 [000] d..2 170161.535828: : f_mode 6001f00000000 offset:3 orig:1
        perf-30229 [000] d..2 170161.535832: : f_mode 2001d00000000 offset:3 orig:1
        perf-30229 [000] d..2 170161.535835: : f_mode 2001d00000000 offset:4 orig:0
        perf-30229 [000] d..2 170161.535841: : f_mode 6001f00000000 offset:4 orig:0
      
        [...]
      
        There are 3 parameters the eBPF program tests/bpf-script-test-prologue.c
        accesses: f_mode (member of struct file at offset 140) offset and orig.  They
        are parameters of the lseek() system call triggered in this test case in
        function llseek_loop().
      
        What is really strange is the value of f_mode. It is an 8 byte value, whereas
        in the probe event it is defined as a 4 byte value.  The lower 4 bytes are all
        zero and do not belong to member f_mode.  The correct value should be 2001d for
        read-only and 6001f for read-write open mode.
      
        Here is the output of the 'perf test -vv bpf' trace:
        Try to find probe point from debuginfo.
        Matched function: null_lseek [2d9310d]
         Probe point found: null_lseek+0
        Searching 'file' variable in context.
        Converting variable file into trace event.
        converting f_mode in file
        f_mode type is unsigned int.
        Opening /sys/kernel/debug/tracing//README write=0
        Searching 'offset' variable in context.
        Converting variable offset into trace event.
        offset type is long long int.
        Searching 'orig' variable in context.
        Converting variable orig into trace event.
        orig type is int.
        Found 1 probe_trace_events.
        Opening /sys/kernel/debug/tracing//kprobe_events write=1
        Writing event: p:perf_bpf_probe/func _text+8794224 f_mode=+140(%r2):x32
       ---- 8< ----
      
      This patch parses the type of each argument and converts the data loaded
      from memory to the expected type.
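      
      The effect seen in the trace can be reproduced with a short sketch.
      It assumes, as in the report, that the 4 bytes following f_mode are
      zero; load_be() is a hypothetical helper that models a big-endian load
      of a given size and is not part of perf.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble an unsigned value from 'size' bytes stored in big-endian order. */
static uint64_t load_be(const unsigned char *p, size_t size)
{
	uint64_t v = 0;

	for (size_t i = 0; i < size; i++)
		v = (v << 8) | p[i];
	return v;
}
```

      For a 4-byte f_mode of 0x2001d stored big-endian and followed by four
      zero bytes, an 8-byte load yields 0x2001d00000000, the value in the
      trace, while a 4-byte load yields the expected 0x2001d.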
      
      Now the test runs successfully on 4.13.0-rc5:
      
        [root@s8360046 perf]# ./perf test  bpf
        38: BPF filter                                 :
        38.1: Basic BPF filtering                      : Ok
        38.2: BPF pinning                              : Ok
        38.3: BPF prologue generation                  : Ok
        38.4: BPF relocation checker                   : Ok
        [root@s8360046 perf]#
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20170815092159.31912-1-tmricht@linux.vnet.ibm.com
      Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 12 Aug, 2017 4 commits
    • perf report: Fix module symbol adjustment for s390x · 4a084ecf
      Committed by Thomas Richter
      The 'perf report' tool does not display the addresses of kernel module
      symbols correctly.
      
      For example, the function qeth_send_ipa_cmd() in kernel module qeth.ko
      has this relative address:
      
        [root@s8360047 linux]# nm -g drivers/s390/net/qeth.ko | fgrep send_ipa_cmd
        0000000000013088 T qeth_send_ipa_cmd
      
      The module is loaded at address:
      
        [root@s8360047 linux]# cat /sys/module/qeth/sections/.text
        0x000003ff80296d20
        [root@s8360047 linux]#
      
      This should result in a start address of:
      
        0x13088 + 0x3ff80296d20 = 0x3ff802a9da8
      
      Using crash to verify the address on a live system:
      
        [root@s8360046 linux]# crash vmlinux
      
        crash 7.1.9++
        Copyright (C) 2002-2016  Red Hat, Inc.
        Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
      
        [...]
      
        crash> mod -s qeth drivers/s390/net/qeth.ko
             MODULE       NAME        SIZE  OBJECT FILE
             3ff8028d700  qeth      151552  drivers/s390/net/qeth.ko
        crash> sym qeth_send_ipa_cmd
        3ff802a9da8 (T) qeth_send_ipa_cmd [qeth] /root/linux/drivers/s390/net/qeth_core_main.c: 2944
        crash>
      
      Now perf report displays this address for symbol qeth_send_ipa_cmd
      (symbol__new output):
      
        qeth_send_ipa_cmd 0x130f0-0x132ce
      
      There is a difference of 0x68 between the entry in the symbol table (see
      nm command above) and perf. The difference comes from the file offset of
      the .text section of qeth.ko:
      
        [root@s8360047 perf]# readelf -a drivers/s390/net/qeth.ko
        Section Headers:
        [Nr] Name              Type             Address           Offset
             Size              EntSize          Flags  Link  Info  Align
        [ 0]                   NULL             0000000000000000  00000000
             0000000000000000  0000000000000000           0     0     0
        [ 1] .note.gnu.build-i NOTE             0000000000000000  00000040
             0000000000000024  0000000000000000   A       0     0     4
        [ 2] .text             PROGBITS         0000000000000000  00000068
             000000000001c8a0  0000000000000000  AX       0     0     8
      
      As seen, the .text section has an offset of 0x68 and a start address of 0x0.
      Therefore 0x68 is added to the address of qeth_send_ipa_cmd and thus
      0x13088 + 0x68 = 0x130f0 is displayed.
      
      This is wrong. perf report needs to display the start address of symbol
      qeth_send_ipa_cmd as 0x13088 + the qeth.ko .text section start address.
      
      The qeth.ko module .text start address is available in the qeth.ko DSO
      map. Just identify the kernel module symbols and correct the addresses.
      
      With the fix I see this correct address for symbol: symbol__new:
      qeth_send_ipa_cmd 0x3ff802a9da8-0x3ff802a9f86
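      
      The two calculations discussed above can be written out side by side.
      This is a sketch of the arithmetic only; both function names are
      hypothetical and do not appear in perf.

```c
#include <stdint.h>

/* Adjustment by section file offset, which gave the wrong module address. */
static uint64_t file_offset_adjusted(uint64_t st_value, uint64_t sh_addr,
				     uint64_t sh_offset)
{
	return st_value - sh_addr + sh_offset;
}

/* Expected result: symbol value relative to the loaded .text start address. */
static uint64_t module_sym_addr(uint64_t st_value, uint64_t text_start)
{
	return st_value + text_start;
}
```

      With st_value 0x13088, sh_addr 0x0 and sh_offset 0x68 the first helper
      gives 0x130f0; with the .text start 0x3ff80296d20 the second gives
      0x3ff802a9da8, the address confirmed by crash.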
      Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      LPU-Reference: 20170803134902.47207-1-tmricht@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/n/tip-q8lktlpoxb5e3dj52u1s1rw4@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Fix wrong size in perf_record_mmap for last kernel module · 9ad4652b
      Committed by Thomas Richter
      During work on perf report for s390 I ran into the following issue:
      
      0 0x318 [0x78]: PERF_RECORD_MMAP -1/0:
              [0x3ff804d6990(0xfffffc007fb2966f) @ 0]:
              x /lib/modules/4.12.0perf1+/kernel/drivers/s390/net/qeth_l2.ko
      
      This is a PERF_RECORD_MMAP entry of the perf.data file with an invalid
      module size for qeth_l2.ko (the s390 ethernet device driver).
      
      Even a mainframe does not have 0xfffffc007fb2966f bytes of main memory.
      
      It turned out that this wrong size is created by the perf record
      command.  What happens is this function call sequence from
      __cmd_record():
      
        perf_session__new():
          perf_session__create_kernel_maps():
            machine__create_kernel_maps():
              machine__create_modules():   Creates map for all loaded kernel modules.
                modules__parse():   Reads /proc/modules and extracts module name and
                                    load address (1st and last column)
                  machine__create_module():   Called for every module found in /proc/modules.
                                    Creates a new map for every module found and enters
                                    module name and start address into the map. Since the
                                    module end address is unknown it is set to zero.
      
      This ends up with a kernel module map list sorted by module start
      addresses.  All module end addresses are zero.
      
      Finally, machine__create_kernel_maps() calls function map_groups__fixup_end().
      This function iterates through the maps and assigns each map entry's
      end address the successor map entry start address. The last entry of the
      map group has no successor, so ~0 is used as end to consume the remaining
      memory.
      
      Later __cmd_record calls function record__synthesize() which in turn calls
      perf_event__synthesize_kernel_mmap() and perf_event__synthesize_modules()
      to create PERF_RECORD_MMAP entries in the perf.data file.
      
      On s390 this results in the module qeth_l2.ko, which has the highest
      start address, being the last entry, so its map has an end address of ~0.
      See the module table:
      
              [root@s8360047 perf]# cat /proc/modules
              qeth_l2 86016 1 - Live 0x000003ff804d6000
              qeth 266240 1 qeth_l2, Live 0x000003ff80296000
              ccwgroup 24576 1 qeth, Live 0x000003ff80218000
              vmur 36864 0 - Live 0x000003ff80182000
              qdio 143360 2 qeth_l2,qeth, Live 0x000003ff80002000
              [root@s8360047 perf]#
      
      When the PERF_RECORD_MMAP entry is created for kernel module qeth_l2.ko
      its start address and length are written. The length is calculated in the line:
          event->mmap.len   = pos->end - pos->start;
      and results in 0xffffffffffffffff - 0x3ff804d6990(*) = 0xfffffc007fb2966f
      
      (*) On s390 the module start address is actually determined by a __weak function
      named arch__fix_module_text_start() in machine__create_module().
      
      I think this can be improved. We can use the module size (2nd column of
      /proc/modules) to get the size of each loaded kernel module and calculate
      its end address. Only for map entries that do not have a valid end address
      (end is still zero) do we fall back to the current heuristic, that is, use
      the successor's start address or ~0.
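      
      The proposed rule can be sketched as follows. This is an illustration
      under stated assumptions, not the patch: struct module_map and the three
      helpers are hypothetical names standing in for perf's map handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical module map; end == 0 means the end address is still unknown. */
struct module_map {
	uint64_t start;
	uint64_t end;
};

/* Preferred: derive the end address from the size listed in /proc/modules. */
static void module_map_set_end(struct module_map *m, uint64_t size)
{
	m->end = m->start + size;
}

/* Fallback heuristic, applied only to maps whose end is still zero. */
static void module_map_fixup_end(struct module_map *m, const struct module_map *next)
{
	if (m->end)
		return;
	m->end = next ? next->start : UINT64_MAX;
}

/* Length as written into the PERF_RECORD_MMAP entry. */
static uint64_t module_map_len(const struct module_map *m)
{
	return m->end - m->start;
}
```

      For qeth_l2.ko at 0x3ff804d6990, the fallback alone yields the bogus
      length 0xfffffc007fb2966f, whereas setting the end from the size first
      yields 86016.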
      Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      LPU-Reference: 20170803134902.47207-2-tmricht@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/n/tip-nmoqij5b5vxx7rq2ckwu8iaj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf srcline: Do not consider empty files as valid srclines · d964b1cd
      Committed by Milian Wolff
      Sometimes we get a non-null, but empty, string for the filename from
      bfd. This then results in srclines of the form ":0", which is different
      from the canonical SRCLINE_UNKNOWN in the form "??:0".  Set the file to
      NULL if it is empty to fix this.
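      
      The normalization is a one-line check. A minimal sketch, with a
      hypothetical helper name that is not taken from perf:

```c
#include <stddef.h>

/* Treat a non-NULL but empty filename the same as a missing one. */
static const char *normalize_srcline_file(const char *file)
{
	return (file && file[0] != '\0') ? file : NULL;
}
```

      Callers that already map a NULL file to the unknown srcline then handle
      the empty-string case without further changes.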
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170806212446.24925-14-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf util: Take elf_name as const string in dso__demangle_sym · 80c345b2
      Committed by Milian Wolff
      The input string is not modified and thus can be passed in as a pointer
      to const data.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170806212446.24925-3-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 11 Aug, 2017 2 commits
  10. 29 Jul, 2017 2 commits
    • perf data: Add mmap[2] events to CTF conversion · f9f6f2a9
      Committed by Geneviève Bastien
      This adds the mmap and mmap2 events to the CTF trace obtained from perf
      data.
      
      These events will allow CTF trace visualization tools like Trace Compass
      to automatically resolve the symbols of the callchain to the
      corresponding function or origin library.
      
      To include those events, one needs to convert with the --all option.
      Here follows an output of babeltrace:
      
        $ sudo perf data convert --all --to-ctf myctftrace
        $ babeltrace ./myctftrace
        [19:00:00.000000000] (+0.000000000) perf_mmap2: { cpu_id = 0 },
       { pid = 638, tid = 638, start = 0x7F54AE39E000, filename =
       "/usr/lib/ld-2.25.so" }
        [19:00:00.000000000] (+0.000000000) perf_mmap2: { cpu_id = 0 }, { pid =
       638, tid = 638, start = 0x7F54AE565000, filename =
       "/usr/lib/libudev.so.1.6.6" }
        [19:00:00.000000000] (+0.000000000) perf_mmap2: { cpu_id = 0 }, { pid =
       638, tid = 638, start = 0x7FFC093EA000, filename = "[vdso]" }
      Signed-off-by: Geneviève Bastien <gbastien@versatic.net>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
      Cc: Julien Desfossez <jdesfossez@efficios.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170727181205.24843-2-gbastien@versatic.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf data: Add callchain to CTF conversion · a3073c8e
      Committed by Geneviève Bastien
      The field perf_callchain, if available, is added to the sampling events
      during the CTF conversion. It is an array of u64 values.  The
      perf_callchain_size field contains the size of the array.
      
      It will allow the analysis of sampling data in trace visualization tools
      like Trace Compass. Possible analyses with those data: dynamic
      flamegraphs, correlation with other tracing data like a userspace trace.
      
      Here follows a babeltrace CTF output of a trace with callchain:
      
        $ babeltrace ./myctftrace
        [17:38:45.672760285] (+?.?????????) cycles:ppp: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF81063EE4, perf_tid = 25841, perf_pid = 25774, perf_period = 1, perf_callchain_size = 7, perf_callchain = [ [0] = 0xFFFFFFFFFFFFFF80, [1] = 0xFFFFFFFF81063EE4, [2] = 0xFFFFFFFF8100C770, [3] = 0xFFFFFFFF81006EC6, [4] = 0xFFFFFFFF8118245E, [5] = 0xFFFFFFFF810A9224, [6] = 0xFFFFFFFF8164A4C6 ] }
        [17:38:45.672777672] (+0.000017387) cycles:ppp: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF81063EE4, perf_tid = 25841, perf_pid = 25774, perf_period = 1, perf_callchain_size = 8, perf_callchain = [ [0] = 0xFFFFFFFFFFFFFF80, [1] = 0xFFFFFFFF81063EE4, [2] = 0xFFFFFFFF8100C770, [3] = 0xFFFFFFFF81006EC6, [4] = 0xFFFFFFFF8118245E, [5] = 0xFFFFFFFF810A9224, [6] = 0xFFFFFFFF8164A4C6, [7] = 0xFFFFFFFF8164ABAD ] }
        [17:38:45.672786700] (+0.000009028) cycles:ppp: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF81063EE4, perf_tid = 25841, perf_pid = 25774, perf_period = 70, perf_callchain_size = 3, perf_callchain = [ [0] = 0xFFFFFFFFFFFFFF80, [1] = 0xFFFFFFFF81063EE4, [2] = 0xFFFFFFFF8100C770 ] }
      Signed-off-by: Geneviève Bastien <gbastien@versatic.net>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
      Cc: Julien Desfossez <jdesfossez@efficios.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170727181205.24843-1-gbastien@versatic.net
      [ Removed PERF_SAMPLE_CALLCHAIN from the TODO list, jolsa ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 28 Jul, 2017 1 commit
  12. 27 Jul, 2017 4 commits