1. 26 7月, 2019 1 次提交
    • T
      perf report: Fix OOM error in TUI mode on s390 · eac8b39d
      Thomas Richter 提交于
      [ Upstream commit 8a07aa4e9b7b0222129c07afff81634a884b2866 ]
      
      Debugging a OOM error using the TUI interface revealed this issue
      on s390:
      
      [tmricht@m83lp54 perf]$ cat /proc/kallsyms |sort
      ....
      00000001119b7158 B radix_tree_node_cachep
      00000001119b8000 B __bss_stop
      00000001119b8000 B _end
      000003ff80002850 t autofs_mount	[autofs4]
      000003ff80002868 t autofs_show_options	[autofs4]
      000003ff80002a98 t autofs_evict_inode	[autofs4]
      ....
      
      There is a huge gap between the last kernel symbol
      __bss_stop/_end and the first kernel module symbol
      autofs_mount (from autofs4 module).
      
      After reading the kernel symbol table via functions:
      
       dso__load()
       +--> dso__load_kernel_sym()
            +--> dso__load_kallsyms()
      	   +--> __dso_load_kallsyms()
      	        +--> symbols__fixup_end()
      
      the symbol __bss_stop has a start address of 1119b8000 and
      an end address of 3ff80002850, as can be seen by this debug statement:
      
        symbols__fixup_end __bss_stop start:0x1119b8000 end:0x3ff80002850
      
      The size of symbol __bss_stop is 0x3fe6e64a850 bytes!
      It is the last kernel symbol and fills up the space until
      the first kernel module symbol.
      
      This size kills the TUI interface when executing the following
      code:
      
        process_sample_event()
          hist_entry_iter__add()
            hist_iter__report_callback()
              hist_entry__inc_addr_samples()
                symbol__inc_addr_samples(symbol = __bss_stop)
                  symbol__cycles_hist()
                     annotated_source__alloc_histograms(...,
      				                symbol__size(sym),
      		                                ...)
      
      This function allocates memory to save sample histograms.
      The symbol_size() marco is defined as sym->end - sym->start, which
      results in above value of 0x3fe6e64a850 bytes and
      the call to calloc() in annotated_source__alloc_histograms() fails.
      
      The histgram memory allocation might fail, make this failure
      no-fatal and continue processing.
      
      Output before:
      [tmricht@m83lp54 perf]$ ./perf --debug stderr=1 report -vvvvv \
      					      -i ~/slow.data 2>/tmp/2
      [tmricht@m83lp54 perf]$ tail -5 /tmp/2
        __symbol__inc_addr_samples(875): ENOMEM! sym->name=__bss_stop,
      		start=0x1119b8000, addr=0x2aa0005eb08, end=0x3ff80002850,
      		func: 0
      problem adding hist entry, skipping event
      0x938b8 [0x8]: failed to process type: 68 [Cannot allocate memory]
      [tmricht@m83lp54 perf]$
      
      Output after:
      [tmricht@m83lp54 perf]$ ./perf --debug stderr=1 report -vvvvv \
      					      -i ~/slow.data 2>/tmp/2
      [tmricht@m83lp54 perf]$ tail -5 /tmp/2
         symbol__inc_addr_samples map:0x1597830 start:0x110730000 end:0x3ff80002850
         symbol__hists notes->src:0x2aa2a70 nr_hists:1
         symbol__inc_addr_samples sym:unlink_anon_vmas src:0x2aa2a70
         __symbol__inc_addr_samples: addr=0x11094c69e
         0x11094c670 unlink_anon_vmas: period++ [addr: 0x11094c69e, 0x2e, evidx=0]
         	=> nr_samples: 1, period: 526008
      [tmricht@m83lp54 perf]$
      
      There is no error about failed memory allocation and the TUI interface
      shows all entries.
      Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/90cb5607-3e12-5167-682d-978eba7dafa8@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      eac8b39d
  2. 06 4月, 2019 1 次提交
    • W
      perf annotate: Fix getting source line failure · e6786f86
      Wei Li 提交于
      [ Upstream commit 11db1ad4513d6205d2519e1a30ff4cef746e3243 ]
      
      The output of "perf annotate -l --stdio xxx" changed since commit 425859ff
      ("perf annotate: No need to calculate notes->start twice") removed notes->start
      assignment in symbol__calc_lines(). It will get failed in
      find_address_in_section() from symbol__tty_annotate() subroutine as the
      a2l->addr is wrong. So the annotate summary doesn't report the line number of
      source code correctly.
      
      Before fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
        void hotspot_1(void)
        {
      	volatile int i;
      
      	for (i = 0; i < 0x10000000; i++);
      	for (i = 0; i < 0x10000000; i++);
      	for (i = 0; i < 0x10000000; i++);
        }
      
        int main(void)
        {
      	hotspot_1();
      
      	return 0;
        }
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         19.30 common_while_1[32]
         19.03 common_while_1[4e]
         19.01 common_while_1[16]
          5.04 common_while_1[13]
          4.99 common_while_1[4b]
          4.78 common_while_1[2c]
          4.77 common_while_1[10]
          4.66 common_while_1[2f]
          4.59 common_while_1[51]
          4.59 common_while_1[35]
          4.52 common_while_1[19]
          4.20 common_while_1[56]
          0.51 common_while_1[48]
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1[10]    4.77 :   60a:   add    $0x1,%eax
         common_while_1[13]    5.04 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1[16]   19.01 :   610:   mov    -0x4(%rbp),%eax
         common_while_1[19]    4.52 :   613:   cmp    $0xfffffff,%eax
            0.00 :   618:   jle    607 <hotspot_1+0xd>
                 :                 for (i = 0; i < 0x10000000; i++);
        ...
      
      After fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         33.34 common_while_1.c:5
         33.34 common_while_1.c:6
         33.32 common_while_1.c:7
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.70 :   60a:   add    $0x1,%eax
          4.89 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1.c:5   19.03 :   610:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.72 :   613:   cmp    $0xfffffff,%eax
          0.00 :   618:   jle    607 <hotspot_1+0xd>
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   61a:   movl   $0x0,-0x4(%rbp)
          0.00 :   621:   jmp    62c <hotspot_1+0x32>
          0.00 :   623:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   626:   add    $0x1,%eax
          4.73 :   629:   mov    %eax,-0x4(%rbp)
         common_while_1.c:6   19.54 :   62c:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   62f:   cmp    $0xfffffff,%eax
        ...
      Signed-off-by: NWei Li <liwei391@huawei.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 425859ff ("perf annotate: No need to calculate notes->start twice")
      Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e6786f86
  3. 31 8月, 2018 2 次提交
    • K
      perf annotate: Fix parsing aarch64 branch instructions after objdump update · 4e67b2a5
      Kim Phillips 提交于
      Starting with binutils 2.28, aarch64 objdump adds comments to the
      disassembly output to show the alternative names of a condition code
      [1].
      
      It is assumed that commas in objdump comments could occur in other
      arches now or in the future, so this fix is arch-independent.
      
      The fix could have been done with arm64 specific jump__parse and
      jump__scnprintf functions, but the jump__scnprintf instruction would
      have to have its comment character be a literal, since the scnprintf
      functions cannot receive a struct arch easily.
      
      This inconvenience also applies to the generic jump__scnprintf, which is
      why we add a raw_comment pointer to struct ins_operands, so the __parse
      function assigns it to be re-used by its corresponding __scnprintf
      function.
      
      Example differences in 'perf annotate --stdio2' output on an aarch64
      perf.data file:
      
      BEFORE: → b.cs   ffff200008133d1c <unwind_frame+0x18c>  // b.hs, dffff7ecc47b
      AFTER : ↓ b.cs   18c
      
      BEFORE: → b.cc   ffff200008d8d9cc <get_alloc_profile+0x31c>  // b.lo, b.ul, dffff727295b
      AFTER : ↓ b.cc   31c
      
      The branch target labels 18c and 31c also now appear in the output:
      
      BEFORE:        add    x26, x29, #0x80
      AFTER : 18c:   add    x26, x29, #0x80
      
      BEFORE:        add    x21, x21, #0x8
      AFTER : 31c:   add    x21, x21, #0x8
      
      The Fixes: tag below is added so stable branches will get the update; it
      doesn't necessarily mean that commit was broken at the time, rather it
      didn't withstand the aarch64 objdump update.
      
      Tested no difference in output for sample x86_64, power arch perf.data files.
      
      [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=bb7eff5206e4795ac79c177a80fe9f4630aaf730Signed-off-by: NKim Phillips <kim.phillips@arm.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Fixes: b13bbeee ("perf annotate: Fix branch instruction with multiple operands")
      Link: http://lkml.kernel.org/r/20180827125340.a2f7e291901d17cea05daba4@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4e67b2a5
    • M
      perf annotate: Properly interpret indirect call · 1dc27f63
      Martin Liška 提交于
      The patch changes the parsing of:
      
      	callq  *0x8(%rbx)
      
      from:
      
        0.26 │     → callq  *8
      
      to:
      
        0.26 │     → callq  *0x8(%rbx)
      
      in this case an address is followed by a register, thus one can't parse
      only the address.
      
      Committer testing:
      
      1) run 'perf record sleep 10'
      2) before applying the patch, run:
      
           perf annotate --stdio2 > /tmp/before
      
      3) after applying the patch, run:
      
           perf annotate --stdio2 > /tmp/after
      
      4) diff /tmp/before /tmp/after:
        --- /tmp/before 2018-08-28 11:16:03.238384143 -0300
        +++ /tmp/after  2018-08-28 11:15:39.335341042 -0300
        @@ -13274,7 +13274,7 @@
                      ↓ jle    128
                        hash_value = hash_table->hash_func (key);
                        mov    0x8(%rsp),%rdi
        -  0.91       → callq  *30
        +  0.91       → callq  *0x30(%r12)
                        mov    $0x2,%r8d
                        cmp    $0x2,%eax
                        node_hash = hash_table->hashes[node_index];
        @@ -13848,7 +13848,7 @@
                         mov    %r14,%rdi
                         sub    %rbx,%r13
                         mov    %r13,%rdx
        -              → callq  *38
        +              → callq  *0x38(%r15)
                         cmp    %rax,%r13
           1.91        ↓ je     240
                  1b4:   mov    $0xffffffff,%r13d
        @@ -14026,7 +14026,7 @@
                         mov    %rcx,-0x500(%rbp)
                         mov    %r15,%rsi
                         mov    %r14,%rdi
        -              → callq  *38
        +              → callq  *0x38(%rax)
                         mov    -0x500(%rbp),%rcx
                         cmp    %rax,%rcx
                       ↓ jne    9b0
      <SNIP tons of other such cases>
      Signed-off-by: NMartin Liška <mliska@suse.cz>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: NKim Phillips <kim.phillips@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/bd1f3932-be2b-85f9-7582-111ee0a43b07@suse.czSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1dc27f63
  4. 20 8月, 2018 1 次提交
  5. 09 8月, 2018 17 次提交
  6. 06 6月, 2018 1 次提交
    • A
      perf annnotate: Make __symbol__inc_addr_samples handle src->histograms == NULL · 8d628d26
      Arnaldo Carvalho de Melo 提交于
      Making it a bit more robust, this took place here when a sample appeared
      right after:
      
        ffffffff8a925000 D __nosave_end
      
      And before the next considered symbol, which, using kallsyms make us
      over guess the size of __nosave_end, and then the sequence:
      
        hist_entry__inc_addr_samples ->
          symbol__inc_addr_samples ->
            symbol__hists ->
              annotated_source__alloc_histograms
      
      Ends up not liking to allocate gigabytes of ram for annotation...
      
      This will be alleviated by considering BSS symbols, which we should but
      don't so far, and then we should investigate those samples further.
      
      The testcase was to have:
      
         perf top -e cycles/call-graph=fp/,cache-misses/call-graph=dwarf/,instructions
      
      Running for a while till it segfaulted trying to access NULL notes->src->histograms.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ndfjtpiop3tdcnyjgp320ra8@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8d628d26
  7. 04 6月, 2018 15 次提交
  8. 23 5月, 2018 1 次提交
  9. 19 5月, 2018 1 次提交
    • J
      perf annotate: Create hotkey 'c' to show min/max cycles · 3e71fc03
      Jin Yao 提交于
      In the 'perf annotate' view, a new hotkey 'c' is created for showing the
      min/max cycles.
      
      For example, when press 'c', the annotate view is:
      
        Percent│ IPC     Cycle(min/max)
               │
               │
               │                             Disassembly of section .text:
               │
               │                             000000000003aab0 <random@@GLIBC_2.2.5>:
          8.22 │3.92                           sub    $0x18,%rsp
               │3.92                           mov    $0x1,%esi
               │3.92                           xor    %eax,%eax
               │3.92                           cmpl   $0x0,argp_program_version_hook@@G
               │3.92             1(2/1)      ↓ je     20
               │                               lock   cmpxchg %esi,__abort_msg@@GLIBC_P
               │                             ↓ jne    29
               │                             ↓ jmp    43
               │1.10                     20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+
          8.93 │1.10             1(5/1)      ↓ je     43
      
      When press 'c' again, the annotate view is switched back:
      
        Percent│ IPC Cycle
               │
               │
               │                Disassembly of section .text:
               │
               │                000000000003aab0 <random@@GLIBC_2.2.5>:
          8.22 │3.92              sub    $0x18,%rsp
               │3.92              mov    $0x1,%esi
               │3.92              xor    %eax,%eax
               │3.92              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x
               │3.92     1      ↓ je     20
               │                  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
               │                ↓ jne    29
               │                ↓ jmp    43
               │1.10        20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
          8.93 │1.10     1      ↓ je     43
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1526569118-14217-3-git-send-email-yao.jin@linux.intel.com
      [ Rename all maxmin to minmax ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3e71fc03