1. 11 Oct, 2019 1 commit
    • J
      perf diff: Report noisy for cycles diff · cebf7d51
      Jin Yao committed
      This patch prints the stddev and a histogram for the cycles diff of each
      program block. This helps us understand whether the cycles are noisy.
      
      This patch is inspired by Andi Kleen's patch:
      
        https://lwn.net/Articles/600471/
      
      We add a new option, '--cycles-hist'.
      
      Example:
      
        perf record -b ./div
        perf record -b ./div
        perf diff -c cycles
      
        # Baseline                                [Program Block Range] Cycles Diff  Shared Object      Symbol
        # ........  .......................................................... ....  .................  ............................
        #
            46.72%                                      [div.c:40 -> div.c:40]    0  div                [.] main
            46.72%                                      [div.c:42 -> div.c:44]    0  div                [.] main
            46.72%                                      [div.c:42 -> div.c:39]    0  div                [.] main
            20.54%                          [random_r.c:357 -> random_r.c:394]    1  libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:357 -> random_r.c:380]    0  libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:388]    0  libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:391]    0  libc-2.27.so       [.] __random_r
            17.04%                              [random.c:288 -> random.c:291]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:291 -> random.c:291]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:293 -> random.c:293]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:298 -> random.c:298]    0  libc-2.27.so       [.] __random
             8.40%                                      [div.c:22 -> div.c:25]    0  div                [.] compute_flag
             8.40%                                      [div.c:27 -> div.c:28]    0  div                [.] compute_flag
             5.14%                                    [rand.c:26 -> rand.c:27]    0  libc-2.27.so       [.] rand
             5.14%                                    [rand.c:28 -> rand.c:28]    0  libc-2.27.so       [.] rand
             2.15%                                  [rand@plt+0 -> rand@plt+0]    0  div                [.] rand@plt
             0.00%                                                                   [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
             0.00%                                [do_mmap+714 -> do_mmap+732]  -10  [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+737 -> do_mmap+765]    1  [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+262 -> do_mmap+299]    0  [kernel.kallsyms]  [k] do_mmap
             0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
             0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  [kernel.kallsyms]  [k] native_sched_clock
             0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  [kernel.kallsyms]  [k] native_write_msr
      
      When we enable the option '--cycles-hist', the output is
      
        perf diff -c cycles --cycles-hist
      
        # Baseline                                [Program Block Range] Cycles Diff        stddev/Hist  Shared Object      Symbol
        # ........  .......................................................... ....  .................  .................  ............................
        #
            46.72%                                      [div.c:40 -> div.c:40]    0  ± 37.8% ▁█▁▁██▁█   div                [.] main
            46.72%                                      [div.c:42 -> div.c:44]    0  ± 49.4% ▁▁▂█▂▂▂▂   div                [.] main
            46.72%                                      [div.c:42 -> div.c:39]    0  ± 24.1% ▃█▂▄▁▃▂▁   div                [.] main
            20.54%                          [random_r.c:357 -> random_r.c:394]    1  ± 33.5% ▅▂▁█▃▁▂▁   libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:357 -> random_r.c:380]    0  ± 39.4% ▁▁█▁██▅▁   libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:388]    0                     libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:391]    0  ± 41.2% ▁▃▁▂█▄▃▁   libc-2.27.so       [.] __random_r
            17.04%                              [random.c:288 -> random.c:291]    0  ± 48.8% ▁▁▁▁███▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:291 -> random.c:291]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:293 -> random.c:293]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0                     libc-2.27.so       [.] __random
            17.04%                              [random.c:298 -> random.c:298]    0  ± 75.6% ▃█▁▁▁▁▁▁   libc-2.27.so       [.] __random
             8.40%                                      [div.c:22 -> div.c:25]    0  ± 42.1% ▁▃▁▁███▁   div                [.] compute_flag
             8.40%                                      [div.c:27 -> div.c:28]    0  ± 41.8% ██▁▁▄▁▁▄   div                [.] compute_flag
             5.14%                                    [rand.c:26 -> rand.c:27]    0  ± 37.8% ▁▁▁████▁   libc-2.27.so       [.] rand
             5.14%                                    [rand.c:28 -> rand.c:28]    0                     libc-2.27.so       [.] rand
             2.15%                                  [rand@plt+0 -> rand@plt+0]    0                     div                [.] rand@plt
             0.00%                                                                                      [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
             0.00%                                [do_mmap+714 -> do_mmap+732]  -10                     [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+737 -> do_mmap+765]    1                     [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+262 -> do_mmap+299]    0                     [kernel.kallsyms]  [k] do_mmap
             0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7                     [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
             0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  ± 38.5% ▄█▁        [kernel.kallsyms]  [k] native_sched_clock
             0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  ± 47.1% ▁█▇▃▁▁     [kernel.kallsyms]  [k] native_write_msr
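
The Hist column above can be approximated with a small sketch. This is only an illustration, not the code the patch adds: `spark_render()` and `spark_ticks` are hypothetical names, and the scaling rule (linear between the minimum and maximum sample, eight levels) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Eight block glyphs, lowest to highest (UTF-8, 3 bytes each). */
static const char *const spark_ticks[8] = {
	"▁", "▂", "▃", "▄", "▅", "▆", "▇", "█"
};

/*
 * Hypothetical helper (not perf's own code): render n samples as a
 * sparkline into buf. Returns the number of bytes written, excluding
 * the terminating NUL, or -1 if buf is too small.
 */
int spark_render(char *buf, size_t size, const long *vals, size_t n)
{
	long min, max;
	size_t i, used = 0;

	assert(buf != NULL);
	if (size == 0)
		return -1;
	buf[0] = '\0';
	if (n == 0)
		return 0;

	min = max = vals[0];
	for (i = 1; i < n; i++) {
		if (vals[i] < min)
			min = vals[i];
		if (vals[i] > max)
			max = vals[i];
	}

	for (i = 0; i < n; i++) {
		/* Scale into 0..7; a flat series maps to the lowest tick. */
		int level = (max == min) ? 0 :
			(int)(((vals[i] - min) * 7) / (max - min));
		size_t len = strlen(spark_ticks[level]);

		if (used + len + 1 > size) {
			buf[0] = '\0';	/* leave an empty string on failure */
			return -1;
		}
		memcpy(buf + used, spark_ticks[level], len);
		used += len;
	}
	buf[used] = '\0';
	return (int)used;
}
```

A series that alternates between its minimum and maximum yields only the lowest and highest glyphs, which is the kind of pattern a reader would interpret as noisy.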
      
       v8:
       ---
       Rebase to perf/core branch
      
       v7:
       ---
       1. v6 got Jiri's ACK.
       2. Rebase to latest perf/core branch.
      
       v6:
       ---
       1. Jiri provides better code for using data__hpp_register() in ui_init().
          Use this code in v6.
      
       v5:
       ---
       1. Refine the use of data__hpp_register() in ui_init() according to
          Jiri's suggestion.
      
       v4:
       ---
       1. Rename the new option from '--noisy' to '--cycles-hist'
       2. Remove the option '-n'.
       3. Only update the spark value and stats when '--cycles-hist' is enabled.
       4. Remove the code of printing '..'.
      
       v3:
       ---
       1. Move the histogram to a separate column
       2. Move the svals[] out of struct stats
      
       v2:
       ---
       Jiri got a compile error,
      
        CC       builtin-diff.o
        builtin-diff.c: In function ‘compute_cycles_diff’:
        builtin-diff.c:712:10: error: taking the absolute value of unsigned type ‘u64’ {aka ‘long unsigned int’} has no effect [-Werror=absolute-value]
        712 |          labs(pair->block_info->cycles_spark[i] -
            |          ^~~~
      
       This is because the result of u64 - u64 is still u64. We now change the
       type of cycles_spark[] to s64.
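
The wrap-around behind that compiler error can be reproduced in isolation. The sketch below uses the standard `uint64_t`/`int64_t` types in place of the kernel's u64/s64, and both function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Unsigned subtraction never goes negative: when a < b the result
 * wraps around modulo 2^64, so taking its absolute value is a no-op.
 */
uint64_t diff_unsigned(uint64_t a, uint64_t b)
{
	return a - b;
}

/*
 * With signed operands the difference can be negative, and llabs()
 * yields the distance between the two values. Assumes the difference
 * fits in int64_t.
 */
int64_t abs_diff_signed(int64_t a, int64_t b)
{
	return (int64_t)llabs((long long)(a - b));
}
```

For example, `diff_unsigned(3, 5)` returns `UINT64_MAX - 1`, while `abs_diff_signed(3, 5)` returns 2.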
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20190925011446.30678-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cebf7d51
  2. 01 Oct, 2019 6 commits
  3. 26 Sep, 2019 1 commit
  4. 25 Sep, 2019 1 commit
  5. 20 Sep, 2019 1 commit
  6. 01 Sep, 2019 3 commits
  7. 26 Aug, 2019 1 commit
  8. 09 Aug, 2019 1 commit
    • A
      perf annotate: Fix printing of unaugmented disassembled instructions from BPF · 85127775
      Arnaldo Carvalho de Melo committed
      The code to disassemble BPF programs uses binutils' disassembling
      routines, which in turn use fprintf to print to a memstream FILE,
      adding a newline at the end of each line. That newline ends up confusing
      the TUI routines called from:
      
        annotate_browser__write()
          annotate_line__write()
            annotate_browser__printf()
              ui_browser__vprintf()
                SLsmg_vprintf()
      
      The SLsmg_vprintf() function in the slang library gets confused by the
      terminating newline, so make disasm_line__parse() trim the end of the
      line in addition to the beginning. That function parses both the lines
      produced by the BPF-specific disassembler (which uses binutils'
      libopcodes) and the lines produced by the objdump-based disassembler
      used for everything else (which doesn't add this terminating newline).
      
      This way, when disasm_line->ops.raw is used, i.e. for instructions
      without a special scnprintf() method, that \n no longer gets in the way
      of filling the screen with spaces right after the instruction. Without
      that padding, whatever was on the screen before is left behind, garbling
      the annotation screen, breaking scrolling, etc.
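
Trimming both ends of a disassembly line can be sketched as follows. `trim_both()` is a hypothetical name used only for illustration; it is not the helper that disasm_line__parse() actually calls.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: strip leading and trailing whitespace
 * (including a terminating '\n') in place. Returns a pointer to the
 * first non-space character inside the original buffer.
 */
char *trim_both(char *s)
{
	size_t len;

	assert(s != NULL);

	/* Skip leading whitespace. */
	while (isspace((unsigned char)*s))
		s++;

	/* Cut trailing whitespace by moving the terminator left. */
	len = strlen(s);
	while (len > 0 && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';

	return s;
}
```

Given the buffer `"  mov    %rsp,%rbp\n"`, the returned string is `"mov    %rsp,%rbp"`, with no newline left to disturb the screen padding.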
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Fixes: 6987561c ("perf annotate: Enable annotation of BPF programs")
      Link: https://lkml.kernel.org/n/tip-unbr5a5efakobfr6rhxq99ta@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      85127775
  9. 30 Jul, 2019 3 commits
  10. 09 Jul, 2019 3 commits
    • A
      perf tools: Use list_del_init() more thoroughly · e56fbc9d
      Arnaldo Carvalho de Melo committed
      This allows destructors to check whether they're operating on an object
      that is still in a list, and avoids walking from use-after-free list
      entries into still valid entries, or into other entries that were
      already removed from the list.
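
The difference a re-initialising delete makes can be shown with a minimal circular doubly linked list. This is a simplified stand-in modelled on the kernel's list API, not a copy of it; all function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialised node points to itself, i.e. it is on no list. */
void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

/* Insert node n right after head. */
void list_add_head(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Unlink the entry and re-initialise it, so its pointers no longer
 * reference former neighbours that may be freed later.
 */
void list_del_init_sketch(struct list_head *e)
{
	assert(e != NULL);
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

/* A destructor can use this to ask "am I still on a list?". */
int list_is_unlinked(const struct list_head *e)
{
	return e->next == e;
}
```

After `list_del_init_sketch(&a)`, `list_is_unlinked(&a)` is true, so a destructor can detect that the object was already removed instead of following stale pointers.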
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-deh17ub44atyox3j90e6rksu@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e56fbc9d
    • A
      perf tools: Use zfree() where applicable · d8f9da24
      Arnaldo Carvalho de Melo committed
      In places where the equivalent was already being done, i.e.:
      
         free(a);
         a = NULL;
      
      And in places where struct members are being freed, so that if we have
      some erroneous reference to the struct, accesses to freed members
      will result in segfaults, which we can detect faster than use after free
      of areas that may still hold something seemingly valid.
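
The free-then-NULL idiom can be wrapped as below. `ZFREE()` is a hypothetical macro written for this illustration; perf's own zfree() may be defined differently.

```c
#include <stdlib.h>

/*
 * Hypothetical macro: free the object that *pp points to, then clear
 * the pointer, so any later access through it faults immediately
 * instead of silently reading stale memory.
 */
#define ZFREE(pp)		\
	do {			\
		free(*(pp));	\
		*(pp) = NULL;	\
	} while (0)
```

Calling `ZFREE(&p)` twice in a row is harmless, because the second call passes NULL to free(), which is defined to do nothing.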
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-jatyoofo5boc1bsvoig6bb6i@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d8f9da24
    • L
      perf annotate: Fix dereferencing freed memory found by the smatch tool · 600c787d
      Leo Yan committed
      Based on the following report from Smatch, fix the potential
      dereference of freed memory.
      
        tools/perf/util/annotate.c:1125
        disasm_line__parse() error: dereferencing freed memory 'namep'
      
        tools/perf/util/annotate.c
        1100 static int disasm_line__parse(char *line, const char **namep, char **rawp)
        1101 {
        1102         char tmp, *name = ltrim(line);
      
        [...]
      
        1114         *namep = strdup(name);
        1115
        1116         if (*namep == NULL)
        1117                 goto out_free_name;
      
        [...]
      
        1124 out_free_name:
        1125         free((void *)namep);
                                  ^^^^^
        1126         *namep = NULL;
                     ^^^^^^
        1127         return -1;
        1128 }
      
      If strdup() fails to allocate memory for *namep, we must not free the
      pointer 'namep' itself, which resides in the data structure
      disasm_line::ins::name. *namep is already a NULL pointer on this
      failure, so it's pointless to assign NULL to it again.
      
      Committer note:
      
      Freeing namep, which is the address of the first entry of the 'struct
      ins' that is the first member of struct disasm_line, would in fact free
      that disasm_line instance if it was allocated via malloc/calloc, which
      would later result in a dereference of freed memory.
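
A corrected error path might look like the sketch below. It is a simplified stand-in, not the actual fix: `parse_line_sketch()` is a hypothetical function that only splits a line into a duplicated mnemonic and a pointer to the operands.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified parser: on success *namep holds a
 * malloc'ed copy of the mnemonic (the caller frees it) and *rawp
 * points at the operands inside 'line'. Returns 0 or -1.
 */
int parse_line_sketch(char *line, char **namep, char **rawp)
{
	char *name = line, *end, saved;

	assert(line != NULL && namep != NULL && rawp != NULL);

	while (isspace((unsigned char)*name))
		name++;
	if (*name == '\0')
		return -1;

	/* Temporarily terminate the mnemonic so strdup() copies only it. */
	end = name;
	while (*end != '\0' && !isspace((unsigned char)*end))
		end++;
	saved = *end;
	*end = '\0';
	*namep = strdup(name);
	*end = saved;

	/*
	 * On allocation failure there is nothing to free: *namep is
	 * already NULL, and namep itself belongs to the caller.
	 */
	if (*namep == NULL)
		return -1;

	while (isspace((unsigned char)*end))
		end++;
	*rawp = end;
	return 0;
}
```

The key point is that the failure branch returns without touching `namep`, which points into a structure owned by the caller.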
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Alexios Zavras <alexios.zavras@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20190702103420.27540-5-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      600c787d
  11. 02 Jul, 2019 2 commits
  12. 26 Jun, 2019 2 commits
  13. 11 Jun, 2019 1 commit
    • T
      perf report: Fix OOM error in TUI mode on s390 · 8a07aa4e
      Thomas Richter committed
      Debugging an OOM error using the TUI interface revealed this issue
      on s390:
      
      [tmricht@m83lp54 perf]$ cat /proc/kallsyms |sort
      ....
      00000001119b7158 B radix_tree_node_cachep
      00000001119b8000 B __bss_stop
      00000001119b8000 B _end
      000003ff80002850 t autofs_mount	[autofs4]
      000003ff80002868 t autofs_show_options	[autofs4]
      000003ff80002a98 t autofs_evict_inode	[autofs4]
      ....
      
      There is a huge gap between the last kernel symbol
      __bss_stop/_end and the first kernel module symbol
      autofs_mount (from autofs4 module).
      
      After reading the kernel symbol table via functions:
      
       dso__load()
       +--> dso__load_kernel_sym()
            +--> dso__load_kallsyms()
      	   +--> __dso_load_kallsyms()
      	        +--> symbols__fixup_end()
      
      the symbol __bss_stop has a start address of 1119b8000 and
      an end address of 3ff80002850, as can be seen by this debug statement:
      
        symbols__fixup_end __bss_stop start:0x1119b8000 end:0x3ff80002850
      
      The size of symbol __bss_stop is 0x3fe6e64a850 bytes!
      It is the last kernel symbol and fills up the space until
      the first kernel module symbol.
      
      This size kills the TUI interface when executing the following
      code:
      
        process_sample_event()
          hist_entry_iter__add()
            hist_iter__report_callback()
              hist_entry__inc_addr_samples()
                symbol__inc_addr_samples(symbol = __bss_stop)
                  symbol__cycles_hist()
                     annotated_source__alloc_histograms(...,
      				                symbol__size(sym),
      		                                ...)
      
      This function allocates memory to save sample histograms.
      symbol__size() is defined as sym->end - sym->start, which
      results in the above value of 0x3fe6e64a850 bytes, and
      the call to calloc() in annotated_source__alloc_histograms() fails.
      
      The histogram memory allocation might fail; make this failure
      non-fatal and continue processing.
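
The size arithmetic and the non-fatal handling can be sketched in a few lines. The struct and both functions are hypothetical simplifications, not perf's data structures, and the one-counter-per-byte layout is an assumption made for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced symbol: only the address range matters here. */
struct sym {
	uint64_t start, end;
};

uint64_t sym_size(const struct sym *s)
{
	return s->end - s->start;
}

/*
 * Count one sample at addr. Returns 1 if the sample was recorded,
 * 0 if it was skipped because the histogram could not be allocated
 * (treated as non-fatal), and -1 if addr lies outside the symbol.
 */
int account_sample(const struct sym *s, uint64_t addr, unsigned int **hist)
{
	assert(s != NULL && hist != NULL);

	if (addr < s->start || addr >= s->end)
		return -1;

	if (*hist == NULL) {
		/* One counter per byte of the symbol, allocated lazily. */
		*hist = calloc(sym_size(s), sizeof(**hist));
		if (*hist == NULL)
			return 0;	/* skip this sample, keep going */
	}

	(*hist)[addr - s->start]++;
	return 1;
}
```

With start 0x1119b8000 and end 0x3ff80002850, `sym_size()` returns 0x3fe6e64a850, the value quoted in the message; a caller that treats a return of 0 as "skip" keeps processing the remaining events.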
      
      Output before:
      [tmricht@m83lp54 perf]$ ./perf --debug stderr=1 report -vvvvv \
      					      -i ~/slow.data 2>/tmp/2
      [tmricht@m83lp54 perf]$ tail -5 /tmp/2
        __symbol__inc_addr_samples(875): ENOMEM! sym->name=__bss_stop,
      		start=0x1119b8000, addr=0x2aa0005eb08, end=0x3ff80002850,
      		func: 0
      problem adding hist entry, skipping event
      0x938b8 [0x8]: failed to process type: 68 [Cannot allocate memory]
      [tmricht@m83lp54 perf]$
      
      Output after:
      [tmricht@m83lp54 perf]$ ./perf --debug stderr=1 report -vvvvv \
      					      -i ~/slow.data 2>/tmp/2
      [tmricht@m83lp54 perf]$ tail -5 /tmp/2
         symbol__inc_addr_samples map:0x1597830 start:0x110730000 end:0x3ff80002850
         symbol__hists notes->src:0x2aa2a70 nr_hists:1
         symbol__inc_addr_samples sym:unlink_anon_vmas src:0x2aa2a70
         __symbol__inc_addr_samples: addr=0x11094c69e
         0x11094c670 unlink_anon_vmas: period++ [addr: 0x11094c69e, 0x2e, evidx=0]
         	=> nr_samples: 1, period: 526008
      [tmricht@m83lp54 perf]$
      
      There is no error about failed memory allocation and the TUI interface
      shows all entries.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/90cb5607-3e12-5167-682d-978eba7dafa8@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8a07aa4e
  14. 05 Jun, 2019 1 commit
  15. 16 May, 2019 1 commit
    • J
      perf annotate: Remove hist__account_cycles() from callback · bdd1666b
      Jin Yao committed
      The hist__account_cycles() function is executed when the
      hist_iter__branch_callback() is called.
      
      But this does not appear to be necessary: hist__account_cycles() already
      walks all branch entries.
      
      This patch moves hist__account_cycles() out of the callback, so the data
      processing is now much faster than before.
      
      The previous code had an issue: ch[offset].num++ (in
      __symbol__account_cycles) was executed repeatedly, because
      hist__account_cycles was called in each hist_iter__branch_callback, so
      the count in ch[offset].num was incorrect (too big).
      
      With this patch, the issue is fixed. We also no longer need the
      "ch->reset >= ch->num / 2" check for too many overlaps (in
      annotation__count_and_fill); keeping it would hide some data.
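
The over-counting can be modelled with two loops. This is a toy model under the assumption that one callback fires per branch entry and that each accounting pass touches every entry once; the function names are hypothetical and nothing here is perf code.

```c
#include <stddef.h>

/*
 * Old flow (modelled): the accounting pass runs inside a callback
 * that fires once per branch entry, and every pass walks all
 * entries, so one sample produces nr * nr increments.
 */
unsigned long increments_in_callback(size_t nr_branches)
{
	unsigned long num = 0;

	for (size_t cb = 0; cb < nr_branches; cb++)	/* one callback per entry */
		for (size_t i = 0; i < nr_branches; i++)	/* each pass walks all */
			num++;
	return num;
}

/* New flow (modelled): one accounting pass per sample. */
unsigned long increments_once(size_t nr_branches)
{
	unsigned long num = 0;

	for (size_t i = 0; i < nr_branches; i++)
		num++;
	return num;
}
```

For a sample carrying 4 branch entries the first function counts 16 increments and the second counts 4, which illustrates both the inflated counter and the saved work.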
      
      Now, we can try, for example:
      
        perf record -b ...
        perf annotate or perf report -s symbol
      
      The output should be the same before and after this patch.
      
       v3:
       ---
       Fix the crash in stdio mode.
       As in the previous code, ui__has_annotation() must be checked
       before calling hist__account_cycles().
      
       v2:
       ---
       1. Cover the similar perf report
       2. Remove the checking code "ch->reset >= ch->num / 2"
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1552684577-29041-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bdd1666b
  16. 03 May, 2019 1 commit
  17. 21 Mar, 2019 1 commit
    • S
      perf annotate: Enable annotation of BPF programs · 6987561c
      Song Liu committed
      In symbol__disassemble(), DSO_BINARY_TYPE__BPF_PROG_INFO dso calls into
      a new function symbol__disassemble_bpf(), where annotation line
      information is filled based on the bpf_prog_info and btf data saved in
      given perf_env.
      
      symbol__disassemble_bpf() uses binutils' libopcodes to disassemble bpf
      programs.
      
      Committer testing:
      
      After fixing this:
      
        -               u64 *addrs = (u64 *)(info_linear->info.jited_ksyms);
        +               u64 *addrs = (u64 *)(uintptr_t)(info_linear->info.jited_ksyms);
      
      Detected when crossbuilding to a 32-bit arch.
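
The reason for the intermediate cast can be shown in isolation. The helper names are hypothetical; the only assumption is that a 64-bit integer field carries a user-space address, as the diff above suggests for jited_ksyms.

```c
#include <stdint.h>

/*
 * Convert a 64-bit integer that carries an address into a pointer.
 * Going through uintptr_t first narrows the value to the pointer
 * width of the target, which avoids the "cast to pointer from
 * integer of different size" diagnostic on 32-bit builds.
 */
uint64_t *u64_to_ptr(uint64_t addr)
{
	return (uint64_t *)(uintptr_t)addr;
}

/* The reverse direction, for storing a pointer in a 64-bit field. */
uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}
```

A round trip through both helpers returns the original pointer on 32-bit and 64-bit targets alike.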
      
      And making all this dependent on HAVE_LIBBFD_SUPPORT and
      HAVE_LIBBPF_SUPPORT:
      
      1) Have a BPF program running, one that has BTF info, etc, I used
         the tools/perf/examples/bpf/augmented_raw_syscalls.c put in place
         by 'perf trace'.
      
        # grep -B1 augmented_raw ~/.perfconfig
        [trace]
      	add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
        #
        # perf trace -e *mmsg
        dnf/6245 sendmmsg(20, 0x7f5485a88030, 2, MSG_NOSIGNAL) = 2
        NetworkManager/10055 sendmmsg(22<socket:[1056822]>, 0x7f8126ad1bb0, 2, MSG_NOSIGNAL) = 2
      
      2) Then do a 'perf record' system wide for a while:
      
        # perf record -a
        ^C[ perf record: Woken up 68 times to write data ]
        [ perf record: Captured and wrote 19.427 MB perf.data (366891 samples) ]
        #
      
      3) Check that we captured BPF and BTF info in the perf.data file:
      
        # perf report --header-only | grep 'b[pt]f'
        # event : name = cycles:ppp, , id = { 294789, 294790, 294791, 294792, 294793, 294794, 294795, 294796 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
        # bpf_prog_info of id 13
        # bpf_prog_info of id 14
        # bpf_prog_info of id 15
        # bpf_prog_info of id 16
        # bpf_prog_info of id 17
        # bpf_prog_info of id 18
        # bpf_prog_info of id 21
        # bpf_prog_info of id 22
        # bpf_prog_info of id 41
        # bpf_prog_info of id 42
        # btf info of id 2
        #
      
      4) Check which programs got recorded:
      
         # perf report | grep bpf_prog | head
           0.16%  exe              bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
           0.14%  exe              bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
           0.08%  fuse-overlayfs   bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
           0.07%  fuse-overlayfs   bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
           0.01%  clang-4.0        bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
           0.01%  clang-4.0        bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
           0.00%  clang            bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
           0.00%  runc             bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
           0.00%  clang            bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
           0.00%  sh               bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
        #
      
        This was with the default --sort order for 'perf report', which is:
      
          --sort comm,dso,symbol
      
        If we just look for the symbol, for instance:
      
         # perf report --sort symbol | grep bpf_prog | head
           0.26%  [k] bpf_prog_819967866022f1e1_sys_enter                -      -
           0.24%  [k] bpf_prog_c1bd85c092d6e4aa_sys_exit                 -      -
         #
      
        or the DSO:
      
         # perf report --sort dso | grep bpf_prog | head
           0.26%  bpf_prog_819967866022f1e1_sys_enter
           0.24%  bpf_prog_c1bd85c092d6e4aa_sys_exit
        #
      
      We'll see the two BPF programs that augmented_raw_syscalls.o puts in
      place, one attached to the raw_syscalls:sys_enter and another to the
      raw_syscalls:sys_exit tracepoints, as expected.
      
      Now we can finally do, from the command line, annotation for one of
      those two symbols, with the original BPF program source code intermixed
      with the disassembled JITed code:
      
        # perf annotate --stdio2 bpf_prog_819967866022f1e1_sys_enter
      
        Samples: 950  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 553756947, [percent: local period]
        bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
        Percent      int sys_enter(struct syscall_enter_args *args)
         53.41         push   %rbp
      
          0.63         mov    %rsp,%rbp
          0.31         sub    $0x170,%rsp
          1.93         sub    $0x28,%rbp
          7.02         mov    %rbx,0x0(%rbp)
          3.20         mov    %r13,0x8(%rbp)
          1.07         mov    %r14,0x10(%rbp)
          0.61         mov    %r15,0x18(%rbp)
          0.11         xor    %eax,%eax
          1.29         mov    %rax,0x20(%rbp)
          0.11         mov    %rdi,%rbx
                     	return bpf_get_current_pid_tgid();
          2.02       → callq  *ffffffffda6776d9
          2.76         mov    %eax,-0x148(%rbp)
                       mov    %rbp,%rsi
                     int sys_enter(struct syscall_enter_args *args)
                       add    $0xfffffffffffffeb8,%rsi
                     	return bpf_map_lookup_elem(pids, &pid) != NULL;
                       movabs $0xffff975ac2607800,%rdi
      
          1.26       → callq  *ffffffffda6789e9
                       cmp    $0x0,%rax
          2.43       → je     0
                       add    $0x38,%rax
          0.21         xor    %r13d,%r13d
                     	if (pid_filter__has(&pids_filtered, getpid()))
          0.81         cmp    $0x0,%rax
                     → jne    0
                       mov    %rbp,%rdi
                     	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
          2.22         add    $0xfffffffffffffeb8,%rdi
          0.11         mov    $0x40,%esi
          0.32         mov    %rbx,%rdx
          2.74       → callq  *ffffffffda658409
                     	syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
          0.22         mov    %rbp,%rsi
          1.69         add    $0xfffffffffffffec0,%rsi
                     	syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
                       movabs $0xffff975bfcd36000,%rdi
      
                       add    $0xd0,%rdi
          0.21         mov    0x0(%rsi),%eax
          0.93         cmp    $0x200,%rax
                     → jae    0
          0.10         shl    $0x3,%rax
      
          0.11         add    %rdi,%rax
          0.11       → jmp    0
                       xor    %eax,%eax
                     	if (syscall == NULL || !syscall->enabled)
          1.07         cmp    $0x0,%rax
                     → je     0
                     	if (syscall == NULL || !syscall->enabled)
          6.57         movzbq 0x0(%rax),%rdi
      
                     	if (syscall == NULL || !syscall->enabled)
                       cmp    $0x0,%rdi
          0.95       → je     0
                       mov    $0x40,%r8d
                     	switch (augmented_args.args.syscall_nr) {
                       mov    -0x140(%rbp),%rdi
                     	switch (augmented_args.args.syscall_nr) {
                       cmp    $0x2,%rdi
                     → je     0
                       cmp    $0x101,%rdi
                     → je     0
                       cmp    $0x15,%rdi
                     → jne    0
                     	case SYS_OPEN:	 filename_arg = (const void *)args->args[0];
                       mov    0x10(%rbx),%rdx
                     → jmp    0
                     	case SYS_OPENAT: filename_arg = (const void *)args->args[1];
                       mov    0x18(%rbx),%rdx
                     	if (filename_arg != NULL) {
                       cmp    $0x0,%rdx
                     → je     0
                       xor    %edi,%edi
                     		augmented_args.filename.reserved = 0;
                       mov    %edi,-0x104(%rbp)
                     		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                       mov    %rbp,%rdi
                       add    $0xffffffffffffff00,%rdi
                     		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                       mov    $0x100,%esi
                     → callq  *ffffffffda658499
                       mov    $0x148,%r8d
                     		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                       mov    %eax,-0x108(%rbp)
                     		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                       mov    %rax,%rdi
                       shl    $0x20,%rdi
      
                       shr    $0x20,%rdi
      
                     		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
                       cmp    $0xff,%rdi
                     → ja     0
                     			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
                       add    $0x48,%rax
                     			len &= sizeof(augmented_args.filename.value) - 1;
                       and    $0xff,%rax
                       mov    %rax,%r8
                       mov    %rbp,%rcx
                     	return perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
                       add    $0xfffffffffffffeb8,%rcx
                       mov    %rbx,%rdi
                       movabs $0xffff975fbd72d800,%rsi
      
                       mov    $0xffffffff,%edx
                     → callq  *ffffffffda658ad9
                       mov    %rax,%r13
                     }
                       mov    %r13,%rax
          0.72         mov    0x0(%rbp),%rbx
                       mov    0x8(%rbp),%r13
          1.16         mov    0x10(%rbp),%r14
          0.10         mov    0x18(%rbp),%r15
          0.42         add    $0x28,%rbp
          0.54         leaveq
          0.54       ← retq
        #
      
      See 'man perf-config' for how to control what is shown via the
      [annotate] section of ~/.perfconfig; for instance, one can suppress the
      source code and see just the disassembly.
      
      Alternatively, use the TUI by just running 'perf annotate', press
      '/bpf_prog' to see the bpf symbols, press enter and do the interactive
      annotation, which allows for dumping to a file after selecting the
      various output tunables, for instance, the above without source code
      intermixed, plus showing all the instruction offsets:
      
        # perf annotate bpf_prog_819967866022f1e1_sys_enter
      
      Then press: 's' to hide the source code + 'O' twice to show all
      instruction offsets, then 'P' to print to the
      bpf_prog_819967866022f1e1_sys_enter.annotation file, which will have:
      
        # cat bpf_prog_819967866022f1e1_sys_enter.annotation
        bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
        Event: cycles:ppp
      
         53.41    0:   push   %rbp
      
          0.63    1:   mov    %rsp,%rbp
          0.31    4:   sub    $0x170,%rsp
          1.93    b:   sub    $0x28,%rbp
          7.02    f:   mov    %rbx,0x0(%rbp)
          3.20   13:   mov    %r13,0x8(%rbp)
          1.07   17:   mov    %r14,0x10(%rbp)
          0.61   1b:   mov    %r15,0x18(%rbp)
          0.11   1f:   xor    %eax,%eax
          1.29   21:   mov    %rax,0x20(%rbp)
          0.11   25:   mov    %rdi,%rbx
          2.02   28: → callq  *ffffffffda6776d9
          2.76   2d:   mov    %eax,-0x148(%rbp)
                 33:   mov    %rbp,%rsi
                 36:   add    $0xfffffffffffffeb8,%rsi
                 3d:   movabs $0xffff975ac2607800,%rdi
      
          1.26   47: → callq  *ffffffffda6789e9
                 4c:   cmp    $0x0,%rax
          2.43   50: → je     0
                 52:   add    $0x38,%rax
          0.21   56:   xor    %r13d,%r13d
          0.81   59:   cmp    $0x0,%rax
                 5d: → jne    0
                 63:   mov    %rbp,%rdi
          2.22   66:   add    $0xfffffffffffffeb8,%rdi
          0.11   6d:   mov    $0x40,%esi
          0.32   72:   mov    %rbx,%rdx
          2.74   75: → callq  *ffffffffda658409
          0.22   7a:   mov    %rbp,%rsi
          1.69   7d:   add    $0xfffffffffffffec0,%rsi
                 84:   movabs $0xffff975bfcd36000,%rdi
      
                 8e:   add    $0xd0,%rdi
          0.21   95:   mov    0x0(%rsi),%eax
          0.93   98:   cmp    $0x200,%rax
                 9f: → jae    0
          0.10   a1:   shl    $0x3,%rax
      
          0.11   a5:   add    %rdi,%rax
          0.11   a8: → jmp    0
                 aa:   xor    %eax,%eax
          1.07   ac:   cmp    $0x0,%rax
                 b0: → je     0
          6.57   b6:   movzbq 0x0(%rax),%rdi
      
                 bb:   cmp    $0x0,%rdi
          0.95   bf: → je     0
                 c5:   mov    $0x40,%r8d
                 cb:   mov    -0x140(%rbp),%rdi
                 d2:   cmp    $0x2,%rdi
                 d6: → je     0
                 d8:   cmp    $0x101,%rdi
                 df: → je     0
                 e1:   cmp    $0x15,%rdi
                 e5: → jne    0
                 e7:   mov    0x10(%rbx),%rdx
                 eb: → jmp    0
                 ed:   mov    0x18(%rbx),%rdx
                 f1:   cmp    $0x0,%rdx
                 f5: → je     0
                 f7:   xor    %edi,%edi
                 f9:   mov    %edi,-0x104(%rbp)
                 ff:   mov    %rbp,%rdi
                102:   add    $0xffffffffffffff00,%rdi
                109:   mov    $0x100,%esi
                10e: → callq  *ffffffffda658499
                113:   mov    $0x148,%r8d
                119:   mov    %eax,-0x108(%rbp)
                11f:   mov    %rax,%rdi
                122:   shl    $0x20,%rdi
      
                126:   shr    $0x20,%rdi
      
                12a:   cmp    $0xff,%rdi
                131: → ja     0
                133:   add    $0x48,%rax
                137:   and    $0xff,%rax
                13d:   mov    %rax,%r8
                140:   mov    %rbp,%rcx
                143:   add    $0xfffffffffffffeb8,%rcx
                14a:   mov    %rbx,%rdi
                14d:   movabs $0xffff975fbd72d800,%rsi
      
                157:   mov    $0xffffffff,%edx
                15c: → callq  *ffffffffda658ad9
                161:   mov    %rax,%r13
                164:   mov    %r13,%rax
          0.72  167:   mov    0x0(%rbp),%rbx
                16b:   mov    0x8(%rbp),%r13
          1.16  16f:   mov    0x10(%rbp),%r14
          0.10  173:   mov    0x18(%rbp),%r15
          0.42  177:   add    $0x28,%rbp
          0.54  17b:   leaveq
          0.54  17c: ← retq
      
      Another cool way to test all this is to simply use 'perf top', look for
      those symbols, go there and press enter to annotate them live :-)
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislav Fomichev <sdf@google.com>
      Link: http://lkml.kernel.org/r/20190312053051.2690567-13-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6987561c
  18. 07 Mar, 2019 1 commit
    • A
      perf annotate: Calculate the max instruction name, align column to that · bc3bb795
      Arnaldo Carvalho de Melo committed
      We were hardcoding '6' as the max instruction name length, but lots of
      instructions are longer than that. See the diff between two 'P' printed
      TUI annotations for a libc function that uses instructions with long
      names, such as 'vpmovmskb' with its 9 chars:
      
        --- __strcmp_avx2.annotation.before	2019-03-06 16:31:39.368020425 -0300
        +++ __strcmp_avx2.annotation	2019-03-06 16:32:12.079450508 -0300
        @@ -2,284 +2,284 @@
         Event: cycles:ppp
      
         Percent        endbr64
        -  0.10         mov    %edi,%eax
        +  0.10         mov        %edi,%eax
        -               xor    %edx,%edx
        +               xor        %edx,%edx
        -  3.54         vpxor  %ymm7,%ymm7,%ymm7
        +  3.54         vpxor      %ymm7,%ymm7,%ymm7
        -               or     %esi,%eax
        +               or         %esi,%eax
        -               and    $0xfff,%eax
        +               and        $0xfff,%eax
        -               cmp    $0xf80,%eax
        +               cmp        $0xf80,%eax
        -             ↓ jg     370
        +             ↓ jg         370
        - 27.07         vmovdqu (%rdi),%ymm1
        + 27.07         vmovdqu    (%rdi),%ymm1
        -  7.97         vpcmpeqb (%rsi),%ymm1,%ymm0
        +  7.97         vpcmpeqb   (%rsi),%ymm1,%ymm0
        -  2.15         vpminub %ymm1,%ymm0,%ymm0
        +  2.15         vpminub    %ymm1,%ymm0,%ymm0
        -  4.09         vpcmpeqb %ymm7,%ymm0,%ymm0
        +  4.09         vpcmpeqb   %ymm7,%ymm0,%ymm0
        -  0.43         vpmovmskb %ymm0,%ecx
        +  0.43         vpmovmskb  %ymm0,%ecx
        -  1.53         test   %ecx,%ecx
        +  1.53         test       %ecx,%ecx
        -             ↓ je     b0
        +             ↓ je         b0
        -  5.26         tzcnt  %ecx,%edx
        +  5.26         tzcnt      %ecx,%edx
        - 18.40         movzbl (%rdi,%rdx,1),%eax
        + 18.40         movzbl     (%rdi,%rdx,1),%eax
        -  7.09         movzbl (%rsi,%rdx,1),%edx
        +  7.09         movzbl     (%rsi,%rdx,1),%edx
        -  3.34         sub    %edx,%eax
        +  3.34         sub        %edx,%eax
           2.37         vzeroupper
                      ← retq
                        nop
        -         50:   tzcnt  %ecx,%edx
        +         50:   tzcnt      %ecx,%edx
        -               movzbl 0x20(%rdi,%rdx,1),%eax
        +               movzbl     0x20(%rdi,%rdx,1),%eax
        -               movzbl 0x20(%rsi,%rdx,1),%edx
        +               movzbl     0x20(%rsi,%rdx,1),%edx
        -               sub    %edx,%eax
        +               sub        %edx,%eax
                        vzeroupper
                      ← retq
        -               data16 nopw %cs:0x0(%rax,%rax,1)
        +               data16     nopw %cs:0x0(%rax,%rax,1)
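
A minimal sketch of the alignment idea described above, not the actual perf implementation: scan the instruction names once to find the longest one, then use that as the printf field width instead of a hardcoded 6. The helper names `max_ins_name` and `format_ins` are hypothetical, and the exact padding perf emits may differ from this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the length of the longest instruction
 * name in names[], but never less than min_width (the old fixed width).
 */
static int max_ins_name(const char *const names[], size_t n, int min_width)
{
	int max = min_width;

	for (size_t i = 0; i < n; i++) {
		int len = (int)strlen(names[i]);

		if (len > max)
			max = len;
	}
	return max;
}

/*
 * Hypothetical helper: write "name operands" into buf with the name
 * left-justified and padded to width columns, so operands line up.
 * Returns what snprintf() returns.
 */
static int format_ins(char *buf, size_t size, int width,
		      const char *name, const char *ops)
{
	assert(buf && name && ops);
	return snprintf(buf, size, "%-*s %s", width, name, ops);
}
```

Calling `max_ins_name()` once per annotated symbol and passing the result as `width` to `format_ins()` for every line keeps the operand column aligned even when a long mnemonic such as `vpmovmskb` appears.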
      Reported-by: Travis Downs <travis.downs@gmail.com>
      LPU-Reference: CAOBGo4z1KfmWeOm6Et0cnX5Z6DWsG2PQbAvRn1MhVPJmXHrc5g@mail.gmail.com
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-89wsdd9h9g6bvq52sgp6d0u4@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bc3bb795
  19. 22 Feb, 2019 1 commit
    • W
      perf annotate: Fix getting source line failure · 11db1ad4
      Wei Li committed
      The output of "perf annotate -l --stdio xxx" changed since commit 425859ff
      ("perf annotate: No need to calculate notes->start twice") removed the
      notes->start assignment in symbol__calc_lines(). As a result,
      find_address_in_section(), reached from symbol__tty_annotate(), fails
      because a2l->addr is wrong, so the annotate summary doesn't report the
      source code line numbers correctly.
      
      Before fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
        void hotspot_1(void)
        {
      	volatile int i;
      
      	for (i = 0; i < 0x10000000; i++);
      	for (i = 0; i < 0x10000000; i++);
      	for (i = 0; i < 0x10000000; i++);
        }
      
        int main(void)
        {
      	hotspot_1();
      
      	return 0;
        }
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         19.30 common_while_1[32]
         19.03 common_while_1[4e]
         19.01 common_while_1[16]
          5.04 common_while_1[13]
          4.99 common_while_1[4b]
          4.78 common_while_1[2c]
          4.77 common_while_1[10]
          4.66 common_while_1[2f]
          4.59 common_while_1[51]
          4.59 common_while_1[35]
          4.52 common_while_1[19]
          4.20 common_while_1[56]
          0.51 common_while_1[48]
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1[10]    4.77 :   60a:   add    $0x1,%eax
         common_while_1[13]    5.04 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1[16]   19.01 :   610:   mov    -0x4(%rbp),%eax
         common_while_1[19]    4.52 :   613:   cmp    $0xfffffff,%eax
            0.00 :   618:   jle    607 <hotspot_1+0xd>
                 :                 for (i = 0; i < 0x10000000; i++);
        ...
      
      After fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         33.34 common_while_1.c:5
         33.34 common_while_1.c:6
         33.32 common_while_1.c:7
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.70 :   60a:   add    $0x1,%eax
          4.89 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1.c:5   19.03 :   610:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.72 :   613:   cmp    $0xfffffff,%eax
          0.00 :   618:   jle    607 <hotspot_1+0xd>
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   61a:   movl   $0x0,-0x4(%rbp)
          0.00 :   621:   jmp    62c <hotspot_1+0x32>
          0.00 :   623:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   626:   add    $0x1,%eax
          4.73 :   629:   mov    %eax,-0x4(%rbp)
         common_while_1.c:6   19.54 :   62c:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   62f:   cmp    $0xfffffff,%eax
        ...
      Signed-off-by: Wei Li <liwei391@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 425859ff ("perf annotate: No need to calculate notes->start twice")
      Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      11db1ad4
  20. 06 Feb, 2019 1 commit
  21. 25 Jan, 2019 1 commit
  22. 04 Jan, 2019 1 commit
  23. 18 Dec, 2018 4 commits
  24. 18 Oct, 2018 1 commit
    • D
      perf annotate: Add Sparc support · 0ab41886
      David Miller committed
      E.g.:
      
        $ perf annotate --stdio2
        Samples: 7K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 3086733887
        __gettimeofday  /lib32/libc-2.27.so [Percent: local period]
        Percent│
               │
               │
               │    Disassembly of section .text:
               │
               │    000a6fa0 <__gettimeofday@@GLIBC_2.0>:
          0.47 │      save   %sp, -96, %sp
          0.73 │      sethi  %hi(0xe9000), %l7
               │    → call   __frame_state_for@@GLIBC_2.0+0x480
          0.30 │      add    %l7, 0x58, %l7     ! e9058 <nftw64@@GLIBC_2.3.3+0x818>
          1.33 │      mov    %i0, %o0
               │      mov    %i1, %o1
          0.43 │      mov    0x74, %g1
               │      ta     0x10
         88.92 │    ↓ bcc    30
          2.95 │      clr    %g1
               │      neg    %o0
               │      mov    1, %g1
          0.31 │30:   cmp    %g1, 0
               │      bne,pn %icc, a6fe4 <__gettimeofday@@GLIBC_2.0+0x44>
               │      mov    %o0, %i0
          1.96 │    ← return %i7 + 8
          2.62 │      nop
               │      sethi  %hi(0), %g1
               │      neg    %o0, %g2
               │      add    %g1, 0x160, %g1
               │      ld     [ %l7 + %g1 ], %g1
               │      st     %g2, [ %g7 + %g1 ]
               │    ← return %i7 + 8
               │      mov    -1, %o0
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/20181016.205555.1070918198627611771.davem@davemloft.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0ab41886