1. 09 9月, 2016 1 次提交
    • P
      perf annotate: Add branch stack / basic block · 70fbe057
      Peter Zijlstra 提交于
      I wanted to know the hottest path through a function and figured the
      branch-stack (LBR) information should be able to help out with that.
      
      The below uses the branch-stack to create basic blocks and generate
      statistics from them.
      
              from    to              branch_i
              * ----> *
                      |
                      | block
                      v
                      * ----> *
                      from    to      branch_i+1
      
      The blocks are broken down into non-overlapping ranges, while tracking
      if the start of each range is an entry point and/or the end of a range
      is a branch.
      
      Each block iterates all ranges it covers (while splitting where required
      to exactly match the block) and increments the 'coverage' count.
      
      For the range including the branch we increment the taken counter, as
      well as the pred counter if flags.predicted.
      
      Using these number we can find if an instruction:
      
       - had coverage; given by:
      
              br->coverage / br->sym->max_coverage
      
         This metric ensures each symbol has a 100% spot, which reflects the
         observation that each symbol must have a most covered/hottest
         block.
      
       - is a branch target: br->is_target && br->start == add
      
       - for targets, how much of a branch's coverages comes from it:
      
      	target->entry / branch->coverage
      
       - is a branch: br->is_branch && br->end == addr
      
       - for branches, how often it was taken:
      
              br->taken / br->coverage
      
         after all, all execution that didn't take the branch would have
         incremented the coverage and continued onward to a later branch.
      
       - for branches, how often it was predicted:
      
              br->pred / br->taken
      
      The coverage percentage is used to color the address and asm sections;
      for low (<1%) coverage we use NORMAL (uncolored), indicating that these
      instructions are not 'important'. For high coverage (>75%) we color the
      address RED.
      
      For each branch, we add an asm comment after the instruction with
      information on how often it was taken and predicted.
      
      Output looks like (sans color, which does loose a lot of the
      information :/)
      
      $ perf record --branch-filter u,any -e cycles:p ./branches 27
      $ perf annotate branches
      
       Percent |	Source code & Disassembly of branches for cycles:pu (217 samples)
      ---------------------------------------------------------------------------------
               :	branches():
          0.00 :	  40057a:       push   %rbp
          0.00 :	  40057b:       mov    %rsp,%rbp
          0.00 :	  40057e:       sub    $0x20,%rsp
          0.00 :	  400582:       mov    %rdi,-0x18(%rbp)
          0.00 :	  400586:       mov    %rsi,-0x20(%rbp)
          0.00 :	  40058a:       mov    -0x18(%rbp),%rax
          0.00 :	  40058e:       mov    %rax,-0x10(%rbp)
          0.00 :	  400592:       movq   $0x0,-0x8(%rbp)
          0.00 :	  40059a:       jmpq   400656 <branches+0xdc>
          1.84 :	  40059f:       mov    -0x10(%rbp),%rax	# +100.00%
          3.23 :	  4005a3:       and    $0x1,%eax
          1.84 :	  4005a6:       test   %rax,%rax
          0.00 :	  4005a9:       je     4005bf <branches+0x45>	# -54.50% (p:42.00%)
          0.46 :	  4005ab:       mov    0x200bbe(%rip),%rax        # 601170 <acc>
         12.90 :	  4005b2:       add    $0x1,%rax
          2.30 :	  4005b6:       mov    %rax,0x200bb3(%rip)        # 601170 <acc>
          0.46 :	  4005bd:       jmp    4005d1 <branches+0x57>	# -100.00% (p:100.00%)
          0.92 :	  4005bf:       mov    0x200baa(%rip),%rax        # 601170 <acc>	# +49.54%
         13.82 :	  4005c6:       sub    $0x1,%rax
          0.46 :	  4005ca:       mov    %rax,0x200b9f(%rip)        # 601170 <acc>
          2.30 :	  4005d1:       mov    -0x10(%rbp),%rax	# +50.46%
          0.46 :	  4005d5:       mov    %rax,%rdi
          0.46 :	  4005d8:       callq  400526 <lfsr>	# -100.00% (p:100.00%)
          0.00 :	  4005dd:       mov    %rax,-0x10(%rbp)	# +100.00%
          0.92 :	  4005e1:       mov    -0x18(%rbp),%rax
          0.00 :	  4005e5:       and    $0x1,%eax
          0.00 :	  4005e8:       test   %rax,%rax
          0.00 :	  4005eb:       je     4005ff <branches+0x85>	# -100.00% (p:100.00%)
          0.00 :	  4005ed:       mov    0x200b7c(%rip),%rax        # 601170 <acc>
          0.00 :	  4005f4:       shr    $0x2,%rax
          0.00 :	  4005f8:       mov    %rax,0x200b71(%rip)        # 601170 <acc>
          0.00 :	  4005ff:       mov    -0x10(%rbp),%rax	# +100.00%
          7.37 :	  400603:       and    $0x1,%eax
          3.69 :	  400606:       test   %rax,%rax
          0.00 :	  400609:       jne    400612 <branches+0x98>	# -59.25% (p:42.99%)
          1.84 :	  40060b:       mov    $0x1,%eax
         14.29 :	  400610:       jmp    400617 <branches+0x9d>	# -100.00% (p:100.00%)
          1.38 :	  400612:       mov    $0x0,%eax	# +57.65%
         10.14 :	  400617:       test   %al,%al	# +42.35%
          0.00 :	  400619:       je     40062f <branches+0xb5>	# -57.65% (p:100.00%)
          0.46 :	  40061b:       mov    0x200b4e(%rip),%rax        # 601170 <acc>
          2.76 :	  400622:       sub    $0x1,%rax
          0.00 :	  400626:       mov    %rax,0x200b43(%rip)        # 601170 <acc>
          0.46 :	  40062d:       jmp    400641 <branches+0xc7>	# -100.00% (p:100.00%)
          0.92 :	  40062f:       mov    0x200b3a(%rip),%rax        # 601170 <acc>	# +56.13%
          2.30 :	  400636:       add    $0x1,%rax
          0.92 :	  40063a:       mov    %rax,0x200b2f(%rip)        # 601170 <acc>
          0.92 :	  400641:       mov    -0x10(%rbp),%rax	# +43.87%
          2.30 :	  400645:       mov    %rax,%rdi
          0.00 :	  400648:       callq  400526 <lfsr>	# -100.00% (p:100.00%)
          0.00 :	  40064d:       mov    %rax,-0x10(%rbp)	# +100.00%
          1.84 :	  400651:       addq   $0x1,-0x8(%rbp)
          0.92 :	  400656:       mov    -0x8(%rbp),%rax
          5.07 :	  40065a:       cmp    -0x20(%rbp),%rax
          0.00 :	  40065e:       jb     40059f <branches+0x25>	# -100.00% (p:100.00%)
          0.00 :	  400664:       nop
          0.00 :	  400665:       leaveq
          0.00 :	  400666:       retq
      
      (Note: the --branch-filter u,any was used to avoid spurious target and
      branch points due to interrupts/faults, they show up as very small -/+
      annotations on 'weird' locations)
      
      Committer note:
      
      Please take a look at:
      
        http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png
      
      To see the colors.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      [ Moved sym->max_coverage to 'struct annotate', aka symbol__annotate(sym) ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      70fbe057
  2. 01 9月, 2016 1 次提交
  3. 28 7月, 2016 1 次提交
  4. 13 7月, 2016 3 次提交
    • D
      perf symbols: Add Rust demangling · cae15db7
      David Tolnay 提交于
      Rust demangling is another step after bfd demangling. Add a diagnosis to
      identify mangled Rust symbols based on the hash that the Rust mangler appends
      as the last path component, as well as other characteristics.  Add a demangler
      to reconstruct the original symbol.
      
      Committer notes:
      
      How I tested it:
      
      Enabled COPR on Fedora 24 and then installed the 'rust-binary' package,
      with it:
      
        $ cat src/main.rs
        fn main() {
            println!("Hello, world!");
        }
        $ cat Cargo.toml
        [package]
      
        name = "hello_world"
        version = "0.0.1"
        authors = [ "Arnaldo Carvalho de Melo <acme@kernel.org>" ]
      
        $ perf record cargo bench
         Compiling hello_world v0.0.1 (file:///home/acme/projects/hello_world)
           Running target/release/hello_world-d4b9dab4b2a47d75
      
        running 0 tests
      
        test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
      
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.096 MB perf.data (1457 samples) ]
        $
      
      Before this patch:
      
        $ perf report --stdio --dsos librbml-e8edd0fd.so
        # dso: librbml-e8edd0fd.so
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:u'
        # Event count (approx.): 979599126
        #
        # Overhead  Command  Symbol
        # ........  .......  .............................................................................................................
        #
             1.78%  rustc    [.] rbml::reader::maybe_get_doc::hb9d387df6024b15b
             1.50%  rustc    [.] _$LT$reader..DocsIterator$LT$$u27$a$GT$$u20$as$u20$std..iter..Iterator$GT$::next::hd9af9e60d79a35c8
             1.20%  rustc    [.] rbml::reader::doc_at::hc88107fba445af31
             0.46%  rustc    [.] _$LT$reader..TaggedDocsIterator$LT$$u27$a$GT$$u20$as$u20$std..iter..Iterator$GT$::next::h0cb40e696e4bb489
             0.35%  rustc    [.] rbml::reader::Decoder::_next_int::h66eef7825a398bc3
             0.29%  rustc    [.] rbml::reader::Decoder::_next_sub::h8e5266005580b836
             0.15%  rustc    [.] rbml::reader::get_doc::h094521c645459139
             0.14%  rustc    [.] _$LT$reader..Decoder$LT$$u27$doc$GT$$u20$as$u20$serialize..Decoder$GT$::read_u32::h0acea2fff9669327
             0.07%  rustc    [.] rbml::reader::Decoder::next_doc::h6714d469c9dfaf91
             0.07%  rustc    [.] _ZN4rbml6reader10doc_as_u6417h930b740aa94f1d3aE@plt
             0.06%  rustc    [.] _fini
        $
      
      After:
      
        $ perf report --stdio --dsos librbml-e8edd0fd.so
        # dso: librbml-e8edd0fd.so
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:u'
        # Event count (approx.): 979599126
        #
        # Overhead  Command  Symbol
        # ........  .......  .................................................................
        #
           1.78%  rustc    [.] rbml::reader::maybe_get_doc
           1.50%  rustc    [.] <reader::DocsIterator<'a> as std::iter::Iterator>::next
           1.20%  rustc    [.] rbml::reader::doc_at
           0.46%  rustc    [.] <reader::TaggedDocsIterator<'a> as std::iter::Iterator>::next
           0.35%  rustc    [.] rbml::reader::Decoder::_next_int
           0.29%  rustc    [.] rbml::reader::Decoder::_next_sub
           0.15%  rustc    [.] rbml::reader::get_doc
           0.14%  rustc    [.] <reader::Decoder<'doc> as serialize::Decoder>::read_u32
           0.07%  rustc    [.] rbml::reader::Decoder::next_doc
           0.07%  rustc    [.] _ZN4rbml6reader10doc_as_u6417h930b740aa94f1d3aE@plt
           0.06%  rustc    [.] _fini
        $
      Signed-off-by: NDavid Tolnay <dtolnay@gmail.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/5780B7FA.3030602@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cae15db7
    • A
      perf tools: Uninline scnprintf() and vscnprint() · d0761e37
      Arnaldo Carvalho de Melo 提交于
      They were in tools/include/linux/kernel.h, requiring that it in turn
      included stdio.h, which is way too heavy.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-855h8olnkot9v0dajuee1lo3@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d0761e37
    • A
      tools: Introduce str_error_r() · c8b5f2c9
      Arnaldo Carvalho de Melo 提交于
      The tools so far have been using the strerror_r() GNU variant, that
      returns a string, be it the buffer passed or something else.
      
      But that, besides being tricky in cases where we expect that the
      function using strerror_r() returns the error formatted in a provided
      buffer (we have to check if it returned something else and copy that
      instead), breaks the build on systems not using glibc, like Alpine
      Linux, where musl libc is used.
      
      So, introduce yet another wrapper, str_error_r(), that has the GNU
      interface, but uses the portable XSI variant of strerror_r(), so that
      users rest asured that the provided buffer is used and it is what is
      returned.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c8b5f2c9
  5. 08 6月, 2016 2 次提交
  6. 07 6月, 2016 2 次提交
  7. 10 5月, 2016 1 次提交
  8. 07 5月, 2016 1 次提交
  9. 19 4月, 2016 1 次提交
  10. 15 4月, 2016 2 次提交
  11. 08 4月, 2016 2 次提交
    • A
      perf tools: Build syscall table .c header from kernel's syscall_64.tbl · 1b700c99
      Arnaldo Carvalho de Melo 提交于
      We used libaudit to map ids to syscall names and vice-versa, but that
      imposes a delay in supporting new syscalls, having to wait for libaudit
      to get those new syscalls on its tables.
      
      To remove that delay, for x86_64 initially, grab a copy of
      arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those
      tables.
      
      Syscalls currently not available in audit-libs:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
        Error:	Invalid syscall copy_file_range, membarrier, mlock2, pread64, pwrite64, timerfd_create, userfaultfd
        Hint:	try 'perf list syscalls:sys_enter_*'
        Hint:	and: 'man syscalls'
        #
      
      With this patch:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
          8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36
          8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40
         30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63ae382000, count: 4096, pos: 529592320) = 4096
         31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afd8b000, count: 4096, pos: 2314133504) = 4096
         31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afda2000, count: 4096, pos: 2314137600) = 4096
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-51xfjbxevdsucmnbc4ka5r88@git.kernel.org
      [ Added make dep for 'prepare' in 'LIBPERF_IN', fix by Wang Nan to fix parallell build ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1b700c99
    • A
      perf trace: Move syscall table id <-> name routines to separate class · fd0db102
      Arnaldo Carvalho de Melo 提交于
      We're using libaudit for doing name to id and id to syscall name
      translations, but that makes 'perf trace' to have to wait for newer
      libaudit versions supporting recently added syscalls, such as
      "userfaultfd" at the time of this changeset.
      
      We have all the information right there, in the kernel sources, so move
      this code to a separate place, wrapped behind functions that will
      progressively use the kernel source files to extract the syscall table
      for use in 'perf trace'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i38opd09ow25mmyrvfwnbvkj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fd0db102
  12. 02 4月, 2016 1 次提交
  13. 24 3月, 2016 1 次提交
  14. 11 3月, 2016 1 次提交
    • J
      perf jitdump: Build only on supported archs · e12b202f
      Jiri Olsa 提交于
      Build jitdump only on architectures defined in util/genelf.h file, to avoid
      breaking the build on such arches.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Davidlohr Bueso <dbueso@suse.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Mel Gorman <mgorman@suse.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20160310164113.GA11357@krava.redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e12b202f
  15. 09 3月, 2016 1 次提交
    • A
      perf jitdump: DWARF is also needed · 46dad054
      Arnaldo Carvalho de Melo 提交于
      While building on a Docker container for ubuntu and installing package
      by package one ends up with:
      
          MKDIR    /tmp/build/util/
          CC       /tmp/build/util/genelf.o
        util/genelf.c:22:19: fatal error: dwarf.h: No such file or directory
         #include <dwarf.h>
                         ^
        compilation terminated.
        mv: cannot stat '/tmp/build/util/.genelf.o.tmp': No such file or directory
      
      Because the jitdump code needs the DWARF related development packages to
      be installed. So make it dependent on that so that the build can succeed
      without jitdump support.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-le498robnmxd40237wej3w62@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      46dad054
  16. 23 2月, 2016 1 次提交
  17. 05 2月, 2016 3 次提交
    • S
      perf jit: add source line info support · 598b7c69
      Stephane Eranian 提交于
      This patch adds source line information support to perf for jitted code.
      
      The source line info must be emitted by the runtime, such as JVMTI.
      
      Perf injects extract the source line info from the jitdump file and adds
      the corresponding .debug_lines section in the ELF image generated for
      each jitted function.
      
      The source line enables matching any address in the profile with a
      source file and line number.
      
      The improvement is visible in perf annotate with the source code
      displayed alongside the assembly code.
      
      The dwarf code leverages the support from OProfile which is also
      released under GPLv2.  Copyright 2007 OProfile authors.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Carl Love <cel@us.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John McCutchan <johnmccutchan@google.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sonny Rao <sonnyrao@chromium.org>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1448874143-7269-5-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      598b7c69
    • S
      perf inject: Add jitdump mmap injection support · 9b07e27f
      Stephane Eranian 提交于
      This patch adds a --jit/-j option to perf inject.
      
      This options injects MMAP records into the perf.data file to cover the
      jitted code mmaps. It also emits ELF images for each function in the
      jidump file.  Those images are created where the jitdump file is.  The
      MMAP records point to that location as well.
      
      Typical flow:
      
        $ perf record -k mono -- java -agentpath:libpjvmti.so java_class
        $ perf inject --jit -i perf.data -o perf.data.jitted
        $ perf report -i perf.data.jitted
      
      Note that jitdump.h support is not limited to Java, it works with any
      jitted environment modified to emit the jitdump file format, include
      those where code can be jitted multiple times and moved around.
      
      The jitdump.h format is adapted from the Oprofile project.
      
      The genelf.c (ELF binary generation) depends on MD5 hash encoding for
      the buildid. To enable this, libssl-dev must be installed. If not, then
      genelf.c defaults to using urandom to generate the buildid, which is not
      ideal.  The Makefile auto-detects the presence on libssl-dev.
      
      This version mmaps the jitdump file to create a marker MMAP record in
      the perf.data file. The marker is used to detect jitdump and cause perf
      inject to inject the jitted mmaps and generate ELF images for jitted
      functions.
      
      In V8, the following fixes and changes were made among other things:
      
        -  the jidump header format include a new flags field to be used
           to carry information about the configuration of the runtime agent.
           Contributed by: Adrian Hunter <adrian.hunter@intel.com>
      
        - Fix mmap pgoff: MMAP event pgoff must be the offset within the ELF file
          at which the code resides.
          Contributed by: Adrian Hunter <adrian.hunter@intel.com>
      
        - Fix ELF virtual addresses: perf tools expect the ELF virtual addresses of dynamic
          objects to match the file offset.
          Contributed by: Adrian Hunter <adrian.hunter@intel.com>
      
        - JIT MMAP injection does not obey finished_round semantics. JIT MMAP injection injects all
          MMAP events in one go, so it does not obey finished_round semantics, so drop the
          finished_round events from the output perf.data file.
          Contributed by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Carl Love <cel@us.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John McCutchan <johnmccutchan@google.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sonny Rao <sonnyrao@chromium.org>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1448874143-7269-3-git-send-email-eranian@google.com
      [ Moved inject.build_ids ordering bits to a separate patch, fixed the NO_LIBELF=1 build ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9b07e27f
    • S
      perf symbols: add Java demangling support · e9c4bcdd
      Stephane Eranian 提交于
      Add Java function descriptor demangling support.  Something bfd cannot
      do.
      
      Use the JAVA_DEMANGLE_NORET flag to avoid decoding the return type of
      functions.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Carl Love <cel@us.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John McCutchan <johnmccutchan@google.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sonny Rao <sonnyrao@chromium.org>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1448874143-7269-2-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e9c4bcdd
  18. 08 1月, 2016 2 次提交
  19. 18 12月, 2015 1 次提交
  20. 17 12月, 2015 1 次提交
  21. 14 12月, 2015 1 次提交
  22. 10 12月, 2015 2 次提交
  23. 19 11月, 2015 2 次提交
    • H
      perf bpf: Add prologue for BPF programs for fetching arguments · bfc077b4
      He Kuang 提交于
      This patch generates a prologue for a BPF program which fetches arguments for
      it.  With this patch, the program can have arguments as follow:
      
        SEC("lock_page=__lock_page page->flags")
        int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
        {
       	 return 1;
        }
      
      This patch passes at most 3 arguments from r3, r4 and r5. r1 is still the ctx
      pointer. r2 is used to indicate if dereferencing was done successfully.
      
      This patch uses r6 to hold ctx (struct pt_regs) and r7 to hold stack pointer
      for result. Result of each arguments first store on stack:
      
       low address
       BPF_REG_FP - 24  ARG3
       BPF_REG_FP - 16  ARG2
       BPF_REG_FP - 8   ARG1
       BPF_REG_FP
       high address
      
      Then loaded into r3, r4 and r5.
      
      The output prologue for offn(...off2(off1(reg)))) should be:
      
           r6 <- r1			// save ctx into a callee saved register
           r7 <- fp
           r7 <- r7 - stack_offset	// pointer to result slot
           /* load r3 with the offset in pt_regs of 'reg' */
           (r7) <- r3			// make slot valid
           r3 <- r3 + off1		// prepare to read unsafe pointer
           r2 <- 8
           r1 <- r7			// result put onto stack
           call probe_read		// read unsafe pointer
           jnei r0, 0, err		// error checking
           r3 <- (r7)			// read result
           r3 <- r3 + off2		// prepare to read unsafe pointer
           r2 <- 8
           r1 <- r7
           call probe_read
           jnei r0, 0, err
           ...
           /* load r2, r3, r4 from stack */
           goto success
      err:
           r2 <- 1
           /* load r3, r4, r5 with 0 */
           goto usercode
      success:
           r2 <- 0
      usercode:
           r1 <- r6	// restore ctx
           // original user code
      
      If all of arguments reside in register (dereferencing is not
      required), gen_prologue_fastpath() will be used to create
      fast prologue:
      
           r3 <- (r1 + offset of reg1)
           r4 <- (r1 + offset of reg2)
           r5 <- (r1 + offset of reg3)
           r2 <- 0
      
      P.S.
      
      eBPF calling convention is defined as:
      
      * r0		- return value from in-kernel function, and exit value
                        for eBPF program
      * r1 - r5	- arguments from eBPF program to in-kernel function
      * r6 - r9	- callee saved registers that in-kernel function will
                        preserve
      * r10		- read-only frame pointer to access stack
      
      Committer note:
      
      At least testing if it builds and loads:
      
        # cat test_probe_arg.c
        struct pt_regs;
      
        __attribute__((section("lock_page=__lock_page page->flags"), used))
        int func(struct pt_regs *ctx, int err, unsigned long flags)
        {
        	return 1;
        }
      
        char _license[] __attribute__((section("license"), used)) = "GPL";
        int _version __attribute__((section("version"), used)) = 0x40300;
        # perf record -e ./test_probe_arg.c usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.016 MB perf.data ]
        # perf evlist
        perf_bpf_probe:lock_page
        #
      Signed-off-by: NHe Kuang <hekuang@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1447675815-166222-11-git-send-email-wangnan0@huawei.comSigned-off-by: NWang Nan <wangnan0@huawei.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bfc077b4
    • A
      tools: Adopt memdup() from tools/perf, moving it to tools/lib/string.c · 4ddd3274
      Arnaldo Carvalho de Melo 提交于
      That will contain more string functions with counterparts, sometimes
      verbatim copies, in the kernel.
      Acked-by: NWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/n/tip-rah6g97kn21vfgmlramorz6o@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4ddd3274
  24. 28 10月, 2015 1 次提交
    • W
      perf tools: Enable passing bpf object file to --event · 84c86ca1
      Wang Nan 提交于
      By introducing new rules in tools/perf/util/parse-events.[ly], this
      patch enables 'perf record --event bpf_file.o' to select events by an
      eBPF object file. It calls parse_events_load_bpf() to load that file,
      which uses bpf__prepare_load() and finally calls bpf_object__open() for
      the object files.
      
      After applying this patch, commands like:
      
       # perf record --event foo.o sleep
      
      become possible.
      
      However, at this point it is unable to link any useful things onto the
      evsel list because the creating of probe points and BPF program
      attaching have not been implemented.  Before real events are possible to
      be extracted, to avoid perf report error because of empty evsel list,
      this patch link a dummy evsel. The dummy event related code will be
      removed when probing and extracting code is ready.
      
      Commiter notes:
      
      Using it:
      
        $ ls -la foo.o
        ls: cannot access foo.o: No such file or directory
        $ perf record --event foo.o sleep
        libbpf: failed to open foo.o: No such file or directory
        event syntax error: 'foo.o'
                             \___ BPF object file 'foo.o' is invalid
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        $
      
        $ file /tmp/build/perf/perf.o
        /tmp/build/perf/perf.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
        $ perf record --event /tmp/build/perf/perf.o sleep
        libbpf: /tmp/build/perf/perf.o is not an eBPF object file
        event syntax error: '/tmp/build/perf/perf.o'
                             \___ BPF object file '/tmp/build/perf/perf.o' is invalid
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        $
      
        $ file /tmp/foo.o
        /tmp/foo.o: ELF 64-bit LSB relocatable, no machine, version 1 (SYSV), not stripped
        $ perf record --event /tmp/foo.o sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.013 MB perf.data ]
        $ perf evlist
        /tmp/foo.o
        $ perf evlist  -v
        /tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        $
      
      So, type 1 is PERF_TYPE_SOFTWARE, config 0x9 is PERF_COUNT_SW_DUMMY, ok.
      
        $ perf report --stdio
        Error:
        The perf.data file has no samples!
        # To display the perf.data header info, please use --header/--header-only options.
        #
        $
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1444826502-49291-4-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      84c86ca1
  25. 07 10月, 2015 1 次提交
  26. 01 10月, 2015 1 次提交
  27. 14 9月, 2015 1 次提交
  28. 01 9月, 2015 2 次提交
    • S
      perf record: Add ability to name registers to record · bcc84ec6
      Stephane Eranian 提交于
      This patch modifies the -I/--int-regs option to enablepassing the name
      of the registers to sample on interrupt. Registers can be specified by
      their symbolic names. For instance on x86, --intr-regs=ax,si.
      
      The motivation is to reduce the size of the perf.data file and the
      overhead of sampling by only collecting the registers useful to a
      specific analysis. For instance, for value profiling, sampling only the
      registers used to passed arguements to functions.
      
      With no parameter, the --intr-regs still records all possible registers
      based on the architecture.
      
      To name registers, it is necessary to use the long form of the option,
      i.e., --intr-regs:
      
        $ perf record --intr-regs=si,di,r8,r9 .....
      
      To record any possible registers:
      
        $ perf record -I .....
        $ perf report --intr-regs ...
      
      To display the register, one can use perf report -D
      
      To list the available registers:
      
        $ perf record --intr-regs=\?
        available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1441039273-16260-4-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bcc84ec6
    • A
      perf tools: Fix build on powerpc broken by pt/bts · 97db6206
      Adrian Hunter 提交于
      It is theoretically possible to process perf.data files created on x86
      and that contain Intel PT or Intel BTS data, on any other architecture,
      which is why it is possible for there to be build errors on powerpc
      caused by pt/bts.
      
      The errors were:
      
      	util/intel-pt-decoder/intel-pt-insn-decoder.c: In function ‘intel_pt_insn_decoder’:
      	util/intel-pt-decoder/intel-pt-insn-decoder.c:138:3: error: switch missing default case [-Werror=switch-default]
      	   switch (insn->immediate.nbytes) {
      	   ^
      	cc1: all warnings being treated as errors
      
      	linux-acme.git/tools/perf/perf-obj/libperf.a(libperf-in.o): In function `intel_pt_synth_branch_sample':
      	sources/linux-acme.git/tools/perf/util/intel-pt.c:871: undefined reference to `tsc_to_perf_time'
      	linux-acme.git/tools/perf/perf-obj/libperf.a(libperf-in.o): In function `intel_pt_sample':
      	sources/linux-acme.git/tools/perf/util/intel-pt.c:915: undefined reference to `tsc_to_perf_time'
      	sources/linux-acme.git/tools/perf/util/intel-pt.c:962: undefined reference to `tsc_to_perf_time'
      	linux-acme.git/tools/perf/perf-obj/libperf.a(libperf-in.o): In function `intel_pt_process_event':
      	sources/linux-acme.git/tools/perf/util/intel-pt.c:1454: undefined reference to `perf_time_to_tsc'
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1441046384-28663-1-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      97db6206