1. 20 6月, 2017 20 次提交
    • N
      perf ftrace: Add option for function filtering · 78b83e8b
      Namhyung Kim 提交于
      The -T/--trace-funcs and -N/--notrace-funcs options are to specify
      functions to enable/disable tracing dynamically.
      
      The -G/--graph-funcs and -g/--nograph-funcs options are to set filters
      for function graph tracer.
      
      For example, to trace fault handling functions only:
      
        $ sudo perf ftrace -T *fault hello
         0)               |  __do_page_fault() {
         0)               |    handle_mm_fault() {
         0)   2.117 us    |      __handle_mm_fault();
         0)   3.627 us    |    }
         0)   7.811 us    |  }
         0)               |  __do_page_fault() {
         0)               |    handle_mm_fault() {
         0)   2.014 us    |      __handle_mm_fault();
         0)   2.424 us    |    }
         0)   2.951 us    |  }
         ...
      
      To trace all functions executed in __do_page_fault:
      
        $ sudo perf ftrace -G __do_page_fault hello
         2)               |  __do_page_fault() {
         3)   0.060 us    |    down_read_trylock();
         3)               |    find_vma() {
         3)   0.075 us    |      vmacache_find();
         3)   0.053 us    |      vmacache_update();
         3)   1.246 us    |    }
         3)               |    handle_mm_fault() {
         3)   0.063 us    |      __rcu_read_lock();
         3)   0.056 us    |      mem_cgroup_from_task();
         3)   0.057 us    |      __rcu_read_unlock();
         3)               |      __handle_mm_fault() {
         3)               |        filemap_map_pages() {
         3)   0.058 us    |          __rcu_read_lock();
         3)               |          alloc_set_pte() {
         ...
      
      But don't want to show details in handle_mm_fault:
      
        $ sudo perf ftrace -G __do_page_fault -g handle_mm_fault hello
         3)               |  __do_page_fault() {
         3)   0.049 us    |    down_read_trylock();
         3)               |    find_vma() {
         3)   0.048 us    |      vmacache_find();
         3)   0.041 us    |      vmacache_update();
         3)   0.680 us    |    }
         3)   0.036 us    |    up_read();
         3)   4.547 us    |  } /* __do_page_fault */
         ...
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170618142302.25390-3-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      78b83e8b
    • N
      perf ftrace: Move setup_pager before opening trace_pipe · 29681bc5
      Namhyung Kim 提交于
      The 'perf ftrace' command fails to reset tracer after finishing
      recording like below:
      
        $ sudo perf ftrace -v hello
        write 'nop' to tracing/current_tracer failed: Device or resource busy
        ...
      
      This is because the trace_pipe file is open in pager process.  Move the
      pager setup to before opening the file.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: kernel-team@lge.com
      Fixes: 58335964 ("perf ftrace: Use pager for displaying result")
      Link: http://lkml.kernel.org/r/20170618142302.25390-2-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      29681bc5
    • N
      perf ftrace: Show error message when fails to set ftrace files · e7bd9ba2
      Namhyung Kim 提交于
      It'd be better for debugging to show an error message when it fails to
      setup ftrace for some reason.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170618142302.25390-1-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e7bd9ba2
    • M
      perf script: Support -F brstackoff,dso · 106dacd8
      Mark Santaniello 提交于
      The idea here is to make AutoFDO easier in cloud environment with ASLR.
      It's easiest to show how this is useful by example. I built a small test
      akin to "while(1) { do_nothing(); }" where the do_nothing function is
      loaded from a dso:
      
        $ cat burncpu.cpp
        #include <dlfcn.h>
      
        int main() {
          void* handle = dlopen("./dso.so", RTLD_LAZY);
          if (!handle) return -1;
      
          typedef void (*fp)();
          fp do_nothing = (fp) dlsym(handle, "do_nothing");
      
          while(1) {
            do_nothing();
          }
        }
      
        $ cat dso.cpp
        extern "C" void do_nothing() {}
      
        $ cat build.sh
        #!/bin/bash
        g++ -shared dso.cpp -o dso.so
        g++ burncpu.cpp -o burncpu -ldl
      
      I sampled the execution of this program with perf record -b.
      
      Using the existing "brstack,dso", we get absolute addresses that are
      affected by ASLR, and could be different on different hosts. The address
      does not uniquely identify a branch/target in the binary:
      
        $ perf script -F brstack,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1
        0x7f967139b6aa(/tmp/burncpu/dso.so)/0x4006b1(/tmp/burncpu/exe)/P/-/-/0
      
      Using the existing "brstacksym,dso" is a little better, because the
      symbol plus offset and dso name *does* uniquely identify a branch/target
      in the binary.  Ultimately, however, AutoFDO wants a simple offset into
      the binary, so we'd have to undo all the work perf did to symbolize in
      the first place:
      
        $ perf script -F brstacksym,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1
        do_nothing+0x5(/tmp/burncpu/dso.so)/main+0x44(/tmp/burncpu/exe)/P/-/-/0
      
      With the new "brstackoff,dso" we get what we need: a simple offset into a
      specific dso/binary that uniquely identifies a branch/target:
        $ perf script -F brstackoff,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1
        0x6aa(/tmp/burncpu/dso.so)/0x4006b1(/tmp/burncpu/exe)/P/-/-/0
      Signed-off-by: NMark Santaniello <marksan@fb.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170619163825.2012979-2-marksan@fb.com
      [ Updated documentation about 'brstackoff' using text from above ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      106dacd8
    • M
      perf script: Support -F brstack,dso and brstacksym,dso · 55b9b508
      Mark Santaniello 提交于
      Perf script can report the dso for "addr" and "ip" fields.
      
      This adds the same support for the "brstack" and "brstacksym" fields.
      This can be helpful for AutoFDO: we can ignore LBR entries unless the
      source and target address are both in the target module we are about to
      build.
      
      I built a small test akin to "while(1) { do_nothing(); }" where the
      do_nothing function is loaded from a dso:
      
        $ cat burncpu.cpp
        #include <dlfcn.h>
      
        int main() {
          void* handle = dlopen("./dso.so", RTLD_LAZY);
          if (!handle) return -1;
      
          typedef void (*fp)();
          fp do_nothing = (fp) dlsym(handle, "do_nothing");
      
          while(1) {
            do_nothing();
          }
        }
      
        $ cat dso.cpp
        extern "C" void do_nothing() {}
      
        $ cat build.sh
        #!/bin/bash
        g++ -shared dso.cpp -o dso.so
        g++ burncpu.cpp -o burncpu -ldl
      
      I sampled the execution with perf record -b.  Using the new perf script
      functionality I can easily find cases where there was a transition from one
      dso to another:
      
        $ perf record -a -b -- sleep 5
        [ perf record: Woken up 55 times to write data ]
        [ perf record: Captured and wrote 18.815 MB perf.data (43593 samples) ]
      
        $ perf script -F brstack,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1
        0x7f967139b6aa(/tmp/burncpu/dso.so)/0x4006b1(/tmp/burncpu/exe)/P/-/-/0
      
        $ perf script -F brstacksym,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1
        do_nothing+0x5(/tmp/burncpu/dso.so)/main+0x44(/tmp/burncpu/exe)/P/-/-/0
      Signed-off-by: NMark Santaniello <marksan@fb.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170619163825.2012979-1-marksan@fb.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      55b9b508
    • W
      perf test llvm: Avoid error when PROFILE_ALL_BRANCHES is set · 9b57fb7e
      Wang Nan 提交于
      The 'if' keyword is a define that expands to complex code when
      CONFIG_PROFILE_ALL_BRANCHES is selected, which causes a 'perf test LLVM'
      failure like:
      
        $ ./perf test LLVM
        35: LLVM search and compile                    :
        35.1: Basic BPF llvm compile                    : Ok
        35.2: kbuild searching                          : Ok
        35.3: Compile source for BPF prologue generation: FAILED!
        35.4: Compile source for BPF relocation         : Skip
      
      The only affected test case is bpf-script-test-prologue.c
      because it uses kernel headers and has 'if' inside.
      
      This patch undefines 'if' to make it passes perf test.
      
      More detailed analysis from a message in this thread, also by Wang:
      
      The problem is caused by following relocation information:
      
        $ readelf -a ./llvmsubtest3
        ...
           [ 5] _ftrace_branch    PROGBITS         0000000000000000  00000260
                00000000000000a0  0000000000000000  WA       0     0     4
        ...
        Relocation section '.relfunc=null_lseek file->f_mode offset orig' at
        offset 0x490 contains 4 entries:
           Offset          Info           Type           Sym. Value    Sym. Name
        000000000038  000b00000001 unrecognized: 1       0000000000000000 _ftrace_branch
        0000000000b0  000b00000001 unrecognized: 1       0000000000000000 _ftrace_branch
        000000000128  000b00000001 unrecognized: 1       0000000000000000 _ftrace_branch
        0000000001c0  000b00000001 unrecognized: 1       0000000000000000 _ftrace_branch
      
        Relocation section '.rel_ftrace_branch' at offset 0x4d0 contains 8 entries:
           Offset          Info           Type           Sym. Value    Sym. Name
        000000000000  000200000001 unrecognized: 1       0000000000000000 .L__func__.bpf_func__n
        000000000008  000100000001 unrecognized: 1       0000000000000015 .L.str
        000000000028  000200000001 unrecognized: 1       0000000000000000 .L__func__.bpf_func__n
        000000000030  000100000001 unrecognized: 1       0000000000000015 .L.str
        000000000050  000200000001 unrecognized: 1       0000000000000000 .L__func__.bpf_func__n
        000000000058  000100000001 unrecognized: 1       0000000000000015 .L.str
        000000000078  000200000001 unrecognized: 1       0000000000000000 .L__func__.bpf_func__n
        000000000080  000100000001 unrecognized: 1       0000000000000015 .L.str
        ...
      
      So I think the failure is because you enabled CONFIG_PROFILE_ALL_BRANCHES.
      
      I can reproduce your buggy result by selecting
      CONFIG_PROFILE_ALL_BRANCHES in my kbuild:
      
        $ ./perf test LLVM
        35: LLVM search and compile                    :
        35.1: Basic BPF llvm compile                    : Ok
        35.2: kbuild searching                          : Ok
        35.3: Compile source for BPF prologue generation: FAILED!
        35.4: Compile source for BPF relocation         : Skip
      
      Simply undef CONFIG_PROFILE_ALL_BRANCHES in clang opts not working
      because it is introduced by "#include <uapi/linux/fs.h>", which override
      cmdline options. So I think the best way is to undefine 'if' inside BPF
      script.
      Reported-and-Tested-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/20170620183203.2517-1-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9b57fb7e
    • J
      perf annotate: Return arch from symbol__disassemble() and save it in browser · dcaa3948
      Jin Yao 提交于
      In annotate browser, we will add support to check fused instructions.
      While this is x86-specific feature so we need the annotate browser to
      know what the arch it runs on.
      
      symbol__disassemble() has figured out the arch. This patch just lets the
      arch return from symbol__disassemble and save the arch in annotate
      browser.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1497840958-4759-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      dcaa3948
    • K
      perf intel-pt/bts: Remove unused SAMPLE_SIZE defines and bts priv array · d3cef7fe
      Kim Phillips 提交于
      These defines were probably dragged in from sampling support in earlier
      patches.  They can be put back when needed.
      Signed-off-by: NKim Phillips <kim.phillips@arm.com>
      Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170616112339.3fb6986e4ff33e353008244b@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d3cef7fe
    • K
      perf coresight: Remove superfluous check before use · 0c788d47
      Kim Phillips 提交于
      The cs_etm_evsel variable is guaranteed to be set at this point in
      cs_etm_recording_options().
      Signed-off-by: NKim Phillips <kim.phillips@arm.com>
      Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20170615125521.80cc128dc856bc1f2e61b730@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0c788d47
    • A
      tools: Adopt __aligned from kernel sources · 5c97cac6
      Arnaldo Carvalho de Melo 提交于
      To have a more compact way to ask the compiler to use a specific
      alignment, making tools/ look more like kernel source code.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-8jiem6ubg9rlpbs7c2p900no@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5c97cac6
    • A
      tools: Adopt __packed from kernel sources · c9f5da74
      Arnaldo Carvalho de Melo 提交于
      To have a more compact way to ask the compiler to not insert alignment
      paddings in a struct, making tools/ look more like kernel source code.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-byp46nr7hsxvvyc9oupfb40q@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c9f5da74
    • A
      tools: Adopt noinline from kernel sources · 9dd4ca47
      Arnaldo Carvalho de Melo 提交于
      To have a more compact way to ask the compiler not to inline a function
      and to make tools/ source code look like kernel code.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-bis4pqxegt6gbm5dlqs937tn@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9dd4ca47
    • A
      perf tools: Use __maybe_unused consistently · 0353631a
      Arnaldo Carvalho de Melo 提交于
      Instead of defining __unused or redefining __maybe_unused.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-4eleto5pih31jw1q4dypm9pf@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0353631a
    • A
      tools: Adopt __scanf from kernel sources · 3ee350fb
      Arnaldo Carvalho de Melo 提交于
      To have a more compact way to ask the compiler to perform scanf like
      argument validation.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-yzqrhfjrn26lqqtwf55egg0h@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3ee350fb
    • A
      tools: Adopt __printf from kernel sources · afaed6d3
      Arnaldo Carvalho de Melo 提交于
      To have a more compact way to ask the compiler to perform printf like
      vargargs validation.
      
      v2: Fixed up build on arm, squashing a patch by Kim Phillips, thanks!
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-dopkqmmuqs04cxzql0024nnu@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      afaed6d3
    • A
      tools: Adopt __noreturn from kernel sources · 6c346643
      Arnaldo Carvalho de Melo 提交于
      To have a more compact way to specify that a function doesn't return,
      instead of the open coded:
      
      	__attribute__((noreturn))
      
      And use it instead of the tools/perf/ specific variation, NORETURN.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-l0y144qzixcy5t4c6i7pdiqj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6c346643
    • A
      perf script: Allow adding and removing fields · 36ce5651
      Andi Kleen 提交于
      With 'perf script' it is common that we just want to add or remove a field.
      
      Currently this requires figuring out the long list of default fields and
      specifying them first, and then adding/removing the new field.
      
      This patch adds a new + - syntax to merely add or remove fields,
      that allows more succint and clearer command lines
      
      For example to remove the comm field from PMU samples:
      
      Previously
      
        $ perf script -F tid,cpu,time,event,sym,ip,dso,period | head -1
        swapper  0 [000] 504345.383126:          1 cycles:  ffffffff90060c66 native_write_msr ([kernel.kallsyms])
      
      with the new syntax
      
        perf script -F -comm | head -1
        0 [000] 504345.383126:          1 cycles:  ffffffff90060c66 native_write_msr ([kernel.kallsyms])
      
      The new syntax cannot be mixed with normal overriding.
      
      v2: Fix example in description. Use tid vs pid. No functional changes.
      v3: Don't skip initialization when user specified explicit type.
      v4: Rebase. Remove empty line.
      
      Committer testing:
      
        # perf record -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.748 MB perf.data (14 samples) ]
      
      Without a explicit field list specified via -F, defaults to:
      
        # perf script | head -2
            perf 6338 [000] 18467.058607: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
         swapper    0 [001] 18467.058617: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
        #
      
      Which is equivalent to:
      
        # perf script -F comm,tid,cpu,time,period,event,ip,sym,dso | head -2
            perf 6338 [000] 18467.058607: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
         swapper    0 [001] 18467.058617: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
        #
      
      So if we want to remove the comm, as in your original example, we would have to
      figure out the default field list and remove ' comm' from it:
      
        # perf script -F tid,cpu,time,period,event,ip,sym,dso | head -2
         6338 [000] 18467.058607: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
            0 [001] 18467.058617: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
        #
      
      With your patch this becomes simpler, one can remove fields by prefixing them
      with '-':
      
        # perf script -F -comm | head -2
        6338 [000] 18467.058607: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
           0 [001] 18467.058617: 1 cycles: ffffffff89060c36 native_write_msr (/lib/modules/4.11.0-rc8+/build/vmlinux)
        #
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: NMilian Wolff <milian.wolff@kdab.com>
      Link: http://lkml.kernel.org/r/20170602154810.15875-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      36ce5651
    • T
      perf config: Invert an if statement to reduce nesting in cmd_config() · 8c1cedb4
      Taeung Song 提交于
      Signed-off-by: NTaeung Song <treeze.taeung@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1494241650-32210-1-git-send-email-treeze.taeung@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8c1cedb4
    • J
      perf annotate browser: Display titles in left frame · ec27ae18
      Jin Yao 提交于
      The annotate browser is divided into 2 frames. Left frame contains 3
      columns (some platforms only have one column).
      
      For example:
      
                         │26  int compute_flag()
                         │27  {
       22.80  1.20       │      sub    $0x8,%rsp
                         │25          int i;
                         │
                         │27          i = rand() % 2;
       22.78  1.20     1 │    → callq  rand@plt
      
      While it's hard for user to understand what the data is.
      
      This patch adds the titles "Percent", "IPC" and "Cycle" on columns.
      
      Percent  IPC Cycle │
                         │25  __attribute__((noinline))
                         │26  int compute_flag()
                         │27  {
       22.80  1.20       │      sub    $0x8,%rsp
                         │25          int i;
                         │
                         │27          i = rand() % 2;
       22.78  1.20     1 │    → callq  rand@plt
      
      The titles are displayed at row 0 of annotate browser if row 0 doesn't
      have values of percent, ipc and cycle.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/1493909895-9668-3-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ec27ae18
    • J
      perf report: Remove unnecessary check in annotate_browser_write() · c564f0db
      Jin Yao 提交于
      In annotate_browser_write(),
      
              if (dl->offset != -1 && percent_max != 0.0) {
                      if (percent_max != 0.0) {
      			...
                      }
                      ...
              }
      
      The second check of (percent_max != 0.0) is not necessary, remove it.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Acked-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/1493909895-9668-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c564f0db
  2. 17 6月, 2017 1 次提交
    • M
      perf unwind: Report module before querying isactivation in dwfl unwind · 9126cbba
      Milian Wolff 提交于
      The PC returned by dwfl_frame_pc() may map into a not-yet-reported
      module. We have to report it before we continue unwinding. But when we
      query for the isactivation flag in dwfl_frame_pc, libdw will actually do
      one more unwinding step internally which can then break and lead to
      missed frames or broken stacks.
      
      With libunwind we get e.g.:
      
      ~~~~~
        heaptrack_gui  2228 135073.400474:     613969 cycles:
      	          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
      	          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
      	          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
      	           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
      	          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
      	           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      
        heaptrack_gui  2228 135073.401156:     569521 cycles:
      	          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
      	          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
      	          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
      	          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
      	          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
      	           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
      	           f5a1c QGuiApplicationPrivate::createPlatformIntegration (/usr/lib/libQt5Gui.so.5.8.0)
      	           f650c QGuiApplicationPrivate::createEventDispatcher (/usr/lib/libQt5Gui.so.5.8.0)
      	          298524 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
      	           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      ~~~~~
      
      Note the two frames 1589e8 and 78622 in the first sample. These are
      missing when unwinding with libdw. The second sample's breakage is
      more obvious:
      
      ~~~~~
        heaptrack_gui  2228 135073.400474:     613969 cycles:
      	          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
      	          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
      	          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
      	           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
      	          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      
      heaptrack_gui  2228 135073.401156:     569521 cycles:
      	          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
      	          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
      	          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
      	          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
      	          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
      	           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
      	          723dbf [unknown] ([unknown])
      ~~~~~
      
      This patch fixes this issue and the libdw unwinder mimicks the libunwind
      behavior more closely.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Acked-by: NJan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170602143753.16907-2-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9126cbba
  3. 15 6月, 2017 2 次提交
    • J
      perf tools: Fix build with ARCH=x86_64 · 7a759cd8
      Jiada Wang 提交于
      With commit: 0a943cb1 (tools build: Add HOSTARCH Makefile variable)
      when building for ARCH=x86_64, ARCH=x86_64 is passed to perf instead of
      ARCH=x86, so the perf build process searchs header files from
      tools/arch/x86_64/include, which doesn't exist.
      
      The following build failure is seen:
      
        In file included from util/event.c:2:0:
          tools/include/uapi/linux/mman.h:4:27: fatal error: uapi/asm/mman.h: No such file or directory
          compilation terminated.
      
      Fix this issue by using SRCARCH instead of ARCH in perf, just like the
      main kernel Makefile and tools/objtool's.
      Signed-off-by: NJiada Wang <jiada_wang@mentor.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
      Cc: Jan Stancek <jstancek@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Rui Teng <rui.teng@linux.vnet.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 0a943cb1 ("tools build: Add HOSTARCH Makefile variable")
      Link: http://lkml.kernel.org/r/1491793357-14977-2-git-send-email-jiada_wang@mentor.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7a759cd8
    • A
      perf evsel: Fix probing of precise_ip level for default cycles event · 7a1ac110
      Arnaldo Carvalho de Melo 提交于
      Since commit 18e7a45a ("perf/x86: Reject non sampling events with
      precise_ip") returns -EINVAL for sys_perf_event_open() with an attribute
      with (attr.precise_ip > 0 && attr.sample_period == 0), just like is done
      in the routine used to probe the max precise level when no events were
      passed to 'perf record' or 'perf top', i.e.:
      
      	perf_evsel__new_cycles()
      		perf_event_attr__set_max_precise_ip()
      
      The x86 code, in x86_pmu_hw_config(), which is called all the way from
      sys_perf_event_open() did, starting with the aforementioned commit:
      
                      /* There's no sense in having PEBS for non sampling events: */
                      if (!is_sampling_event(event))
                              return -EINVAL;
      
      Which makes it fail for cycles:ppp, cycles:pp and cycles:p, always using
      just the non precise cycles variant.
      
      To make sure that this is the case, I tested it, before this patch,
      with:
      
        # perf probe -L x86_pmu_hw_config
        <x86_pmu_hw_config@/home/acme/git/linux/arch/x86/events/core.c:0>
              0  int x86_pmu_hw_config(struct perf_event *event)
              1  {
              2         if (event->attr.precise_ip) {
      <SNIP>
             17                 if (event->attr.precise_ip > precise)
             18                         return -EOPNOTSUPP;
      
                                /* There's no sense in having PEBS for non sampling events: */
             21                 if (!is_sampling_event(event))
             22                         return -EINVAL;
                        }
      <SNIP>
        # perf probe x86_pmu_hw_config:22
        Added new events:
          probe:x86_pmu_hw_config (on x86_pmu_hw_config:22)
          probe:x86_pmu_hw_config_1 (on x86_pmu_hw_config:22)
      
        You can now use it in all perf tools, such as:
      
              perf record -e probe:x86_pmu_hw_config_1 -aR sleep 1
      
        # perf trace -e perf_event_open,probe:x86_pmu_hwconfig*/max-stack=16/ perf record usleep 1
           0.000 ( 0.015 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.015 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.000 ( 0.021 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.023 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.025 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.023 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.028 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.030 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.028 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
          41.018 ( 0.012 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8b5dd0, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.065 ( 0.011 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.080 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.103 ( 0.010 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
          41.115 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
          41.122 ( 0.004 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
          41.128 ( 0.008 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
        #
      
      I.e. that return -EINVAL in x86_pmu_hw_config() is hit three times.
      
      So fix it by just setting attr.sample_period
      
      Now, after this patch:
      
        # perf trace --max-stack=2 -e perf_event_open,probe:x86_pmu_hw_config* perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
           0.000 ( 0.017 ms): perf/8469 perf_event_open(attr_uptr: 0x7ffe36c27d10, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_open_cloexec_flag (/home/acme/bin/perf)
           0.050 ( 0.031 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.092 ( 0.040 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.143 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, cpu: -1, group_fd: -1           ) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
           0.161 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.171 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.180 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.190 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
        [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
        #
      
      The probe one called from perf_event_attr__set_max_precise_ip() works
      the first time, with attr.precise_ip = 3, wit hthe next ones being the
      per cpu ones for the cycles:ppp event.
      
      And here is the text from a report and alternative proposed patch by
      Thomas-Mich Richter:
      
       ---
      
      On s390 the counter and sampling facility do not support a precise IP
      skid level and sometimes returns EOPNOTSUPP when structure member
      precise_ip in struct perf_event_attr is not set to zero.
      
      On s390 commnd 'perf record -- true' fails with error EOPNOTSUPP.  This
      happens only when no events are specified on command line.
      
      The functions called are
      ...
        --> perf_evlist__add_default
            --> perf_evsel__new_cycles
                --> perf_event_attr__set_max_precise_ip
      
      The last function determines the value of structure member precise_ip by
      invoking the perf_event_open() system call and checking the return code.
      The first successful open is the value for precise_ip.
      
      However the value is determined without setting member sample_period and
      indicates no sampling.
      
      On s390 the counter facility and sampling facility are different.  The
      above procedure determines a precise_ip value of 3 using the counter
      facility. Later it uses the sampling facility with a value of 3 and
      fails with EOPNOTSUPP.
      
       ---
      
      v2: Older compilers (e.g. gcc 4.4.7) don't support referencing members
          of unnamed union members in the container struct initialization, so
          move from:
      
      	struct perf_event_attr attr = {
      		...
      		.sample_period = 1,
      	};
      
      to right after it as:
      
      	struct perf_event_attr attr = {
      		...
      	};
      
      	attr.sample_period = 1;
      
      v3: We need to reset .sample_period to 0 to let the users of
      perf_evsel__new_cycles() to properly setup attr.sample_period or
      attr.sample_freq. Reported by Ingo Molnar.
      Reported-and-Acked-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 18e7a45a ("perf/x86: Reject non sampling events with precise_ip")
      Link: http://lkml.kernel.org/n/tip-yv6nnkl7tzqocrm0hl3x7vf1@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7a1ac110
  4. 09 6月, 2017 9 次提交
  5. 08 6月, 2017 6 次提交
  6. 06 6月, 2017 2 次提交
    • M
      perf report: Ensure the perf DSO mapping matches what libdw sees · 2538b9e2
      Milian Wolff 提交于
      In some situations the libdw unwinder stopped working properly.  I.e.
      with libunwind we see:
      
      ~~~~~
      heaptrack_gui  2228 135073.400112:     641314 cycles:
      	            e8ed _dl_fixup (/usr/lib/ld-2.25.so)
      	           15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so)
      	           ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0)
      	           608f3 _GLOBAL__sub_I_kdynamicjobtracker.cpp (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0)
      	            f199 call_init.part.0 (/usr/lib/ld-2.25.so)
      	            f2a5 _dl_init (/usr/lib/ld-2.25.so)
      	             db9 _dl_start_user (/usr/lib/ld-2.25.so)
      ~~~~~
      
      But with libdw and without this patch this sample is not properly
      unwound:
      
      ~~~~~
      heaptrack_gui  2228 135073.400112:     641314 cycles:
      	            e8ed _dl_fixup (/usr/lib/ld-2.25.so)
      	           15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so)
      	           ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0)
      ~~~~~
      
      Debug output showed me that libdw found a module for the last frame
      address, but it thinks it belongs to /usr/lib/ld-2.25.so. This patch
      double-checks what libdw sees and what perf knows. If the mappings
      mismatch, we now report the elf known to perf. This fixes the situation
      above, and the libdw unwinder produces the same stack as libunwind.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170602143753.16907-1-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2538b9e2
    • M
      perf report: Include partial stacks unwound with libdw · 5ea0416f
      Milian Wolff 提交于
      So far the whole stack was thrown away when any error occurred before
      the maximum stack depth was unwound. This is actually a very common
      scenario though. The stacks that got unwound so far are still
      interesting. This removes a large chunk of differences when comparing
      perf script output for libunwind and libdw perf unwinding.
      
      E.g. with libunwind:
      
      ~~~~~
      heaptrack_gui  2228 135073.388524:     479408 cycles:
              ffffffff811749ed perf_iterate_ctx ([kernel.kallsyms])
              ffffffff81181662 perf_event_mmap ([kernel.kallsyms])
              ffffffff811cf5ed mmap_region ([kernel.kallsyms])
              ffffffff811cfe6b do_mmap ([kernel.kallsyms])
              ffffffff811b0dca vm_mmap_pgoff ([kernel.kallsyms])
              ffffffff811cdb0c sys_mmap_pgoff ([kernel.kallsyms])
              ffffffff81033acb sys_mmap ([kernel.kallsyms])
              ffffffff81631d37 entry_SYSCALL_64_fastpath ([kernel.kallsyms])
                         192ca mmap64 (/usr/lib/ld-2.25.so)
                          59a9 _dl_map_object_from_fd (/usr/lib/ld-2.25.so)
                          83d0 _dl_map_object (/usr/lib/ld-2.25.so)
                          cda1 openaux (/usr/lib/ld-2.25.so)
                         1834f _dl_catch_error (/usr/lib/ld-2.25.so)
                          cfe2 _dl_map_object_deps (/usr/lib/ld-2.25.so)
                          3481 dl_main (/usr/lib/ld-2.25.so)
                         17387 _dl_sysdep_start (/usr/lib/ld-2.25.so)
                          4d37 _dl_start (/usr/lib/ld-2.25.so)
                           d87 _start (/usr/lib/ld-2.25.so)
      
      heaptrack_gui  2228 135073.388677:     611329 cycles:
                         1a3e0 strcmp (/usr/lib/ld-2.25.so)
                          82b2 _dl_map_object (/usr/lib/ld-2.25.so)
                          cda1 openaux (/usr/lib/ld-2.25.so)
                         1834f _dl_catch_error (/usr/lib/ld-2.25.so)
                          cfe2 _dl_map_object_deps (/usr/lib/ld-2.25.so)
                          3481 dl_main (/usr/lib/ld-2.25.so)
                         17387 _dl_sysdep_start (/usr/lib/ld-2.25.so)
                          4d37 _dl_start (/usr/lib/ld-2.25.so)
                           d87 _start (/usr/lib/ld-2.25.so)
      ~~~~~
      
      With libdw without this patch:
      
      ~~~~~
      heaptrack_gui  2228 135073.388524:     479408 cycles:
              ffffffff811749ed perf_iterate_ctx ([kernel.kallsyms])
              ffffffff81181662 perf_event_mmap ([kernel.kallsyms])
              ffffffff811cf5ed mmap_region ([kernel.kallsyms])
              ffffffff811cfe6b do_mmap ([kernel.kallsyms])
              ffffffff811b0dca vm_mmap_pgoff ([kernel.kallsyms])
              ffffffff811cdb0c sys_mmap_pgoff ([kernel.kallsyms])
              ffffffff81033acb sys_mmap ([kernel.kallsyms])
              ffffffff81631d37 entry_SYSCALL_64_fastpath ([kernel.kallsyms])
      
      heaptrack_gui  2228 135073.388677:     611329 cycles:
      ~~~~~
      
      With this patch applied, the libdw unwinder will produce the same
      output as the libunwind unwinder.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170601210021.20046-1-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5ea0416f