1. 12 4月, 2016 6 次提交
    • A
      perf evsel: Introduce fprintf_callchain() method out of fprintf_sym() · ea453965
      Arnaldo Carvalho de Melo 提交于
      In 'perf trace' we're just interested in printing callchains, and we
      don't want to use the symbol_conf.use_callchain, so move the callchain
      part to a new method.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-kcn3romzivcpxb3u75s9nz33@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ea453965
    • A
      perf evsel: Rename print_ip() to fprintf_sym() · ff0c1078
      Arnaldo Carvalho de Melo 提交于
      As it receives a FILE, and its more than just the IP, which can even be
      requested not to be printed.
      
      For consistency with other similar methods in tools/perf/, name it as
      perf_evsel__fprintf_sym() and make it return the number of bytes
      printed, just like 'fprintf(3)'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-84gawlqa3lhk63nf0t9vnqnn@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ff0c1078
    • A
      perf evsel: Allow passing a left alignment when printing a symbol · db3617f3
      Arnaldo Carvalho de Melo 提交于
      For callchains, etc where we want it to align just below the syscall
      name, for instance, in 'perf trace'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-uk9ekchd67651c625ltaur5y@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      db3617f3
    • M
      perf evsel: Allow specifying a file to output in perf_evsel__print_ip · 6186de9a
      Milian Wolff 提交于
      As this function will be used in 'perf trace'.
      
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/n/tip-8x297v9utnxq77onikevvlse@git.kernel.org
      [ Split from a larger patch ]
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      6186de9a
    • W
      perf bpf: Automatically create bpf-output event __bpf_stdout__ · 72c08098
      Wang Nan 提交于
      This patch removes the need to set a bpf-output event in cmdline.  By
      referencing a map named '__bpf_stdout__', perf automatically creates an
      event for it.
      
      For example:
      
        # perf record -e ./test_bpf_trace.c usleep 100000
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]
        # perf script
                 usleep  4639 [000] 261895.307826:        0            __bpf_stdout__:  ffffffff810eb9a1 ...
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
      
                 usleep  4639 [000] 261895.407883:        0            __bpf_stdout__:  ffffffff8105d609 ...
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
      
        perf record -e ./test_bpf_trace.c usleep 100000
      
        equals to:
      
        perf record -e bpf-output/no-inherit=1,name=__bpf_stdout__/ \
                    -e ./test_bpf_trace.c/map:__bpf_stdout__.event=__bpf_stdout__/ \
                    usleep 100000
      
      Where test_bpf_trace.c is:
      
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        struct bpf_map_def {
               unsigned int type;
               unsigned int key_size;
               unsigned int value_size;
               unsigned int max_entries;
        };
        #define SEC(NAME) __attribute__((section(NAME), used))
        static u64 (*ktime_get_ns)(void) =
               (void *)BPF_FUNC_ktime_get_ns;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
               (void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
               (void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
               (void *)BPF_FUNC_perf_event_output;
      
        struct bpf_map_def SEC("maps") __bpf_stdout__ = {
               .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
               .key_size = sizeof(int),
               .value_size = sizeof(u32),
               .max_entries = __NR_CPUS__,
        };
      
        static inline int __attribute__((always_inline))
        func(void *ctx, int type)
        {
      	char output_str[] = "Raise a BPF event!";
      	char err_str[] = "BAD %d\n";
      	int err;
      
              err = perf_event_output(ctx, &__bpf_stdout__, get_smp_processor_id(),
      			        &output_str, sizeof(output_str));
      	if (err)
      		trace_printk(err_str, sizeof(err_str), err);
              return 1;
        }
        SEC("func_begin=sys_nanosleep")
        int func_begin(void *ctx) {return func(ctx, 1);}
        SEC("func_end=sys_nanosleep%return")
        int func_end(void *ctx) { return func(ctx, 2);}
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
      Committer note:
      
      Testing with 'perf trace':
      
        # trace -e nanosleep --ev test_bpf_stdout.c usleep 1
           0.007 ( 0.007 ms): usleep/729 nanosleep(rqtp: 0x7ffc5bbc5fe0) ...
           0.007 (         ): __bpf_stdout__:Raise a BPF event!..)
           0.008 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
           0.069 (         ): __bpf_stdout__:Raise a BPF event!..)
           0.070 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
           0.072 ( 0.072 ms): usleep/729  ... [continued]: nanosleep()) = 0
        #
      Suggested-and-Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460128045-97310-5-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      72c08098
    • W
      perf bpf: Clone bpf stdout events in multiple bpf scripts · d7888573
      Wang Nan 提交于
      This patch allows cloning bpf-output event configuration among multiple
      bpf scripts. If there exist a map named '__bpf_output__' and not
      configured using 'map:__bpf_output__.event=', this patch clones the
      configuration of another '__bpf_stdout__' map. For example, following
      command:
      
        # perf trace --ev bpf-output/no-inherit,name=evt/ \
                     --ev ./test_bpf_trace.c/map:__bpf_stdout__.event=evt/ \
                     --ev ./test_bpf_trace2.c usleep 100000
      
      equals to:
      
        # perf trace --ev bpf-output/no-inherit,name=evt/ \
                     --ev ./test_bpf_trace.c/map:__bpf_stdout__.event=evt/  \
                     --ev ./test_bpf_trace2.c/map:__bpf_stdout__.event=evt/ \
                     usleep 100000
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Suggested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460128045-97310-4-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d7888573
  2. 08 4月, 2016 9 次提交
    • A
      perf tools: Use readdir() instead of deprecated readdir_r() · bfc279f3
      Arnaldo Carvalho de Melo 提交于
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case when parsing tracepoint event definitions, to
      avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it
      instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-wddn49r6bz6wq4ee3dxbl7lo@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bfc279f3
    • A
      perf tools: Use readdir() instead of deprecated readdir_r() · 7093b4c9
      Arnaldo Carvalho de Melo 提交于
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case when synthesizing events for pre-existing threads
      by traversing /proc, so, to avoid breaking the build with glibc-2.23.90
      (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
         CC       /tmp/build/perf/util/event.o
        util/event.c: In function '__event__synthesize_thread':
        util/event.c:466:2: error: 'readdir_r' is deprecated [-Werror=deprecated-declarations]
          while (!readdir_r(tasks, &dirent, &next) && next) {
          ^~~~~
        In file included from /usr/include/features.h:368:0,
                         from /usr/include/stdint.h:25,
                         from /usr/lib/gcc/x86_64-redhat-linux/6.0.0/include/stdint.h:9,
                         from /git/linux/tools/include/linux/types.h:6,
                         from util/event.c:1:
        /usr/include/dirent.h:189:12: note: declared here
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i1vj7nyjp2p750rirxgrfd3c@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7093b4c9
    • A
      perf thread_map: Use readdir() instead of deprecated readdir_r() · 3354cf71
      Arnaldo Carvalho de Melo 提交于
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case in thread_map, so, to avoid breaking the build
      with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-del8h2a0f40z75j4r42l96l0@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3354cf71
    • W
      perf symbols: Adjust symbol for shared objects · 99e87f7b
      Wang Nan 提交于
      He Kuang reported a problem that perf fails to get correct symbol on
      Android platform in [1]. The problem can be reproduced on normal x86_64
      platform. I will describe the reproducing steps in detail at the end of
      commit message.
      
      The reason of this problem is the missing of symbol adjustment for normal
      shared objects. In most of the cases skipping adjustment is okay. However,
      when '.text' section have different 'address' and 'offset' the result is wrong.
      I checked all shared objects in my working platform, only wine dll objects and
      debug objects (in .debug) have this problem. However, it is common on Android.
      For example:
      
       $ readelf -S ./libsurfaceflinger.so | grep \.text
         [10] .text             PROGBITS         0000000000029030  00012030
      
      This patch enables symbol adjustment for dynamic objects so the symbol
      address got from elfutils would be adjusted correctly.
      
      Now nearly all types of ELF files should adjust symbols. Makes
      ss->adjust_symbols default to true.
      
      Steps to reproduce the problem:
      
        $ cat ./Makefile
        PWD := $(shell pwd)
        LDFLAGS += "-Wl,-rpath=$(PWD)"
        CFLAGS += -g
        main: main.c libbuggy.so
        libbuggy.so: buggy.c
      	gcc -g -shared -fPIC -Wl,-Ttext-segment=0x200000 $< -o $@
        clean:
      	rm -rf main libbuggy.so *.o
      
        $ cat ./buggy.c
        int fib(int x)
        {
            return (x == 0) ? 1 : (x == 1) ? 1 : fib(x - 1) + fib(x - 2);
        }
      
        $ cat ./main.c
        #include <stdio.h>
      
        extern int fib(int x);
        int main()
        {
           int i;
      
           for (i = 0; i < 40; i++)
               printf("%d\n", fib(i));
           return 0;
       }
      
       $ make
       $ perf record ./main
       ...
       $ perf report --stdio
       # Overhead  Command  Shared Object      Symbol
       # ........  .......  .................  ...............................
       #
           14.97%  main     libbuggy.so        [.] 0x000000000000066c
            8.68%  main     libbuggy.so        [.] 0x00000000000006aa
            8.52%  main     libbuggy.so        [.] fib@plt
            7.95%  main     libbuggy.so        [.] 0x0000000000000664
            5.94%  main     libbuggy.so        [.] 0x00000000000006a9
            5.35%  main     libbuggy.so        [.] 0x0000000000000678
       ...
      
      The correct result should be (after this patch):
      
        # Overhead  Command  Shared Object      Symbol
        # ........  .......  .................  ...............................
        #
            91.47%  main     libbuggy.so        [.] fib
             8.52%  main     libbuggy.so        [.] fib@plt
             0.00%  main     [kernel.kallsyms]  [k] kmem_cache_free
      
      [1] http://lkml.kernel.org/g/1452567507-54013-1-git-send-email-hekuang@huawei.comSigned-off-by: NWang Nan <wangnan0@huawei.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460024671-64774-3-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      99e87f7b
    • W
      perf symbols: Record text offset in dso to calculate objdump address · a58f7033
      Wang Nan 提交于
      In this patch, the offset of '.text' section is stored into dso
      and used here to re-calculate address to objdump.
      
      In most of the cases, executable code is in '.text' section, so the
      adjustment made to a symbol in dso__load_sym (using
      sym.st_value -= shdr.sh_addr - shdr.sh_offset) should equal to
      'sym.st_value -= dso->text_offset'. Therefore, adding text_offset back
      get objdump address from symbol address (rip). However, it is not true
      for kernel and kernel module since there could be multiple executable
      sections with different offset. Exclude kernel for this reason.
      
      After this patch, even dso->adjust_symbols is set to true for shared
      objects, map__rip_2objdump() and map__objdump_2mem() would return
      correct result, so perf behavior of annotate won't be changed.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460024671-64774-2-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a58f7033
    • A
      perf tools: Build syscall table .c header from kernel's syscall_64.tbl · 1b700c99
      Arnaldo Carvalho de Melo 提交于
      We used libaudit to map ids to syscall names and vice-versa, but that
      imposes a delay in supporting new syscalls, having to wait for libaudit
      to get those new syscalls on its tables.
      
      To remove that delay, for x86_64 initially, grab a copy of
      arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those
      tables.
      
      Syscalls currently not available in audit-libs:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
        Error:	Invalid syscall copy_file_range, membarrier, mlock2, pread64, pwrite64, timerfd_create, userfaultfd
        Hint:	try 'perf list syscalls:sys_enter_*'
        Hint:	and: 'man syscalls'
        #
      
      With this patch:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
          8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36
          8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40
         30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63ae382000, count: 4096, pos: 529592320) = 4096
         31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afd8b000, count: 4096, pos: 2314133504) = 4096
         31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afda2000, count: 4096, pos: 2314137600) = 4096
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-51xfjbxevdsucmnbc4ka5r88@git.kernel.org
      [ Added make dep for 'prepare' in 'LIBPERF_IN', fix by Wang Nan to fix parallell build ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1b700c99
    • A
      perf tools: Allow generating per-arch syscall table arrays · 5af56fab
      Arnaldo Carvalho de Melo 提交于
      Tools should use a mechanism similar to arch/x86/entry/syscalls/ to
      generate a header file with the definitions for two variables:
      
        static const char *syscalltbl_x86_64[] = {
      	[0] = "read",
      	[1] = "write",
        <SNIP>
      	[324] = "membarrier",
      	[325] = "mlock2",
      	[326] = "copy_file_range",
        };
        static const int syscalltbl_x86_64_max_id = 326;
      
      In a per arch file that should then be included in
      tools/perf/util/syscalltbl.c.
      
      First one will be for x86_64.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-02uuamkxgccczdth8komspgp@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5af56fab
    • A
      perf trace: Move syscall table id <-> name routines to separate class · fd0db102
      Arnaldo Carvalho de Melo 提交于
      We're using libaudit for doing name to id and id to syscall name
      translations, but that makes 'perf trace' to have to wait for newer
      libaudit versions supporting recently added syscalls, such as
      "userfaultfd" at the time of this changeset.
      
      We have all the information right there, in the kernel sources, so move
      this code to a separate place, wrapped behind functions that will
      progressively use the kernel source files to extract the syscall table
      for use in 'perf trace'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i38opd09ow25mmyrvfwnbvkj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fd0db102
    • J
      perf tools: Add dedicated unwind addr_space member into thread struct · e583d70c
      Jiri Olsa 提交于
      Milian reported issue with thread::priv, which was double booked by perf
      trace and DWARF unwind code. So using those together is impossible at
      the moment.
      
      Moving DWARF unwind private data into separate variable so perf trace
      can keep using thread::priv.
      Reported-and-Tested-by: NMilian Wolff <milian.wolff@kdab.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andreas Hollmann <hollmann@in.tum.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1460013073-18444-2-git-send-email-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e583d70c
  3. 07 4月, 2016 1 次提交
  4. 06 4月, 2016 3 次提交
    • A
      perf script perl: Do error checking on new backtrace routine · 76e20522
      Arnaldo Carvalho de Melo 提交于
      This ended up triggering these warnings when building on Ubuntu 12.04.5:
      
        util/scripting-engines/trace-event-perl.c: In function 'perl_process_callchain':
        util/scripting-engines/trace-event-perl.c:293:4: error: value computed is not used [-Werror=unused-value]
        util/scripting-engines/trace-event-perl.c:294:4: error: value computed is not used [-Werror=unused-value]
        util/scripting-engines/trace-event-perl.c:295:4: error: value computed is not used [-Werror=unused-value]
        util/scripting-engines/trace-event-perl.c:297:4: error: value computed is not used [-Werror=unused-value]
        util/scripting-engines/trace-event-perl.c:309:4: error: value computed is not used [-Werror=unused-value]
        cc1: all warnings being treated as errors
        mv: cannot stat `/tmp/build/perf/util/scripting-engines/.trace-event-perl.o.tmp': No such file or directory
        make[4]: *** [/tmp/build/perf/util/scripting-engines/trace-event-perl.o] Error 1
      
      Fix it by doing error checking when building the perl data structures
      related to callchains.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Dima Kogan <dima@secretsauce.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Fixes: f7380c12 ("perf script perl: Perl scripts now get a backtrace, like the python ones")
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      76e20522
    • A
      perf probe: Check if dwarf_getlocations() is available · bd0419e2
      Arnaldo Carvalho de Melo 提交于
      If not, tell the user that:
      
        config/Makefile:273: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157
      
      And return -ENOTSUPP in die_get_var_range(), failing features that
      need it, like the one pointed out above.
      
      This fixes the build on older systems, such as Ubuntu 12.04.5.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Vinson Lee <vlee@freedesktop.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-9l7luqkq4gfnx7vrklkq4obs@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bd0419e2
    • V
      perf config: Fix build with older toolchain. · d8e28654
      Vinson Lee 提交于
      Fix build error on Ubuntu 12.04.5 with GCC 4.6.3.
      
          CC       util/config.o
        util/config.c: In function ‘perf_buildid_config’:
        util/config.c:384:15: error: declaration of ‘dirname’ shadows a global declaration [-Werror=shadow]
      Signed-off-by: NVinson Lee <vlee@freedesktop.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 9cb5987c ("perf config: Rework buildid_dir_command_config to perf_buildid_config")
      Link: http://lkml.kernel.org/r/1459807659-9020-1-git-send-email-vlee@freedesktop.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d8e28654
  5. 02 4月, 2016 3 次提交
    • W
      perf bpf: Add sample types for 'bpf-output' event · d37ba880
      Wang Nan 提交于
      Before this patch we can see very large time in the events before the
      'bpf-output' event. For example:
      
        # perf trace -vv -T --ev sched:sched_switch \
                            --ev bpf-output/no-inherit,name=evt/ \
                            --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                            usleep 10
        ...
        18446744073709.551 (18446564645918.480 ms): usleep/4157 nanosleep(rqtp: 0x7ffd3f0dc4e0) ...
        18446744073709.551 (         ): evt:Raise a BPF event!..)
        179427791.076 (         ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
        179427791.081 (         ): sched:sched_switch:usleep:4157 [120] S ==> swapper/2:0 [120])
        ...
      
      We can also see the differences between bpf-output events and
      breakpoint events:
      
      For bpf output event:
         sample_type                    IP|TID|RAW|IDENTIFIER
      
      For tracepoint events:
         sample_type                    IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
      
      This patch fix this differences by adding more sample type for
      bpf-output events.
      
      After this patch:
      
        # perf trace -vv -T --ev sched:sched_switch \
                            --ev bpf-output/no-inherit,name=evt/ \
                            --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                            usleep 10
        ...
        179877370.878 ( 0.003 ms): usleep/5336 nanosleep(rqtp: 0x7ffff866c450) ...
        179877370.878 (         ): evt:Raise a BPF event!..)
        179877370.878 (         ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
        179877370.882 (         ): sched:sched_switch:usleep:5336 [120] S ==> swapper/4:0 [120])
        179877370.945 (         ): evt:Raise a BPF event!..)
        ...
      
        # ./perf trace -vv -T --ev sched:sched_switch \
                              --ev bpf-output/no-inherit,name=evt/ \
                              --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                              usleep 10 2>&1 | grep sample_type
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
      
      The 'IDENTIFIER' info is not required because all events have the same
      sample_type.
      
      Committer notes:
      
      Further testing, on top of the changes making 'perf trace' avoid samples
      from events without PERF_SAMPLE_TIME:
      
      Before:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        <SNIP>
          0.560 ( 0.001 ms): brk(                                                   ) = 0x55e5a1df8000
          18446640227439.430 (18446640227438.859 ms): nanosleep(rqtp: 0x7ffc96643370) ...
          18446640227439.430 (         ): evt:Raise a BPF event!..)
          0.576 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
          18446640227439.430 (         ): evt:Raise a BPF event!..)
          0.645 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
          0.646 ( 0.076 ms):  ... [continued]: nanosleep()) = 0
        #
      
      After:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        <SNIP>
           0.292 ( 0.001 ms): brk(                          ) = 0x55c7cd6e1000
           0.302 ( 0.004 ms): nanosleep(rqtp: 0x7ffedd8bc0f0) ...
           0.302 (         ): evt:Raise a BPF event!..)
           0.303 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
           0.397 (         ): evt:Raise a BPF event!..)
           0.397 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
           0.398 ( 0.100 ms):  ... [continued]: nanosleep()) = 0
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Reported-and-Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1459517202-42320-1-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d37ba880
    • K
      perf tools: Fix PMU term format max value calculation · ac0e2cd5
      Kan Liang 提交于
      Currently the max value of format is calculated by the bits number. It
      relies on the continuity of the format.
      
      However, uncore event format is not continuous. E.g. uncore qpi event
      format can be 0-7,21.
      
      If bit 21 is set, there is parsing issues as below.
      
        $ perf stat -a -e uncore_qpi_0/event=0x200002,umask=0x8/
        event syntax error: '..pi_0/event=0x200002,umask=0x8/'
                                          \___ value too big for format, maximum is 511
      
      This patch return the real max value by setting all possible bits to 1.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1459365375-14285-1-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ac0e2cd5
    • A
      perf jit: Add support for using TSC as a timestamp · 2a28e230
      Adrian Hunter 提交于
      Intel PT uses TSC as a timestamp, so add support for using TSC instead
      of the monotonic clock.  Use of TSC is selected by an environment
      variable "JITDUMP_USE_ARCH_TIMESTAMP" and flagged in the jitdump file
      with flag JITDUMP_FLAGS_ARCH_TIMESTAMP.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457426330-30226-1-git-send-email-adrian.hunter@intel.com
      [ Added the fixup from He Kuang to make it build on other arches, ]
      [ such as aarch64, to avoid inserting this bisectiong breakage upstream ]
      Link: http://lkml.kernel.org/r/1459482572-129494-1-git-send-email-hekuang@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2a28e230
  6. 31 3月, 2016 2 次提交
  7. 30 3月, 2016 6 次提交
  8. 24 3月, 2016 7 次提交
  9. 23 3月, 2016 3 次提交