1. 14 4月, 2016 1 次提交
    • A
      perf trace: Add seccomp beautifier related defines for older systems · 6fb35b95
      Arnaldo Carvalho de Melo 提交于
      Were the detached tarball (make perf-tar-src-pkg) build was failing because
      those definitions aren't available in the system headers.
      
      On RHEL7, for instance:
      
        builtin-trace.c: In function ‘syscall_arg__scnprintf_seccomp_op’:
        builtin-trace.c:1069:7: error: ‘SECCOMP_SET_MODE_STRICT’ undeclared (first use in this function)
          P_SECCOMP_SET_MODE_OP(STRICT);
               ^
        builtin-trace.c:1069:7: note: each undeclared identifier is reported only once for each function it appears in
        builtin-trace.c:1070:7: error: ‘SECCOMP_SET_MODE_FILTER’ undeclared (first use in this function)
          P_SECCOMP_SET_MODE_OP(FILTER);
               ^
        builtin-trace.c: In function ‘syscall_arg__scnprintf_seccomp_flags’:
        builtin-trace.c:1091:14: error: ‘SECCOMP_FILTER_FLAG_TSYNC’ undeclared (first use in this function)
          P_FLAG(TSYNC);
                      ^
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-4f8dzzwd7g6l5dzz693u7kul@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6fb35b95
  2. 13 4月, 2016 11 次提交
  3. 12 4月, 2016 14 次提交
    • A
      perf trace: Print unresolved symbol names as addresses · 00768a2b
      Arnaldo Carvalho de Melo 提交于
      Instead of having "[unknown]" as the name used for unresolved symbols,
      use the address in the callchain, in hexadecimal form:
      
        28.801 ( 0.007 ms): qemu-system-x8/10065 ppoll(ufds: 0x55c98b39e400, nfds: 72, tsp: 0x7fffe4e4fe60, sigsetsize: 8) = 0 Timeout
                                           ppoll+0x91 (/usr/lib64/libc-2.22.so)
                                           [0x337309] (/usr/bin/qemu-system-x86_64)
                                           [0x336ab4] (/usr/bin/qemu-system-x86_64)
                                           main+0x1724 (/usr/bin/qemu-system-x86_64)
                                           __libc_start_main+0xf0 (/usr/lib64/libc-2.22.so)
                                           [0xc59a9] (/usr/bin/qemu-system-x86_64)
        35.265 (14.805 ms): gnome-shell/2287  ... [continued]: poll()) = 1
                                           [0xf6fdd] (/usr/lib64/libc-2.22.so)
                                           g_main_context_iterate.isra.29+0x17c (/usr/lib64/libglib-2.0.so.0.4600.2)
                                           g_main_loop_run+0xc2 (/usr/lib64/libglib-2.0.so.0.4600.2)
                                           meta_run+0x2c (/usr/lib64/libmutter.so.0.0.0)
                                           main+0x3f7 (/usr/bin/gnome-shell)
                                           __libc_start_main+0xf0 (/usr/lib64/libc-2.22.so)
                                           [0x2909] (/usr/bin/gnome-shell)
      Suggested-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-fja1ods5vqpg42mdz09xcz3r@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      00768a2b
    • A
      perf evsel: Allow unresolved symbol names to be printed as addresses · fd4be130
      Arnaldo Carvalho de Melo 提交于
      The fprintf_sym() and fprintf_callchain() methods now allow users to
      change the existing behaviour of showing "[unknown]" as the name of
      unresolved symbols to instead show "[0x123456]", i.e. its address.
      
      The current patch doesn't change tools to use this facility, the results
      from 'perf trace' and 'perf script' cotinue like:
      
      70.109 ( 0.001 ms): qemu-system-x8/10153 poll(ufds: 0x7f2d93ffe870, nfds: 1) = 0 Timeout
                                         [unknown] (/usr/lib64/libc-2.22.so)
                                         [unknown] (/usr/lib64/libspice-server.so.1.10.0)
                                         [unknown] (/usr/lib64/libspice-server.so.1.10.0)
                                         [unknown] (/usr/lib64/libspice-server.so.1.10.0)
                                         start_thread+0xca (/usr/lib64/libpthread-2.22.so)
                                         __clone+0x6d (/usr/lib64/libc-2.22.so)
      
      The next patch will make 'perf trace' use the new formatting.
      Suggested-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-fja1ods5vqpg42mdz09xcz3r@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fd4be130
    • A
      perf trace: Make "--call-graph" affect just "raw_syscalls:sys_exit" · fde54b78
      Arnaldo Carvalho de Melo 提交于
      We don't need the callchains at the syscall enter tracepoint, just when
      finishing it at syscall exit, so reduce the overhead by asking for
      callchains just at syscall exit.
      Suggested-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-fja1ods5vqpg42mdz09xcz3r@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fde54b78
    • A
      perf evsel: Rename config_callgraph() to config_callchain() and make it public · 01e0d50c
      Arnaldo Carvalho de Melo 提交于
      The rename is for consistency with the parameter name.
      
      Make it public for fine grained control of which evsels should have
      callchains enabled, like, for instance, will be done in the next
      changesets in 'perf trace', to enable callchains just on the
      "raw_syscalls:sys_exit" tracepoint.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-og8vup111rn357g4yagus3ao@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      01e0d50c
    • A
      perf evlist: Add (reset,set)_sample_bit methods · 22c8a376
      Arnaldo Carvalho de Melo 提交于
      For fiddling with sample_type fields in all evsels in an evlist.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-dg6yavctt0hzl2tsgfb43qsr@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      22c8a376
    • A
      perf evsel: Do not use globals in config() · e68ae9cf
      Arnaldo Carvalho de Melo 提交于
      Instead receive a callchain_param pointer to configure callchain
      aspects, not doing so if NULL is passed.
      
      This will allow fine grained control over which evsels in an evlist
      gets callchains enabled.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-2mupip6khc92mh5x4nw9to82@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e68ae9cf
    • A
      perf trace: Exclude the kernel part of the callchain leading to a syscall · 44621819
      Arnaldo Carvalho de Melo 提交于
      The kernel parts are not that useful:
      
        # trace -m 512 -e nanosleep --call dwarf  usleep 1
           0.065 ( 0.065 ms): usleep/18732 nanosleep(rqtp: 0x7ffc4ee4e200) = 0
                                             syscall_slow_exit_work ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             __nanosleep (/usr/lib64/libc-2.22.so)
                                             usleep (/usr/lib64/libc-2.22.so)
                                             main (/usr/bin/usleep)
                                             __libc_start_main (/usr/lib64/libc-2.22.so)
                                             _start (/usr/bin/usleep)
        #
      
      So lets just use perf_event_attr.exclude_callchain_kernel to avoid
      collecting it in the ring buffer:
      
        # trace -m 512 -e nanosleep --call dwarf  usleep 1
           0.063 ( 0.063 ms): usleep/19212 nanosleep(rqtp: 0x7ffc3df10fb0) = 0
                                             __nanosleep (/usr/lib64/libc-2.22.so)
                                             usleep (/usr/lib64/libc-2.22.so)
                                             main (/usr/bin/usleep)
                                             __libc_start_main (/usr/lib64/libc-2.22.so)
                                             _start (/usr/bin/usleep)
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-qctu3gqhpim0dfbcp9d86c91@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      44621819
    • A
      perf evsel: Introduce fprintf_callchain() method out of fprintf_sym() · ea453965
      Arnaldo Carvalho de Melo 提交于
      In 'perf trace' we're just interested in printing callchains, and we
      don't want to use the symbol_conf.use_callchain, so move the callchain
      part to a new method.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-kcn3romzivcpxb3u75s9nz33@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ea453965
    • A
      perf evsel: Rename print_ip() to fprintf_sym() · ff0c1078
      Arnaldo Carvalho de Melo 提交于
      As it receives a FILE, and its more than just the IP, which can even be
      requested not to be printed.
      
      For consistency with other similar methods in tools/perf/, name it as
      perf_evsel__fprintf_sym() and make it return the number of bytes
      printed, just like 'fprintf(3)'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-84gawlqa3lhk63nf0t9vnqnn@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ff0c1078
    • M
      perf trace: Add support for printing call chains on sys_exit events. · 566a0885
      Milian Wolff 提交于
      Now, one can print the call chain for every encountered sys_exit event,
      e.g.:
      
          $ perf trace -e nanosleep --call-graph dwarf path/to/ex_sleep
          1005.757 (1000.090 ms): ex_sleep/13167 nanosleep(...) = 0
                                                   syscall_slow_exit_work ([kernel.kallsyms])
                                                   syscall_return_slowpath ([kernel.kallsyms])
                                                   int_ret_from_sys_call ([kernel.kallsyms])
                                                   __nanosleep (/usr/lib/libc-2.23.so)
                                                   [unknown] (/usr/lib/libQt5Core.so.5.6.0)
                                                   QThread::sleep (/usr/lib/libQt5Core.so.5.6.0)
                                                   main (path/to/ex_sleep)
                                                   __libc_start_main (/usr/lib/libc-2.23.so)
                                                   _start (path/to/ex_sleep)
      
      Note that it is advised to increase the number of mmap pages to prevent
      event losses when using this new feature. Often, adding `-m 10M` to the
      `perf trace` invocation is enough.
      
      This feature is also available in strace when built with libunwind via
      `strace -k`. Performance wise, this solution is much better:
      
          $ time find path/to/linux &> /dev/null
      
          real    0m0.051s
          user    0m0.013s
          sys     0m0.037s
      
          $ time perf trace -m 800M --call-graph dwarf find path/to/linux &> /dev/null
      
          real    0m2.624s
          user    0m1.203s
          sys     0m1.333s
      
          $ time strace -k find path/to/linux  &> /dev/null
      
          real    0m35.398s
          user    0m10.403s
          sys     0m23.173s
      
      Note that it is currently not possible to configure the print output.
      Adding such a feature, similar to what is available in `perf script` via
      its `--fields` knob can be added later on.
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      LPU-Reference: 1460115255-17648-1-git-send-email-milian.wolff@kdab.com
      [ Split from a larger patch, do not print the IP, left align,
        remove dup call symbol__init(), added man page entry ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      566a0885
    • A
      perf evsel: Allow passing a left alignment when printing a symbol · db3617f3
      Arnaldo Carvalho de Melo 提交于
      For callchains, etc where we want it to align just below the syscall
      name, for instance, in 'perf trace'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-uk9ekchd67651c625ltaur5y@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      db3617f3
    • M
      perf evsel: Allow specifying a file to output in perf_evsel__print_ip · 6186de9a
      Milian Wolff 提交于
      As this function will be used in 'perf trace'.
      
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/n/tip-8x297v9utnxq77onikevvlse@git.kernel.org
      [ Split from a larger patch ]
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      6186de9a
    • W
      perf bpf: Automatically create bpf-output event __bpf_stdout__ · 72c08098
      Wang Nan 提交于
      This patch removes the need to set a bpf-output event in cmdline.  By
      referencing a map named '__bpf_stdout__', perf automatically creates an
      event for it.
      
      For example:
      
        # perf record -e ./test_bpf_trace.c usleep 100000
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]
        # perf script
                 usleep  4639 [000] 261895.307826:        0            __bpf_stdout__:  ffffffff810eb9a1 ...
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
      
                 usleep  4639 [000] 261895.407883:        0            __bpf_stdout__:  ffffffff8105d609 ...
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
      
        perf record -e ./test_bpf_trace.c usleep 100000
      
        equals to:
      
        perf record -e bpf-output/no-inherit=1,name=__bpf_stdout__/ \
                    -e ./test_bpf_trace.c/map:__bpf_stdout__.event=__bpf_stdout__/ \
                    usleep 100000
      
      Where test_bpf_trace.c is:
      
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        struct bpf_map_def {
               unsigned int type;
               unsigned int key_size;
               unsigned int value_size;
               unsigned int max_entries;
        };
        #define SEC(NAME) __attribute__((section(NAME), used))
        static u64 (*ktime_get_ns)(void) =
               (void *)BPF_FUNC_ktime_get_ns;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
               (void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
               (void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
               (void *)BPF_FUNC_perf_event_output;
      
        struct bpf_map_def SEC("maps") __bpf_stdout__ = {
               .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
               .key_size = sizeof(int),
               .value_size = sizeof(u32),
               .max_entries = __NR_CPUS__,
        };
      
        static inline int __attribute__((always_inline))
        func(void *ctx, int type)
        {
      	char output_str[] = "Raise a BPF event!";
      	char err_str[] = "BAD %d\n";
      	int err;
      
              err = perf_event_output(ctx, &__bpf_stdout__, get_smp_processor_id(),
      			        &output_str, sizeof(output_str));
      	if (err)
      		trace_printk(err_str, sizeof(err_str), err);
              return 1;
        }
        SEC("func_begin=sys_nanosleep")
        int func_begin(void *ctx) {return func(ctx, 1);}
        SEC("func_end=sys_nanosleep%return")
        int func_end(void *ctx) { return func(ctx, 2);}
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
      Committer note:
      
      Testing with 'perf trace':
      
        # trace -e nanosleep --ev test_bpf_stdout.c usleep 1
           0.007 ( 0.007 ms): usleep/729 nanosleep(rqtp: 0x7ffc5bbc5fe0) ...
           0.007 (         ): __bpf_stdout__:Raise a BPF event!..)
           0.008 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
           0.069 (         ): __bpf_stdout__:Raise a BPF event!..)
           0.070 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
           0.072 ( 0.072 ms): usleep/729  ... [continued]: nanosleep()) = 0
        #
      Suggested-and-Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460128045-97310-5-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      72c08098
    • W
      perf bpf: Clone bpf stdout events in multiple bpf scripts · d7888573
      Wang Nan 提交于
      This patch allows cloning bpf-output event configuration among multiple
      bpf scripts. If there exist a map named '__bpf_output__' and not
      configured using 'map:__bpf_output__.event=', this patch clones the
      configuration of another '__bpf_stdout__' map. For example, following
      command:
      
        # perf trace --ev bpf-output/no-inherit,name=evt/ \
                     --ev ./test_bpf_trace.c/map:__bpf_stdout__.event=evt/ \
                     --ev ./test_bpf_trace2.c usleep 100000
      
      equals to:
      
        # perf trace --ev bpf-output/no-inherit,name=evt/ \
                     --ev ./test_bpf_trace.c/map:__bpf_stdout__.event=evt/  \
                     --ev ./test_bpf_trace2.c/map:__bpf_stdout__.event=evt/ \
                     usleep 100000
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Suggested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460128045-97310-4-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d7888573
  4. 08 4月, 2016 13 次提交
    • A
      perf dwarf: Guard !x86_64 definitions under #ifdef else clause · f9383452
      Arnaldo Carvalho de Melo 提交于
      To fix the build on Fedora Rawhide (gcc 6.0.0 20160311 (Red Hat 6.0.0-0.17):
      
          CC       /tmp/build/perf/arch/x86/util/dwarf-regs.o
        arch/x86/util/dwarf-regs.c:66:36: error: 'x86_32_regoffset_table' defined but not used [-Werror=unused-const-variable=]
         static const struct pt_regs_offset x86_32_regoffset_table[] = {
                                            ^~~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-fghuksc1u8ln82bof4lwcj0o@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f9383452
    • A
      perf tools: Use readdir() instead of deprecated readdir_r() · bfc279f3
      Arnaldo Carvalho de Melo 提交于
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case when parsing tracepoint event definitions, to
      avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it
      instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-wddn49r6bz6wq4ee3dxbl7lo@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bfc279f3
    • A
      perf tools: Use readdir() instead of deprecated readdir_r() · 7093b4c9
      Arnaldo Carvalho de Melo 提交于
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case when synthesizing events for pre-existing threads
      by traversing /proc, so, to avoid breaking the build with glibc-2.23.90
      (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
         CC       /tmp/build/perf/util/event.o
        util/event.c: In function '__event__synthesize_thread':
        util/event.c:466:2: error: 'readdir_r' is deprecated [-Werror=deprecated-declarations]
          while (!readdir_r(tasks, &dirent, &next) && next) {
          ^~~~~
        In file included from /usr/include/features.h:368:0,
                         from /usr/include/stdint.h:25,
                         from /usr/lib/gcc/x86_64-redhat-linux/6.0.0/include/stdint.h:9,
                         from /git/linux/tools/include/linux/types.h:6,
                         from util/event.c:1:
        /usr/include/dirent.h:189:12: note: declared here
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i1vj7nyjp2p750rirxgrfd3c@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7093b4c9
    • A
      perf thread_map: Use readdir() instead of deprecated readdir_r() · 3354cf71
      Arnaldo Carvalho de Melo 提交于
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case in thread_map, so, to avoid breaking the build
      with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-del8h2a0f40z75j4r42l96l0@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3354cf71
    • A
      perf script: Use readdir() instead of deprecated readdir_r() · a5e8e825
      Arnaldo Carvalho de Melo 提交于
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case in 'perf script', so, to avoid breaking the build
      with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-mt3xz7n2hl49ni2vx7kuq74g@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a5e8e825
    • W
      perf symbols: Adjust symbol for shared objects · 99e87f7b
      Wang Nan 提交于
      He Kuang reported a problem that perf fails to get correct symbol on
      Android platform in [1]. The problem can be reproduced on normal x86_64
      platform. I will describe the reproducing steps in detail at the end of
      commit message.
      
      The reason of this problem is the missing of symbol adjustment for normal
      shared objects. In most of the cases skipping adjustment is okay. However,
      when '.text' section have different 'address' and 'offset' the result is wrong.
      I checked all shared objects in my working platform, only wine dll objects and
      debug objects (in .debug) have this problem. However, it is common on Android.
      For example:
      
       $ readelf -S ./libsurfaceflinger.so | grep \.text
         [10] .text             PROGBITS         0000000000029030  00012030
      
      This patch enables symbol adjustment for dynamic objects so the symbol
      address got from elfutils would be adjusted correctly.
      
      Now nearly all types of ELF files should adjust symbols. Makes
      ss->adjust_symbols default to true.
      
      Steps to reproduce the problem:
      
        $ cat ./Makefile
        PWD := $(shell pwd)
        LDFLAGS += "-Wl,-rpath=$(PWD)"
        CFLAGS += -g
        main: main.c libbuggy.so
        libbuggy.so: buggy.c
      	gcc -g -shared -fPIC -Wl,-Ttext-segment=0x200000 $< -o $@
        clean:
      	rm -rf main libbuggy.so *.o
      
        $ cat ./buggy.c
        int fib(int x)
        {
            return (x == 0) ? 1 : (x == 1) ? 1 : fib(x - 1) + fib(x - 2);
        }
      
        $ cat ./main.c
        #include <stdio.h>
      
        extern int fib(int x);
        int main()
        {
           int i;
      
           for (i = 0; i < 40; i++)
               printf("%d\n", fib(i));
           return 0;
       }
      
       $ make
       $ perf record ./main
       ...
       $ perf report --stdio
       # Overhead  Command  Shared Object      Symbol
       # ........  .......  .................  ...............................
       #
           14.97%  main     libbuggy.so        [.] 0x000000000000066c
            8.68%  main     libbuggy.so        [.] 0x00000000000006aa
            8.52%  main     libbuggy.so        [.] fib@plt
            7.95%  main     libbuggy.so        [.] 0x0000000000000664
            5.94%  main     libbuggy.so        [.] 0x00000000000006a9
            5.35%  main     libbuggy.so        [.] 0x0000000000000678
       ...
      
      The correct result should be (after this patch):
      
        # Overhead  Command  Shared Object      Symbol
        # ........  .......  .................  ...............................
        #
            91.47%  main     libbuggy.so        [.] fib
             8.52%  main     libbuggy.so        [.] fib@plt
             0.00%  main     [kernel.kallsyms]  [k] kmem_cache_free
      
      [1] http://lkml.kernel.org/g/1452567507-54013-1-git-send-email-hekuang@huawei.comSigned-off-by: NWang Nan <wangnan0@huawei.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460024671-64774-3-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      99e87f7b
    • W
      perf symbols: Record text offset in dso to calculate objdump address · a58f7033
      Wang Nan 提交于
      In this patch, the offset of '.text' section is stored into dso
      and used here to re-calculate address to objdump.
      
      In most of the cases, executable code is in '.text' section, so the
      adjustment made to a symbol in dso__load_sym (using
      sym.st_value -= shdr.sh_addr - shdr.sh_offset) should equal to
      'sym.st_value -= dso->text_offset'. Therefore, adding text_offset back
      get objdump address from symbol address (rip). However, it is not true
      for kernel and kernel module since there could be multiple executable
      sections with different offset. Exclude kernel for this reason.
      
      After this patch, even dso->adjust_symbols is set to true for shared
      objects, map__rip_2objdump() and map__objdump_2mem() would return
      correct result, so perf behavior of annotate won't be changed.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460024671-64774-2-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a58f7033
    • A
      perf tools: Build syscall table .c header from kernel's syscall_64.tbl · 1b700c99
      Arnaldo Carvalho de Melo 提交于
      We used libaudit to map ids to syscall names and vice-versa, but that
      imposes a delay in supporting new syscalls, having to wait for libaudit
      to get those new syscalls on its tables.
      
      To remove that delay, for x86_64 initially, grab a copy of
      arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those
      tables.
      
      Syscalls currently not available in audit-libs:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
        Error:	Invalid syscall copy_file_range, membarrier, mlock2, pread64, pwrite64, timerfd_create, userfaultfd
        Hint:	try 'perf list syscalls:sys_enter_*'
        Hint:	and: 'man syscalls'
        #
      
      With this patch:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
          8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36
          8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40
         30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63ae382000, count: 4096, pos: 529592320) = 4096
         31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afd8b000, count: 4096, pos: 2314133504) = 4096
         31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afda2000, count: 4096, pos: 2314137600) = 4096
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-51xfjbxevdsucmnbc4ka5r88@git.kernel.org
      [ Added make dep for 'prepare' in 'LIBPERF_IN', fix by Wang Nan to fix parallell build ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1b700c99
    • A
      perf tools: Allow generating per-arch syscall table arrays · 5af56fab
      Arnaldo Carvalho de Melo 提交于
      Tools should use a mechanism similar to arch/x86/entry/syscalls/ to
      generate a header file with the definitions for two variables:
      
        static const char *syscalltbl_x86_64[] = {
      	[0] = "read",
      	[1] = "write",
        <SNIP>
      	[324] = "membarrier",
      	[325] = "mlock2",
      	[326] = "copy_file_range",
        };
        static const int syscalltbl_x86_64_max_id = 326;
      
      In a per arch file that should then be included in
      tools/perf/util/syscalltbl.c.
      
      First one will be for x86_64.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-02uuamkxgccczdth8komspgp@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5af56fab
    • A
      perf trace: Move syscall table id <-> name routines to separate class · fd0db102
      Arnaldo Carvalho de Melo 提交于
      We're using libaudit for doing name to id and id to syscall name
      translations, but that makes 'perf trace' to have to wait for newer
      libaudit versions supporting recently added syscalls, such as
      "userfaultfd" at the time of this changeset.
      
      We have all the information right there, in the kernel sources, so move
      this code to a separate place, wrapped behind functions that will
      progressively use the kernel source files to extract the syscall table
      for use in 'perf trace'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i38opd09ow25mmyrvfwnbvkj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fd0db102
    • A
      perf trace: Beautify mode_t arguments · ba2f22cf
      Arnaldo Carvalho de Melo 提交于
      When reading the syscall tracepoint /format file, look for arguments of type
      "mode_t" and attach a beautifier:
      
        [root@jouet ~]# cat ~/bin/tp_with_fields_of_type
        #!/bin/bash
        grep -w $1 /sys/kernel/tracing/events/syscalls/*/format | sed -r 's%.*sys_enter_(.*)/format.*%\1%g' | paste -d, -s
        # tp_with_fields_of_type umode_t
        chmod,creat,fchmodat,fchmod,mkdirat,mkdir,mknodat,mknod,mq_open,openat,open
        #
      
      Testing it:
      
        #define S_ISUID 0004000
        #define S_ISGID 0002000
        #define S_ISVTX 0001000
        #define S_IRWXU 0000700
        #define S_IRUSR 0000400
        #define S_IWUSR 0000200
        #define S_IXUSR 0000100
      
        #define S_IRWXG 0000070
        #define S_IRGRP 0000040
        #define S_IWGRP 0000020
        #define S_IXGRP 0000010
      
        #define S_IRWXO 0000007
        #define S_IROTH 0000004
        #define S_IWOTH 0000002
        #define S_IXOTH 0000001
      
        # for mode in 4000 2000 1000 700 400 200 100 70 40 20 10 7 4 2 1 ; do \
            echo -n $mode '->' ; trace --no-inherit -e chmod,fchmodat,fchmod chmod $mode x; \
          done
        4000 -> 0.338 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: ISUID) = 0
        2000 -> 0.438 ( 0.015 ms): fchmodat(dfd: CWD, filename: x, mode: ISGID) = 0
        1000 -> 0.677 ( 0.040 ms): fchmodat(dfd: CWD, filename: x, mode: ISVTX) = 0
         700 -> 0.394 ( 0.013 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXU) = 0
         400 -> 0.337 ( 0.010 ms): fchmodat(dfd: CWD, filename: x, mode: IRUSR) = 0
         200 -> 0.259 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IWUSR) = 0
         100 -> 0.249 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXUSR) = 0
          70 -> 0.266 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXG) = 0
          40 -> 0.329 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IRGRP) = 0
          20 -> 0.250 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IWGRP) = 0
          10 -> 0.259 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXGRP) = 0
           7 -> 0.249 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXO) = 0
           4 -> 0.278 ( 0.011 ms): fchmodat(dfd: CWD, filename: x, mode: IROTH) = 0
           2 -> 0.276 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IWOTH) = 0
           1 -> 0.250 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXOTH) = 0
        #
        # trace --no-inherit -e chmod,fchmodat,fchmod chmod 7777 x
           0.258 ( 0.011 ms): fchmodat(dfd: CWD, filename: x, mode: IALLUGO) = 0
        # trace --no-inherit -e chmod,fchmodat,fchmod chmod 7770 x
           0.258 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: ISUID|ISGID|ISVTX|IRWXU|IRWXG) = 0
        # trace --no-inherit -e chmod,fchmodat,fchmod chmod 777 x
           0.293 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXUGO
        #
      
      Now lets see if check by using the tracepoint for that specific syscall,
      instead of raw_syscalls:sys_enter as 'trace' does for its strace fu:
      
        # trace --no-inherit --ev syscalls:sys_enter_fchmodat -e fchmodat chmod 666 x
           0.255 (         ): syscalls:sys_enter_fchmodat:dfd: 0xffffffffffffff9c, filename: 0x55db32a3f0f0, mode: 0x000001b6)
           0.268 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: IRUGO|IWUGO                     ) = 0
        #
      
      Perfect, 0x1bc == 0666.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-18e8zfgbkj83xo87yoom43kd@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ba2f22cf
    • J
      perf script: Process event update events · 91daee30
      Jiri Olsa 提交于
      Andreas reported following command produces no output:
      
        # cat test.py
        #!/usr/bin/env python
      
        def stat__krava(cpu, thread, time, val, ena, run):
            print "event %s cpu %d, thread %d, time %d, val %d, ena %d, run %d" % \
                  ("krava", cpu, thread, time, val, ena, run)
        # perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
        ^C
        #
      
      The reason is that 'perf script' does not process event update events and
      will never get the event name update thus the python callback is never
      called.
      
      The fix is just to add already existing callback we use in 'perf stat
      report'.
      
      Committer note:
      
      After the patch:
      
        # perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
        event krava cpu -1, thread -1, time 1000239179, val 1789051, ena 4000690920, run 4000690920
        event krava cpu -1, thread -1, time 2000479061, val 2391338, ena 4000879596, run 4000879596
        event krava cpu -1, thread -1, time 3000740802, val 1939121, ena 4000977209, run 4000977209
        event krava cpu -1, thread -1, time 4001006730, val 2356115, ena 4001000489, run 4001000489
        ^C
        #
      Reported-by: NAndreas Hollmann <hollmann@in.tum.de>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1460013073-18444-3-git-send-email-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      91daee30
    • J
      perf tools: Add dedicated unwind addr_space member into thread struct · e583d70c
      Jiri Olsa 提交于
      Milian reported issue with thread::priv, which was double booked by perf
      trace and DWARF unwind code. So using those together is impossible at
      the moment.
      
      Moving DWARF unwind private data into separate variable so perf trace
      can keep using thread::priv.
      Reported-and-Tested-by: NMilian Wolff <milian.wolff@kdab.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andreas Hollmann <hollmann@in.tum.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1460013073-18444-2-git-send-email-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e583d70c
  5. 07 4月, 2016 1 次提交