1. 22 Aug, 2017: 1 commit
    • perf evsel: Fix buffer overflow while freeing events · 475fb533
      Andi Kleen committed
      Fix buffer overflow for:
      
        % perf stat -e msr/tsc/,cstate_core/c7-residency/ true
      
      that causes glibc free list corruption. For some reason it doesn't
      trigger in valgrind, but it is visible in ASan:
      
        =================================================================
        ==32681==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003f5c at pc 0x0000005671ef bp 0x7ffdaaac9ac0 sp 0x7ffdaaac9ab0
        READ of size 4 at 0x603000003f5c thread T0
          #0 0x5671ee in perf_evsel__close_fd util/evsel.c:1196
          #1 0x56c57a in perf_evsel__close util/evsel.c:1717
          #2 0x55ed5f in perf_evlist__close util/evlist.c:1631
          #3 0x4647e1 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:749
          #4 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
          #5 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
          #6 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
          #7 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
          #8 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
          #9 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
          #10 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
          #11 0x428419 in _start (/home/ak/hle/obj-perf/perf+0x428419)
      
        0x603000003f5c is located 0 bytes to the right of 28-byte region [0x603000003f40,0x603000003f5c)
        allocated by thread T0 here:
          #0 0x7f0675139020 in calloc (/lib64/libasan.so.3+0xc7020)
          #1 0x648a2d in zalloc util/util.h:23
          #2 0x648a88 in xyarray__new util/xyarray.c:9
          #3 0x566419 in perf_evsel__alloc_fd util/evsel.c:1039
          #4 0x56b427 in perf_evsel__open util/evsel.c:1529
          #5 0x56c620 in perf_evsel__open_per_thread util/evsel.c:1730
          #6 0x461dea in create_perf_stat_counter /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:263
          #7 0x4637d7 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:600
          #8 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
          #9 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
          #10 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
          #11 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
          #12 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
          #13 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
          #14 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
      
      The event is allocated with cpus == 1, but freed with cpus == the real
      number of CPUs. When the evsel close function walks the file descriptors,
      it exceeds the fd xyarray boundaries and reads random memory.
      
      v2:
      
      Now that xyarrays save their original dimensions, we can use these to
      iterate the two-dimensional fd arrays. Fix some users (close, ioctl) in
      evsel.c to use these fields directly. This simplifies the code and
      drops quite a few function arguments. Adjust all callers by removing the
      unneeded arguments.
      
      The actual perf event reading still uses the original values from the
      evsel list.
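      A minimal C sketch of the v2 idea follows. It assumes a simplified,
      hypothetical xyarray layout. The names xyarray_new(), xyarray_entry(),
      alloc_fds() and close_fds() are illustrative, not perf's actual API.
      The array records the dimensions it was allocated with, and the close
      loop is bounded by those stored values rather than by a CPU count the
      caller passes in. As a result, an array allocated for one CPU is never
      walked as if it had more.

```c
#include <stdlib.h>

/* Hypothetical, simplified 2-D array that remembers its dimensions. */
struct xyarray {
	size_t max_x;      /* first dimension, e.g. CPUs */
	size_t max_y;      /* second dimension, e.g. threads */
	size_t entry_size; /* bytes per entry */
	char contents[];   /* max_x * max_y entries */
};

static struct xyarray *xyarray_new(size_t xlen, size_t ylen, size_t entry_size)
{
	struct xyarray *xy = calloc(1, sizeof(*xy) + xlen * ylen * entry_size);

	if (xy) {
		xy->max_x = xlen;
		xy->max_y = ylen;
		xy->entry_size = entry_size;
	}
	return xy;
}

static void *xyarray_entry(struct xyarray *xy, size_t x, size_t y)
{
	return &xy->contents[(x * xy->max_y + y) * xy->entry_size];
}

/* Allocate an fd array with every slot marked unused (-1). */
static struct xyarray *alloc_fds(size_t ncpus, size_t nthreads)
{
	struct xyarray *fds = xyarray_new(ncpus, nthreads, sizeof(int));

	for (size_t x = 0; fds && x < ncpus; x++)
		for (size_t y = 0; y < nthreads; y++)
			*(int *)xyarray_entry(fds, x, y) = -1;
	return fds;
}

/*
 * Walk the fds using the dimensions stored at allocation time, so the loop
 * cannot run past the buffer. Returns the number of slots visited.
 */
static size_t close_fds(struct xyarray *fds)
{
	size_t visited = 0;

	for (size_t x = 0; x < fds->max_x; x++) {
		for (size_t y = 0; y < fds->max_y; y++) {
			int *fd = xyarray_entry(fds, x, y);

			/* A real implementation would call close(*fd) here. */
			*fd = -1;
			visited++;
		}
	}
	return visited;
}
```

      Because close_fds() takes no CPU or thread count, a caller cannot pass
      a count that disagrees with the allocation.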
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170811232634.30465-2-andi@firstfloor.org
      [ Fix up xy_max_[xy]() -> xyarray__max_[xy]() ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 27 Jul, 2017: 1 commit
  3. 19 Jul, 2017: 2 commits
    • perf evsel: Allow asking for max precise_ip in new_cycles() · 30269dc1
      Arnaldo Carvalho de Melo committed
      There are cases where we want to leave attr.precise_ip as zero, such
      as when using 'perf record --no-samples', where a nonzero value would
      make the kernel return -EINVAL.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-4zq1udecxa51gsapyfwej5fj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Check for fused instructions · 69fb09f6
      Jin Yao committed
      Macro fusion merges two instructions into a single micro-op. Intel Core
      platforms perform this hardware optimization under limited
      circumstances.
      
      For example, CMP + JCC can be "fused" and executed/retired together.
      With sampling, this can result in the sample sometimes landing on the
      JCC and sometimes on the CMP. So the two instructions of a fused pair
      can be considered together.
      
      On Nehalem, fused instruction pairs:
      
        cmp/test + jcc.
      
      On other, newer CPUs:
      
        cmp/test/add/sub/and/inc/dec + jcc.
      
      This patch adds an x86-specific function that checks whether 2
      instructions form a "fused" pair. For non-x86 architectures, the
      function is just NULL.
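      A hedged C sketch of such a check follows, using the instruction sets
      listed above. The helper x86_is_fused_pair() and its signature are
      hypothetical, not perf's actual hook. Mnemonics are compared by prefix
      so that AT&T-suffixed forms such as "cmpl" or "testq" also match. The
      Jcc test is deliberately crude: it treats every "j*" mnemonic except
      jmp as a conditional jump.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Crude Jcc test: any mnemonic starting with 'j' except unconditional jmp. */
static bool is_jcc(const char *ins)
{
	return ins[0] == 'j' && !starts_with(ins, "jmp");
}

/*
 * Is (ins1, ins2) a macro-fusible pair? Nehalem fuses only cmp/test + jcc;
 * newer cores also fuse add/sub/and/inc/dec + jcc.
 */
static bool x86_is_fused_pair(const char *ins1, const char *ins2, bool nehalem)
{
	static const char *const nehalem_ops[] = { "cmp", "test", NULL };
	static const char *const newer_ops[] = {
		"cmp", "test", "add", "sub", "and", "inc", "dec", NULL
	};
	const char *const *ops = nehalem ? nehalem_ops : newer_ops;

	if (!is_jcc(ins2))
		return false;
	for (size_t i = 0; ops[i]; i++)
		if (starts_with(ins1, ops[i]))
			return true;
	return false;
}
```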
      
      Changelog:
      
      v4: Move the CPU model checking to symbol__disassemble and save the CPU
          family/model in arch structure.
      
          This avoids checking every time a jump arrow is printed.
      
      v3: Add checking for Nehalem (CMP, TEST). For other newer Intel CPUs
          just check it by default (CMP, TEST, ADD, SUB, AND, INC, DEC).
      
      v2: Remove the original weak function. Arnaldo points out that doing it
          as a weak function that will be overridden by the host arch doesn't
          work. So now it's implemented as an arch-specific function.
      
      Committer fix:
      
      Do not access evsel->evlist->env->cpuid, since ->env can be NULL.
      Instead, introduce perf_evsel__env_cpuid(), just like
      perf_evsel__env_arch(), which is also used in this function call.
      
      The original patch was segfaulting in 'perf top' + annotation.
      
      But this essentially disables the fused-instruction augmentation in
      'perf top'. The right thing is to get the cpuid from the running
      kernel, which is left for a later patch.
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1499403995-19857-2-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 23 Mar, 2017: 2 commits
    • perf pmu: Add support for MetricName JSON attribute · 96284814
      Andi Kleen committed
      Add support for a new JSON event attribute, MetricName, that names a
      MetricExpr for better output in perf stat.
      
      If the event has no MetricName it uses the normal event name instead to
      describe the metric.
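      The fallback rule amounts to a one-line choice. The sketch below uses a
      hypothetical struct event_alias that carries only the JSON fields
      involved. It is not perf's own alias structure.

```c
#include <stddef.h>

/* Hypothetical subset of the JSON event attributes. */
struct event_alias {
	const char *name;        /* event name, always present */
	const char *metric_expr; /* MetricExpr, may be NULL */
	const char *metric_name; /* MetricName, may be NULL */
};

/* Label for a metric: MetricName if given, else the event name. */
static const char *metric_label(const struct event_alias *alias)
{
	return alias->metric_name ? alias->metric_name : alias->name;
}
```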
      
      Before
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
                 time unc_p_freq_max_os_cycles
           1.000149775     15.7
           2.000344807     19.3
           3.000502544     16.7
           4.000640656      6.6
           5.000779955      9.9
      
      After
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
                 time freq_max_os_cycles %
           1.000149775     15.7
           2.000344807     19.3
           3.000502544     16.7
           4.000640656      6.6
           5.000779955      9.9
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-13-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Output JSON MetricExpr metric · 37932c18
      Andi Kleen committed
      Add generic infrastructure to perf stat to output ratios for
      "MetricExpr" entries in the event lists. Many events are more useful as
      ratios than in raw form, typically some count in relation to total
      ticks.
      
      Transfer the MetricExpr information from the alias to the evsel.
      
      We mark the events that need to be collected for MetricExpr, and also
      link the events using them with a pointer. The code is careful to always
      prefer the right event in the same group to minimize multiplexing
      errors. At the moment only a single relation is supported.
      
      Then add an rblist to the stat shadow code that remembers stats based on
      the cpu and context.
      
      Finally, update, retrieve, and print these values similarly to the
      existing hardcoded perf metrics. We use the simple expression parser
      added earlier to evaluate the expression.
      
      Normally we just output the result without further commentary, but for
      --metric-only this would lead to empty columns. So for this case use the
      original event as description.
      
      There is no attempt to automatically add the MetricExpr event if it is
      missing. Instead, we suggest it to the user, because the tool doesn't
      have enough information to reliably construct a group that is
      guaranteed to schedule. So we leave that to the user.
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}'
             1.000147889        800,085,181      unc_p_clockticks
             1.000147889         93,126,241      unc_p_freq_max_os_cycles  #     11.6
             2.000448381        800,218,217      unc_p_clockticks
             2.000448381        142,516,095      unc_p_freq_max_os_cycles  #     17.8
             3.000639852        800,243,057      unc_p_clockticks
             3.000639852        162,292,689      unc_p_freq_max_os_cycles  #     20.3
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
        #    time         freq_max_os_cycles %
             1.000127077      0.9
             2.000301436      0.7
             3.000456379      0.0
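      The numbers in the first example are consistent with an expression of
      the form 100 * unc_p_freq_max_os_cycles / unc_p_clockticks, since
      93,126,241 / 800,085,181 * 100 is about 11.6. This form is inferred
      from the output; it is not the event list's actual MetricExpr text.
      The sketch below evaluates that assumed expression from per-CPU shadow
      counts. It uses a linear lookup table where perf uses an rblist, and it
      omits the context part of the key. The names shadow_stat, find_stat()
      and metric_percent() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical shadow-stat entry: the last count seen for an event on a CPU. */
struct shadow_stat {
	const char *event;
	int cpu;
	double count;
};

static const struct shadow_stat *find_stat(const struct shadow_stat *tab, size_t n,
					   const char *event, int cpu)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].cpu == cpu && strcmp(tab[i].event, event) == 0)
			return &tab[i];
	return NULL;
}

/*
 * Evaluate the assumed expression "100 * num / den" for one CPU.
 * Returns a negative value if either input is missing or den is zero.
 */
static double metric_percent(const struct shadow_stat *tab, size_t n,
			     const char *num, const char *den, int cpu)
{
	const struct shadow_stat *a = find_stat(tab, n, num, cpu);
	const struct shadow_stat *b = find_stat(tab, n, den, cpu);

	if (!a || !b || b->count == 0)
		return -1.0;
	return 100.0 * a->count / b->count;
}
```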
      
      v2: Change from DivideBy to MetricExpr
      v3: Use expr__ prefix.  Support more than one other event.
      v4: Update description
      v5: Only print warning message once for multiple PMUs.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 22 Mar, 2017: 1 commit
    • perf stat: Collapse identically named events · 430daf2d
      Andi Kleen committed
      The uncore PMU has a lot of duplicated PMUs for different subsystems.
      When expanding an uncore alias we usually end up with a large
      number of identically named aliases, which makes perf stat
      output difficult to read.
      
      Automatically sum them up in perf stat, unless --no-merge is specified.
      
      This can be the default because generally only the uncores have
      duplicated aliases. Other PMUs have unique names.
      
      Before:
      
        % perf stat --no-merge -a -e unc_c_llc_lookup.any sleep 1
      
        Performance counter stats for 'system wide':
      
                 694,976 Bytes unc_c_llc_lookup.any
                 706,304 Bytes unc_c_llc_lookup.any
                 956,608 Bytes unc_c_llc_lookup.any
                 782,720 Bytes unc_c_llc_lookup.any
                 605,696 Bytes unc_c_llc_lookup.any
                 442,816 Bytes unc_c_llc_lookup.any
                 659,328 Bytes unc_c_llc_lookup.any
                 509,312 Bytes unc_c_llc_lookup.any
                 263,936 Bytes unc_c_llc_lookup.any
                 592,448 Bytes unc_c_llc_lookup.any
                 672,448 Bytes unc_c_llc_lookup.any
                 608,640 Bytes unc_c_llc_lookup.any
                 641,024 Bytes unc_c_llc_lookup.any
                 856,896 Bytes unc_c_llc_lookup.any
                 808,832 Bytes unc_c_llc_lookup.any
                 684,864 Bytes unc_c_llc_lookup.any
                 710,464 Bytes unc_c_llc_lookup.any
                 538,304 Bytes unc_c_llc_lookup.any
      
             1.002577660 seconds time elapsed
      
      After:
      
        % perf stat -a -e unc_c_llc_lookup.any sleep 1
      
        Performance counter stats for 'system wide':
      
               2,685,120 Bytes unc_c_llc_lookup.any
      
             1.002648032 seconds time elapsed
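      Conceptually, the merge sums the counts of all counters that share an
      alias name. The sketch below assumes the counters are a flat array of
      (name, value) pairs, whereas perf itself merges per-PMU evsels. Both
      struct counter and merge_by_name() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened counter: one entry per PMU instance. */
struct counter {
	const char *name;
	unsigned long long val;
};

/*
 * Sum counters that share a name into the first entry with that name,
 * compacting the array in place. Returns the number of merged entries.
 */
static size_t merge_by_name(struct counter *c, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < out; j++) {
			if (strcmp(c[j].name, c[i].name) == 0) {
				c[j].val += c[i].val;
				break;
			}
		}
		if (j == out)
			c[out++] = c[i];
	}
	return out;
}
```

      The first occurrence of each name keeps its position, so the output
      order stays stable.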
      
      v2: Split collect_aliases. Rename alias flag.
      v3: Make sure unsupported/not counted is always printed.
      v4: Factor out callback change into separate patch.
      v5: Move check for bad results here
          Move merged check into collect_data
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-3-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 16 Dec, 2016: 1 commit
    • perf evsel: Allow to ignore missing pid · a359c17a
      Jiri Olsa committed
      Add a perf_evsel::ignore_missing_cpu_thread bool.
      
      When set to true, it allows perf to ignore a missing-pid error from the
      perf event syscall.
      
      We remove the missing thread id from the thread_map, so the rest of the
      processing, like ioctl and mmap, won't be disturbed by a -1 fd.
      
      The reason for supporting this is to ease monitoring a group of pids
      that 'disappear' before perf opens their events. This currently leads
      perf to report an error and exit, and makes perf record's -u option
      unusable under certain setups.
      
      With this change we allow this race and ignore such failures with the
      following warning:
      
        WARNING: Ignored open failure for pid 8605
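      A hedged sketch of this handling follows. It assumes the thread map is
      a small fixed-size array of pids. It also relies on perf_event_open()
      reporting a task that no longer exists as ESRCH. The names
      thread_map_sketch and handle_open_error() are hypothetical. The failing
      pid is dropped from the map, so later ioctl/mmap loops never see a -1
      fd. As a choice made for this sketch, the last remaining pid is never
      dropped.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical thread map: a small fixed-capacity list of pids. */
struct thread_map_sketch {
	int nr;
	int pid[16];
};

static void thread_map_remove(struct thread_map_sketch *map, int idx)
{
	memmove(&map->pid[idx], &map->pid[idx + 1],
		(size_t)(map->nr - idx - 1) * sizeof(map->pid[0]));
	map->nr--;
}

/*
 * Decide what to do when opening the event for map->pid[idx] failed with
 * err. Returns true if the pid was dropped and processing can continue,
 * false if the error must be reported as before.
 */
static bool handle_open_error(struct thread_map_sketch *map, int idx, int err,
			      bool ignore_missing_thread)
{
	/* Only a vanished task (ESRCH) is ignored, and never the last one. */
	if (!ignore_missing_thread || err != ESRCH || map->nr <= 1)
		return false;

	fprintf(stderr, "WARNING: Ignored open failure for pid %d\n", map->pid[idx]);
	thread_map_remove(map, idx);
	return true;
}
```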
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161213074622.GA3084@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 25 Nov, 2016: 1 commit
  8. 23 Nov, 2016: 1 commit
  9. 24 Oct, 2016: 1 commit
  10. 29 Sep, 2016: 3 commits
  11. 14 Sep, 2016: 1 commit
  12. 29 Jul, 2016: 1 commit
  13. 16 Jul, 2016: 2 commits
    • perf tools: Enable overwrite settings · 626a6b78
      Wang Nan committed
      This patch allows the following config terms and option:
      
      Globally set events to overwrite:
      
        # perf record --overwrite ...
      
      Set specific events to overwrite or no-overwrite:
      
        # perf record --event cycles/overwrite/ ...
        # perf record --event cycles/no-overwrite/ ...
      
      Add the missing config terms and update the config term array size
      because the longest string length has changed.
      
      For overwritable events, it automatically selects attr.write_backward
      since perf requires it to be backward for reading.
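      The precedence between the global option and the per-event terms can be
      sketched as follows. The sketch assumes a per-event term, when present,
      overrides the global --overwrite flag, and it models only the one attr
      bit involved. The names attr_sketch and apply_overwrite_term() are
      hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the one perf_event_attr bit involved. */
struct attr_sketch {
	bool write_backward;
};

/*
 * Resolve an event's overwrite mode: a per-event "overwrite" or
 * "no-overwrite" term (term may be NULL) overrides the global --overwrite
 * flag, and an overwritable event always gets write_backward.
 */
static void apply_overwrite_term(struct attr_sketch *attr, bool global_overwrite,
				 const char *term)
{
	bool overwrite = global_overwrite;

	if (term && strcmp(term, "overwrite") == 0)
		overwrite = true;
	else if (term && strcmp(term, "no-overwrite") == 0)
		overwrite = false;

	attr->write_backward = overwrite;
}
```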
      
      Test result:
      
        # perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
        # perf evlist -v
        syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1468485287-33422-14-git-send-email-wangnan0@huawei.com
      Signed-off-by: He Kuang <hekuang@huawei.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Drop redundant evsel->overwrite indicator · 32a951b4
      Arnaldo Carvalho de Melo committed
      The evsel->overwrite indicator means an event should be put into an
      overwritable ring buffer. In the current implementation, it is equal to
      evsel->attr.write_backward. To reduce complexity, remove
      evsel->overwrite and use evsel->attr.write_backward instead.
      
      In addition, in __perf_evsel__open(), if the kernel doesn't support
      write_backward and the user explicitly set it in the evsel, don't fall
      back as for other missing features. Falling back to a forward ring
      buffer is meaningless in this case, because we are unable to reliably
      read from a forward overwritable ring buffer.
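      The policy can be sketched with a hypothetical two-feature model.
      Here exclude_guest stands in for the "other missing features" that can
      be dropped before retrying the open. write_backward, by contrast, is
      never silently cleared. The structures and try_fallback() are
      illustrative and do not reproduce perf's actual fallback order.

```c
#include <stdbool.h>

/* Hypothetical subset of the attr bits involved. */
struct attr_bits {
	bool exclude_guest;
	bool write_backward;
};

/* Features already found to be missing on this kernel. */
struct missing_features {
	bool exclude_guest;
	bool write_backward;
};

/*
 * Called after the kernel rejected attr. Returns true if attr was adjusted
 * and the open should be retried, false if the error must be reported.
 */
static bool try_fallback(struct attr_bits *attr, struct missing_features *missing)
{
	/* An ordinary optional feature: drop it and retry. */
	if (!missing->exclude_guest && attr->exclude_guest) {
		missing->exclude_guest = true;
		attr->exclude_guest = false;
		return true;
	}
	/* No forward fallback for an overwritable buffer: report the error. */
	if (attr->write_backward) {
		missing->write_backward = true;
		return false;
	}
	return false;
}
```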
      
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1468485287-33422-2-git-send-email-wangnan0@huawei.com
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 13 Jul, 2016: 1 commit
  15. 30 Jun, 2016: 1 commit
  16. 04 Jun, 2016: 1 commit
  17. 30 May, 2016: 1 commit
    • perf tools: Per event max-stack settings · 792d48b4
      Arnaldo Carvalho de Melo committed
      This is the tooling counterpart. Now it is possible to do:
      
        # perf record -e sched:sched_switch/max-stack=10/ -e cycles/call-graph=dwarf,max-stack=4/ -e cpu-cycles/call-graph=dwarf,max-stack=1024/ usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.052 MB perf.data (5 samples) ]
        # perf evlist -v
        sched:sched_switch: type: 2, size: 112, config: 0x110, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, sample_max_stack: 10
        cycles/call-graph=dwarf,max-stack=4/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 4
        cpu-cycles/call-graph=dwarf,max-stack=1024/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 1024
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      
      Using just /max-stack=N/ means /call-graph=fp,max-stack=N/, that should
      be further configurable by means of some .perfconfig knob.
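      The term handling shown above can be sketched as a small parser over
      the text between the event's slashes, for example
      "call-graph=dwarf,max-stack=4". It encodes the rule that a bare
      max-stack implies frame-pointer unwinding. The struct callchain_cfg and
      parse_callchain_terms() are hypothetical; this is not perf's
      parse-events code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-event callchain settings. */
struct callchain_cfg {
	char mode[8];           /* "fp", "dwarf", ... or "" if none */
	unsigned int max_stack; /* 0 means "not set" */
};

/*
 * Parse comma-separated terms such as "call-graph=dwarf,max-stack=4".
 * A max-stack without call-graph implies frame-pointer ("fp") unwinding.
 * Returns 0 on success, -1 on an unknown or malformed term.
 */
static int parse_callchain_terms(const char *terms, struct callchain_cfg *cfg)
{
	const char *p = terms;

	cfg->mode[0] = '\0';
	cfg->max_stack = 0;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		if (len > 11 && strncmp(p, "call-graph=", 11) == 0) {
			size_t m = len - 11;

			if (m >= sizeof(cfg->mode))
				return -1;
			memcpy(cfg->mode, p + 11, m);
			cfg->mode[m] = '\0';
		} else if (len > 10 && strncmp(p, "max-stack=", 10) == 0) {
			cfg->max_stack = (unsigned int)strtoul(p + 10, NULL, 10);
		} else {
			return -1;
		}
		p = end ? end + 1 : p + len;
	}

	if (cfg->max_stack && cfg->mode[0] == '\0')
		strcpy(cfg->mode, "fp");
	return 0;
}
```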
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  18. 21 May, 2016: 1 commit
  19. 18 Apr, 2016: 1 commit
  20. 15 Apr, 2016: 3 commits
  21. 13 Apr, 2016: 1 commit
  22. 12 Apr, 2016: 3 commits
  23. 09 Mar, 2016: 1 commit
    • perf tools: Fix perf script python database export crash · 616df645
      Chris Phlipot committed
      Remove the union in evsel so that the database id and priv pointer can
      be used simultaneously without conflicting and crashing.
      
      A detailed description of the fixed bug follows:
      
      perf script crashes with a segmentation fault on user space tool version
      4.5.rc7.ge2857b when using the python database export API. It works
      properly in 4.4 and prior versions.
      
      The crash first appeared in:
      
      cfc8874a ("perf script: Process cpu/threads maps")
      
      How to reproduce the bug:
      
      Remove any temporary files left over from a previous crash (if you have
      already attempted to reproduce the bug):
      
        $ rm -r test_db-perf-data
        $ dropdb test_db
      
        $ perf record timeout 1 yes >/dev/null
        $ perf script -s scripts/python/export-to-postgresql.py test_db
      
        Stack Trace:
        Program received signal SIGSEGV, Segmentation fault.
        __GI___libc_free (mem=0x1) at malloc.c:2929
        2929	malloc.c: No such file or directory.
        (gdb) bt
          at util/stat.c:122
          argv=<optimized out>, prefix=<optimized out>) at builtin-script.c:2231
          argc=argc@entry=4, argv=argv@entry=0x7fffffffdf70) at perf.c:390
          at perf.c:451
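      One plausible reading of the free(mem=0x1) frame above is offered here
      as an illustration, not as the patch's own analysis. If priv and the
      database id share storage, storing a small id such as 1 leaves priv
      looking like the pointer 0x1, which is later passed to free(). The two
      structs below are hypothetical reductions of the before and after
      layouts.

```c
#include <stdint.h>

/* Before (simplified, hypothetical): both users share one storage slot. */
struct evsel_before {
	union {
		void *priv;     /* private data, e.g. stat counts */
		uint64_t db_id; /* id assigned by the database export */
	};
};

/* After: separate fields, so assigning one can no longer clobber the other. */
struct evsel_after {
	void *priv;
	uint64_t db_id;
};
```

      The anonymous union requires C11 or a GNU C dialect.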
      Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: cfc8874a ("perf script: Process cpu/threads maps")
      Link: http://lkml.kernel.org/r/1457500314-8912-1-git-send-email-cphlipot0@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  24. 23 Feb, 2016: 1 commit
    • perf tools: Introduce bpf-output event · 03e0a7df
      Wang Nan committed
      Commit a43eec30 ("bpf: introduce bpf_perf_event_output() helper")
      adds a helper to enable a BPF program to output data to a perf ring
      buffer through a new type of perf event, PERF_COUNT_SW_BPF_OUTPUT. This
      patch enables perf to create events of that type. Now a perf user can
      use the following cmdline to receive output data from BPF programs:
      
        # perf record -a -e bpf-output/no-inherit,name=evt/ \
                          -e ./test_bpf_output.c/map:channel.event=evt/ ls /
        # perf script
           perf 1560 [004] 347747.086295:  evt: ffffffff811fd201 sys_write ...
           perf 1560 [004] 347747.086300:  evt: ffffffff811fd201 sys_write ...
           perf 1560 [004] 347747.086315:  evt: ffffffff811fd201 sys_write ...
                  ...
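      At the perf_event_open() level, a bpf-output event is a software event
      with config PERF_COUNT_SW_BPF_OUTPUT. The helper below shows one way to
      fill in its attr, assuming Linux UAPI headers new enough to define that
      constant. Two of its settings are assumptions, not a copy of what perf
      configures. PERF_SAMPLE_RAW is chosen so that the BPF payload arrives
      as raw sample data, and the sample period is set to 1. The helper name
      init_bpf_output_attr() is hypothetical.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Fill attr for a bpf-output style software event (illustrative settings). */
static void init_bpf_output_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_BPF_OUTPUT;
	attr->sample_type = PERF_SAMPLE_RAW; /* BPF payload as raw data */
	attr->sample_period = 1;             /* one sample per output */
	attr->inherit = 0;                   /* mirrors the no-inherit term */
}
```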
      
      Test result:
      
        # cat test_bpf_output.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        struct bpf_map_def {
       	unsigned int type;
       	unsigned int key_size;
       	unsigned int value_size;
       	unsigned int max_entries;
        };
      
        #define SEC(NAME) __attribute__((section(NAME), used))
        static u64 (*ktime_get_ns)(void) =
       	(void *)BPF_FUNC_ktime_get_ns;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
       	(void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
       	(void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
       	(void *)BPF_FUNC_perf_event_output;
      
        struct bpf_map_def SEC("maps") channel = {
       	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
       	.key_size = sizeof(int),
       	.value_size = sizeof(u32),
       	.max_entries = __NR_CPUS__,
        };
      
        SEC("func_write=sys_write")
        int func_write(void *ctx)
        {
       	struct {
       		u64 ktime;
       		int cpuid;
       	} __attribute__((packed)) output_data;
       	char error_data[] = "Error: failed to output: %d\n";
      
       	output_data.cpuid = get_smp_processor_id();
       	output_data.ktime = ktime_get_ns();
       	int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
       				    &output_data, sizeof(output_data));
       	if (err)
       		trace_printk(error_data, sizeof(error_data), err);
       	return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************ END ***************************/
      
        # perf record -a -e bpf-output/no-inherit,name=evt/ \
                          -e ./test_bpf_output.c/map:channel.event=evt/ ls /
        # perf script | grep ls
           ls  2242 [003] 347851.557563:   evt: ffffffff811fd201 sys_write ...
           ls  2242 [003] 347851.557571:   evt: ffffffff811fd201 sys_write ...
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-11-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  25. 09 Jan, 2016: 1 commit
    • perf evlist: Add --trace-fields option to show trace fields · 775d8a1b
      Namhyung Kim committed
      To use dynamic sort keys, it might be good to add an option to see the
      list of field names.
      
        $ perf evlist -i perf.data.sched
        sched:sched_switch
        sched:sched_stat_wait
        sched:sched_stat_sleep
        sched:sched_stat_iowait
        sched:sched_stat_runtime
        sched:sched_process_fork
        sched:sched_wakeup
        sched:sched_wakeup_new
        sched:sched_migrate_task
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      
        $ perf evlist -i perf.data.sched --trace-fields
        sched:sched_switch: trace_fields: prev_comm,prev_pid,prev_prio,prev_state,next_comm,next_pid,next_prio
        sched:sched_stat_wait: trace_fields: comm,pid,delay
        sched:sched_stat_sleep: trace_fields: comm,pid,delay
        sched:sched_stat_iowait: trace_fields: comm,pid,delay
        sched:sched_stat_runtime: trace_fields: comm,pid,runtime,vruntime
        sched:sched_process_fork: trace_fields: parent_comm,parent_pid,child_comm,child_pid
        sched:sched_wakeup: trace_fields: comm,pid,prio,success,target_cpu
        sched:sched_wakeup_new: trace_fields: comm,pid,prio,success,target_cpu
        sched:sched_migrate_task: trace_fields: comm,pid,prio,orig_cpu,dest_cpu
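      Each output line above is the event name followed by its field names
      joined with commas. A small formatter that reproduces this layout is
      sketched below. It is hypothetical and does not read real tracepoint
      format files; the field names are passed in directly.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format "<event>: trace_fields: f1,f2,..." into buf.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_trace_fields(char *buf, size_t size, const char *event,
			       const char *const *fields, size_t nr_fields)
{
	int len = snprintf(buf, size, "%s: trace_fields: ", event);

	for (size_t i = 0; i < nr_fields && len >= 0 && (size_t)len < size; i++)
		len += snprintf(buf + len, size - (size_t)len, "%s%s",
				i ? "," : "", fields[i]);

	return (len >= 0 && (size_t)len < size) ? len : -1;
}
```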
      
      Committer notes:
      
      For another file, in verbose mode:
      
        # perf evlist -v --trace-fields
        sched:sched_switch: type: 2, size: 112, config: 0x10b, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, trace_fields: prev_comm,prev_pid,prev_prio,prev_state,next_comm,next_pid,next_prio
        #
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1452125549-1511-5-git-send-email-namhyung@kernel.org
      [ Replaced 'trace_fields=' with 'trace_fields: ' to make the output consistent in -v mode ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  26. 08 Dec, 2015: 2 commits
  27. 30 Oct, 2015: 1 commit
    • perf bpf: Attach eBPF filter to perf event · 1f45b1d4
      Wang Nan committed
      This is the final patch that makes basic BPF filtering work. After
      applying this patch, users can use a BPF filter like:
      
       # perf record --event ./hello_world.o ls
      
      A bpf_fd field is appended to 'struct evsel' and set up during the
      callback function add_bpf_event() for each 'probe_trace_event'.
      
      The PERF_EVENT_IOC_SET_BPF ioctl is used to attach the eBPF program to
      a newly created perf event. The file descriptor of the eBPF program is
      passed to perf record using previous patches and stored in
      evsel->bpf_fd.
      
      It is possible that different perf events are created for one kprobe
      event on different CPUs. In this case, the ioctl returns EEXIST. This
      patch doesn't treat that as an error.
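      The attach step with the EEXIST tolerance described above might look
      like the following. It assumes Linux UAPI headers that define
      PERF_EVENT_IOC_SET_BPF, and a caller that already holds a perf event
      fd and a loaded eBPF program fd. The helper name attach_bpf() is
      hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>

/*
 * Attach the eBPF program bpf_fd to the perf event event_fd.
 * EEXIST is treated as success, as the patch describes for per-CPU events
 * of the same kprobe. Returns 0 or a negative errno.
 */
static int attach_bpf(int event_fd, int bpf_fd)
{
	if (ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, bpf_fd) == 0)
		return 0;
	if (errno == EEXIST)
		return 0;
	return -errno;
}
```

      Any other failure, such as an invalid fd, is still reported to the
      caller.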
      
      Committer note:
      
      The bpf proggie used so far:
      
        __attribute__((section("fork=_do_fork"), used))
        int fork(void *ctx)
        {
      	  return 0;
        }
      
        char _license[] __attribute__((section("license"), used)) = "GPL";
        int _version __attribute__((section("version"), used)) = 0x40300;
      
      failed to produce any samples, even with forks happening and with it
      running in system-wide mode.
      
      That is because now the filter is being associated, and the code above
      always returns zero, meaning that all forks will be probed but filtered
      away ;-/
      
      Change it to 'return 1;' instead and after that:
      
        # trace --no-syscalls --event /tmp/foo.o
           0.000 perf_bpf_probe:fork:(ffffffff8109be30))
           2.333 perf_bpf_probe:fork:(ffffffff8109be30))
           3.725 perf_bpf_probe:fork:(ffffffff8109be30))
           4.550 perf_bpf_probe:fork:(ffffffff8109be30))
        ^C#
      
      And it works with all tools, including 'perf trace'.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1444826502-49291-8-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  28. 28 Oct 2015, 2 commits
    • W
      perf tools: Enable pre-event inherit setting by config terms · 374ce938
      Wang Nan committed
      This patch allows perf record to set an event's attr.inherit bit through
      config terms, like:
      
        # perf record -e cycles/no-inherit/ ...
        # perf record -e cycles/inherit/ ...
      
      This lets the user control the inherit bit for each event separately.
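      The following sketch shows how a per-event term could override the global
      setting when attr.inherit is filled in. The enum and apply_inherit() are
      hypothetical and do not reflect perf's internals. struct perf_event_attr
      and its inherit bit come from <linux/perf_event.h>. The global default is
      on unless -i/--no-inherit is given.

```c
#include <stdbool.h>
#include <linux/perf_event.h>

/* Hypothetical per-event term state. */
enum inherit_term {
	INHERIT_TERM_UNSET,      /* plain "cycles"        */
	INHERIT_TERM_INHERIT,    /* "cycles/inherit/"     */
	INHERIT_TERM_NO_INHERIT, /* "cycles/no-inherit/"  */
};

/*
 * The per-event term, if present, wins; otherwise the global setting
 * (true unless -i/--no-inherit was given) is used.
 */
static void apply_inherit(struct perf_event_attr *attr, bool global_inherit,
			  enum inherit_term term)
{
	switch (term) {
	case INHERIT_TERM_INHERIT:
		attr->inherit = 1;
		break;
	case INHERIT_TERM_NO_INHERIT:
		attr->inherit = 0;
		break;
	default:
		attr->inherit = global_inherit ? 1 : 0;
		break;
	}
}
```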
      
      In the following example, a.out fork()s in main() and then does some
      complex, CPU-intensive computations in both of its children.
      
      Basic results with and without inherit:
      
        # perf record -e cycles -e instructions ./a.out
        [ perf record: Woken up 9 times to write data ]
        [ perf record: Captured and wrote 2.205 MB perf.data (47920 samples) ]
        # perf report --stdio
        # ...
        # Samples: 23K of event 'cycles'
        # Event count (approx.): 23641752891
        ...
        # Samples: 24K of event 'instructions'
        # Event count (approx.): 30428312415
      
        # perf record -i -e cycles -e instructions ./a.out
        [ perf record: Woken up 5 times to write data ]
        [ perf record: Captured and wrote 1.111 MB perf.data (24019 samples) ]
        ...
        # Samples: 12K of event 'cycles'
        # Event count (approx.): 11699501775
        ...
        # Samples: 12K of event 'instructions'
        # Event count (approx.): 15058023559
      
      Cancel inherit for one event when it is globally enabled:
      
        # perf record -e cycles/no-inherit/ -e instructions ./a.out
        [ perf record: Woken up 7 times to write data ]
        [ perf record: Captured and wrote 1.660 MB perf.data (36004 samples) ]
        ...
        # Samples: 12K of event 'cycles/no-inherit/'
        # Event count (approx.): 11895759282
        ...
        # Samples: 24K of event 'instructions'
        # Event count (approx.): 30668000441
      
      Enable inherit for one event when it is globally disabled:
      
        # perf record -i -e cycles/inherit/ -e instructions ./a.out
        [ perf record: Woken up 7 times to write data ]
        [ perf record: Captured and wrote 1.654 MB perf.data (35868 samples) ]
        ...
        # Samples: 23K of event 'cycles/inherit/'
        # Event count (approx.): 23285400229
        ...
        # Samples: 11K of event 'instructions'
        # Event count (approx.): 14969050259
      
      Committer note:
      
      In addition to seeing the result in the perf.data file size, as above, one
      can check whether the bit was set by doing one of:
      
        # perf record -e cycles -e instructions -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.911 MB perf.data (63 samples) ]
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
        #
      
      So, the inherit bit was set in both. Now, if we disable it globally using
      --no-inherit:
      
        # perf record --no-inherit -e cycles -e instructions -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.910 MB perf.data (56 samples) ]
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
      
      No inherit bit is set. Now disable it globally and set it just on the cycles event:
      
        # perf record --no-inherit -e cycles/inherit/ -e instructions -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.909 MB perf.data (48 samples) ]
        # perf evlist -v
        cycles/inherit/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
        #
      
      We can also see it by using a more verbose level of debug messages in the
      tool that sets up the perf_event_attr, 'perf record' in this case:
      
        [root@zoo ~]# perf record -vv --no-inherit -e cycles/inherit/ -e instructions -a usleep 1
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          mmap                             1
          comm                             1
          freq                             1
          task                             1
          sample_id_all                    1
          exclude_guest                    1
          mmap2                            1
          comm_exec                        1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
        sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
        sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8
        sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          config                           0x1
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          freq                             1
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
      
      <SNIP>
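      In the -vv output above, each sys_perf_event_open call uses pid -1 with a
      specific cpu, which means all tasks on that CPU (system-wide mode, -a).
      It also uses group_fd -1, meaning no group leader, and flags 0x8, which is
      PERF_FLAG_FD_CLOEXEC. The following is a small, hypothetical decoder for
      that flags argument. It uses the flag macros from <linux/perf_event.h>.

```c
#include <stdio.h>
#include <string.h>
#include <linux/perf_event.h>

/* Write the names of the perf_event_open() flag bits set in flags into buf. */
static void decode_perf_open_flags(unsigned long flags, char *buf, size_t len)
{
	static const struct {
		unsigned long bit;
		const char *name;
	} names[] = {
		{ PERF_FLAG_FD_NO_GROUP, "FD_NO_GROUP" },
		{ PERF_FLAG_FD_OUTPUT,   "FD_OUTPUT" },
		{ PERF_FLAG_PID_CGROUP,  "PID_CGROUP" },
		{ PERF_FLAG_FD_CLOEXEC,  "FD_CLOEXEC" },
	};

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(flags & names[i].bit))
			continue;
		/* Separate multiple names with '|', never overrunning buf. */
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
}
```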
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1446029705-199659-2-git-send-email-wangnan0@huawei.com
      [ s/u64/bool/ for the perf_evsel_config_term inherit field - jolsa]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf evsel: Move id_offset out of struct perf_evsel union member · af339981
      Jiri Olsa committed
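      The 'perf stat record' patches will use the id_offset member together
      with the priv pointer. Members of a union share storage, so both cannot
      hold values at the same time while they remain in the same union.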
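      The reason for the move can be shown with simplified, hypothetical layouts.
      These are not the real struct perf_evsel. Inside the union, priv and
      id_offset start at the same offset. After the move, each has its own
      storage.

```c
#include <stddef.h>
#include <sys/types.h>

/* Before (simplified): priv and id_offset overlap in one union. */
struct evsel_before {
	int idx;
	union {
		void  *priv;
		off_t  id_offset;
	};
};

/* After (simplified): id_offset lives outside the union next to priv. */
struct evsel_after {
	int    idx;
	off_t  id_offset;
	void  *priv;
};
```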
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Kan Liang <kan.liang@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1445784728-21732-29-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  29. 06 Oct 2015, 1 commit