1. 21 Sep, 2019 1 commit
  2. 20 Sep, 2019 4 commits
    • perf stat: Fix a segmentation fault when using repeat forever · 443f2d5b
      Srikar Dronamraju committed
      A segmentation fault is observed when 'perf stat' is asked to repeat
      forever with the interval option.
      
      Without fix:
      
        # perf stat -r 0 -I 5000 -e cycles -a sleep 10
        #           time             counts unit events
             5.000211692  3,13,89,82,34,157      cycles
            10.000380119  1,53,98,52,22,294      cycles
            10.040467280       17,16,79,265      cycles
        Segmentation fault
      
      This problem is only observed with the forever option (-r 0); limited
      repeat counts work. Calling print_counters() with ts set to NULL is not
      valid when an interval is set. Hence avoid print_counters(NULL, ...)
      if interval is set.
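
A minimal sketch of the failure mode and the guard described above. The helper names `format_interval_prefix` and `may_print_without_timestamp` are hypothetical and are not taken from builtin-stat.c; only the sprintf format comes from the backtrace quoted in this commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: build the interval prefix only when a timestamp
 * is available. The crashing code dereferenced ts unconditionally.
 */
static bool format_interval_prefix(char *buf, size_t len,
				   const struct timespec *ts)
{
	if (ts == NULL)
		return false;	/* nothing to format, and nothing to crash on */

	snprintf(buf, len, "%6lu.%09lu",
		 (unsigned long)ts->tv_sec, (unsigned long)ts->tv_nsec);
	return true;
}

/*
 * Hypothetical predicate for the caller: printing with a NULL timestamp
 * is only acceptable when no interval was requested.
 */
static bool may_print_without_timestamp(unsigned int interval_ms)
{
	return interval_ms == 0;
}
```

The sketch guards at both ends for illustration; the commit message itself only states that the NULL call is avoided when an interval is set.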
      
      With fix:
      
        # perf stat -r 0 -I 5000 -e cycles -a sleep 10
         #           time             counts unit events
             5.019866622  3,15,14,43,08,697      cycles
            10.039865756  3,15,16,31,95,261      cycles
            10.059950628     1,26,05,47,158      cycles
             5.009902655  3,14,52,62,33,932      cycles
            10.019880228  3,14,52,22,89,154      cycles
            10.030543876       66,90,18,333      cycles
             5.009848281  3,14,51,98,25,437      cycles
            10.029854402  3,15,14,93,04,918      cycles
             5.009834177  3,14,51,95,92,316      cycles
      
      Committer notes:
      
      Did the 'git bisect' to find the cset introducing the problem to add the
      Fixes tag below, and at that time the problem reproduced as:
      
        (gdb) run stat -r0 -I500 sleep 1
        <SNIP>
        Program received signal SIGSEGV, Segmentation fault.
        print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
        866		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
        (gdb) bt
        #0  print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866
        #1  0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938
        #2  0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411
        #3  0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370
        #4  0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429
        #5  0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473
        #6  0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588
        (gdb)
      
      Mostly the same as just before this patch:
      
        Program received signal SIGSEGV, Segmentation fault.
        0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
        964		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
        (gdb) bt
        #0  0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964
        #1  0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670)
            at util/stat-display.c:1172
        #2  0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656
        #3  0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960
        #4  0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310
        #5  0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362
        #6  0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406
        #7  0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531
        (gdb)
      
      Fixes: d4f63a47 ("perf stat: Introduce print_counters function")
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # v4.2+
      Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Reset previous counts on repeat with interval · b63fd11c
      Srikar Dronamraju committed
      When using 'perf stat' with repeat and interval option, it shows wrong
      values for events.
      
      The wrong values will be shown for the first interval on the second and
      subsequent repetitions.
      
      Without the fix:
      
        # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5
      
           2.000282489                 53      faults
           2.000282489                513      sched:sched_switch
           4.005478208              3,721      faults
           4.005478208              2,666      sched:sched_switch
           5.025470933                395      faults
           5.025470933              1,307      sched:sched_switch
           2.009602825 1,84,46,74,40,73,70,95,47,520      faults 		<------
           2.009602825 1,84,46,74,40,73,70,95,49,568      sched:sched_switch  <------
           4.019612206              4,730      faults
           4.019612206              2,746      sched:sched_switch
           5.039615484              3,953      faults
           5.039615484              1,496      sched:sched_switch
           2.000274620 1,84,46,74,40,73,70,95,47,520      faults		<------
           2.000274620 1,84,46,74,40,73,70,95,47,520      sched:sched_switch	<------
           4.000480342              4,282      faults
           4.000480342              2,303      sched:sched_switch
           5.000916811              1,322      faults
           5.000916811              1,064      sched:sched_switch
        #
      
      prev_raw_counts is allocated when using intervals. This is used when
      calculating the difference in the counts of events when using interval.
      
      The current counts are stored in prev_raw_counts to calculate the
      differences in the next iteration.
      
      On the first interval of the second and subsequent repetitions,
      prev_raw_counts would be the values stored in the last interval of the
      previous repetitions, while the current counts will only be for the
      first interval of the current repetition.
      
      Hence events may show up as a very large number.
      
      Fix this by resetting prev_raw_counts whenever perf stat repeats the
      command.
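
The effect can be sketched with plain unsigned arithmetic. The struct and function names below are hypothetical stand-ins, not perf's own data structures. Note that the bogus value in the log above, 18446744073709547520, equals 2^64 - 4096, which is consistent with an unsigned subtraction wrapping around.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-event interval state. */
struct interval_counter {
	uint64_t prev_raw;	/* raw count saved at the previous interval */
};

/* Return the count for this interval and remember the raw value. */
static uint64_t interval_delta(struct interval_counter *c, uint64_t raw)
{
	uint64_t delta = raw - c->prev_raw;	/* wraps if raw < prev_raw */

	c->prev_raw = raw;
	return delta;
}

/* What the fix amounts to: forget the saved value before a new repetition. */
static void reset_prev_raw(struct interval_counter *c)
{
	c->prev_raw = 0;
}
```

With a saved value of 5000 from the last interval of one repetition and a raw count of 904 in the first interval of the next, `interval_delta()` yields 18446744073709547520 unless `reset_prev_raw()` is called in between.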
      
      With the fix:
      
        # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5
      
           2.019349347              2,597      faults
           2.019349347              2,753      sched:sched_switch
           4.019577372              3,098      faults
           4.019577372              2,532      sched:sched_switch
           5.019415481              1,879      faults
           5.019415481              1,356      sched:sched_switch
           2.000178813              8,468      faults
           2.000178813              2,254      sched:sched_switch
           4.000404621              7,440      faults
           4.000404621              1,266      sched:sched_switch
           5.040196079              2,458      faults
           5.040196079                556      sched:sched_switch
           2.000191939              6,870      faults
           2.000191939              1,170      sched:sched_switch
           4.000414103                541      faults
           4.000414103                902      sched:sched_switch
           5.000809863                450      faults
           5.000809863                364      sched:sched_switch
        #
      
      Committer notes:
      
      This was broken since the cset introducing the --interval feature, i.e.
      --repeat + --interval wasn't tested at that point, add the Fixes tag so
      that automatic scripts can pick this up.
      
      Fixes: 13370a9b ("perf stat: Add interval printing")
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: stable@vger.kernel.org # v3.9+
      Link: http://lore.kernel.org/lkml/20190904094738.9558-2-srikar@linux.vnet.ibm.com
      [ Fixed up conflicts with libperf, i.e. some perf_{evsel,evlist} lost the 'perf' prefix ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Move event synthesizing routines to separate header · ea49e01c
      Arnaldo Carvalho de Melo committed
      Those are the only routines using the perf_event__handler_t typedef and
      are all related, so move them to a separate header to reduce the header
      dependency tree. Lots of places were getting event.h and even stdio.h
      and limits.h indirectly, so fix those as well.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-yvx9u1mf7baq6cu1abfhbqgs@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Move perf_stat_synthesize_config() to event.h · b251892d
      Arnaldo Carvalho de Melo committed
      Together with the other synthesizers, and rename it to
      perf_event__synthesize_stat_events().
      
      This allows us to stop including event.h in util/stat.h.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-q5ebhrp44txboobs86htu5r9@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 10 Sep, 2019 1 commit
  4. 01 Sep, 2019 1 commit
  5. 30 Aug, 2019 2 commits
  6. 29 Aug, 2019 1 commit
  7. 26 Aug, 2019 1 commit
  8. 23 Aug, 2019 1 commit
  9. 22 Aug, 2019 1 commit
  10. 30 Jul, 2019 17 commits
  11. 23 Jul, 2019 1 commit
    • perf stat: Fix segfault for event group in repeat mode · 08ef3af1
      Jiri Olsa committed
      Numfor Mbiziwo-Tiapo reported a segfault when running stat on an event
      group in repeat mode:
      
        # perf stat -e '{cycles,instructions}' -r 10 ls
      
      It's caused by memory corruption because the evsel's id array and index
      are not cleaned up, although they need to be rebuilt in every stat
      iteration. Currently the ids index keeps growing, while the array (which
      is also not freed) stays the same size.

      Fix this by releasing the id array and zeroing the ids index in the
      perf_evsel__close function.
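
A reduced sketch of the bookkeeping problem described above. `struct id_store` and its functions are hypothetical and much simpler than the real evsel code; the bounds check in `id_store_add()` is an addition for safety in this sketch, where the buggy code would instead write past the array.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for an evsel's id bookkeeping. */
struct id_store {
	uint64_t *id;	/* array of ids, rebuilt on every iteration */
	size_t cap;	/* number of allocated entries */
	size_t ids;	/* next free index */
};

/* Allocate the array for one iteration; the index is left untouched. */
static int id_store_open(struct id_store *s, size_t n)
{
	s->id = calloc(n, sizeof(*s->id));
	if (s->id == NULL)
		return -1;
	s->cap = n;
	return 0;
}

/* Append one id, refusing to write past the end of the array. */
static int id_store_add(struct id_store *s, uint64_t id)
{
	if (s->ids >= s->cap)
		return -1;
	s->id[s->ids++] = id;
	return 0;
}

/* The shape of the fix: release the array and zero the index on close. */
static void id_store_close(struct id_store *s)
{
	free(s->id);
	s->id = NULL;
	s->cap = 0;
	s->ids = 0;
}
```

Without the reset in `id_store_close()`, a second iteration would start appending at the old index into a same-sized array.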
      
      We also need to keep the evsel_list alive for stat record (which is
      disabled in repeat mode).

      Reported-by: Numfor Mbiziwo-Tiapo <nums@google.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Drayton <mbd@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20190715142121.GC6032@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  12. 09 Jul, 2019 3 commits
  13. 26 Jun, 2019 1 commit
    • tools perf: Move from sane_ctype.h obtained from git to the Linux's original · 3052ba56
      Arnaldo Carvalho de Melo committed
      We got the sane_ctype.h header from git and have kept using it so far,
      but since that code originally came from the kernel sources to the git
      sources, perhaps it's better to just use the one in the kernel, so that
      we can leverage tools/perf/check_headers.sh to be notified when our copy
      gets out of sync, i.e. when fixes or goodies are added to the code we've
      copied.
      
      This will help with things like tools/lib/string.c where we want to have
      more things in common with the kernel, such as strim(), skip_spaces(),
      etc so as to go on removing the things that we have in tools/perf/util/
      and instead using the code in the kernel, indirectly and removing things
      like EXPORT_SYMBOL(), etc, getting notified when fixes and improvements
      are made to the original code.
      
      Hopefully this also should help with reducing the difference of code
      hosted in tools/ to the one in the kernel proper.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-7k9868l713wqtgo01xxygn12@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 11 Jun, 2019 1 commit
    • perf stat: Support per-die aggregation · db5742b6
      Kan Liang committed
      It is useful to aggregate counts per die. E.g. Uncore becomes die-scope
      on Xeon Cascade Lake-AP.
      
      Introduce a new option "--per-die" to support per-die aggregation.
      
      The global id for each core has been changed to socket + die id + core
      id. The global id for each die is socket + die id.
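
One way to picture such composite ids is as bit fields packed into a single integer. The sketch below is an assumption for illustration only: the function names and the field widths (8 bits for the die id, 16 bits for the core id) are hypothetical and are not taken from this commit.

```c
#include <stdint.h>

/* Field widths are assumptions for this sketch, not taken from the commit. */
#define DIE_BITS	8
#define CORE_BITS	16

/* Global die id: socket id in the upper bits, die id below it. */
static uint32_t global_die_id(uint32_t socket, uint32_t die)
{
	return (socket << DIE_BITS) | (die & 0xffu);
}

/* Global core id: the global die id in the upper bits, core id below it. */
static uint32_t global_core_id(uint32_t socket, uint32_t die, uint32_t core)
{
	return (global_die_id(socket, die) << CORE_BITS) | (core & 0xffffu);
}
```

With a layout like this, two cores with the same core id on different dies of one socket get distinct global ids, and the die's global id can be recovered by shifting the core's global id right by `CORE_BITS`.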
      
      Add die information for per-core aggregation. The output of per-core
      aggregation will be changed from "S0-C0" to "S0-D0-C0". Any scripts
      which rely on the output format of per-core aggregation will probably
      break.
      
      For 'perf stat record/report', there is no die information when
      processing the old perf.data. The per-die result will be the same as
      per-socket.
      
      Committer notes:
      
      Renamed 'die' variable to 'die_id' to fix the build in some systems:
      
          CC       /tmp/build/perf/builtin-script.o
        cc1: warnings being treated as errors
        builtin-stat.c: In function 'perf_env__get_die':
        builtin-stat.c:963: error: declaration of 'die' shadows a global declaration
        util/util.h:19: error: shadowed declaration is here
        mv: cannot stat `/tmp/build/perf/.builtin-stat.o.tmp': No such file or directory

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/n/tip-bsnhx7vgsuu6ei307mw60mbj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 05 Jun, 2019 1 commit
  16. 17 May, 2019 1 commit
    • perf stat: Support 'percore' event qualifier · 4fc4d8df
      Jin Yao committed
      With this patch, we can use the 'percore' event qualifier in perf-stat.
      
        root@skl:/tmp# perf stat -e cpu/event=0,umask=0x3,percore=1/,cpu/event=0,umask=0x3/ -a -A -I1000
          1.000773050 S0-C0   98,352,832 cpu/event=0,umask=0x3,percore=1/  (50.01%)
          1.000773050 S0-C1  103,763,057 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 S0-C2  196,776,995 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 S0-C3  176,493,779 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 CPU0    47,699,641 cpu/event=0,umask=0x3/            (50.02%)
          1.000773050 CPU1    49,052,451 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU2   102,771,422 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU3   100,784,662 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU4    43,171,342 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU5    54,152,158 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU6    93,618,410 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU7    74,477,589 cpu/event=0,umask=0x3/            (49.99%)
      
      In this example, we count the event 'ref-cycles' per-core and per-CPU in
      one perf stat command-line. From the output, we can see:
      
        S0-C0 = CPU0 + CPU4
        S0-C1 = CPU1 + CPU5
        S0-C2 = CPU2 + CPU6
        S0-C3 = CPU3 + CPU7
      
      So the result is as expected (tiny differences are ignored).

      Note that the 'percore' event qualifier needs to be used with the '-A'
      option.

      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-4-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  17. 16 Apr, 2019 1 commit
  18. 02 Apr, 2019 1 commit
    • perf stat: Implement duration_time as a proper event · f0fbb114
      Andi Kleen committed
      The perf metric expressions use 'duration_time' internally to normalize
      events.  Normal 'perf stat' without -x also prints the duration time.
      But when using -x, the interval is not output anywhere, which is
      inconvenient for any post processing which often wants to normalize
      values to time.
      
      So implement 'duration_time' as a proper perf event that can be
      specified explicitly with -e.
      
      The previous implementation of 'duration_time' only worked for metric
      processing. This adds the concept of a tool event that is handled by the
      tool. On the kernel level it is still mapped to the dummy software
      event, but the values are not read anymore, but instead computed by the
      tool.
      
      Add proper plumbing to handle this in the event parser, and display it
      in 'perf stat'. We don't want 'duration_time' to be added up, so it's
      only printed for the first CPU.
      
      % perf stat -e duration_time,cycles true
      
       Performance counter stats for 'true':
      
                 555,476 ns   duration_time
                 771,958      cycles
      
             0.000555476 seconds time elapsed
      
             0.000644000 seconds user
             0.000000000 seconds sys
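
As an illustration of the post-processing this enables, a count can be normalized to a per-second rate using the reported duration in nanoseconds. The helper `rate_per_second` is hypothetical and not part of perf.

```c
#include <stdint.h>

/*
 * Hypothetical post-processing helper: convert an event count and a
 * duration_time value (in nanoseconds) into events per second.
 */
static double rate_per_second(uint64_t count, uint64_t duration_ns)
{
	if (duration_ns == 0)
		return 0.0;	/* avoid dividing by zero on empty runs */

	return (double)count * 1e9 / (double)duration_ns;
}
```

For the output above, `rate_per_second(771958, 555476)` comes out at roughly 1.39e9 cycles per second.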

      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20190326221823.11518-3-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>