1. 12 April 2018, 1 commit
  2. 17 March 2018, 1 commit
    • perf stat: Fix core dump when flag T is used · fca32340
      Thomas Richter authored
      Executing command 'perf stat -T -- ls' dumps core on x86 and s390.
      
      Here is the call back chain (done on x86):
      
       # gdb ./perf
       ....
       (gdb) r stat -T -- ls
      ...
      Program received signal SIGSEGV, Segmentation fault.
      0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
      (gdb) where
       #0  0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
       #1  0x00007ffff56ae484 in asprintf () from /lib64/libc.so.6
       #2  0x00000000004f1982 in __parse_events_add_pmu (parse_state=0x7fffffffd580,
          list=0xbfb970, name=0xbf3ef0 "cpu",
          head_config=0xbfb930, auto_merge_stats=false) at util/parse-events.c:1233
       #3  0x00000000004f1c8e in parse_events_add_pmu (parse_state=0x7fffffffd580,
          list=0xbfb970, name=0xbf3ef0 "cpu",
          head_config=0xbfb930) at util/parse-events.c:1288
       #4  0x0000000000537ce3 in parse_events_parse (_parse_state=0x7fffffffd580,
          scanner=0xbf4210) at util/parse-events.y:234
       #5  0x00000000004f2c7a in parse_events__scanner (str=0x6b66c0
          "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}",
          parse_state=0x7fffffffd580, start_token=258) at util/parse-events.c:1673
       #6  0x00000000004f2e23 in parse_events (evlist=0xbe9990, str=0x6b66c0
          "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}", err=0x0)
          at util/parse-events.c:1713
       #7  0x000000000044e137 in add_default_attributes () at builtin-stat.c:2281
       #8  0x000000000044f7b5 in cmd_stat (argc=1, argv=0x7fffffffe3b0) at
          builtin-stat.c:2828
       #9  0x00000000004c8b0f in run_builtin (p=0xab01a0 <commands+288>, argc=4,
          argv=0x7fffffffe3b0) at perf.c:297
       #10 0x00000000004c8d7c in handle_internal_command (argc=4,
          argv=0x7fffffffe3b0) at perf.c:349
       #11 0x00000000004c8ece in run_argv (argcp=0x7fffffffe20c,
         argv=0x7fffffffe200) at perf.c:393
       #12 0x00000000004c929c in main (argc=4, argv=0x7fffffffe3b0) at perf.c:537
      (gdb)
      
      It turns out that a NULL pointer is referenced. Here are the
      function calls:
      
        ...
        cmd_stat()
        +---> add_default_attributes()
      	+---> parse_events(evsel_list, transaction_attrs, NULL);
      	             3rd parameter set to NULL
      
      The function parse_events(xx, xx, struct parse_events_error *err) dives
      into a bison-generated scanner and first creates parser state
      information for it:
      
         struct parse_events_state parse_state = {
                      .list   = LIST_HEAD_INIT(parse_state.list),
                      .idx    = evlist->nr_entries,
                      .error  = err,   <--- NULL POINTER !!!
                      .evlist = evlist,
              };
      
      Various functions inside the bison scanner are then called, ending up in
      __parse_events_add_pmu(struct parse_events_state *parse_state, ..) with
      the first parameter being a pointer to the structure defined above.
      
      Now the PMU event name is not found (because perf is being executed in
      a VM) and this function tries to create an error message with
      
         asprintf(&parse_state->error->str, ....)
      
      which dereferences the NULL 'error' pointer and dumps core.
      
      Fix this by providing a pointer to the necessary error information
      instead of NULL. Technically only the else part is needed to avoid the
      core dump, but let's be safe...
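      
      A minimal standalone C sketch of the failure mode and of the fix; the
      struct layouts are simplified from util/parse-events.h and the helper
      name is illustrative, not the actual perf function:
      
        #define _GNU_SOURCE
        #include <stdio.h>
        #include <stdlib.h>
        
        struct parse_events_error {
                char *str;                       /* error message, if any */
        };
        
        struct parse_events_state {
                struct parse_events_error *error;
        };
        
        /* mirrors the asprintf() call in __parse_events_add_pmu() */
        static void report_missing_pmu(struct parse_events_state *ps,
                                       const char *name)
        {
                /* without this guard, error == NULL crashes in vasprintf() */
                if (ps->error &&
                    asprintf(&ps->error->str, "Cannot find PMU `%s'", name) < 0)
                        ps->error->str = NULL;
        }
        
        int main(void)
        {
                struct parse_events_error err = { .str = NULL };
                /* the fix: hand in &err instead of NULL */
                struct parse_events_state ps = { .error = &err };
        
                report_missing_pmu(&ps, "cpu");
                if (err.str)
                        printf("%s\n", err.str);
                free(err.str);
                return 0;
        }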
      Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180308145735.64717-1-tmricht@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 08 March 2018, 1 commit
    • perf pmu: Display pmu name when printing unmerged events in stat · 8c5421c0
      Agustin Vega-Frias authored
      To simplify the creation of events across multiple instances of the
      same type of PMU, stat supports two methods for creating multiple
      events from a single event specification:
      
      1. A prefix or glob can be used in the PMU name.
      2. Aliases, which are listed immediately after the Kernel PMU events
         by perf list, are used.
      
      When the --no-merge option is passed and these events are displayed
      individually, the PMU name is lost and it's not possible to see which
      count corresponds to which PMU:
      
          $ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null
      
           Performance counter stats for 'system wide':
      
                          67      l3cache/read-miss/
                          67      l3cache/read-miss/
                          63      l3cache/read-miss/
                          60      l3cache/read-miss/
      
                 0.001675706 seconds time elapsed
      
          $ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null
      
           Performance counter stats for 'system wide':
      
                          12      l3cache_read_miss
                          17      l3cache_read_miss
                          10      l3cache_read_miss
                           8      l3cache_read_miss
      
                 0.001661305 seconds time elapsed
      
      This change adds the original PMU name to the event (a sketch of both
      display forms follows the examples below). For dynamic PMU events the
      PMU name is restored in the event name:
      
          $ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null
      
           Performance counter stats for 'system wide':
      
                          63      l3cache_0_3/read-miss/
                          74      l3cache_0_1/read-miss/
                          64      l3cache_0_2/read-miss/
                          74      l3cache_0_0/read-miss/
      
                 0.001675706 seconds time elapsed
      
      For alias events the name is added after the event name:
      
          $ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null
      
           Performance counter stats for 'system wide':
      
                          10      l3cache_read_miss [l3cache_0_3]
                          12      l3cache_read_miss [l3cache_0_1]
                          10      l3cache_read_miss [l3cache_0_2]
                          17      l3cache_read_miss [l3cache_0_0]
      
                 0.001661305 seconds time elapsed
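      
      A hedged C sketch of the two display forms; the function and its
      arguments are illustrative, not the actual perf helpers:
      
        #include <stdio.h>
        
        /* dynamic pmu events: restore the name as pmu/event/;
         * alias events: append the pmu name in brackets */
        static void format_unmerged(char *buf, size_t len, const char *event,
                                    const char *pmu_name, int is_alias)
        {
                if (is_alias)
                        snprintf(buf, len, "%s [%s]", event, pmu_name);
                else
                        snprintf(buf, len, "%s/%s/", pmu_name, event);
        }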
      Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timur Tabi <timur@codeaurora.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Change-Id: I8056b9eda74bda33e95065056167ad96e97cb1fb
      Link: http://lkml.kernel.org/r/1520345084-42646-3-git-send-email-agustinv@codeaurora.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 06 March 2018, 1 commit
  5. 27 February 2018, 1 commit
    • perf stat: Ignore error thread when enabling system-wide --per-thread · ab6c79b8
      Jin Yao authored
      If we execute 'perf stat --per-thread' with a non-root account (even
      with kernel.perf_event_paranoid set to -1), it reports an error:
      
        jinyao@skl:~$ perf stat --per-thread
        Error:
        You may not have permission to collect system-wide stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        which controls use of the performance events system by
        unprivileged users (without CAP_SYS_ADMIN).
      
        The current value is 2:
      
          -1: Allow use of (almost) all events by all users
              Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
        >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
              Disallow raw tracepoint access by users without CAP_SYS_ADMIN
        >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
        >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
      
        To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
      
                kernel.perf_event_paranoid = -1
      
      Perhaps the ptrace rule doesn't allow tracing some processes, but in
      any case the global --per-thread mode had better ignore such errors and
      continue working on the other threads.
      
      This patch records the index of the failing thread in perf_evsel__open()
      and removes that thread before retrying.
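      
      A rough sketch of the retry path; the field and helper names are
      approximations of the patch (thread_map__remove() drops one entry from
      the thread map):
      
        try_again:
                if (create_perf_stat_counter(counter) < 0) {
                        /* perf_evsel__open() recorded the failing thread's
                         * index; drop that thread and retry with the rest */
                        if (counter->err_thread >= 0 &&
                            !thread_map__remove(evsel_list->threads,
                                                counter->err_thread)) {
                                counter->err_thread = -1;
                                goto try_again;
                        }
                }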
      
      For example (run as non-root, with kernel.perf_event_paranoid unset):
      
        jinyao@skl:~$ perf stat --per-thread
        ^C
         Performance counter stats for 'system wide':
      
               vmstat-3458    6.171984   cpu-clock:u (msec) #  0.000 CPUs utilized
                 perf-3670    0.515599   cpu-clock:u (msec) #  0.000 CPUs utilized
               vmstat-3458   1,163,643   cycles:u           #  0.189 GHz
                 perf-3670      40,881   cycles:u           #  0.079 GHz
               vmstat-3458   1,410,238   instructions:u     #  1.21  insn per cycle
                 perf-3670       3,536   instructions:u     #  0.09  insn per cycle
               vmstat-3458     288,937   branches:u         # 46.814 M/sec
                 perf-3670         936   branches:u         #  1.815 M/sec
               vmstat-3458      15,195   branch-misses:u    #  5.26% of all branches
                 perf-3670          76   branch-misses:u    #  8.12% of all branches
      
              12.651675247 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1516117388-10120-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 21 February 2018, 1 commit
  7. 16 February 2018, 2 commits
    • perf stat: Add support to print counts after a period of time · f1f8ad52
      yuzhoujian authored
      Introduce a new option to print counts after N milliseconds and update
      'perf stat' documentation accordingly.
      
      Shown below is the output of the new option for perf stat.
      
        $ perf stat --time 2000 -e cycles -a
        Performance counter stats for 'system wide':
      
              157,260,423      cycles
      
              2.003060766 seconds time elapsed
      
      We can print the count deltas after N milliseconds with this newly
      introduced option. It is not supported together with the "-I" option.
      
      In addition, according to Kangliang's patch (19afd104), the monitoring
      overhead for a system-wide core event could be very high if the
      interval-print parameter was below 100ms, and the lower limit is 10ms.
      
      So the same warning will be displayed when the time is set between 10ms
      and 100ms, and the minimal time is limited to 10ms. Users can make a
      decision according to their specific cases.
      
      Committer notes:
      
      This actually stops the workload after the specified time, then prints
      the counts.
      
      So I renamed the option to --timeout and updated the documentation to
      state that it will not just print the counts after the specified time,
      but will really stop the 'perf stat' session and print the counts.
      
      The rename from 'time' to 'timeout' also fixes the build in systems
      where 'time' is used by glibc and can't be used as a name of a variable,
      such as centos:5 and centos:6.
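      
      A minimal sketch of the waiting logic, assuming 'ts' is derived from
      the --timeout value; the real __run_perf_stat() differs in detail:
      
        while (!waitpid(child_pid, &status, WNOHANG)) {
                nanosleep(&ts, NULL);
                if (timeout)
                        break;          /* stop the session, print the counts */
                process_interval();
        }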
      
      Changes since v3:
      - none.
      
      Changes since v2:
      - modify the time check in __run_perf_stat func to keep some consistency
        with the workload case.
      - add the warning when the time is set between 10ms to 100ms.
      - add the pr_err when the time is set below 10ms.
      
      Changes since v1:
      - none.
      Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1517217923-8302-3-git-send-email-ufo19890607@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Add support to print counts for fixed times · db06a269
      yuzhoujian authored
      Introduce a new option to print counts for fixed number of times and
      update 'perf stat' documentation accordingly.
      
      Shown below is the output of the new option for perf stat.
      
        $ perf stat -I 1000 --interval-count 2 -e cycles -a
        #           time             counts unit events
                 1.002827089         93,884,870      cycles
                 2.004231506         56,573,446      cycles
      
      We can print the counts a fixed number of times with this newly
      introduced option. Its usage is a little like 'vmstat', and it must be
      used together with the "-I" option.
      
        $ vmstat -n 1 2
        procs ---------memory-------------- --swap- ----io-- -system-- ------cpu---
         r  b swpd   free   buff   cache    si   so  bi   bo  in   cs us sy id wa st
         0  0    0 78270544 547484 51732076  0   0   0   20    1    1  1  0 99  0 0
         0  0    0 78270512 547484 51732080  0   0   0   16  477 1555  0  0 100 0 0
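      
      The interplay with "-I" is enforced when options are parsed; a hedged
      sketch of that check (variable names approximate the patch):
      
        if (stat_config.times && interval)
                interval_count = true;
        else if (stat_config.times && !interval) {
                pr_err("interval-count option should be used together "
                       "with interval-print.\n");
                return -1;
        }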
      
      Changes since v3:
      - merge interval_count check and times check to one line.
      - fix the wrong indent in stat.h
      - use stat_config.times instead of 'times' in cmd_stat function.
      
      Changes since v2:
      - none.
      
      Changes since v1:
      - change the name of the new option "times-print" to "interval-count".
      - keep the new option interval specifically.
      Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1517217923-8302-2-git-send-email-ufo19890607@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 27 December 2017, 7 commits
    • perf perf: Remove duplicate includes · 3315d14f
      Pravin Shedge authored
      These duplicate includes were found with scripts/checkincludes.pl, but
      they were removed manually to avoid removing false positives.
      Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1512582204-6493-1-git-send-email-pravin.shedge4linux@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Resort '--per-thread' result · 29734550
      Jin Yao authored
      Many threads are reported if we enable '--per-thread' globally:
      
      1. Most of the threads are not counted or have a count of 0. This patch
      removes these threads from the output.
      
      2. We also re-sort the threads in the display according to the count
      value, which makes it easy for users to spot the hottest threads (a
      sketch of the sorting follows this list).
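      
      A hedged sketch of the filtering and sorting step; the struct and
      helper names are illustrative:
      
        struct thread_value { int tid; double val; };
        
        static int cmp_thread_value(const void *a, const void *b)
        {
                const struct thread_value *ta = a, *tb = b;
        
                /* descending: hottest thread first */
                return (tb->val > ta->val) - (tb->val < ta->val);
        }
        
        /* after dropping zero-count entries: */
        qsort(values, nr_values, sizeof(*values), cmp_thread_value);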
      
      For example, the new results would be:
      
      root@skl:/tmp# perf stat --per-thread
      ^C
       Performance counter stats for 'system wide':
      
                  perf-24165              4.302433      cpu-clock (msec)          #    0.001 CPUs utilized
                vmstat-23127              1.562215      cpu-clock (msec)          #    0.000 CPUs utilized
            irqbalance-2780               0.827851      cpu-clock (msec)          #    0.000 CPUs utilized
                  sshd-23111              0.278308      cpu-clock (msec)          #    0.000 CPUs utilized
              thermald-2841               0.230880      cpu-clock (msec)          #    0.000 CPUs utilized
                  sshd-23058              0.207306      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/0:2-19991              0.133983      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/u16:1-18249              0.125636      cpu-clock (msec)          #    0.000 CPUs utilized
             rcu_sched-8                  0.085533      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/u16:2-23146              0.077139      cpu-clock (msec)          #    0.000 CPUs utilized
                 gmain-2700               0.041789      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/4:1-15354              0.028370      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/6:0-17528              0.023895      cpu-clock (msec)          #    0.000 CPUs utilized
          kworker/4:1H-1887               0.013209      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/5:2-31362              0.011627      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/0-11                 0.010892      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/3:2-12870              0.010220      cpu-clock (msec)          #    0.000 CPUs utilized
           ksoftirqd/0-7                  0.008869      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/1-14                 0.008476      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/7-50                 0.002944      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/3-26                 0.002893      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/4-32                 0.002759      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/2-20                 0.002429      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/6-44                 0.001491      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/5-38                 0.001477      cpu-clock (msec)          #    0.000 CPUs utilized
             rcu_sched-8                        10      context-switches          #    0.117 M/sec
         kworker/u16:1-18249                     7      context-switches          #    0.056 M/sec
                  sshd-23111                     4      context-switches          #    0.014 M/sec
                vmstat-23127                     4      context-switches          #    0.003 M/sec
                  perf-24165                     4      context-switches          #    0.930 K/sec
           kworker/0:2-19991                     3      context-switches          #    0.022 M/sec
         kworker/u16:2-23146                     3      context-switches          #    0.039 M/sec
           kworker/4:1-15354                     2      context-switches          #    0.070 M/sec
           kworker/6:0-17528                     2      context-switches          #    0.084 M/sec
                  sshd-23058                     2      context-switches          #    0.010 M/sec
           ksoftirqd/0-7                         1      context-switches          #    0.113 M/sec
            watchdog/0-11                        1      context-switches          #    0.092 M/sec
            watchdog/1-14                        1      context-switches          #    0.118 M/sec
            watchdog/2-20                        1      context-switches          #    0.412 M/sec
            watchdog/3-26                        1      context-switches          #    0.346 M/sec
            watchdog/4-32                        1      context-switches          #    0.362 M/sec
            watchdog/5-38                        1      context-switches          #    0.677 M/sec
            watchdog/6-44                        1      context-switches          #    0.671 M/sec
            watchdog/7-50                        1      context-switches          #    0.340 M/sec
          kworker/4:1H-1887                      1      context-switches          #    0.076 M/sec
              thermald-2841                      1      context-switches          #    0.004 M/sec
                 gmain-2700                      1      context-switches          #    0.024 M/sec
            irqbalance-2780                      1      context-switches          #    0.001 M/sec
           kworker/3:2-12870                     1      context-switches          #    0.098 M/sec
           kworker/5:2-31362                     1      context-switches          #    0.086 M/sec
         kworker/u16:1-18249                     2      cpu-migrations            #    0.016 M/sec
         kworker/u16:2-23146                     2      cpu-migrations            #    0.026 M/sec
             rcu_sched-8                         1      cpu-migrations            #    0.012 M/sec
                  sshd-23058                     1      cpu-migrations            #    0.005 M/sec
                  perf-24165             8,833,385      cycles                    #    2.053 GHz
                vmstat-23127             1,702,699      cycles                    #    1.090 GHz
            irqbalance-2780                739,847      cycles                    #    0.894 GHz
                  sshd-23111               269,506      cycles                    #    0.968 GHz
              thermald-2841                204,556      cycles                    #    0.886 GHz
                  sshd-23058               158,780      cycles                    #    0.766 GHz
           kworker/0:2-19991               112,981      cycles                    #    0.843 GHz
         kworker/u16:1-18249               100,926      cycles                    #    0.803 GHz
             rcu_sched-8                    74,024      cycles                    #    0.865 GHz
         kworker/u16:2-23146                55,984      cycles                    #    0.726 GHz
                 gmain-2700                 34,278      cycles                    #    0.820 GHz
           kworker/4:1-15354                20,665      cycles                    #    0.728 GHz
           kworker/6:0-17528                16,445      cycles                    #    0.688 GHz
           kworker/5:2-31362                 9,492      cycles                    #    0.816 GHz
            watchdog/3-26                    8,695      cycles                    #    3.006 GHz
          kworker/4:1H-1887                  8,238      cycles                    #    0.624 GHz
            watchdog/4-32                    7,580      cycles                    #    2.747 GHz
           kworker/3:2-12870                 7,306      cycles                    #    0.715 GHz
            watchdog/2-20                    7,274      cycles                    #    2.995 GHz
            watchdog/0-11                    6,988      cycles                    #    0.642 GHz
           ksoftirqd/0-7                     6,376      cycles                    #    0.719 GHz
            watchdog/1-14                    5,340      cycles                    #    0.630 GHz
            watchdog/5-38                    4,061      cycles                    #    2.749 GHz
            watchdog/6-44                    3,976      cycles                    #    2.667 GHz
            watchdog/7-50                    3,418      cycles                    #    1.161 GHz
                vmstat-23127             2,511,699      instructions              #    1.48  insn per cycle
                  perf-24165             1,829,908      instructions              #    0.21  insn per cycle
            irqbalance-2780              1,190,204      instructions              #    1.61  insn per cycle
              thermald-2841                143,544      instructions              #    0.70  insn per cycle
                  sshd-23111               128,138      instructions              #    0.48  insn per cycle
                  sshd-23058                57,654      instructions              #    0.36  insn per cycle
             rcu_sched-8                    44,063      instructions              #    0.60  insn per cycle
         kworker/u16:1-18249                42,551      instructions              #    0.42  insn per cycle
           kworker/0:2-19991                25,873      instructions              #    0.23  insn per cycle
         kworker/u16:2-23146                21,407      instructions              #    0.38  insn per cycle
                 gmain-2700                 13,691      instructions              #    0.40  insn per cycle
           kworker/4:1-15354                12,964      instructions              #    0.63  insn per cycle
           kworker/6:0-17528                10,034      instructions              #    0.61  insn per cycle
           kworker/5:2-31362                 5,203      instructions              #    0.55  insn per cycle
           kworker/3:2-12870                 4,866      instructions              #    0.67  insn per cycle
          kworker/4:1H-1887                  3,586      instructions              #    0.44  insn per cycle
           ksoftirqd/0-7                     3,463      instructions              #    0.54  insn per cycle
            watchdog/0-11                    3,135      instructions              #    0.45  insn per cycle
            watchdog/1-14                    3,135      instructions              #    0.59  insn per cycle
            watchdog/2-20                    3,135      instructions              #    0.43  insn per cycle
            watchdog/3-26                    3,135      instructions              #    0.36  insn per cycle
            watchdog/4-32                    3,135      instructions              #    0.41  insn per cycle
            watchdog/5-38                    3,135      instructions              #    0.77  insn per cycle
            watchdog/6-44                    3,135      instructions              #    0.79  insn per cycle
            watchdog/7-50                    3,135      instructions              #    0.92  insn per cycle
                vmstat-23127               539,181      branches                  #  345.139 M/sec
                  perf-24165               375,364      branches                  #   87.245 M/sec
            irqbalance-2780                262,092      branches                  #  316.593 M/sec
              thermald-2841                 31,611      branches                  #  136.915 M/sec
                  sshd-23111                21,874      branches                  #   78.596 M/sec
                  sshd-23058                10,682      branches                  #   51.528 M/sec
             rcu_sched-8                     8,693      branches                  #  101.633 M/sec
         kworker/u16:1-18249                 7,891      branches                  #   62.808 M/sec
           kworker/0:2-19991                 5,761      branches                  #   42.998 M/sec
         kworker/u16:2-23146                 4,099      branches                  #   53.138 M/sec
           kworker/4:1-15354                 2,755      branches                  #   97.110 M/sec
                 gmain-2700                  2,638      branches                  #   63.127 M/sec
           kworker/6:0-17528                 2,216      branches                  #   92.739 M/sec
           kworker/5:2-31362                 1,132      branches                  #   97.360 M/sec
           kworker/3:2-12870                 1,081      branches                  #  105.773 M/sec
          kworker/4:1H-1887                    725      branches                  #   54.887 M/sec
           ksoftirqd/0-7                       707      branches                  #   79.716 M/sec
            watchdog/0-11                      652      branches                  #   59.860 M/sec
            watchdog/1-14                      652      branches                  #   76.923 M/sec
            watchdog/2-20                      652      branches                  #  268.423 M/sec
            watchdog/3-26                      652      branches                  #  225.372 M/sec
            watchdog/4-32                      652      branches                  #  236.318 M/sec
            watchdog/5-38                      652      branches                  #  441.435 M/sec
            watchdog/6-44                      652      branches                  #  437.290 M/sec
            watchdog/7-50                      652      branches                  #  221.467 M/sec
                vmstat-23127                 8,960      branch-misses             #    1.66% of all branches
            irqbalance-2780                  3,047      branch-misses             #    1.16% of all branches
                  perf-24165                 2,876      branch-misses             #    0.77% of all branches
                  sshd-23111                 1,843      branch-misses             #    8.43% of all branches
              thermald-2841                  1,444      branch-misses             #    4.57% of all branches
                  sshd-23058                 1,379      branch-misses             #   12.91% of all branches
         kworker/u16:1-18249                   982      branch-misses             #   12.44% of all branches
             rcu_sched-8                       893      branch-misses             #   10.27% of all branches
         kworker/u16:2-23146                   578      branch-misses             #   14.10% of all branches
           kworker/0:2-19991                   376      branch-misses             #    6.53% of all branches
                 gmain-2700                    280      branch-misses             #   10.61% of all branches
           kworker/6:0-17528                   196      branch-misses             #    8.84% of all branches
           kworker/4:1-15354                   187      branch-misses             #    6.79% of all branches
           kworker/5:2-31362                   123      branch-misses             #   10.87% of all branches
            watchdog/0-11                       95      branch-misses             #   14.57% of all branches
            watchdog/4-32                       89      branch-misses             #   13.65% of all branches
           kworker/3:2-12870                    80      branch-misses             #    7.40% of all branches
            watchdog/3-26                       61      branch-misses             #    9.36% of all branches
          kworker/4:1H-1887                     60      branch-misses             #    8.28% of all branches
            watchdog/2-20                       52      branch-misses             #    7.98% of all branches
           ksoftirqd/0-7                        47      branch-misses             #    6.65% of all branches
            watchdog/1-14                       46      branch-misses             #    7.06% of all branches
            watchdog/7-50                       13      branch-misses             #    1.99% of all branches
            watchdog/5-38                        8      branch-misses             #    1.23% of all branches
            watchdog/6-44                        7      branch-misses             #    1.07% of all branches
      
             3.695150786 seconds time elapsed
      
      root@skl:/tmp# perf stat --per-thread -M IPC,CPI
      ^C
      
       Performance counter stats for 'system wide':
      
                vmstat-23127             2,000,783      inst_retired.any          #      1.5 IPC
              thermald-2841              1,472,670      inst_retired.any          #      1.3 IPC
                  sshd-23111               977,374      inst_retired.any          #      1.2 IPC
                  perf-24163               483,779      inst_retired.any          #      0.2 IPC
                 gmain-2700                341,213      inst_retired.any          #      0.9 IPC
                  sshd-23058               148,891      inst_retired.any          #      0.8 IPC
          rtkit-daemon-3288                 71,210      inst_retired.any          #      0.7 IPC
         kworker/u16:1-18249                39,562      inst_retired.any          #      0.3 IPC
             rcu_sched-8                    14,474      inst_retired.any          #      0.8 IPC
           kworker/0:2-19991                 7,659      inst_retired.any          #      0.2 IPC
           kworker/4:1-15354                 6,714      inst_retired.any          #      0.8 IPC
          rtkit-daemon-3289                  4,839      inst_retired.any          #      0.3 IPC
           kworker/6:0-17528                 3,321      inst_retired.any          #      0.6 IPC
           kworker/5:2-31362                 3,215      inst_retired.any          #      0.5 IPC
           kworker/7:2-23145                 3,173      inst_retired.any          #      0.7 IPC
          kworker/4:1H-1887                  1,719      inst_retired.any          #      0.3 IPC
            watchdog/0-11                    1,479      inst_retired.any          #      0.3 IPC
            watchdog/1-14                    1,479      inst_retired.any          #      0.3 IPC
            watchdog/2-20                    1,479      inst_retired.any          #      0.4 IPC
            watchdog/3-26                    1,479      inst_retired.any          #      0.4 IPC
            watchdog/4-32                    1,479      inst_retired.any          #      0.3 IPC
            watchdog/5-38                    1,479      inst_retired.any          #      0.3 IPC
            watchdog/6-44                    1,479      inst_retired.any          #      0.7 IPC
            watchdog/7-50                    1,479      inst_retired.any          #      0.7 IPC
         kworker/u16:2-23146                 1,408      inst_retired.any          #      0.5 IPC
                  perf-24163             2,249,872      cpu_clk_unhalted.thread
                vmstat-23127             1,352,455      cpu_clk_unhalted.thread
              thermald-2841              1,161,140      cpu_clk_unhalted.thread
                  sshd-23111               807,827      cpu_clk_unhalted.thread
                 gmain-2700                375,535      cpu_clk_unhalted.thread
                  sshd-23058               194,071      cpu_clk_unhalted.thread
         kworker/u16:1-18249               114,306      cpu_clk_unhalted.thread
          rtkit-daemon-3288                103,547      cpu_clk_unhalted.thread
           kworker/0:2-19991                46,550      cpu_clk_unhalted.thread
             rcu_sched-8                    18,855      cpu_clk_unhalted.thread
          rtkit-daemon-3289                 17,549      cpu_clk_unhalted.thread
           kworker/4:1-15354                 8,812      cpu_clk_unhalted.thread
           kworker/5:2-31362                 6,812      cpu_clk_unhalted.thread
          kworker/4:1H-1887                  5,270      cpu_clk_unhalted.thread
           kworker/6:0-17528                 5,111      cpu_clk_unhalted.thread
           kworker/7:2-23145                 4,667      cpu_clk_unhalted.thread
            watchdog/0-11                    4,663      cpu_clk_unhalted.thread
            watchdog/1-14                    4,663      cpu_clk_unhalted.thread
            watchdog/4-32                    4,626      cpu_clk_unhalted.thread
            watchdog/5-38                    4,403      cpu_clk_unhalted.thread
            watchdog/3-26                    3,936      cpu_clk_unhalted.thread
            watchdog/2-20                    3,850      cpu_clk_unhalted.thread
         kworker/u16:2-23146                 2,654      cpu_clk_unhalted.thread
            watchdog/6-44                    2,017      cpu_clk_unhalted.thread
            watchdog/7-50                    2,017      cpu_clk_unhalted.thread
                vmstat-23127             2,000,783      inst_retired.any          #      0.7 CPI
              thermald-2841              1,472,670      inst_retired.any          #      0.8 CPI
                  sshd-23111               977,374      inst_retired.any          #      0.8 CPI
                  perf-24163               495,037      inst_retired.any          #      4.7 CPI
                 gmain-2700                341,213      inst_retired.any          #      1.1 CPI
                  sshd-23058               148,891      inst_retired.any          #      1.3 CPI
          rtkit-daemon-3288                 71,210      inst_retired.any          #      1.5 CPI
         kworker/u16:1-18249                39,562      inst_retired.any          #      2.9 CPI
             rcu_sched-8                    14,474      inst_retired.any          #      1.3 CPI
           kworker/0:2-19991                 7,659      inst_retired.any          #      6.1 CPI
           kworker/4:1-15354                 6,714      inst_retired.any          #      1.3 CPI
          rtkit-daemon-3289                  4,839      inst_retired.any          #      3.6 CPI
           kworker/6:0-17528                 3,321      inst_retired.any          #      1.5 CPI
           kworker/5:2-31362                 3,215      inst_retired.any          #      2.1 CPI
           kworker/7:2-23145                 3,173      inst_retired.any          #      1.5 CPI
          kworker/4:1H-1887                  1,719      inst_retired.any          #      3.1 CPI
            watchdog/0-11                    1,479      inst_retired.any          #      3.2 CPI
            watchdog/1-14                    1,479      inst_retired.any          #      3.2 CPI
            watchdog/2-20                    1,479      inst_retired.any          #      2.6 CPI
            watchdog/3-26                    1,479      inst_retired.any          #      2.7 CPI
            watchdog/4-32                    1,479      inst_retired.any          #      3.1 CPI
            watchdog/5-38                    1,479      inst_retired.any          #      3.0 CPI
            watchdog/6-44                    1,479      inst_retired.any          #      1.4 CPI
            watchdog/7-50                    1,479      inst_retired.any          #      1.4 CPI
         kworker/u16:2-23146                 1,408      inst_retired.any          #      1.9 CPI
                  perf-24163             2,302,323      cycles
                vmstat-23127             1,352,455      cycles
              thermald-2841              1,161,140      cycles
                  sshd-23111               807,827      cycles
                 gmain-2700                375,535      cycles
                  sshd-23058               194,071      cycles
         kworker/u16:1-18249               114,306      cycles
          rtkit-daemon-3288                103,547      cycles
           kworker/0:2-19991                46,550      cycles
             rcu_sched-8                    18,855      cycles
          rtkit-daemon-3289                 17,549      cycles
           kworker/4:1-15354                 8,812      cycles
           kworker/5:2-31362                 6,812      cycles
          kworker/4:1H-1887                  5,270      cycles
           kworker/6:0-17528                 5,111      cycles
           kworker/7:2-23145                 4,667      cycles
            watchdog/0-11                    4,663      cycles
            watchdog/1-14                    4,663      cycles
            watchdog/4-32                    4,626      cycles
            watchdog/5-38                    4,403      cycles
            watchdog/3-26                    3,936      cycles
            watchdog/2-20                    3,850      cycles
         kworker/u16:2-23146                 2,654      cycles
            watchdog/6-44                    2,017      cycles
            watchdog/7-50                    2,017      cycles
      
             2.175726600 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-12-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Remove --per-thread pid/tid limitation · 1d9f8d1b
      Jin Yao authored
      Currently, if we execute 'perf stat --per-thread' without specifying a
      pid/tid, perf will return an error.
      
      root@skl:/tmp# perf stat --per-thread
      The --per-thread option is only available when monitoring via -p -t options.
          -p, --pid <pid>       stat events on existing process id
          -t, --tid <tid>       stat events on existing thread id
      
      This patch removes this limitation. If no pid/tid is specified, it
      returns all threads (obtained from /proc).
      
      Note that it doesn't support cpu_list yet, so if a cpu_list is given,
      that case is skipped.
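      
      A hedged sketch of the fallback in perf_evlist__create_maps(); the
      all_threads flag and the extra thread_map__new_str() parameter are
      approximations of the patch:
      
        bool all_threads = target->per_thread && target->system_wide;
        
        /* with no pid/tid, all_threads makes the thread map scan /proc */
        threads = thread_map__new_str(target->pid, target->tid,
                                      target->uid, all_threads);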
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-11-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Update or print per-thread stats · 14e72a21
      Jin Yao authored
      If the stats pointer in the stat_config structure is not NULL, the
      per-thread stats are updated in, and printed from, this buffer.
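      
      A hedged sketch of the buffer selection; field names approximate the
      series:
      
        struct runtime_stat *st = &rt_stat;      /* global fallback */
        
        if (config->stats && thread < config->stats_num)
                st = &config->stats[thread];     /* per-thread buffer */
        
        perf_stat__update_shadow_stats(counter, count, cpu, st);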
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-9-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Allocate shadow stats buffer for threads · 56739444
      Jin Yao authored
      After perf_evlist__create_maps() has executed, we can get all threads
      from /proc, and via thread_map__nr() we can also get the number of
      threads.
      
      With the number of threads, the patch allocates a buffer that records
      the shadow stats for these threads.
      
      The buffer pointer is saved in stat_config.
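      
      A hedged sketch of the allocation, close to but not verbatim from the
      patch:
      
        static int runtime_stat_new(struct perf_stat_config *config,
                                    int nthreads)
        {
                int i;
        
                config->stats = calloc(nthreads, sizeof(struct runtime_stat));
                if (!config->stats)
                        return -1;
        
                config->stats_num = nthreads;
                for (i = 0; i < nthreads; i++)
                        runtime_stat__init(&config->stats[i]);
                return 0;
        }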
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-8-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Print per-thread shadow stats · e0128b30
      Jin Yao authored
      The function perf_stat__print_shadow_stats() is called to print the
      shadow stats from a set of static variables.
      
      But those static variables are a limitation for supporting per-thread
      shadow stats.
      
      This patch lets perf_stat__print_shadow_stats() print the shadow stats
      from an input parameter 'st'.
      
      It no longer reads values directly from the static variables. Instead,
      it now uses runtime_stat_avg() and runtime_stat_n() to fetch and
      compute the values.
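      
      A hedged sketch of the read side; helper names follow the series but
      details vary:
      
        static double runtime_stat_avg(struct runtime_stat *st,
                                       enum stat_type type, int ctx, int cpu)
        {
                struct saved_value *v;
        
                /* look the value up in 'st' instead of a static array */
                v = saved_value_lookup(NULL, cpu, false, type, ctx, st);
                if (!v)
                        return 0.0;
                return avg_stats(&v->stats);
        }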
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-6-git-send-email-yao.jin@linux.intel.com
      [ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Update per-thread shadow stats · 1fcd0394
      Jin Yao authored
      The function perf_stat__update_shadow_stats() is called to update the
      shadow stats on a set of static variables.
      
      But those static variables are a limitation when extending this to
      per-thread shadow stats.
      
      This patch lets perf_stat__update_shadow_stats() update the shadow
      stats on an input parameter 'st', using update_runtime_stat() to do
      the update. It no longer updates the static variables directly.
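      
      A hedged sketch of the update side, mirroring the read-side helper in
      the previous patch (names follow the series, details vary):
      
        static void update_runtime_stat(struct runtime_stat *st,
                                        enum stat_type type,
                                        int ctx, int cpu, u64 count)
        {
                struct saved_value *v;
        
                /* create the slot in 'st' on first use, then update it */
                v = saved_value_lookup(NULL, cpu, true, type, ctx, st);
                if (v)
                        update_stats(&v->stats, count);
        }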
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-5-git-send-email-yao.jin@linux.intel.com
      [ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 30 November 2017, 1 commit
  10. 31 October 2017, 3 commits
  11. 27 October 2017, 1 commit
  12. 13 September 2017, 5 commits
    • perf stat: Fall weak group back even for EBADF · 35c1980e
      Andi Kleen authored
      It's not possible to run a package event and a per-cpu event in the
      same group. This combination is used by some of the power metrics. They
      work correctly when not using a group.
      
      Normally weak groups should handle that, but in this case EBADF is
      returned instead of the normal EINVAL.
      
        $ strace -e perf_event_open ./perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
        Using CPUID GenuineIntel-6-3E
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EINVAL (Invalid argument)
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = 3
        perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, 3, 0) = 4
        perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 1, 0, 0) = -1 EBADF (Bad file descriptor)
      
      and perf errors out.
      
      Make weak groups trigger a fallback for EBADF too. Then this case works correctly:
      
        $ perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
        Using CPUID GenuineIntel-6-3E
        Weak group for cstate_pkg/c2-residency//2 failed
        cstate_pkg/c2-residency/: 476709882 1000598460 1000598460
        msr/tsc/: 39625837911 12007369110 12007369110
      
         Performance counter stats for 'system wide':
      
               476,709,882      cstate_pkg/c2-residency/
            39,625,837,911      msr/tsc/
      
               1.000697588 seconds time elapsed
      
        This fixes perf stat -M Power ...
      
        $ perf stat -M Power --metric-only -a sleep 1
      
         Performance counter stats for 'system wide':
      
        Turbo_Utilization  C3_Core_Residency  C6_Core_Residency C7_Core_Residency  C2_Pkg_Residency   C3_Pkg_Residency  C6_Pkg_Residency  C7_Pkg_Residency
             1.0                 0.7                30.0               0.0               0.9                 0.1               0.4                 0.0
      
               1.001240740 seconds time elapsed
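      
      A hedged sketch of the widened fallback test in builtin-stat.c; the
      condition approximates the patch:
      
        if ((errno == EINVAL || errno == EBADF) &&
            counter->leader != counter &&
            counter->weak_group) {
                counter = perf_evsel__reset_weak_group(counter);
                goto try_again;
        }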
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170905211324.32427-1-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Update walltime_nsecs_stats in interval mode · b90f1333
      Andi Kleen authored
      Some metrics (like GFLOPs) need walltime_nsecs_stats for each interval.
      Compute it for each interval instead of only at the end.
      
      Pointed out by Jiri.
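      
      A minimal sketch, assuming process_interval() has already computed the
      elapsed time 'rs' relative to the reference timestamp:
      
        update_stats(&walltime_nsecs_stats,
                     rs.tv_sec * NSEC_PER_SEC + rs.tv_nsec);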
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170831194036.30146-12-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Hide internal duration_time counter · e864c5ca
      Andi Kleen authored
      Some perf stat metrics use an internal "duration_time" metric. It is
      not printed correctly, however, so hide it during output to avoid
      confusing users with 0 counts.
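      
      A hedged sketch of the skip at print time; the helper name follows the
      patch, the surrounding loop is simplified:
      
        static bool is_duration_time(struct perf_evsel *evsel)
        {
                return !strcmp(evsel->name, "duration_time");
        }
        
        evlist__for_each_entry(evsel_list, counter) {
                if (is_duration_time(counter))
                        continue;       /* don't show the 0-count helper */
                print_counter(counter, prefix);
        }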
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170831194036.30146-11-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Support JSON metrics in perf stat · b18f3e36
      Andi Kleen authored
      Add generic support for standalone metrics specified in JSON files to
      perf stat. A metric is a formula that uses multiple events to compute a
      higher level result (e.g. IPC).
      
      Previously metrics were always tied to an event and automatically
      enabled with that event. Now we change that so that we can have
      standalone metrics. They are in the same JSON data structure as events,
      but don't have an event name.
      
      We also allow organizing the metrics into metric groups, which provide
      a shortcut to select several related metrics at once.
      
      Add a new -M / --metrics option to perf stat that adds the metrics or
      metric groups specified.
      
      Add the core code to manage and parse the metric groups. They are
      collected from the JSON data structures into a separate rblist. When
      computing shadow values, we look for metrics in that list. They are
      then computed using the existing saved-values infrastructure in
      stat-shadow.c
      
      The actual JSON metrics are in a separate pull request.
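      
      A hedged sketch of the per-metric bookkeeping, simplified from the
      series' metricgroup/stat-shadow structures:
      
        struct metric_expr {
                const char *metric_expr;           /* formula from JSON */
                const char *metric_name;           /* e.g. "IPC" */
                struct perf_evsel **metric_events; /* events the formula uses */
                struct list_head nd;
        };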
      
        % perf stat -M Summary --metric-only -a sleep 1
      
         Performance counter stats for 'system wide':
      
        Instructions   CLKS          CPU_Utilization  GFLOPs   SMT_2T_Utilization   Kernel_Utilization
        317614222.0    1392930775.0  0.0              0.0      0.2                  0.1
      
             1.001497549 seconds time elapsed
      
        % perf stat -M GFLOPs flops
      
         Performance counter stats for 'flops':
      
           3,999,541,471  fp_comp_ops_exe.sse_scalar_single #  1.2 GFLOPs   (66.65%)
                      14  fp_comp_ops_exe.sse_scalar_double                 (66.65%)
                       0  fp_comp_ops_exe.sse_packed_double                 (66.67%)
                       0  fp_comp_ops_exe.sse_packed_single                 (66.70%)
                       0  simd_fp_256.packed_double                         (66.70%)
                       0  simd_fp_256.packed_single                         (66.67%)
                       0  duration_time
      
             3.238372845 seconds time elapsed
      
      v2: Add missing header file
      v3: Move find_map to pmu.c
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Support weak groups in 'perf stat' · 5a5dfe4b
      Andi Kleen authored
      Setting up groups can be complicated due to the complicated scheduling
      restrictions of different PMUs.
      
      User tools usually don't understand all these restrictions.
      
      Still, in many cases it is useful to set up groups, and they work most
      of the time. However, if the group is set up wrong, some members will
      not report any value because they never get scheduled.
      
      Add the concept of a 'weak group': try to set up a group, but if it's
      not schedulable fall back to not using a group. That gives us the best
      of both worlds: groups if they work, but still a usable fallback if
      they don't.
      
      In theory it would be possible to have more complex fallback strategies
      (e.g. try to split the group in half), but the simple fallback of not
      using a group seems to work for now.
      
      So far the weak group is only implemented for perf stat, not for record.
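      
      A hedged sketch of the fallback itself: close whatever was opened, then
      make every member a group of one (simplified from the patch):
      
        static struct perf_evsel *
        perf_evsel__reset_weak_group(struct perf_evsel *evsel)
        {
                struct perf_evsel *c2, *leader = evsel->leader;
                bool is_open = true;
        
                for_each_group_member(c2, leader) {
                        if (c2 == evsel)
                                is_open = false;   /* this one failed to open */
                        if (c2 != leader) {
                                if (is_open)
                                        perf_evsel__close(c2);
                                c2->leader = c2;   /* schedule individually */
                                c2->nr_members = 0;
                        }
                }
                return leader;                     /* caller retries from here */
        }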
      
      Here's an unschedulable group (on IvyBridge with SMT on)
      
        % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
      
              73,806,067      branches
               4,848,144      branch-misses             #    6.57% of all branches
              14,754,458      l1d.replacement
              24,905,558      l2_lines_in.all
         <not supported>      l2_rqsts.all_code_rd         <------- will never report anything
      
      With the weak group:
      
        % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1
      
             125,366,055      branches                                                      (80.02%)
               9,208,402      branch-misses             #    7.35% of all branches          (80.01%)
              24,560,249      l1d.replacement                                               (80.00%)
              43,174,971      l2_lines_in.all                                               (80.05%)
              31,891,457      l2_rqsts.all_code_rd                                          (79.92%)
      
      The extra event is scheduled with some extra multiplexing.
      
      v2: Move fallback code to separate function.
      Add comment on for_each_group_member
      Adjust to new perf_evsel__close interface
      v3: Fix debug print out.
      
      Committer testing:
      
      Before:
      
        # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
      
         Performance counter stats for 'system wide':
      
           <not counted>      branches
           <not counted>      branch-misses
           <not counted>      l1d.replacement
           <not counted>      l2_lines_in.all
         <not supported>      l2_rqsts.all_code_rd
      
             1.002147212 seconds time elapsed
      
        # perf stat -e '{branches,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
      
         Performance counter stats for 'system wide':
      
              83,207,892      branches
              11,065,444      l1d.replacement
              28,484,024      l2_lines_in.all
              12,186,179      l2_rqsts.all_code_rd
      
             1.001739493 seconds time elapsed
      
      After:
      
        # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}':W -a sleep 1
      
         Performance counter stats for 'system wide':
      
             543,323,909      branches                                                      (80.01%)
              27,100,512      branch-misses             #    4.99% of all branches          (80.02%)
              50,402,905      l1d.replacement                                               (80.03%)
              67,385,892      l2_lines_in.all                                               (80.01%)
              21,352,885      l2_rqsts.all_code_rd                                          (79.94%)
      
             1.001086658 seconds time elapsed
      
        #
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/20170831194036.30146-2-andi@firstfloor.org
      [ Add a "'perf stat' only, for now" comment in the man page, suggested by Jiri ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5a5dfe4b
  13. 12 Sep, 2017 1 commit
    • perf stat: Wait for the correct child · dfc9eec7
      Committed by Milian Wolff
      When packaging the perf userland application into an AppImage, the
      wait() call in perf stat returned too early. It turned out that some
      other child process exited, but not the one perf stat launched:
      
        $ sudo strace -e fork,execve,clone,wait4 -f ./perf-x86_64.AppImage stat sleep 1
        execve("./perf-git.3a73b7f9-x86_64.AppImage", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x7ffec1bbf050 /* 18 vars */) = 0
        clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3912
        strace: Process 3912 attached
        [pid  3912] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3914
        strace: Process 3914 attached
        [pid  3912] +++ exited with 0 +++
        [pid  3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3912, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
        [pid  3914] clone(strace: Process 3915 attached
        child_stack=0x7f6a6d9fefb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6a6d9ff9d0, tls=0x7f6a6d9ff700, child_tidptr=0x7f6a6d9ff9d0) = 3915
        [pid  3911] execve("/tmp/.mount_perf-g6VYMpl/AppRun", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x14aab70 /* 21 vars */) = 0
        [pid  3911] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4ae113c4d0) = 3916
        strace: Process 3916 attached
        [pid  3911] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3912
        [pid  3916] execve("/usr/libexec/perf-core/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/tmp/./sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/home/milian/.bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/usr/lib/icecream/libexec/icecc/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/ssd2/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/home/milian/.bin/kf5/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/ssd2/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/home/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/home/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/usr/local/sbin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/usr/local/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/usr/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */
         Performance counter stats for 'sleep 1':
      
             <not counted>      task-clock
             <not counted>      context-switches
             <not counted>      cpu-migrations
             <not counted>      page-faults
             <not counted>      cycles
             <not counted>      instructions
             <not counted>      branches
             <not counted>      branch-misses
      
               0.000047194 seconds time elapsed
      
        [pid  3916] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=3911, si_uid=0} ---
        [pid  3916] +++ killed by SIGTERM +++
        [pid  3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=3916, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
        [pid  3915] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3914, si_uid=0} ---
        [pid  3911] +++ exited with 0 +++
        [pid  3915] --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=3914, si_uid=0} ---
        [pid  3915] +++ exited with 0 +++
        +++ exited with 0 +++
      
      This patch uses waitpid() instead, to ensure the call waits for the
      workload launched by 'perf stat'. This fixes 'perf stat' when launched
      from an AppImage:
      
        $ ./perf-x86_64.AppImage stat sleep 1
      
         Performance counter stats for 'sleep 1':
      
                0.357235      task-clock (msec)         #    0.000 CPUs utilized
                       1      context-switches          #    0.003 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                      50      page-faults               #    0.140 M/sec
                 1269602      cycles                    #    3.554 GHz
                  654278      instructions              #    0.52  insn per cycle
                  129963      branches                  #  363.803 M/sec
                    7082      branch-misses             #    5.45% of all branches
      
             1.000633420 seconds time elapsed
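
      The pattern behind the fix is plain POSIX; a minimal sketch, assuming
      child_pid is the PID 'perf stat' forked for the workload:

        #include <sys/types.h>
        #include <sys/wait.h>

        /* wait() returns when *any* child exits, which is wrong once an
         * AppImage runtime has forked helper processes; waitpid() waits
         * for the one child that runs the measured workload. */
        static int wait_for_workload(pid_t child_pid)
        {
                int status;

                if (waitpid(child_pid, &status, 0) < 0)
                        return -1;
                return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        }
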
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170912152523.4497-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      dfc9eec7
  14. 02 Sep, 2017 1 commit
    • perf stat: Only auto-merge events that are PMU aliases · 63ce8449
      Committed by Arnaldo Carvalho de Melo
      Peter reported that when he explicitly asked for multiple events with
      the same name on the command line, they got coalesced into just one
      line, i.e.:
      
         # perf stat -e cycles -e cycles -e cycles usleep 1
      
         Performance counter stats for 'usleep 1':
      
               3,269,652      cycles
      
             0.000884123 seconds time elapsed
      
        #
      
      And while there is the --no-merge option to disable that auto-merging,
      silently merging such an explicit request is a blunt change in
      behaviour. So change the code so that the auto-merging is done only
      when handling the multi-PMU aliases with the same name that introduced
      this coalescing, restoring the previous behaviour for the explicit
      case:
      
        # perf stat -e cycles -e cycles -e cycles usleep 1
      
         Performance counter stats for 'usleep 1':
      
               1,472,837      cycles
               1,472,837      cycles
               1,472,837      cycles
      
             0.001764870 seconds time elapsed
      
        #
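
      Conceptually, the collapsing is now gated on a per-event flag that the
      parser sets only when expanding a PMU alias; a hedged sketch with
      hypothetical type and function names:

        #include <string.h>

        struct counter {                     /* hypothetical, sketch only */
                const char *name;
                int auto_merge_stats;        /* set by alias expansion only */
        };

        /* Merge two identically named counters into one output line only
         * when both came from PMU alias expansion, never when the user
         * asked for the duplicates explicitly. */
        static int should_merge(const struct counter *a,
                                const struct counter *b)
        {
                return a->auto_merge_stats && b->auto_merge_stats &&
                       strcmp(a->name, b->name) == 0;
        }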
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 430daf2d ("perf stat: Collapse identically named events")
      Link: http://lkml.kernel.org/r/20170831184122.GK4831@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      63ce8449
  15. 27 Jul, 2017 1 commit
    • perf stat: Use group read for event groups · 82bf311e
      Committed by Jiri Olsa
      Make perf stat use a group read if there are groups defined. The group
      read gets the values for all members of a group within a single read
      syscall, instead of issuing one read syscall per event.
      
      Considerably fewer kernel cycles are spent on a single group read than
      on reading each event separately, as seen for the following perf stat
      command:
      
        # perf stat -e {cycles,instructions} -I 10 -a sleep 1
      
      The session was monitored with "perf stat -r 5 -e '{cycles:u,cycles:k}'":
      
      Before:
      
              24,325,676      cycles:u
             297,040,775      cycles:k
      
             1.038554134 seconds time elapsed
      
      After:
      
              25,034,418      cycles:u
             158,256,395      cycles:k
      
             1.036864497 seconds time elapsed
      
      The perf_evsel__open() fallback changes were contributed by Andi Kleen.
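
      The single-syscall read relies on the read_format field of
      perf_event_open(2); a minimal sketch of reading a whole group from the
      leader fd, with the buffer size and output chosen for illustration:

        #include <stdint.h>
        #include <stdio.h>
        #include <unistd.h>

        /* With read_format = PERF_FORMAT_GROUP (and no other format bits),
         * one read() on the group leader returns the number of events
         * followed by one u64 value per member. */
        struct group_read {
                uint64_t nr;
                struct { uint64_t value; } cnt[];
        };

        static int read_group(int leader_fd)
        {
                char buf[4096];              /* enough for small groups */
                struct group_read *gr = (struct group_read *)buf;

                if (read(leader_fd, buf, sizeof(buf)) < 0)
                        return -1;
                for (uint64_t i = 0; i < gr->nr; i++)
                        printf("event %llu = %llu\n",
                               (unsigned long long)i,
                               (unsigned long long)gr->cnt[i].value);
                return 0;
        }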
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170726120206.9099-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      82bf311e
  16. 27 Jun, 2017 1 commit
  17. 21 Jun, 2017 1 commit
    • perf stat: Add support to measure SMI cost · daefd0bc
      Committed by Kan Liang
      Implement a new --smi-cost mode in perf stat to measure SMI cost.
      
      During the measurement, /sys/devices/cpu/freeze_on_smi will be set.
      
      The measurement can be done with one counter (unhalted core cycles) and
      two free-running MSR counters (IA32_APERF and SMI_COUNT).
      
      In practice, the percentage of SMI core cycles is more useful than the
      absolute value, so the output will be the percentage of SMI core cycles
      and SMI#. metric_only will be set by default.
      
      SMI cycles% = (aperf - unhalted core cycles) / aperf
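
      For illustration, with hypothetical readings aperf = 1,000,000 and
      unhalted core cycles = 999,000, SMI cycles% = (1,000,000 - 999,000) /
      1,000,000 = 0.1%, which matches the sample below.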
      
      Here is an example output.
      
       Performance counter stats for 'sudo echo ':
      
      SMI cycles%          SMI#
          0.1%              1
      
             0.010858678 seconds time elapsed
      
      Users who want the actual values can additionally pass
      --no-metric-only.
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Elliott <elliott@hpe.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1495825538-5230-3-git-send-email-kan.liang@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      daefd0bc
  18. 02 Jun, 2017 1 commit
  19. 25 Apr, 2017 2 commits
  20. 21 Apr, 2017 1 commit
  21. 20 Apr, 2017 4 commits
  22. 13 Apr, 2017 1 commit
  23. 11 Apr, 2017 1 commit