1. 27 12月, 2017 7 次提交
    • P
      perf perf: Remove duplicate includes · 3315d14f
      Pravin Shedge 提交于
      These duplicate includes have been found with scripts/checkincludes.pl
      but they have been removed manually to avoid removing false positives.
      Signed-off-by: NPravin Shedge <pravin.shedge4linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1512582204-6493-1-git-send-email-pravin.shedge4linux@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3315d14f
    • J
      perf stat: Resort '--per-thread' result · 29734550
      Jin Yao 提交于
      There are many threads reported if we enable '--per-thread'
      globally.
      
      1. Most of the threads are not counted or counting value 0.
      This patch removes these threads.
      
      2. We also resort the threads in display according to the
      counting value. It's useful for user to see the hottest
      threads easily.
      
      For example, the new results would be:
      
      root@skl:/tmp# perf stat --per-thread
      ^C
       Performance counter stats for 'system wide':
      
                  perf-24165              4.302433      cpu-clock (msec)          #    0.001 CPUs utilized
                vmstat-23127              1.562215      cpu-clock (msec)          #    0.000 CPUs utilized
            irqbalance-2780               0.827851      cpu-clock (msec)          #    0.000 CPUs utilized
                  sshd-23111              0.278308      cpu-clock (msec)          #    0.000 CPUs utilized
              thermald-2841               0.230880      cpu-clock (msec)          #    0.000 CPUs utilized
                  sshd-23058              0.207306      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/0:2-19991              0.133983      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/u16:1-18249              0.125636      cpu-clock (msec)          #    0.000 CPUs utilized
             rcu_sched-8                  0.085533      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/u16:2-23146              0.077139      cpu-clock (msec)          #    0.000 CPUs utilized
                 gmain-2700               0.041789      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/4:1-15354              0.028370      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/6:0-17528              0.023895      cpu-clock (msec)          #    0.000 CPUs utilized
          kworker/4:1H-1887               0.013209      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/5:2-31362              0.011627      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/0-11                 0.010892      cpu-clock (msec)          #    0.000 CPUs utilized
           kworker/3:2-12870              0.010220      cpu-clock (msec)          #    0.000 CPUs utilized
           ksoftirqd/0-7                  0.008869      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/1-14                 0.008476      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/7-50                 0.002944      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/3-26                 0.002893      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/4-32                 0.002759      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/2-20                 0.002429      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/6-44                 0.001491      cpu-clock (msec)          #    0.000 CPUs utilized
            watchdog/5-38                 0.001477      cpu-clock (msec)          #    0.000 CPUs utilized
             rcu_sched-8                        10      context-switches          #    0.117 M/sec
         kworker/u16:1-18249                     7      context-switches          #    0.056 M/sec
                  sshd-23111                     4      context-switches          #    0.014 M/sec
                vmstat-23127                     4      context-switches          #    0.003 M/sec
                  perf-24165                     4      context-switches          #    0.930 K/sec
           kworker/0:2-19991                     3      context-switches          #    0.022 M/sec
         kworker/u16:2-23146                     3      context-switches          #    0.039 M/sec
           kworker/4:1-15354                     2      context-switches          #    0.070 M/sec
           kworker/6:0-17528                     2      context-switches          #    0.084 M/sec
                  sshd-23058                     2      context-switches          #    0.010 M/sec
           ksoftirqd/0-7                         1      context-switches          #    0.113 M/sec
            watchdog/0-11                        1      context-switches          #    0.092 M/sec
            watchdog/1-14                        1      context-switches          #    0.118 M/sec
            watchdog/2-20                        1      context-switches          #    0.412 M/sec
            watchdog/3-26                        1      context-switches          #    0.346 M/sec
            watchdog/4-32                        1      context-switches          #    0.362 M/sec
            watchdog/5-38                        1      context-switches          #    0.677 M/sec
            watchdog/6-44                        1      context-switches          #    0.671 M/sec
            watchdog/7-50                        1      context-switches          #    0.340 M/sec
          kworker/4:1H-1887                      1      context-switches          #    0.076 M/sec
              thermald-2841                      1      context-switches          #    0.004 M/sec
                 gmain-2700                      1      context-switches          #    0.024 M/sec
            irqbalance-2780                      1      context-switches          #    0.001 M/sec
           kworker/3:2-12870                     1      context-switches          #    0.098 M/sec
           kworker/5:2-31362                     1      context-switches          #    0.086 M/sec
         kworker/u16:1-18249                     2      cpu-migrations            #    0.016 M/sec
         kworker/u16:2-23146                     2      cpu-migrations            #    0.026 M/sec
             rcu_sched-8                         1      cpu-migrations            #    0.012 M/sec
                  sshd-23058                     1      cpu-migrations            #    0.005 M/sec
                  perf-24165             8,833,385      cycles                    #    2.053 GHz
                vmstat-23127             1,702,699      cycles                    #    1.090 GHz
            irqbalance-2780                739,847      cycles                    #    0.894 GHz
                  sshd-23111               269,506      cycles                    #    0.968 GHz
              thermald-2841                204,556      cycles                    #    0.886 GHz
                  sshd-23058               158,780      cycles                    #    0.766 GHz
           kworker/0:2-19991               112,981      cycles                    #    0.843 GHz
         kworker/u16:1-18249               100,926      cycles                    #    0.803 GHz
             rcu_sched-8                    74,024      cycles                    #    0.865 GHz
         kworker/u16:2-23146                55,984      cycles                    #    0.726 GHz
                 gmain-2700                 34,278      cycles                    #    0.820 GHz
           kworker/4:1-15354                20,665      cycles                    #    0.728 GHz
           kworker/6:0-17528                16,445      cycles                    #    0.688 GHz
           kworker/5:2-31362                 9,492      cycles                    #    0.816 GHz
            watchdog/3-26                    8,695      cycles                    #    3.006 GHz
          kworker/4:1H-1887                  8,238      cycles                    #    0.624 GHz
            watchdog/4-32                    7,580      cycles                    #    2.747 GHz
           kworker/3:2-12870                 7,306      cycles                    #    0.715 GHz
            watchdog/2-20                    7,274      cycles                    #    2.995 GHz
            watchdog/0-11                    6,988      cycles                    #    0.642 GHz
           ksoftirqd/0-7                     6,376      cycles                    #    0.719 GHz
            watchdog/1-14                    5,340      cycles                    #    0.630 GHz
            watchdog/5-38                    4,061      cycles                    #    2.749 GHz
            watchdog/6-44                    3,976      cycles                    #    2.667 GHz
            watchdog/7-50                    3,418      cycles                    #    1.161 GHz
                vmstat-23127             2,511,699      instructions              #    1.48  insn per cycle
                  perf-24165             1,829,908      instructions              #    0.21  insn per cycle
            irqbalance-2780              1,190,204      instructions              #    1.61  insn per cycle
              thermald-2841                143,544      instructions              #    0.70  insn per cycle
                  sshd-23111               128,138      instructions              #    0.48  insn per cycle
                  sshd-23058                57,654      instructions              #    0.36  insn per cycle
             rcu_sched-8                    44,063      instructions              #    0.60  insn per cycle
         kworker/u16:1-18249                42,551      instructions              #    0.42  insn per cycle
           kworker/0:2-19991                25,873      instructions              #    0.23  insn per cycle
         kworker/u16:2-23146                21,407      instructions              #    0.38  insn per cycle
                 gmain-2700                 13,691      instructions              #    0.40  insn per cycle
           kworker/4:1-15354                12,964      instructions              #    0.63  insn per cycle
           kworker/6:0-17528                10,034      instructions              #    0.61  insn per cycle
           kworker/5:2-31362                 5,203      instructions              #    0.55  insn per cycle
           kworker/3:2-12870                 4,866      instructions              #    0.67  insn per cycle
          kworker/4:1H-1887                  3,586      instructions              #    0.44  insn per cycle
           ksoftirqd/0-7                     3,463      instructions              #    0.54  insn per cycle
            watchdog/0-11                    3,135      instructions              #    0.45  insn per cycle
            watchdog/1-14                    3,135      instructions              #    0.59  insn per cycle
            watchdog/2-20                    3,135      instructions              #    0.43  insn per cycle
            watchdog/3-26                    3,135      instructions              #    0.36  insn per cycle
            watchdog/4-32                    3,135      instructions              #    0.41  insn per cycle
            watchdog/5-38                    3,135      instructions              #    0.77  insn per cycle
            watchdog/6-44                    3,135      instructions              #    0.79  insn per cycle
            watchdog/7-50                    3,135      instructions              #    0.92  insn per cycle
                vmstat-23127               539,181      branches                  #  345.139 M/sec
                  perf-24165               375,364      branches                  #   87.245 M/sec
            irqbalance-2780                262,092      branches                  #  316.593 M/sec
              thermald-2841                 31,611      branches                  #  136.915 M/sec
                  sshd-23111                21,874      branches                  #   78.596 M/sec
                  sshd-23058                10,682      branches                  #   51.528 M/sec
             rcu_sched-8                     8,693      branches                  #  101.633 M/sec
         kworker/u16:1-18249                 7,891      branches                  #   62.808 M/sec
           kworker/0:2-19991                 5,761      branches                  #   42.998 M/sec
         kworker/u16:2-23146                 4,099      branches                  #   53.138 M/sec
           kworker/4:1-15354                 2,755      branches                  #   97.110 M/sec
                 gmain-2700                  2,638      branches                  #   63.127 M/sec
           kworker/6:0-17528                 2,216      branches                  #   92.739 M/sec
           kworker/5:2-31362                 1,132      branches                  #   97.360 M/sec
           kworker/3:2-12870                 1,081      branches                  #  105.773 M/sec
          kworker/4:1H-1887                    725      branches                  #   54.887 M/sec
           ksoftirqd/0-7                       707      branches                  #   79.716 M/sec
            watchdog/0-11                      652      branches                  #   59.860 M/sec
            watchdog/1-14                      652      branches                  #   76.923 M/sec
            watchdog/2-20                      652      branches                  #  268.423 M/sec
            watchdog/3-26                      652      branches                  #  225.372 M/sec
            watchdog/4-32                      652      branches                  #  236.318 M/sec
            watchdog/5-38                      652      branches                  #  441.435 M/sec
            watchdog/6-44                      652      branches                  #  437.290 M/sec
            watchdog/7-50                      652      branches                  #  221.467 M/sec
                vmstat-23127                 8,960      branch-misses             #    1.66% of all branches
            irqbalance-2780                  3,047      branch-misses             #    1.16% of all branches
                  perf-24165                 2,876      branch-misses             #    0.77% of all branches
                  sshd-23111                 1,843      branch-misses             #    8.43% of all branches
              thermald-2841                  1,444      branch-misses             #    4.57% of all branches
                  sshd-23058                 1,379      branch-misses             #   12.91% of all branches
         kworker/u16:1-18249                   982      branch-misses             #   12.44% of all branches
             rcu_sched-8                       893      branch-misses             #   10.27% of all branches
         kworker/u16:2-23146                   578      branch-misses             #   14.10% of all branches
           kworker/0:2-19991                   376      branch-misses             #    6.53% of all branches
                 gmain-2700                    280      branch-misses             #   10.61% of all branches
           kworker/6:0-17528                   196      branch-misses             #    8.84% of all branches
           kworker/4:1-15354                   187      branch-misses             #    6.79% of all branches
           kworker/5:2-31362                   123      branch-misses             #   10.87% of all branches
            watchdog/0-11                       95      branch-misses             #   14.57% of all branches
            watchdog/4-32                       89      branch-misses             #   13.65% of all branches
           kworker/3:2-12870                    80      branch-misses             #    7.40% of all branches
            watchdog/3-26                       61      branch-misses             #    9.36% of all branches
          kworker/4:1H-1887                     60      branch-misses             #    8.28% of all branches
            watchdog/2-20                       52      branch-misses             #    7.98% of all branches
           ksoftirqd/0-7                        47      branch-misses             #    6.65% of all branches
            watchdog/1-14                       46      branch-misses             #    7.06% of all branches
            watchdog/7-50                       13      branch-misses             #    1.99% of all branches
            watchdog/5-38                        8      branch-misses             #    1.23% of all branches
            watchdog/6-44                        7      branch-misses             #    1.07% of all branches
      
             3.695150786 seconds time elapsed
      
      root@skl:/tmp# perf stat --per-thread -M IPC,CPI
      ^C
      
       Performance counter stats for 'system wide':
      
                vmstat-23127             2,000,783      inst_retired.any          #      1.5 IPC
              thermald-2841              1,472,670      inst_retired.any          #      1.3 IPC
                  sshd-23111               977,374      inst_retired.any          #      1.2 IPC
                  perf-24163               483,779      inst_retired.any          #      0.2 IPC
                 gmain-2700                341,213      inst_retired.any          #      0.9 IPC
                  sshd-23058               148,891      inst_retired.any          #      0.8 IPC
          rtkit-daemon-3288                 71,210      inst_retired.any          #      0.7 IPC
         kworker/u16:1-18249                39,562      inst_retired.any          #      0.3 IPC
             rcu_sched-8                    14,474      inst_retired.any          #      0.8 IPC
           kworker/0:2-19991                 7,659      inst_retired.any          #      0.2 IPC
           kworker/4:1-15354                 6,714      inst_retired.any          #      0.8 IPC
          rtkit-daemon-3289                  4,839      inst_retired.any          #      0.3 IPC
           kworker/6:0-17528                 3,321      inst_retired.any          #      0.6 IPC
           kworker/5:2-31362                 3,215      inst_retired.any          #      0.5 IPC
           kworker/7:2-23145                 3,173      inst_retired.any          #      0.7 IPC
          kworker/4:1H-1887                  1,719      inst_retired.any          #      0.3 IPC
            watchdog/0-11                    1,479      inst_retired.any          #      0.3 IPC
            watchdog/1-14                    1,479      inst_retired.any          #      0.3 IPC
            watchdog/2-20                    1,479      inst_retired.any          #      0.4 IPC
            watchdog/3-26                    1,479      inst_retired.any          #      0.4 IPC
            watchdog/4-32                    1,479      inst_retired.any          #      0.3 IPC
            watchdog/5-38                    1,479      inst_retired.any          #      0.3 IPC
            watchdog/6-44                    1,479      inst_retired.any          #      0.7 IPC
            watchdog/7-50                    1,479      inst_retired.any          #      0.7 IPC
         kworker/u16:2-23146                 1,408      inst_retired.any          #      0.5 IPC
                  perf-24163             2,249,872      cpu_clk_unhalted.thread
                vmstat-23127             1,352,455      cpu_clk_unhalted.thread
              thermald-2841              1,161,140      cpu_clk_unhalted.thread
                  sshd-23111               807,827      cpu_clk_unhalted.thread
                 gmain-2700                375,535      cpu_clk_unhalted.thread
                  sshd-23058               194,071      cpu_clk_unhalted.thread
         kworker/u16:1-18249               114,306      cpu_clk_unhalted.thread
          rtkit-daemon-3288                103,547      cpu_clk_unhalted.thread
           kworker/0:2-19991                46,550      cpu_clk_unhalted.thread
             rcu_sched-8                    18,855      cpu_clk_unhalted.thread
          rtkit-daemon-3289                 17,549      cpu_clk_unhalted.thread
           kworker/4:1-15354                 8,812      cpu_clk_unhalted.thread
           kworker/5:2-31362                 6,812      cpu_clk_unhalted.thread
          kworker/4:1H-1887                  5,270      cpu_clk_unhalted.thread
           kworker/6:0-17528                 5,111      cpu_clk_unhalted.thread
           kworker/7:2-23145                 4,667      cpu_clk_unhalted.thread
            watchdog/0-11                    4,663      cpu_clk_unhalted.thread
            watchdog/1-14                    4,663      cpu_clk_unhalted.thread
            watchdog/4-32                    4,626      cpu_clk_unhalted.thread
            watchdog/5-38                    4,403      cpu_clk_unhalted.thread
            watchdog/3-26                    3,936      cpu_clk_unhalted.thread
            watchdog/2-20                    3,850      cpu_clk_unhalted.thread
         kworker/u16:2-23146                 2,654      cpu_clk_unhalted.thread
            watchdog/6-44                    2,017      cpu_clk_unhalted.thread
            watchdog/7-50                    2,017      cpu_clk_unhalted.thread
                vmstat-23127             2,000,783      inst_retired.any          #      0.7 CPI
              thermald-2841              1,472,670      inst_retired.any          #      0.8 CPI
                  sshd-23111               977,374      inst_retired.any          #      0.8 CPI
                  perf-24163               495,037      inst_retired.any          #      4.7 CPI
                 gmain-2700                341,213      inst_retired.any          #      1.1 CPI
                  sshd-23058               148,891      inst_retired.any          #      1.3 CPI
          rtkit-daemon-3288                 71,210      inst_retired.any          #      1.5 CPI
         kworker/u16:1-18249                39,562      inst_retired.any          #      2.9 CPI
             rcu_sched-8                    14,474      inst_retired.any          #      1.3 CPI
           kworker/0:2-19991                 7,659      inst_retired.any          #      6.1 CPI
           kworker/4:1-15354                 6,714      inst_retired.any          #      1.3 CPI
          rtkit-daemon-3289                  4,839      inst_retired.any          #      3.6 CPI
           kworker/6:0-17528                 3,321      inst_retired.any          #      1.5 CPI
           kworker/5:2-31362                 3,215      inst_retired.any          #      2.1 CPI
           kworker/7:2-23145                 3,173      inst_retired.any          #      1.5 CPI
          kworker/4:1H-1887                  1,719      inst_retired.any          #      3.1 CPI
            watchdog/0-11                    1,479      inst_retired.any          #      3.2 CPI
            watchdog/1-14                    1,479      inst_retired.any          #      3.2 CPI
            watchdog/2-20                    1,479      inst_retired.any          #      2.6 CPI
            watchdog/3-26                    1,479      inst_retired.any          #      2.7 CPI
            watchdog/4-32                    1,479      inst_retired.any          #      3.1 CPI
            watchdog/5-38                    1,479      inst_retired.any          #      3.0 CPI
            watchdog/6-44                    1,479      inst_retired.any          #      1.4 CPI
            watchdog/7-50                    1,479      inst_retired.any          #      1.4 CPI
         kworker/u16:2-23146                 1,408      inst_retired.any          #      1.9 CPI
                  perf-24163             2,302,323      cycles
                vmstat-23127             1,352,455      cycles
              thermald-2841              1,161,140      cycles
                  sshd-23111               807,827      cycles
                 gmain-2700                375,535      cycles
                  sshd-23058               194,071      cycles
         kworker/u16:1-18249               114,306      cycles
          rtkit-daemon-3288                103,547      cycles
           kworker/0:2-19991                46,550      cycles
             rcu_sched-8                    18,855      cycles
          rtkit-daemon-3289                 17,549      cycles
           kworker/4:1-15354                 8,812      cycles
           kworker/5:2-31362                 6,812      cycles
          kworker/4:1H-1887                  5,270      cycles
           kworker/6:0-17528                 5,111      cycles
           kworker/7:2-23145                 4,667      cycles
            watchdog/0-11                    4,663      cycles
            watchdog/1-14                    4,663      cycles
            watchdog/4-32                    4,626      cycles
            watchdog/5-38                    4,403      cycles
            watchdog/3-26                    3,936      cycles
            watchdog/2-20                    3,850      cycles
         kworker/u16:2-23146                 2,654      cycles
            watchdog/6-44                    2,017      cycles
            watchdog/7-50                    2,017      cycles
      
             2.175726600 seconds time elapsed
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-12-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      29734550
    • J
      perf stat: Remove --per-thread pid/tid limitation · 1d9f8d1b
      Jin Yao 提交于
      Currently, if we execute 'perf stat --per-thread' without specifying
      pid/tid, perf will return error.
      
      root@skl:/tmp# perf stat --per-thread
      The --per-thread option is only available when monitoring via -p -t options.
          -p, --pid <pid>       stat events on existing process id
          -t, --tid <tid>       stat events on existing thread id
      
      This patch removes this limitation. If no pid/tid specified, it returns
      all threads (get threads from /proc).
      
      Note that it doesn't support cpu_list yet so if it's a cpu_list case,
      then skip.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-11-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1d9f8d1b
    • J
      perf stat: Update or print per-thread stats · 14e72a21
      Jin Yao 提交于
      If the stats pointer in stat_config structure is not null, it will
      update the per-thread stats or print the per-thread stats on this
      buffer.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-9-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      14e72a21
    • J
      perf stat: Allocate shadow stats buffer for threads · 56739444
      Jin Yao 提交于
      After perf_evlist__create_maps() being executed, we can get all threads
      from /proc. And via thread_map__nr(), we can also get the number of
      threads.
      
      With the number of threads, the patch allocates a buffer which will
      record the shadow stats for these threads.
      
      The buffer pointer is saved in stat_config.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-8-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      56739444
    • J
      perf stat: Print per-thread shadow stats · e0128b30
      Jin Yao 提交于
      The function perf_stat__print_shadow_stats() is called to print the
      shadow stats on a set of static variables.
      
      But the static variables are the limitations to support
      per-thread shadow stats.
      
      This patch lets the perf_stat__print_shadow_stats() support
      to print the shadow stats from a input parameter 'st'.
      
      It will not directly get value from static variable. Instead,
      it now uses runtime_stat_avg() and runtime_stat_n() to get and
      compute the values.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-6-git-send-email-yao.jin@linux.intel.com
      [ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e0128b30
    • J
      perf stat: Update per-thread shadow stats · 1fcd0394
      Jin Yao 提交于
      The functions perf_stat__update_shadow_stats() is called to update the
      shadow stats on a set of static variables.
      
      But the static variables are the limitations to be extended to support
      per-thread shadow stats.
      
      This patch lets the perf_stat__update_shadow_stats() support to update
      the shadow stats on a input parameter 'st' and uses
      update_runtime_stat() to update the stats. It will not directly update
      the static variables as before.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1512482591-4646-5-git-send-email-yao.jin@linux.intel.com
      [ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1fcd0394
  2. 30 11月, 2017 1 次提交
  3. 31 10月, 2017 3 次提交
  4. 27 10月, 2017 1 次提交
  5. 13 9月, 2017 5 次提交
    • A
      perf stat: Fall weak group back even for EBADF · 35c1980e
      Andi Kleen 提交于
      It's not possible to run a package event and a per cpu event in the same
      group. This is used by some of the power metrics.  They work correctly
      when not using a group.
      
      Normally weak groups should handle that, but in this case EBADF is
      returned instead of the normal EINVAL.
      
        $ strace -e perf_event_open ./perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
        Using CPUID GenuineIntel-6-3E
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EINVAL (Invalid argument)
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
        perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = 3
        perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, 3, 0) = 4
        perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 1, 0, 0) = -1 EBADF (Bad file descriptor)
      
      and perf errors out.
      
      Make weak groups trigger a fall back for EBADF too. Then this case works correctly:
      
        $ perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
        Using CPUID GenuineIntel-6-3E
        Weak group for cstate_pkg/c2-residency//2 failed
        cstate_pkg/c2-residency/: 476709882 1000598460 1000598460
        msr/tsc/: 39625837911 12007369110 12007369110
      
         Performance counter stats for 'system wide':
      
               476,709,882      cstate_pkg/c2-residency/
            39,625,837,911      msr/tsc/
      
               1.000697588 seconds time elapsed
      
        This fixes perf stat -M Power ...
      
        $ perf stat -M Power --metric-only -a sleep 1
      
         Performance counter stats for 'system wide':
      
        Turbo_Utilization  C3_Core_Residency  C6_Core_Residency C7_Core_Residency  C2_Pkg_Residency   C3_Pkg_Residency  C6_Pkg_Residency  C7_Pkg_Residency
             1.0                 0.7                30.0               0.0               0.9                 0.1               0.4                 0.0
      
               1.001240740 seconds time elapsed
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170905211324.32427-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      35c1980e
    • A
      perf stat: Update walltime_nsecs_stats in interval mode · b90f1333
      Andi Kleen 提交于
      Some metrics (like GFLOPs) need walltime_nsecs_stats for each interval.
      Compute it for each interval instead of only at the end.
      
      Pointed out by Jiri.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170831194036.30146-12-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b90f1333
    • A
      perf stat: Hide internal duration_time counter · e864c5ca
      Andi Kleen 提交于
      Some perf stat metrics use an internal "duration_time" metric. It is not
      correctly printed however. So hide it during output to avoid confusing
      users with 0 counts.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170831194036.30146-11-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e864c5ca
    • A
      perf stat: Support JSON metrics in perf stat · b18f3e36
      Andi Kleen 提交于
      Add generic support for standalone metrics specified in JSON files to
      perf stat. A metric is a formula that uses multiple events to compute a
      higher level result (e.g. IPC).
      
      Previously metrics were always tied to an event and automatically
      enabled with that event. But now change it that we can have standalone
      metrics. They are in the same JSON data structure as events, but don't
      have an event name.
      
      We also allow to organize the metrics in metric groups, which allows a
      short cut to select several related metrics at once.
      
      Add a new -M / --metrics option to perf stat that adds the metrics or
      metric groups specified.
      
      Add the core code to manage and parse the metric groups. They are
      collected from the JSON data structures into a separate rblist.  When
      computing shadow values look for metrics in that list.  Then they are
      computed using the existing saved values infrastructure in stat-shadow.c
      
      The actual JSON metrics are in a separate pull request.
      
        % perf stat -M Summary --metric-only -a sleep 1
      
         Performance counter stats for 'system wide':
      
        Instructions   CLKS          CPU_Utilization  GFLOPs   SMT_2T_Utilization   Kernel_Utilization
        317614222.0    1392930775.0  0.0              0.0      0.2                  0.1
      
             1.001497549 seconds time elapsed
      
        % perf stat -M GFLOPs flops
      
         Performance counter stats for 'flops':
      
           3,999,541,471  fp_comp_ops_exe.sse_scalar_single #  1.2 GFLOPs   (66.65%)
                      14  fp_comp_ops_exe.sse_scalar_double                 (66.65%)
                       0  fp_comp_ops_exe.sse_packed_double                 (66.67%)
                       0  fp_comp_ops_exe.sse_packed_single                 (66.70%)
                       0  simd_fp_256.packed_double                         (66.70%)
                       0  simd_fp_256.packed_single                         (66.67%)
                       0  duration_time
      
             3.238372845 seconds time elapsed
      
      v2: Add missing header file
      v3: Move find_map to pmu.c
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b18f3e36
    • A
      perf tools: Support weak groups in 'perf stat' · 5a5dfe4b
      Andi Kleen 提交于
      Setting up groups can be complicated due to the complicated scheduling
      restrictions of different PMUs.
      
      User tools usually don't understand all these restrictions.
      
      Still in many cases it is useful to set up groups and they work most of
      the time. However if the group is set up wrong some members will not
      report any value because they never get scheduled.
      
      Add a concept of a 'weak group': try to set up a group, but if it's not
      schedulable fallback to not using a group. That gives us the best of
      both worlds: groups if they work, but still a usable fallback if they
      don't.
      
      In theory it would be possible to have more complex fallback strategies
      (e.g. try to split the group in half), but the simple fallback of not
      using a group seems to work for now.
      
      So far the weak group is only implemented for perf stat, not for record.
      
      Here's an unschedulable group (on IvyBridge with SMT on)
      
        % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
      
              73,806,067      branches
               4,848,144      branch-misses             #    6.57% of all branches
              14,754,458      l1d.replacement
              24,905,558      l2_lines_in.all
         <not supported>      l2_rqsts.all_code_rd         <------- will never report anything
      
      With the weak group:
      
        % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1
      
             125,366,055      branches                                                      (80.02%)
               9,208,402      branch-misses             #    7.35% of all branches          (80.01%)
              24,560,249      l1d.replacement                                               (80.00%)
              43,174,971      l2_lines_in.all                                               (80.05%)
              31,891,457      l2_rqsts.all_code_rd                                          (79.92%)
      
      The extra event scheduled with some extra multiplexing
      
      v2: Move fallback code to separate function.
      Add comment on for_each_group_member
      Adjust to new perf_evsel__close interface
      v3: Fix debug print out.
      
      Committer testing:
      
      Before:
      
        # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
      
         Performance counter stats for 'system wide':
      
           <not counted>      branches
           <not counted>      branch-misses
           <not counted>      l1d.replacement
           <not counted>      l2_lines_in.all
         <not supported>      l2_rqsts.all_code_rd
      
             1.002147212 seconds time elapsed
      
        # perf stat -e '{branches,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
      
         Performance counter stats for 'system wide':
      
              83,207,892      branches
              11,065,444      l1d.replacement
              28,484,024      l2_lines_in.all
              12,186,179      l2_rqsts.all_code_rd
      
             1.001739493 seconds time elapsed
      
      After:
      
        # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}':W -a sleep 1
      
         Performance counter stats for 'system wide':
      
             543,323,909      branches                                                      (80.01%)
              27,100,512      branch-misses             #    4.99% of all branches          (80.02%)
              50,402,905      l1d.replacement                                               (80.03%)
              67,385,892      l2_lines_in.all                                               (80.01%)
              21,352,885      l2_rqsts.all_code_rd                                          (79.94%)
      
             1.001086658 seconds time elapsed
      
        #
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/20170831194036.30146-2-andi@firstfloor.org
      [ Add a "'perf stat' only, for now" comment in the man page, suggested by Jiri ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5a5dfe4b
  6. 12 9月, 2017 1 次提交
    • M
      perf stat: Wait for the correct child · dfc9eec7
      Milian Wolff 提交于
      When packaging the perf userland application into an AppImage, the
      wait() call in perf stat returned too early. It turned out that some
      other child process exited, but not the one perf stat launched:
      
        $ sudo strace -e fork,execve,clone,wait4 -f ./perf-x86_64.AppImage stat sleep 1
        execve("./perf-git.3a73b7f9-x86_64.AppImage", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x7ffec1bbf050 /* 18 vars */) = 0
        clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3912
        strace: Process 3912 attached
        [pid  3912] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3914
        strace: Process 3914 attached
        [pid  3912] +++ exited with 0 +++
        [pid  3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3912, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
        [pid  3914] clone(strace: Process 3915 attached
        child_stack=0x7f6a6d9fefb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6a6d9ff9d0, tls=0x7f6a6d9ff700, child_tidptr=0x7f6a6d9ff9d0) = 3915
        [pid  3911] execve("/tmp/.mount_perf-g6VYMpl/AppRun", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x14aab70 /* 21 vars */) = 0
        [pid  3911] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4ae113c4d0) = 3916
        strace: Process 3916 attached
        [pid  3911] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3912
        [pid  3916] execve("/usr/libexec/perf-core/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/tmp/./sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/home/milian/.bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/usr/lib/icecream/libexec/icecc/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/ssd2/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/home/milian/.bin/kf5/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/ssd2/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/home/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/home/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/usr/local/sbin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/usr/local/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
        [pid  3916] execve("/usr/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */
         Performance counter stats for 'sleep 1':
      
             <not counted>	task-clock
             <not counted>	context-switches
             <not counted>	cpu-migrations
             <not counted>	page-faults
             <not counted>	cycles
             <not counted>	instructions
             <not counted>      branches
             <not counted>      branch-misses
      
               0.000047194 seconds time elapsed
      
        [pid  3916] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=3911, si_uid=0} ---
        [pid  3916] +++ killed by SIGTERM +++
        [pid  3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=3916, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
        [pid  3915] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3914, si_uid=0} ---
        [pid  3911] +++ exited with 0 +++
        [pid  3915] --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=3914, si_uid=0} ---
        [pid  3915] +++ exited with 0 +++
        +++ exited with 0 +++
      
      This patch uses waitpid instead to ensure the call waits for the
      debuggee application launched by 'perf stat'. This fixes 'perf stat'
      when launched from an AppImage:
      
        $ ./perf-x86_64.AppImage stat sleep 1
      
         Performance counter stats for 'sleep 1':
      
                0.357235      task-clock (msec)         #    0.000 CPUs utilized
                       1      context-switches          #    0.003 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                      50      page-faults               #    0.140 M/sec
                 1269602      cycles                    #    3.554 GHz
                  654278      instructions              #    0.52  insn per cycle
                  129963      branches                  #  363.803 M/sec
                    7082      branch-misses             #    5.45% of all branches
      
             1.000633420 seconds time elapsed
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170912152523.4497-1-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      dfc9eec7
  7. 02 9月, 2017 1 次提交
    • A
      perf stat: Only auto-merge events that are PMU aliases · 63ce8449
      Arnaldo Carvalho de Melo 提交于
      Peter reported that when he explicitely asked for multiple events with
      the same name on the command line it got coalesced into just one line,
      i.e.:
      
         # perf stat -e cycles -e cycles -e cycles usleep 1
      
         Performance counter stats for 'usleep 1':
      
               3,269,652      cycles
      
             0.000884123 seconds time elapsed
      
        #
      
      And while there is the --no-merges option to disable that auto-merging,
      this is a blunt change in behaviour for such explicit request, so change
      the code so that this auto merging is done only when handling the multi
      PMU aliases with the same name that introduced this coalescing,
      restoring the previous behaviour for the explicit case:
      
        # perf stat -e cycles -e cycles -e cycles usleep 1
      
         Performance counter stats for 'usleep 1':
      
               1,472,837      cycles
               1,472,837      cycles
               1,472,837      cycles
      
             0.001764870 seconds time elapsed
      
        #
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 430daf2d ("perf stat: Collapse identically named events")
      Link: http://lkml.kernel.org/r/20170831184122.GK4831@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      63ce8449
  8. 27 7月, 2017 1 次提交
    • J
      perf stat: Use group read for event groups · 82bf311e
      Jiri Olsa 提交于
      Make perf stat use  group read if there  are groups defined. The group
      read will get the values for all member of groups within a single
      syscall instead of calling read syscall for every event.
      
      We can see considerable less amount of kernel cycles spent on single
      group read, than reading each event separately, like for following perf
      stat command:
      
        # perf stat -e {cycles,instructions} -I 10 -a sleep 1
      
      Monitored with "perf stat -r 5 -e '{cycles:u,cycles:k}'"
      
      Before:
      
              24,325,676      cycles:u
             297,040,775      cycles:k
      
             1.038554134 seconds time elapsed
      
      After:
              25,034,418      cycles:u
             158,256,395      cycles:k
      
             1.036864497 seconds time elapsed
      
      The perf_evsel__open fallback changes contributed by Andi Kleen.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170726120206.9099-4-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      82bf311e
  9. 27 6月, 2017 1 次提交
  10. 21 6月, 2017 1 次提交
    • K
      perf stat: Add support to measure SMI cost · daefd0bc
      Kan Liang 提交于
      Implementing a new --smi-cost mode in perf stat to measure SMI cost.
      
      During the measurement, the /sys/device/cpu/freeze_on_smi will be set.
      
      The measurement can be done with one counter (unhalted core cycles), and
      two free running MSR counters (IA32_APERF and SMI_COUNT).
      
      In practice, the percentages of SMI core cycles should be more useful
      than absolute value. So the output will be the percentage of SMI core
      cycles and SMI#. metric_only will be set by default.
      
      SMI cycles% = (aperf - unhalted core cycles) / aperf
      
      Here is an example output.
      
       Performance counter stats for 'sudo echo ':
      
      SMI cycles%          SMI#
          0.1%              1
      
             0.010858678 seconds time elapsed
      
      Users who wants to get the actual value can apply additional
      --no-metric-only.
      Signed-off-by: NKan Liang <Kan.liang@intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Elliott <elliott@hpe.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1495825538-5230-3-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      daefd0bc
  11. 02 6月, 2017 1 次提交
  12. 25 4月, 2017 2 次提交
  13. 21 4月, 2017 1 次提交
  14. 20 4月, 2017 4 次提交
  15. 13 4月, 2017 1 次提交
  16. 11 4月, 2017 1 次提交
  17. 27 3月, 2017 1 次提交
  18. 23 3月, 2017 1 次提交
    • A
      perf stat: Output JSON MetricExpr metric · 37932c18
      Andi Kleen 提交于
      Add generic infrastructure to perf stat to output ratios for
      "MetricExpr" entries in the event lists. Many events are more useful as
      ratios than in raw form, typically some count in relation to total
      ticks.
      
      Transfer the MetricExpr information from the alias to the evsel.
      
      We mark the events that need to be collected for MetricExpr, and also
      link the events using them with a pointer. The code is careful to always
      prefer the right event in the same group to minimize multiplexing
      errors. At the moment only a single relation is supported.
      
      Then add a rblist to the stat shadow code that remembers stats based on
      the cpu and context.
      
      Then finally update and retrieve and print these values similarly to the
      existing hardcoded perf metrics. We use the simple expression parser
      added earlier to evaluate the expression.
      
      Normally we just output the result without further commentary, but for
      --metric-only this would lead to empty columns. So for this case use the
      original event as description.
      
      There is no attempt to automatically add the MetricExpr event, if it is
      missing, however we suggest it to the user, because the user tool
      doesn't have enough information to reliably construct a group that is
      guaranteed to schedule. So we leave that to the user.
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}'
             1.000147889        800,085,181      unc_p_clockticks
             1.000147889         93,126,241      unc_p_freq_max_os_cycles  #     11.6
             2.000448381        800,218,217      unc_p_clockticks
             2.000448381        142,516,095      unc_p_freq_max_os_cycles  #     17.8
             3.000639852        800,243,057      unc_p_clockticks
             3.000639852        162,292,689      unc_p_freq_max_os_cycles  #     20.3
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
        #    time         freq_max_os_cycles %
             1.000127077      0.9
             2.000301436      0.7
             3.000456379      0.0
      
      v2: Change from DivideBy to MetricExpr
      v3: Use expr__ prefix.  Support more than one other event.
      v4: Update description
      v5: Only print warning message once for multiple PMUs.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      37932c18
  19. 22 3月, 2017 3 次提交
  20. 04 3月, 2017 2 次提交
    • J
      perf tools: Force uncore events to system wide monitoring · e3ba76de
      Jiri Olsa 提交于
      Make system wide (-a) the default option if no target was specified and
      one of following conditions is met:
      
        - there's no workload specified (current behaviour)
        - there is workload specified but all requested
          events are system wide ones
      
      Mixed events core/uncore with workload:
      
        $ perf stat -e 'uncore_cbox_0/clockticks/,cycles' sleep 1
      
         Performance counter stats for 'sleep 1':
      
           <not supported>      uncore_cbox_0/clockticks/
                   980,489      cycles
      
               1.000897406 seconds time elapsed
      
      Uncore event with workload:
      
        $ perf stat -e 'uncore_cbox_0/clockticks/' sleep 1
      
         Performance counter stats for 'system wide':
      
        281,473,897,192,670      uncore_cbox_0/clockticks/
      
               1.000833784 seconds time elapsed
      
      Committer note:
      
      When testing I realized the default case for !root, i.e. no events
      passed via -e, was broke by v2 of this patch, reported and after a
      patch provided by Jiri it is back working:
      
        [acme@jouet linux]$ perf stat usleep 1
      
         Performance counter stats for 'usleep 1':
      
               0.401335      task-clock:u (msec)     #   0.297 CPUs utilized
                      0      context-switches:u      #   0.000 K/sec
                      0      cpu-migrations:u        #   0.000 K/sec
                     48      page-faults:u           #   0.120 M/sec
                458,146      cycles:u                #   1.142 GHz
                245,113      instructions:u          #   0.54  insn per cycle
                 47,991      branches:u              # 119.578 M/sec
                  4,022      branch-misses:u         #   8.38% of all branches
      
            0.001350029 seconds time elapsed
      
        [acme@jouet linux]$
      Suggested-and-Tested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170227094818.GA12764@kravaSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e3ba76de
    • B
      perf stat: Issue a HW watchdog disable hint · 02d492e5
      Borislav Petkov 提交于
      When using perf stat on an AMD F15h system with the default hw events
      attributes, some of the events don't get counted:
      
       Performance counter stats for 'sleep 1':
      
                0.749208      task-clock (msec)         #    0.001 CPUs utilized
                       1      context-switches          #    0.001 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                      54      page-faults               #    0.072 M/sec
               1,122,815      cycles                    #    1.499 GHz
                 286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle
           <not counted>      stalled-cycles-backend                                        (0.00%)
           ^^^^^^^^^^^^
           <not counted>      instructions                                                  (0.00%)
           ^^^^^^^^^^^^
           <not counted>      branches                                                      (0.00%)
           <not counted>      branch-misses                                                 (0.00%)
      
             1.001550070 seconds time elapsed
      
      The reason is that we have the HW watchdog consuming one PMU counter and
      when perf tries to schedule 6 events on 6 counters and some of those
      counters are constrained to only a specific subset of PMCs by the
      hardware, the event scheduling fails.
      
      So issue a hint to disable the HW watchdog around a perf stat session.
      
      Committer note:
      
      Testing it...
      
        # perf stat -d usleep 1
      
         Performance counter stats for 'usleep 1':
      
                1.180203      task-clock (msec)         #    0.490 CPUs utilized
                       1      context-switches          #    0.847 K/sec
                       0      cpu-migrations            #    0.000 K/sec
                      54      page-faults               #    0.046 M/sec
                 184,754      cycles                    #    0.157 GHz
                 714,553      instructions              #    3.87  insn per cycle
                 154,661      branches                  #  131.046 M/sec
                   7,247      branch-misses             #    4.69% of all branches
                 219,984      L1-dcache-loads           #  186.395 M/sec
                  17,600      L1-dcache-load-misses     #    8.00% of all L1-dcache hits    (90.16%)
           <not counted>      LLC-loads                                                     (0.00%)
           <not counted>      LLC-load-misses                                               (0.00%)
      
             0.002406823 seconds time elapsed
      
        Some events weren't counted. Try disabling the NMI watchdog:
      	echo 0 > /proc/sys/kernel/nmi_watchdog
      	perf stat ...
      	echo 1 > /proc/sys/kernel/nmi_watchdog
        #
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Vince Weaver <vince@deater.net>
      Link: http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnicSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      02d492e5
  21. 20 2月, 2017 1 次提交