1. 24 Sep, 2012 14 commits
  2. 21 Sep, 2012 3 commits
    • X
      perf kvm: Events analysis tool · bcf6edcd
      Xiao Guangrong authored
      Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport smartly
      
      Usage:
      - kvm stat
        run a command and gather performance counter statistics; it is an alias of
        perf stat
      
      - trace kvm events:
        perf kvm stat record. If other tracepoints are of interest as well, append
        them like this:
        perf kvm stat record -e timer:* -a

        If many guests are running, use -p or --pid to track a specific guest;
        -a tracks events generated by all guests.
      
      - show the result:
        perf kvm stat report
      
      Example output:
      13005
      13059
      
      A total of 2 guests are running on the host.
      
      Then, track the guest whose pid is 13059:
      ^C[ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.253 MB perf.data.guest (~11065 samples) ]
      
      See the vmexit events:
      
      Analyze events for all VCPUs:
      
                   VM-EXIT    Samples  Samples%     Time%         Avg time
      
               APIC_ACCESS        460    70.55%     0.01%     22.44us ( +-   1.75% )
                       HLT         93    14.26%    99.98% 832077.26us ( +-  10.42% )
        EXTERNAL_INTERRUPT         64     9.82%     0.00%     35.35us ( +-  14.21% )
         PENDING_INTERRUPT         24     3.68%     0.00%      9.29us ( +-  31.39% )
                 CR_ACCESS          7     1.07%     0.00%      8.12us ( +-   5.76% )
            IO_INSTRUCTION          3     0.46%     0.00%     18.00us ( +-  11.79% )
             EXCEPTION_NMI          1     0.15%     0.00%      5.83us ( +-   -nan% )
      
      Total Samples:652, Total events handled time:77396109.80us.
      
      See the mmio events:
      
      Analyze events for all VCPUs:
      
               MMIO Access    Samples  Samples%     Time%         Avg time
      
              0xfee00380:W        387    84.31%    79.28%      8.29us ( +-   3.32% )
              0xfee00300:W         24     5.23%     9.96%     16.79us ( +-   1.97% )
              0xfee00300:R         24     5.23%     7.83%     13.20us ( +-   3.00% )
              0xfee00310:W         24     5.23%     2.93%      4.94us ( +-   3.84% )
      
      Total Samples:459, Total events handled time:4044.59us.
      
      See the ioport event:
      
      Analyze events for all VCPUs:
      
            IO Port Access    Samples  Samples%     Time%         Avg time
      
               0xc050:POUT          3   100.00%   100.00%     13.75us ( +-  10.83% )
      
      Total Samples:3, Total events handled time:41.26us.
      
      In addition, --vcpu restricts the report to the specified vcpu and --key
      selects the sort key for the result:
      
      Analyze events for VCPU 0:
      
                   VM-EXIT    Samples  Samples%     Time%         Avg time
      
                       HLT         27    13.85%    99.97% 405790.24us ( +-  12.70% )
        EXTERNAL_INTERRUPT         13     6.67%     0.00%     27.94us ( +-  22.26% )
               APIC_ACCESS        146    74.87%     0.03%     21.69us ( +-   2.91% )
            IO_INSTRUCTION          2     1.03%     0.00%     17.77us ( +-  20.56% )
                 CR_ACCESS          2     1.03%     0.00%      8.55us ( +-   6.47% )
         PENDING_INTERRUPT          5     2.56%     0.00%      6.27us ( +-   3.94% )
      
      Total Samples:195, Total events handled time:10959950.90us.
      Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
      Signed-off-by: Runzhen Wang <runzhen@linux.vnet.ibm.com>
      [ Dong Hao <haodong@linux.vnet.ibm.com>
        Runzhen Wang <runzhen@linux.vnet.ibm.com>:
           - rebase it on current acme's tree
           - fix the compiling-error on i386 ]
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1347870675-31495-4-git-send-email-haodong@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
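The Samples% and Time% columns of the vmexit table above can be reproduced from the Samples and Avg time columns. The following is a minimal Python sketch, not perf's implementation: it assumes each event's total handling time is roughly samples times average time, so the recomputed total agrees with the reported one only up to rounding of the printed averages. The name `summarize` is hypothetical.

```python
# (event, samples, average handling time in microseconds),
# copied from the "all VCPUs" vmexit table in the commit message.
ROWS = [
    ("APIC_ACCESS", 460, 22.44),
    ("HLT", 93, 832077.26),
    ("EXTERNAL_INTERRUPT", 64, 35.35),
    ("PENDING_INTERRUPT", 24, 9.29),
    ("CR_ACCESS", 7, 8.12),
    ("IO_INSTRUCTION", 3, 18.00),
    ("EXCEPTION_NMI", 1, 5.83),
]


def summarize(rows):
    """Return (total_samples, total_time_us, {event: (samples_pct, time_pct)})."""
    total_samples = sum(n for _, n, _ in rows)
    # Approximation: per-event total time = samples * (rounded) average time.
    total_time = sum(n * avg for _, n, avg in rows)
    shares = {
        name: (100.0 * n / total_samples, 100.0 * n * avg / total_time)
        for name, n, avg in rows
    }
    return total_samples, total_time, shares


if __name__ == "__main__":
    samples, time_us, shares = summarize(ROWS)
    print(f"Total Samples:{samples}, Total events handled time:{time_us:.2f}us.")
    for name, (s_pct, t_pct) in shares.items():
        print(f"{name:>20} {s_pct:6.2f}% {t_pct:6.2f}%")
```

The sketch makes the table's main observation explicit: HLT accounts for a small share of the exits but nearly all of the handled time, because its average duration is several orders of magnitude larger than that of the other exit reasons.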
    • E
      perf tools: Fix parallel build · e6048fb8
      Eric Sandeen authored
      Parallel builds of perf were failing for me on a 32p box, with:
      
          * new build flags or prefix
      util/pmu.l:7:23: error: pmu-bison.h: No such file or directory
      
      ...
      
      make: *** [util/pmu-flex.o] Error 1
      make: *** Waiting for unfinished jobs....
      
      This can pretty quickly be seen by adding a sleep in front of the bison
      calls in tools/perf/Makefile and running make -j4 on a smaller box i.e.:
      
      	sleep 10; $(QUIET_BISON)$(BISON) -v util/pmu.y -d -o $(OUTPUT)util/pmu-bison.c
      
      Adding the following dependencies fixes it for me.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/505BD190.40707@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
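The commit text above does not show the dependencies that were added, so the following Python sketch only models the ordering problem and does not reproduce the Makefile change. It assumes the flex object must wait for the bison step because that step (run with -d) also emits the pmu-bison.h header named in the error. The target `util/pmu-flex.c` and the helper `build_order` are illustrative assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_order(deps):
    """Return one valid build order for a {target: prerequisites} mapping."""
    return list(TopologicalSorter(deps).static_order())


# Without an explicit edge, nothing orders the flex object after the bison
# step, so a parallel build may compile it before pmu-bison.h exists.
DEPS_RACY = {
    "util/pmu-bison.c": {"util/pmu.y"},
    "util/pmu-flex.c": {"util/pmu.l"},       # assumed intermediate target
    "util/pmu-flex.o": {"util/pmu-flex.c"},
}

# With the extra prerequisite, every valid order runs bison first.
DEPS_FIXED = {
    **DEPS_RACY,
    "util/pmu-flex.o": {"util/pmu-flex.c", "util/pmu-bison.c"},
}

if __name__ == "__main__":
    print(build_order(DEPS_FIXED))
```

In the racy graph the two generated files are unrelated, which is why the failure only shows up when the bison step happens to be slow, as the sleep experiment in the commit message demonstrates.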
    • S
      perf record: Print event causing perf_event_open() to fail · 1863fbbb
      Stephane Eranian authored
      Got tired of not getting the event that caused the perf_event_open()
      syscall to fail. So I fixed the error message. This is very useful when
      monitoring lots of events in a single run.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20120920161945.GA7064@quad
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 20 Sep, 2012 2 commits
  4. 18 Sep, 2012 11 commits
  5. 15 Sep, 2012 5 commits
  6. 12 Sep, 2012 5 commits
    • A
      perf sched: Don't read all tracepoint variables in advance · 9ec3f4e4
      Arnaldo Carvalho de Melo authored
      Do it just at the actual consumer of these fields, that way we avoid
      needless lookups:
      
        [root@sandy ~]# perf sched record sleep 30s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 8.585 MB perf.data (~375063 samples) ]
      
      Before:
      
        [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null
      
         Performance counter stats for 'perf sched lat' (10 runs):
      
                103.592215 task-clock                #    0.993 CPUs utilized            ( +-  0.33% )
                        12 context-switches          #    0.114 K/sec                    ( +-  3.29% )
                         0 cpu-migrations            #    0.000 K/sec
                     7,605 page-faults               #    0.073 M/sec                    ( +-  0.00% )
               345,796,112 cycles                    #    3.338 GHz                      ( +-  0.07% ) [82.90%]
               106,876,796 stalled-cycles-frontend   #   30.91% frontend cycles idle     ( +-  0.38% ) [83.23%]
                62,060,877 stalled-cycles-backend    #   17.95% backend  cycles idle     ( +-  0.80% ) [67.14%]
               628,246,586 instructions              #    1.82  insns per cycle
                                                     #    0.17  stalled cycles per insn  ( +-  0.04% ) [83.64%]
               134,962,057 branches                  # 1302.820 M/sec                    ( +-  0.10% ) [83.64%]
                 1,233,037 branch-misses             #    0.91% of all branches          ( +-  0.29% ) [83.41%]
      
               0.104333272 seconds time elapsed                                          ( +-  0.33% )
      
      After:

        [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null
      
         Performance counter stats for 'perf sched lat' (10 runs):
      
               98.848272 task-clock                #    0.993 CPUs utilized            ( +-  0.48% )
                      11 context-switches          #    0.112 K/sec                    ( +-  2.83% )
                       0 cpu-migrations            #    0.003 K/sec                    ( +- 50.92% )
                   7,604 page-faults               #    0.077 M/sec                    ( +-  0.00% )
             332,216,085 cycles                    #    3.361 GHz                      ( +-  0.14% ) [82.87%]
             100,623,710 stalled-cycles-frontend   #   30.29% frontend cycles idle     ( +-  0.53% ) [82.95%]
              58,788,692 stalled-cycles-backend    #   17.70% backend  cycles idle     ( +-  0.59% ) [67.15%]
             609,402,433 instructions              #    1.83  insns per cycle
                                                   #    0.17  stalled cycles per insn  ( +-  0.04% ) [83.76%]
             131,277,138 branches                  # 1328.067 M/sec                    ( +-  0.06% ) [83.77%]
               1,117,871 branch-misses             #    0.85% of all branches          ( +-  0.32% ) [83.51%]
      
             0.099580430 seconds time elapsed                                          ( +-  0.48% )
      
        [root@sandy ~]#
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-kracdpw8wqlr0xjh75uk8g11@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
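The idea in the commit above, reading a tracepoint field only where it is consumed, can be illustrated with a small Python sketch. This is not perf's code: the decoders are stand-ins that count how often they run, the handlers are hypothetical, and the field names are those of the sched_switch tracepoint.

```python
SCHED_SWITCH_FIELDS = [
    "prev_comm", "prev_pid", "prev_prio", "prev_state",
    "next_comm", "next_pid", "next_prio",
]


def make_counting_decoders(field_names):
    """Build one stand-in decoder per field plus a shared lookup counter."""
    calls = {"count": 0}

    def make(name):
        def decode(raw):
            calls["count"] += 1  # each call models one field lookup
            return raw[name]
        return decode

    return {name: make(name) for name in field_names}, calls


def handle_eager(raw, decoders):
    # Decode every field in advance, then use only one of them.
    fields = {name: decode(raw) for name, decode in decoders.items()}
    return fields["prev_pid"]


def handle_lazy(raw, decoders):
    # Decode only the field the consumer actually needs.
    return decoders["prev_pid"](raw)


if __name__ == "__main__":
    raw = {name: 0 for name in SCHED_SWITCH_FIELDS}
    raw["prev_pid"] = 4242
    for handler in (handle_eager, handle_lazy):
        decoders, calls = make_counting_decoders(SCHED_SWITCH_FIELDS)
        print(handler.__name__, handler(raw, decoders), calls["count"])
```

Both handlers return the same value; the difference is only in the number of lookups performed per event, which adds up over the hundreds of thousands of samples in the recorded perf.data.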
    • A
      perf sched: Use perf_evsel__{int,str}val · 2b7fcbc5
      Arnaldo Carvalho de Melo authored
      This patch also stops reading the common fields, as they were not being used except
      for one ->common_pid case that was replaced by sample->tid, i.e. the info is already
      in the perf_sample struct.
      
      Also it only fills the _event structures when there is a handler.
      
        [root@sandy ~]# perf sched record sleep 30s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 8.585 MB perf.data (~375063 samples) ]
      
      Before:
      
        [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null
      
         Performance counter stats for 'perf sched lat' (10 runs):
      
                129.117838 task-clock                #    0.994 CPUs utilized            ( +-  0.28% )
                        14 context-switches          #    0.111 K/sec                    ( +-  2.10% )
                         0 cpu-migrations            #    0.002 K/sec                    ( +- 66.67% )
                     7,654 page-faults               #    0.059 M/sec                    ( +-  0.67% )
               438,121,661 cycles                    #    3.393 GHz                      ( +-  0.06% ) [83.06%]
               150,808,605 stalled-cycles-frontend   #   34.42% frontend cycles idle     ( +-  0.14% ) [83.10%]
                80,748,941 stalled-cycles-backend    #   18.43% backend  cycles idle     ( +-  0.64% ) [66.73%]
               758,605,879 instructions              #    1.73  insns per cycle
                                                     #    0.20  stalled cycles per insn  ( +-  0.08% ) [83.54%]
               162,164,321 branches                  # 1255.940 M/sec                    ( +-  0.10% ) [83.70%]
                 1,609,903 branch-misses             #    0.99% of all branches          ( +-  0.08% ) [83.62%]
      
               0.129949153 seconds time elapsed                                          ( +-  0.28% )
      
      After:
      
        [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null
      
         Performance counter stats for 'perf sched lat' (10 runs):
      
                103.592215 task-clock                #    0.993 CPUs utilized            ( +-  0.33% )
                        12 context-switches          #    0.114 K/sec                    ( +-  3.29% )
                         0 cpu-migrations            #    0.000 K/sec
                     7,605 page-faults               #    0.073 M/sec                    ( +-  0.00% )
               345,796,112 cycles                    #    3.338 GHz                      ( +-  0.07% ) [82.90%]
               106,876,796 stalled-cycles-frontend   #   30.91% frontend cycles idle     ( +-  0.38% ) [83.23%]
                62,060,877 stalled-cycles-backend    #   17.95% backend  cycles idle     ( +-  0.80% ) [67.14%]
               628,246,586 instructions              #    1.82  insns per cycle
                                                     #    0.17  stalled cycles per insn  ( +-  0.04% ) [83.64%]
               134,962,057 branches                  # 1302.820 M/sec                    ( +-  0.10% ) [83.64%]
                 1,233,037 branch-misses             #    0.91% of all branches          ( +-  0.29% ) [83.41%]
      
               0.104333272 seconds time elapsed                                          ( +-  0.33% )
      
        [root@sandy ~]#
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-weu9t63zkrfrazkn0gxj48xy@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
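The perf stat runs quoted in this entry and in the 9ec3f4e4 entry above form a chain: the "After" numbers of this commit are the "Before" numbers of the later one. A short Python sketch of the relative reductions, using only figures copied from those outputs; the helper name `percent_reduction` is hypothetical, and the closing task-clock value is taken from the second measurement block of 9ec3f4e4.

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before


# task-clock (msec) from the quoted 'perf stat -r 10 perf sched lat' runs:
# before 2b7fcbc5, after 2b7fcbc5 (= before 9ec3f4e4), after 9ec3f4e4.
TASK_CLOCK_MS = [129.117838, 103.592215, 98.848272]

# Retired instructions for the same three runs.
INSTRUCTIONS = [758_605_879, 628_246_586, 609_402_433]

if __name__ == "__main__":
    for label, series in (("task-clock", TASK_CLOCK_MS),
                          ("instructions", INSTRUCTIONS)):
        first = percent_reduction(series[0], series[1])
        second = percent_reduction(series[1], series[2])
        overall = percent_reduction(series[0], series[2])
        print(f"{label}: {first:.2f}% then {second:.2f}%, {overall:.2f}% overall")
```

Most of the gain comes from this commit (dropping the unused common fields and filling the event structures only when a handler exists); the follow-up adds a smaller improvement on top.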
    • A
      perf evsel: Introduce perf_evsel__{str,int}val methods · 5555ded4
      Arnaldo Carvalho de Melo authored
      Wrappers to the libtraceevent routines, so that we can further reduce
      the surface contact perf builtins have with it.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-rtmgzptvrifzjxqwb9vs6g1b@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
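What a pair of name-based accessors over raw tracepoint data might look like can be sketched in Python. This is an illustration of the wrapper idea only, not the perf or libtraceevent API: the `Field` and `Evsel` classes, the little-endian layout, and the field offsets are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical description of one field inside a raw sample."""
    offset: int
    size: int
    signed: bool = False


class Evsel:
    """Toy event selector that knows its own field layout."""

    def __init__(self, fields):
        self.fields = fields

    def _slice(self, raw, name):
        field = self.fields[name]
        return field, raw[field.offset:field.offset + field.size]

    def intval(self, raw, name):
        # Look the field up by name and decode it as an integer.
        field, chunk = self._slice(raw, name)
        return int.from_bytes(chunk, "little", signed=field.signed)

    def strval(self, raw, name):
        # Look the field up by name and decode it as a NUL-terminated string.
        _, chunk = self._slice(raw, name)
        return chunk.split(b"\0", 1)[0].decode()


if __name__ == "__main__":
    evsel = Evsel({
        "next_comm": Field(offset=0, size=16),
        "next_pid": Field(offset=16, size=4, signed=True),
    })
    raw = b"sleep".ljust(16, b"\0") + (1234).to_bytes(4, "little")
    print(evsel.strval(raw, "next_comm"), evsel.intval(raw, "next_pid"))
```

Callers ask for a field by name and never touch the layout description, which is the sense in which such wrappers reduce the contact surface between the builtins and the parsing library.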
    • A
      perf sched: Use perf_tool as ancestor · 0e9b07e5
      Arnaldo Carvalho de Melo authored
      So that we can remove all the globals.
      
      Before:
      
         text	   data	    bss	    dec	    hex	filename
      1586833	 110368	1438600	3135801	 2fd939	/tmp/oldperf
      
      After:
      
         text	   data	    bss	    dec	    hex	filename
      1629329	  93568	 848328	2571225	 273bd9	/root/bin/perf
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-oph40vikij0crjz4eyapneov@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
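The two `size` listings above can be checked and compared with a few lines of Python. The sketch relies only on the usual meaning of the columns (dec is text + data + bss, hex is the same total in hexadecimal); the helper names are hypothetical.

```python
BEFORE = dict(text=1586833, data=110368, bss=1438600, dec=3135801, hex_str="2fd939")
AFTER = dict(text=1629329, data=93568, bss=848328, dec=2571225, hex_str="273bd9")


def row_is_consistent(row):
    """Check that dec and hex both equal text + data + bss."""
    total = row["text"] + row["data"] + row["bss"]
    return total == row["dec"] and total == int(row["hex_str"], 16)


def delta(column):
    """Change in one column from the old binary to the new one, in bytes."""
    return AFTER[column] - BEFORE[column]


if __name__ == "__main__":
    print(row_is_consistent(BEFORE), row_is_consistent(AFTER))
    for column in ("text", "data", "bss", "dec"):
        print(f"{column:>4}: {delta(column):+d} bytes")
```

The comparison shows where the saving comes from: text grows slightly, while data and especially bss shrink, which is what one would expect when zero-initialized globals are moved into a structure that is only instantiated when the tool runs.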
    • A
      perf sched: Remove unused thread parameter · 4218e673
      Arnaldo Carvalho de Melo authored
      From the tracepoint handling routines.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-mcqd9mv34z6he0wqiz4a3mh9@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>