1. 18 12月, 2015 2 次提交
  2. 14 9月, 2015 1 次提交
  3. 03 9月, 2015 1 次提交
    • K
      perf tools: Store the cpu socket and core ids in the perf.data header · 2bb00d2f
      Kan Liang 提交于
      This patch stores the cpu socket_id and core_id in a perf.data header,
      and reads them into the perf_env struct when processing perf.data files.
      
      The changes modifies the CPU_TOPOLOGY section, making sure it is
      backward/forward compatible.
      
      The patch checks the section size before reading the core and socket ids.
      
      It never reads data crossing the section boundary.  An old perf binary
      without this patch can also correctly read the perf.data from a new perf
      with this patch.
      
      Because the new info is added at the end of the cpu_topology section, an
      old perf tool ignores the extra data.
      
      Examples:
      
      1. New perf with this patch read perf.data from an old perf without the
         patch:
      
        $ perf_new report -i perf_old.data --header-only -I
        ......
        # sibling threads : 33
        # sibling threads : 34
        # sibling threads : 35
        # Core ID and Socket ID information is not available
        # node0 meminfo  : total = 32823872 kB, free = 29315548 kB
        # node0 cpu list : 0-17,36-53
        ......
      
      2. Old perf without the patch reads perf.data from a new perf with the
         patch:
      
        $ perf_old report -i perf_new.data --header-only -I
        ......
        # sibling threads : 33
        # sibling threads : 34
        # sibling threads : 35
        # node0 meminfo  : total = 32823872 kB, free = 29190932 kB
        # node0 cpu list : 0-17,36-53
        ......
      
      3. New perf read new perf.data:
      
        $ perf_new report -i perf_new.data --header-only -I
        ......
        # sibling threads : 33
        # sibling threads : 34
        # sibling threads : 35
        # CPU 0: Core ID 0, Socket ID 0
        # CPU 1: Core ID 1, Socket ID 0
        ......
        # CPU 61: Core ID 10, Socket ID 1
        # CPU 62: Core ID 11, Socket ID 1
        # CPU 63: Core ID 16, Socket ID 1
        # node0 meminfo  : total = 32823872 kB, free = 29190932 kB
        # node0 cpu list : 0-17,36-53
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1441115893-22006-2-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2bb00d2f
  4. 29 8月, 2015 1 次提交
  5. 22 7月, 2015 1 次提交
  6. 29 4月, 2015 1 次提交
  7. 05 11月, 2014 1 次提交
  8. 23 7月, 2014 1 次提交
  9. 02 5月, 2014 1 次提交
  10. 13 1月, 2014 1 次提交
    • A
      perf header: Pack 'struct perf_session_env' · 3ba4d2e1
      Arnaldo Carvalho de Melo 提交于
      Initial struct:
      
      [acme@ssdandy linux]$ pahole -C perf_session_env ~/bin/perf
      struct perf_session_env {
      	char *                     hostname;             /*     0     8 */
      	char *                     os_release;           /*     8     8 */
      	char *                     version;              /*    16     8 */
      	char *                     arch;                 /*    24     8 */
      	int                        nr_cpus_online;       /*    32     4 */
      	int                        nr_cpus_avail;        /*    36     4 */
      	char *                     cpu_desc;             /*    40     8 */
      	char *                     cpuid;                /*    48     8 */
      	long long unsigned int     total_mem;            /*    56     8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	int                        nr_cmdline;           /*    64     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	char *                     cmdline;              /*    72     8 */
      	int                        nr_sibling_cores;     /*    80     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	char *                     sibling_cores;        /*    88     8 */
      	int                        nr_sibling_threads;   /*    96     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	char *                     sibling_threads;      /*   104     8 */
      	int                        nr_numa_nodes;        /*   112     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	char *                     numa_nodes;           /*   120     8 */
      	/* --- cacheline 2 boundary (128 bytes) --- */
      	int                        nr_pmu_mappings;      /*   128     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	char *                     pmu_mappings;         /*   136     8 */
      	int                        nr_groups;            /*   144     4 */
      
      	/* size: 152, cachelines: 3, members: 20 */
      	/* sum members: 128, holes: 5, sum holes: 20 */
      	/* padding: 4 */
      	/* last cacheline: 24 bytes */
      };
      [acme@ssdandy linux]$
      
      [acme@ssdandy linux]$ pahole -C perf_session_env --reorganize --show_reorg_steps ~/bin/perf | grep ^/ | grep -v Final
      /* Moving 'nr_sibling_cores' from after 'cmdline' to after 'nr_cmdline' */
      /* Moving 'nr_numa_nodes' from after 'sibling_threads' to after 'nr_sibling_threads' */
      /* Moving 'nr_groups' from after 'pmu_mappings' to after 'nr_pmu_mappings' */
      [acme@ssdandy linux]$
      
      Final struct stats:
      
      [acme@ssdandy linux]$ pahole -C perf_session_env --reorganize --show_reorg_steps ~/bin/perf | tail -4
      	/* --- cacheline 2 boundary (128 bytes) --- */
      
      	/* size: 128, cachelines: 2, members: 20 */
      };   /* saved 24 bytes and 1 cacheline! */
      [acme@ssdandy linux]$
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-3d9tshamloinzxcqeb7mtd1n@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3ba4d2e1
  11. 18 7月, 2013 4 次提交
  12. 16 7月, 2013 2 次提交
  13. 13 7月, 2013 1 次提交
  14. 29 5月, 2013 1 次提交
  15. 01 2月, 2013 1 次提交
  16. 20 11月, 2012 1 次提交
    • D
      perf: Make perf build for x86 with UAPI disintegration applied · d2709c7c
      David Howells 提交于
      Make perf build for x86 once the UAPI disintegration patches for that arch
      have been applied by adding the appropriate -I flags - in the right order -
      and then converting some #includes that use ../.. notation to find main kernel
      headerfiles to use <asm/foo.h> and <linux/foo.h> instead.
      
      Note that -Iarch/foo/include/uapi is present _before_ -Iarch/foo/include.
      This makes sure we get the userspace version of the pt_regs struct.  Ideally,
      we wouldn't have the latter -I flag at all, but unfortunately we want
      asm/svm.h and asm/vmx.h in builtin-kvm.c and these aren't part of the UAPI -
      at least not for x86.  I wonder if the bits outside of the __KERNEL__ guards
      *should* be transferred there.
      
      I note also that perf seems to do its dependency handling manually by listing
      all the header files it might want to use in LIB_H in the Makefile.  Can this
      be changed to use -MD?
      
      Note that to do make this work, we need to export and UAPI disintegrate
      linux/hw_breakpoint.h, which I think should've been exported previously so that
      perf can access the bits.  We have to do this in the same patch to maintain
      bisectability.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      d2709c7c
  17. 29 10月, 2012 1 次提交
  18. 15 10月, 2012 1 次提交
  19. 24 9月, 2012 2 次提交
  20. 21 9月, 2012 1 次提交
    • X
      perf kvm: Events analysis tool · bcf6edcd
      Xiao Guangrong 提交于
      Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport smartly
      
      Usage:
      - kvm stat
        run a command and gather performance counter statistics, it is the alias of
        perf stat
      
      - trace kvm events:
        perf kvm stat record, or, if other tracepoints are interesting as well, we
        can append the events like this:
        perf kvm stat record -e timer:* -a
      
        If many guests are running, we can track the specified guest by using -p or
        --pid, -a is used to track events generated by all guests.
      
      - show the result:
        perf kvm stat report
      
      The output example is following:
      13005
      13059
      
      total 2 guests are running on the host
      
      Then, track the guest whose pid is 13059:
      ^C[ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.253 MB perf.data.guest (~11065 samples) ]
      
      See the vmexit events:
      
      Analyze events for all VCPUs:
      
                   VM-EXIT    Samples  Samples%     Time%         Avg time
      
               APIC_ACCESS        460    70.55%     0.01%     22.44us ( +-   1.75% )
                       HLT         93    14.26%    99.98% 832077.26us ( +-  10.42% )
        EXTERNAL_INTERRUPT         64     9.82%     0.00%     35.35us ( +-  14.21% )
         PENDING_INTERRUPT         24     3.68%     0.00%      9.29us ( +-  31.39% )
                 CR_ACCESS          7     1.07%     0.00%      8.12us ( +-   5.76% )
            IO_INSTRUCTION          3     0.46%     0.00%     18.00us ( +-  11.79% )
             EXCEPTION_NMI          1     0.15%     0.00%      5.83us ( +-   -nan% )
      
      Total Samples:652, Total events handled time:77396109.80us.
      
      See the mmio events:
      
      Analyze events for all VCPUs:
      
               MMIO Access    Samples  Samples%     Time%         Avg time
      
              0xfee00380:W        387    84.31%    79.28%      8.29us ( +-   3.32% )
              0xfee00300:W         24     5.23%     9.96%     16.79us ( +-   1.97% )
              0xfee00300:R         24     5.23%     7.83%     13.20us ( +-   3.00% )
              0xfee00310:W         24     5.23%     2.93%      4.94us ( +-   3.84% )
      
      Total Samples:459, Total events handled time:4044.59us.
      
      See the ioport event:
      
      Analyze events for all VCPUs:
      
            IO Port Access    Samples  Samples%     Time%         Avg time
      
               0xc050:POUT          3   100.00%   100.00%     13.75us ( +-  10.83% )
      
      Total Samples:3, Total events handled time:41.26us.
      
      And, --vcpu is used to track the specified vcpu and --key is used to sort the
      result:
      
      Analyze events for VCPU 0:
      
                   VM-EXIT    Samples  Samples%     Time%         Avg time
      
                       HLT         27    13.85%    99.97% 405790.24us ( +-  12.70% )
        EXTERNAL_INTERRUPT         13     6.67%     0.00%     27.94us ( +-  22.26% )
               APIC_ACCESS        146    74.87%     0.03%     21.69us ( +-   2.91% )
            IO_INSTRUCTION          2     1.03%     0.00%     17.77us ( +-  20.56% )
                 CR_ACCESS          2     1.03%     0.00%      8.55us ( +-   6.47% )
         PENDING_INTERRUPT          5     2.56%     0.00%      6.27us ( +-   3.94% )
      
      Total Samples:195, Total events handled time:10959950.90us.
      Signed-off-by: NDong Hao <haodong@linux.vnet.ibm.com>
      Signed-off-by: NRunzhen Wang <runzhen@linux.vnet.ibm.com>
      [ Dong Hao <haodong@linux.vnet.ibm.com>
        Runzhen Wang <runzhen@linux.vnet.ibm.com>:
           - rebase it on current acme's tree
           - fix the compiling-error on i386 ]
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1347870675-31495-4-git-send-email-haodong@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bcf6edcd
  21. 11 9月, 2012 1 次提交
    • J
      perf tools: Back [vdso] DSO with real data · 7dbf4dcf
      Jiri Olsa 提交于
      Storing data for VDSO shared object, because we need it for the post
      unwind processing.
      
      The VDSO shared object is same for all process on a running system, so
      it makes no difference when we store it inside the tracer - perf.
      
      When [vdso] map memory is hit, we retrieve [vdso] DSO image and store it
      into temporary file.
      
      During the build-id processing phase, the [vdso] DSO image is stored in
      build-id db, and build-id reference is made inside perf.data. The
      build-id vdso file object is called '[vdso]'. We don't use temporary
      file name which gets removed when record is finished.
      
      During report phase the vdso build-id object is treated as any other
      build-id DSO object.
      
      Adding following API for vdso object:
      
        bool is_vdso_map(const char *filename)
          - returns true if the filename matches vdso map name
      
        struct dso *vdso__dso_findnew(struct list_head *head)
          - find/create proper vdso DSO object
      
        vdso__exit(void)
          - removes temporary VDSO image if there's any
      
      This change makes backtrace dwarf post unwind possible from [vdso] maps.
      
      Following output is current report of [vdso] sample dwarf backtrace:
      
        # Overhead  Command      Shared Object                         Symbol
        # ........  .......  .................  .............................
        #
            99.52%       ex  [vdso]             [.] 0x00007fff3ace89af
                         |
                         --- 0x7fff3ace89af
      
      Following output is new report of [vdso] sample dwarf backtrace:
      
        # Overhead  Command      Shared Object                         Symbol
        # ........  .......  .................  .............................
        #
            99.52%       ex  [vdso]             [.] 0x00000000000009af
                         |
                         --- 0x7fff3ace89af
                             main
                             __libc_start_main
                             _start
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1347295819-23177-5-git-send-email-jolsa@redhat.com
      [ committer note: s/ALIGN/PERF_ALIGN/g to cope with the android build changes ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7dbf4dcf
  22. 23 8月, 2012 1 次提交
    • R
      perf tools: Add pmu mappings to header information · 50a9667c
      Robert Richter 提交于
      With dynamic pmu allocation there are also dynamically assigned pmu ids.
      These ids are used in event->attr.type to describe the pmu to be used
      for that event. The information is available in sysfs, e.g:
      
       /sys/bus/event_source/devices/breakpoint/type: 5
       /sys/bus/event_source/devices/cpu/type: 4
       /sys/bus/event_source/devices/ibs_fetch/type: 6
       /sys/bus/event_source/devices/ibs_op/type: 7
       /sys/bus/event_source/devices/software/type: 1
       /sys/bus/event_source/devices/tracepoint/type: 2
      
      These mappings are needed to know which samples belong to which pmu.  If
      a pmu is added dynamically like for ibs_fetch or ibs_op the type value
      may vary.
      
      Now, when decoding samples from perf.data this information in sysfs
      might be no longer available or may have changed. We need to store it in
      perf.data. Using the header for this. Now the header information created
      with perf report contains an additional section looking like this:
      
       # pmu mappings: ibs_op = 7, ibs_fetch = 6, cpu = 4, breakpoint = 5, tracepoint = 2, software = 1
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1345144224-27280-9-git-send-email-robert.richter@amd.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      50a9667c
  23. 17 8月, 2012 1 次提交
  24. 22 5月, 2012 1 次提交
  25. 09 3月, 2012 1 次提交
  26. 03 2月, 2012 1 次提交
  27. 24 12月, 2011 1 次提交
  28. 28 11月, 2011 4 次提交
  29. 08 10月, 2011 1 次提交
    • S
      perf tools: Make perf.data more self-descriptive (v8) · fbe96f29
      Stephane Eranian 提交于
      The goal of this patch is to include more information about the host
      environment into the perf.data so it is more self-descriptive. Overtime,
      profiles are captured on various machines and it becomes hard to track
      what was recorded, on what machine and when.
      
      This patch provides a way to solve this by extending the perf.data file
      with basic information about the host machine. To add those extensions,
      we leverage the feature bits capabilities of the perf.data format.  The
      change is backward compatible with existing perf.data files.
      
      We define the following useful new extensions:
       - HEADER_HOSTNAME: the hostname
       - HEADER_OSRELEASE: the kernel release number
       - HEADER_ARCH: the hw architecture
       - HEADER_CPUDESC: generic CPU description
       - HEADER_NRCPUS: number of online/avail cpus
       - HEADER_CMDLINE: perf command line
       - HEADER_VERSION: perf version
       - HEADER_TOPOLOGY: cpu topology
       - HEADER_EVENT_DESC: full event description (attrs)
       - HEADER_CPUID: easy-to-parse low level CPU identication
      
      The small granularity for the entries is to make it easier to extend
      without breaking backward compatiblity. Many entries are provided as
      ASCII strings.
      
      Perf report/script have been modified to print the basic information as
      easy-to-parse ASCII strings. Extended information about CPU and NUMA
      topology may be requested with the -I option.
      
      Thanks to David Ahern for reviewing and testing the many versions of
      this patch.
      
       $ perf report --stdio
       # ========
       # captured on : Mon Sep 26 15:22:14 2011
       # hostname : quad
       # os release : 3.1.0-rc4-tip
       # perf version : 3.1.0-rc4
       # arch : x86_64
       # nrcpus online : 4
       # nrcpus avail : 4
       # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
       # cpuid : GenuineIntel,6,15,11
       # total memory : 8105360 kB
       # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
       # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
       # HEADER_CPU_TOPOLOGY info available, use -I to display
       # HEADER_NUMA_TOPOLOGY info available, use -I to display
       # ========
       #
       ...
      
       $ perf report --stdio -I
       # ========
       # captured on : Mon Sep 26 15:22:14 2011
       # hostname : quad
       # os release : 3.1.0-rc4-tip
       # perf version : 3.1.0-rc4
       # arch : x86_64
       # nrcpus online : 4
       # nrcpus avail : 4
       # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
       # cpuid : GenuineIntel,6,15,11
       # total memory : 8105360 kB
       # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
       # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
       # sibling cores   : 0-3
       # sibling threads : 0
       # sibling threads : 1
       # sibling threads : 2
       # sibling threads : 3
       # node0 meminfo  : total = 8320608 kB, free = 7571024 kB
       # node0 cpu list : 0-3
       # ========
       #
       ...
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NDavid Ahern <dsahern@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/20110930134040.GA5575@quadSigned-off-by: NStephane Eranian <eranian@google.com>
      [ committer notes: Use --show-info in the tools as was in the docs, rename
        perf_header_fprintf_info to perf_file_section__fprintf_info, fixup
        conflict with f69b64f7 "perf: Support setting the disassembler style" ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fbe96f29
  30. 22 5月, 2011 1 次提交
  31. 10 3月, 2011 1 次提交
    • A
      perf header: Stop using 'self' · 1c0b04d1
      Arnaldo Carvalho de Melo 提交于
      Stop using this python/OOP convention, doesn't really helps. Will do
      more from time to time till we get it cleaned up in all of tools/perf.
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1c0b04d1