1. 28 5月, 2013 1 次提交
  2. 09 12月, 2012 1 次提交
  3. 07 10月, 2012 1 次提交
  4. 21 9月, 2012 1 次提交
    • X
      perf kvm: Events analysis tool · bcf6edcd
      Xiao Guangrong 提交于
      Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport smartly
      
      Usage:
      - kvm stat
        run a command and gather performance counter statistics, it is the alias of
        perf stat
      
      - trace kvm events:
        perf kvm stat record, or, if other tracepoints are interesting as well, we
        can append the events like this:
        perf kvm stat record -e timer:* -a
      
        If many guests are running, we can track the specified guest by using -p or
        --pid, -a is used to track events generated by all guests.
      
      - show the result:
        perf kvm stat report
      
      The output example is following:
      13005
      13059
      
      total 2 guests are running on the host
      
      Then, track the guest whose pid is 13059:
      ^C[ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.253 MB perf.data.guest (~11065 samples) ]
      
      See the vmexit events:
      
      Analyze events for all VCPUs:
      
                   VM-EXIT    Samples  Samples%     Time%         Avg time
      
               APIC_ACCESS        460    70.55%     0.01%     22.44us ( +-   1.75% )
                       HLT         93    14.26%    99.98% 832077.26us ( +-  10.42% )
        EXTERNAL_INTERRUPT         64     9.82%     0.00%     35.35us ( +-  14.21% )
         PENDING_INTERRUPT         24     3.68%     0.00%      9.29us ( +-  31.39% )
                 CR_ACCESS          7     1.07%     0.00%      8.12us ( +-   5.76% )
            IO_INSTRUCTION          3     0.46%     0.00%     18.00us ( +-  11.79% )
             EXCEPTION_NMI          1     0.15%     0.00%      5.83us ( +-   -nan% )
      
      Total Samples:652, Total events handled time:77396109.80us.
      
      See the mmio events:
      
      Analyze events for all VCPUs:
      
               MMIO Access    Samples  Samples%     Time%         Avg time
      
              0xfee00380:W        387    84.31%    79.28%      8.29us ( +-   3.32% )
              0xfee00300:W         24     5.23%     9.96%     16.79us ( +-   1.97% )
              0xfee00300:R         24     5.23%     7.83%     13.20us ( +-   3.00% )
              0xfee00310:W         24     5.23%     2.93%      4.94us ( +-   3.84% )
      
      Total Samples:459, Total events handled time:4044.59us.
      
      See the ioport event:
      
      Analyze events for all VCPUs:
      
            IO Port Access    Samples  Samples%     Time%         Avg time
      
               0xc050:POUT          3   100.00%   100.00%     13.75us ( +-  10.83% )
      
      Total Samples:3, Total events handled time:41.26us.
      
      And, --vcpu is used to track the specified vcpu and --key is used to sort the
      result:
      
      Analyze events for VCPU 0:
      
                   VM-EXIT    Samples  Samples%     Time%         Avg time
      
                       HLT         27    13.85%    99.97% 405790.24us ( +-  12.70% )
        EXTERNAL_INTERRUPT         13     6.67%     0.00%     27.94us ( +-  22.26% )
               APIC_ACCESS        146    74.87%     0.03%     21.69us ( +-   2.91% )
            IO_INSTRUCTION          2     1.03%     0.00%     17.77us ( +-  20.56% )
                 CR_ACCESS          2     1.03%     0.00%      8.55us ( +-   6.47% )
         PENDING_INTERRUPT          5     2.56%     0.00%      6.27us ( +-   3.94% )
      
      Total Samples:195, Total events handled time:10959950.90us.
      Signed-off-by: NDong Hao <haodong@linux.vnet.ibm.com>
      Signed-off-by: NRunzhen Wang <runzhen@linux.vnet.ibm.com>
      [ Dong Hao <haodong@linux.vnet.ibm.com>
        Runzhen Wang <runzhen@linux.vnet.ibm.com>:
           - rebase it on current acme's tree
           - fix the compiling-error on i386 ]
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1347870675-31495-4-git-send-email-haodong@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bcf6edcd
  5. 28 11月, 2011 1 次提交
  6. 24 1月, 2011 1 次提交
    • A
      perf threads: Move thread_map to separate file · fd78260b
      Arnaldo Carvalho de Melo 提交于
      To untangle it from struct thread handling, that is tied to symbols, etc.
      
      Right now in the python bindings I'm working on I need just a subset of
      the util/ files, untangling it allows me to do that.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fd78260b
  7. 04 1月, 2011 1 次提交
    • A
      perf tools: Refactor all_tids to hold nr and the map · 5c98d466
      Arnaldo Carvalho de Melo 提交于
      So that later, we can pass the thread_map instance instead of
      (thread_num, thread_map) for things like perf_evsel__open and friends,
      just like was done with cpu_map.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5c98d466
  8. 31 7月, 2010 1 次提交
    • A
      perf tools: Release thread resources on PERF_RECORD_EXIT · 591765fd
      Arnaldo Carvalho de Melo 提交于
      For long running sessions with many threads with short lifetimes the
      amount of memory that the buildid process takes is too much.
      
      Since we don't have hist_entries that may be pointing to them, we can
      just release the resources associated with each thread when the exit
      (PERF_RECORD_EXIT) event is received.
      
      For normal processing we need to annotate maps with hits, and thus
      hist_entries pointing to it and drop the ones that had none. Will be
      done in a followup patch.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      591765fd
  9. 17 6月, 2010 1 次提交
    • A
      perf session: Remove threads from tree on PERF_RECORD_EXIT · 720a3aeb
      Arnaldo Carvalho de Melo 提交于
      Move them to a session->dead_threads list just like we do with maps that
      are replaced, because we may have hist_entries pointing to them.
      
      This fixes a bug when inserting maps for a new thread that reused the
      TID, mixing maps for two different threads, causing an endless loop.
      
      The code for insering maps should be made more robust but for .35 this
      is the minimalistic patch.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      720a3aeb
  10. 19 4月, 2010 1 次提交
  11. 26 3月, 2010 1 次提交
    • A
      perf symbols: Move map related routines to map.c · 4b8cf846
      Arnaldo Carvalho de Melo 提交于
      Thru series of refactorings functions were being renamed but not
      moved to map.c to reduce patch noise, now lets have them in the
      same place so that use of the symbol system by tools can be
      constrained to building and linking fewer source files:
      symbol.c, map.c and rbtree.c.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1269557941-15617-3-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4b8cf846
  12. 18 3月, 2010 1 次提交
    • Z
      perf events: Change perf parameter --pid to process-wide collection instead of thread-wide · d6d901c2
      Zhang, Yanmin 提交于
      Parameter --pid (or -p) of perf currently means a thread-wide
      collection. For exmaple, if a process whose id is 8888 has 10
      threads, 'perf top -p 8888' just collects the main thread
      statistics. That's misleading. Users are used to attach a whole
      process when debugging a process by gdb. To follow normal usage
      style, the patch change --pid to process-wide collection and add
      --tid (-t) to mean a thread-wide collection.
      
      Usage example is:
      
       # perf top -p 8888
       # perf record -p 8888 -f sleep 10
       # perf stat -p 8888 -f sleep 10
      
      Above commands collect the statistics of all threads of process
      8888.
      Signed-off-by: NZhang Yanmin <yanmin_zhang@linux.intel.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sheng Yang <sheng@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: zhiteng.huang@intel.com
      Cc: Zachary Amsden <zamsden@redhat.com>
      LKML-Reference: <1268922965-14774-3-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d6d901c2
  13. 10 3月, 2010 1 次提交
    • A
      perf report: Print the map table just after samples for which no map was found · 65f2ed2b
      Arnaldo Carvalho de Melo 提交于
      If -vv is used just the map table will be printed, -vvv will
      print the symbol table too, with it we can see that we have a
      bug where some samples are not being resolved to a map when we
      get them in the perf.data stream, but after we have it all
      processed, we can find the right map, some reordering probably
      is happening.
      
      Upcoming patches will provide ways to ask for most PERF_SAMPLE_
      conditional samples to be taken for !PERF_RECORD_SAMPLE events
      too, then we'll be able to ask for PERF_SAMPLE_TIME and
      PERF_SAMPLE_CPU to help diagnose this.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268161097-17761-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      65f2ed2b
  14. 22 2月, 2010 1 次提交
    • A
      perf tools: Don't use parent comm if not set at fork time · faa5c5c3
      Arnaldo Carvalho de Melo 提交于
      As the parent comm then is worthless, confusing users about the
      thread where the sample really happened, leading to think that
      the sample happened in the parent, not where it really happened,
      in the children of a thread for which a PERF_RECORD_COMM event
      was not received.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1266627727-19715-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      faa5c5c3
  15. 04 2月, 2010 1 次提交
    • A
      perf symbols: Remove perf_session usage in symbols layer · 9de89fe7
      Arnaldo Carvalho de Melo 提交于
      I noticed while writing the first test in 'perf regtest' that to
      just test the symbol handling routines one needs to create a
      perf session, that is a layer centered on a perf.data file,
      events, etc, so I untied these layers.
      
      This reduces the complexity for the users as the number of
      parameters to most of the symbols and session APIs now was
      reduced while not adding more state to all the map instances by
      only having data that is needed to split the kernel (kallsyms
      and ELF symtab sections) maps and do vmlinux relocation on the
      main kernel map.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1265223128-11786-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9de89fe7
  16. 16 1月, 2010 1 次提交
  17. 14 1月, 2010 1 次提交
    • A
      perf tools: Encode kernel module mappings in perf.data · b7cece76
      Arnaldo Carvalho de Melo 提交于
      We were always looking at the running machine /proc/modules,
      even when processing a perf.data file, which only makes sense
      when we're doing 'perf record' and 'perf report' on the same
      machine, and in close sucession, or if we don't use modules at
      all, right Peter? ;-)
      
      Now, at 'perf record' time we read /proc/modules, find the long
      path for modules, and put them as PERF_MMAP events, just like we
      did to encode the reloc reference symbol for vmlinux. Talking
      about that now it is encoded in .pgoff, so that we can use
      .{start,len} to store the address boundaries for the kernel so
      that when we reconstruct the kmaps tree we can do lookups right
      away, without having to fixup the end of the kernel maps like we
      did in the past (and now only in perf record).
      
      One more step in the 'perf archive' direction when we'll finally
      be able to collect data in one machine and analyse in another.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1263396139-4798-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b7cece76
  18. 14 12月, 2009 3 次提交
    • A
      perf session: Move kmaps to perf_session · 4aa65636
      Arnaldo Carvalho de Melo 提交于
      There is still some more work to do to disentangle map creation
      from DSO loading, but this happens only for the kernel, and for
      the early adopters of perf diff, where this disentanglement
      matters most, we'll be testing different kernels, so no problem
      here.
      
      Further clarification: right now we create the kernel maps for
      the various modules and discontiguous kernel text maps when
      loading the DSO, we should do it as a two step process, first
      creating the maps, for multiple mappings with the same DSO
      store, then doing the dso load just once, for the first hit on
      one of the maps sharing this DSO backing store.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260741029-4430-6-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4aa65636
    • A
      perf session: Move the global threads list to perf_session · b3165f41
      Arnaldo Carvalho de Melo 提交于
      So that we can process two perf.data files.
      
      We still need to add a O_MMAP mode for perf_session so that we
      can do all the mmap stuff in it.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260741029-4430-5-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b3165f41
    • A
      perf session: Register the idle thread in perf_session__process_events · 13df45ca
      Arnaldo Carvalho de Melo 提交于
      No need for all tools to register it and then immediately call
      perf_session__process_events.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260741029-4430-3-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      13df45ca
  19. 12 12月, 2009 2 次提交
    • A
      perf symbols: Allow lookups by symbol name too · 79406cd7
      Arnaldo Carvalho de Melo 提交于
      Configurable via symbol_conf.sort_by_name, so that the cost of an
      extra rb_node on all 'struct symbol' instances is not paid by tools
      that only want to decode addresses.
      
      How to use it:
      
      	symbol_conf.sort_by_name = true;
      	symbol_init(&symbol_conf);
      
      	struct map *map = map_groups__find_by_name(kmaps, MAP__VARIABLE, "[kernel.kallsyms]");
      
      	if (map == NULL) {
      		pr_err("couldn't find map!\n");
      		kernel_maps__fprintf(stdout);
      	} else {
      		struct symbol *sym = map__find_symbol_by_name(map, sym_filter, NULL);
      		if (sym == NULL)
      			pr_err("couldn't find symbol %s!\n", sym_filter);
      		else
      			pr_info("symbol %s: %#Lx-%#Lx \n", sym_filter, sym->start, sym->end);
      	}
      
      Looking over the vmlinux/kallsyms is common enough that I'll add a
      variable to the upcoming struct perf_session to avoid the need to
      use map_groups__find_by_name to get the main vmlinux/kallsyms map.
      
      The above example looks on the 'variable' symtab, but it is just
      like that for the functions one.
      
      Also the sort operation is done when we first use
      map__find_symbol_by_name, in a lazy way.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260564622-12392-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      79406cd7
    • A
      perf symbols: Rename kthreads to kmaps, using another abstraction for it · 9958e1f0
      Arnaldo Carvalho de Melo 提交于
      Using a struct thread instance just to hold the kernel space maps
      (vmlinux + modules) is overkill and confuses people trying to
      understand the perf symbols abstractions.
      
      The kernel maps are really present in all threads, i.e. the kernel
      is a library, not a separate thread.
      
      So introduce the 'map_groups' abstraction and use it for the kernel
      maps, now in the kmaps global variable.
      
      It, in turn, will move, together with the threads list to the
      perf_file abstraction, so that we can support multiple perf_file
      instances, needed by perf diff.
      
      Brainstormed-with: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260550239-5372-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9958e1f0
  20. 28 11月, 2009 4 次提交
    • A
      perf tools: Consolidate symbol resolving across all tools · 1ed091c4
      Arnaldo Carvalho de Melo 提交于
      Now we have a very high level routine for simple tools to
      process IP sample events:
      
      	int event__preprocess_sample(const event_t *self,
      				     struct addr_location *al,
      				     symbol_filter_t filter)
      
      It receives the event itself and will insert new threads in the
      global threads list and resolve the map and symbol, filling all
      this info into the new addr_location struct, so that tools like
      annotate and report can further process the event by creating
      hist_entries in their specific way (with or without callgraphs,
      etc).
      
      It in turn uses the new next layer function:
      
      	void thread__find_addr_location(struct thread *self, u8 cpumode,
      					enum map_type type, u64 addr,
      					struct addr_location *al,
      					symbol_filter_t filter)
      
      This one will, given a thread (userspace or the kernel kthread
      one), will find the given type (MAP__FUNCTION now, MAP__VARIABLE
      too in the near future) at the given cpumode, taking vdsos into
      account (userspace hit, but kernel symbol) and will fill all
      these details in the addr_location given.
      
      Tools that need a more compact API for plain function
      resolution, like 'kmem', can use this other one:
      
      	struct symbol *thread__find_function(struct thread *self, u64 addr,
      					     symbol_filter_t filter)
      
      So, to resolve a kernel symbol, that is all the 'kmem' tool
      needs, its just a matter of calling:
      
      	sym = thread__find_function(kthread, addr, NULL);
      
      The 'filter' parameter is needed because we do lazy
      parsing/loading of ELF symtabs or /proc/kallsyms.
      
      With this we remove more code duplication all around, which is
      always good, huh? :-)
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-12-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1ed091c4
    • A
      perf symbols: When not using modules, discard its symbols · 1de8e245
      Arnaldo Carvalho de Melo 提交于
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-10-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1de8e245
    • A
      perf symbols: Support multiple symtabs in struct thread · 95011c60
      Arnaldo Carvalho de Melo 提交于
      Making the routines that were so far specific to the kernel maps
      useful for all threads.
      
      This is done by making the kernel maps be contained in a kernel
      "thread".
      
      This gets the kernel specific routines closer to the userspace
      counterparts, which will help in reducing the boilerplate for
      resolving a symbol, as will be demonstrated in the next patches.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-9-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      95011c60
    • A
      perf symbols: Kernel_maps should be an array of MAP__NR_TYPES entries · 23ea4a3f
      Arnaldo Carvalho de Melo 提交于
      So that we can support multiple symbol table types.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-8-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      23ea4a3f
  21. 24 11月, 2009 1 次提交
  22. 21 11月, 2009 1 次提交
    • A
      perf symbols: Do lazy symtab loading for the kernel & modules too · c338aee8
      Arnaldo Carvalho de Melo 提交于
      Just like we do with the other DSOs. This also simplifies the
      kernel_maps setup process, now all that the tools need to do is
      to call kernel_maps__init and the maps for the modules and
      kernel will be created, then, later, when
      kernel_maps__find_symbol() is used, it will also call
      maps__find_symbol that already checks if the symtab was loaded,
      loading it if needed.
      
      Now if one does 'perf top --hide_kernel_symbols' we won't pay
      the price of loading the (many) symbols in /proc/kallsyms or
      vmlinux.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1258757489-5978-4-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c338aee8
  23. 23 10月, 2009 1 次提交
    • F
      perf tools: Bind callchains to the first sort dimension column · a4fb581b
      Frederic Weisbecker 提交于
      Currently, the callchains are displayed using a constant left
      margin. So depending on the current sort dimension
      configuration, callchains may appear to be well attached to the
      first sort dimension column field which is mostly the case,
      except when the first dimension of sorting is done by comm,
      because these are right aligned.
      
      This patch binds the callchain to the first letter in the first
      column, whatever type of column it is (dso, comm, symbol).
      Before:
      
           0.80%             perf  [k] __lock_acquire
                   __lock_acquire
                   lock_acquire
                   |
                   |--58.33%-- _spin_lock
                   |          |
                   |          |--28.57%-- inotify_should_send_event
                   |          |          fsnotify
                   |          |          __fsnotify_parent
      
      After:
      
           0.80%             perf  [k] __lock_acquire
                             __lock_acquire
                             lock_acquire
                             |
                             |--58.33%-- _spin_lock
                             |          |
                             |          |--28.57%-- inotify_should_send_event
                             |          |          fsnotify
                             |          |          __fsnotify_parent
      
      Also, for clarity, we don't put anymore the callchain as is but:
      
      - If we have a top level ancestor in the callchain, start it
        with a first ascii hook.
      
        Before:
      
           0.80%             perf  [kernel]                        [k] __lock_acquire
                             __lock_acquire
                               lock_acquire
                             |
                             |--58.33%-- _spin_lock
                             |          |
                             |          |--28.57%-- inotify_should_send_event
                             |          |          fsnotify
                            [..]       [..]
      
         After:
      
           0.80%             perf  [kernel]                         [k] __lock_acquire
                             |
                             --- __lock_acquire
                                 lock_acquire
                                |
                                |--58.33%-- _spin_lock
                                |          |
                                |          |--28.57%-- inotify_should_send_event
                                |          |          fsnotify
                               [..]       [..]
      
      - Otherwise, if we have several top level ancestors, then
        display these like we did before:
      
             1.69%           Xorg
                             |
                             |--21.21%-- vread_hpet
                             |          0x7fffd85b46fc
                             |          0x7fffd85b494d
                             |          0x7f4fafb4e54d
                             |
                             |--15.15%-- exaOffscreenAlloc
                             |
                             |--9.09%-- I830WaitLpRing
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      LKML-Reference: <1256246604-17156-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a4fb581b
  24. 13 10月, 2009 1 次提交
  25. 09 10月, 2009 1 次提交
    • F
      perf tools: Fix thread comm resolution in perf sched · 97ea1a7f
      Frederic Weisbecker 提交于
      This reverts commit 9a92b479 ("perf
      tools: Improve thread comm resolution in perf sched") and fixes the
      real bug.
      
      The bug was elsewhere:
      
      We are failing to resolve thread names in perf sched because the
      table of threads we are building, on top of comm events, has a per
      process granularity. But perf sched, unlike the other perf tools,
      needs a per thread granularity as we are profiling every tasks
      individually.
      
      So fix it by building our threads table using the tid instead of
      the pid as the thread identifier.
      
      v2: Revert the previous fix - it is not really needed
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1255028657-11158-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      97ea1a7f
  26. 08 10月, 2009 1 次提交
    • F
      perf tools: Improve thread comm resolution in perf sched · 9a92b479
      Frederic Weisbecker 提交于
      When we get sched traces that involve a task that was already
      created before opening the event, we won't have the comm event for
      it.
      
      So if we can't find the comm event for a given thread, we look at
      the traces that may contain these informations.
      
      Before:
      
       ata/1:371             |      0.000 ms |        1 | avg: 3988.693 ms | max: 3988.693 ms |
       kondemand/1:421       |      0.096 ms |        3 | avg:  345.346 ms | max: 1035.989 ms |
       kondemand/0:420       |      0.025 ms |        3 | avg:  421.332 ms | max:  964.014 ms |
       :5124:5124            |      0.103 ms |        5 | avg:   74.082 ms | max:  277.194 ms |
       :6244:6244            |      0.691 ms |        9 | avg:  125.655 ms | max:  271.306 ms |
       firefox:5080          |      0.924 ms |        5 | avg:   53.833 ms | max:  257.828 ms |
       npviewer.bin:6225     |     21.871 ms |       53 | avg:   22.462 ms | max:  220.835 ms |
       :6245:6245            |      9.631 ms |       21 | avg:   41.864 ms | max:  213.349 ms |
      
      After:
      
       ata/1:371             |      0.000 ms |        1 | avg: 3988.693 ms | max: 3988.693 ms |
       kondemand/1:421       |      0.096 ms |        3 | avg:  345.346 ms | max: 1035.989 ms |
       kondemand/0:420       |      0.025 ms |        3 | avg:  421.332 ms | max:  964.014 ms |
       firefox:5124          |      0.103 ms |        5 | avg:   74.082 ms | max:  277.194 ms |
       npviewer.bin:6244     |      0.691 ms |        9 | avg:  125.655 ms | max:  271.306 ms |
       firefox:5080          |      0.924 ms |        5 | avg:   53.833 ms | max:  257.828 ms |
       npviewer.bin:6225     |     21.871 ms |       53 | avg:   22.462 ms | max:  220.835 ms |
       npviewer.bin:6245     |      9.631 ms |       21 | avg:   41.864 ms | max:  213.349 ms |
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1255012632-7882-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9a92b479
  27. 02 10月, 2009 1 次提交
    • A
      perf tools: Rewrite and improve support for kernel modules · 439d473b
      Arnaldo Carvalho de Melo 提交于
      Representing modules as struct map entries, backed by a DSO, etc,
      using /proc/modules to find where the module is loaded.
      
      DSOs now can have a short and long name, so that in verbose mode we
      can show exactly which .ko or vmlinux image was used.
      
      As kernel modules now are a DSO separate from the kernel, we can
      ask for just the hits for a particular set of kernel modules, just
      like we can do with shared libraries:
      
      [root@doppio linux-2.6-tip]# perf report -n --vmlinux
      /home/acme/git/build/tip-recvmmsg/vmlinux --modules --dsos \[drm\] | head -15
          84.58%      13266             Xorg  [k] drm_clflush_pages
           4.02%        630             Xorg  [k] trace_kmalloc.clone.0
           3.95%        619             Xorg  [k] drm_ioctl
           2.07%        324             Xorg  [k] drm_addbufs
           1.68%        263             Xorg  [k] drm_gem_close_ioctl
           0.77%        120             Xorg  [k] drm_setmaster_ioctl
           0.70%        110             Xorg  [k] drm_lastclose
           0.68%        106             Xorg  [k] drm_open
           0.54%         85             Xorg  [k] drm_mm_search_free
      [root@doppio linux-2.6-tip]#
      
      Specifying --dsos /lib/modules/2.6.31-tip/kernel/drivers/gpu/drm/drm.ko
      would have the same effect. Allowing specifying just 'drm.ko' is left
      for another patch.
      
      Processing kallsyms so that per kernel module struct map are
      instantiated was also left for another patch. That will allow
      removing the module name from each of its symbols.
      
      struct symbol was reduced by removing the ->module backpointer and
      moving it (well now the map) to struct symbol_entry in perf top,
      that is its only user right now.
      
      The total linecount went down by ~500 lines.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      439d473b
  28. 30 9月, 2009 1 次提交
    • A
      perf tools: Use rb_tree for maps · 1b46cddf
      Arnaldo Carvalho de Melo 提交于
      Threads can have many and kernel modules will be represented as a
      tree of maps as well.
      
      Ah, and for a perf.data with 146607 samples:
      
      Before:
      
      [root@doppio ~]# perf stat -r 5 perf report > /dev/null
      
       Performance counter stats for 'perf report' (5 runs):
      
           699.823680  task-clock-msecs         #      0.991 CPUs    ( +-   0.454% )
                   74  context-switches         #      0.000 M/sec   ( +-   1.709% )
                    2  CPU-migrations           #      0.000 M/sec   ( +-  17.008% )
                23114  page-faults              #      0.033 M/sec   ( +-   0.000% )
           1381257019  cycles                   #   1973.721 M/sec   ( +-   0.290% )
           1456894438  instructions             #      1.055 IPC     ( +-   0.007% )
             18779818  cache-references         #     26.835 M/sec   ( +-   0.380% )
               641799  cache-misses             #      0.917 M/sec   ( +-   1.200% )
      
          0.705972729  seconds time elapsed   ( +-   0.501% )
      
      [root@doppio ~]#
      
      After
      
       Performance counter stats for 'perf report' (5 runs):
      
           691.261451  task-clock-msecs         #      0.993 CPUs    ( +-   0.307% )
                   72  context-switches         #      0.000 M/sec   ( +-   0.829% )
                    6  CPU-migrations           #      0.000 M/sec   ( +-  18.409% )
                23127  page-faults              #      0.033 M/sec   ( +-   0.000% )
           1366395876  cycles                   #   1976.670 M/sec   ( +-   0.153% )
           1443136016  instructions             #      1.056 IPC     ( +-   0.012% )
             17956402  cache-references         #     25.976 M/sec   ( +-   0.325% )
               661924  cache-misses             #      0.958 M/sec   ( +-   1.335% )
      
          0.696127275  seconds time elapsed   ( +-   0.377% )
      
      I.e. we see some speedup too.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      LKML-Reference: <20090928174846.GA3361@ghostprotocols.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1b46cddf
  29. 25 9月, 2009 1 次提交
  30. 16 9月, 2009 1 次提交
    • I
      perf sched: Add 'perf sched map' scheduling event map printout · 0ec04e16
      Ingo Molnar 提交于
      This prints a textual context-switching outline of workload
      captured via perf sched record.
      
      For example, on a 16 CPU box it outputs:
      
         N1  O1  .   .   .   S1  .   .   .   B0  .  *I0  C1  .   M1  .    23002.773423 secs
         N1  O1  .  *Q0  .   S1  .   .   .   B0  .   I0  C1  .   M1  .    23002.773423 secs
         N1  O1  .   Q0  .   S1  .   .   .   B0  .  *R1  C1  .   M1  .    23002.773485 secs
         N1  O1  .   Q0  .   S1  .  *S0  .   B0  .   R1  C1  .   M1  .    23002.773478 secs
        *L0  O1  .   Q0  .   S1  .   S0  .   B0  .   R1  C1  .   M1  .    23002.773523 secs
         L0  O1  .  *.   .   S1  .   S0  .   B0  .   R1  C1  .   M1  .    23002.773531 secs
         L0  O1  .   .   .   S1  .   S0  .   B0  .   R1  C1 *T1  M1  .    23002.773547 secs T1 => irqbalance:2089
         L0  O1  .   .   .   S1  .   S0  .  *P0  .   R1  C1  T1  M1  .    23002.773549 secs
        *N1  O1  .   .   .   S1  .   S0  .   P0  .   R1  C1  T1  M1  .    23002.773566 secs
         N1  O1  .   .   .  *J0  .   S0  .   P0  .   R1  C1  T1  M1  .    23002.773571 secs
         N1  O1  .   .   .   J0  .   S0 *B0  P0  .   R1  C1  T1  M1  .    23002.773592 secs
         N1  O1  .   .   .   J0  .  *U0  B0  P0  .   R1  C1  T1  M1  .    23002.773582 secs
         N1  O1  .   .   .  *S1  .   U0  B0  P0  .   R1  C1  T1  M1  .    23002.773604 secs
         N1  O1  .   .   .   S1  .   U0  B0 *.   .   R1  C1  T1  M1  .    23002.773615 secs
         N1  O1  .   .   .   S1  .   U0  B0  .   .  *K0  C1  T1  M1  .    23002.773631 secs
         N1  O1  .  *M0  .   S1  .   U0  B0  .   .   K0  C1  T1  M1  .    23002.773624 secs
         N1  O1  .   M0  .   S1  .   U0 *.   .   .   K0  C1  T1  M1  .    23002.773644 secs
         N1  O1  .   M0  .   S1  .   U0  .   .   .  *R1  C1  T1  M1  .    23002.773662 secs
         N1  O1  .   M0  .   S1  .  *.   .   .   .   R1  C1  T1  M1  .    23002.773648 secs
         N1  O1  .  *.   .   S1  .   .   .   .   .   R1  C1  T1  M1  .    23002.773680 secs
         N1  O1  .   .   .  *L0  .   .   .   .   .   R1  C1  T1  M1  .    23002.773717 secs
        *N0  O1  .   .   .   L0  .   .   .   .   .   R1  C1  T1  M1  .    23002.773709 secs
        *N1  O1  .   .   .   L0  .   .   .   .   .   R1  C1  T1  M1  .    23002.773747 secs
      
      Columns stand for individual CPUs, from CPU0 to CPU15, and the
      two-letter shortcuts stand for tasks that are running on a CPU.
      
      '*' denotes the CPU that had the event.
      
      A dot signals an idle CPU.
      
      New tasks are assigned new two-letter shortcuts - when they occur
      first they are printed. In the above example 'T1' stood for irqbalance:
      
            T1 => irqbalance:2089
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0ec04e16
  31. 13 9月, 2009 1 次提交
    • I
      perf sched: Clean up PID sorting logic · b5fae128
      Ingo Molnar 提交于
      Use a sort list for thread atoms insertion as well - instead of
      hardcoded for PID.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b5fae128
  32. 31 8月, 2009 1 次提交
  33. 15 8月, 2009 1 次提交