1. 04 2月, 2010 1 次提交
    • A
      perf symbols: Remove perf_session usage in symbols layer · 9de89fe7
      Arnaldo Carvalho de Melo 提交于
      I noticed while writing the first test in 'perf regtest' that to
      just test the symbol handling routines one needs to create a
      perf session, that is a layer centered on a perf.data file,
      events, etc, so I untied these layers.
      
      This reduces the complexity for the users as the number of
      parameters to most of the symbols and session APIs now was
      reduced while not adding more state to all the map instances by
      only having data that is needed to split the kernel (kallsyms
      and ELF symtab sections) maps and do vmlinux relocation on the
      main kernel map.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1265223128-11786-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9de89fe7
  2. 16 1月, 2010 1 次提交
  3. 14 1月, 2010 1 次提交
    • A
      perf tools: Encode kernel module mappings in perf.data · b7cece76
      Arnaldo Carvalho de Melo 提交于
      We were always looking at the running machine /proc/modules,
      even when processing a perf.data file, which only makes sense
      when we're doing 'perf record' and 'perf report' on the same
      machine, and in close sucession, or if we don't use modules at
      all, right Peter? ;-)
      
      Now, at 'perf record' time we read /proc/modules, find the long
      path for modules, and put them as PERF_MMAP events, just like we
      did to encode the reloc reference symbol for vmlinux. Talking
      about that now it is encoded in .pgoff, so that we can use
      .{start,len} to store the address boundaries for the kernel so
      that when we reconstruct the kmaps tree we can do lookups right
      away, without having to fixup the end of the kernel maps like we
      did in the past (and now only in perf record).
      
      One more step in the 'perf archive' direction when we'll finally
      be able to collect data in one machine and analyse in another.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1263396139-4798-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b7cece76
  4. 14 12月, 2009 3 次提交
    • A
      perf session: Move kmaps to perf_session · 4aa65636
      Arnaldo Carvalho de Melo 提交于
      There is still some more work to do to disentangle map creation
      from DSO loading, but this happens only for the kernel, and for
      the early adopters of perf diff, where this disentanglement
      matters most, we'll be testing different kernels, so no problem
      here.
      
      Further clarification: right now we create the kernel maps for
      the various modules and discontiguous kernel text maps when
      loading the DSO, we should do it as a two step process, first
      creating the maps, for multiple mappings with the same DSO
      store, then doing the dso load just once, for the first hit on
      one of the maps sharing this DSO backing store.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260741029-4430-6-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4aa65636
    • A
      perf session: Move the global threads list to perf_session · b3165f41
      Arnaldo Carvalho de Melo 提交于
      So that we can process two perf.data files.
      
      We still need to add a O_MMAP mode for perf_session so that we
      can do all the mmap stuff in it.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260741029-4430-5-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b3165f41
    • A
      perf session: Register the idle thread in perf_session__process_events · 13df45ca
      Arnaldo Carvalho de Melo 提交于
      No need for all tools to register it and then immediately call
      perf_session__process_events.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260741029-4430-3-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      13df45ca
  5. 12 12月, 2009 2 次提交
    • A
      perf symbols: Allow lookups by symbol name too · 79406cd7
      Arnaldo Carvalho de Melo 提交于
      Configurable via symbol_conf.sort_by_name, so that the cost of an
      extra rb_node on all 'struct symbol' instances is not paid by tools
      that only want to decode addresses.
      
      How to use it:
      
      	symbol_conf.sort_by_name = true;
      	symbol_init(&symbol_conf);
      
      	struct map *map = map_groups__find_by_name(kmaps, MAP__VARIABLE, "[kernel.kallsyms]");
      
      	if (map == NULL) {
      		pr_err("couldn't find map!\n");
      		kernel_maps__fprintf(stdout);
      	} else {
      		struct symbol *sym = map__find_symbol_by_name(map, sym_filter, NULL);
      		if (sym == NULL)
      			pr_err("couldn't find symbol %s!\n", sym_filter);
      		else
      			pr_info("symbol %s: %#Lx-%#Lx \n", sym_filter, sym->start, sym->end);
      	}
      
      Looking over the vmlinux/kallsyms is common enough that I'll add a
      variable to the upcoming struct perf_session to avoid the need to
      use map_groups__find_by_name to get the main vmlinux/kallsyms map.
      
      The above example looks on the 'variable' symtab, but it is just
      like that for the functions one.
      
      Also the sort operation is done when we first use
      map__find_symbol_by_name, in a lazy way.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260564622-12392-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      79406cd7
    • A
      perf symbols: Rename kthreads to kmaps, using another abstraction for it · 9958e1f0
      Arnaldo Carvalho de Melo 提交于
      Using a struct thread instance just to hold the kernel space maps
      (vmlinux + modules) is overkill and confuses people trying to
      understand the perf symbols abstractions.
      
      The kernel maps are really present in all threads, i.e. the kernel
      is a library, not a separate thread.
      
      So introduce the 'map_groups' abstraction and use it for the kernel
      maps, now in the kmaps global variable.
      
      It, in turn, will move, together with the threads list to the
      perf_file abstraction, so that we can support multiple perf_file
      instances, needed by perf diff.
      
      Brainstormed-with: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1260550239-5372-1-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9958e1f0
  6. 28 11月, 2009 4 次提交
    • A
      perf tools: Consolidate symbol resolving across all tools · 1ed091c4
      Arnaldo Carvalho de Melo 提交于
      Now we have a very high level routine for simple tools to
      process IP sample events:
      
      	int event__preprocess_sample(const event_t *self,
      				     struct addr_location *al,
      				     symbol_filter_t filter)
      
      It receives the event itself and will insert new threads in the
      global threads list and resolve the map and symbol, filling all
      this info into the new addr_location struct, so that tools like
      annotate and report can further process the event by creating
      hist_entries in their specific way (with or without callgraphs,
      etc).
      
      It in turn uses the new next layer function:
      
      	void thread__find_addr_location(struct thread *self, u8 cpumode,
      					enum map_type type, u64 addr,
      					struct addr_location *al,
      					symbol_filter_t filter)
      
      This one will, given a thread (userspace or the kernel kthread
      one), will find the given type (MAP__FUNCTION now, MAP__VARIABLE
      too in the near future) at the given cpumode, taking vdsos into
      account (userspace hit, but kernel symbol) and will fill all
      these details in the addr_location given.
      
      Tools that need a more compact API for plain function
      resolution, like 'kmem', can use this other one:
      
      	struct symbol *thread__find_function(struct thread *self, u64 addr,
      					     symbol_filter_t filter)
      
      So, to resolve a kernel symbol, that is all the 'kmem' tool
      needs, its just a matter of calling:
      
      	sym = thread__find_function(kthread, addr, NULL);
      
      The 'filter' parameter is needed because we do lazy
      parsing/loading of ELF symtabs or /proc/kallsyms.
      
      With this we remove more code duplication all around, which is
      always good, huh? :-)
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-12-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1ed091c4
    • A
      perf symbols: When not using modules, discard its symbols · 1de8e245
      Arnaldo Carvalho de Melo 提交于
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-10-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1de8e245
    • A
      perf symbols: Support multiple symtabs in struct thread · 95011c60
      Arnaldo Carvalho de Melo 提交于
      Making the routines that were so far specific to the kernel maps
      useful for all threads.
      
      This is done by making the kernel maps be contained in a kernel
      "thread".
      
      This gets the kernel specific routines closer to the userspace
      counterparts, which will help in reducing the boilerplate for
      resolving a symbol, as will be demonstrated in the next patches.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-9-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      95011c60
    • A
      perf symbols: Kernel_maps should be an array of MAP__NR_TYPES entries · 23ea4a3f
      Arnaldo Carvalho de Melo 提交于
      So that we can support multiple symbol table types.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1259346563-12568-8-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      23ea4a3f
  7. 24 11月, 2009 1 次提交
  8. 21 11月, 2009 1 次提交
    • A
      perf symbols: Do lazy symtab loading for the kernel & modules too · c338aee8
      Arnaldo Carvalho de Melo 提交于
      Just like we do with the other DSOs. This also simplifies the
      kernel_maps setup process, now all that the tools need to do is
      to call kernel_maps__init and the maps for the modules and
      kernel will be created, then, later, when
      kernel_maps__find_symbol() is used, it will also call
      maps__find_symbol that already checks if the symtab was loaded,
      loading it if needed.
      
      Now if one does 'perf top --hide_kernel_symbols' we won't pay
      the price of loading the (many) symbols in /proc/kallsyms or
      vmlinux.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1258757489-5978-4-git-send-email-acme@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c338aee8
  9. 23 10月, 2009 1 次提交
    • F
      perf tools: Bind callchains to the first sort dimension column · a4fb581b
      Frederic Weisbecker 提交于
      Currently, the callchains are displayed using a constant left
      margin. So depending on the current sort dimension
      configuration, callchains may appear to be well attached to the
      first sort dimension column field which is mostly the case,
      except when the first dimension of sorting is done by comm,
      because these are right aligned.
      
      This patch binds the callchain to the first letter in the first
      column, whatever type of column it is (dso, comm, symbol).
      Before:
      
           0.80%             perf  [k] __lock_acquire
                   __lock_acquire
                   lock_acquire
                   |
                   |--58.33%-- _spin_lock
                   |          |
                   |          |--28.57%-- inotify_should_send_event
                   |          |          fsnotify
                   |          |          __fsnotify_parent
      
      After:
      
           0.80%             perf  [k] __lock_acquire
                             __lock_acquire
                             lock_acquire
                             |
                             |--58.33%-- _spin_lock
                             |          |
                             |          |--28.57%-- inotify_should_send_event
                             |          |          fsnotify
                             |          |          __fsnotify_parent
      
      Also, for clarity, we don't put anymore the callchain as is but:
      
      - If we have a top level ancestor in the callchain, start it
        with a first ascii hook.
      
        Before:
      
           0.80%             perf  [kernel]                        [k] __lock_acquire
                             __lock_acquire
                               lock_acquire
                             |
                             |--58.33%-- _spin_lock
                             |          |
                             |          |--28.57%-- inotify_should_send_event
                             |          |          fsnotify
                            [..]       [..]
      
         After:
      
           0.80%             perf  [kernel]                         [k] __lock_acquire
                             |
                             --- __lock_acquire
                                 lock_acquire
                                |
                                |--58.33%-- _spin_lock
                                |          |
                                |          |--28.57%-- inotify_should_send_event
                                |          |          fsnotify
                               [..]       [..]
      
      - Otherwise, if we have several top level ancestors, then
        display these like we did before:
      
             1.69%           Xorg
                             |
                             |--21.21%-- vread_hpet
                             |          0x7fffd85b46fc
                             |          0x7fffd85b494d
                             |          0x7f4fafb4e54d
                             |
                             |--15.15%-- exaOffscreenAlloc
                             |
                             |--9.09%-- I830WaitLpRing
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      LKML-Reference: <1256246604-17156-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a4fb581b
  10. 13 10月, 2009 1 次提交
  11. 09 10月, 2009 1 次提交
    • F
      perf tools: Fix thread comm resolution in perf sched · 97ea1a7f
      Frederic Weisbecker 提交于
      This reverts commit 9a92b479 ("perf
      tools: Improve thread comm resolution in perf sched") and fixes the
      real bug.
      
      The bug was elsewhere:
      
      We are failing to resolve thread names in perf sched because the
      table of threads we are building, on top of comm events, has a per
      process granularity. But perf sched, unlike the other perf tools,
      needs a per thread granularity as we are profiling every tasks
      individually.
      
      So fix it by building our threads table using the tid instead of
      the pid as the thread identifier.
      
      v2: Revert the previous fix - it is not really needed
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1255028657-11158-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      97ea1a7f
  12. 08 10月, 2009 1 次提交
    • F
      perf tools: Improve thread comm resolution in perf sched · 9a92b479
      Frederic Weisbecker 提交于
      When we get sched traces that involve a task that was already
      created before opening the event, we won't have the comm event for
      it.
      
      So if we can't find the comm event for a given thread, we look at
      the traces that may contain these informations.
      
      Before:
      
       ata/1:371             |      0.000 ms |        1 | avg: 3988.693 ms | max: 3988.693 ms |
       kondemand/1:421       |      0.096 ms |        3 | avg:  345.346 ms | max: 1035.989 ms |
       kondemand/0:420       |      0.025 ms |        3 | avg:  421.332 ms | max:  964.014 ms |
       :5124:5124            |      0.103 ms |        5 | avg:   74.082 ms | max:  277.194 ms |
       :6244:6244            |      0.691 ms |        9 | avg:  125.655 ms | max:  271.306 ms |
       firefox:5080          |      0.924 ms |        5 | avg:   53.833 ms | max:  257.828 ms |
       npviewer.bin:6225     |     21.871 ms |       53 | avg:   22.462 ms | max:  220.835 ms |
       :6245:6245            |      9.631 ms |       21 | avg:   41.864 ms | max:  213.349 ms |
      
      After:
      
       ata/1:371             |      0.000 ms |        1 | avg: 3988.693 ms | max: 3988.693 ms |
       kondemand/1:421       |      0.096 ms |        3 | avg:  345.346 ms | max: 1035.989 ms |
       kondemand/0:420       |      0.025 ms |        3 | avg:  421.332 ms | max:  964.014 ms |
       firefox:5124          |      0.103 ms |        5 | avg:   74.082 ms | max:  277.194 ms |
       npviewer.bin:6244     |      0.691 ms |        9 | avg:  125.655 ms | max:  271.306 ms |
       firefox:5080          |      0.924 ms |        5 | avg:   53.833 ms | max:  257.828 ms |
       npviewer.bin:6225     |     21.871 ms |       53 | avg:   22.462 ms | max:  220.835 ms |
       npviewer.bin:6245     |      9.631 ms |       21 | avg:   41.864 ms | max:  213.349 ms |
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1255012632-7882-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9a92b479
  13. 02 10月, 2009 1 次提交
    • A
      perf tools: Rewrite and improve support for kernel modules · 439d473b
      Arnaldo Carvalho de Melo 提交于
      Representing modules as struct map entries, backed by a DSO, etc,
      using /proc/modules to find where the module is loaded.
      
      DSOs now can have a short and long name, so that in verbose mode we
      can show exactly which .ko or vmlinux image was used.
      
      As kernel modules now are a DSO separate from the kernel, we can
      ask for just the hits for a particular set of kernel modules, just
      like we can do with shared libraries:
      
      [root@doppio linux-2.6-tip]# perf report -n --vmlinux
      /home/acme/git/build/tip-recvmmsg/vmlinux --modules --dsos \[drm\] | head -15
          84.58%      13266             Xorg  [k] drm_clflush_pages
           4.02%        630             Xorg  [k] trace_kmalloc.clone.0
           3.95%        619             Xorg  [k] drm_ioctl
           2.07%        324             Xorg  [k] drm_addbufs
           1.68%        263             Xorg  [k] drm_gem_close_ioctl
           0.77%        120             Xorg  [k] drm_setmaster_ioctl
           0.70%        110             Xorg  [k] drm_lastclose
           0.68%        106             Xorg  [k] drm_open
           0.54%         85             Xorg  [k] drm_mm_search_free
      [root@doppio linux-2.6-tip]#
      
      Specifying --dsos /lib/modules/2.6.31-tip/kernel/drivers/gpu/drm/drm.ko
      would have the same effect. Allowing specifying just 'drm.ko' is left
      for another patch.
      
      Processing kallsyms so that per kernel module struct map are
      instantiated was also left for another patch. That will allow
      removing the module name from each of its symbols.
      
      struct symbol was reduced by removing the ->module backpointer and
      moving it (well now the map) to struct symbol_entry in perf top,
      that is its only user right now.
      
      The total linecount went down by ~500 lines.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      439d473b
  14. 30 9月, 2009 1 次提交
    • A
      perf tools: Use rb_tree for maps · 1b46cddf
      Arnaldo Carvalho de Melo 提交于
      Threads can have many and kernel modules will be represented as a
      tree of maps as well.
      
      Ah, and for a perf.data with 146607 samples:
      
      Before:
      
      [root@doppio ~]# perf stat -r 5 perf report > /dev/null
      
       Performance counter stats for 'perf report' (5 runs):
      
           699.823680  task-clock-msecs         #      0.991 CPUs    ( +-   0.454% )
                   74  context-switches         #      0.000 M/sec   ( +-   1.709% )
                    2  CPU-migrations           #      0.000 M/sec   ( +-  17.008% )
                23114  page-faults              #      0.033 M/sec   ( +-   0.000% )
           1381257019  cycles                   #   1973.721 M/sec   ( +-   0.290% )
           1456894438  instructions             #      1.055 IPC     ( +-   0.007% )
             18779818  cache-references         #     26.835 M/sec   ( +-   0.380% )
               641799  cache-misses             #      0.917 M/sec   ( +-   1.200% )
      
          0.705972729  seconds time elapsed   ( +-   0.501% )
      
      [root@doppio ~]#
      
      After
      
       Performance counter stats for 'perf report' (5 runs):
      
           691.261451  task-clock-msecs         #      0.993 CPUs    ( +-   0.307% )
                   72  context-switches         #      0.000 M/sec   ( +-   0.829% )
                    6  CPU-migrations           #      0.000 M/sec   ( +-  18.409% )
                23127  page-faults              #      0.033 M/sec   ( +-   0.000% )
           1366395876  cycles                   #   1976.670 M/sec   ( +-   0.153% )
           1443136016  instructions             #      1.056 IPC     ( +-   0.012% )
             17956402  cache-references         #     25.976 M/sec   ( +-   0.325% )
               661924  cache-misses             #      0.958 M/sec   ( +-   1.335% )
      
          0.696127275  seconds time elapsed   ( +-   0.377% )
      
      I.e. we see some speedup too.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      LKML-Reference: <20090928174846.GA3361@ghostprotocols.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1b46cddf
  15. 25 9月, 2009 1 次提交
  16. 16 9月, 2009 1 次提交
    • I
      perf sched: Add 'perf sched map' scheduling event map printout · 0ec04e16
      Ingo Molnar 提交于
      This prints a textual context-switching outline of workload
      captured via perf sched record.
      
      For example, on a 16 CPU box it outputs:
      
         N1  O1  .   .   .   S1  .   .   .   B0  .  *I0  C1  .   M1  .    23002.773423 secs
         N1  O1  .  *Q0  .   S1  .   .   .   B0  .   I0  C1  .   M1  .    23002.773423 secs
         N1  O1  .   Q0  .   S1  .   .   .   B0  .  *R1  C1  .   M1  .    23002.773485 secs
         N1  O1  .   Q0  .   S1  .  *S0  .   B0  .   R1  C1  .   M1  .    23002.773478 secs
        *L0  O1  .   Q0  .   S1  .   S0  .   B0  .   R1  C1  .   M1  .    23002.773523 secs
         L0  O1  .  *.   .   S1  .   S0  .   B0  .   R1  C1  .   M1  .    23002.773531 secs
         L0  O1  .   .   .   S1  .   S0  .   B0  .   R1  C1 *T1  M1  .    23002.773547 secs T1 => irqbalance:2089
         L0  O1  .   .   .   S1  .   S0  .  *P0  .   R1  C1  T1  M1  .    23002.773549 secs
        *N1  O1  .   .   .   S1  .   S0  .   P0  .   R1  C1  T1  M1  .    23002.773566 secs
         N1  O1  .   .   .  *J0  .   S0  .   P0  .   R1  C1  T1  M1  .    23002.773571 secs
         N1  O1  .   .   .   J0  .   S0 *B0  P0  .   R1  C1  T1  M1  .    23002.773592 secs
         N1  O1  .   .   .   J0  .  *U0  B0  P0  .   R1  C1  T1  M1  .    23002.773582 secs
         N1  O1  .   .   .  *S1  .   U0  B0  P0  .   R1  C1  T1  M1  .    23002.773604 secs
         N1  O1  .   .   .   S1  .   U0  B0 *.   .   R1  C1  T1  M1  .    23002.773615 secs
         N1  O1  .   .   .   S1  .   U0  B0  .   .  *K0  C1  T1  M1  .    23002.773631 secs
         N1  O1  .  *M0  .   S1  .   U0  B0  .   .   K0  C1  T1  M1  .    23002.773624 secs
         N1  O1  .   M0  .   S1  .   U0 *.   .   .   K0  C1  T1  M1  .    23002.773644 secs
         N1  O1  .   M0  .   S1  .   U0  .   .   .  *R1  C1  T1  M1  .    23002.773662 secs
         N1  O1  .   M0  .   S1  .  *.   .   .   .   R1  C1  T1  M1  .    23002.773648 secs
         N1  O1  .  *.   .   S1  .   .   .   .   .   R1  C1  T1  M1  .    23002.773680 secs
         N1  O1  .   .   .  *L0  .   .   .   .   .   R1  C1  T1  M1  .    23002.773717 secs
        *N0  O1  .   .   .   L0  .   .   .   .   .   R1  C1  T1  M1  .    23002.773709 secs
        *N1  O1  .   .   .   L0  .   .   .   .   .   R1  C1  T1  M1  .    23002.773747 secs
      
      Columns stand for individual CPUs, from CPU0 to CPU15, and the
      two-letter shortcuts stand for tasks that are running on a CPU.
      
      '*' denotes the CPU that had the event.
      
      A dot signals an idle CPU.
      
      New tasks are assigned new two-letter shortcuts - when they occur
      first they are printed. In the above example 'T1' stood for irqbalance:
      
            T1 => irqbalance:2089
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0ec04e16
  17. 13 9月, 2009 1 次提交
    • I
      perf sched: Clean up PID sorting logic · b5fae128
      Ingo Molnar 提交于
      Use a sort list for thread atoms insertion as well - instead of
      hardcoded for PID.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b5fae128
  18. 31 8月, 2009 1 次提交
  19. 15 8月, 2009 1 次提交