1. 28 6月, 2009 2 次提交
    • J
      perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run · c3043569
      Jaswinder Singh Rajput 提交于
      Set attrs and nr_counters if no event is selected and !null_run.
      
      Setting of attrs should depend on number of counters,
      so we need to memcpy only for sizeof(default_attrs)
      
      Also set nr_counters as ARRAY_SIZE(default_attrs) in place of
      hardcoded value.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246126749.32198.16.camel@hpdv5.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c3043569
    • J
      perf stat: Improve output · 6e750a8f
      Jaswinder Singh Rajput 提交于
      Increase size for event name to handle bigger names like
      'L1-d$-prefetch-misses'
      
      Changed scaled counters from percentage to a multiplicative
      factor because the latter is more expressive.
      
      Also aligned the scaling factor, otherwise sometimes it looks
      like:
      
                  384  iTLB-load-misses           (4.74x scaled)
               452029  branch-loads               (8.00x scaled)
                 5892  branch-load-misses         (20.39x scaled)
               972315  iTLB-loads                 (3.24x scaled)
      
      Before:
               150708  L1-d$-stores          (scaled from 23.57%)
               428804  L1-d$-prefetches      (scaled from 23.47%)
               314446  L1-d$-prefetch-misses  (scaled from 23.42%)
            252626137  L1-i$-loads           (scaled from 23.24%)
              5297550  dTLB-load-misses      (scaled from 23.96%)
            106992392  branch-loads          (scaled from 23.67%)
              5239561  branch-load-misses    (scaled from 23.43%)
      
      After:
              1731713  L1-d$-loads               (  14.25x scaled)
                44241  L1-d$-prefetches          (   3.88x scaled)
                21076  L1-d$-prefetch-misses     (   3.40x scaled)
              5789421  L1-i$-loads               (   3.78x scaled)
                29645  dTLB-load-misses          (   2.95x scaled)
               461474  branch-loads              (   6.52x scaled)
                 7493  branch-load-misses        (  26.57x scaled)
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246051927.2988.10.camel@hpdv5.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6e750a8f
  2. 27 6月, 2009 3 次提交
    • I
      perf stat: Fix multi-run stats · 566747e6
      Ingo Molnar 提交于
      In multi-run (-r/--repeat) printouts, print out the noise of
      the wall-clock average as well.
      
      Also, fix a bug in printing out scaled counters: if it was not
      scaled then we should not update the average with -1.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      566747e6
    • I
      perf stat: Add -n/--null option to run without counters · 0cfb7a13
      Ingo Molnar 提交于
      Allow a no-counters run. This can be useful to measure just
      elapsed wall-clock time - or to assess the raw overhead of perf
      stat itself, without running any counters.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0cfb7a13
    • I
      perf_counter tools: Remove dead code · fde953c1
      Ingo Molnar 提交于
      Vince Weaver reported that there's a handful of #ifdef __MINGW32__
      sections in the code.
      
      Remove them as they are in essence dead code - as unlike upstream
      Git, the perf tool is unlikely to be ported to Windows.
      Reported-by: NVince Weaver <vince@deater.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fde953c1
  3. 26 6月, 2009 14 次提交
  4. 25 6月, 2009 4 次提交
    • J
      perf_counter tools: Shorten names for events · e5c59547
      Jaswinder Singh Rajput 提交于
      Added new alias for events.
      
      On AMD box:
      
       $ ./perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses -- ls -lR /usr/include/ > /dev/null
      
      Before :
      
       Performance counter stats for 'ls -lR /usr/include/':
      
            248064467  L1-data-Cache-Load-Referencees  (scaled from 23.27%)
              1001433  L1-data-Cache-Load-Misses  (scaled from 23.34%)
               153691  L1-data-Cache-Store-Referencees  (scaled from 23.34%)
               423248  L1-data-Cache-Prefetch-Referencees  (scaled from 23.33%)
               302138  L1-data-Cache-Prefetch-Misses  (scaled from 23.25%)
            251217546  L1-instruction-Cache-Load-Referencees  (scaled from 23.25%)
              5757005  L1-instruction-Cache-Load-Misses  (scaled from 23.23%)
                93435  L1-instruction-Cache-Prefetch-Referencees  (scaled from 23.24%)
              6496073  L2-Cache-Load-Referencees  (scaled from 23.32%)
               609485  L2-Cache-Load-Misses  (scaled from 23.45%)
              6876991  L2-Cache-Store-Referencees  (scaled from 23.71%)
            248922840  Data-TLB-Cache-Load-Referencees  (scaled from 23.94%)
              5828386  Data-TLB-Cache-Load-Misses  (scaled from 24.17%)
            257613506  Instruction-TLB-Cache-Load-Referencees  (scaled from 24.20%)
                 6833  Instruction-TLB-Cache-Load-Misses  (scaled from 23.88%)
            109043606  Branch-Cache-Load-Referencees  (scaled from 23.64%)
              5552296  Branch-Cache-Load-Misses  (scaled from 23.42%)
      
          0.413702461  seconds time elapsed.
      
      After :
      
       Peformance counter stats for 'ls -lR /usr/include/':
      
            266590464  L1-d$-loads           (scaled from 23.03%)
              1222273  L1-d$-load-misses     (scaled from 23.58%)
               146204  L1-d$-stores          (scaled from 23.83%)
               406344  L1-d$-prefetches      (scaled from 24.09%)
               283748  L1-d$-prefetch-misses (scaled from 24.10%)
            249650965  L1-i$-loads           (scaled from 23.80%)
              3353961  L1-i$-load-misses     (scaled from 23.82%)
               104599  L1-i$-prefetches      (scaled from 23.68%)
              4836405  LLC-loads             (scaled from 23.67%)
               498214  LLC-load-misses       (scaled from 23.66%)
              4953994  LLC-stores            (scaled from 23.64%)
            243354097  dTLB-loads            (scaled from 23.77%)
              6468584  dTLB-load-misses      (scaled from 23.74%)
            249719549  iTLB-loads            (scaled from 23.25%)
                 5060  iTLB-load-misses      (scaled from 23.00%)
            112343016  branch-loads          (scaled from 22.76%)
              5528876  branch-load-misses    (scaled from 22.54%)
      
          0.427154051  seconds time elapsed.
      
      Reported-by : Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245934522.5308.39.camel@hpdv5.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e5c59547
    • J
      perf_counter tools: Check for valid cache operations · 06813f6c
      Jaswinder Singh Rajput 提交于
      Made new table for cache operartion stat 'hw_cache_stat' as:
      
       L1I : Read and prefetch only
       ITLB and BPU : Read-only
      
      introduce is_cache_op_valid() for cache operation validity
      
      And checks for valid cache operations.
      
      Reported-by : Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245930367.5308.33.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      06813f6c
    • J
      perf record: Fix filemap pathname parsing in /proc/pid/maps · 76c64c5e
      Johannes Weiner 提交于
      Looking backward for the first space from the end of a line in
      /proc/pid/maps does not find the start of the pathname of the mapped
      file if it contains a space.
      
      Since the only slashes we have in this file occur in the (absolute!)
      pathname column of file mappings, looking for the first slash in a
      line is a safe method to find the name.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090624190835.GA25548@cmpxchg.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      76c64c5e
    • I
      perf_counter tools: Add CREDITS file for Git contributors · 1b173f77
      Ingo Molnar 提交于
      Much of perf's libraries comes from the Git project. I noticed
      that the files (in tools/perf/util/*.[ch] and elsewhere) are
      quite spartan wrt. credits, so lets add a CREDITS file that
      includes an (incomplete!) list of main contributors.
      
      Thanks guys, these libraries are really useful. Special thanks
      go to Johannes Schindelin and Junio C Hamano for coming up with
      this list.
      List-Composed-By: NJohannes Schindelin <Johannes.Schindelin@gmx.de>
      Cc: Junio C Hamano <gitster@pobox.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1b173f77
  5. 24 6月, 2009 4 次提交
    • J
      perf stat: Remove dead code · 3d632595
      Jaswinder Singh Rajput 提交于
      Remove dead code and do some code alignment.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245847774.2681.2.camel@ht.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3d632595
    • Y
      perf_counter, x86: Set global control MSR correctly · c14dab5c
      Yong Wang 提交于
      Previous code made an assumption that the power on value of global
      control MSR has enabled all fixed and general purpose counters properly.
      
      However, this is not the case for certain Intel processors, such as
      Atom - and it might also be firmware dependent.
      
      Each enable bit in IA32_PERF_GLOBAL_CTRL is AND'ed with the
      enable bits for all privilege levels in the respective IA32_PERFEVTSELx
      or IA32_PERF_FIXED_CTR_CTRL MSRs to start/stop the counting of
      respective counters. Counting is enabled if the AND'ed results is true;
      counting is disabled when the result is false.
      
      The end result is that all fixed counters are always disabled on Atom
      processors because the assumption is just invalid.
      
      Fix this by not initializing the ctrl-mask out of the global MSR,
      but setting it to perf_counter_mask.
      Reported-by: NStephane Eranian <eranian@googlemail.com>
      Signed-off-by: NYong Wang <yong.y.wang@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090624021324.GA2788@ywang-moblin2.bj.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c14dab5c
    • R
      perf_counter tools: Fix strbuf_fread() error path handling · f7679dab
      Roel Kluin 提交于
      size_t res cannot be less than 0 - fread returns 0 on error.
      
      [ Updated by: René Scharfe <rene.scharfe@lsrfire.ath.cx> ]
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NRoel Kluin <roel.kluin@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Junio C Hamano <gitster@pobox.com>
      LKML-Reference: <4A3FB479.2090902@lsrfire.ath.cx>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f7679dab
    • J
      perf stat: Fix verbose for perf stat · cca03c0a
      Jaswinder Singh Rajput 提交于
      Error message should use stderr for verbose (-v), otherwise
      message will be lost for:
      
       $ ./perf stat -v <cmd>  > /dev/null
      
      For example on AMD bus-cycles event is not available so now
      it looks like:
      
       $ ./perf stat -v -e bus-cycles ls > /dev/null
      Error: counter 0, sys_perf_counter_open() syscall returned with -1 (Invalid argument)
      
       Performance counter stats for 'ls':
      
        <not counted>  bus-cycles
      
          0.006765877  seconds time elapsed.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245757369.3776.1.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cca03c0a
  6. 23 6月, 2009 6 次提交
  7. 22 6月, 2009 4 次提交
  8. 21 6月, 2009 3 次提交
    • I
      perf_counter tools: Fix vmlinux fallback when running on a different kernel · c1f47b45
      Ingo Molnar 提交于
      Lucas De Marchi reported that perf report and perf annotate
      displays mismatching profile if a perf.data is analyzed on
      an older kernel - even if the correct vmlinux is specified
      via the -k option.
      
      The reason is the fallback path in util/symbol.c:dso__load_kernel():
      
      int dso__load_kernel(struct dso *self, const char *vmlinux,
                           symbol_filter_t filter, int verbose)
      {
              int err = -1;
      
              if (vmlinux)
                      err = dso__load_vmlinux(self, vmlinux, filter, verbose);
      
              if (err)
                      err = dso__load_kallsyms(self, filter, verbose);
      
              return err;
      }
      
      dso__load_vmlinux() returns negative on error, but on success it
      returns the number of symbols loaded - which confuses the function
      to load the kallsyms.
      
      This is normally harmless, as reporting is usually performed on the
      same kernel that is analyzed - but if there's a mismatch then we
      load the wrong kallsyms and create a non-sensical symbol tree.
      
      The fix is to only fall back to kallsyms on errors.
      Reported-by: NLucas De Marchi <lucas.de.marchi@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c1f47b45
    • J
      perf_counter, x8: Fix L1-data-Cache-Store-Referencees for AMD · d9f2a5ec
      Jaswinder Singh Rajput 提交于
      Fix AMD's Data Cache Refills from System event.
      
      After this patch :
      
       ./tools/perf/perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses ls /dev/ > /dev/null
      
       Performance counter stats for 'ls /dev/':
      
              2499484  L1-data-Cache-Load-Referencees             (scaled from 3.97%)
                70347  L1-data-Cache-Load-Misses                  (scaled from 7.30%)
                 9360  L1-data-Cache-Store-Referencees            (scaled from 8.64%)
                32804  L1-data-Cache-Prefetch-Referencees         (scaled from 17.72%)
                 7693  L1-data-Cache-Prefetch-Misses              (scaled from 22.97%)
              2180945  L1-instruction-Cache-Load-Referencees      (scaled from 28.48%)
                14518  L1-instruction-Cache-Load-Misses           (scaled from 35.00%)
                 2405  L1-instruction-Cache-Prefetch-Referencees  (scaled from 34.89%)
                71387  L2-Cache-Load-Referencees                  (scaled from 34.94%)
                18732  L2-Cache-Load-Misses                       (scaled from 34.92%)
                79918  L2-Cache-Store-Referencees                 (scaled from 36.02%)
              1295294  Data-TLB-Cache-Load-Referencees            (scaled from 35.99%)
                30896  Data-TLB-Cache-Load-Misses                 (scaled from 33.36%)
              1222030  Instruction-TLB-Cache-Load-Referencees     (scaled from 29.46%)
                  357  Instruction-TLB-Cache-Load-Misses          (scaled from 20.46%)
               530888  Branch-Cache-Load-Referencees              (scaled from 11.48%)
                 8638  Branch-Cache-Load-Misses                   (scaled from 5.09%)
      
          0.011295149  seconds time elapsed.
      
      Earlier it always shows value 0.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      LKML-Reference: <1245484165.3102.6.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d9f2a5ec
    • J
      mm: page_alloc: clear PG_locked before checking flags on free · c277331d
      Johannes Weiner 提交于
      da456f14 "page allocator: do not disable interrupts in free_page_mlock()" moved
      the PG_mlocked clearing after the flag sanity checking which makes mlocked
      pages always trigger 'bad page'.  Fix this by clearing the bit up front.
      Reported--and-debugged-by: NPeter Chubb <peter.chubb@nicta.com.au>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Tested-by: NMaxim Levitsky <maximlevitsky@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c277331d