  1. 27 June 2009, 2 commits
    • perf stat: Add -n/--null option to run without counters · 0cfb7a13
      Ingo Molnar authored
      Allow a no-counters run. This can be useful to measure just
      elapsed wall-clock time - or to assess the raw overhead of perf
      stat itself, without running any counters. (See the sketch after
      this entry.)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0cfb7a13
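      As a side note on the entry above, here is a minimal, self-contained
      sketch of what a "null run" boils down to: fork and exec the workload,
      open no counters at all, and report only elapsed wall-clock time. The
      structure and names are illustrative assumptions, not the actual
      perf stat code.

      /*
       * Illustrative sketch only (not the perf code): time a child command
       * with gettimeofday() and open no performance counters at all.
       */
      #include <stdio.h>
      #include <sys/time.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
              struct timeval start, end;
              pid_t pid;

              if (argc < 2) {
                      fprintf(stderr, "usage: %s <command> [args...]\n", argv[0]);
                      return 1;
              }

              gettimeofday(&start, NULL);

              pid = fork();
              if (pid == 0) {
                      execvp(argv[1], &argv[1]);
                      perror("execvp");
                      _exit(127);
              }
              waitpid(pid, NULL, 0);

              gettimeofday(&end, NULL);

              printf("%12.9f  seconds time elapsed\n",
                     (end.tv_sec - start.tv_sec) +
                     (end.tv_usec - start.tv_usec) / 1e6);
              return 0;
      }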
    • perf_counter tools: Remove dead code · fde953c1
      Ingo Molnar authored
      Vince Weaver reported that there's a handful of #ifdef __MINGW32__
      sections in the code.
      
      Remove them, as they are in essence dead code: unlike upstream
      Git, the perf tool is unlikely to be ported to Windows.
      Reported-by: Vince Weaver <vince@deater.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      fde953c1
  2. 26 June 2009, 14 commits
  3. 25 June 2009, 4 commits
    • perf_counter tools: Shorten names for events · e5c59547
      Jaswinder Singh Rajput authored
      Added new aliases for events (see the sketch after this entry).
      
      On AMD box:
      
       $ ./perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses -- ls -lR /usr/include/ > /dev/null
      
      Before :
      
       Performance counter stats for 'ls -lR /usr/include/':
      
            248064467  L1-data-Cache-Load-Referencees  (scaled from 23.27%)
              1001433  L1-data-Cache-Load-Misses  (scaled from 23.34%)
               153691  L1-data-Cache-Store-Referencees  (scaled from 23.34%)
               423248  L1-data-Cache-Prefetch-Referencees  (scaled from 23.33%)
               302138  L1-data-Cache-Prefetch-Misses  (scaled from 23.25%)
            251217546  L1-instruction-Cache-Load-Referencees  (scaled from 23.25%)
              5757005  L1-instruction-Cache-Load-Misses  (scaled from 23.23%)
                93435  L1-instruction-Cache-Prefetch-Referencees  (scaled from 23.24%)
              6496073  L2-Cache-Load-Referencees  (scaled from 23.32%)
               609485  L2-Cache-Load-Misses  (scaled from 23.45%)
              6876991  L2-Cache-Store-Referencees  (scaled from 23.71%)
            248922840  Data-TLB-Cache-Load-Referencees  (scaled from 23.94%)
              5828386  Data-TLB-Cache-Load-Misses  (scaled from 24.17%)
            257613506  Instruction-TLB-Cache-Load-Referencees  (scaled from 24.20%)
                 6833  Instruction-TLB-Cache-Load-Misses  (scaled from 23.88%)
            109043606  Branch-Cache-Load-Referencees  (scaled from 23.64%)
              5552296  Branch-Cache-Load-Misses  (scaled from 23.42%)
      
          0.413702461  seconds time elapsed.
      
      After :
      
       Performance counter stats for 'ls -lR /usr/include/':
      
            266590464  L1-d$-loads           (scaled from 23.03%)
              1222273  L1-d$-load-misses     (scaled from 23.58%)
               146204  L1-d$-stores          (scaled from 23.83%)
               406344  L1-d$-prefetches      (scaled from 24.09%)
               283748  L1-d$-prefetch-misses (scaled from 24.10%)
            249650965  L1-i$-loads           (scaled from 23.80%)
              3353961  L1-i$-load-misses     (scaled from 23.82%)
               104599  L1-i$-prefetches      (scaled from 23.68%)
              4836405  LLC-loads             (scaled from 23.67%)
               498214  LLC-load-misses       (scaled from 23.66%)
              4953994  LLC-stores            (scaled from 23.64%)
            243354097  dTLB-loads            (scaled from 23.77%)
              6468584  dTLB-load-misses      (scaled from 23.74%)
            249719549  iTLB-loads            (scaled from 23.25%)
                 5060  iTLB-load-misses      (scaled from 23.00%)
            112343016  branch-loads          (scaled from 22.76%)
              5528876  branch-load-misses    (scaled from 22.54%)
      
          0.427154051  seconds time elapsed.
      
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245934522.5308.39.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e5c59547
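      A rough sketch of what an alias table behind the shortened names above
      could look like; the structure and strings here are illustrative
      assumptions, not the actual parse-events code.

      /* Illustrative sketch only: map short, user-facing aliases such as
       * "l1d" or "l1d-misses" onto a cache name and operation. */
      #include <stdio.h>
      #include <string.h>

      struct cache_alias {
              const char *alias;
              const char *cache;      /* e.g. "L1-d$" */
              const char *op;         /* e.g. "loads", "load-misses" */
      };

      static const struct cache_alias aliases[] = {
              { "l1d",          "L1-d$", "loads"       },
              { "l1d-misses",   "L1-d$", "load-misses" },
              { "l1d-write",    "L1-d$", "stores"      },
              { "l1d-prefetch", "L1-d$", "prefetches"  },
              { "l1i",          "L1-i$", "loads"       },
              { "itlb-misses",  "iTLB",  "load-misses" },
      };

      int main(void)
      {
              const char *want = "l1d-misses";

              for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++) {
                      if (!strcmp(aliases[i].alias, want)) {
                              printf("%s -> %s-%s\n", want,
                                     aliases[i].cache, aliases[i].op);
                              return 0;
                      }
              }
              fprintf(stderr, "unknown event alias: %s\n", want);
              return 1;
      }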
    • perf_counter tools: Check for valid cache operations · 06813f6c
      Jaswinder Singh Rajput authored
      Made a new table for cache operation stats, 'hw_cache_stat':

       L1I : Read and prefetch only
       ITLB and BPU : Read-only

      Introduce is_cache_op_valid() to check cache operation validity,
      and check for valid cache operations when parsing events
      (see the sketch after this entry).
      
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245930367.5308.33.camel@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      06813f6c
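      A self-contained sketch of the kind of validity table and check
      described above; the enum values and bitmask layout are assumptions
      for illustration, not the actual perf source.

      /* Illustrative sketch only: a per-cache bitmask of supported operations
       * and a small validity check, in the spirit of the commit above. */
      #include <stdio.h>

      enum cache_id { C_L1D, C_L1I, C_LL, C_DTLB, C_ITLB, C_BPU, C_MAX };
      enum cache_op { OP_READ, OP_WRITE, OP_PREFETCH, OP_MAX };

      #define COP(x) (1 << (x))

      /* Which operations make sense for which cache: L1I supports read and
       * prefetch only; ITLB and BPU are read-only (per the commit message). */
      static const unsigned int hw_cache_stat[C_MAX] = {
              [C_L1D]  = COP(OP_READ) | COP(OP_WRITE) | COP(OP_PREFETCH),
              [C_L1I]  = COP(OP_READ) | COP(OP_PREFETCH),
              [C_LL]   = COP(OP_READ) | COP(OP_WRITE) | COP(OP_PREFETCH),
              [C_DTLB] = COP(OP_READ) | COP(OP_WRITE) | COP(OP_PREFETCH),
              [C_ITLB] = COP(OP_READ),
              [C_BPU]  = COP(OP_READ),
      };

      static int is_cache_op_valid(unsigned int cache_type, unsigned int cache_op)
      {
              return (hw_cache_stat[cache_type] & COP(cache_op)) != 0;
      }

      int main(void)
      {
              printf("l1i prefetch valid: %d\n", is_cache_op_valid(C_L1I, OP_PREFETCH));
              printf("itlb write valid:   %d\n", is_cache_op_valid(C_ITLB, OP_WRITE));
              return 0;
      }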
    • perf record: Fix filemap pathname parsing in /proc/pid/maps · 76c64c5e
      Johannes Weiner authored
      Looking backward for the first space from the end of a line in
      /proc/pid/maps does not find the start of the pathname of the mapped
      file if it contains a space.
      
      Since the only slashes we have in this file occur in the (absolute!)
      pathname column of file mappings, looking for the first slash in a
      line is a safe method to find the name (see the sketch after this
      entry).
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090624190835.GA25548@cmpxchg.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      76c64c5e
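      A sketch of the parsing idea above, run against made-up sample lines
      rather than a live /proc/<pid>/maps (the helper name and data are
      assumptions, not the perf record code): the pathname is found via the
      first '/', which still works when the filename contains spaces.

      /* Illustrative sketch only: find the mapped file's pathname in a
       * /proc/<pid>/maps line by looking for the first '/', not the last space. */
      #include <stdio.h>
      #include <string.h>

      static const char *map_pathname(const char *line)
      {
              return strchr(line, '/');       /* NULL for anonymous mappings */
      }

      int main(void)
      {
              const char *lines[] = {
                      "00400000-0040b000 r-xp 00000000 08:01 3933 /bin/cat",
                      "7f1e2000-7f1e3000 r-xp 00000000 08:01 4242 /tmp/with space/lib.so",
                      "7fff5000-7fff6000 rw-p 00000000 00:00 0 [stack]",
              };

              for (unsigned int i = 0; i < sizeof(lines) / sizeof(lines[0]); i++) {
                      const char *path = map_pathname(lines[i]);
                      printf("%s\n", path ? path : "<anonymous>");
              }
              return 0;
      }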
    • perf_counter tools: Add CREDITS file for Git contributors · 1b173f77
      Ingo Molnar authored
      Much of perf's library code comes from the Git project. I noticed
      that the files (in tools/perf/util/*.[ch] and elsewhere) are
      quite spartan wrt. credits, so let's add a CREDITS file that
      includes an (incomplete!) list of main contributors.
      
      Thanks guys, these libraries are really useful. Special thanks
      go to Johannes Schindelin and Junio C Hamano for coming up with
      this list.
      List-Composed-By: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      Cc: Junio C Hamano <gitster@pobox.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1b173f77
  4. 24 June 2009, 4 commits
    • perf stat: Remove dead code · 3d632595
      Jaswinder Singh Rajput authored
      Remove dead code and do some code alignment.
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245847774.2681.2.camel@ht.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3d632595
    • perf_counter, x86: Set global control MSR correctly · c14dab5c
      Yong Wang authored
      Previous code made an assumption that the power-on value of the global
      control MSR has enabled all fixed and general purpose counters properly.
      
      However, this is not the case for certain Intel processors, such as
      Atom - and it might also be firmware dependent.
      
      Each enable bit in IA32_PERF_GLOBAL_CTRL is AND'ed with the
      enable bits for all privilege levels in the respective IA32_PERFEVTSELx
      or IA32_PERF_FIXED_CTR_CTRL MSRs to start/stop the counting of
      respective counters. Counting is enabled if the AND'ed result is true;
      counting is disabled when the result is false.
      
      The end result is that all fixed counters are always disabled on Atom
      processors because the assumption is just invalid.
      
      Fix this by not initializing the ctrl-mask out of the global MSR,
      but setting it to perf_counter_mask. (A small illustration of the
      AND'ed-enable semantics follows this entry.)
      Reported-by: Stephane Eranian <eranian@googlemail.com>
      Signed-off-by: Yong Wang <yong.y.wang@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090624021324.GA2788@ywang-moblin2.bj.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c14dab5c
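      A small user-space illustration of the AND'ed-enable logic quoted
      above; the MSR values are made-up examples, not real register contents
      or kernel code.

      /* Illustrative sketch only: a counter counts when its bit in the global
       * control MSR AND its per-counter enable bit are both set. */
      #include <stdio.h>
      #include <stdint.h>

      static int counting(uint64_t global_ctrl, int bit, int evtsel_enable)
      {
              return ((global_ctrl >> bit) & 1) && evtsel_enable;
      }

      int main(void)
      {
              /* Example only: a power-on value with the fixed-counter bits clear. */
              uint64_t power_on  = 0x0000000000000003ULL;
              /* Example only: a mask covering two generic + three fixed counters. */
              uint64_t full_mask = 0x0000000700000003ULL;

              /* Fixed counter 0 sits at bit 32 of IA32_PERF_GLOBAL_CTRL. */
              printf("fixed ctr 0, power-on ctrl: %d\n", counting(power_on, 32, 1));
              printf("fixed ctr 0, full mask:     %d\n", counting(full_mask, 32, 1));
              return 0;
      }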
    • perf_counter tools: Fix strbuf_fread() error path handling · f7679dab
      Roel Kluin authored
      size_t res cannot be less than 0 - fread returns 0 on error
      (see the sketch after this entry).
      
      [ Updated by: René Scharfe <rene.scharfe@lsrfire.ath.cx> ]
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Junio C Hamano <gitster@pobox.com>
      LKML-Reference: <4A3FB479.2090902@lsrfire.ath.cx>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f7679dab
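      A sketch of the error-path point above with hypothetical names (not
      the perf strbuf code): since fread() returns a size_t, the check has
      to be for a zero/short read plus ferror(), never for a negative value.

      /* Illustrative sketch only: read a chunk and detect errors without
       * comparing an unsigned count against < 0. */
      #include <stdio.h>

      /* Hypothetical helper: returns bytes read, or -1 on a read error. */
      static long read_chunk(char *buf, size_t size, FILE *f)
      {
              size_t res = fread(buf, 1, size, f);

              if (res == 0 && ferror(f))      /* 'res < 0' can never be true */
                      return -1;
              return (long)res;
      }

      int main(void)
      {
              char buf[4096];
              long n = read_chunk(buf, sizeof(buf), stdin);

              if (n < 0) {
                      fprintf(stderr, "read error\n");
                      return 1;
              }
              printf("read %ld bytes\n", n);
              return 0;
      }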
    • perf stat: Fix verbose for perf stat · cca03c0a
      Jaswinder Singh Rajput authored
      Error messages should go to stderr in verbose (-v) mode, otherwise
      they are lost for:

       $ ./perf stat -v <cmd>  > /dev/null

      For example, on AMD the bus-cycles event is not available, so it
      now looks like this (see the sketch after this entry):
      
       $ ./perf stat -v -e bus-cycles ls > /dev/null
      Error: counter 0, sys_perf_counter_open() syscall returned with -1 (Invalid argument)
      
       Performance counter stats for 'ls':
      
        <not counted>  bus-cycles
      
          0.006765877  seconds time elapsed.
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245757369.3776.1.camel@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cca03c0a
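      A minimal illustration of the stdout-versus-stderr point above (not
      the perf stat code): a diagnostic printed with fprintf(stderr, ...)
      survives '> /dev/null', while plain printf() output does not.

      /* Illustrative sketch only: run as `./a.out > /dev/null` and only the
       * stderr line remains visible - the behaviour the fix relies on. */
      #include <stdio.h>

      int main(void)
      {
              printf("normal result output (lost when stdout is redirected)\n");
              fprintf(stderr, "Error: counter could not be opened (still visible)\n");
              return 0;
      }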
  5. 23 June 2009, 6 commits
  6. 22 June 2009, 4 commits
  7. 21 June 2009, 6 commits
    • perf_counter tools: Fix vmlinux fallback when running on a different kernel · c1f47b45
      Ingo Molnar authored
      Lucas De Marchi reported that perf report and perf annotate
      display a mismatching profile if a perf.data is analyzed on
      an older kernel - even if the correct vmlinux is specified
      via the -k option.
      
      The reason is the fallback path in util/symbol.c:dso__load_kernel():
      
      int dso__load_kernel(struct dso *self, const char *vmlinux,
                           symbol_filter_t filter, int verbose)
      {
              int err = -1;
      
              if (vmlinux)
                      err = dso__load_vmlinux(self, vmlinux, filter, verbose);
      
              if (err)
                      err = dso__load_kallsyms(self, filter, verbose);
      
              return err;
      }
      
      dso__load_vmlinux() returns a negative value on error, but on success
      it returns the number of symbols loaded - which makes the fallback
      check above fire even on success and load the kallsyms as well.
      
      This is normally harmless, as reporting is usually performed on the
      same kernel that is analyzed - but if there's a mismatch then we
      load the wrong kallsyms and create a nonsensical symbol tree.

      The fix is to only fall back to kallsyms on errors (a sketch of the
      adjusted fallback follows this entry).
      Reported-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c1f47b45
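      A sketch of the adjusted fallback, reusing the function quoted above;
      the exact condition in the real patch may differ, so treat the
      '<= 0' test as an assumption that captures "only fall back on errors
      (or an empty result)".

      int dso__load_kernel(struct dso *self, const char *vmlinux,
                           symbol_filter_t filter, int verbose)
      {
              int err = -1;

              if (vmlinux)
                      err = dso__load_vmlinux(self, vmlinux, filter, verbose);

              if (err <= 0)   /* assumed fix: fall back only on failure */
                      err = dso__load_kallsyms(self, filter, verbose);

              return err;
      }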
    • perf_counter, x86: Fix L1-data-Cache-Store-Referencees for AMD · d9f2a5ec
      Jaswinder Singh Rajput authored
      Fix AMD's Data Cache Refills from System event.
      
      After this patch :
      
       ./tools/perf/perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses ls /dev/ > /dev/null
      
       Performance counter stats for 'ls /dev/':
      
              2499484  L1-data-Cache-Load-Referencees             (scaled from 3.97%)
                70347  L1-data-Cache-Load-Misses                  (scaled from 7.30%)
                 9360  L1-data-Cache-Store-Referencees            (scaled from 8.64%)
                32804  L1-data-Cache-Prefetch-Referencees         (scaled from 17.72%)
                 7693  L1-data-Cache-Prefetch-Misses              (scaled from 22.97%)
              2180945  L1-instruction-Cache-Load-Referencees      (scaled from 28.48%)
                14518  L1-instruction-Cache-Load-Misses           (scaled from 35.00%)
                 2405  L1-instruction-Cache-Prefetch-Referencees  (scaled from 34.89%)
                71387  L2-Cache-Load-Referencees                  (scaled from 34.94%)
                18732  L2-Cache-Load-Misses                       (scaled from 34.92%)
                79918  L2-Cache-Store-Referencees                 (scaled from 36.02%)
              1295294  Data-TLB-Cache-Load-Referencees            (scaled from 35.99%)
                30896  Data-TLB-Cache-Load-Misses                 (scaled from 33.36%)
              1222030  Instruction-TLB-Cache-Load-Referencees     (scaled from 29.46%)
                  357  Instruction-TLB-Cache-Load-Misses          (scaled from 20.46%)
               530888  Branch-Cache-Load-Referencees              (scaled from 11.48%)
                 8638  Branch-Cache-Load-Misses                   (scaled from 5.09%)
      
          0.011295149  seconds time elapsed.
      
      Earlier it always showed a value of 0.
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      LKML-Reference: <1245484165.3102.6.camel@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d9f2a5ec
    • mm: page_alloc: clear PG_locked before checking flags on free · c277331d
      Johannes Weiner authored
      Commit da456f14 ("page allocator: do not disable interrupts in
      free_page_mlock()") moved the PG_mlocked clearing after the flag
      sanity checking, which makes mlocked pages always trigger 'bad page'.
      Fix this by clearing the bit up front (see the toy illustration
      after this entry).
      Reported-and-debugged-by: Peter Chubb <peter.chubb@nicta.com.au>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c277331d
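      A toy user-space illustration of the ordering described above (not the
      kernel code; the flag names and mask are stand-ins): if a legitimately
      set bit is only cleared after the "bad flags" check, every such page
      trips the check, so the bit has to be cleared first.

      #include <stdio.h>

      #define PG_MLOCKED            (1u << 0)
      #define PG_PRIVATE            (1u << 1)
      /* Stand-in for the flags that must be clear when a page is freed. */
      #define FLAGS_CHECK_AT_FREE   (PG_MLOCKED | PG_PRIVATE)

      static int free_pages_check(unsigned int flags)
      {
              return (flags & FLAGS_CHECK_AT_FREE) ? -1 : 0;  /* -1 == "bad page" */
      }

      int main(void)
      {
              unsigned int flags = PG_MLOCKED;        /* an mlocked page being freed */

              flags &= ~PG_MLOCKED;                   /* fix: clear the bit up front... */
              if (free_pages_check(flags))            /* ...so the sanity check passes */
                      puts("bad page");
              else
                      puts("page freed cleanly");
              return 0;
      }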
    • x86, 64-bit: Clean up user address masking · 9063c61f
      Linus Torvalds authored
      The discussion about using "access_ok()" in get_user_pages_fast() (see
      commit 7f818906: "x86: don't use
      'access_ok()' as a range check in get_user_pages_fast()" for details and
      end result), made us notice that x86-64 was really being very sloppy
      about virtual address checking.
      
      So be way more careful and straightforward about masking x86-64 virtual
      addresses:
      
       - All the VIRTUAL_MASK* variants now cover half of the address
         space: it's not like we can use the full mask on a signed
         integer, and the larger mask just invites mistakes when
         applying it to either half of the 48-bit address space (a
         small numeric illustration follows this entry).
      
       - /proc/kcore's kc_offset_to_vaddr() becomes a lot more
         obvious when it transforms a file offset into a
         (kernel-half) virtual address.
      
       - Unify/simplify the 32-bit and 64-bit USER_DS definition to
         be based on TASK_SIZE_MAX.
      
      This cleanup and more careful/obvious user virtual address checking also
      uncovered a buglet in the x86-64 implementation of strnlen_user(): it
      would do an "access_ok()" check on the whole potential area, even if the
      string itself was much shorter, and thus return an error even for valid
      strings. Our sloppy checking had hidden this.
      
      So this fixes 'strnlen_user()' to do this properly, the same way we
      already handled user strings in 'strncpy_from_user()'.  Namely by just
      checking the first byte, and then relying on fault handling for the
      rest.  That always works, since we impose a guard page that cannot be
      mapped at the end of the user space address space (and even if we
      didn't, we'd have the address space hole).
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9063c61f
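      A small numeric illustration of the first bullet above, assuming the
      48-bit virtual address layout mentioned in the text; this is plain
      arithmetic, not the kernel's header definitions.

      #include <stdio.h>

      int main(void)
      {
              int va_bits = 48;
              /* A mask that covers only half of the 48-bit address space. */
              unsigned long long half_mask = (1ULL << (va_bits - 1)) - 1;

              printf("half-space mask:       0x%012llx\n", half_mask);
              printf("kernel-half base addr: 0x%016llx\n",
                     ~half_mask);     /* sign-extended upper half */
              return 0;
      }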
    • Merge branch 'irq-fixes-for-linus' of... · 2453d6ff
      Linus Torvalds authored
      Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        genirq, irq.h: Fix kernel-doc warnings
        genirq: fix comment to say IRQ_WAKE_THREAD
      2453d6ff
    • Merge branch 'perfcounters-fixes-for-linus' of... · 12e24f34
      Linus Torvalds authored
      Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
        perfcounter: Handle some IO return values
        perf_counter: Push perf_sample_data through the swcounter code
        perf_counter tools: Define and use our own u64, s64 etc. definitions
        perf_counter: Close race in perf_lock_task_context()
        perf_counter, x86: Improve interactions with fast-gup
        perf_counter: Simplify and fix task migration counting
        perf_counter tools: Add a data file header
        perf_counter: Update userspace callchain sampling uses
        perf_counter: Make callchain samples extensible
        perf report: Filter to parent set by default
        perf_counter tools: Handle lost events
        perf_counter: Add event overflow handling
        fs: Provide empty .set_page_dirty() aop for anon inodes
        perf_counter: tools: Makefile tweaks for 64-bit powerpc
        perf_counter: powerpc: Add processor back-end for MPC7450 family
        perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
        perf_counter: powerpc: Change how processor-specific back-ends get selected
        perf_counter: powerpc: Use unsigned long for register and constraint values
        perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
        perf_counter tools: Add and use isprint()
        ...
      12e24f34