1. 02 10月, 2014 2 次提交
    • W
      perf tools: Fix build breakage on arm64 targets · 660d1329
      Will Deacon 提交于
      Attempting to build the perf tool for an arm64 target results in the
      following failure:
      
        arch/arm64/util/unwind-libunwind.c: In function 'libunwind__arch_reg_id':
        arch/arm64/util/unwind-libunwind.c:77:3: error: implicit declaration of function 'pr_err'
           pr_err("unwind: invalid reg id %d\n", regnum);
           ^
        arch/arm64/util/unwind-libunwind.c:77:3: error: nested extern declaration of 'pr_err'
      
      This is due to commit 84f5d36f ("perf tools: Move pr_* debug macros
      into debug object") moving the pr_* macros into a new header file, but
      failing to update architectures other than x86.
      
      This patch adds the missing include, and fixes the build again.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Cc: Jean Pihet <jean.pihet@linaro.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1412076432-22045-1-git-send-email-will.deacon@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      660d1329
    • W
      perf symbols: Improve DSO long names lookup speed with rbtree · 4598a0a6
      Waiman Long 提交于
      With workload that spawns and destroys many threads and processes, it
      was found that perf-mem could took a long time to post-process the perf
      data after the target workload had completed its operation.
      
      The performance bottleneck was found to be the lookup and insertion of
      the new DSO structures (thousands of them in this case).
      
      In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread), the
      perf profile below shows what perf was doing after the profiled AIM7
      shared workload completed:
      
      -     83.94%  perf  libc-2.11.3.so     [.] __strcmp_sse42
         - __strcmp_sse42
            - 99.82% map__new
                 machine__process_mmap_event
                 perf_session_deliver_event
                 perf_session__process_event
                 __perf_session__process_events
                 cmd_record
                 cmd_mem
                 run_builtin
                 main
                 __libc_start_main
      -     13.17%  perf  perf               [.] __dsos__findnew
           __dsos__findnew
           map__new
           machine__process_mmap_event
           perf_session_deliver_event
           perf_session__process_event
           __perf_session__process_events
           cmd_record
           cmd_mem
           run_builtin
           main
           __libc_start_main
      
      So about 97% of CPU times were spent in the map__new() function trying
      to insert new DSO entry into the DSO linked list. The whole
      post-processing step took about 9 minutes.
      
      The DSO structures are currently searched linearly. So the total
      processing time will be proportional to n^2.
      
      To overcome this performance problem, the DSO code is modified to also
      put the DSO structures in a RB tree sorted by its long name in
      additional to being in a simple linked list. With this change, the
      processing time will become proportional to n*log(n) which will be much
      quicker for large n. However, the short name will still be searched
      using the old linear searching method.  With that patch in place, the
      same perf-mem post-processing step took less than 30 seconds to
      complete.
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hp.com>
      Link: http://lkml.kernel.org/r/1412098575-27863-3-git-send-email-Waiman.Long@hp.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4598a0a6
  2. 30 9月, 2014 5 次提交
  3. 27 9月, 2014 1 次提交
    • I
      Merge tag 'perf-core-for-mingo' of... · 07394b5f
      Ingo Molnar 提交于
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
        o Restore "--callchain graph" output, broken in recent cset to end
          up being the same as "fractal" (Namhyung Kim)
      
        o Allow profiling when kptr_restrict == 1 for non root users,
          kernel samples will just remain unresolved (Andi Kleen)
      
        o Allow configuring default options for callchains in config file (Namhyung Kim)
      
        o Fix line number in the config file error message (Jiri Olsa)
      
        o Fix --per-core on multi socket systems (Andi Kleen)
      
      Cleanups:
      
        o Use ACCESS_ONCE() instead of volatile cast. (Pranith Kumar)
      
        o Modify error code for when perf_session__new() fails (Taeung Song)
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      07394b5f
  4. 26 9月, 2014 25 次提交
  5. 24 9月, 2014 7 次提交