1. 12 Nov, 2013 2 commits
    • A
      perf record: Synthesize non-exec MMAP records when --data used · 62605dc5
      Arnaldo Carvalho de Melo committed
      When perf_event_attr.mmap_data is set the kernel will generate
      PERF_RECORD_MMAP events when non-exec (data, SysV mem) mmaps are
      created, so we need to synthesize from /proc/pid/maps for existing
      threads, as we do for exec mmaps.
      
      Right now just 'perf record' does it, but any other tool that uses
      perf_event__synthesize_thread(s|map) can request it.
      Reported-by: Don Zickus <dzickus@redhat.com>
      Tested-by: Don Zickus <dzickus@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Bill Gray <bgray@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Fowles <rfowles@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-ihwzraikx23ian9txinogvv2@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
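The synthesis step described in this commit can be illustrated with a minimal sketch, assuming the usual `/proc/<pid>/maps` line layout. The names `map_entry`, `parse_maps_line` and `should_synthesize` are hypothetical and are not taken from the perf sources; the decision rule is a simplification of what the commit message describes (executable mappings are always synthesized, non-exec ones only when `mmap_data` was requested).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified view of one /proc/<pid>/maps entry. */
struct map_entry {
	unsigned long long start, end, pgoff;
	char prot[5];    /* e.g. "r-xp" */
	char path[256];  /* empty for anonymous mappings */
};

/* Parse one line of /proc/<pid>/maps; returns true on success. */
static bool parse_maps_line(const char *line, struct map_entry *e)
{
	unsigned int maj, min;
	unsigned long ino;
	int n;

	assert(line && e);
	e->path[0] = '\0';
	n = sscanf(line, "%llx-%llx %4s %llx %x:%x %lu %255[^\n]",
		   &e->start, &e->end, e->prot, &e->pgoff,
		   &maj, &min, &ino, e->path);
	/* The path field is optional, so 7 converted fields are enough. */
	return n >= 7;
}

/*
 * Simplified rule (an assumption, not perf's exact logic): executable
 * mappings are always synthesized, data mappings only if mmap_data is set.
 */
static bool should_synthesize(const struct map_entry *e, bool mmap_data)
{
	bool is_exec = e->prot[2] == 'x';

	return is_exec || mmap_data;
}
```

A caller would read `/proc/<pid>/maps` line by line, pass each line to `parse_maps_line()`, and emit a synthetic record only for entries where `should_synthesize()` returns true.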
    • A
      perf evsel: Remove idx parm from constructor · ef503831
      Arnaldo Carvalho de Melo committed
      Most uses of the evsel constructor are followed by a call to
      perf_evlist__add with an idx of evlist->nr_entries, so rename
      the current constructor to perf_evsel__new_idx and remove the need
      to pass an idx to the constructor in the common case.
      
      We still need the new_idx variant because of the way groups are handled,
      with evsel->nr_members holding the number of entries in an evlist,
      partitioning the evlist into sublists inside a single linked list.
      
      This calls for a clarifying refactoring, but for now simplify the
      non-parser cases, so that tool writers don't have to bother with
      setting the evsel idx.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-zy9tskx6jqm2rmw7468zze2a@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
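The shape of this refactoring can be sketched with a toy model. All names here (`toy_evsel`, `toy_evlist`, `toy_evsel_new_idx`, `toy_evsel_new`, `toy_evlist_add`) are hypothetical, and having the list's add function assign the index is an assumption made for illustration, not a statement about how perf implements it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for an event selector and the list that holds them. */
struct toy_evsel {
	int idx;
	struct toy_evsel *next;
};

struct toy_evlist {
	struct toy_evsel *head, *tail;
	int nr_entries;
};

/* Explicit-index constructor, kept for callers (e.g. a parser) that need it. */
static struct toy_evsel *toy_evsel_new_idx(int idx)
{
	struct toy_evsel *e = calloc(1, sizeof(*e));

	if (e)
		e->idx = idx;
	return e;
}

/* Common-case constructor: the caller no longer supplies an index. */
static struct toy_evsel *toy_evsel_new(void)
{
	return toy_evsel_new_idx(0);
}

/* Assumption: appending assigns idx from the current number of entries. */
static void toy_evlist_add(struct toy_evlist *l, struct toy_evsel *e)
{
	assert(l && e);
	e->idx = l->nr_entries++;
	e->next = NULL;
	if (l->tail)
		l->tail->next = e;
	else
		l->head = e;
	l->tail = e;
}
```

With this split, a tool writer only calls `toy_evsel_new()` followed by `toy_evlist_add()`, while code that builds groups can still pick indexes explicitly through `toy_evsel_new_idx()`.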
  2. 07 Nov, 2013 3 commits
  3. 06 Nov, 2013 6 commits
  4. 05 Nov, 2013 1 commit
  5. 04 Nov, 2013 13 commits
  6. 01 Nov, 2013 1 commit
  7. 29 Oct, 2013 3 commits
  8. 28 Oct, 2013 1 commit
  9. 24 Oct, 2013 3 commits
    • J
      perf script python: Fix mem leak due to missing Py_DECREFs on dict entries · c0268e8d
      Joseph Schuchart committed
      We are using the Python scripting interface in perf to extract kernel
      events relevant for performance analysis of HPC codes. We noticed that
      the "perf script" call allocates a significant amount of memory (on the
      order of several hundred MiB) during its run, e.g. 125 MiB for a 25 MiB
      input file:
      
        $> perf record -o perf.data -a -R -g fp \
             -e power:cpu_frequency -e sched:sched_switch \
             -e sched:sched_migrate_task -e sched:sched_process_exit \
             -e sched:sched_process_fork -e sched:sched_process_exec \
             -e cycles  -m 4096 --freq 4000
        $> /usr/bin/time perf script -i perf.data -s dummy_script.py
        0.84user 0.13system 0:01.92elapsed 51%CPU (0avgtext+0avgdata
        125532maxresident)k
        73072inputs+0outputs (57major+33086minor)pagefaults 0swaps
      
      Upon further investigation using the valgrind massif tool, we noticed
      that Python objects that are created in trace-event-python.c via
      PyString_FromString*() (and their Integer and Long counterparts) are
      never freed.
      
      The reason for this seems to be missing Py_DECREF calls on the objects
      that are returned by these functions and stored in the Python
      dictionaries. The Python dictionaries do not steal references (as
      opposed to Python tuples and lists) but instead add their own reference.
      
      Hence, the reference that is returned by these object creation functions
      is never released and the memory is leaked. (see [1,2])
      
      The attached patch fixes this by wrapping all relevant calls to
      PyDict_SetItemString() and decrementing the reference counter
      immediately after the Python function call.
      
      This reduces the allocated memory to a reasonable amount:
      
        $> /usr/bin/time perf script -i perf.data -s dummy_script.py
        0.73user 0.05system 0:00.79elapsed 99%CPU (0avgtext+0avgdata
        49132maxresident)k
        0inputs+0outputs (0major+14045minor)pagefaults 0swaps
      
      For comparison, with a 120 MiB input file the memory consumption
      reported by time drops from almost 600 MiB to 146 MiB.
      
      The patch has been tested using Linux 3.8.2 with Python 2.7.4 and Linux
      3.11.6 with Python 2.7.5.
      
      Please let me know if you need any further information.
      
      [1] http://docs.python.org/2/c-api/tuple.html#PyTuple_SetItem
      [2] http://docs.python.org/2/c-api/dict.html#PyDict_SetItemString
      Signed-off-by: Joseph Schuchart <joseph.schuchart@tu-dresden.de>
      Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Link: http://lkml.kernel.org/r/1381468543-25334-4-git-send-email-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
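The leak pattern described in this commit can be modelled without the Python headers. The following is a toy reference-counting sketch in plain C, not the Python C API: `obj`, `slot` and every function below are hypothetical names. The one-entry `slot` plays the role of a dictionary that, like `PyDict_SetItemString()`, takes its own reference instead of stealing the caller's.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object. */
struct obj {
	int refcnt;
};

/* A new object starts with one reference, owned by the caller. */
static struct obj *obj_new(void)
{
	struct obj *o = malloc(sizeof(*o));

	assert(o);
	o->refcnt = 1;
	return o;
}

static void obj_incref(struct obj *o)
{
	o->refcnt++;
}

/* Drop one reference; frees the object at zero. Returns what is left. */
static int obj_decref(struct obj *o)
{
	int left = --o->refcnt;

	if (left == 0)
		free(o);
	return left;
}

/* One-entry "dictionary" that adds its own reference to the value. */
struct slot {
	struct obj *val;
};

static void slot_set(struct slot *s, struct obj *val)
{
	obj_incref(val);          /* the container does NOT steal the reference */
	if (s->val)
		obj_decref(s->val);
	s->val = val;
}

/* Wrapper in the spirit of the fix: store, then drop the caller's reference. */
static void slot_set_decref(struct slot *s, struct obj *val)
{
	slot_set(s, val);
	obj_decref(val);
}

static void slot_clear(struct slot *s)
{
	if (s->val) {
		obj_decref(s->val);
		s->val = NULL;
	}
}
```

With plain `slot_set()`, the object still has one outstanding reference after the container is cleared, which is the leak; with `slot_set_decref()`, clearing the container releases the last reference.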
    • N
      perf tools: Show progress on histogram collapsing · c1fb5651
      Namhyung Kim committed
      Collapsing can take quite some time, so add a progress bar UI to inform
      the user.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1381468543-25334-4-git-send-email-namhyung@kernel.org
      [ perf_progress -> ui_progress ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf ui progress: Per progress bar state · 4d3001fd
      Arnaldo Carvalho de Melo committed
      That will ease using a progress bar across multiple functions, like in
      the upcoming patches that will present a progress bar when collapsing
      histograms.
      
      Based on a previous patch by Namhyung Kim.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-cr7lq7ud9fj21bg7wvq27w1u@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
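Keeping the state inside each progress bar, rather than in shared variables, is what allows one bar to be updated from several functions. A minimal sketch of that idea follows; the struct layout, the function names and the "redraw every 1/16th of the total" policy are assumptions for illustration and are not taken from the perf sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-bar state: each bar carries its own counters. */
struct toy_progress {
	const char *title;
	uint64_t curr, next, step, total;
	unsigned int redraws;   /* stands in for actually drawing the bar */
};

static void toy_progress_init(struct toy_progress *p, uint64_t total,
			      const char *title)
{
	assert(p);
	p->title = title;
	p->total = total;
	p->curr = 0;
	/* Assumed policy: redraw roughly 16 times over the whole run. */
	p->step = total / 16 ? total / 16 : 1;
	p->next = p->step;
	p->redraws = 0;
}

/* Advance the bar; redraw at most once per call, when a step is crossed. */
static void toy_progress_update(struct toy_progress *p, uint64_t adv)
{
	assert(p);
	p->curr += adv;
	if (p->curr >= p->next) {
		while (p->next <= p->curr)
			p->next += p->step;
		p->redraws++;
	}
}
```

Because all counters live in the struct, a pointer to it can be passed down through any call chain, and two bars can be active at once without interfering with each other.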
  10. 23 Oct, 2013 5 commits
  11. 22 Oct, 2013 2 commits
    • W
      perf top: Add --max-stack option to limit callchain stack scan · 5dbb6e81
      Waiman Long committed
      When the callgraph function is enabled (-G), it may take a long time to
      scan all the stack data and merge them accordingly.
      
      This patch adds a new --max-stack option to perf-top to limit the depth
      of callchain stack data to look at to reduce the time it takes for
      perf-top to finish its processing. It reduces the amount of information
      provided to the user in exchange for faster speed.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Tested-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1382107129-2010-5-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • W
      perf report: Add --max-stack option to limit callchain stack scan · 91e95617
      Waiman Long committed
      When callgraph data is included in the perf data file, it may take a
      long time to scan all of it and merge it together, especially if
      the stored callchains are long and the perf data file itself is large,
      e.g. a gigabyte or so.
      
      The callchain stack is currently limited to PERF_MAX_STACK_DEPTH (127).
      This is a large value. Developers are usually most interested in the
      first few levels of the callgraph; the rest is rarely looked at.
      
      This patch adds a new --max-stack option to perf-report to limit the
      depth of callchain stack data to look at to reduce the time it takes for
      perf-report to finish its processing. It trades the presence of trailing
      stack information for faster speed.
      
      The following table shows the elapsed time of doing perf-report on a
      perf.data file of size 985,531,828 bytes.
      
        --max-stack   Elapsed Time    Output data size
        -----------   ------------    ----------------
        not set        88.0s          124,422,651
        64             87.5s          116,303,213
        32             87.2s          112,023,804
        16             86.6s           94,326,380
        8              59.9s           33,697,248
        4              40.7s           10,116,637
        -g none        27.1s            2,555,810
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1382107129-2010-4-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
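The effect of a `--max-stack` style limit can be sketched as a simple depth cap applied before callchain entries are processed. This is an illustrative model only: `limit_callchain` and `TOY_MAX_STACK_DEPTH` are hypothetical names, with the constant mirroring the PERF_MAX_STACK_DEPTH value of 127 quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the PERF_MAX_STACK_DEPTH value (127) mentioned in the text. */
#define TOY_MAX_STACK_DEPTH 127

/*
 * Copy at most max_stack leading entries of a callchain into out and
 * return how many were kept. Entries beyond the limit are never touched,
 * which is where the time saving would come from.
 * out must have room for at least TOY_MAX_STACK_DEPTH entries.
 */
static size_t limit_callchain(const uint64_t *ips, size_t nr,
			      size_t max_stack, uint64_t *out)
{
	size_t depth, i;

	assert(ips && out);
	if (max_stack > TOY_MAX_STACK_DEPTH)
		max_stack = TOY_MAX_STACK_DEPTH;
	depth = nr < max_stack ? nr : max_stack;
	for (i = 0; i < depth; i++)
		out[i] = ips[i];
	return depth;
}
```

In this model, a limit of 4 on a 5-entry chain keeps only the first 4 entries, and a limit at or above the chain length keeps the chain unchanged.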