提交 · 4b37af595742977f1bdd8c0fd0f3e6e55b36e7b7 · openeuler / Kernel

01 10月, 2015 20 次提交

perf top: Fix unresolved comm when -s comm is used · 4b37af59

由 Namhyung Kim 提交于 9月 30, 2015

The perf top uses 'dso,symbol' sort keys by default so it overlooked a
problem in task's comm resolving. When the sort key contains 'comm',
some task's comm is not shown properly. This is because the
perf_top__mmap_read_idx() checks the cpumode value improperly.

The cpumode value of non-sample events are 0 (PERF_RECORD_MISC_CPUMODE_
UNKNOWN) so the events will be ignored by the switch statement. This patch
allows it for non-sample events.
Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1443577526-3240-2-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

4b37af59

perf record: Allocate area for sample_id_hdr in a synthesized comm event · e5bed564

由 Namhyung Kim 提交于 9月 30, 2015

A previous patch added a synthesized comm event for forked child process
but it missed that the event should contain area for sample_id_hdr at
the end.  It worked by accident since the perf_event union contains
bigger event structs like mmap_events.

This patch fixes it by dynamically allocating event struct including
those area like in perf_event__synthesize_thread_map().
Reported-by: NArnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1443577526-3240-1-git-send-email-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

e5bed564

perf/x86/intel/uncore: Do not use macro DEFINE_PCI_DEVICE_TABLE() · c2365b93

由 Vaishali Thakkar 提交于 10月 01, 2015

The DEFINE_PCI_DEVICE_TABLE() macro is deprecated. Use
'struct pci_device_id' instead of DEFINE_PCI_DEVICE_TABLE(),
with the goal of getting rid of this macro completely.

This Coccinelle semantic patch performs this transformation:

	@@
	identifier a;
	declarer name DEFINE_PCI_DEVICE_TABLE;
	initializer i;
	@@
	- DEFINE_PCI_DEVICE_TABLE(a)
	+ const struct pci_device_id a[] = i;
Signed-off-by: NVaishali Thakkar <vthakkar1994@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20151001085201.GA16939@localhostSigned-off-by: NIngo Molnar <mingo@kernel.org>

c2365b93

Merge tag 'perf-core-for-mingo' of... · 4bc6a58f

由 Ingo Molnar 提交于 10月 01, 2015

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes:

User visible changes:

  - By default use the most precise "cycles" hw counter available, i.e.
    when the user doesn't specify any event, it will try using cycles:ppp,
    cycles:pp, etc. (Arnaldo Carvalho de Melo)

  - Remove blank lines, headers when piping output in 'perf list', so that it can
    be sanely used with 'wc -l', etc. (Arnaldo Carvalho de Melo)

  - Amend documentation about max_stack and synthesized callchains. (Adrian Hunter)

  - Fix 'perf probe -l' for probes added to kernel module functions. (Masami Hiramatsu)

Build fixes:

  - Fix shadowed declarations that break the build on older distros. (Jiri Olsa)

  - Fix build break on powerpc due to sample_reg_masks. (Sukadev Bhattiprolu)
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

4bc6a58f

perf tools: By default use the most precise "cycles" hw counter available · 7f8d1ade

由 Arnaldo Carvalho de Melo 提交于 9月 30, 2015

If the user doesn't specify any event, try the most precise "cycles"
available, i.e. start by "cycles:ppp" and go on removing "p" till it
works.

E.g.

  $ perf record usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.017 MB perf.data (11 samples) ]
  $ perf evlist
  cycles:pp
  $ perf evlist -v
  cycles:pp: size: 112, { sample_period, sample_freq }: 4000, sample_type:
  IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1,
  enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1,
  exclude_guest: 1, mmap2: 1, comm_exec: 1
  $ grep 'model name' /proc/cpuinfo | head -1
  model name	: Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz
  $

When 'cycles' appears explicitely is specified this will not be tried,
i.e. the user has full control of the level of precision to be used:

  $ perf record -e cycles usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.016 MB perf.data (9 samples) ]
  $ perf evlist
  cycles
  $ perf evlist -v
  cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type:
  IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1,
  enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2:
  1, comm_exec: 1
  $

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://www.youtube.com/watch?v=nXaxk27zwlk
Link: http://lkml.kernel.org/n/tip-b1ywebmt22pi78vjxau01wth@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

7f8d1ade

perf list: Remove blank lines, headers when piping output · dfc431cb

由 Arnaldo Carvalho de Melo 提交于 9月 30, 2015

So that one can, for instance, use it with wc -l:

  # perf list *:*write* | wc -l
  60

Or to look for the "bio" tracepoints, without 'perf list' headers:

  # perf list *:*bio* | head
    block:block_bio_backmerge                          [Tracepoint event]
    block:block_bio_bounce                             [Tracepoint event]
    block:block_bio_complete                           [Tracepoint event]
    block:block_bio_frontmerge                         [Tracepoint event]
    block:block_bio_queue                              [Tracepoint event]
    block:block_bio_remap                              [Tracepoint event]
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ts7sc0x8u4io4cifzkup4j44@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

dfc431cb

perf probe: Improve error message when %return is on inlined function · 6cca13bd

由 Masami Hiramatsu 提交于 10月 01, 2015

perf probe shows more precisely message when it finds given
%return target function is inlined.

Without this fix:
  ----
  # ./perf probe -V getname_flags%return
  Return probe must be on the head of a real function.
  Debuginfo analysis failed.
    Error: Failed to show vars.
  ----

With this fix:
  ----
  # ./perf probe -V getname_flags%return
  Failed to find "getname_flags%return",
   because getname_flags is an inlined function and has no return point.
  Debuginfo analysis failed.
    Error: Failed to show vars.
  ----
Suggested-by: NArnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164137.3733.55055.stgit@localhost.localdomainSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

6cca13bd

perf probe: Fix a segfault bug in debuginfo_cache · 20f49859

由 Masami Hiramatsu 提交于 10月 01, 2015

perf probe --list will get a segfault if the first kprobe event is on a
module and the second or latter one is on the kernel.

e.g.
  ----
  # ./perf probe -q -m pcspkr pcspkr_event
  # ./perf probe -q vfs_read
  # ./perf probe -l
  Segmentation fault (core dumped)
  ----

This is because the debuginfo_cache fails to handle NULL module name,
which causes segfault on strcmp. (Note that strcmp("something", NULL)
always causes segfault)

To fix this debuginfo_cache__open always translates the NULL module name
to "kernel" (this is correct, because NULL module name means opening the
debuginfo for the kernel)

  ----
  # ./perf probe -l
    probe:pcspkr_event   (on pcspkr_event@drivers/input/misc/pcspkr.c
    in pcspkr)
    probe:vfs_read       (on vfs_read@ksrc/linux-3/fs/read_write.c)
  ----
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164135.3733.23993.stgit@localhost.localdomainSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

20f49859

perf probe: Show correct source lines of probes on kmodules · 9b239a12

由 Masami Hiramatsu 提交于 10月 01, 2015

Perf probe always failed to find appropriate line numbers because of
failing to find .text start address offset from debuginfo.

e.g.
  ----
  # ./perf probe -m pcspkr pcspkr_event:5
  Added new events:
    probe:pcspkr_event   (on pcspkr_event:5 in pcspkr)
    probe:pcspkr_event_1 (on pcspkr_event:5 in pcspkr)

  You can now use it in all perf tools, such as:

          perf record -e probe:pcspkr_event_1 -aR sleep 1

  # ./perf probe -l
  Failed to find debug information for address ffffffffa031f006
  Failed to find debug information for address ffffffffa031f016
    probe:pcspkr_event   (on pcspkr_event+6 in pcspkr)
    probe:pcspkr_event_1 (on pcspkr_event+22 in pcspkr)
  ----

This fixes the above issue as below.
1. Get the relative address of the symbol in .text by using
   map->start.
2. Adjust the address by adding the offset of .text section
   in the kernel module binary.

With this fix, perf probe -l shows lines correctly.
  ----
  # ./perf probe -l
    probe:pcspkr_event   (on pcspkr_event:5@drivers/input/misc/pcspkr.c in pcspkr)
    probe:pcspkr_event_1 (on pcspkr_event:5@drivers/input/misc/pcspkr.c in pcspkr)
  ----
Reported-by: NArnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164132.3733.24643.stgit@localhost.localdomainSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

9b239a12

perf probe: Begin and end libdwfl report session correctly · 9135949d

由 Masami Hiramatsu 提交于 10月 01, 2015

Fix a trival bug about libdwfl usage of the report session, it should
explicitly begin and end a report session around dwfl_report_offline().
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164128.3733.59876.stgit@localhost.localdomainSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

9135949d

perf probe: Fix to remove dot suffix from second or latter events · 663b1151

由 Masami Hiramatsu 提交于 10月 01, 2015

Fix to remove dot suffix (e.g. .const, .isra) from the second or latter
events which has suffix numbers.

Since the previous commit 35a23ff9 ("perf probe: Cut off the gcc
optimization postfixes from function name") didn't care about the suffix
numbered events, therefore we'll have an error when we add additional
events on the same dot suffix functions.

e.g.
  ----
  # ./perf probe -f -a get_sigframe.isra.2.constprop.3 \
   -a get_sigframe.isra.2.constprop.3
  Failed to write event: Invalid argument
    Error: Failed to add events.
  ----

This fixes above issue as below:
  ----
  # ./perf probe -f -a get_sigframe.isra.2.constprop.3 \
   -a get_sigframe.isra.2.constprop.3
  Added new events:
    probe:get_sigframe   (on get_sigframe.isra.2.constprop.3)
    probe:get_sigframe_1 (on get_sigframe.isra.2.constprop.3)

  You can now use it in all perf tools, such as:

          perf record -e probe:get_sigframe_1 -aR sleep 1

  ----
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164130.3733.26573.stgit@localhost.localdomainSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

663b1151

tools lib symbol: Introduce kallsyms2elf_type · f845086a

由 Arnaldo Carvalho de Melo 提交于 9月 30, 2015

Map 't', 'T' (text, local, global), 'w' and 'W' (weak text, local,
global) as STT_FUNC, and the rest as STT_OBJECT

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-sbwcixulpc5v1xuxn3xvm0nn@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

f845086a

tools lib symbol: Rename kallsyms2elf_type to kallsyms2elf_binding · 8e947f1e

由 Arnaldo Carvalho de Melo 提交于 9月 30, 2015

It is about binding, not type, we have just a letter in kallsyms that
should map both for the ELF type (STT_FUNC, etc) and to the ELF
symbol binding (STB_WEAK, STB_GLOBAL, etc), so rename it now before
introducing kallsyms2_elf_type()

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-uu5vj343ms1q2wm55690on6v@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

8e947f1e

perf machine: Add method for common kernel_map(FUNCTION) operation · a5e813c6

由 Arnaldo Carvalho de Melo 提交于 9月 30, 2015

And it is also a step in the direction of killing the separation of data
and text maps in map_groups.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-rrds86kb3wx5wk8v38v56gw8@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

a5e813c6

perf machine: Use machine__kernel_map() thoroughly · 77e65977

由 Arnaldo Carvalho de Melo 提交于 9月 30, 2015

In places where we were using its open coded equivalent.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-khkdugcdoqy3tkszm3jdxgbe@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

77e65977

perf tools: Fix build break on powerpc due to sample_reg_masks · eb56db54

由 Sukadev Bhattiprolu 提交于 9月 24, 2015

The perf_regs.c file does not get built on Powerpc as CONFIG_PERF_REGS
is false.  So the weak definition for 'sample_regs_masks' doesn't get
picked up.

Adding perf_regs.o to util/Build unconditionally, exposes a redefinition
error for 'perf_reg_value()' function (due to the static inline version
in util/perf_regs.h). So use #ifdef HAVE_PERF_REGS_SUPPORT' around that
function.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20150930182836.GA27858@us.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

eb56db54

perf report: Amend documentation about max_stack and synthesized callchains · 40862a7b

由 Adrian Hunter 提交于 9月 29, 2015

The --max_stack option was added as an optimization to reduce processing time,
so people specifying --max-stack might get a increased processing time if
combined with synthesized callchains, but otherwise no real harm.

A warning about setting both --max_stack and the synthesized callchains max
depth seems like overkill. Amend the documentation.
Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/560A5155.4060105@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

40862a7b

perf maps: Introduce maps__find_symbol_by_name() · b7f9ff56

由 Arnaldo Carvalho de Melo 提交于 9月 29, 2015

Out of map_groups__find_symbol_by_name(), so that we can turn this later
one first into a call to maps__find_symbol_by_name(MAP__FUNCTION) +
MAP__VARIABLE, and then to just one call, we'll merge MAP__FUNCTION with
MAP__VARIABLE maps, to simplify the code.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-pvkar0jacqn92g148u9sqttt@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b7f9ff56

perf tools: Fix shadowed declaration in parse-events.c · 272ed29a

由 Jiri Olsa 提交于 9月 29, 2015

The error variable breaks build on CentOS 6.7, due to a collision with a
global error symbol:

    CC       util/parse-events.o
  cc1: warnings being treated as errors
  util/parse-events.c:419: error: declaration of ‘error’ shadows a global
  declaration
  util/util.h:135: error: shadowed declaration is here
  util/parse-events.c: In function ‘add_tracepoint_multi_event’:
  ...

Using different argument names instead to fix it.
Reported-by: NVinson Lee <vlee@twopensource.com>
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-tip-commits@vger.kernel.org
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Raphael Beamonte <raphael.beamonte@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150929150531.GI27383@krava.redhat.com
[ Fix one more case, at line 770 ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

272ed29a

tools: Fix shadowed declaration in err.h · 45633a16

由 Jiri Olsa 提交于 9月 29, 2015

The error variable breaks build on CentOS 6.7, due to collision with
global error symbol:

    CC       util/evlist.o
  cc1: warnings being treated as errors
  In file included from util/evlist.c:28:
  tools/include/linux/err.h: In function ‘ERR_PTR’:
  tools/include/linux/err.h:34: error: declaration of ‘error’ shadows a global declaration
  util/util.h:135: error: shadowed declaration is here

Using 'error_' name instead to fix it.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/n/tip-i9mdgdbrgauy3fe76s9rd125@git.kernel.orgReported-by: NVinson Lee <vlee@twopensource.com>
[ Use 'error_' instead of 'err' to, visually, not diverge too much from include/linux/err.h ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

45633a16

29 9月, 2015 20 次提交

Merge tag 'perf-core-for-mingo' of... · 9c17dbc6

由 Ingo Molnar 提交于 9月 29, 2015

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

  - Accept a zero --itrace period, meaning "as often as possible".  In the case
    of Intel PT that is the same as a period of 1 and a unit of 'instructions'
    (i.e.  --itrace=i1i). (Adrian Hunter)

  - Harmonize itrace's synthesized callchains with the existing --max-stack
    tool option. (Adrian Hunter)

  - Allow time to be displayed in nanoseconds in 'perf script'. (Adrian Hunter)

  - Fix potential infinite loop when handling Intel PT timestamps. (Adrian Hunter)

  - Slighly improve Intel PT debug logging. (Adrian Hunter)

  - Warn when AUX data has been lost, just like when processing PERF_RECORD_LOST.
    (Adrian Hunter)

  - Further document export-to-postgresql.py script. (Adrian Hunter)

  - Add option to synthesize branch stack from auxtrace data. (Adrian Hunter)

  - Use equivalent logic to avoid using dso->kernel. (Arnaldo Carvalho de Melo)

  - Show proper error messages when parsing bad terms for hw/sw events. (He Kuang)

  - Tracepoint event parsing improvements. (He Kuang)

  - Store tracing mountpoint for better error message. (Jiri Olsa)

  - Add fixdep to tools/build, bringing it closer to the kernel counterpart, from
    where it is being lifted. (Jiri Olsa)
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

9c17dbc6

perf tools: Enable event_config terms to tracepoint events · e637d177

由 He Kuang 提交于 9月 28, 2015

This patch enables config terms for tracepoint perf events. Valid terms
for tracepoint events are 'call-graph' and 'stack-size', so we can use
different callgraph settings for each event and eliminate unnecessary
overhead.

Here is an example for using different call-graph config for each
tracepoint.

  $ perf record -e syscalls:sys_enter_write/call-graph=fp/
                -e syscalls:sys_exit_write/call-graph=no/
                dd if=/dev/zero of=test bs=4k count=10

  $ perf report --stdio

  #
  # Total Lost Samples: 0
  #
  # Samples: 13  of event 'syscalls:sys_enter_write'
  # Event count (approx.): 13
  #
  # Children      Self  Command  Shared Object       Symbol
  # ........  ........  .......  ..................  ......................
  #
      76.92%    76.92%  dd       libpthread-2.20.so  [.] __write_nocancel
                   |
                   ---__write_nocancel

      23.08%    23.08%  dd       libc-2.20.so        [.] write
                   |
                   ---write
                      |
                      |--33.33%-- 0x2031342820736574
                      |
                      |--33.33%-- 0xa6e69207364726f
                      |
                       --33.33%-- 0x34202c7320393039
  ...

  # Samples: 13  of event 'syscalls:sys_exit_write'
  # Event count (approx.): 13
  #
  # Children      Self  Command  Shared Object       Symbol
  # ........  ........  .......  ..................  ......................
  #
      76.92%    76.92%  dd       libpthread-2.20.so  [.] __write_nocancel
      23.08%    23.08%  dd       libc-2.20.so        [.] write
       7.69%     0.00%  dd       [unknown]           [.] 0x0a6e69207364726f
       7.69%     0.00%  dd       [unknown]           [.] 0x2031342820736574
       7.69%     0.00%  dd       [unknown]           [.] 0x34202c7320393039
Signed-off-by: NHe Kuang <hekuang@huawei.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1443412336-120050-4-git-send-email-hekuang@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

e637d177

perf tools: Adds the tracepoint name parsing support · 865582c3

由 He Kuang 提交于 9月 28, 2015

Adds rules for parsing tracepoint names. Change rules of tracepoint which
derives from PE_NAMEs into tracepoint names directly, so adding more rules
based on tracepoint names will be easier.

Changes v2-v3:
   - Change __event_legacy_tracepoint label in bison file to tracepoint_name
   - Fix formats error.
Signed-off-by: NHe Kuang <hekuang@huawei.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1443412336-120050-3-git-send-email-hekuang@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

865582c3

perf tools: Show proper error message for wrong terms of hw/sw events · ffeb883e

由 He Kuang 提交于 9月 28, 2015

Show proper error message and show valid terms when wrong config terms
is specified for hw/sw type perf events.

This patch makes the original error format function formats_error_string()
more generic, which only outputs the static config terms for hw/sw perf
events, and prepends pmu formats for pmu events.

Before this patch:

  $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
  invalid or unsupported event: 'cpu-clock/freqx=200/'
  Run 'perf list' for a list of valid events

   usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -e, --event <event>   event selector. use 'perf list' to list available events

After this patch:

  $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
  event syntax error: 'cpu-clock/freqx=200/'
                                 \___ unknown term

  valid terms: config,config1,config2,name,period,freq,branch_type,time,call-graph,stack-size

  Run 'perf list' for a list of valid events

   usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -e, --event <event>   event selector. use 'perf list' to list available events
Signed-off-by: NHe Kuang <hekuang@huawei.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1443412336-120050-2-git-send-email-hekuang@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

ffeb883e

perf tools: Adds the config_term callback for different type events · 0b8891a8

由 He Kuang 提交于 9月 28, 2015

Currently, function config_term() is used for checking config terms of
all types of events, while unknown terms is not reported as an error
because pmu events have valid terms in sysfs.

But this is wrong when unknown terms are specificed to hw/sw events.
This patch Adds the config_term callback so we can use separate check
routines for each type of events.
Signed-off-by: NHe Kuang <hekuang@huawei.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1443412336-120050-1-git-send-email-hekuang@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

0b8891a8

perf intel-pt: Add mispred-all config option to aid use with autofdo · ba11ba65

由 Adrian Hunter 提交于 9月 25, 2015

autofdo incorrectly expects branch flags to include either mispred or
predicted.  In fact mispred = predicted = 0 is valid and means the flags
are not supported, which they aren't by Intel PT.

To make autofdo work, add a config option which will cause Intel PT
decoder to set the mispred flag on all branches.

Below is an example of using Intel PT with autofdo.  The example is
also added to the Intel PT documentation.  It requires autofdo
(https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
amended to take the number of elements as a parameter.

	$ gcc-5 -O3 sort.c -o sort_optimized
	$ ./sort_optimized 30000
	Bubble sorting array of 30000 elements
	2254 ms

	$ cat ~/.perfconfig
	[intel-pt]
		mispred-all

	$ perf record -e intel_pt//u ./sort 3000
	Bubble sorting array of 3000 elements
	58 ms
	[ perf record: Woken up 2 times to write data ]
	[ perf record: Captured and wrote 3.939 MB perf.data ]
	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
	$ ./sort_autofdo 30000
	Bubble sorting array of 30000 elements
	2155 ms

Note there is currently no advantage to using Intel PT instead of LBR,
but that may change in the future if greater use is made of the data.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-26-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

ba11ba65

perf inject: Add --strip option to strip out non-synthesized events · f56fb986

由 Adrian Hunter 提交于 9月 25, 2015

Add a new option --strip which is used with --itrace to strip out
non-synthesized events.  This results in a perf.data file that is
simpler for external tools to parse.  In particular, this can be used to
prepare a perf.data file for consumption by autofdo.

A subsequent patch makes a change to Intel PT also to enable use with
autofdo and gives an example of that use.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-25-git-send-email-adrian.hunter@intel.com
[ Made it use perf_evlist__remove() + perf_evsel__delete() ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

f56fb986

perf inject: Remove more aux-related stuff when processing instruction traces · 73117308

由 Adrian Hunter 提交于 9月 25, 2015

perf inject can process instruction traces (using the --itrace option)
which removes aux-related events and replaces them with the requested
synthesized events.

However there are still some leftovers, namely PERF_RECORD_ITRACE_START
events and the original evsel (selected event) e.g. intel_pt//

For the sake of completeness, remove them too.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-24-git-send-email-adrian.hunter@intel.com
[ Made it use perf_evlist__remove() + perf_evsel__delete() ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

73117308

perf evlist: Add perf_evlist__remove() · 4768230a

由 Adrian Hunter 提交于 9月 25, 2015

Add a counterpart to perf_evlist__add() that does the opposite and
deletes the evsel.

This will be used by perf inject to remove unwanted evsels.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-23-git-send-email-adrian.hunter@intel.com
[ Renamed it from perf_evlist__del() to perf_evlist__remove() and removed the perf_evsel__delete() call ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

4768230a

perf evlist: Add perf_evlist__id2evsel_strict() · dddcf6ab

由 Adrian Hunter 提交于 9月 25, 2015

perf_evlist__id2evsel_strict() is the same as perf_evlist__id2evsel()
except that it ensures that the id must match.

This will be used by perf inject to find a specific evsel that is to be
deleted, hence the need to match exactly.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-22-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

dddcf6ab

perf script: Make scripting_max_stack value allow for synthesized callchains · 3c5b645f

由 Adrian Hunter 提交于 9月 25, 2015

perf script has a setting to set the maximum stack depth when processing
callchains. The setting defaults to the hard-coded maximum definition
PERF_MAX_STACK_DEPTH which is 127.

It is possible, when processing instruction traces, to synthesize
callchains. Synthesized callchains do not have the kernel size
limitation and are whatever size the user requests, although validation
presently prevents the user requested a value greater that 1024. The
default value is 16.

To allow for synthesized callchains, make the scripting_max_stack value
at least the same size as the synthesized callchain size.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-21-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3c5b645f

perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH · 44cbe729

由 Adrian Hunter 提交于 9月 25, 2015

Use the scripting_max_stack value to allow for values greater than
PERF_MAX_STACK_DEPTH.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-20-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

44cbe729

perf script: Add a setting for maximum stack depth · 03cd1fed

由 Adrian Hunter 提交于 9月 25, 2015

Add a setting for maximum stack depth in preparation for allowing for
synthesized callchains.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-19-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

03cd1fed

perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH · 96b40f3c

由 Adrian Hunter 提交于 9月 25, 2015

Use the max_stack value instead of PERF_MAX_STACK_DEPTH so that
arbitrary-sized callchains can be supported.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-17-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

96b40f3c

perf report: Make max_stack value allow for synthesized callchains · 188bb5e2

由 Adrian Hunter 提交于 9月 25, 2015

perf report has an option (--max-stack) to set the maximum stack depth
when processing callchains. The option defaults to the hard-coded
maximum definition PERF_MAX_STACK_DEPTH which is 127. The intention of
the option is to allow the user to reduce the processing time by
reducing the amount of the callchain that is processed.

It is also possible, when processing instruction traces, to synthesize
callchains. Synthesized callchains do not have the kernel size
limitation and are whatever size the user requests, although validation
presently prevents the user requested a value greater that 1024. The
default value is 16.

To allow for synthesized callchains, make the max_stack value at least
the same size as the synthesized callchain size.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-16-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

188bb5e2

perf intel-pt: Support generating branch stack · f14445ee

由 Adrian Hunter 提交于 9月 25, 2015

Add support for generating branch stack context for PT samples.  The
decoder reports a configurable number of branches as branch context for
each sample. Internally it keeps track of them by using a simple sliding
window.  We also flush the last branch buffer on each sample to avoid
overlapping intervals.

This is useful for:

- Reporting accurate basic block edge frequencies through the perf
  report branch view
- Using with --branch-history to get the wider context of samples
- Other users of LBRs

Also the Documentation is updated.

Examples:

	Record with Intel PT:

		perf record -e intel_pt//u ls

	Branch stacks are used by default if synthesized so:

		perf report --itrace=ile

	is the same as:

		perf report --itrace=ile -b

	Branch history can be requested also:

		perf report --itrace=igle --branch-history
Based-on-patch-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-15-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

f14445ee

perf intel-pt: Move branch filter logic · 385e3306

由 Adrian Hunter 提交于 9月 25, 2015

intel_pt_synth_branch_sample() skips synthesizing if the branch does not
match the branch filter. That logic was sitting in the middle of the
function but is more efficiently placed at the start of the function, so
move it.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-14-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

385e3306

perf inject: Set branch stack feature flag when synthesizing branch stacks · 051a01b9

由 Adrian Hunter 提交于 9月 25, 2015

The branch stack feature flag is set by 'perf record' when recording
data that contains branch stacks. Consequently, when 'perf inject'
synthesizes branch stacks, the feature flag should be set also.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-13-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

051a01b9

perf report: Skip events with null branch stacks · f86225db

由 Adrian Hunter 提交于 9月 25, 2015

A non-synthesized event might not have a branch stack if branch stacks
have been synthesized (using itrace options).

An example of that is when Intel PT records sched_switch events for
decoding purposes. Those sched_switch events do not have branch stacks
even though the Intel PT decoder may be synthesizing other events that
do due to the itrace options.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-12-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

f86225db

perf report: Also do default setup for synthesized branch stacks · fb9fab66

由 Adrian Hunter 提交于 9月 25, 2015

The 'perf report' tool will default to displaying branch stacks (-b
option) if they are present. Make that also happen for synthesized
branch stacks.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-11-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

fb9fab66

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功