提交 · e2e1680fda1573ebfdd6bba5d58f978044746993 · openeuler / raspberrypi-kernel

25 10月, 2016 2 次提交

perf bench futex: Avoid worker cacheline bouncing · e2e1680f

由 Davidlohr Bueso 提交于 10月 24, 2016

Sebastian noted that overhead for worker thread ops (throughput)
accounting was producing 'perf' to appear in the profiles, consuming a
non-trivial (i.e. 13%) amount of CPU.

This is due to cacheline bouncing due to the increment of w->ops.

We can easily fix this by just working on a local copy and updating the
actual worker once done running, and ready to show the program summary.
There is no danger of the worker being concurrent, so we can trust that
no stale value is being seen by another thread.

This also gets rid of the unnecessary cache alignment hack; its not
worth it.
Reported-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Acked-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/1477342613-9938-2-git-send-email-dave@stgolabs.netSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

e2e1680f

Merge tag 'perf-core-for-mingo-20161024' of... · 76e2d261

由 Ingo Molnar 提交于 10月 24, 2016

Merge tag 'perf-core-for-mingo-20161024' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New features:

- Dynamicly change verbosity level by pressing 'V' in the 'perf top/report'
  hists TUI browser (Alexis Berlemont)

- Implement 'perf trace --delay' in the same fashion as in 'perf record --delay',
  to skip sampling workload initialization events (Alexis Berlemont)

- Make vendor named events case insensitive in 'perf list', i.e.
  'perf list LONGEST_LAT' works just the same as  'perf list longest_lat' (Andi Kleen)

- Show instruction bytes and lenght in 'perf script' for Intel PT and BTS (Andi Kleen, Adrian Hunter)

   E.g:

    % perf record -e intel_pt// foo
    % perf script --itrace=i0ns -F ip,insn,insnlen
     ffffffff8101232f ilen: 5 insn: 0f 1f 44 00 00
     ffffffff81012334 ilen: 1 insn: 5b
     ffffffff81012335 ilen: 1 insn: 5d
     ffffffff81012336 ilen: 1 insn: c3
     ffffffff810123e3 ilen: 1 insn: 5b
     ffffffff810123e4 ilen: 2 insn: 41 5c
     ffffffff810123e6 ilen: 1 insn: 5d
     ffffffff810123e7 ilen: 1 insn: c3
     ffffffff810124a6 ilen: 2 insn: 31 c0
     ffffffff810124a8 ilen: 9 insn: 41 83 bc 24 a8 01 00 00 01
     ffffffff810124b1 ilen: 2 insn: 75 87

- Allow enabling the perf_event_attr.branch_type attribute member: (Andi Kleen)

  perf record -e sched:sched_switch,cpu/cpu-cycles,branch_type=any/ ...

- Add unwinding support for jitdump (Stefano Sanfilippo)

Fixes:

- Use raw_syscall:sys_enter timestamp in 'perf trace' (Arnaldo Carvalho de Melo)

Infrastructure:

- Allow jitdump to be built without libdwarf (Maciej Debski)

- Sync x86's syscall table tools/ copy (Arnaldo Carvalho de Melo)

- Fixes to avoid calling die() in library fuctions already propagating other
  errors (Arnaldo Carvalho de Melo)

- Improvements to allow libtraceevent to be properly installed in distro
  packages (Jiri Olsa)

- Removing coresight miscellaneous debug output (Mathieu Poirier)

- Cache align the 'perf bench futex' worker struct (Sebastian Andrzej Siewior)

Documentation:

- Minor improvements on the documentation of event parameters (Andi Kleen)

- Add jitdump format specification document (Stephane Eranian)

Spelling fixes:

- Fix typo "No enough" to "Not enough" (Alexander Alemayhu)
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

76e2d261

24 10月, 2016 37 次提交

perf coresight: Removing miscellaneous debug output · 04b553ad

由 Mathieu Poirier 提交于 10月 19, 2016

Printing the full path of the selected link is obviously not needed,
hence removing.
Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1476913323-6836-1-git-send-email-mathieu.poirier@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

04b553ad

perf list: Make vendor event matching case insensitive · 38d14f0c

由 Andi Kleen 提交于 10月 19, 2016

Make the 'perf list' glob matching for vendor events case insensitive.
This allows to use the upper case vendor events with perf list too.

Now the following works:

  % perf list LONGEST_LAT

  ...

  cache:
    longest_lat_cache.miss
         [Core-originated cacheable demand requests missed LLC]
    longest_lat_cache.reference
         [Core-originated cacheable demand requests that refer to LLC]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Suggested-by: NIngo Molnar <mingo@kernel.org>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1476899402-31460-1-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

38d14f0c

perf trace: Use the syscall raw_syscalls:sys_enter timestamp · ecf1e225

由 Arnaldo Carvalho de Melo 提交于 10月 18, 2016

Instead of the one when another syscall takes place while another is being
processed (in another CPU, but we show it serialized, so need to "interrupt"
the other), and also when finally showing the sys_enter + sys_exit + duration,
where we were showing the sample->time for the sys_exit, duh.

Before:

  # perf trace sleep 1
  <SNIP>
     0.373 (   0.001 ms): close(fd: 3                   ) = 0
  1000.626 (1000.211 ms): nanosleep(rqtp: 0x7ffd6ddddfb0) = 0
  1000.653 (   0.003 ms): close(fd: 1                   ) = 0
  1000.657 (   0.002 ms): close(fd: 2                   ) = 0
  1000.667 (   0.000 ms): exit_group(                   )
  #

After:

  # perf trace sleep 1
  <SNIP>
     0.336 (   0.001 ms): close(fd: 3                   ) = 0
     0.373 (1000.086 ms): nanosleep(rqtp: 0x7ffe303e9550) = 0
  1000.481 (   0.002 ms): close(fd: 1                   ) = 0
  1000.485 (   0.001 ms): close(fd: 2                   ) = 0
  1000.494 (   0.000 ms): exit_group(                   )
[root@jouet linux]#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ecbzgmu2ni6glc6zkw8p1zmx@git.kernel.org
Fixes: 752fde44 ("perf trace: Support interrupted syscalls")
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

ecf1e225

perf trace: Remove thread_trace->exit_time · 1f369460

由 Arnaldo Carvalho de Melo 提交于 10月 17, 2016

Not used at all, we need just the entry_time to calculate the syscall
duration.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-js6r09zdwlzecvaei7t4l3vd@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

1f369460

perf bench futex: Cache align the worker struct · 34b75300

由 Sebastian Andrzej Siewior 提交于 10月 16, 2016

It popped up in perf testing that the worker consumes some amount of
CPU. It boils down to the increment of `ops` which causes cache line
bouncing between the individual threads.

This patch aligns the struct by 256 bytes to ensure that not a cache
line is shared among CPUs. 128 byte is the x86 worst case and grep says
that L1_CACHE_SHIFT is set to 8 on s390.
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161016190803.3392-1-bigeasy@linutronix.deSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

34b75300

perf tools: Use normal error reporting when processing PERF_RECORD_READ events · 89973506

由 Arnaldo Carvalho de Melo 提交于 10月 14, 2016

We already have handling for errors when processing PERF_RECORD_ events,
so instead of calling die() when not being able to alloc, propagate the
error, so that the normal UI exit sequence can take place, the user be
warned and possibly the terminal be properly reset to a sane mode.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-r90je3c009a125dvs3525yge@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

89973506

perf tools: Normalize sq_quote_argv() error reporting · e7b32d12

由 Arnaldo Carvalho de Melo 提交于 10月 14, 2016

It already returns whatever strbuf_(grow|addch)() returns in case of
failure, so just return -ENOSPC in the only case where it was die()ing.
When it returns, its only caller will call die() anyway, so no need to
be so eager, die later.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-as05b7mbogprlwi8iarwns8e@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

e7b32d12

perf bench mem: Move boilerplate memory allocation to the infrastructure · 47b5757b

由 Arnaldo Carvalho de Melo 提交于 10月 14, 2016

Instead of having all tests perform alloc/free, do it in the code that
calls the do_cycles() and do_gettimeofday() functions.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-lywj4mbdb1m9x1z9asivwuuy@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

47b5757b

perf trace: Implement --delay · e36b7821

由 Alexis Berlemont 提交于 10月 10, 2016

In the perf wiki todo-list[1], there is an entry regarding initial-delay
and 'perf trace'; the following small patch tries to fulfill this point.
It has been generated against the branch tip/perf/core.

It has only been implemented in the "trace__run" case.

Ex.:

  $ sudo strace -- ./perf trace --delay 5 sleep 1 2>&1
  ...
  fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
  ioctl(7, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0
  ioctl(11, PERF_EVENT_IOC_SET_OUTPUT, 0x7) = 0
  fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
  ioctl(11, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0
  write(6, "\0", 1)                       = 1
  close(6)                                = 0
  nanosleep({0, 5000000}, NULL)           = 0  # DELAY OF 5 MS BEFORE ENABLING THE EVENTS
  ioctl(3, PERF_EVENT_IOC_ENABLE, 0)      = 0
  ioctl(4, PERF_EVENT_IOC_ENABLE, 0)      = 0
  ioctl(5, PERF_EVENT_IOC_ENABLE, 0)      = 0
  ioctl(7, PERF_EVENT_IOC_ENABLE, 0)      = 0
  ...

[1]: https://perf.wiki.kernel.org/index.php/TodoSigned-off-by: NAlexis Berlemont <alexis.berlemont@gmail.com>
Suggested-and-Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161010054328.4028-2-alexis.berlemont@gmail.com
[ Add entry to the manpage, cut'n'pasted from stat's and record's ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

e36b7821

perf hists browser: Dynamically change verbosity level · 21e8c810

由 Alexis Berlemont 提交于 10月 12, 2016

Here is a small patch which tries to fulfill a point in the perf todo
list:

* Make pressing 'V' multiple times to go on cycling thru various
  verbosity levels in 'perf top', so that info that is present in
  'perf top -v' can be obtained without having to restart the tool
  (acme).

After a small grep in the code, the max verbosity level seems 3; so,
we cycle at 4; I did not dare define a MAX_VERBOSE_LEVEL constant.
Signed-off-by: NAlexis Berlemont <alexis.berlemont@gmail.com>
Suggested-and-Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161012214823.14324-2-alexis.berlemont@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

21e8c810

perf tools: Fix typo "No enough" to "Not enough" · 042cfb5f

由 Alexander Alemayhu 提交于 10月 13, 2016

The latter version occurs much more when running git grep.
Signed-off-by: NAlexander Alemayhu <alexander@alemayhu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161013161811.4939-1-alexander@alemayhu.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

042cfb5f

perf pmu: Only print Using CPUID message once · fb967063

由 Andi Kleen 提交于 10月 13, 2016

With uncore event aliases which are duplicated over multiple PMUs the
"Using CPUID" message with -v could be printed many times.  Only print
it once.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1476393332-20732-3-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

fb967063

perf jit: Add jitdump format specification document · b3151ea5

由 Stephane Eranian 提交于 10月 13, 2016

This patch adds a formal specification of the jitdump format. The goal
is to help jit runtime developers implement the jitdump support without
having to read the jvmti code.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-10-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b3151ea5

perf jit: Check JITHEADER_VERSION · 6760d77b

由 Stefano Sanfilippo 提交于 10月 13, 2016

Check the version number when opening a jitdump file.  Accept older
versions, but not newer ones.
Signed-off-by: NStefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: NRoss McIlroy <rmcilroy@chromium.org>
Reviewed-by: NStephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-9-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

6760d77b

perf jit: Generate .eh_frame/.eh_frame_hdr in DSO · 086f9f3d

由 Stefano Sanfilippo 提交于 10月 13, 2016

When the jit_buf_desc contains unwinding information, it is emitted as
eh_frame unwinding sections in the DSOs generated by perf inject.

The unwinding information is required to unwind of JITed code which do
not maintain the frame pointer register during function calls.  It can
be emitted by V8 / Chromium when the --perf_prof_unwinding_info is
passed to V8.

The eh_frame and eh_frame_hdr sections are emitted immediately after the
.text.

The .eh_frame is aligned at a 8-byte boundary, and .eh_frame_hdr at a
4-byte one. Since size of the .eh_frame is required to be a multiple of
the word size, which means there will never be additional padding
between it and the .eh_frame_hdr on machines where the word size is 4 or
8 bytes.

However, additional padding might be inserted between .text and
.eh_frame to reach the correct alignment, which will always be 8 bytes,
also on 32bit machines. The reasoning behind this choice is that 4 extra
bytes of padding worst case are not a large cost for the advantage of
removing word-size dependent offset calculations when emitting the
jitdump.
Signed-off-by: NStefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: NRoss McIlroy <rmcilroy@chromium.org>
Reviewed-by: NStephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-8-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

086f9f3d

perf jit: Add unwinding support · 0284fecd

由 Stefano Sanfilippo 提交于 10月 13, 2016

This record is intended to provide unwinding information in the
eh_frame format. This is required to unwind JITed code which
does not maintain the frame pointer register during function calls.

The eh_frame unwinding information can be emitted by V8 / Chromium
when the --perf_prof_unwinding_info is passed.

A record of type jr_code_unwinding_info comes before the jr_code_load
it referred to and contains both the .eh_frame and .eh_frame_hdr.

The fields in the header have the following meaning:

  * unwinding_size: size of the eh_frame and eh_frame_hdr, necessary
    for distinguishing the content from the padding.

  * eh_frame_hdr_size: as the name says.

  * mapped_size: size of the payload that was in memory at runtime.
    typically unwinding_size if the .eh_frame_hdr and .eh_frame were
    mapped, or 0 if they weren't. It should always be the former case,
    since the .eh_frame is guaranteed to be mapped in memory. However,
    certain JITs might want to inject an .eh_frame_hdr with an empty LUT
    to trigger fp-based unwinding fallback in libunwind. The only part
    of the .eh_frame_hdr that libunwind reads from remote memory is the
    LUT, and since there is none, mapping the unwinding info in memory
    is not necessary, and 0 in this field signifies that it wasn't.
    This practical hack allows to save bytes in code memory for those
    JIT compilers that might or might not maintain a valid frame pointer.

The payload that follows is assumed to contain first the .eh_frame and
then the .eh_header_hdr, with no padding between the two.
Signed-off-by: NStefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: NRoss McIlroy <rmcilroy@chromium.org>
Reviewed-by: NStephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-7-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

0284fecd

perf jit: Do not assume pgoff is zero · eac05af2

由 Stefano Sanfilippo 提交于 10月 13, 2016

When calculating .eh_frame_hdr base and LUT offsets do not always assume
that pgoff is zero.

The assumption is false for DSOs built from the jitdump by perf inject,
because the ELF header did not exist in memory at sampling time.
Signed-off-by: NStefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: NRoss McIlroy <rmcilroy@chromium.org>
Reviewed-by: NStephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-6-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

eac05af2

perf jit: Make perf skip unknown records · 7354ec7a

由 Stefano Sanfilippo 提交于 10月 13, 2016

The behavior before this commit was to skip the remaining portion of the
jitdump in case an unknown record was found, including those records
that perf could handle.

With this change, parsing a record with an unknown id will cause a
warning to be emitted, the record will be skipped and parsing will
resume from the next (valid) one.

The patch aims at making perf more future proof, by extracting as much
information as possible from jitdumps.
Signed-off-by: NStefano Sanfilippo <ssanfilippo@chromium.org>
Signed-off-by: NRoss McIlroy <rmcilroy@chromium.org>
Reviewed-by: NStephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-5-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

7354ec7a

perf jit: Remove unecessary padding in jitdump file · 13b9012a

由 Stephane Eranian 提交于 10月 13, 2016

This patch removes all the string padding generated in the jitdump file.
They are not necessary and were adding unnecessary complexity. Modern
processors can handle unaligned accesses quite well. The perf.data/
jitdump file are always post-processed, no need to add extra complexity
for no real gain.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-4-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

13b9012a

perf jit: Enable jitdump support without dwarf · 621cb4e7

由 Maciej Debski 提交于 10月 13, 2016

This patch modifies the build dependencies on the jitdump support in
perf. As it stands jitdump was wrongfully made dependent 100% on using
DWARF. However, the dwarf dependency, only exist if generating the
source line table in genelf_debug.c. The rest of the support does not
need DWARF.

This patch removes the dependency on DWARF for the entire jitdump
support. It keeps it only for the genelf_debug.c support.
Signed-off-by: NMaciej Debski <maciejd@google.com>
Reviewed-by: NStephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-3-git-send-email-eranian@google.com
Fixes: e12b202f ("perf jitdump: Build only on supported archs")
[ Make it build only if NO_LIBELF isn't defined, as jitdump.o will only be built in that case ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

621cb4e7

perf jit: Improve error messages from JVMTI · cdd75e3b

由 Stephane Eranian 提交于 10月 13, 2016

This patch improves the usefulness of error messages generated by the
JVMTI interfac.e This can help identify the root cause of a problem by
printing the actual error code. The patch adds a new helper function
called print_error().
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1476356383-30100-2-git-send-email-eranian@google.com
[ Handle failure to convert numeric error to a string in print_error() ]
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

cdd75e3b

perf jit: Add NT_GNU_BUILD_ID definition for older distros · 5fef5f3f

由 Arnaldo Carvalho de Melo 提交于 10月 13, 2016

Such as CentOS5, where such define is not present in elf.h.

This file, genelf.c, wasn't being built for several systems, because
it mistakenly was conditional on some DWARF features, now that it
is just needing libelf, after "perf jit: Enable jitdump support without
dwarf" it fails.

So, as preparation for "perf jit: Enable jitdump support without dwarf",
conditionally define it, if not available.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Maciej Debski <maciejd@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-k09qay1cmr0l3fzprmztzy3o@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

5fef5f3f

perf jit: Avoid returning garbage for a ret variable · ef2c3e76

由 Arnaldo Carvalho de Melo 提交于 10月 13, 2016

When the loop body isn't executed at all, then the 'ret' local variable,
that is uninitialized will be used as the return value.

This triggers this error on Alpine Linux:

    CC	   /tmp/build/perf/util/demangle-java.o
    CC	   /tmp/build/perf/util/demangle-rust.o
    CC       /tmp/build/perf/util/jitdump.o
    CC       /tmp/build/perf/util/genelf.o
  util/jitdump.c: In function 'jit_process':
  util/jitdump.c:622:3: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     fprintf(stderr, "injected: %s (%d)\n", path, ret);
     ^
  util/jitdump.c:584:6: note: 'ret' was declared here
    int ret;
        ^
    FLEX     /tmp/build/perf/util/parse-events-flex.c

  / $ gcc -v
  Using built-in specs.
  COLLECT_GCC=gcc
  COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-alpine-linux-musl/5.3.0/lto-wrapper
  Target: x86_64-alpine-linux-musl
  Configured with: /home/buildozer/aports/main/gcc/src/gcc-5.3.0/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info
  +--build=x86_64-alpine-linux-musl --host=x86_64-alpine-linux-musl --target=x86_64-alpine-linux-musl --with-pkgversion='Alpine 5.3.0' --enable-checking=release
  +--disable-fixed-point --disable-libstdcxx-pch --disable-multilib --disable-nls --disable-werror --disable-symvers --enable-__cxa_atexit --enable-esp
  +--enable-cloog-backend --enable-languages=c,c++,objc,java,fortran,ada --disable-libssp --disable-libmudflap --disable-libsanitizer --enable-shared
  +--enable-threads --enable-tls --with-system-zlib
  Thread model: posix
  gcc version 5.3.0 (Alpine 5.3.0)

But this so far got under the radar, not causing any build problem, till the
"perf jit: enable jitdump support without dwarf" gets applied, when the above
problem takes place, some combination of inlining or whatever, the problem
is real, so fix it by initializing the variable to zero.

Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Maciej Debski <maciejd@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lkml.kernel.org/r/20161013200437.GA12815@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

ef2c3e76

perf tools: Implement branch_type event parameter · ac12f676

由 Andi Kleen 提交于 10月 12, 2016

It can be useful to specify branch type state per event, for example if
we want to collect both software trace points and last branch PMU events
in a single collection. Currently this doesn't work because the software
trace point errors out with -b.

There was already a branch-type parameter to configure branch sample
types per event in the parser, but it was stubbed out. This patch
implements the necessary plumbing to actually enable it.

Now:

  $ perf record -e sched:sched_switch,cpu/cpu-cycles,branch_type=any/ ...

works.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1476306127-19721-1-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

ac12f676

perf record: Improve documentation of event parameters · 84ee74af

由 Andi Kleen 提交于 10月 13, 2016

- Some editing (params -> parameters)
- Point to the now more complete list of parameters in the perf list
manpage.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1476381433-22959-1-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

84ee74af

perf header: Display feature name on write failure · 0c2aff4c

由 Jiri Olsa 提交于 10月 10, 2016

Display name of feature instead of just the number
during recording data.

Before:
  failed to write feature 13

Now:
  failed to write feature HEADER_CPU_TOPOLOGY
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-k9d9trozi5kkx737cy8n5xh5@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

0c2aff4c

perf header: Display missing features · aabae165

由 Jiri Olsa 提交于 10月 10, 2016

Display missing features in header info, like:

  $ perf report --header-only
  # ========
  # captured on: Mon Oct 10 09:39:47 2016
  ...
  # missing features: HEADER_TRACING_DATA HEADER_CPU_TOPOLOGY ...

To help in diagnosing problems.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-bh5gp84gobdmyl345dcp64se@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

aabae165

perf report: Move captured info to generic header info · f45f5615

由 Jiri Olsa 提交于 10月 10, 2016

It's not displayed in TUI now, putting it into generic part.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-5fk88kejqgi50ye7xdkhiloz@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

f45f5615

tools lib: Add for_each_clear_bit macro · 02bc11de

由 Jiri Olsa 提交于 10月 10, 2016

Adding for_each_clear_bit macro plus all its the necessary backbone
functions. Taken from related kernel code. It will be used in following
patch.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-cayv2zbqi0nlmg5sjjxs1775@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

02bc11de

tools lib traceevent: Add version for traceevent shared object · fe316723

由 Jiri Olsa 提交于 7月 31, 2016

Adding version support for libtraceevent.so object.

Using the existing EVENT_PARSE_VERSION variable to construct
the .so object version string, which now consists of:

  $(EP_VERSION).$(EP_PATCHLEVEL).$(EP_EXTRAVERSION)

Looks like it was created for this purpose anyway.

The build will now produce following traeceevent libraries:

  $ ll libtraceevent*
  libtraceevent.a
  libtraceevent.so -> libtraceevent.so.1.1.0
  libtraceevent.so.1 -> libtraceevent.so.1.1.0
  libtraceevent.so.1.1.0

Also the install target will carry them:

  $ make DESTDIR=/tmp/krava prefix=/usr install
  INSTALL  trace_plugins
  INSTALL  libtraceevent.a
  INSTALL  libtraceevent.so.1.1.0

  $ find /tmp/krava/ | xargs ls -l
  ...
  /tmp/krava/usr/lib64:
  total 572
  libtraceevent.a
  libtraceevent.so -> libtraceevent.so.1.1.0
  libtraceevent.so.1 -> libtraceevent.so.1.1.0
  libtraceevent.so.1.1.0
  ...
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-v64z62fh0dwt0ueie5usrnac@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

fe316723

tools lib traceevent: Rename LIB_FILE to LIB_TARGET · 39944a76

由 Jiri Olsa 提交于 8月 01, 2016

To ease up following patch.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-zpv5gd8y7clwrhh6dq03ucd5@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

39944a76

tools lib traceevent: Add do_install_mkdir Makefile function · c121bdbb

由 Jiri Olsa 提交于 7月 31, 2016

Decompose the do_install function to ease up
the following patch a little.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-zzs19yx8seyors532vuer37w@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

c121bdbb

tools lib traceevent: Add install_headers target · 722a4984

由 Jiri Olsa 提交于 7月 16, 2016

Adding install_headers target to install all headers
under 'include/traceevent' path, like:

  $ make DESTDIR=/tmp/krava prefix=/usr install_headers
  $ find /tmp/krava/ -type f
  /tmp/krava/usr/include/traceevent/kbuffer.h
  /tmp/krava/usr/include/traceevent/event-utils.h
  /tmp/krava/usr/include/traceevent/event-parse.h
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-if70lj3zhdc3csdqm5webjvc@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

722a4984

perf tools: Sync copy of x86's syscall table · b4974b40

由 Arnaldo Carvalho de Melo 提交于 10月 07, 2016

To get up to the recent compat pread/pwrite changes, that albeit not
being used by 'perf trace' due to some raw_syscalls tracepoint
limitations, trigger this warning when building perf:

  Warning: x86_64's syscall_64.tbl differs from kernel

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ilgqhxd9ubkg5f66bx0bht2t@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b4974b40

perf script: Support insn and insnlen · 224e2c97

由 Andi Kleen 提交于 10月 07, 2016

When looking at Intel PT traces with perf script it is useful to have
some indication of the instruction. Dump the instruction bytes and
instruction length, which can be used for simple pattern analysis in
scripts.

% perf record -e intel_pt// foo
% perf script --itrace=i0ns -F ip,insn,insnlen
 ffffffff8101232f ilen: 5 insn: 0f 1f 44 00 00
 ffffffff81012334 ilen: 1 insn: 5b
 ffffffff81012335 ilen: 1 insn: 5d
 ffffffff81012336 ilen: 1 insn: c3
 ffffffff810123e3 ilen: 1 insn: 5b
 ffffffff810123e4 ilen: 2 insn: 41 5c
 ffffffff810123e6 ilen: 1 insn: 5d
 ffffffff810123e7 ilen: 1 insn: c3
 ffffffff810124a6 ilen: 2 insn: 31 c0
 ffffffff810124a8 ilen: 9 insn: 41 83 bc 24 a8 01 00 00 01
 ffffffff810124b1 ilen: 2 insn: 75 87
...
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/1475847747-30994-4-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

224e2c97

perf intel-pt/bts: Report instruction bytes and length in sample · faaa8768

由 Andi Kleen 提交于 10月 07, 2016

Change Intel PT and BTS to pass up the length and the instruction
bytes of the decoded or sampled instruction in the perf sample.

The decoder already knows this information, we just need to pass it
up. Since it is only a couple of movs it is not very expensive.

Handle instruction cache too. Make sure ilen is always initialized.

Used in the next patch.

[Adrian: re-base on top (and adjust for) instruction buffer size tidy-up]
[Adrian: add BTS support and adjust commit message accordingly]
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/1475847747-30994-3-git-send-email-adrian.hunter@intel.comSigned-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

faaa8768

perf intel-pt/bts: Tidy instruction buffer size usage · 32f98aab

由 Adrian Hunter 提交于 10月 07, 2016

Tidy instruction buffer size usage in preparation for copying the
instruction bytes onto samples.

The instruction buffer is presently used for debugging, so rename its
size macro from INTEL_PT_INSN_DBG_BUF_SZ to INTEL_PT_INSN_BUF_SZ, and
use it everywhere.

Note that the maximum instruction size is 15 which is a less efficient size
to copy than 16, which is why a separate buffer size is used.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1475847747-30994-2-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

32f98aab

22 10月, 2016 1 次提交

Merge tag 'perf-c2c-for-mingo-20161021' of... · e9c84892

由 Ingo Molnar 提交于 10月 22, 2016

Merge tag 'perf-c2c-for-mingo-20161021' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull new 'perf c2c' tool from Arnaldo Carvalho de Melo:

- The 'perf c2c' tool provides means for Shared Data C2C/HITM analysis.

It allows you to track down cacheline contention. The tool is based
on x86's load latency and precise store facility events provided by
Intel CPUs.

It was tested by Joe Mario and has proven to be useful, finding some
cacheline contentions. Joe also wrote a blog about c2c tool with
examples:

https://joemario.github.io/blog/2016/09/01/c2c-blog/

Excerpt of the content on this site:

---
At a high level, “perf c2c” will show you:

* The cachelines where false sharing was detected.
* The readers and writers to those cachelines, and the offsets where those accesses occurred.
* The pid, tid, instruction addr, function name, binary object name for those readers and writers.
* The source file and line number for each reader and writer.
* The average load latency for the loads to those cachelines.
* Which numa nodes the samples a cacheline came from and which CPUs were involved.

Using perf c2c is similar to using the Linux perf tool today.
First collect data with “perf c2c record” Then generate a report output with “perf c2c report”
---

There one finds extensive details on using the tool, with tips on
reducing the volume of samples while still capturing enough to do
its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa)
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

e9c84892