1. 09 Aug, 2018 6 commits
    • J
      perf annotate: Get rid of annotation__scnprintf_samples_period() · 0683d13c
      Jiri Olsa authored
      We have a more current function to get the title for annotation,
      which is hists__scnprintf_title. Both produce the same output as
      far as the annotation's header line goes.
      
      They differ in how nr_samples is counted: hists__scnprintf_title
      provides a more accurate number based on the setup of the
      symbol_conf.filter_relative variable.
      
      Plus it also displays any uid/thread/dso/socket filters/zooms
      if any are set, which annotation__scnprintf_samples_period
      does not.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20180804130521.11408-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0683d13c
    • J
      perf annotate: Make annotation_line__max_percent static · 5ecf7d30
      Jiri Olsa authored
      There's no outside user of it.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20180804130521.11408-3-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5ecf7d30
    • J
      perf annotate: Make symbol__annotate_fprintf2() local · 7a3e71e0
      Jiri Olsa authored
      There's no outside user of it.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lkml.kernel.org/r/20180804130521.11408-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7a3e71e0
    • Y
      perf tools: Drop unneeded bitmap_zero() calls · 3c8b8186
      Yury Norov authored
      bitmap_zero() is called after bitmap_alloc() in perf code. But
      bitmap_alloc() internally uses calloc(), which guarantees that the
      allocated area is zeroed. So the following bitmap_zero() is unneeded.
      Drop it.
      
      This happened because of the confusing name of the bitmap allocator. It
      should be named bitmap_zalloc instead of bitmap_alloc.
      
      This series:
      
        https://lkml.org/lkml/2018/6/18/841
      
      introduces a new API for bitmap allocations in the kernel, and the
      functions there are named correctly. A following patch propagates the
      API to tools and fixes the naming issue.
      Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Link: http://lkml.kernel.org/r/20180623073502.16321-1-ynorov@caviumnetworks.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3c8b8186
    • T
      perf report: Add GUI report support for s390 auxiliary trace · 33d9e183
      Thomas Richter authored
      Add support for s390 auxiliary traces.
      
      Use 'perf record -e rbd000 -- ls' to create the perf.data file.
      
      Use 'perf report' to display the auxiliary trace data.
      
      Output before:
      
        [root@s35lp76 perf]# ./perf report --stdio
        0x128 [0x10]: failed to process type: 70
        Error:
        failed to process sample
        [root@s35lp76 perf]#
      
      Output after:
      
        [root@s35lp76 perf]# ./perf report --stdio
      
            18.21%    18.21%  ls     [kernel.kallsyms]       [k] ftrace_likely_update
             9.52%     9.52%  ls     [kernel.kallsyms]       [k] lock_acquire
             9.38%     9.38%  ls     [kernel.kallsyms]       [k] lock_release
             3.45%     3.45%  ls     [kernel.kallsyms]       [k] lock_acquired
             2.88%     2.88%  ls     [kernel.kallsyms]       [k] link_path_walk
             2.63%     2.63%  ls     [kernel.kallsyms]       [k] __d_lookup
             2.38%     2.38%  ls     [kernel.kallsyms]       [k] __d_lookup_rcu
             2.04%     2.04%  ls     [kernel.kallsyms]       [k] ___might_sleep
             1.83%     1.83%  ls     [kernel.kallsyms]       [k] debug_lockdep_rcu_enabled
             1.44%     1.44%  ls     [kernel.kallsyms]       [k] dput
           ....
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180802074622.13641-4-tmricht@linux.ibm.com
      [ Use PRI[xd]64 to fix the build on debian:experimental-x-mips (gcc 8.1.0) and others ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      33d9e183
    • T
      perf report: Add raw report support for s390 auxiliary trace · 2b1444f2
      Thomas Richter authored
      Add support for s390 auxiliary traces.
      
      Use 'perf record -e rbd000' to create the perf.data file.  The event
      also has the symbolic name SF_CYCLES_BASIC_DIAG, using 'perf record -e
      SF_CYCLES_BASIC_DIAG' is equivalent.
      
      Use 'perf report -D' to display the auxiliary trace data.
      
      Output before:
      
       0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000
                       offset: 0  ref: 0  idx: 4  tid: -1  cpu: 4
           Nothing else
      
      Output after:
      
       0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000
                        offset: 0  ref: 0  idx: 4  tid: -1  cpu: 4
       .
       . ... s390 AUX data: size 262144 bytes
          [00000000] Basic   Def:0001 Inst:0000 TW   AS:3 ASN:0xffff IA:0x0000000000c2f1bc
      		CL:1 HPP:0x8000000000000000 GPP:000000000000000000
          [0x000020] Diag    Def:8005
          [0x0000bf] Basic   Def:0001 Inst:0000 TW   AS:3 ASN:0xffff IA:0x0000000000c2f1bc
      		CL:1 HPP:0x8000000000000000 GPP:000000000000000000
          [0x0000df] Diag    Def:8005
          [0x00017e] Basic   Def:0001 Inst:0000 TW   AS:3 ASN:0xffff IA:0x0000000000c2f1bc
      		CL:1 HPP:0x8000000000000000 GPP:000000000000000000
          ....
          [0x000fc0] Trailer F T bsdes:32 dsdes:159 Overflow:0 Time:0xd4ab59a8450fa108
      		C:1 TOD:0xd4ab4ec98ceb3832 1:0x8000000000000000 2:0xd4ab4ec98ceb3832
      
      This output is shown for every sampled data block. The
      output contains the
      
       - basic-sampling data entry
      
       - diagnostic-sampling data entry
      
       - trailer entry
      
      The basic sampling entry and diagnostic sampling entry sizes can be
      extracted using the trailer entries in the SDB.  On older hardware these
      values (bsdes and dsdes in the trailer entry) are reserved and zero.
      Older hardware uses hard-coded values based on the s390 machine type.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Link: http://lkml.kernel.org/r/20180802074622.13641-3-tmricht@linux.ibm.com
      Link: http://lkml.kernel.org/r/eda2632e-7919-5ffd-5f68-821e77d216fa@linux.ibm.com
      [ Merged a fix for a 'type punned' problem reported by Michael Ellerman, see last Link tag. ]
      [ Removed __packed from two structs, they're already naturally packed and having that
        attribute breaks the build in gcc 8.1.1 mips, 4.4.7 x86_64, 7.1.1 ARCompact ISA, etc. ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2b1444f2
  2. 03 Aug, 2018 1 commit
    • T
      perf auxtrace: Support for perf report -D for s390 · b96e6615
      Thomas Richter authored
      Add initial support for s390 auxiliary traces using the CPU-Measurement
      Sampling Facility.
      
      Support and ignore PERF_RECORD_AUXTRACE_INFO records in the perf data
      file. Later patches will show the contents of the auxiliary traces.
      
      Set up the auxtrace queues and data structures for s390.  A raw dump of
      the perf.data file now does not show an error when an auxtrace event is
      encountered.
      
      Output before:
      
        [root@s35lp76 perf]# ./perf report -D -i perf.data.auxtrace
        0x128 [0x10]: failed to process type: 70
        Error:
        failed to process sample
      
        0x128 [0x10]: event: 70
        .
        . ... raw event: size 16 bytes
        .  0000:  00 00 00 46 00 00 00 10 00 00 00 00 00 00 00 00  ...F............
      
        0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 0
        [root@s35lp76 perf]#
      
      Output after:
      
         # ./perf report -D -i perf.data.auxtrace |fgrep PERF_RECORD_AUXTRACE
        0 0 0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 5
        0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000
      	   offset: 0  ref: 0  idx: 4  tid: -1  cpu: 4
        ....
      
      Additional notes about the underlying hardware and software
      implementation, provided by Hendrik Brueckner (see Link: below).
      
      =============================================================================
      
      The CPU-Measurement Facility (CPU-MF) provides a set of functions to obtain
      performance information on the mainframe.  Basically, it was introduced
      with System z10 years ago for the z/Architecture, that means, 64-bit.
      For Linux, there are two facilities of interest, counter facility and sampling
      facility.  The counter facility provides hardware counters for instructions,
      cycles, crypto-activities, and many more.
      
      The sampling facility is a hardware sampler that when started will write
      samples at a particular interval into a sampling buffer.  At some point,
      for example, if a sample block is full, it generates an interrupt to collect
      samples (while the sampler continues to run).
      
      A few years ago, I started to provide a perf PMU to use the counter
      and sampling facilities.  Recently, the device driver was updated to also
      "export" the sampling buffer into the AUX area.  Thomas has now completed
      the related perf work to interpret and process this AUX data.
      
      If people are more interested in the sampling facility, they can have a
      look into:
      
      - The Load-Program-Parameter and the CPU-Measurement Facilities, SA23-2260-05
        http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a
      
      and to learn how to use it for Linux on Z, have a look at chapter 54,
      "Using the CPU-measurement facilities" in the:
      
      - Device Drivers, Features, and Commands, SC33-8411-34
        http://public.dhe.ibm.com/software/dw/linux390/docu/l416dd34.pdf
      
      =============================================================================
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20180803100758.GA28475@linux.ibm.com
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180802074622.13641-2-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b96e6615
  3. 31 Jul, 2018 8 commits
  4. 25 Jul, 2018 14 commits
    • J
      perf stat: Get rid of extra clock display function · 0aa802a7
      Jiri Olsa authored
      There's no reason to have a separate function to display clock events.
      Its only purpose was to convert the nanosecond value into milliseconds.
      We do that now in generic code, if the unit and scale values are
      properly set, which this patch does for clock events.
      
      The output differs in the unit field being displayed in its own column
      rather than having it added as a suffix of the event name. Plus the
      value is rounded to 2 decimal places as for any other event.
      
      Before:
      
        # perf stat  -e cpu-clock,task-clock -C 0 sleep 3
      
         Performance counter stats for 'CPU(s) 0':
      
             3001.123137      cpu-clock (msec)          #    1.000 CPUs utilized
             3001.133250      task-clock (msec)         #    1.000 CPUs utilized
      
             3.001159813 seconds time elapsed
      
      Now:
      
        # perf stat  -e cpu-clock,task-clock -C 0 sleep 3
      
         Performance counter stats for 'CPU(s) 0':
      
                3,001.05 msec cpu-clock                 #    1.000 CPUs utilized
                3,001.05 msec task-clock                #    1.000 CPUs utilized
      
             3.001077794 seconds time elapsed
      
      There's a small difference in csv output, as we now output the unit
      field, which was empty before. It's in the proper spot, so there's no
      compatibility issue.
      
      Before:
      
        # perf stat  -e cpu-clock,task-clock -C 0 -x, sleep 3
        3001.065177,,cpu-clock,3001064187,100.00,1.000,CPUs utilized
        3001.077085,,task-clock,3001077085,100.00,1.000,CPUs utilized
      
      Now:
      
        # perf stat  -e cpu-clock,task-clock -C 0 -x, sleep 3
        3000.80,msec,cpu-clock,3000799026,100.00,1.000,CPUs utilized
        3000.80,msec,task-clock,3000799550,100.00,1.000,CPUs utilized
      
      Add perf_evsel__is_clock to replace nsec_counter.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180720110036.32251-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0aa802a7
    • J
      perf tools: Use perf_evsel__match instead of open coded equivalent · 2d6cae13
      Jiri Olsa authored
      Use perf_evsel__match() helper in perf_evsel__is_bpf_output().
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180720110036.32251-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2d6cae13
    • J
      perf tools: Fix struct comm_str removal crash · 46b3722c
      Jiri Olsa authored
      We occasionally hit the following assert failure in 'perf top', when
      processing the /proc info in multiple threads.
      
        perf: ...include/linux/refcount.h:109: refcount_inc:
              Assertion `!(!refcount_inc_not_zero(r))' failed.
      
      The gdb backtrace looks like this:
      
        [Switching to Thread 0x7ffff11ba700 (LWP 13749)]
        0x00007ffff50839fb in raise () from /lib64/libc.so.6
        (gdb)
        #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
        #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
        #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
        #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
        #4  0x0000000000535373 in refcount_inc (r=0x7fffdc009be0)
            at ...include/linux/refcount.h:109
        #5  0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0)
            at util/comm.c:24
        #6  0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2",
            root=0xbed5c0 <comm_str_root>) at util/comm.c:72
        #7  0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2",
            root=0xbed5c0 <comm_str_root>) at util/comm.c:95
        #8  0x000000000053582e in comm__new (str=0x7fffd000b260 ":2",
            timestamp=0, exec=false) at util/comm.c:111
        #9  0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57
        #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38,
            threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457
        #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
        ...
      
      The failing assertion is this one:
      
        REFCOUNT_WARN(!refcount_inc_not_zero(r), ...
      
      The problem is that we keep a global comm_str_root list, which
      is accessed by multiple threads during the 'perf top' startup,
      and the following 2 paths can race:
      
        thread 1:
          ...
          thread__new
            comm__new
              comm_str__findnew
                down_write(&comm_str_lock);
                __comm_str__findnew
                  comm_str__get
      
        thread 2:
          ...
          comm__override or comm__free
            comm_str__put
              refcount_dec_and_test
                down_write(&comm_str_lock);
                rb_erase(&cs->rb_node, &comm_str_root);
      
      Because thread 2 first decrements the refcnt and only then removes the
      struct comm_str from the list, thread 1 can find this object on the list
      with refcnt equal to 0 and hit the assert.
      
      This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects
      that already dropped the refcnt to 0. For the rest of the objects we take the
      refcnt before comparing its name and release it afterwards with comm_str__put,
      which can also release the object completely.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      46b3722c
    • J
      perf machine: Use last_match threads cache only in single thread mode · b57334b9
      Jiri Olsa authored
      There's an issue with using threads::last_match in multithread mode,
      which is enabled during the perf top synthesize. It might crash with
      the following assertion:
      
        perf: ...include/linux/refcount.h:109: refcount_inc:
              Assertion `!(!refcount_inc_not_zero(r))' failed.
      
      The gdb backtrace looks like this:
      
        0x00007ffff50839fb in raise () from /lib64/libc.so.6
        (gdb)
        #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
        #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
        #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
        #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
        #4  0x0000000000535ff9 in refcount_inc (r=0x7fffe8009a70)
            at ...include/linux/refcount.h:109
        #5  0x0000000000536771 in thread__get (thread=0x7fffe8009a40)
            at util/thread.c:115
        #6  0x0000000000523cd0 in ____machine__findnew_thread (machine=0xbfde38,
            threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:432
        #7  0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
            pid=2, tid=2) at util/machine.c:489
        #8  0x0000000000523f24 in machine__findnew_thread (machine=0xbfde38,
            pid=2, tid=2) at util/machine.c:499
        #9  0x0000000000526fbe in machine__process_fork_event (machine=0xbfde38,
        ...
      
      The failing assertion is this one:
      
        REFCOUNT_WARN(!refcount_inc_not_zero(r), ...
      
      The problem is that we don't serialize access to threads::last_match.
      We serialize the access to the threads tree, but we don't care how
      threads::last_match is being accessed. Both locked/unlocked paths use
      that data and can set it. In multithreaded mode we can end up with
      an invalid object in the thread__get call, as in the following race:
      
        thread 1
          ...
          machine__findnew_thread
            down_write(&threads->lock);
            __machine__findnew_thread
              ____machine__findnew_thread
                th = threads->last_match;
                if (th->tid == tid) {
                  thread__get
      
        thread 2
          ...
          machine__find_thread
            down_read(&threads->lock);
            __machine__findnew_thread
              ____machine__findnew_thread
                th = threads->last_match;
                if (th->tid == tid) {
                  thread__get
      
        thread 3
          ...
          machine__process_fork_event
            machine__remove_thread
              __machine__remove_thread
                threads->last_match = NULL
                thread__put
            thread__put
      
      Threads 1 and 2 might get a stale last_match before thread 3 clears
      it. Threads 1 and 2 then race with thread 3's thread__put and they
      might trigger the refcnt == 0 assertion above.
      
      This patch disables the last_match cache for multiple thread
      mode. It was originally meant for single thread scenarios, where
      it's common to have multiple sequential searches of the same
      thread.
      
      In multithread mode this does not make sense, because top's threads
      process different /proc entries and so the 'struct threads' object
      is queried for various threads. Moreover we'd need to add more locks
      to make it work.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20180719143345.12963-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b57334b9
    • J
      perf machine: Add threads__set_last_match function · 67fda0f3
      Jiri Olsa authored
      Separate the threads::last_match cache set into a separate
      threads__set_last_match function.  This will be useful in the following
      patch.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20180719143345.12963-3-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      67fda0f3
    • J
      perf machine: Add threads__get_last_match function · f8b2ebb5
      Jiri Olsa authored
      Separate the threads::last_match cache read/check into a separate
      threads__get_last_match function. This will be useful in the following
      patch.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20180719143345.12963-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f8b2ebb5
    • J
      perf tools: Synthesize GROUP_DESC feature in pipe mode · e8fedff1
      Jiri Olsa authored
      Stephane reported that pipe mode does not carry the group information
      and thus the piped report won't display the grouped output for the
      following command:
      
        # perf record -e '{cycles,instructions,branches}' -a sleep 4 | perf report
      
      It has no idea about the group setup, so it will display events
      separately:
      
        # Overhead  Command          Shared Object             ...
        # ........  ...............  .......................
        #
             6.71%  swapper          [kernel.kallsyms]
             2.28%  offlineimap      libpython2.7.so.1.0
             0.78%  perf             [kernel.kallsyms]
        ...
      
      Fix GROUP_DESC feature record to be synthesized in pipe mode, so the
      report output is grouped if there are groups defined in record:
      
        #                 Overhead  Command          Shared    ...
        # ........................  ...............  .......
        #
             7.57%   0.16%   0.30%  swapper          [kernel
             1.87%   3.15%   2.46%  offlineimap      libpyth
             1.33%   0.00%   0.00%  perf             [kernel
        ...
      Reported-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180712135202.14774-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e8fedff1
    • S
      perf script: Show correct offsets for DWARF-based unwinding · 2a9d5050
      Sandipan Das authored
      When perf.data is recorded with the dwarf call-graph option, the
      callchain shown by 'perf script' still shows the binary offsets of the
      userspace symbols instead of their virtual addresses. Since the symbol
      offset calculation is based on using virtual address as the ip, we see
      incorrect offsets as well.
      
      The use of virtual addresses affects the ability to find out the
      line number in the corresponding source file to which an address
      maps, as described in commit 67540759 ("perf unwind: Use
      addr_location::addr instead of ip for entries").
      
      This has also been addressed by temporarily converting the virtual
      address to the corresponding binary offset so that it can be mapped
      to the source line number correctly.
      
      This is a follow-up for commit 19610184 ("perf script: Show
      virtual addresses instead of offsets").
      
      This can be verified on a powerpc64le system running Fedora 27 as
      shown below:
      
        # perf probe -x /usr/lib64/libc-2.26.so -a inet_pton
        # perf record -e probe_libc:inet_pton --call-graph=dwarf ping -6 -c 1 ::1
      
      Before:
      
        # perf report --stdio --no-children -s sym,srcline -g address
      
        # Samples: 1  of event 'probe_libc:inet_pton'
        # Event count (approx.): 1
        #
        # Overhead  Symbol                Source:Line
        # ........  ....................  ...........
        #
           100.00%  [.] __GI___inet_pton  inet_pton.c
                    |
                    ---gaih_inet getaddrinfo.c:537 (inlined)
                       __GI_getaddrinfo getaddrinfo.c:2304 (inlined)
                       main ping.c:519
                       generic_start_main libc-start.c:308 (inlined)
                       __libc_start_main libc-start.c:102
        ...
      
        # perf script -F comm,ip,sym,symoff,srcline,dso
      
        ping
                          15af28 __GI___inet_pton+0xffff000099160008 (/usr/lib64/libc-2.26.so)
          libc-2.26.so[ffff80004ca0af28]
                          10fa53 gaih_inet+0xffff000099160f43
          libc-2.26.so[ffff80004c9bfa53] (inlined)
                          1105b3 __GI_getaddrinfo+0xffff000099160163
          libc-2.26.so[ffff80004c9c05b3] (inlined)
                            2d6f main+0xfffffffd9f1003df (/usr/bin/ping)
          ping[fffffffecf882d6f]
                           2369f generic_start_main+0xffff00009916013f
          libc-2.26.so[ffff80004c8d369f] (inlined)
                           23897 __libc_start_main+0xffff0000991600b7 (/usr/lib64/libc-2.26.so)
          libc-2.26.so[ffff80004c8d3897]
      
      After:
      
        # perf report --stdio --no-children -s sym,srcline -g address
      
        # Samples: 1  of event 'probe_libc:inet_pton'
        # Event count (approx.): 1
        #
        # Overhead  Symbol                Source:Line
        # ........  ....................  ...........
        #
           100.00%  [.] __GI___inet_pton  inet_pton.c
                    |
                    ---gaih_inet.constprop.7 getaddrinfo.c:537
                       getaddrinfo getaddrinfo.c:2304
                       main ping.c:519
                       generic_start_main.isra.0 libc-start.c:308
                       __libc_start_main libc-start.c:102
        ...
      
        # perf script -F comm,ip,sym,symoff,srcline,dso
      
        ping
                    7fffb38aaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so)
          inet_pton.c:68
                    7fffb385fa53 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so)
          getaddrinfo.c:537
                    7fffb38605b3 getaddrinfo+0x163 (/usr/lib64/libc-2.26.so)
          getaddrinfo.c:2304
                       130782d6f main+0x3df (/usr/bin/ping)
          ping.c:519
                    7fffb377369f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so)
          libc-start.c:308
                    7fffb3773897 __libc_start_main+0xb7 (/usr/lib64/libc-2.26.so)
          libc-start.c:102
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Fixes: 67540759 ("perf unwind: Use addr_location::addr instead of ip for entries")
      Link: http://lkml.kernel.org/r/20180703120555.32971-1-sandipan@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2a9d5050
    • K
      perf trace arm64: Use generated syscall table · a7f660d6
      Kim Phillips authored
      This should speed up accessing new system calls introduced with the
      kernel rather than waiting for libaudit updates to include them.
      
      It also lets users specify wildcards, for example perf trace -e
      'open*', as was already possible on x86, s390, and powerpc, which
      means arm64 can now pass the "Check open filename arg using perf trace +
      vfs_getname" test.
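The wildcard selection described above can be pictured with a small sketch. It assumes a hand-written excerpt of a syscall table (the real table is generated at build time) and uses Python's fnmatch as a stand-in for whatever glob matching perf does internally; the table contents and the match_syscalls() helper are illustrative only.

```python
from fnmatch import fnmatchcase

# Hypothetical excerpt of a syscall table; names and ids are illustrative
# and not taken from the generated table the commit refers to.
SYSCALL_TABLE = {
    "openat": 56,
    "close": 57,
    "read": 63,
    "write": 64,
    "open_by_handle_at": 265,
}


def match_syscalls(pattern, table=SYSCALL_TABLE):
    """Return the sorted syscall names that match a shell-style glob."""
    return sorted(name for name in table if fnmatchcase(name, pattern))


if __name__ == "__main__":
    print(match_syscalls("open*"))
```

With a name table available locally, a pattern such as 'open*' can be expanded without asking an external library whether each name exists.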
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20180706163454.f714b9ab49ecc8566a0b3565@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a7f660d6
    • T
      perf stat: Add transaction flag (-T) support for s390 · 742d92ff
      Thomas Richter committed
      The 'perf stat' command line flag -T to display transaction counters is
      currently supported for x86 only.
      
      Add support for s390. It is based on the metrics flag -M transaction
      using the architecture dependent JSON files. This requires a metric
      named "transaction" in the JSON files for the platform.
      
      Introduce a new function metricgroup__has_metric() to check for the
      existence of a metric_name transaction.
      
      As suggested by Andi Kleen, this is the new approach to support
      transaction counters. Other architectures will follow.
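As a rough illustration of the existence check described above, the sketch below models the per-platform JSON metric entries as a list of dictionaries. The metric expression, the case-insensitive comparison, and the helper names has_metric() and transaction_setup() are assumptions made for the example; they are not the signatures of metricgroup__has_metric() or of any perf code.

```python
# Hypothetical in-memory form of the per-platform JSON metric entries.
# The expression below is a placeholder, not taken from any real JSON file.
S390_METRICS = [
    {"MetricName": "transaction", "MetricExpr": "tx_c_tend + tx_nc_tend"},
]


def has_metric(metrics, name):
    """Return True if a metric with this name exists.

    Case-insensitive matching is an assumption of this sketch.
    """
    wanted = name.lower()
    return any(m.get("MetricName", "").lower() == wanted for m in metrics)


def transaction_setup(metrics):
    """Use the metric-based path when available, else report the failure."""
    if has_metric(metrics, "transaction"):
        return "-M transaction"
    return "Cannot set up transaction events"


if __name__ == "__main__":
    print(transaction_setup(S390_METRICS))
    print(transaction_setup([]))
```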
      
      Output before:
      
        [root@p23lp27 perf]# ./perf stat -T -- sleep 1
        Cannot set up transaction events
        [root@p23lp27 perf]#
      
      Output after:
      
        [root@s35lp76 perf]# ./perf stat -T -- ~/mytesttx 1 >/tmp/111
      
         Performance counter stats for '/root/mytesttx 1':
      
                         1      tx_c_tend           #     13.0 transaction
                         1      tx_nc_tend
                        11      tx_nc_tabort
                         0      tx_c_tabort_special
                         0      tx_c_tabort_no_special
      
               0.001070109 seconds time elapsed
      
        [root@s35lp76 perf]#
      Suggested-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180626071701.58190-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      742d92ff
    • T
      Revert "perf list: Add s390 support for detailed/verbose PMU event description" · b8b5ab52
      Thomas Richter committed
      This reverts commit 038586c3.
      
      Fix the support of detailed/verbose PMU event description by using the
      "Unit": keyword in the json files to address event names referring to the
      /sys/devices/cpum_[cs]f devices.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180621080452.61012-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b8b5ab52
    • L
      perf cs-etm: Bail out immediately for instruction sample failure · 6cd4ac6a
      Leo Yan committed
      If an instruction sample failure has happened, there is no need to
      execute to the end of the function cs_etm__flush().  This commit
      bails out immediately and returns the error code.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1529298599-3876-3-git-send-email-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6cd4ac6a
    • L
      perf cs-etm: Introduce invalid address macro · 6abf0f45
      Leo Yan committed
      This patch introduces an invalid address macro and uses it to replace
      the dummy value '0xdeadbeefdeadbeefUL'.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1529298599-3876-2-git-send-email-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6abf0f45
    • A
      perf hists: Clarify callchain disabling when available · e9de7e2f
      Arnaldo Carvalho de Melo committed
      We want to allow having mixed events with/without callchains, not
      using a global flag to show callchains, but allowing suppressing
      callchains when they are present.
      
      So invert the logic of the last parameter to hists__fprint() to
      that effect.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ohqyisr6qge79qa95ojslptx@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e9de7e2f
  5. 11 Jul, 2018 3 commits
    • J
      perf script python: Fix dict reference counting · db0ba84c
      Janne Huttunen committed
      The dictionaries are attached to the parameter tuple that steals the
      references and takes care of releasing them when appropriate.  The code
      should not decrement the reference counts explicitly.  E.g. if libpython
      has been built with reference debugging enabled, the superfluous DECREFs
      will trigger this error when running perf script:
      
        Fatal Python error: Objects/tupleobject.c:238 object at
        0x7f10f2041b40 has negative ref count -1
        Aborted (core dumped)
      
      If the reference debugging is not enabled, the superfluous DECREFs might
      cause the dict objects to be silently released while they are still in
      use. This may trigger various other assertions or just cause perf
      crashes and/or weird and unexpected data changes in the stored Python
      objects.
      Signed-off-by: Janne Huttunen <janne.huttunen@nokia.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jaroslav Skarvada <jskarvad@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1531133990-17485-1-git-send-email-janne.huttunen@nokia.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      db0ba84c
    • K
      perf llvm-utils: Remove bashism from kernel include fetch script · f6432b9f
      Kim Phillips committed
      Like system(), popen() calls /bin/sh, which may/may not be bash.
      
      When run under dash, the script fails at its 'exit -1' line with:
      
       exit: Illegal number: -1
      
      checkbashisms report on script content:
      
       possible bashism (exit|return with negative status code):
       exit -1
      
      Remove the bashism and use the more portable non-zero failure
      status code 1.
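The portable form can be checked with a short sketch that runs a snippet through /bin/sh, the same interpreter system() and popen() use. The run_sh() helper is an illustrative name, and the sketch assumes a POSIX system where /bin/sh exists; it deliberately avoids asserting what 'exit -1' does, since that depends on which shell /bin/sh is.

```python
import subprocess


def run_sh(script):
    """Run a snippet with /bin/sh and return its exit status."""
    return subprocess.run(["/bin/sh", "-c", script]).returncode


if __name__ == "__main__":
    # A small positive status is accepted by every POSIX shell.
    print(run_sh("exit 1"))
```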
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20180629124652.8d0af7e2281fd3fd8262cacc@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f6432b9f
    • J
      perf tools: Generate a Python script compatible with Python 2 and 3 · 877cc639
      Jeremy Cline committed
      When generating a Python script with "perf script -g python", produce
      one that is compatible with Python 2 and 3. The difference between the
      two generated scripts is:
      
        --- python2-perf-script.py	2018-05-08 15:35:00.865889705 -0400
        +++ python3-perf-script.py	2018-05-08 15:34:49.019789564 -0400
        @@ -7,6 +7,8 @@
         # be retrieved using Python functions of the form common_*(context).
         # See the perf-script-python Documentation for the list of available functions.
      
        +from __future__ import print_function
        +
         import os
         import sys
      
        @@ -18,10 +20,10 @@
      
         def trace_begin():
        -	print "in trace_begin"
        +	print("in trace_begin")
      
         def trace_end():
        -	print "in trace_end"
        +	print("in trace_end")
      
         def raw_syscalls__sys_enter(event_name, context, common_cpu,
         	common_secs, common_nsecs, common_pid, common_comm,
        @@ -29,26 +31,26 @@
         		print_header(event_name, common_cpu, common_secs, common_nsecs,
         			common_pid, common_comm)
      
        -		print "id=%d, args=%s" % \
        -		(id, args)
        +		print("id=%d, args=%s" % \
        +		(id, args))
      
        -		print 'Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}'
        +		print('Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}')
      
         		for node in common_callchain:
         			if 'sym' in node:
        -				print "\t[%x] %s" % (node['ip'], node['sym']['name'])
        +				print("\t[%x] %s" % (node['ip'], node['sym']['name']))
         			else:
        -				print "	[%x]" % (node['ip'])
        +				print("	[%x]" % (node['ip']))
      
        -		print "\n"
        +		print()
      
         def trace_unhandled(event_name, context, event_fields_dict, perf_sample_dict):
        -		print get_dict_as_string(event_fields_dict)
        -		print 'Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}'
        +		print(get_dict_as_string(event_fields_dict))
        +		print('Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}')
      
         def print_header(event_name, cpu, secs, nsecs, pid, comm):
        -	print "%-20s %5u %05u.%09u %8u %-20s " % \
        -	(event_name, cpu, secs, nsecs, pid, comm),
        +	print("%-20s %5u %05u.%09u %8u %-20s " % \
        +	(event_name, cpu, secs, nsecs, pid, comm), end="")
      
         def get_dict_as_string(a_dict, delimiter=' '):
         	return delimiter.join(['%s=%s'%(k,str(v))for k,v in sorted(a_dict.items())])
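The trailing-comma change in print_header() is the least obvious part of the diff above: in Python 2 a trailing comma suppresses the newline, and the portable equivalent is end="". The sketch below reproduces that one function in standalone form; the extra out parameter is an addition for the example so the result can be captured, and is not part of the generated script.

```python
from __future__ import print_function

import io


def print_header(event_name, cpu, secs, nsecs, pid, comm, out=None):
    """Print the header without a trailing newline on Python 2 and 3."""
    print("%-20s %5u %05u.%09u %8u %-20s " %
          (event_name, cpu, secs, nsecs, pid, comm), end="", file=out)


if __name__ == "__main__":
    buf = io.StringIO()
    print_header("raw_syscalls__sys_enter", 0, 1, 2, 3, "ls", out=buf)
    # The header stays on one line so the event fields can follow it.
    print(repr(buf.getvalue()))
```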
      Signed-off-by: Jeremy Cline <jeremy@jcline.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Herton Krzesinski <herton@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/0100016341a7278a-d178c724-2b0f-49ca-be93-80a7d51aaa0d-000000@email.amazonses.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      877cc639
  6. 25 Jun, 2018 7 commits
    • R
      perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE] · 92ead7ee
      Ravi Bangoria committed
      perf_event__process_feature() accesses feat_ops[HEADER_LAST_FEATURE]
      which is not defined and thus perf is crashing. HEADER_LAST_FEATURE is
      used as an end marker for the perf report but it's unused for perf
      script/annotate. Ignore HEADER_LAST_FEATURE for perf script/annotate,
      just like it is done in 'perf report'.
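The failure mode is a dispatch table indexed by an id that is only an end marker. A minimal sketch of the guard, with made-up feature ids and handlers standing in for the real feat_ops table, might look like this:

```python
# Hypothetical feature ids; LAST_FEATURE is an end marker, not a feature.
FEAT_TRACING_DATA, FEAT_BUILD_ID, LAST_FEATURE = 0, 1, 2

# One handler per real feature; there is no entry for LAST_FEATURE.
FEAT_OPS = [
    lambda: "tracing_data",
    lambda: "build_id",
]


def process_feature(feat_id):
    """Dispatch to a feature handler, ignoring the end marker."""
    if feat_id == LAST_FEATURE:
        return None  # end marker: nothing to process
    if not 0 <= feat_id < len(FEAT_OPS):
        raise ValueError("unknown feature id %d" % feat_id)
    return FEAT_OPS[feat_id]()


if __name__ == "__main__":
    print(process_feature(FEAT_BUILD_ID))
    print(process_feature(LAST_FEATURE))
```

Without the first check, the end marker would index one past the last handler, which in C is an out-of-bounds read rather than a clean error.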
      
      Before:
        # perf record -o - ls | perf script
        <SNIP 'ls' output>
        Segmentation fault (core dumped)
        #
      
      After:
        # perf record -o - ls | perf script
        <SNIP 'ls' output>
        ls 7031 4392.099856:  250000 cpu-clock:uhH:  7f5e0ce7cd60
        ls 7031 4392.100355:  250000 cpu-clock:uhH:  7f5e0c706ef7
        #
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 57b5de46 ("perf report: Support forced leader feature in pipe mode")
      Link: http://lkml.kernel.org/r/20180625124220.6434-4-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      92ead7ee
    • T
      perf stat: Remove duplicate event counting · 6dde6429
      Thomas Richter committed
      'perf stat' shows a mismatch regarding counter names on s390:
      
      Run command:
      
         [root@s35lp76 perf]# ./perf stat -e tx_nc_tend  -v --
                      ~/mytesttx 1 >/tmp/111
         tx_nc_tend: 1 573146 573146
         tx_nc_tend: 1 573146 573146
      
         Performance counter stats for '/root/mytesttx 1':
      
                       3      tx_nc_tend
      
             0.001037252 seconds time elapsed
      
         [root@s35lp76 perf]#
      
      shows transaction counter tx_nc_tend with value 3 but it was triggered
      only once as seen by the output of mytesttx.
      
      When looking up the event name tx_nc_tend the following function
      sequence is called:
      
      parse_events_multi_pmu_add()
      +--> perf_pmu__scan() being called with NULL argument
           +--> pmu_read_sysfs() scans directory ../devices/ for
                                 all PMUs
                +--> perf_pmu__find() tries to find a PMU in the
                                 global pmu list.
                     +--> pmu_lookup() called to read all file
                                       entries when not in global
                                       list.
      
      pmu_lookup() causes the issue. It calls
      +---> pmu_aliases() to read all the entries in the PMU directory.
                          On s390 this is named
                          /sys/devices/cpum_cf/events.
            +--> pmu_aliases_parse() reads all files and creates an
                             alias for each file name.
      
                             So we end up with first entry created by
                             reading the sysfs file
                             [root@s35lp76 perf]# cat /sys/devices/cpum_cf
                                                      /events/TX_NC_TEND
                             event=0x008d
                             [root@s35lp76 perf]#
      
                             Debug output shows this entry
                             tx_nc_tend -> 'cpum_cf'/'event=0x008d
                             '/
                             After all files in this directory have been
                             read and aliases created this function is called:
            +--> pmu_add_cpu_aliases()
                             This function looks up the CPU tables
                             created by the json files.
                             With json files for s390 now available all
                             the aliases are added to
                             the PMU alias list a second time.
                             The second entry is added by
                             reading the json file converted by jevent
                             resulting in file pmu-events/pmu-events.c:
      
                             {
                               .name = "tx_nc_tend",
                               .event = "event=0x8d",
                               .desc = "Unit: cpum_cf Completed TEND \
                                        instructions \
                                        in non-constrained TX mode",
                               .topic = "extended",
                               .long_desc = "A TEND instruction has \
                                             completed  in a \
                                             non-constrained \
                                             transactional-execution mode",
                               .pmu = "cpum_cf",
                              },
      
                              Debug output shows this entry
                              tx_nc_tend -> 'cpum_cf'/'event=0x8d'/
      
      Functions pmu_aliases_parse() and pmu_add_cpu_aliases() both use
      __perf_pmu__new_alias() to add an alias to the PMU alias list. There is
      no check whether an alias already exists.
      
      So we end up with 2 entries for tx_nc_tend in the PMU alias list.
      
      Having set up the PMU alias list for this PMU,
      parse_events_multi_pmu_add() reads the complete alias list and adds each
      alias with parse_events_add_pmu() to the global perfev_list.  This
      causes the alias to be added multiple times to the event list.
      
      Fix this by making __perf_pmu__new_alias() merge alias definitions if
      an alias is already on the alias list.  Also print a debug message when
      the alias has mismatches in some fields.
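A minimal sketch of the merge-on-insert idea follows. It assumes aliases are plain dictionaries with name, expr and desc fields, that names compare case-insensitively, and that the later definition's description wins; none of this is taken from __perf_pmu__new_alias() itself, and new_alias() is an illustrative name.

```python
def new_alias(alias_list, name, expr, desc=None):
    """Add an alias, merging into an existing entry with the same name.

    Returns a list of mismatch messages (empty when nothing differs),
    standing in for the debug output mentioned in the commit message.
    """
    for alias in alias_list:
        if alias["name"].lower() == name.lower():
            mismatches = []
            if alias["expr"] != expr:
                mismatches.append(
                    "alias %s differs in field 'value': %s != %s"
                    % (name, alias["expr"], expr))
            # Assumption of this sketch: keep one entry and let the
            # newer definition fill in the description.
            if desc is not None:
                alias["desc"] = desc
            return mismatches
    alias_list.append({"name": name, "expr": expr, "desc": desc})
    return []


if __name__ == "__main__":
    aliases = []
    new_alias(aliases, "tx_nc_tend", "event=0x8d")           # from sysfs
    new_alias(aliases, "tx_nc_tend", "event=0x8d",
              desc="Completed TEND instructions")            # from JSON
    print(len(aliases))
```

With a single entry per name, the event is added to the event list once, so it is counted once.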
      
      Output before:
      
        [root@s35lp76 perf]# ./perf stat -e tx_nc_tend  -v \
                              -- ~/mytesttx 1 >/tmp/111
        tx_nc_tend: 1 551446 551446
      
         Performance counter stats for '/root/mytesttx 1':
      
                         3      tx_nc_tend
      
               0.000961134 seconds time elapsed
      
        [root@s35lp76 perf]#
      
      Output after:
      
        [root@s35lp76 perf]#  ./perf stat -e tx_nc_tend  -v \
                              -- ~/mytesttx 1 >/tmp/111
        tx_nc_tend: 1 551446 551446
      
         Performance counter stats for '/root/mytesttx 1':
      
                         1      tx_nc_tend
      
               0.000961134 seconds time elapsed
      
        [root@s35lp76 perf]#
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180615101105.47047-3-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6dde6429
    • T
      perf alias: Rebuild alias expression string to make it comparable · 0c24d6fb
      Thomas Richter committed
      PMU alias definitions in sysfs files may have spaces, newlines and
      numbers with leading zeroes. Some alias definitions may also appear in
      JSON files without spaces, etc.
      
      Scan alias definitions and remove leading zeroes, spaces, newlines, etc
      and rebuild string to make alias->str member comparable.
      
      s390, for example, has terms specified as event=0x0091 (read from files
      ../<PMU>/events/<FILE>) and terms specified as event=0x91 (read from JSON
      files).
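The normalization can be sketched as follows. This is a simplified stand-in that works on the raw string with regular expressions; the commit message does not say how perf rebuilds the string, so the approach and the normalize_alias() name are assumptions of the example.

```python
import re


def normalize_alias(expr):
    """Rebuild an alias string so equal definitions compare equal.

    Drops all whitespace (including newlines) and leading zeroes in
    numeric term values, e.g. 'event=0x0091\\n' becomes 'event=0x91'.
    """
    expr = re.sub(r"\s+", "", expr)
    terms = []
    for term in expr.split(","):
        key, sep, value = term.partition("=")
        if sep and re.fullmatch(r"0[xX][0-9a-fA-F]+", value):
            value = "0x%x" % int(value, 16)
        elif sep and value.isdigit():
            value = str(int(value))
        terms.append(key + sep + value)
    return ",".join(terms)


if __name__ == "__main__":
    print(normalize_alias("event=0x0091\n"))
```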
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180615101105.47047-2-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0c24d6fb
    • T
      perf alias: Remove trailing newline when reading sysfs files · ea23ac73
      Thomas Richter committed
      Remove a trailing newline when reading sysfs file contents such as
      /sys/devices/cpum_cf/events/TX_NC_TEND.  This shows when verbose option
      -v is used.
      
      Output before:
      
        tx_nc_tend -> 'cpum_cf'/'event=0x008d
        '/
      
      Output after:
      
        tx_nc_tend -> 'cpum_cf'/'event=0x008d'/
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180615101105.47047-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ea23ac73
    • Y
      perf tools: Fix a clang 7.0 compilation error · c6555c14
      Yonghong Song committed
      Arnaldo reported a perf build failure with the latest llvm/clang
      compiler (7.0).
      
         $ make LIBCLANGLLVM=1 -C tools/perf/
         <SNIP>
          CC       /tmp/tmp.t53Qo38zci/tests/kmod-path.o
         util/c++/clang.cpp: In function ‘std::unique_ptr<llvm::SmallVectorImpl<char> >
             perf::getBPFObjectFromModule(llvm::Module*)’:
         util/c++/clang.cpp:150:43: error: no matching function for call to
             ‘llvm::TargetMachine::addPassesToEmitFile(llvm::legacy::PassManager&,
              llvm::raw_svector_ostream&, llvm::TargetMachine::CodeGenFileType)’
                     TargetMachine::CGFT_ObjectFile)) {
                                                   ^
         In file included from util/c++/clang.cpp:25:0:
         /usr/local/include/llvm/Target/TargetMachine.h:254:16: note: candidate:
             virtual bool llvm::TargetMachine::addPassesToEmitFile(
             llvm::legacy::PassManagerBase&, llvm::raw_pwrite_stream&,
             llvm::raw_pwrite_stream*, llvm::TargetMachine::CodeGenFileType, bool,
             llvm::MachineModuleInfo*)
           virtual bool addPassesToEmitFile(PassManagerBase &, raw_pwrite_stream &,
                        ^~~~~~~~~~~~~~~~~~~
        /usr/local/include/llvm/Target/TargetMachine.h:254:16: note:
            candidate expects 6 arguments, 3 provided
        mv: cannot stat '/tmp/tmp.t53Qo38zci/util/c++/.clang.o.tmp': No such file or directory
        make[7]: *** [/home/acme/git/perf/tools/build/Makefile.build:101:
            /tmp/tmp.t53Qo38zci/util/c++/clang.o] Error 1
        make[6]: *** [/home/acme/git/perf/tools/build/Makefile.build:139: c++] Error 2
        make[5]: *** [/home/acme/git/perf/tools/build/Makefile.build:139: util] Error 2
        make[5]: *** Waiting for unfinished jobs....
          CC       /tmp/tmp.t53Qo38zci/tests/thread-map.o
      
      The signature of addPassesToEmitFile changed in llvm 7.0, which
      caused the failure. This patch fixes the issue by using the proper
      function signature for each compiler version.
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20180616174739.1076733-1-yhs@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c6555c14
    • A
      perf intel-pt: Fix packet decoding of CYC packets · 621a5a32
      Adrian Hunter committed
      Use a 64-bit type so that the cycle count is not limited to 32 bits.
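Why the operand width matters can be shown with a small simulation. It is not the decoder: it simply ORs successive payload chunks into a counter, emulating a 32-bit or 64-bit shift by masking. The chunk widths (5 bits first, then 7 bits per following byte) are an assumption of the sketch, and masking only approximates C, where shifting a 32-bit value by 32 or more is undefined.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def accumulate(chunks, first_bits=5, chunk_bits=7, wide=True):
    """OR payload chunks into one value, low bits first.

    With wide=False each shifted chunk is truncated to 32 bits before
    being merged, mimicking a shift done in a 32-bit type.
    """
    payload = chunks[0]
    shift = first_bits
    for chunk in chunks[1:]:
        shifted = chunk << shift
        if not wide:
            shifted &= MASK32  # bits above 31 are lost
        payload |= shifted & MASK64
        shift += chunk_bits
    return payload


if __name__ == "__main__":
    chunks = [0x1F, 0x7F, 0x7F, 0x7F, 0x7F, 0x7F]  # 40 bits, all set
    print(hex(accumulate(chunks, wide=True)))
    print(hex(accumulate(chunks, wide=False)))
```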
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1528371002-8862-1-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      621a5a32
    • T
      perf record: Support s390 random socket_id assignment · 01766229
      Thomas Richter committed
      On s390 the socket identifier assigned to a CPU identifier is random and
      (depending on the configuration of the LPAR) may be higher than the CPU
      identifier. This is currently not supported.
      
      Fix this by allowing arbitrary socket identifiers to be assigned to
      a CPU id.
      
      Output before:
      
        [root@p23lp27 perf]# ./perf report --header -I -v
        ...
        socket_id number is too big.You may need to upgrade the perf tool.
        Error:
        The perf.data file has no samples!
        # ========
        # captured on    : Tue May 29 09:29:57 2018
        # header version : 1
        ...
        # Core ID and Socket ID information is not available
        ...
        [root@p23lp27 perf]#
      
      Output after:
      
        [root@p23lp27 perf]# ./perf report --header -I -v
        ...
        Error:
        The perf.data file has no samples!
        # ========
        # captured on    : Tue May 29 09:29:57 2018
        # header version : 1
        ...
        # CPU 0: Core ID 0, Socket ID 6
        # CPU 1: Core ID 1, Socket ID 3
        # CPU 2: Core ID -1, Socket ID -1
        ...
        [root@p23lp27 perf]#
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180611073153.15592-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      01766229
  7. 16 Jun, 2018 1 commit