1. 17 5月, 2019 9 次提交
    • J
      perf stat: Support 'percore' event qualifier · 4fc4d8df
      Jin Yao 提交于
      With this patch, we can use the 'percore' event qualifier in perf-stat.
      
        root@skl:/tmp# perf stat -e cpu/event=0,umask=0x3,percore=1/,cpu/event=0,umask=0x3/ -a -A -I1000
          1.000773050 S0-C0   98,352,832 cpu/event=0,umask=0x3,percore=1/  (50.01%)
          1.000773050 S0-C1  103,763,057 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 S0-C2  196,776,995 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 S0-C3  176,493,779 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 CPU0    47,699,641 cpu/event=0,umask=0x3/            (50.02%)
          1.000773050 CPU1    49,052,451 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU2   102,771,422 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU3   100,784,662 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU4    43,171,342 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU5    54,152,158 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU6    93,618,410 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU7    74,477,589 cpu/event=0,umask=0x3/            (49.99%)
      
      In this example, we count the event 'ref-cycles' per-core and per-CPU in
      one perf stat command-line. From the output, we can see:
      
        S0-C0 = CPU0 + CPU4
        S0-C1 = CPU1 + CPU5
        S0-C2 = CPU2 + CPU6
        S0-C3 = CPU3 + CPU7
      
      So the result is expected (tiny difference is ignored).
      
      Note that, the 'percore' event qualifier needs to use with option '-A'.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-4-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4fc4d8df
    • J
      perf stat: Factor out aggregate counts printing · 40480a81
      Jin Yao 提交于
      Move the aggregate counts printing to a new function
      print_counter_aggrdata, which will be used in following patches.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-3-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      40480a81
    • J
      perf tools: Add a 'percore' event qualifier · 064b4e82
      Jin Yao 提交于
      Add a 'percore' event qualifier, like cpu/event=0,umask=0x3,percore=1/,
      that sums up the event counts for both hardware threads in a core.
      
      We can already do this with --per-core, but it's often useful to do
      this together with other metrics that are collected per hardware thread.
      So we need to support this per-core counting on a event level.
      
      This can be implemented in only the user tool, no kernel support needed.
      
       v4:
       ---
       1. Add Arnaldo's patch which updates the documentation for
          this new qualifier.
       2. Rebase to latest perf/core branch
      
       v3:
       ---
       Simplify the code according to Jiri's comments.
       Before:
         "return term->val.percore ? true : false;"
       Now:
         "return term->val.percore;"
      
       v2:
       ---
       Change the qualifier name from 'coresum' to 'percore' according to
       comments from Jiri and Andi.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      064b4e82
    • T
      perf docs: Add description for stderr · 6cf62656
      Thomas Richter 提交于
      'perf report' displays recorded data on the screen and emits warnings
      and debug messages in the status line (last one on screen).
      
      perf also supports the possibility to write all debug messages to stderr
      (instead of writing them to the status line).
      
      This is achieved with the following command:
      
        # ./perf --debug stderr=1 report -vvvvv -i ~/fast.data 2>/tmp/2
        # ll /tmp/2
        -rw-rw-r-- 1 tmricht tmricht 5420835 May  7 13:46 /tmp/2
        #
      
      The usage of variable stderr=1 is not documented, so add it to the perf
      man page.
      Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20190513080220.91966-1-tmricht@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6cf62656
    • A
      perf intel-pt: Fix sample timestamp wrt non-taken branches · 1b6599a9
      Adrian Hunter 提交于
      The sample timestamp is updated to ensure that the timestamp represents
      the time of the sample and not a branch that the decoder is still
      walking towards. The sample timestamp is updated when the decoder
      returns, but the decoder does not return for non-taken branches. Update
      the sample timestamp then also.
      
      Note that commit 3f04d98e ("perf intel-pt: Improve sample
      timestamp") was also a stable fix and appears, for example, in v4.4
      stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample
      timestamp").
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Fixes: 3f04d98e ("perf intel-pt: Improve sample timestamp")
      Link: http://lkml.kernel.org/r/20190510124143.27054-4-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1b6599a9
    • A
      perf intel-pt: Fix improved sample timestamp · 61b6e08d
      Adrian Hunter 提交于
      The decoder uses its current timestamp in samples. Usually that is a
      timestamp that has already passed, but in some cases it is a timestamp
      for a branch that the decoder is walking towards, and consequently
      hasn't reached.
      
      The intel_pt_sample_time() function decides which is which, but was not
      handling TNT packets exactly correctly.
      
      In the case of TNT, the timestamp applies to the first branch, so the
      decoder must first walk to that branch.
      
      That means intel_pt_sample_time() should return true for TNT, and this
      patch makes that change. However, if the first branch is a non-taken
      branch (i.e. a 'N'), then intel_pt_sample_time() needs to return false
      for subsequent taken branches in the same TNT packet.
      
      To handle that, introduce a new state INTEL_PT_STATE_TNT_CONT to
      distinguish the cases.
      
      Note that commit 3f04d98e ("perf intel-pt: Improve sample
      timestamp") was also a stable fix and appears, for example, in v4.4
      stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample
      timestamp").
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Fixes: 3f04d98e ("perf intel-pt: Improve sample timestamp")
      Link: http://lkml.kernel.org/r/20190510124143.27054-3-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      61b6e08d
    • A
      perf intel-pt: Fix instructions sampling rate · 7ba8fa20
      Adrian Hunter 提交于
      The timestamp used to determine if an instruction sample is made, is an
      estimate based on the number of instructions since the last known
      timestamp. A consequence is that it might go backwards, which results in
      extra samples. Change it so that a sample is only made when the
      timestamp goes forwards.
      
      Note this does not affect a sampling period of 0 or sampling periods
      specified as a count of instructions.
      
      Example:
      
       Before:
      
       $ perf script --itrace=i10us
       ls 13812 [003] 2167315.222583:       3270 instructions:u:      7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:      30902 instructions:u:      7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         10 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          8 instructions:u:      7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         14 instructions:u:      7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          6 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         14 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          4 instructions:u:      7fac71e2dab2 _dl_cache_libcmp+0xd2 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222728:      16423 instructions:u:      7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222734:      12731 instructions:u:      7fac71e27938 _dl_name_match_p+0x68 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ...
      
       After:
       $ perf script --itrace=i10us
       ls 13812 [003] 2167315.222583:       3270 instructions:u:      7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:      30902 instructions:u:      7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222728:      16479 instructions:u:      7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
       ...
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: f4aa0819 ("perf tools: Add Intel PT decoder")
      Link: http://lkml.kernel.org/r/20190510124143.27054-2-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7ba8fa20
    • K
      perf regs x86: Add X86 specific arch__intr_reg_mask() · 6466ec14
      Kan Liang 提交于
      XMM registers can be collected on Icelake and later platforms.
      
      Add specific arch__intr_reg_mask(), which creating an event to check if
      the kernel and hardware can collect XMM registers.
      
      Test on Skylake which doesn't support XMM registers collection. There is
      nothing changed.
      
         #perf record -I?
         available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9
         R10 R11 R12 R13 R14 R15
      
         Usage: perf record [<options>] [<command>]
          or: perf record [<options>] -- <command> [<options>]
      
          -I, --intr-regs[=<any register>]
                                sample selected machine registers on
         interrupt, use '-I?' to list register names
      
         #perf record -I
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 0.905 MB perf.data (2520 samples) ]
      
         #perf evlist -v
         cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type:
         IP|TID|TIME|CPU|PERIOD|REGS_INTR, read_format: ID, disabled: 1,
         inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3,
         sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol:
         1, bpf_event: 1, sample_regs_intr: 0xff0fff
      
      Test on Icelake which support XMM registers collection.
      
         #perf record -I?
         available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
         R11 R12 R13 R14 R15 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9
         XMM10 XMM11 XMM12 XMM13 XMM14 XMM15
      
         Usage: perf record [<options>] [<command>]
          or: perf record [<options>] -- <command> [<options>]
      
          -I, --intr-regs[=<any register>]
                                sample selected machine registers on
         interrupt, use '-I?' to list register names
      
         #perf record -I
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 0.800 MB perf.data (318 samples) ]
      
         #perf evlist -v
         cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type:
         IP|TID|TIME|CPU|PERIOD|REGS_INTR, read_format: ID, disabled: 1,
         inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3,
         sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol:
         1, bpf_event: 1, sample_regs_intr: 0xffffffff00ff0fff
      
      Committer notes:
      
      Don't set attr.sample_period as a named struct init, as it is part of an
      unnamed union in 'struct perf_event_attr', and doing so breaks the build
      on older gcc versions, such as:
      
        gcc version 4.1.2 20080704 (Red Hat 4.1.2-55)
        gcc version 4.4.7 20120313 (Red Hat 4.4.7-23) (GCC)
      
        arch/x86/util/perf_regs.c: In function 'arch__intr_reg_mask':
        arch/x86/util/perf_regs.c:279: error: unknown field 'sample_period' specified in initializer
        cc1: warnings being treated as errors
        arch/x86/util/perf_regs.c:279: warning: missing braces around initializer
        arch/x86/util/perf_regs.c:279: warning: (near initialization for 'attr.<anonymous>')
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      [ Only on a lenovo t480s, a skylake machine, where the XMM registers didn't show up in -I?/--user-regs=? as expected ]
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1557865174-56264-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6466ec14
    • K
      perf parse-regs: Add generic support for arch__intr/user_reg_mask() · af785e75
      Kan Liang 提交于
      There may be different register mask for use with intr or user on some
      platforms, e.g. Icelake.
      
      Add weak functions arch__intr_reg_mask() and arch__user_reg_mask() to
      return intr and user register mask respectively.
      
      Check mask before printing or comparing the register name.
      
      Generic code always return PERF_REGS_MASK. No functional change.
      Suggested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1557865174-56264-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      af785e75
  2. 16 5月, 2019 31 次提交