1. 05 6月, 2019 8 次提交
    • A
      perf intel-pt: Re-factor TIP cases in intel_pt_walk_to_ip · f3c98c4b
      Adrian Hunter 提交于
      To make it easier to add new code for different TIP cases, separate each
      case.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190520113728.14389-11-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f3c98c4b
    • A
      perf intel-pt: Record when decoding PSB+ packets · 9bc668e3
      Adrian Hunter 提交于
      In preparation for using MTC packets to count cycles, record whether
      decoding is between a PSB and PSBEND packets.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190520113728.14389-10-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9bc668e3
    • A
      perf intel-pt: Add support for samples to contain IPC ratio · 5b1dc0fd
      Adrian Hunter 提交于
      Copy the incremental instruction count and cycle count onto 'instructions'
      and 'branches' samples.
      
      Because Intel PT does not update the cycle count on every branch or
      instruction, the incremental values will often be zero.
      
      When there are values, they will be the number of instructions and
      number of cycles since the last update, and thus represent the average
      IPC since the last IPC value.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190520113728.14389-8-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5b1dc0fd
    • A
      perf tools: Add IPC information to perf_sample · 61d276f4
      Adrian Hunter 提交于
      Add counts of instructions and cycles, in order to represent
      instructions-per-cycle (IPC).
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190520113728.14389-7-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      61d276f4
    • A
      perf intel-pt: Accumulate cycle count from CYC packets · 7b4b4f83
      Adrian Hunter 提交于
      In preparation for providing instructions-per-cycle (IPC) information,
      accumulate cycle count from CYC packets.
      
      Although CYC packets are optional (requires config term 'cyc' to enable
      cycle-accurate mode when recording), the simplest way to count cycles is
      with CYC packets.
      
      The first complication is that cycles must be counted only when also
      counting instructions.
      
      That means when control flow packet generation is enabled i.e. between
      TIP.PGE and TIP.PGD packets.
      
      Also, sampling the cycle count follows the same rules as sampling the
      timestamp, that is, not before the instruction to which the decoder is
      walking is reached.
      
      In addition, the cycle count is not accurate for any but the first
      branch of a TNT packet.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190520113728.14389-6-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7b4b4f83
    • A
      perf intel-pt: Factor out intel_pt_update_sample_time · 948e9dc8
      Adrian Hunter 提交于
      To eliminate some duplication and make the code more understandable,
      factor out intel_pt_update_sample_time.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20190520113728.14389-5-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      948e9dc8
    • A
      perf record: Allow mixing --user-regs with --call-graph=dwarf · d194d8fc
      Alexey Budankov 提交于
      When DWARF stacks were requested and at the same time that the user
      specifies a register set using the --user-regs option the full register
      context was being captured on samples:
      
        $ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- stack_test2.g.O3
      
        188143843893585 0x6b48 [0x4f8]: PERF_RECORD_SAMPLE(IP, 0x4002): 23828/23828: 0x401236 period: 1363819 addr: 0x7ffedbdd51ac
        ... FP chain: nr:0
        ... user regs: mask 0xff0fff ABI 64-bit
        .... AX    0x53b
        .... BX    0x7ffedbdd3cc0
        .... CX    0xffffffff
        .... DX    0x33d3a
        .... SI    0x7f09b74c38d0
        .... DI    0x0
        .... BP    0x401260
        .... SP    0x7ffedbdd3cc0
        .... IP    0x401236
        .... FLAGS 0x20a
        .... CS    0x33
        .... SS    0x2b
        .... R8    0x7f09b74c3800
        .... R9    0x7f09b74c2da0
        .... R10   0xfffffffffffff3ce
        .... R11   0x246
        .... R12   0x401070
        .... R13   0x7ffedbdd5db0
        .... R14   0x0
        .... R15   0x0
        ... ustack: size 1024, offset 0xe0
         . data_src: 0x5080021
         ... thread: stack_test2.g.O:23828
         ...... dso: /root/abudanko/stacks/stack_test2.g.O3
      
      I.e. the --user-regs=IP,SP,BP was being ignored, being overridden by the
      needs of --call-graph=dwarf.
      
      After applying the change in this patch the sample data contains the
      user specified register, but making sure that at least the minimal set
      of register needed for DWARF unwinding (DWARF_MINIMAL_REGS) is
      requested.
      
      The user is warned that DWARF unwinding may not work if extra registers
      end up being needed.
      
        -g call-graph dwarf,K                         full_regs
        --user-regs=user_regs                         user_regs
        -g call-graph dwarf,K --user-regs=user_regs	user_regs + DWARF_MINIMAL_REGS
      
        $ perf record -g --call-graph dwarf,1024 --user-regs=BP -- ls
        WARNING: The use of --call-graph=dwarf may require all the user registers, specifying a subset with --user-regs may render DWARF unwinding unreliable, so the minimal registers set (IP, SP) is explicitly forced.
        arch   COPYING	Documentation  include	Kbuild	 lbuild    MAINTAINERS	modules.builtin		 Module.symvers  perf.data.old	scripts   System.map  virt
        block  CREDITS	drivers        init	Kconfig  lib	   Makefile	modules.builtin.modinfo  net		 README		security  tools       vmlinux
        certs  crypto	fs	       ipc	kernel	 LICENSES  mm		modules.order		 perf.data	 samples	sound	  usr	      vmlinux.o
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]
      
        188368474305373 0x5e40 [0x470]: PERF_RECORD_SAMPLE(IP, 0x4002): 23839/23839: 0x401236 period: 1260507 addr: 0x7ffd3d85e96c
        ... FP chain: nr:0
        ... user regs: mask 0x1c0 ABI 64-bit
        .... BP    0x401260
        .... SP    0x7ffd3d85cc20
        .... IP    0x401236
        ... ustack: size 1024, offset 0x58
         . data_src: 0x5080021
      
      Committer notes:
      
      Detected build failures on arches where PERF_REGS_ is not available,
      such as debian:experimental-x-{mips,mips64,mipsel}, fedora 24 and 30 for
      ARC uClibc and glibc, reported to Alexey that provided a patch moving
      the DWARF_MINIMAL_REGS from evsel.c to util/perf_regs.h, where it is
      guarded by an HAVE_PERF_REGS_SUPPORT ifdef.
      
      Committer testing:
      
        # perf record --user-regs=bp,ax -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.955 MB perf.data (1773 samples) ]
        # perf script -F+uregs | grep AX: | head -5
           perf 1719 [000] 181.272398:    1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
           perf 1719 [000] 181.272402:    1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
           perf 1719 [000] 181.272403:    8 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
           perf 1719 [000] 181.272405:  181 cycles: ffffffffba06a7c6 native_write_msr+0x6 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
           perf 1719 [000] 181.272406: 4405 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
        # perf record --call-graph=dwarf --user-regs=bp,ax -a sleep 1
        WARNING: The use of --call-graph=dwarf may require all the user registers, specifying a subset with --user-regs may render DWARF unwinding unreliable, so the minimal registers set (IP, SP) is explicitly forced.
        [ perf record: Woken up 55 times to write data ]
        [ perf record: Captured and wrote 24.184 MB perf.data (2841 samples) ]
        [root@quaco ~]# perf script --hide-call-graph -F+uregs | grep AX: | head -5
           perf 1729 [000] 211.268006:    1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
           perf 1729 [000] 211.268014:    1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
           perf 1729 [000] 211.268017:    5 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
           perf 1729 [000] 211.268020:   48 cycles: ffffffffba06a7c6 native_write_msr+0x6 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
           perf 1729 [000] 211.268024:  490 cycles: ffffffffba00e471 intel_bts_enable_local+0x21 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
        #
      Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/e7fd37b1-af22-0d94-a0dc-5895e803bbfe@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d194d8fc
    • L
      perf symbols: Remove unused variable 'err' · e5f177a5
      Leo Yan 提交于
      Variable 'err' is defined but never used in function symsrc__init(),
      remove it and directly return -1 at the end of the function.
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190530093801.20510-1-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e5f177a5
  2. 31 5月, 2019 2 次提交
    • T
      treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 · c942fddf
      Thomas Gleixner 提交于
      Based on 3 normalized pattern(s):
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version this program is distributed in the
        hope that it will be useful but without any warranty without even
        the implied warranty of merchantability or fitness for a particular
        purpose see the gnu general public license for more details
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version [author] [kishon] [vijay] [abraham]
        [i] [kishon]@[ti] [com] this program is distributed in the hope that
        it will be useful but without any warranty without even the implied
        warranty of merchantability or fitness for a particular purpose see
        the gnu general public license for more details
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version [author] [graeme] [gregory]
        [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
        [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
        [hk] [hemahk]@[ti] [com] this program is distributed in the hope
        that it will be useful but without any warranty without even the
        implied warranty of merchantability or fitness for a particular
        purpose see the gnu general public license for more details
      
      extracted by the scancode license scanner the SPDX license identifier
      
        GPL-2.0-or-later
      
      has been chosen to replace the boilerplate/reference in 1105 file(s).
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAllison Randal <allison@lohutok.net>
      Reviewed-by: NRichard Fontana <rfontana@redhat.com>
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c942fddf
    • T
      treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 · 1a59d1b8
      Thomas Gleixner 提交于
      Based on 1 normalized pattern(s):
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version this program is distributed in the
        hope that it will be useful but without any warranty without even
        the implied warranty of merchantability or fitness for a particular
        purpose see the gnu general public license for more details you
        should have received a copy of the gnu general public license along
        with this program if not write to the free software foundation inc
        59 temple place suite 330 boston ma 02111 1307 usa
      
      extracted by the scancode license scanner the SPDX license identifier
      
        GPL-2.0-or-later
      
      has been chosen to replace the boilerplate/reference in 1334 file(s).
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAllison Randal <allison@lohutok.net>
      Reviewed-by: NRichard Fontana <rfontana@redhat.com>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a59d1b8
  3. 29 5月, 2019 17 次提交
  4. 28 5月, 2019 4 次提交
  5. 17 5月, 2019 7 次提交
    • J
      perf stat: Support 'percore' event qualifier · 4fc4d8df
      Jin Yao 提交于
      With this patch, we can use the 'percore' event qualifier in perf-stat.
      
        root@skl:/tmp# perf stat -e cpu/event=0,umask=0x3,percore=1/,cpu/event=0,umask=0x3/ -a -A -I1000
          1.000773050 S0-C0   98,352,832 cpu/event=0,umask=0x3,percore=1/  (50.01%)
          1.000773050 S0-C1  103,763,057 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 S0-C2  196,776,995 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 S0-C3  176,493,779 cpu/event=0,umask=0x3,percore=1/  (50.02%)
          1.000773050 CPU0    47,699,641 cpu/event=0,umask=0x3/            (50.02%)
          1.000773050 CPU1    49,052,451 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU2   102,771,422 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU3   100,784,662 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU4    43,171,342 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU5    54,152,158 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU6    93,618,410 cpu/event=0,umask=0x3/            (49.98%)
          1.000773050 CPU7    74,477,589 cpu/event=0,umask=0x3/            (49.99%)
      
      In this example, we count the event 'ref-cycles' per-core and per-CPU in
      one perf stat command-line. From the output, we can see:
      
        S0-C0 = CPU0 + CPU4
        S0-C1 = CPU1 + CPU5
        S0-C2 = CPU2 + CPU6
        S0-C3 = CPU3 + CPU7
      
      So the result is expected (tiny difference is ignored).
      
      Note that, the 'percore' event qualifier needs to use with option '-A'.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-4-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4fc4d8df
    • J
      perf stat: Factor out aggregate counts printing · 40480a81
      Jin Yao 提交于
      Move the aggregate counts printing to a new function
      print_counter_aggrdata, which will be used in following patches.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-3-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      40480a81
    • J
      perf tools: Add a 'percore' event qualifier · 064b4e82
      Jin Yao 提交于
      Add a 'percore' event qualifier, like cpu/event=0,umask=0x3,percore=1/,
      that sums up the event counts for both hardware threads in a core.
      
      We can already do this with --per-core, but it's often useful to do
      this together with other metrics that are collected per hardware thread.
      So we need to support this per-core counting on a event level.
      
      This can be implemented in only the user tool, no kernel support needed.
      
       v4:
       ---
       1. Add Arnaldo's patch which updates the documentation for
          this new qualifier.
       2. Rebase to latest perf/core branch
      
       v3:
       ---
       Simplify the code according to Jiri's comments.
       Before:
         "return term->val.percore ? true : false;"
       Now:
         "return term->val.percore;"
      
       v2:
       ---
       Change the qualifier name from 'coresum' to 'percore' according to
       comments from Jiri and Andi.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1555077590-27664-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      064b4e82
    • A
      perf intel-pt: Fix sample timestamp wrt non-taken branches · 1b6599a9
      Adrian Hunter 提交于
      The sample timestamp is updated to ensure that the timestamp represents
      the time of the sample and not a branch that the decoder is still
      walking towards. The sample timestamp is updated when the decoder
      returns, but the decoder does not return for non-taken branches. Update
      the sample timestamp then also.
      
      Note that commit 3f04d98e ("perf intel-pt: Improve sample
      timestamp") was also a stable fix and appears, for example, in v4.4
      stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample
      timestamp").
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Fixes: 3f04d98e ("perf intel-pt: Improve sample timestamp")
      Link: http://lkml.kernel.org/r/20190510124143.27054-4-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1b6599a9
    • A
      perf intel-pt: Fix improved sample timestamp · 61b6e08d
      Adrian Hunter 提交于
      The decoder uses its current timestamp in samples. Usually that is a
      timestamp that has already passed, but in some cases it is a timestamp
      for a branch that the decoder is walking towards, and consequently
      hasn't reached.
      
      The intel_pt_sample_time() function decides which is which, but was not
      handling TNT packets exactly correctly.
      
      In the case of TNT, the timestamp applies to the first branch, so the
      decoder must first walk to that branch.
      
      That means intel_pt_sample_time() should return true for TNT, and this
      patch makes that change. However, if the first branch is a non-taken
      branch (i.e. a 'N'), then intel_pt_sample_time() needs to return false
      for subsequent taken branches in the same TNT packet.
      
      To handle that, introduce a new state INTEL_PT_STATE_TNT_CONT to
      distinguish the cases.
      
      Note that commit 3f04d98e ("perf intel-pt: Improve sample
      timestamp") was also a stable fix and appears, for example, in v4.4
      stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample
      timestamp").
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Fixes: 3f04d98e ("perf intel-pt: Improve sample timestamp")
      Link: http://lkml.kernel.org/r/20190510124143.27054-3-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      61b6e08d
    • A
      perf intel-pt: Fix instructions sampling rate · 7ba8fa20
      Adrian Hunter 提交于
      The timestamp used to determine if an instruction sample is made, is an
      estimate based on the number of instructions since the last known
      timestamp. A consequence is that it might go backwards, which results in
      extra samples. Change it so that a sample is only made when the
      timestamp goes forwards.
      
      Note this does not affect a sampling period of 0 or sampling periods
      specified as a count of instructions.
      
      Example:
      
       Before:
      
       $ perf script --itrace=i10us
       ls 13812 [003] 2167315.222583:       3270 instructions:u:      7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:      30902 instructions:u:      7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         10 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          8 instructions:u:      7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         14 instructions:u:      7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          6 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         14 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          4 instructions:u:      7fac71e2dab2 _dl_cache_libcmp+0xd2 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222728:      16423 instructions:u:      7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222734:      12731 instructions:u:      7fac71e27938 _dl_name_match_p+0x68 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ...
      
       After:
       $ perf script --itrace=i10us
       ls 13812 [003] 2167315.222583:       3270 instructions:u:      7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:      30902 instructions:u:      7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222728:      16479 instructions:u:      7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
       ...
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: f4aa0819 ("perf tools: Add Intel PT decoder")
      Link: http://lkml.kernel.org/r/20190510124143.27054-2-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7ba8fa20
    • K
      perf parse-regs: Add generic support for arch__intr/user_reg_mask() · af785e75
      Kan Liang 提交于
      There may be different register mask for use with intr or user on some
      platforms, e.g. Icelake.
      
      Add weak functions arch__intr_reg_mask() and arch__user_reg_mask() to
      return intr and user register mask respectively.
      
      Check mask before printing or comparing the register name.
      
      Generic code always return PERF_REGS_MASK. No functional change.
      Suggested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1557865174-56264-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      af785e75
  6. 16 5月, 2019 2 次提交
    • K
      perf parse-regs: Split parse_regs · aeea9062
      Kan Liang 提交于
      The available registers for --int-regs and --user-regs may be different,
      e.g. XMM registers.
      
      Split parse_regs into two dedicated functions for --int-regs and
      --user-regs respectively.
      
      Modify the warning message. "--user-regs=?" should be applied to show
      the available registers for --user-regs.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1557865174-56264-1-git-send-email-kan.liang@linux.intel.com
      [ Changed docs as suggested by Ravi and agreed by Kan ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      aeea9062
    • A
      perf report: Implement perf.data record decompression · cb62c6f1
      Alexey Budankov 提交于
      zstd_init(, comp_level = 0) initializes decompression part of API only
      hat now consists of zstd_decompress_stream() function.
      
      The perf.data PERF_RECORD_COMPRESSED records are decompressed using
      zstd_decompress_stream() function into a linked list of mmaped memory
      regions of mmap_comp_len size (struct decomp).
      
      After decompression of one COMPRESSED record its content is iterated and
      fetched for usual processing. The mmaped memory regions with
      decompressed events are kept in the linked list till the tool process
      termination.
      
      When dumping raw records (e.g., perf report -D --header) file offsets of
      events from compressed records are printed as zero.
      
      Committer notes:
      
      Since now we have support for processing PERF_RECORD_COMPRESSED, we see
      none, in raw form, like we saw in the previous patch commiter notes,
      they were decompressed into the usual PERF_RECORD_{FORK,MMAP,COMM,etc}
      records, we only see the stats for those PERF_RECORD_COMPRESSED events,
      and since I used the file generated in the commiter notes for the
      previous patch, there they are, 2 compressed records:
      
        $ perf report --header-only | grep cmdline
        # cmdline : /home/acme/bin/perf record -z2 sleep 1
        $ perf report -D | grep COMPRESS
              COMPRESSED events:          2
              COMPRESSED events:          0
        $ perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 15  of event 'cycles:u'
        # Event count (approx.): 962227
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ...........................
        #
            46.99%  sleep    libc-2.28.so      [.] _dl_addr
            29.24%  sleep    [unknown]         [k] 0xffffffffaea00a67
            16.45%  sleep    libc-2.28.so      [.] __GI__IO_un_link.part.1
             5.92%  sleep    ld-2.28.so        [.] _dl_setup_hash
             1.40%  sleep    libc-2.28.so      [.] __nanosleep
             0.00%  sleep    [unknown]         [k] 0xffffffffaea00163
      
        #
        # (Tip: To see callchains in a more compact form: perf report -g folded)
        #
        $
      Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/304b0a59-942c-3fe1-da02-aa749f87108b@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cb62c6f1