1. 11 Sep, 2021 1 commit
    • K
      perf report: Add support to print a textual representation of IBS raw sample data · 291dcb98
      Kim Phillips committed
      Perf records IBS (Instruction Based Sampling) extra sample data when
      'perf record --raw-samples' is used with an IBS-compatible event, on a
      machine that supports IBS.  IBS support is indicated in
      CPUID_Fn80000001_ECX bit #10.
      
      Up until now, users have been able to see the extra sample data solely
      in raw hex format using 'perf report --dump-raw-trace'.  From there,
      users could decode the data either manually, or by using an external
      script.
      
      Enable the built-in 'perf report --dump-raw-trace' to do the decoding of
      the extra sample data bits, so manual or external script decoding isn't
      necessary.
      
      Example usage:
      
        $ sudo perf record -c 10000001 -a --raw-samples -e ibs_fetch/rand_en=1/,ibs_op/cnt_ctl=1/ -C 0,1 taskset -c 0,1 7za b -mmt2 | perf report --dump-raw-trace
      
      Stdout contains IBS Fetch samples, e.g.:
      
        ibs_fetch_ctl:	02170007ffffffff MaxCnt 1048560 Cnt 1048560 Lat     7 En 1 Val 1 Comp 1 IcMiss 0 PhyAddrValid 1 L1TlbPgSz 4KB L1TlbMiss 0 L2TlbMiss 0 RandEn 1 L2Miss 0
        IbsFetchLinAd:	000056016b2ead40
        IbsFetchPhysAd:	000000115cedfd40
        c_ibs_ext_ctl:	0000000000000000 IbsItlbRefillLat   0
      
      ..and IBS Op samples, e.g.:
      
        ibs_op_ctl:	0000009e009e8968 MaxCnt  10000000 En 1 Val 1 CntCtl 1=uOps CurCnt       158
        IbsOpRip:	000056016b2ea73d
        ibs_op_data:	00000000000b0002 CompToRetCtr     2 TagToRetCtr    11 BrnRet 0  RipInvalid 0 BrnFuse 0 Microcode 0
        ibs_op_data2:	0000000000000002 CacheHitSt 0=M-state RmtNode 0 DataSrc 2=Local node cache
        ibs_op_data3:	0000000000c60002 LdOp 0 StOp 1 DcL1TlbMiss 0 DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 0 DcL2TlbHit2M 0 DcMiss 0 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0 DcMissNoMabAlloc 0 DcLinAddrValid 1 DcPhyAddrValid 1 DcL2TlbHit1G 0 L2Miss 0 SwPf 0 OpMemWidth  4 bytes OpDcMissOpenMemReqs  0 DcMissLat     0 TlbRefillLat     0
        IbsDCLinAd:	00007f133c319ce0
        IbsDCPhysAd:	0000000270485ce0
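
The decoding that 'perf report --dump-raw-trace' now performs can be illustrated with a small sketch. This is not perf's code: the bit positions below are assumptions based on the AMD IBS register layout, and the helper names are hypothetical. The two register values are the ones shown in the sample output above.

```python
# Illustrative decoder for two IBS control registers.
# Bit positions are assumptions based on the AMD IBS register layout;
# function and key names are hypothetical, chosen to mirror the sample output.

def bits(value, lo, width):
    """Return `width` bits of `value`, starting at bit position `lo`."""
    return (value >> lo) & ((1 << width) - 1)

def decode_ibs_fetch_ctl(reg):
    page_sizes = ["4KB", "2MB", "1GB", "RESERVED"]  # assumed encoding
    return {
        "MaxCnt": bits(reg, 0, 16) << 4,   # counts are stored in units of 16
        "Cnt": bits(reg, 16, 16) << 4,
        "Lat": bits(reg, 32, 16),
        "En": bits(reg, 48, 1),
        "Val": bits(reg, 49, 1),
        "Comp": bits(reg, 50, 1),
        "IcMiss": bits(reg, 51, 1),
        "PhyAddrValid": bits(reg, 52, 1),
        "L1TlbPgSz": page_sizes[bits(reg, 53, 2)],
        "L1TlbMiss": bits(reg, 55, 1),
        "L2TlbMiss": bits(reg, 56, 1),
        "RandEn": bits(reg, 57, 1),
        "L2Miss": bits(reg, 58, 1),
    }

def decode_ibs_op_ctl(reg):
    # The max count is split into a 16-bit low part and a 7-bit extension.
    max_cnt = bits(reg, 0, 16) | (bits(reg, 20, 7) << 16)
    return {
        "MaxCnt": max_cnt << 4,
        "En": bits(reg, 17, 1),
        "Val": bits(reg, 18, 1),
        "CntCtl": bits(reg, 19, 1),
        "CurCnt": bits(reg, 32, 27),
    }

fetch = decode_ibs_fetch_ctl(0x02170007FFFFFFFF)
op = decode_ibs_op_ctl(0x0000009E009E8968)
print(fetch)
print(op)
```

With these assumed positions, the decoded fields agree with the MaxCnt, Lat, RandEn and CurCnt values printed in the example output.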
      
      Committer notes:
      
      Fixed up this:
      
        util/amd-sample-raw.c: In function ‘evlist__amd_sample_raw’:
        util/amd-sample-raw.c:125:42: error: ‘ bytes’ directive output may be truncated writing 6 bytes into a region of size between 4 and 7 [-Werror=format-truncation=]
          125 |                          " OpMemWidth %2d bytes", 1 << (reg.op_mem_width - 1));
              |                                          ^~~~~~
        In file included from /usr/include/stdio.h:866,
                         from util/amd-sample-raw.c:7:
        /usr/include/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 21 and 24 bytes into a destination of size 21
           71 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           72 |                                    __glibc_objsize (__s), __fmt,
              |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           73 |                                    __va_arg_pack ());
              |                                    ~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
      
      As that %2d won't limit the number of chars to 2, just state that 2 is
      the minimal width:
      
        $ cat printf.c
        #include <stdio.h>
        #include <stdlib.h>
      
        int main(int argc, char *argv[])
        {
        	char bf[64];
        	int len = snprintf(bf, sizeof(bf), "%2d", atoi(argv[1]));
      
        	printf("strlen(%s): %u\n", bf, len);
      
        	return 0;
        }
        $ ./printf 1
        strlen( 1): 2
        $ ./printf 12
        strlen(12): 2
        $ ./printf 123
        strlen(123): 3
        $ ./printf 1234
        strlen(1234): 4
        $ ./printf 12345
        strlen(12345): 5
        $ ./printf 123456
        strlen(123456): 6
        $
      
      And since we probably don't want that output to be truncated, just
      assume the worst case, as the compiler did, and add a few more chars to
      that buffer.
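
The same minimum-width behaviour, and the worst-case sizing the compiler applied, can be sketched in Python, whose printf-style '%' operator treats '%2d' the same way C does. The helper names are hypothetical, and the 32-bit worst case is an assumption about the argument type.

```python
# '%2d' sets a minimum field width of 2; longer numbers are not truncated.
def formatted_len(n):
    return len("%2d" % n)

def buffer_size(fmt, value):
    """Bytes needed for the formatted text plus a terminating NUL."""
    return len(fmt % value) + 1

FMT = " OpMemWidth %2d bytes"
INT_MIN = -2**31  # assumed worst case: longest 32-bit signed decimal

for n in (1, 12, 123, 123456):
    print(n, formatted_len(n))

print("smallest buffer:", buffer_size(FMT, 1))
print("worst-case buffer:", buffer_size(FMT, INT_MIN))
```

Sizing the buffer for a one- or two-digit value is what triggered the truncation warning; sizing for the longest possible value avoids it.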
      
      Also use sizeof(var) instead of sizeof(dup-of-wanted-format-string) to
      avoid bugs when changing one but not the other.
      
      I also had to change this:
      
        -#include <asm/amd-ibs.h>
        +#include "../../arch/x86/include/asm/amd-ibs.h"
      
      To make it build on other architectures, just like intel-pt does.
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210817221509.88391-4-kim.phillips@amd.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 06 Jul, 2021 1 commit
  3. 02 Jul, 2021 1 commit
  4. 28 May, 2021 1 commit
  5. 10 May, 2021 1 commit
  6. 29 Apr, 2021 4 commits
    • J
      perf record: Create two hybrid 'cycles' events by default · b53a0755
      Jin Yao committed
      When evlist is empty, for example no '-e' specified in perf record,
      one default 'cycles' event is added to evlist.
      
      On a hybrid platform, however, two default 'cycles' events need to be
      created: one for cpu_core and the other for cpu_atom.
      
      This patch actually calls evsel__new_cycles() two times to create
      two 'cycles' events.
      
        # ./perf record -vv -a -- sleep 1
        ...
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x400000000
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          freq                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
        sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 6
        sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 7
        sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 9
        sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 10
        sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 11
        sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 12
        sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 13
        sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 14
        sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 15
        sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 16
        sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 17
        sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 18
        sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 19
        sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 20
        sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 21
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x800000000
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          freq                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 22
        sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 23
        sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 24
        sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 25
        sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 26
        sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 27
        sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 28
        sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 29
        ------------------------------------------------------------
      
      We have to create evlist-hybrid.c; otherwise, due to the symbol
      dependency, the 'perf test python' entry would fail.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-14-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf parse-events: Create two hybrid hardware events · 9cbfa2f6
      Jin Yao committed
      Currently, hardware events have the special perf type PERF_TYPE_HARDWARE,
      but the PMU type isn't passed through the user interface. On a hybrid
      system, the perf kernel code doesn't know which PMU the events belong to.
      
      So this type is now extended to be a PMU-aware type. The PMU type ID
      is stored in attr.config[63:32].
      
      PMU type ID is retrieved from sysfs.
      
        root@lkp-adl-d01:/sys/devices/cpu_atom# cat type
        8
      
        root@lkp-adl-d01:/sys/devices/cpu_core# cat type
        4
      
      When enabling a hybrid hardware event without a specified pmu, such as
      'perf stat -e cycles -a', two events are created automatically: one
      for atom and the other for core.
      
        # perf stat -e cycles -a -vv -- sleep 1
        Control descriptor is not initialized
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x400000000
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
        ------------------------------------------------------------
        ...
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x400000000
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x800000000
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
        ------------------------------------------------------------
        ...
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x800000000
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
        cycles: 0: 836272 1001525722 1001525722
        cycles: 1: 628564 1001580453 1001580453
        cycles: 2: 872693 1001605997 1001605997
        cycles: 3: 70417 1001641369 1001641369
        cycles: 4: 88593 1001726722 1001726722
        cycles: 5: 470495 1001752993 1001752993
        cycles: 6: 484733 1001840440 1001840440
        cycles: 7: 1272477 1001593105 1001593105
        cycles: 8: 209185 1001608616 1001608616
        cycles: 9: 204391 1001633962 1001633962
        cycles: 10: 264121 1001661745 1001661745
        cycles: 11: 826104 1001689904 1001689904
        cycles: 12: 89935 1001728861 1001728861
        cycles: 13: 70639 1001756757 1001756757
        cycles: 14: 185266 1001784810 1001784810
        cycles: 15: 171094 1001825466 1001825466
        cycles: 0: 129624 1001854843 1001854843
        cycles: 1: 122533 1001840421 1001840421
        cycles: 2: 90055 1001882506 1001882506
        cycles: 3: 139607 1001896463 1001896463
        cycles: 4: 141791 1001907838 1001907838
        cycles: 5: 530927 1001883880 1001883880
        cycles: 6: 143246 1001852529 1001852529
        cycles: 7: 667769 1001872626 1001872626
        cycles: 6744979 16026956922 16026956922
        cycles: 1965552 8014991106 8014991106
      
         Performance counter stats for 'system wide':
      
                 6,744,979      cpu_core/cycles/
                 1,965,552      cpu_atom/cycles/
      
               1.001882711 seconds time elapsed
      
      0x4 in 0x400000000 indicates the cpu_core pmu.
      0x8 in 0x800000000 indicates the cpu_atom pmu.
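
The encoding described above can be sketched as follows. The helper names are hypothetical; the PMU type IDs 4 and 8 are the sysfs values shown earlier, and 0 is the generic hardware event ID for 'cycles'.

```python
PERF_COUNT_HW_CPU_CYCLES = 0  # generic hardware event id for 'cycles'

def encode_hybrid_config(pmu_type, hw_event):
    """Place the PMU type ID in config[63:32] and the event id in the low bits."""
    return (pmu_type << 32) | hw_event

def decode_hybrid_config(config):
    """Split a config value back into (pmu_type, hw_event)."""
    return config >> 32, config & 0xFFFFFFFF

core_config = encode_hybrid_config(4, PERF_COUNT_HW_CPU_CYCLES)  # cpu_core
atom_config = encode_hybrid_config(8, PERF_COUNT_HW_CPU_CYCLES)  # cpu_atom
print(hex(core_config), hex(atom_config))
print(decode_hybrid_config(0x800000000))
```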
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-9-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • J
      perf pmu: Save detected hybrid pmus to a global pmu list · 44462430
      Jin Yao committed
      We identify the cpu_core pmu and cpu_atom pmu by explicitly
      checking the following files:
      
      For cpu_core, checks:
      "/sys/bus/event_source/devices/cpu_core/cpus"
      
      For cpu_atom, checks:
      "/sys/bus/event_source/devices/cpu_atom/cpus"
      
      If the 'cpus' file exists and it has data, the pmu exists.
      
      But in order not to hardcode "cpu_core" and "cpu_atom", the code is
      written in a generic way.

      So if the path "/sys/bus/event_source/devices/cpu_xxx/cpus" exists, the
      hybrid pmu exists. All the detected hybrid pmus are linked to a global
      list 'perf_pmu__hybrid_pmus', and we then just need to iterate the
      list with perf_pmu__for_each_hybrid_pmu to get all hybrid pmus.
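
The detection rule can be sketched roughly as below. This is an illustration in Python rather than perf's C implementation: the function name is hypothetical, and a temporary directory stands in for /sys/bus/event_source/devices so the sketch can run anywhere.

```python
import os
import tempfile

def find_hybrid_pmus(devices_dir):
    """Return names of cpu_* PMUs whose 'cpus' file exists and has data."""
    found = []
    for name in sorted(os.listdir(devices_dir)):
        if not name.startswith("cpu_"):
            continue
        cpus_path = os.path.join(devices_dir, name, "cpus")
        try:
            with open(cpus_path) as f:
                if f.read().strip():
                    found.append(name)
        except OSError:
            pass  # no readable 'cpus' file, so not treated as a hybrid PMU
    return found

# Throwaway directory tree standing in for the sysfs devices directory.
demo_dir = tempfile.mkdtemp()
for pmu, cpus in (("cpu_core", "0-15\n"), ("cpu_atom", "16-23\n"), ("cpu_empty", "")):
    os.makedirs(os.path.join(demo_dir, pmu))
    with open(os.path.join(demo_dir, pmu, "cpus"), "w") as f:
        f.write(cpus)
os.makedirs(os.path.join(demo_dir, "uncore_imc"))  # no 'cpus' file, wrong prefix

hybrid_pmus = find_hybrid_pmus(demo_dir)
print(hybrid_pmus)
```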
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-6-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • N
      perf data: Add JSON export · d0713d4c
      Nicholas Fraser committed
      This adds a feature to export perf data to JSON.
      
      The resolved symbols are exported into the JSON so that external tools
      don't need to load the dsos themselves (or even have access to them at
      all.) This makes it easy to load and analyze perf data with standalone
      tools where direct perf or libbabeltrace integration is impractical.
      
      The exporter uses a minimal inline JSON encoding without any external
      dependencies. Currently it only outputs some headers and sample metadata
      but it's easily extensible.
      
      Use it like this:
      
        $ perf data convert --to-json out.json
      
      Committer notes:
      
      Fixup a __printf() bug that broke the build:
      
        util/data-convert-json.c:103:11: error: expected ‘)’ before numeric constant
          103 | __(printf, 5, 6)
              |           ^~
              |           )
        util/data-convert-json.c: In function ‘output_sample_callchain_entry’:
        util/data-convert-json.c:124:2: error: implicit declaration of function ‘output_json_key_format’; did you mean ‘output_json_format’? [-Werror=implicit-function-declaration]
          124 |  output_json_key_format(out, false, 5, "ip", "\"0x%" PRIx64 "\"", ip);
              |  ^~~~~~~~~~~~~~~~~~~~~~
              |  output_json_format
      
      Also had to add this patch to fix errors reported by various versions of
      clang:
      
        -       if (al && al->sym && al->sym->name && strlen(al->sym->name) > 0) {
        +       if (al && al->sym && al->sym->namelen) {
      
      al->sym->name is a zero-sized array (to avoid one extra alloc in the
      symbol__new() constructor), and sym->namelen carries its strlen.
      
      Committer testing:
      
        $ ls -la out.json
        ls: cannot access 'out.json': No such file or directory
        $ perf record sleep 0.1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
        $ perf report --stats | grep -w SAMPLE
                  SAMPLE events:          8
        $ perf data convert --to-json out.json
        [ perf data convert: Converted 'perf.data' into JSON data 'out.json' ]
        [ perf data convert: Converted and wrote 0.002 MB (8 samples) ]
        $ ls -la out.json
        -rw-rw-r--. 1 acme acme 2017 Apr 26 17:29 out.json
        $ cat out.json
        {
        	"linux-perf-json-version": 1,
        	"headers": {
        		"header-version": 1,
        		"captured-on": "2021-04-26T20:28:57Z",
        		"data-offset": 432,
        		"data-size": 1016,
        		"feat-offset": 1448,
        		"hostname": "five",
        		"os-release": "5.11.14-200.fc33.x86_64",
        		"arch": "x86_64",
        		"cpu-desc": "AMD Ryzen 9 3900X 12-Core Processor",
        		"cpuid": "AuthenticAMD,23,113,0",
        		"nrcpus-online": 24,
        		"nrcpus-avail": 24,
        		"perf-version": "5.12.gee134f3189bd",
        		"cmdline": [
        			"/home/acme/bin/perf",
        			"record",
        			"sleep",
        			"0.1"
        		]
        	},
        	"samples": [
        		{
        			"timestamp": 170517539043684,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa6268827"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539048443,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa661359d"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539051018,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa6311e18"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539053652,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0x7fdb77b4812b",
        					"symbol": "_dl_start",
        					"dso": "ld-2.32.so"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539055306,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa6269286"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539057590,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa62abd8b"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539067559,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0x7fdb77b5e9e9",
        					"symbol": "__GI___tunables_init",
        					"dso": "ld-2.32.so"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539282452,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0x7fdb779978d2",
        					"symbol": "getenv",
        					"dso": "libc-2.32.so"
        				}
        			]
        		}
        	]
        }
        $
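
As a sketch of the kind of standalone analysis this export enables, the snippet below loads a document shaped like the output above and counts samples by their top callchain frame. The embedded JSON is a shortened copy of three samples from the committer test; the function name is hypothetical.

```python
import json
from collections import Counter

# Shortened stand-in for out.json, keeping only fields used below.
sample_json = """
{
  "linux-perf-json-version": 1,
  "samples": [
    {"timestamp": 170517539043684, "pid": 375844, "tid": 375844, "comm": "sleep",
     "callchain": [{"ip": "0xffffffffa6268827"}]},
    {"timestamp": 170517539053652, "pid": 375844, "tid": 375844, "comm": "sleep",
     "callchain": [{"ip": "0x7fdb77b4812b", "symbol": "_dl_start", "dso": "ld-2.32.so"}]},
    {"timestamp": 170517539282452, "pid": 375844, "tid": 375844, "comm": "sleep",
     "callchain": [{"ip": "0x7fdb779978d2", "symbol": "getenv", "dso": "libc-2.32.so"}]}
  ]
}
"""

def count_top_frames(doc):
    """Count samples by top frame, falling back to the raw ip when unresolved."""
    counts = Counter()
    for sample in doc["samples"]:
        callchain = sample.get("callchain", [])
        if not callchain:
            continue
        top = callchain[0]
        counts[top.get("symbol", top["ip"])] += 1
    return counts

doc = json.loads(sample_json)
top_frames = count_top_frames(doc)
for name, hits in top_frames.most_common():
    print(hits, name)
```

Because the symbols are already resolved in the export, the consumer needs neither the dsos nor perf itself.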
      Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tan Xiaojun <tanxiaojun@huawei.com>
      Cc: Ulrich Czekalla <uczekalla@codeweavers.com>
      Link: http://lore.kernel.org/lkml/3884969f-804d-2f53-c648-e2b0bd85edff@codeweavers.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 20 Apr, 2021 1 commit
  8. 18 Feb, 2021 1 commit
    • F
      perf tools: Add OCaml demangling · cef7af25
      Fabian Hemmer committed
      Detect symbols generated by the OCaml compiler based on their prefix.
      
      Demangle OCaml symbols, returning a newly allocated string (like the
      existing Java demangling functionality).
      
      Move a helper function (hex) from tests/code-reading.c to util/string.c
      
      To test:
      
        echo 'Printf.printf "%d\n" (Random.int 42)' > test.ml
        perf record ocamlopt.opt test.ml
        perf report -d ocamlopt.opt
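
A rough sketch of prefix-based detection and demangling follows. The commit text only states that symbols are detected by prefix and that a hex helper is used, so the concrete rules here are assumptions: a "caml" prefix followed by an upper-case letter, "__" as the module separator, "$xx" as a hex-escaped character, and a trailing "_" plus digits as a compiler-added suffix. The function name and sample symbols are hypothetical.

```python
import re

def ocaml_demangle(sym):
    """Return a demangled name, or None if `sym` does not look like OCaml."""
    # Assumed detection rule: "caml" followed by an upper-case letter.
    if not (sym.startswith("caml") and sym[4:5].isupper()):
        return None
    body = sym[4:]
    out = []
    i = 0
    while i < len(body):
        if body.startswith("__", i):
            out.append(".")                      # assumed: "__" separates modules
            i += 2
        elif body[i] == "$" and re.fullmatch(r"[0-9a-fA-F]{2}", body[i + 1:i + 3]):
            out.append(chr(int(body[i + 1:i + 3], 16)))  # assumed: "$xx" hex escape
            i += 3
        else:
            out.append(body[i])
            i += 1
    # Assumed: a trailing "_<digits>" is a uniqueness suffix and is dropped.
    return re.sub(r"_\d+$", "", "".join(out))

print(ocaml_demangle("camlStdlib__list__map_123"))
print(ocaml_demangle("camlFoo__$2b_5"))
print(ocaml_demangle("main"))
```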
      Signed-off-by: Fabian Hemmer <copy@copy.sh>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LPU-Reference: 20210203211537.b25ytjb6dq5jfbwx@nyu
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 21 Jan, 2021 1 commit
    • S
      perf stat: Enable counting events for BPF programs · fa853c4b
      Song Liu committed
      Introduce 'perf stat -b' option, which counts events for BPF programs, like:
      
        [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
           1.487903822            115,200      ref-cycles
           1.487903822             86,012      cycles
           2.489147029             80,560      ref-cycles
           2.489147029             73,784      cycles
           3.490341825             60,720      ref-cycles
           3.490341825             37,797      cycles
           4.491540887             37,120      ref-cycles
           4.491540887             31,963      cycles
      
      The example above counts 'cycles' and 'ref-cycles' of BPF program of id
      254.  This is similar to bpftool-prog-profile command, but more
      flexible.
      
      'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF
      programs (monitor-progs) to the target BPF program (target-prog). The
      monitor-progs read perf_event before and after the target-prog, and
      aggregate the difference in a BPF map. Then the user space reads data
      from these maps.
      
      A new 'struct bpf_counter' is introduced to provide a common interface
      that uses BPF programs/maps to count perf events.
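
The before/after accounting described above can be modelled in plain Python. This is a simulation of the idea only, not BPF code and not the 'struct bpf_counter' interface; the class and method names are hypothetical.

```python
class BpfCounterSim:
    """Toy model: an entry hook snapshots the counter, an exit hook adds the delta."""

    def __init__(self, nr_cpus):
        self.start = [0] * nr_cpus   # per-CPU reading taken when the target starts
        self.accum = [0] * nr_cpus   # per-CPU map holding accumulated differences

    def fentry(self, cpu, counter_value):
        # Runs just before the target program on this CPU.
        self.start[cpu] = counter_value

    def fexit(self, cpu, counter_value):
        # Runs just after the target program; only the difference is attributed.
        self.accum[cpu] += counter_value - self.start[cpu]

    def read(self):
        # User space sums the per-CPU values it reads back from the map.
        return sum(self.accum)

sim = BpfCounterSim(nr_cpus=2)
sim.fentry(0, 100); sim.fexit(0, 150)      # 50 events on CPU 0
sim.fentry(1, 1000); sim.fexit(1, 1010)    # 10 events on CPU 1
sim.fentry(0, 400); sim.fexit(0, 420)      # 20 more on CPU 0; 150..400 is ignored
total = sim.read()
print(total)
```

Events that occur on a CPU while the target program is not running fall between an exit and the next entry, so they never reach the accumulated total.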
      
      Committer notes:
      
      Removed all but bpf_counter.h includes from evsel.h, not needed at all.
      
      Also, BPF map lookups for PERCPU_ARRAYs need the value receive buffer
      passed to the kernel to have libbpf_num_possible_cpus() entries, not
      evsel__nr_cpus(evsel), as the former uses
      /sys/devices/system/cpu/possible while the latter uses
      /sys/devices/system/cpu/online, which may be less than the 'possible'
      number, making the bpf map lookup overwrite memory and cause
      hard-to-debug memory corruption.
      
      We need to continue using evsel__nr_cpus(evsel) when accessing the
      perf_counts array though, so as not to overwrite another area of memory :-)
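
The difference between the two counts can be illustrated by parsing CPU list strings of the kind found in those sysfs files. The function name and the example strings are hypothetical; the sketch assumes the usual comma-separated list of single CPUs and inclusive ranges.

```python
def count_cpus(cpu_list):
    """Count CPUs in a list string such as '0-3,8-11' or '5'."""
    total = 0
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            total += int(hi) - int(lo) + 1   # ranges are inclusive
        else:
            total += 1
    return total

possible = "0-31\n"   # hypothetical content of .../cpu/possible
online = "0-23\n"     # hypothetical content of .../cpu/online

lookup_buffer_entries = count_cpus(possible)  # size for the per-CPU map lookup
perf_counts_entries = count_cpus(online)      # size used for perf_counts
print(lookup_buffer_entries, perf_counts_entries)
```

In this hypothetical case a lookup buffer sized from the online count would be eight entries too small for what the kernel writes back.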
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@fb.com
      Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 15 Oct, 2020 1 commit
  11. 18 Sep, 2020 1 commit
  12. 14 Aug, 2020 1 commit
  13. 06 Aug, 2020 1 commit
  14. 02 Jul, 2020 2 commits
    • I
      perf parse-events: Disable a subset of bison warnings · 1f16fcad
      Ian Rogers committed
      Rather than disable all warnings with -w, disable specific warnings.
      
      Predicate enabling the warnings on a recent version of bison.
      
      Tested with GCC 9.3.0 and clang 9.0.1.
      
      Committer testing:
      
      The full set of compilers, gcc and clang that this will be tested on
      will be on the signed tag when this change goes upstream.
      
      Had to add -Wno-switch-enum to build on opensuse tumbleweed:
      
        /tmp/build/perf/util/parse-events-bison.c: In function 'yydestruct':
        /tmp/build/perf/util/parse-events-bison.c:1200:3: error: enumeration value 'YYSYMBOL_YYEMPTY' not handled in switch [-Werror=switch-enum]
         1200 |   switch (yykind)
              |   ^~~~~~
        /tmp/build/perf/util/parse-events-bison.c:1200:3: error: enumeration value 'YYSYMBOL_YYEOF' not handled in switch [-Werror=switch-enum]
      
      Also replace -Wno-error=implicit-function-declaration with -Wno-implicit-function-declaration.
      
      Also needed to check just the first two levels of the bison version, as
      the patch was assuming that all versions were of the form x.y.z, and
      there are several cases where it is just x.y, breaking the build.
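
The version handling problem can be sketched as below: comparing only the first two components works for both "x.y" and "x.y.z" strings. The real check lives in the perf build files; this Python version, its function names and the thresholds used in the example are hypothetical.

```python
def major_minor(version):
    """Return (major, minor) from 'x.y' or 'x.y.z', ignoring any third level."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])

def at_least(version, minimum):
    """True if `version` is not older than `minimum`, comparing two levels only."""
    return major_minor(version) >= major_minor(minimum)

for v in ("3.6", "3.6.4", "2.7.1"):
    print(v, major_minor(v), at_least(v, "3.0"))
```

A parser that insisted on three components would fail on the plain "3.6" form, which is the breakage the note describes.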
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200619043356.90024-11-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • I
      perf parse-events: Disable a subset of flex warnings · 304d7a90
      Ian Rogers committed
      Rather than disable all warnings with -w, disable specific warnings.
      
      Predicate enabling the warnings on more recent flex versions.
      
      Tested with GCC 9.3.0 and clang 9.0.1.
      
      Committer notes:
      
      The full set of compilers, gcc and clang that this will be tested on
      will be on the signed tag when this change goes upstream.
      
      Added -Wno-misleading-indentation to the flex_flags to overcome this on
      opensuse tumbleweed when building with clang:
      
          CC       /tmp/build/perf/util/parse-events-flex.o
          CC       /tmp/build/perf/util/pmu.o
        /tmp/build/perf/util/parse-events-flex.c:5038:13: error: misleading indentation; statement is not part of the previous 'if' [-Werror,-Wmisleading-indentation]
                    if ( ! yyg->yy_state_buf )
                    ^
        /tmp/build/perf/util/parse-events-flex.c:5036:9: note: previous statement is here
                if ( ! yyg->yy_state_buf )
                ^
      
      And we need to use this to redirect stderr to stdout and then grep in a
      way that is acceptable for BusyBox shell:
      
        2>&1 |
      
      Previously I was using:
      
        |&
      
      Which seems to be bash specific.
      
      Added -Wno-sign-compare to overcome this on systems such as centos:7:
      
          CC       /tmp/build/perf/util/parse-events-flex.o
          CC       /tmp/build/perf/util/pmu.o
          CC       /tmp/build/perf/util/pmu-flex.o
        util/parse-events.l: In function 'parse_events_lex':
        /tmp/build/perf/util/parse-events-flex.c:193:36: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
                         for ( yyl = n; yyl < yyleng; ++yyl )\
                                            ^
        /tmp/build/perf/util/parse-events-flex.c:204:9: note: in expansion of macro 'YY_LESS_LINENO'
      
      Added -Wno-unused-parameter to overcome this in systems such as
      centos:7:
      
          CC       /tmp/build/perf/util/parse-events-flex.o
          CC       /tmp/build/perf/util/pmu.o
        /tmp/build/perf/util/parse-events-flex.c: In function 'yy_fatal_error':
        /tmp/build/perf/util/parse-events-flex.c:6265:58: error: unused parameter 'yyscanner' [-Werror=unused-parameter]
         static void yy_fatal_error (yyconst char* msg , yyscan_t yyscanner)
                                                                  ^
      Added -Wno-missing-declarations to build in systems such as centos:6:
      
        /tmp/build/perf/util/parse-events-flex.c:6313: error: no previous prototype for 'parse_events_get_column'
        /tmp/build/perf/util/parse-events-flex.c:6389: error: no previous prototype for 'parse_events_set_column'
      
      And -Wno-missing-prototypes to cover older compilers:
      
        -Wmissing-prototypes (C only)
        Warn if a global function is defined without a previous prototype declaration. This warning is issued even if the definition itself provides a prototype. The aim is to detect global functions that fail to be declared in header files.
        -Wmissing-declarations (C only)
        Warn if a global function is defined without a previous declaration. Do so even if the definition itself provides a prototype. Use this option to detect global functions that are not declared in header files.
      
      Older C compilers lack -Wno-misleading-indentation, check if it is
      available before using it.
      
      Also needed to check just the first two levels of the flex version, as
      the patch was assuming that all versions were of the form x.y.z, and
      there are several cases where it is just x.y, breaking the build.
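      
      A minimal sketch of the "first two levels" idea, written in C purely for
      illustration; the real check is done in the perf build files and is not
      reproduced here. The helper names parse_major_minor() and
      version_at_least() are hypothetical:

```c
#include <stdio.h>

/*
 * Parse only the leading "x.y" of a version string.
 * A trailing ".z" (or anything else) is ignored, so both "2.6.4" and
 * "2.6" are accepted. Returns 1 on success, 0 if "x.y" is not present.
 */
static int parse_major_minor(const char *s, int *major, int *minor)
{
	return sscanf(s, "%d.%d", major, minor) == 2;
}

/* Return 1 if the version in s is >= want_major.want_minor, else 0. */
static int version_at_least(const char *s, int want_major, int want_minor)
{
	int major, minor;

	if (!parse_major_minor(s, &major, &minor))
		return 0;

	return major > want_major ||
	       (major == want_major && minor >= want_minor);
}
```

      Comparing only the first two components means an "x.y" version string is
      handled the same way as an "x.y.z" one.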
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200619043356.90024-8-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      304d7a90
  15. 23 Jun, 2020 6 commits
  16. 01 Jun, 2020 1 commit
  17. 30 May, 2020 1 commit
    • S
      perf tools: Add optional support for libpfm4 · 70943490
      Stephane Eranian committed
      This patch links perf with the libpfm4 library if it is available and
      LIBPFM4 is passed to the build. The libpfm4 library contains hardware
      event tables for all processors supported by perf_events. It is a helper
      library that helps convert from a symbolic event name to the event
      encoding required by the underlying kernel interface. This library is
      open-source and available from: http://perfmon2.sf.net.
      
      With this patch, it is possible to specify full hardware events by name.
      Hardware filters are also supported. Events must be specified via the
      --pfm-events and not -e option. Both options are active at the same time
      and it is possible to mix and match:
      
        $ perf stat --pfm-events inst_retired:any_p:c=1:i -e cycles ....
      
      One needs to explicitly ask for its inclusion by using the LIBPFM4 make
      command line option, i.e. it is opt-in rather than opt-out of feature
      detection and build support.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiwei Sun <jiwei.sun@windriver.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: yuzhoujian <yuzhoujian@didichuxing.com>
      Link: http://lore.kernel.org/lkml/20200505182943.218248-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      70943490
  18. 28 May, 2020 1 commit
    • I
      perf tools: Grab a copy of libbpf's hashmap · eee19501
      Ian Rogers committed
      Allow use of hashmap in perf. Modify perf's check-headers.sh script to
      check that the files are kept in sync, in the same way kernel headers
      are checked. This will warn if they are out of sync at the start of a
      perf build.
      
      Committer note:
      
      This starts out of sync as a fix went through the bpf tree, namely the
      one removing the needless libbpf_internal.h include in hashmap.h.
      
      There is also another change related to __WORDSIZE: as it is in
      tools/lib/bpf/hashmap.h, it causes the tools/perf/ build to fail on
      systems such as Alpine Linux, which uses the musl libc. So we need an
      alternative way of having __WORDSIZE available: use the one used by
      tools/include/linux/bitops.h, which builds in all the systems I have
      build containers for.
      
      These differences will be resolved at some point, so keep the warning in
      check-headers.sh as a reminder.
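      
      A sketch of what such a portable __WORDSIZE fallback can look like,
      assuming a GCC- or Clang-compatible compiler that predefines
      __SIZEOF_LONG__. This illustrates the approach and is not a verbatim
      copy of either header; bits_per_long() is a hypothetical helper added
      only to show the result:

```c
#include <limits.h>

/*
 * Some C libraries (musl, for instance) do not define __WORDSIZE in the
 * headers pulled in here, so derive it from a compiler builtin macro
 * when it is missing.
 */
#ifndef __WORDSIZE
#define __WORDSIZE (__SIZEOF_LONG__ * 8)
#endif

/* Hypothetical helper: number of bits in a long, as seen via __WORDSIZE. */
static int bits_per_long(void)
{
	return __WORDSIZE;
}
```

      The #ifndef guard keeps the libc's own definition when one exists, so
      the fallback only takes effect on systems that lack it.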
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: kp singh <kpsingh@chromium.org>
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20200515221732.44078-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      eee19501
  19. 06 May, 2020 2 commits
  20. 10 Mar, 2020 2 commits
  21. 28 Nov, 2019 2 commits
  22. 07 Nov, 2019 1 commit
  23. 11 Oct, 2019 1 commit
    • J
      perf diff: Report noisy for cycles diff · cebf7d51
      Jin Yao committed
      This patch prints the stddev and a histogram for the cycles diff of
      each program block, which helps to judge whether the cycle counts are
      noisy.
      
      This patch is inspired by Andi Kleen's patch:
      
        https://lwn.net/Articles/600471/
      
      We add a new option, '--cycles-hist'.
      
      Example:
      
        perf record -b ./div
        perf record -b ./div
        perf diff -c cycles
      
        # Baseline                                [Program Block Range] Cycles Diff  Shared Object      Symbol
        # ........  .......................................................... ....  .................  ............................
        #
            46.72%                                      [div.c:40 -> div.c:40]    0  div                [.] main
            46.72%                                      [div.c:42 -> div.c:44]    0  div                [.] main
            46.72%                                      [div.c:42 -> div.c:39]    0  div                [.] main
            20.54%                          [random_r.c:357 -> random_r.c:394]    1  libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:357 -> random_r.c:380]    0  libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:388]    0  libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:391]    0  libc-2.27.so       [.] __random_r
            17.04%                              [random.c:288 -> random.c:291]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:291 -> random.c:291]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:293 -> random.c:293]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
            17.04%                              [random.c:298 -> random.c:298]    0  libc-2.27.so       [.] __random
             8.40%                                      [div.c:22 -> div.c:25]    0  div                [.] compute_flag
             8.40%                                      [div.c:27 -> div.c:28]    0  div                [.] compute_flag
             5.14%                                    [rand.c:26 -> rand.c:27]    0  libc-2.27.so       [.] rand
             5.14%                                    [rand.c:28 -> rand.c:28]    0  libc-2.27.so       [.] rand
             2.15%                                  [rand@plt+0 -> rand@plt+0]    0  div                [.] rand@plt
             0.00%                                                                   [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
             0.00%                                [do_mmap+714 -> do_mmap+732]  -10  [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+737 -> do_mmap+765]    1  [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+262 -> do_mmap+299]    0  [kernel.kallsyms]  [k] do_mmap
             0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
             0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  [kernel.kallsyms]  [k] native_sched_clock
             0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  [kernel.kallsyms]  [k] native_write_msr
      
      When we enable the option '--cycles-hist', the output is
      
        perf diff -c cycles --cycles-hist
      
        # Baseline                                [Program Block Range] Cycles Diff        stddev/Hist  Shared Object      Symbol
        # ........  .......................................................... ....  .................  .................  ............................
        #
            46.72%                                      [div.c:40 -> div.c:40]    0  ± 37.8% ▁█▁▁██▁█   div                [.] main
            46.72%                                      [div.c:42 -> div.c:44]    0  ± 49.4% ▁▁▂█▂▂▂▂   div                [.] main
            46.72%                                      [div.c:42 -> div.c:39]    0  ± 24.1% ▃█▂▄▁▃▂▁   div                [.] main
            20.54%                          [random_r.c:357 -> random_r.c:394]    1  ± 33.5% ▅▂▁█▃▁▂▁   libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:357 -> random_r.c:380]    0  ± 39.4% ▁▁█▁██▅▁   libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:388]    0                     libc-2.27.so       [.] __random_r
            20.54%                          [random_r.c:388 -> random_r.c:391]    0  ± 41.2% ▁▃▁▂█▄▃▁   libc-2.27.so       [.] __random_r
            17.04%                              [random.c:288 -> random.c:291]    0  ± 48.8% ▁▁▁▁███▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:291 -> random.c:291]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:293 -> random.c:293]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
            17.04%                              [random.c:295 -> random.c:295]    0                     libc-2.27.so       [.] __random
            17.04%                              [random.c:298 -> random.c:298]    0  ± 75.6% ▃█▁▁▁▁▁▁   libc-2.27.so       [.] __random
             8.40%                                      [div.c:22 -> div.c:25]    0  ± 42.1% ▁▃▁▁███▁   div                [.] compute_flag
             8.40%                                      [div.c:27 -> div.c:28]    0  ± 41.8% ██▁▁▄▁▁▄   div                [.] compute_flag
             5.14%                                    [rand.c:26 -> rand.c:27]    0  ± 37.8% ▁▁▁████▁   libc-2.27.so       [.] rand
             5.14%                                    [rand.c:28 -> rand.c:28]    0                     libc-2.27.so       [.] rand
             2.15%                                  [rand@plt+0 -> rand@plt+0]    0                     div                [.] rand@plt
             0.00%                                                                                      [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
             0.00%                                [do_mmap+714 -> do_mmap+732]  -10                     [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+737 -> do_mmap+765]    1                     [kernel.kallsyms]  [k] do_mmap
             0.00%                                [do_mmap+262 -> do_mmap+299]    0                     [kernel.kallsyms]  [k] do_mmap
             0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7                     [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
             0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  ± 38.5% ▄█▁        [kernel.kallsyms]  [k] native_sched_clock
             0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  ± 47.1% ▁█▇▃▁▁     [kernel.kallsyms]  [k] native_write_msr
      
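      The histogram column above is a sparkline: each sample is scaled between
      the minimum and maximum of its series and drawn as one of eight block
      characters. A simplified sketch of that mapping follows; it is an
      illustration under that assumption, not perf's own implementation, and
      the names spark_level() and spark_string() are hypothetical:

```c
#include <stddef.h>
#include <string.h>

#define NTICKS 8

/* Map a value in [min, max] to a tick index in [0, NTICKS - 1]. */
static int spark_level(unsigned long v, unsigned long min, unsigned long max)
{
	if (max == min)
		return 0;
	/* Assumes (v - min) * (NTICKS - 1) does not overflow unsigned long. */
	return (int)(((v - min) * (NTICKS - 1)) / (max - min));
}

/*
 * Render vals[0..n) as UTF-8 block characters into buf.
 * Returns the number of bytes written, excluding the trailing NUL.
 * Output is truncated at a character boundary if buf is too small.
 */
static size_t spark_string(char *buf, size_t size,
			   const unsigned long *vals, int n)
{
	static const char *ticks[NTICKS] = {
		"▁", "▂", "▃", "▄", "▅", "▆", "▇", "█"
	};
	unsigned long min, max;
	size_t used = 0;
	int i;

	if (size == 0)
		return 0;
	buf[0] = '\0';
	if (n <= 0)
		return 0;

	/* Find the range of the series so it can be scaled to the ticks. */
	min = max = vals[0];
	for (i = 1; i < n; i++) {
		if (vals[i] < min)
			min = vals[i];
		if (vals[i] > max)
			max = vals[i];
	}

	for (i = 0; i < n; i++) {
		const char *t = ticks[spark_level(vals[i], min, max)];
		size_t len = strlen(t);

		if (used + len + 1 > size)
			break;
		memcpy(buf + used, t, len + 1); /* copies the NUL as well */
		used += len;
	}
	return used;
}
```

      With this mapping, a series where one sample dominates renders as a
      single tall bar among low ones, which is the shape seen in the
      ±100.0% rows.
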
       v8:
       ---
       Rebase to perf/core branch
      
       v7:
       ---
       1. v6 got Jiri's ACK.
       2. Rebase to latest perf/core branch.
      
       v6:
       ---
       1. Jiri provides better code for using data__hpp_register() in ui_init().
          Use this code in v6.
      
       v5:
       ---
       1. Refine the use of data__hpp_register() in ui_init() according to
          Jiri's suggestion.
      
       v4:
       ---
       1. Rename the new option from '--noisy' to '--cycles-hist'
       2. Remove the option '-n'.
       3. Only update the spark value and stats when '--cycles-hist' is enabled.
       4. Remove the code of printing '..'.
      
       v3:
       ---
       1. Move the histogram to a separate column
       2. Move the svals[] out of struct stats
      
       v2:
       ---
       Jiri got a compile error,
      
        CC       builtin-diff.o
        builtin-diff.c: In function ‘compute_cycles_diff’:
        builtin-diff.c:712:10: error: taking the absolute value of unsigned type ‘u64’ {aka ‘long unsigned int’} has no effect [-Werror=absolute-value]
        712 |          labs(pair->block_info->cycles_spark[i] -
            |          ^~~~
      
       This happens because the result of u64 - u64 is still u64. The type of
       cycles_spark[] is now changed to s64.
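      
      A small standalone illustration of the problem, using the standard
      int64_t/uint64_t types in place of the kernel's s64/u64. The function
      names are hypothetical and do not appear in builtin-diff.c:

```c
#include <stdint.h>

/*
 * Unsigned subtraction never goes negative: when a < b the result wraps
 * around modulo 2^64, so taking its absolute value changes nothing.
 */
static uint64_t cycles_unsigned_diff(uint64_t a, uint64_t b)
{
	return a - b;
}

/*
 * With signed operands the difference can be negative, so an absolute
 * value is meaningful. Assumes a - b does not overflow int64_t.
 */
static int64_t cycles_abs_diff(int64_t a, int64_t b)
{
	int64_t d = a - b;

	return d < 0 ? -d : d;
}
```

      For a = 3 and b = 10 the unsigned version yields 2^64 - 7, while the
      signed version yields 7.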
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20190925011446.30678-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cebf7d51
  24. 26 Sep, 2019 1 commit
  25. 25 Sep, 2019 1 commit
  26. 20 Sep, 2019 1 commit
  27. 01 Sep, 2019 1 commit
  28. 26 Aug, 2019 1 commit