1. 01 Sep, 2021 5 commits
  2. 10 Jul, 2021 2 commits
  3. 04 Jun, 2021 1 commit
  4. 29 Apr, 2021 5 commits
    • J
      perf stat: Warn group events from different hybrid PMU · 660e533e
      Jin Yao committed
      If a group has events from different hybrid PMUs, show a
      warning:
      
      "WARNING: events in group from different hybrid PMUs!"
      
      This reminds the user not to put a core event and an atom
      event into one group.
      
      After the warning, grouping is simply disabled.
      
        # perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -a -- sleep 1
        WARNING: events in group from different hybrid PMUs!
        WARNING: grouped events cpus do not match, disabling group:
          anon group { cpu_core/cycles/, cpu_atom/cycles/ }
      
         Performance counter stats for 'system wide':
      
                 5,438,125      cpu_core/cycles/
                 3,914,586      cpu_atom/cycles/
      
               1.004250966 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-17-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      660e533e
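The check this commit describes can be illustrated with a minimal Python sketch. It is not perf's C implementation: `pmu_of` and `group_mixes_pmus` are hypothetical helpers, and the parsing assumes simple `pmu/event/` specs without comma-separated parameters.

```python
def pmu_of(event):
    """Return the PMU prefix of an event written as 'pmu/event/', else ''."""
    parts = event.strip().split("/")
    # 'cpu_core/cycles/' splits into ['cpu_core', 'cycles', '']
    return parts[0] if len(parts) >= 3 else ""


def group_mixes_pmus(group):
    """True if a '{ev1,ev2,...}' group names more than one PMU.

    Assumption: events carry no comma-separated parameters, so splitting
    the group on ',' is enough for this illustration.
    """
    events = group.strip().strip("{}").split(",")
    pmus = {pmu_of(e) for e in events if pmu_of(e)}
    return len(pmus) > 1


if __name__ == "__main__":
    group = "{cpu_core/cycles/,cpu_atom/cycles/}"
    if group_mixes_pmus(group):
        print("WARNING: events in group from different hybrid PMUs!")
```

A group such as `{cpu_core/cycles/,cpu_core/instructions/}` stays within one PMU and would not trigger the warning in this sketch.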
    • J
      perf record: Create two hybrid 'cycles' events by default · b53a0755
      Jin Yao committed
      When evlist is empty, for example when no '-e' is specified in perf
      record, one default 'cycles' event is added to evlist.
      
      On a hybrid platform, however, two default 'cycles' events need to be
      created: one for cpu_core and the other for cpu_atom.
      
      This patch calls evsel__new_cycles() twice to create the two 'cycles'
      events.
      
        # ./perf record -vv -a -- sleep 1
        ...
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x400000000
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          freq                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
        sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 6
        sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 7
        sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 9
        sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 10
        sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 11
        sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 12
        sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 13
        sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 14
        sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 15
        sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 16
        sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 17
        sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 18
        sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 19
        sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 20
        sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 21
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x800000000
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          freq                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 22
        sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 23
        sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 24
        sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 25
        sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 26
        sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 27
        sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 28
        sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 29
        ------------------------------------------------------------
      
      We have to create evlist-hybrid.c; otherwise, due to the symbol
      dependency, the perf python test would fail.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-14-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b53a0755
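The two `config` values in the trace above differ only above bit 32. A minimal sketch, assuming the hybrid encoding places the PMU type in the upper 32 bits of `config` and the generic hardware event id in the lower 32 bits (the constant and function names here are illustrative, not perf's):

```python
from typing import Tuple

PMU_TYPE_SHIFT = 32          # assumption: PMU type is stored above bit 32
HW_EVENT_MASK = 0xFFFFFFFF   # assumption: low 32 bits hold the event id


def split_hybrid_config(config: int) -> Tuple[int, int]:
    """Split a hybrid perf_event_attr.config into (pmu_type, event_id)."""
    return config >> PMU_TYPE_SHIFT, config & HW_EVENT_MASK


if __name__ == "__main__":
    for config in (0x400000000, 0x800000000):
        pmu_type, event_id = split_hybrid_config(config)
        print(f"config={config:#x} pmu_type={pmu_type} event_id={event_id}")
```

Under that assumption, both values carry event id 0 (the generic 'cycles' event) and differ only in the PMU type, which matches the trace: the first attr is opened on CPUs 0-15 and the second on CPUs 16-23.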
    • J
      perf stat: Uniquify hybrid event name · 12279429
      Jin Yao committed
      It is useful to let the user know which PMU an event belongs to.
      perf-stat already supports the '--no-merge' option, which prints the
      PMU name after the event name, such as:
      
      "cycles [cpu_core]"
      
      This option is now enabled by default on hybrid platforms, but the
      format changes to:
      
      "cpu_core/cycles/"
      
      If the user configures a name, the user-specified name is still used.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-8-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      12279429
    • S
      perf stat: Introduce ':b' modifier · 01bd8efc
      Song Liu committed
      Introduce a 'b' modifier to the event parser, which means a BPF program
      is used to manage the event. This is the same as the --bpf-counters
      option, but applies only to this event. For example:
      
        perf stat -e cycles:b,cs               # use bpf for cycles, but not cs
        perf stat -e cycles,cs --bpf-counters  # use bpf for both cycles and cs
      Suggested-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Song Liu <song@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/r/20210425214333.1090950-5-song@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      01bd8efc
    • S
      perf stat: Introduce config stat.bpf-counter-events · 112cb561
      Song Liu committed
      Currently, to use BPF to aggregate perf event counters, the user passes
      the --bpf-counters option. Add a config option, stat.bpf-counter-events,
      to mark events that use BPF by default. Events whose names are listed in
      the option will use BPF.
      
      This also allows mixing BPF events and regular events in the same session.
      For example:
      
         perf config stat.bpf-counter-events=instructions
         perf stat -e instructions,cs
      
      The second command will use BPF for "instructions" but not "cs".
      Signed-off-by: Song Liu <song@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/r/20210425214333.1090950-4-song@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      112cb561
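The selection rules from this commit and the ':b' modifier commit can be summarized in a small sketch. This is an illustration only: `uses_bpf` is a hypothetical helper, and matching names in stat.bpf-counter-events by exact comparison is an assumption about how the list is interpreted.

```python
def uses_bpf(event_spec, bpf_counter_events="", bpf_counters=False):
    """Decide whether one event would be counted through BPF.

    event_spec          e.g. 'cycles' or 'cycles:b'
    bpf_counter_events  value of stat.bpf-counter-events (comma-separated)
    bpf_counters        True when --bpf-counters was passed
    """
    name, _, modifiers = event_spec.partition(":")
    configured = {n.strip() for n in bpf_counter_events.split(",") if n.strip()}
    return bpf_counters or "b" in modifiers or name in configured


if __name__ == "__main__":
    # Mirrors: perf config stat.bpf-counter-events=instructions
    #          perf stat -e instructions,cs
    for ev in ("instructions", "cs"):
        print(ev, uses_bpf(ev, bpf_counter_events="instructions"))
```

With `stat.bpf-counter-events=instructions`, the sketch selects BPF for "instructions" but not for "cs", which is the mixed session the commit message describes.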
  5. 02 Apr, 2021 1 commit
  6. 24 Mar, 2021 1 commit
    • S
      perf stat: Introduce 'bperf' to share hardware PMCs with BPF · 7fac83aa
      Song Liu committed
      The perf tool uses performance monitoring counters (PMCs) to monitor
      system performance. The PMCs are limited hardware resources. For
      example, Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
      
      Modern data center systems use these PMCs in many different ways: system
      level monitoring, (maybe nested) container level monitoring, per process
      monitoring, profiling (in sample mode), etc. In some cases, there are
      more active perf_events than available hardware PMCs. To allow all
      perf_events to have a chance to run, it is necessary to do expensive
      time multiplexing of events.
      
      On the other hand, many monitoring tools count the common metrics
      (cycles, instructions). It is a waste to have multiple tools create
      multiple perf_events of "cycles" and occupy multiple PMCs.
      
      bperf tries to reduce such waste by allowing multiple perf_events of
      "cycles" or "instructions" (at different scopes) to share PMUs. Instead
      of having each perf-stat session read its own perf_events, bperf uses
      BPF programs to read the perf_events and aggregate readings to BPF maps.
      Then, the perf-stat session(s) read the values from these BPF maps.
      
      Please refer to the comment before the definition of bperf_ops for the
      description of bperf architecture.
      
      bperf is off by default. To enable it, pass --bpf-counters option to
      perf-stat. bperf uses a BPF hashmap to share information about BPF
      programs and maps used by bperf. This map is pinned to bpffs. The
      default path is /sys/fs/bpf/perf_attr_map. The user could change the
      path with option --bpf-attr-map.
      
      Committer testing:
      
        # dmesg|grep "Performance Events" -A5
        [    0.225277] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
        [    0.225280] ... version:                0
        [    0.225280] ... bit width:              48
        [    0.225281] ... generic registers:      6
        [    0.225281] ... value mask:             0000ffffffffffff
        [    0.225281] ... max period:             00007fffffffffff
        #
        #  for a in $(seq 6) ; do perf stat -a -e cycles,instructions sleep 100000 & done
        [1] 2436231
        [2] 2436232
        [3] 2436233
        [4] 2436234
        [5] 2436235
        [6] 2436236
        # perf stat -a -e cycles,instructions sleep 0.1
      
         Performance counter stats for 'system wide':
      
               310,326,987      cycles                                                        (41.87%)
               236,143,290      instructions              #    0.76  insn per cycle           (41.87%)
      
               0.100800885 seconds time elapsed
      
        #
      
      We can see that the counters were enabled for this workload 41.87% of
      the time.
      
      Now with --bpf-counters:
      
        #  for a in $(seq 32) ; do perf stat --bpf-counters -a -e cycles,instructions sleep 100000 & done
        [1] 2436514
        [2] 2436515
        [3] 2436516
        [4] 2436517
        [5] 2436518
        [6] 2436519
        [7] 2436520
        [8] 2436521
        [9] 2436522
        [10] 2436523
        [11] 2436524
        [12] 2436525
        [13] 2436526
        [14] 2436527
        [15] 2436528
        [16] 2436529
        [17] 2436530
        [18] 2436531
        [19] 2436532
        [20] 2436533
        [21] 2436534
        [22] 2436535
        [23] 2436536
        [24] 2436537
        [25] 2436538
        [26] 2436539
        [27] 2436540
        [28] 2436541
        [29] 2436542
        [30] 2436543
        [31] 2436544
        [32] 2436545
        #
        # ls -la /sys/fs/bpf/perf_attr_map
        -rw-------. 1 root root 0 Mar 23 14:53 /sys/fs/bpf/perf_attr_map
        # bpftool map | grep bperf | wc -l
        64
        #
      
        # bpftool map | tail
        1265: percpu_array  name accum_readings  flags 0x0
        	key 4B  value 24B  max_entries 1  memlock 4096B
        1266: hash  name filter  flags 0x0
        	key 4B  value 4B  max_entries 1  memlock 4096B
        1267: array  name bperf_fo.bss  flags 0x400
        	key 4B  value 8B  max_entries 1  memlock 4096B
        	btf_id 996
        	pids perf(2436545)
        1268: percpu_array  name accum_readings  flags 0x0
        	key 4B  value 24B  max_entries 1  memlock 4096B
        1269: hash  name filter  flags 0x0
        	key 4B  value 4B  max_entries 1  memlock 4096B
        1270: array  name bperf_fo.bss  flags 0x400
        	key 4B  value 8B  max_entries 1  memlock 4096B
        	btf_id 997
        	pids perf(2436541)
        1285: array  name pid_iter.rodata  flags 0x480
        	key 4B  value 4B  max_entries 1  memlock 4096B
        	btf_id 1017  frozen
        	pids bpftool(2437504)
        1286: array  flags 0x0
        	key 4B  value 32B  max_entries 1  memlock 4096B
        #
        # bpftool map dump id 1268 | tail
        value (CPU 21):
        8f f3 bc ca 00 00 00 00  80 fd 2a d1 4d 00 00 00
        80 fd 2a d1 4d 00 00 00
        value (CPU 22):
        7e d5 64 4d 00 00 00 00  a4 8a 2e ee 4d 00 00 00
        a4 8a 2e ee 4d 00 00 00
        value (CPU 23):
        a7 78 3e 06 01 00 00 00  b2 34 94 f6 4d 00 00 00
        b2 34 94 f6 4d 00 00 00
        Found 1 element
        # bpftool map dump id 1268 | tail
        value (CPU 21):
        c6 8b d9 ca 00 00 00 00  20 c6 fc 83 4e 00 00 00
        20 c6 fc 83 4e 00 00 00
        value (CPU 22):
        9c b4 d2 4d 00 00 00 00  3e 0c df 89 4e 00 00 00
        3e 0c df 89 4e 00 00 00
        value (CPU 23):
        18 43 66 06 01 00 00 00  5b 69 ed 83 4e 00 00 00
        5b 69 ed 83 4e 00 00 00
        Found 1 element
        # bpftool map dump id 1268 | tail
        value (CPU 21):
        f2 6e db ca 00 00 00 00  92 67 4c ba 4e 00 00 00
        92 67 4c ba 4e 00 00 00
        value (CPU 22):
        dc 8e e1 4d 00 00 00 00  d9 32 7a c5 4e 00 00 00
        d9 32 7a c5 4e 00 00 00
        value (CPU 23):
        bd 2b 73 06 01 00 00 00  7c 73 87 bf 4e 00 00 00
        7c 73 87 bf 4e 00 00 00
        Found 1 element
        #
      
        # perf stat --bpf-counters -a -e cycles,instructions sleep 0.1
      
         Performance counter stats for 'system wide':
      
             119,410,122      cycles
             152,105,479      instructions              #    1.27  insn per cycle
      
             0.101395093 seconds time elapsed
      
        #
      
      See? We had the counters enabled all the time.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: kernel-team@fb.com
      Link: http://lore.kernel.org/lkml/20210316211837.910506-2-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7fac83aa
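The `bpftool map dump` output in the committer testing can be decoded with a short sketch. It assumes each 24-byte value of `accum_readings` holds three little-endian 64-bit fields in the order counter, enabled time, running time (the layout of the kernel's `struct bpf_perf_event_value`); the helper names are illustrative.

```python
import struct


def decode_reading(hex_dump):
    """Decode one 24-byte per-CPU value into (counter, enabled, running)."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    # '<QQQ' = three little-endian unsigned 64-bit integers
    return struct.unpack("<QQQ", raw)


def running_share(enabled, running):
    """Fraction of the enabled time during which the counter was running."""
    return running / enabled if enabled else 0.0


if __name__ == "__main__":
    # First 'value (CPU 21)' entry from the dump above
    cpu21 = ("8f f3 bc ca 00 00 00 00  80 fd 2a d1 4d 00 00 00 "
             "80 fd 2a d1 4d 00 00 00")
    counter, enabled, running = decode_reading(cpu21)
    print(hex(counter), hex(enabled), hex(running))
    print(f"running share: {running_share(enabled, running):.2%}")
```

In that entry the second and third fields are identical, so under the stated layout the counter ran for all of its enabled time. The plain `perf stat` run, by contrast, printed 41.87% next to each count.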
  7. 07 Mar, 2021 1 commit
    • J
      perf stat: Fix wrong skipping for per-die aggregation · 034f7ee1
      Jin Yao committed
      Uncore becomes die-scope on Xeon Cascade Lake-AP, and perf already
      supports --per-die aggregation.
      
      An issue was found in check_per_pkg() for uncore events running on an
      AP system. On Cascade Lake-AP, we have:
      
      S0-D0
      S0-D1
      S1-D0
      S1-D1
      
      But in check_per_pkg(), S0-D1 and S1-D1 are skipped because the mask
      bits for S0 and S1 have already been set for S0-D0 and S1-D0; die_id
      is not checked. So the counts for S0-D1 and S1-D1 are set to zero,
      which is not correct.
      
        root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
           1.001460963 S0-D0           1            1317376 Bytes llc_misses.mem_read
           1.001460963 S0-D1           1             998016 Bytes llc_misses.mem_read
           1.001460963 S1-D0           1             970496 Bytes llc_misses.mem_read
           1.001460963 S1-D1           1            1291264 Bytes llc_misses.mem_read
           2.003488021 S0-D0           1            1082048 Bytes llc_misses.mem_read
           2.003488021 S0-D1           1            1919040 Bytes llc_misses.mem_read
           2.003488021 S1-D0           1             890752 Bytes llc_misses.mem_read
           2.003488021 S1-D1           1            2380800 Bytes llc_misses.mem_read
           3.005613270 S0-D0           1            1126080 Bytes llc_misses.mem_read
           3.005613270 S0-D1           1            2898176 Bytes llc_misses.mem_read
           3.005613270 S1-D0           1             870912 Bytes llc_misses.mem_read
           3.005613270 S1-D1           1            3388608 Bytes llc_misses.mem_read
           4.007627598 S0-D0           1            1124608 Bytes llc_misses.mem_read
           4.007627598 S0-D1           1            3884416 Bytes llc_misses.mem_read
           4.007627598 S1-D0           1             921088 Bytes llc_misses.mem_read
           4.007627598 S1-D1           1            4451840 Bytes llc_misses.mem_read
           5.001479927 S0-D0           1             963328 Bytes llc_misses.mem_read
           5.001479927 S0-D1           1            4831936 Bytes llc_misses.mem_read
           5.001479927 S1-D0           1             895104 Bytes llc_misses.mem_read
           5.001479927 S1-D1           1            5496640 Bytes llc_misses.mem_read
      
      From the output above, we can see that S0-D1 and S1-D1 don't report
      interval values; they keep growing. That's because check_per_pkg()
      wrongly decides to use zero counts for S0-D1 and S1-D1.
      
      So in check_per_pkg(), we should use hashmap(socket,die) to decide
      whether the cpu counts need to be skipped. Considering only the socket
      is not enough.
      
      Now, with this patch:
      
        root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
           1.001586691 S0-D0           1            1229440 Bytes llc_misses.mem_read
           1.001586691 S0-D1           1             976832 Bytes llc_misses.mem_read
           1.001586691 S1-D0           1             938304 Bytes llc_misses.mem_read
           1.001586691 S1-D1           1            1227328 Bytes llc_misses.mem_read
           2.003776312 S0-D0           1            1586752 Bytes llc_misses.mem_read
           2.003776312 S0-D1           1             875392 Bytes llc_misses.mem_read
           2.003776312 S1-D0           1             855616 Bytes llc_misses.mem_read
           2.003776312 S1-D1           1             949376 Bytes llc_misses.mem_read
           3.006512788 S0-D0           1            1338880 Bytes llc_misses.mem_read
           3.006512788 S0-D1           1             920064 Bytes llc_misses.mem_read
           3.006512788 S1-D0           1             877184 Bytes llc_misses.mem_read
           3.006512788 S1-D1           1            1020736 Bytes llc_misses.mem_read
           4.008895291 S0-D0           1             926592 Bytes llc_misses.mem_read
           4.008895291 S0-D1           1             906368 Bytes llc_misses.mem_read
           4.008895291 S1-D0           1             892224 Bytes llc_misses.mem_read
           4.008895291 S1-D1           1             987712 Bytes llc_misses.mem_read
           5.001590993 S0-D0           1             962624 Bytes llc_misses.mem_read
           5.001590993 S0-D1           1             912512 Bytes llc_misses.mem_read
           5.001590993 S1-D0           1             891200 Bytes llc_misses.mem_read
           5.001590993 S1-D1           1             978432 Bytes llc_misses.mem_read
      
      On a system without dies, die_id is 0, so the key is effectively
      hashmap(socket,0) and the original behavior is unchanged.
      Reported-by: Ying Huang <ying.huang@intel.com>
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ying Huang <ying.huang@intel.com>
      Link: http://lore.kernel.org/lkml/20210128013417.25597-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      034f7ee1
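The difference between keying on the socket alone and keying on (socket, die) can be shown with a small sketch. It is a simplified model, not the code in check_per_pkg(): the topology tuples and function names are hypothetical, and a Python set stands in for the mask and the hashmap.

```python
def domains_counted_by_socket(topology):
    """Old behavior (simplified): keep the first CPU seen per socket."""
    seen, kept = set(), []
    for cpu, socket, die in topology:
        if socket in seen:
            continue            # skipped even if the die is different
        seen.add(socket)
        kept.append((socket, die))
    return kept


def domains_counted_by_socket_die(topology):
    """Fixed behavior (simplified): keep the first CPU per (socket, die)."""
    seen, kept = set(), []
    for cpu, socket, die in topology:
        key = (socket, die)
        if key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept


if __name__ == "__main__":
    # One representative CPU per die: (cpu, socket, die), values made up
    topology = [(0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 1, 1)]
    print(domains_counted_by_socket(topology))
    print(domains_counted_by_socket_die(topology))
```

With the socket-only key, the S0-D1 and S1-D1 entries never get counted, which corresponds to the skipped dies described in the commit message. On a system without dies every die value is 0, so both functions return the same result.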
  8. 09 Feb, 2021 1 commit
    • K
      perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT · ea8d0ed6
      Kan Liang committed
      The new sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative to the
      PERF_SAMPLE_WEIGHT sample type. Users can apply either the
      PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample
      type to retrieve the sample weight, but they cannot apply both sample
      types simultaneously.
      
      The new sample type shares the same space as the PERF_SAMPLE_WEIGHT
      sample type. The lower 32 bits are exactly the same for both sample
      types. The higher 32 bits may differ between architectures.
      
      Add an arch-specific arch_evsel__set_sample_weight() to set the new
      sample type for X86. Only the lower 32 bits are stored in sample->weight
      if the new sample type is applied. In practice, no memory access could
      last longer than 4G cycles, so no data will be lost.
      
      If the kernel doesn't support the new sample type, fall back to the
      PERF_SAMPLE_WEIGHT sample type.
      
      There is no impact for other architectures.
      
      Committer notes:
      
      Fixup related to PERF_SAMPLE_CODE_PAGE_SIZE, present in acme/perf/core
      but not upstream yet.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1612296553-21962-6-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ea8d0ed6
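The "lower 32 bits" rule can be worked through as simple arithmetic. The sketch below assumes only what the commit message states, namely that the weight occupies the low 32 bits of the shared 64-bit field; the function name and the sample values are made up for illustration.

```python
LOW32_MASK = 0xFFFFFFFF   # 2**32 - 1; "4G cycles" is 2**32


def weight_from_struct(full):
    """Keep only the lower 32 bits of the 64-bit weight field."""
    return full & LOW32_MASK


if __name__ == "__main__":
    # Hypothetical raw field: some arch-specific data in the upper half,
    # a 350-cycle weight in the lower half.
    raw = (0x1234 << 32) | 350
    print(weight_from_struct(raw))
```

A weight is only truncated by this mask if it reaches 2**32 cycles, which is the bound the commit message argues no memory access will hit in practice.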
  9. 21 Jan, 2021 2 commits
  10. 18 Dec, 2020 1 commit
    • A
      perf evsel: Emit warning about kernel not supporting the data page size sample_type bit · 456ef4c1
      Arnaldo Carvalho de Melo committed
      Before we had this unhelpful message:
      
        $ perf record --data-page-size sleep 1
        Error:
        The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles:u).
        /bin/dmesg | grep -i perf may provide additional information.
        $
      
      Add support to the perf_missing_features variable to remember what
      caused evsel__open() to fail and then use that information in
      evsel__open_strerror().
      
        $ perf record --data-page-size sleep 1
        Error:
        Asking for the data page size isn't supported by this kernel.
        $
      
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lore.kernel.org/lkml/20201207170759.GB129853@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      456ef4c1
  11. 28 Sep, 2020 1 commit
  12. 03 Jul, 2020 1 commit
    • A
      perf record: Fix duplicated sideband events with Intel PT system wide tracing · 442ad225
      Adrian Hunter committed
      Commit 0a892c1c ("perf record: Add dummy event during system wide
      synthesis") reveals an issue with Intel PT system wide tracing.
      Specifically that Intel PT already adds a dummy tracking event, and it
      is not the first event.  Adding another dummy tracking event causes
      duplicated sideband events.  Fix by checking for an existing dummy
      tracking event first.
      
      Example showing duplicated switch events:
      
       Before:
      
         # perf record -a -e intel_pt//u uname
         Linux
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 0.895 MB perf.data ]
         # perf script --no-itrace --show-switch-events | head
                  swapper     0 [007]  6390.516222: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid:    11/11
                  swapper     0 [007]  6390.516222: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid:    11/11
                rcu_sched    11 [007]  6390.516223: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:     0/0
                rcu_sched    11 [007]  6390.516224: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:     0/0
                rcu_sched    11 [007]  6390.516227: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:     0/0
                rcu_sched    11 [007]  6390.516227: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:     0/0
                  swapper     0 [007]  6390.516228: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    11/11
                  swapper     0 [007]  6390.516228: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    11/11
                  swapper     0 [002]  6390.516415: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid:  5556/5559
                  swapper     0 [002]  6390.516416: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid:  5556/5559
      
       After:
      
         # perf record -a -e intel_pt//u uname
         Linux
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 0.868 MB perf.data ]
         #  perf script --no-itrace --show-switch-events | head
                  swapper     0 [005]  6450.567013: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid:  7179/7181
                     perf  7181 [005]  6450.567014: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:     0/0
                     perf  7181 [005]  6450.567028: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:     0/0
                  swapper     0 [005]  6450.567029: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:  7179/7181
                  swapper     0 [005]  6450.571699: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid:    11/11
                rcu_sched    11 [005]  6450.571700: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:     0/0
                rcu_sched    11 [005]  6450.571702: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:     0/0
                  swapper     0 [005]  6450.571703: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    11/11
                  swapper     0 [005]  6450.579703: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid:    11/11
                rcu_sched    11 [005]  6450.579704: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:     0/0
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200629091955.17090-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      442ad225
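The fix amounts to "look for an existing dummy tracking event anywhere in the list before adding one". A minimal sketch of that idea follows; it models events as plain dictionaries, which is an assumption for illustration and not how perf's evlist is implemented.

```python
def ensure_dummy_tracking(evlist):
    """Return evlist with exactly one dummy tracking event available.

    Assumption: each event is a dict with a 'name' and an optional
    'tracking' flag; the added event's name is purely illustrative.
    """
    for ev in evlist:
        # The existing dummy need not be the first event in the list.
        if ev["name"].startswith("dummy") and ev.get("tracking"):
            return evlist
    return evlist + [{"name": "dummy:u", "tracking": True}]


if __name__ == "__main__":
    intel_pt_list = [
        {"name": "intel_pt//u"},
        {"name": "dummy:u", "tracking": True},   # already added for Intel PT
    ]
    print(len(ensure_dummy_tracking(intel_pt_list)))
    print(len(ensure_dummy_tracking([{"name": "cycles"}])))
```

Checking only the first event would miss the dummy in the Intel PT case and add a second one, which is the source of the duplicated sideband events shown in the "Before" output.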
  13. 30 May, 2020 1 commit
    • S
      perf tools: Add optional support for libpfm4 · 70943490
      Stephane Eranian committed
      This patch links perf with the libpfm4 library if it is available and
      LIBPFM4 is passed to the build. The libpfm4 library contains hardware
      event tables for all processors supported by perf_events. It is a helper
      library that helps convert from a symbolic event name to the event
      encoding required by the underlying kernel interface. This library is
      open-source and available from: http://perfmon2.sf.net.
      
      With this patch, it is possible to specify full hardware events by name.
      Hardware filters are also supported. Events must be specified via the
      --pfm-events option rather than -e. Both options are active at the same
      time, and it is possible to mix and match:
      
        $ perf stat --pfm-events inst_retired:any_p:c=1:i -e cycles ....
      
      One needs to explicitly ask for its inclusion by using the LIBPFM4 make
      command line option, i.e. it is opt-in rather than opt-out of feature
      detection and build support.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiwei Sun <jiwei.sun@windriver.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: yuzhoujian <yuzhoujian@didichuxing.com>
      Link: http://lore.kernel.org/lkml/20200505182943.218248-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      70943490
  14. 28 May, 2020 3 commits
  15. 06 May, 2020 14 commits