1. 29 Apr, 2021 38 commits
    •
      perf Documentation: Document intel-hybrid support · 2750ce1d
      Jin Yao committed
      Add explanatory text and examples to help users understand
      Intel hybrid perf support.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-27-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid · a37f3b88
      Jin Yao committed
      Currently we don't support shadow stat for hybrid.
      
        root@ssp-pwrt-002:~# ./perf stat -e cycles,instructions -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
            12,883,109,591      cpu_core/cycles/
             6,405,163,221      cpu_atom/cycles/
               555,553,778      cpu_core/instructions/
               841,158,734      cpu_atom/instructions/
      
               1.002644773 seconds time elapsed
      
      No 'insn per cycle' shadow stat is reported yet. Support will be added
      later; for now, just skip the 'perf stat metrics (shadow stat) test'.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-26-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf tests: Support 'Convert perf time to TSC' test for hybrid · d9da6f70
      Jin Yao committed
      On a hybrid platform, "cycles:u" creates two "cycles" events, so
      the second evsel in the evlist also needs initialization.
      
      With this patch,
      
        # ./perf test 71
        71: Convert perf time to TSC                                        : Ok
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-25-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf tests: Support 'Session topology' test for hybrid · c1020388
      Jin Yao committed
      Force the creation of a single event, "cpu_core/cycles/", by default;
      otherwise the 'if (evlist->core.nr_entries == 1)' check in
      evlist__valid_sample_type() fails.
      
        # ./perf test 41
        41: Session topology                                                : Ok
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-24-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf tests: Support 'Parse and process metrics' test for hybrid · 6081e876
      Jin Yao committed
      Some events are not supported, so only a subset of the test cases is picked for hybrid.
      
        # ./perf test 68
        68: Parse and process metrics                                       : Ok
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-23-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf tests: Support 'Track with sched_switch' test for hybrid · 43eb05d0
      Jin Yao committed
      On a hybrid platform, "cycles:u" creates two "cycles" events, so
      the number of events in the evlist is not what the later test
      steps expect. For now, use a single event, "cpu_core/cycles:u/", on hybrid.
      
        # ./perf test 35
        35: Track with sched_switch                                         : Ok
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-22-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf tests: Skip 'Setup struct perf_event_attr' test for hybrid · f15da0b1
      Jin Yao committed
      For hybrid, the attr.type consists of the PMU type ID plus the original type,
      which would require many changes to this test. Skip this test case
      temporarily; reworking it is a TODO for the future.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-21-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf tests: Add hybrid cases for 'Roundtrip evsel->name' test · afff9f31
      Jin Yao committed
      On hybrid, two events are created for each hw event.
      
      For example,
      
      evsel->idx      evsel__name(evsel)
      0               cycles
      1               cycles
      2               instructions
      3               instructions
      ...
      
      So for comparing the evsel name on hybrid, the evsel->idx
      needs to be divided by 2.
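
      The index arithmetic can be sketched as follows. This is a minimal
      illustration in Python, not the perf test code; the list contents and
      the helper name expected_name() are assumptions made for the example.

```python
# Hypothetical model: on hybrid, every requested hw event yields two evsels
# (one per PMU) that share the same name, in the order shown above.
requested = ["cycles", "instructions"]
hybrid_evlist = [name for name in requested for _ in range(2)]


def expected_name(idx, names, hybrid):
    """Return the name that the evsel at position idx should round-trip to."""
    # On hybrid, two consecutive evsels map to one requested name.
    return names[idx // 2] if hybrid else names[idx]


for idx, name in enumerate(hybrid_evlist):
    assert expected_name(idx, requested, hybrid=True) == name
```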
      
        # ./perf test 14
        14: Roundtrip evsel->name                                           : Ok
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-20-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf tests: Add hybrid cases for 'Parse event definition strings' test · 2541cb63
      Jin Yao committed
      Add basic hybrid test cases for 'Parse event definition strings' test.
      
        # perf test 6
         6: Parse event definition strings                                  : Ok
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-19-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf record: Uniquify hybrid event name · 91c0f5ec
      Jin Yao committed
      For perf-record, it is useful to tell the user which PMU an
      event belongs to.
      
      For example,
      
        # perf record -a -- sleep 1
        # perf report
      
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 106  of event 'cpu_core/cycles/'
        # Event count (approx.): 22043448
        #
        # Overhead  Command       Shared Object            Symbol
        # ........  ............  .......................  ............................
        #
        ...
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-18-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf stat: Warn group events from different hybrid PMU · 660e533e
      Jin Yao committed
      If a group contains events from different hybrid PMUs,
      show a warning:
      
      "WARNING: events in group from different hybrid PMUs!"
      
      This is to remind the user not to put the core event and atom
      event into one group.
      
      The grouping is then disabled.
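
      A minimal sketch of the mixed-PMU check, written in Python for
      illustration only. It works on 'pmu/event/' strings rather than on
      evsels, and the helper names pmu_of() and is_mixed_hybrid_group() are
      hypothetical; perf's actual check is done in C on the parsed evlist.

```python
def pmu_of(event):
    """Return the PMU prefix of a 'pmu/event/' string, or None if absent."""
    if "/" not in event:
        return None
    return event.split("/", 1)[0]


def is_mixed_hybrid_group(group):
    """True when the group's events name more than one distinct PMU."""
    pmus = {pmu_of(event) for event in group} - {None}
    return len(pmus) > 1


group = ["cpu_core/cycles/", "cpu_atom/cycles/"]
if is_mixed_hybrid_group(group):
    print("WARNING: events in group from different hybrid PMUs!")
```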
      
        # perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -a -- sleep 1
        WARNING: events in group from different hybrid PMUs!
        WARNING: grouped events cpus do not match, disabling group:
          anon group { cpu_core/cycles/, cpu_atom/cycles/ }
      
         Performance counter stats for 'system wide':
      
                 5,438,125      cpu_core/cycles/
                 3,914,586      cpu_atom/cycles/
      
               1.004250966 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-17-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf stat: Filter out unmatched aggregation for hybrid event · 92637cc7
      Jin Yao committed
      perf-stat supports several aggregation modes, such as --per-core
      and --per-socket. A hybrid event, however, may be available on only
      some of the CPUs. So for --per-core we need to filter out the
      unavailable cores, for --per-socket the unavailable
      sockets, and so on.
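
      The filtering idea can be sketched as below. This is an illustrative
      Python model, not the perf implementation: the mapping from aggregation
      IDs to CPU numbers is a made-up topology, and filter_aggr_ids() is a
      hypothetical helper.

```python
# Hypothetical topology: aggregation id -> CPUs that belong to it.
AGGR_CPUS = {
    "S0-D0-C0": [0, 1],
    "S0-D0-C4": [2, 3],
    "S0-D0-C32": [16],
    "S0-D0-C33": [17],
}

# Assumption for this example: the cpu_core PMU covers CPUs 0-15.
CPU_CORE_CPUS = set(range(0, 16))


def filter_aggr_ids(aggr_cpus, pmu_cpus):
    """Keep only the aggregation ids that contain at least one PMU CPU."""
    return [
        aggr_id
        for aggr_id, cpus in aggr_cpus.items()
        if pmu_cpus.intersection(cpus)
    ]


visible = filter_aggr_ids(AGGR_CPUS, CPU_CORE_CPUS)
print(visible)
```

      With this topology only the two cpu_core entries remain, so no
      '<not counted>' rows would be printed for cores that the PMU does not
      cover.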
      
      Before:
      
        # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
        S0-D0-C0           2            479,530      cpu_core/cycles/
        S0-D0-C4           2            175,007      cpu_core/cycles/
        S0-D0-C8           2            166,240      cpu_core/cycles/
        S0-D0-C12          2            704,673      cpu_core/cycles/
        S0-D0-C16          2            865,835      cpu_core/cycles/
        S0-D0-C20          2          2,958,461      cpu_core/cycles/
        S0-D0-C24          2            163,988      cpu_core/cycles/
        S0-D0-C28          2            164,729      cpu_core/cycles/
        S0-D0-C32          0      <not counted>      cpu_core/cycles/
        S0-D0-C33          0      <not counted>      cpu_core/cycles/
        S0-D0-C34          0      <not counted>      cpu_core/cycles/
        S0-D0-C35          0      <not counted>      cpu_core/cycles/
        S0-D0-C36          0      <not counted>      cpu_core/cycles/
        S0-D0-C37          0      <not counted>      cpu_core/cycles/
        S0-D0-C38          0      <not counted>      cpu_core/cycles/
        S0-D0-C39          0      <not counted>      cpu_core/cycles/
      
               1.003597211 seconds time elapsed
      
      After:
      
        # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
        S0-D0-C0           2            210,428      cpu_core/cycles/
        S0-D0-C4           2            444,830      cpu_core/cycles/
        S0-D0-C8           2            435,241      cpu_core/cycles/
        S0-D0-C12          2            423,976      cpu_core/cycles/
        S0-D0-C16          2            859,350      cpu_core/cycles/
        S0-D0-C20          2          1,559,589      cpu_core/cycles/
        S0-D0-C24          2            163,924      cpu_core/cycles/
        S0-D0-C28          2            376,610      cpu_core/cycles/
      
               1.003621290 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Co-developed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-16-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf stat: Add default hybrid events · ac2dc29e
      Jin Yao committed
      Previously if '-e' is not specified in perf stat, some software events
      and hardware events are added to evlist by default.
      
      Before:
      
        # perf stat -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
                 24,044.40 msec cpu-clock                 #   23.946 CPUs utilized
                        99      context-switches          #    4.117 /sec
                        24      cpu-migrations            #    0.998 /sec
                         3      page-faults               #    0.125 /sec
                 7,000,244      cycles                    #    0.000 GHz
                 2,955,024      instructions              #    0.42  insn per cycle
                   608,941      branches                  #   25.326 K/sec
                    31,991      branch-misses             #    5.25% of all branches
      
               1.004106859 seconds time elapsed
      
      Among the events, cycles, instructions, branches and branch-misses
      are hardware events.
      
      On a hybrid platform, two hardware events are created for each
      hardware event:
      
      cpu_core/cycles/,
      cpu_atom/cycles/,
      cpu_core/instructions/,
      cpu_atom/instructions/,
      cpu_core/branches/,
      cpu_atom/branches/,
      cpu_core/branch-misses/,
      cpu_atom/branch-misses/
      
      These events would be added to evlist on hybrid platform.
      
      Since parse_events() already creates two hardware events
      for one event on a hybrid platform, we just use parse_events(evlist,
      "cycles,instructions,branches,branch-misses") to create the default
      events and add them to the evlist.
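
      The expansion that results from this call can be modelled with a short
      Python sketch. It only illustrates the naming and ordering of the
      resulting events; expand_hybrid() is a hypothetical helper, not a perf
      function.

```python
# The two hybrid PMU names used throughout this series.
HYBRID_PMUS = ["cpu_core", "cpu_atom"]


def expand_hybrid(event_str, pmus=HYBRID_PMUS):
    """Expand a comma-separated event list into one event per hybrid PMU."""
    return [
        f"{pmu}/{name}/"
        for name in event_str.split(",")
        for pmu in pmus
    ]


for event in expand_hybrid("cycles,instructions,branches,branch-misses"):
    print(event)
```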
      
      After:
      
        # perf stat -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
                 24,043.99 msec cpu-clock                 #   23.991 CPUs utilized
                       139      context-switches          #    5.781 /sec
                        25      cpu-migrations            #    1.040 /sec
                         6      page-faults               #    0.250 /sec
                10,381,751      cpu_core/cycles/          #  431.782 K/sec
                 1,264,216      cpu_atom/cycles/          #   52.579 K/sec
                 3,406,958      cpu_core/instructions/    #  141.697 K/sec
                   414,588      cpu_atom/instructions/    #   17.243 K/sec
                   705,149      cpu_core/branches/        #   29.327 K/sec
                    82,358      cpu_atom/branches/        #    3.425 K/sec
                    40,821      cpu_core/branch-misses/   #    1.698 K/sec
                     9,086      cpu_atom/branch-misses/   #  377.891 /sec
      
               1.002228863 seconds time elapsed
      
      We can see two events are created for one hardware event.
      
      One TODO: the shadow stats look a bit different; for now it's just
      'M/sec'.

      perf_stat__update_shadow_stats() and perf_stat__print_shadow_stats()
      need to be improved in the future if we want to get the original shadow
      stats back.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-15-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf record: Create two hybrid 'cycles' events by default · b53a0755
      Jin Yao committed
      When the evlist is empty, for example when no '-e' is specified in perf record,
      one default 'cycles' event is added to the evlist.

      On a hybrid platform, however, two default 'cycles' events are needed:
      one for cpu_core and one for cpu_atom.

      This patch calls evsel__new_cycles() twice to create
      the two 'cycles' events.
      
        # ./perf record -vv -a -- sleep 1
        ...
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x400000000
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          freq                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
        sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 6
        sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 7
        sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 9
        sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 10
        sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 11
        sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 12
        sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 13
        sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 14
        sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 15
        sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 16
        sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 17
        sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 18
        sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 19
        sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 20
        sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 21
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x800000000
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          freq                             1
          precise_ip                       3
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 22
        sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 23
        sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 24
        sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 25
        sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 26
        sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 27
        sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 28
        sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 29
        ------------------------------------------------------------
      
      We have to create evlist-hybrid.c; otherwise the perf python test
      fails because of a symbol dependency.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-14-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf parse-events: Support event inside hybrid pmu · 5e4edd1f
      Jin Yao committed
      On a hybrid platform, the user may want to enable events on only one PMU.

      The following syntax is supported:

      cpu_core/<event>/
      cpu_atom/<event>/

      However, this syntax does not work for cache events.
      
      Before:
      
        # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1
        event syntax error: 'cpu_core/LLC-loads/'
                                      \___ unknown term 'LLC-loads' for pmu 'cpu_core'
      
      Cache events are a bit complex, and we can't create aliases for them,
      so another solution is used. For example, with "cpu_core/LLC-loads/",
      term->config is "LLC-loads" in parse_events_add_pmu().

      A new parser is then created to scan "LLC-loads", and
      parse_events_add_cache() is called during parsing.
      parse_state->hybrid_pmu_name identifies the PMU on which the
      event should be enabled.
      
      After:
      
        # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1
      
         Performance counter stats for 'system wide':
      
                    24,593      cpu_core/LLC-loads/
      
               1.003911601 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-13-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf parse-events: Compare with hybrid pmu name · c93afadc
      Jin Yao committed
      On a hybrid platform, the user may want to enable an event on only one PMU.
      The following syntax will be supported:

      cpu_core/<event>/
      cpu_atom/<event>/

      For hardware events, hardware cache events and raw events, two events
      are created by default. The specified PMU name is passed in parse_state
      and checked before event creation, so only the event for the
      specified PMU is created.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-12-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf parse-events: Create two hybrid raw events · 94da591b
      Jin Yao committed
      On a hybrid platform, the same raw event may be available on both
      the cpu_core PMU and the cpu_atom PMU, so two raw events are created
      for one event encoding. For raw events, the attr.type is the PMU type.
      
        # perf stat -e r3c -a -vv -- sleep 1
        Control descriptor is not initialized
        ------------------------------------------------------------
        perf_event_attr:
          type                             4
          size                             120
          config                           0x3c
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
        ------------------------------------------------------------
        ...
        ------------------------------------------------------------
        perf_event_attr:
          type                             4
          size                             120
          config                           0x3c
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
        ------------------------------------------------------------
        perf_event_attr:
          type                             8
          size                             120
          config                           0x3c
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
        ------------------------------------------------------------
        ...
        ------------------------------------------------------------
        perf_event_attr:
          type                             8
          size                             120
          config                           0x3c
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
        r3c: 0: 434449 1001412521 1001412521
        r3c: 1: 173162 1001482031 1001482031
        r3c: 2: 231710 1001524974 1001524974
        r3c: 3: 110012 1001563523 1001563523
        r3c: 4: 191517 1001593221 1001593221
        r3c: 5: 956458 1001628147 1001628147
        r3c: 6: 416969 1001715626 1001715626
        r3c: 7: 1047527 1001596650 1001596650
        r3c: 8: 103877 1001633520 1001633520
        r3c: 9: 70571 1001637898 1001637898
        r3c: 10: 550284 1001714398 1001714398
        r3c: 11: 1257274 1001738349 1001738349
        r3c: 12: 107797 1001801432 1001801432
        r3c: 13: 67471 1001836281 1001836281
        r3c: 14: 286782 1001923161 1001923161
        r3c: 15: 815509 1001952550 1001952550
        r3c: 0: 95994 1002071117 1002071117
        r3c: 1: 105570 1002142438 1002142438
        r3c: 2: 115921 1002189147 1002189147
        r3c: 3: 72747 1002238133 1002238133
        r3c: 4: 103519 1002276753 1002276753
        r3c: 5: 121382 1002315131 1002315131
        r3c: 6: 80298 1002248050 1002248050
        r3c: 7: 466790 1002278221 1002278221
        r3c: 6821369 16026754282 16026754282
        r3c: 1162221 8017758990 8017758990
      
         Performance counter stats for 'system wide':
      
                 6,821,369      cpu_core/r3c/
                 1,162,221      cpu_atom/r3c/
      
               1.002289965 seconds time elapsed
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-11-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    •
      perf parse-events: Create two hybrid cache events · 30def61f
      Jin Yao committed
      Cache events have pre-defined configs. The kernel needs
      to know where a cache event comes from (e.g. from the cpu_core PMU
      or from the cpu_atom PMU), but the perf type PERF_TYPE_HW_CACHE
      can't carry PMU information.

      The type PERF_TYPE_HW_CACHE is now extended to be a PMU-aware type.
      The PMU type ID is stored in attr.config[63:32].

      When a hybrid cache event is enabled without a specified PMU, such as
      'perf stat -e LLC-loads -a', two events are created
      automatically: one for atom and one for core.
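
      The config layout described above can be illustrated with a small
      Python sketch. The constant and function names are chosen for this
      example only; the two config values are the ones that appear in the
      verbose output of this commit message.

```python
# Illustrative names: the PMU type ID occupies config bits [63:32],
# the pre-defined event config occupies bits [31:0].
PMU_TYPE_SHIFT = 32
EVENT_MASK = 0xFFFFFFFF


def encode_config(pmu_type, event_config):
    """Pack a PMU type ID and an event config into one 64-bit config."""
    return (pmu_type << PMU_TYPE_SHIFT) | (event_config & EVENT_MASK)


def decode_config(config):
    """Split a 64-bit config into (pmu_type, event_config)."""
    return config >> PMU_TYPE_SHIFT, config & EVENT_MASK


for config in (0x400000002, 0x800000002):
    pmu_type, event_config = decode_config(config)
    print(f"config={config:#x} pmu_type={pmu_type} event_config={event_config:#x}")
```

      Decoding the two values gives PMU type IDs 4 and 8 with the same low
      32 bits (0x2), which is how a single cache event name maps to one event
      per PMU.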
      
        # perf stat -e LLC-loads -a -vv -- sleep 1
        Control descriptor is not initialized
        ------------------------------------------------------------
        perf_event_attr:
          type                             3
          size                             120
          config                           0x400000002
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
        ------------------------------------------------------------
        ...
        ------------------------------------------------------------
        perf_event_attr:
          type                             3
          size                             120
          config                           0x400000002
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
        ------------------------------------------------------------
        perf_event_attr:
          type                             3
          size                             120
          config                           0x800000002
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
        ------------------------------------------------------------
        ...
        ------------------------------------------------------------
        perf_event_attr:
          type                             3
          size                             120
          config                           0x800000002
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
        LLC-loads: 0: 1507 1001800280 1001800280
        LLC-loads: 1: 666 1001812250 1001812250
        LLC-loads: 2: 3353 1001813453 1001813453
        LLC-loads: 3: 514 1001848795 1001848795
        LLC-loads: 4: 627 1001952832 1001952832
        LLC-loads: 5: 4399 1001451154 1001451154
        LLC-loads: 6: 1240 1001481052 1001481052
        LLC-loads: 7: 478 1001520348 1001520348
        LLC-loads: 8: 691 1001551236 1001551236
        LLC-loads: 9: 310 1001578945 1001578945
        LLC-loads: 10: 1018 1001594354 1001594354
        LLC-loads: 11: 3656 1001622355 1001622355
        LLC-loads: 12: 882 1001661416 1001661416
        LLC-loads: 13: 506 1001693963 1001693963
        LLC-loads: 14: 3547 1001721013 1001721013
        LLC-loads: 15: 1399 1001734818 1001734818
        LLC-loads: 0: 1314 1001793826 1001793826
        LLC-loads: 1: 2857 1001752764 1001752764
        LLC-loads: 2: 646 1001830694 1001830694
        LLC-loads: 3: 1612 1001864861 1001864861
        LLC-loads: 4: 2244 1001912381 1001912381
        LLC-loads: 5: 1255 1001943889 1001943889
        LLC-loads: 6: 4624 1002021109 1002021109
        LLC-loads: 7: 2703 1001959302 1001959302
        LLC-loads: 24793 16026838264 16026838264
        LLC-loads: 17255 8015078826 8015078826
      
         Performance counter stats for 'system wide':
      
                    24,793      cpu_core/LLC-loads/
                    17,255      cpu_atom/LLC-loads/
      
               1.001970988 seconds time elapsed
      
      0x4 in 0x400000002 indicates the cpu_core pmu.
      0x8 in 0x800000002 indicates the cpu_atom pmu.
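
      As a rough illustration of the layout described above, the sketch below
      splits the two config values from the verbose output into the PMU type ID
      (config[63:32]) and the low 32 bits. The helper names are hypothetical.
      The split of the low bits into cache id, op and result follows the usual
      PERF_TYPE_HW_CACHE encoding from perf_event_open(2); that part is an
      assumption, as it is not stated in this commit message.

```python
# Illustrative sketch only: helper names are hypothetical, not perf code.

PMU_TYPE_SHIFT = 32            # PMU type ID occupies config[63:32]
LOW_MASK = (1 << 32) - 1       # event encoding occupies config[31:0]


def split_hybrid_config(config):
    """Return (pmu_type_id, event_bits) for a 64-bit config value."""
    return (config >> PMU_TYPE_SHIFT) & LOW_MASK, config & LOW_MASK


def split_cache_bits(event_bits):
    """Split a HW_CACHE encoding into (cache_id, op_id, result_id).

    Assumes the layout cache_id | (op_id << 8) | (result_id << 16).
    """
    return event_bits & 0xFF, (event_bits >> 8) & 0xFF, (event_bits >> 16) & 0xFF


for config in (0x400000002, 0x800000002):
    pmu_type, bits = split_hybrid_config(config)
    print(hex(config), "-> pmu type", pmu_type, "cache bits", split_cache_bits(bits))
```

      With the values above, the PMU type comes out as 4 (cpu_core) and 8
      (cpu_atom), and the low bits are 0x2 in both cases.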
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-10-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      30def61f
    • J
      perf parse-events: Create two hybrid hardware events · 9cbfa2f6
      Jin Yao committed
      Currently, hardware events use the special perf type PERF_TYPE_HARDWARE,
      but this type doesn't pass the PMU type through the user interface. On a
      hybrid system, the kernel doesn't know which PMU the events belong to.
      
      So now this type is extended to be a PMU-aware type. The PMU type ID
      is stored in attr.config[63:32].
      
      The PMU type ID is retrieved from sysfs.
      
        root@lkp-adl-d01:/sys/devices/cpu_atom# cat type
        8
      
        root@lkp-adl-d01:/sys/devices/cpu_core# cat type
        4
      
      When a hybrid hardware event is enabled without a specified pmu, as in
      'perf stat -e cycles -a', two events are created automatically: one
      for atom and one for core.
      
        # perf stat -e cycles -a -vv -- sleep 1
        Control descriptor is not initialized
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x400000000
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
        ------------------------------------------------------------
        ...
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x400000000
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x800000000
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
        ------------------------------------------------------------
        ...
        ------------------------------------------------------------
        perf_event_attr:
          size                             120
          config                           0x800000000
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
          disabled                         1
          inherit                          1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
        cycles: 0: 836272 1001525722 1001525722
        cycles: 1: 628564 1001580453 1001580453
        cycles: 2: 872693 1001605997 1001605997
        cycles: 3: 70417 1001641369 1001641369
        cycles: 4: 88593 1001726722 1001726722
        cycles: 5: 470495 1001752993 1001752993
        cycles: 6: 484733 1001840440 1001840440
        cycles: 7: 1272477 1001593105 1001593105
        cycles: 8: 209185 1001608616 1001608616
        cycles: 9: 204391 1001633962 1001633962
        cycles: 10: 264121 1001661745 1001661745
        cycles: 11: 826104 1001689904 1001689904
        cycles: 12: 89935 1001728861 1001728861
        cycles: 13: 70639 1001756757 1001756757
        cycles: 14: 185266 1001784810 1001784810
        cycles: 15: 171094 1001825466 1001825466
        cycles: 0: 129624 1001854843 1001854843
        cycles: 1: 122533 1001840421 1001840421
        cycles: 2: 90055 1001882506 1001882506
        cycles: 3: 139607 1001896463 1001896463
        cycles: 4: 141791 1001907838 1001907838
        cycles: 5: 530927 1001883880 1001883880
        cycles: 6: 143246 1001852529 1001852529
        cycles: 7: 667769 1001872626 1001872626
        cycles: 6744979 16026956922 16026956922
        cycles: 1965552 8014991106 8014991106
      
         Performance counter stats for 'system wide':
      
                 6,744,979      cpu_core/cycles/
                 1,965,552      cpu_atom/cycles/
      
               1.001882711 seconds time elapsed
      
      0x4 in 0x400000000 indicates the cpu_core pmu.
      0x8 in 0x800000000 indicates the cpu_atom pmu.
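
      The following sketch shows, under stated assumptions, how the config
      values above could be composed: read the PMU type ID from a sysfs-style
      'type' file and shift it into config[63:32]. The function names are
      hypothetical, and the demo builds a fake directory tree (with the type
      IDs 4 and 8 shown above) so it runs on any machine. It is not the perf
      implementation.

```python
import os
import tempfile

PMU_TYPE_SHIFT = 32  # PMU type ID goes into config[63:32]


def read_pmu_type(devices_dir, pmu_name):
    """Read the integer PMU type ID from <devices_dir>/<pmu_name>/type."""
    with open(os.path.join(devices_dir, pmu_name, "type")) as f:
        return int(f.read().strip())


def hybrid_config(pmu_type, event_id):
    """Place the PMU type ID above the 32-bit event ID."""
    return (pmu_type << PMU_TYPE_SHIFT) | (event_id & 0xFFFFFFFF)


# Demo against a fake sysfs tree; on a real hybrid system devices_dir
# would be "/sys/devices" as in the shell session above.
with tempfile.TemporaryDirectory() as fake_sysfs:
    for name, type_id in (("cpu_core", 4), ("cpu_atom", 8)):
        os.makedirs(os.path.join(fake_sysfs, name))
        with open(os.path.join(fake_sysfs, name, "type"), "w") as f:
            f.write(f"{type_id}\n")

    HW_CPU_CYCLES = 0  # PERF_COUNT_HW_CPU_CYCLES
    configs = {
        name: hybrid_config(read_pmu_type(fake_sysfs, name), HW_CPU_CYCLES)
        for name in ("cpu_core", "cpu_atom")
    }

for name, config in configs.items():
    print(f"{name}/cycles/ -> config {config:#x}")
```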
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-9-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9cbfa2f6
    • J
      perf stat: Uniquify hybrid event name · 12279429
      Jin Yao committed
      It is useful to let the user know which pmu an event belongs to.
      perf-stat already supports the '--no-merge' option, which prints the pmu
      name after the event name, such as:
      
      "cycles [cpu_core]"
      
      Now this option is enabled by default on hybrid platforms, but the
      format is changed to:
      
      "cpu_core/cycles/"
      
      If the user configures a name, the user-specified name is still used.
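
      A minimal sketch of the naming rule described above. The function name
      and its parameters are hypothetical; this models only the behaviour
      stated in this message, not the perf-stat code.

```python
def uniquify_hybrid_name(event_name, pmu_name, user_name=None):
    """Return the display name for an event on a hybrid platform.

    A user-configured name wins; otherwise the event is shown
    in the "pmu/event/" form.
    """
    if user_name:
        return user_name
    return f"{pmu_name}/{event_name}/"


print(uniquify_hybrid_name("cycles", "cpu_core"))
print(uniquify_hybrid_name("cycles", "cpu_atom", user_name="my_cycles"))
```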
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-8-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      12279429
    • J
      perf pmu: Add hybrid helper functions · c5a26ea4
      Jin Yao committed
      The functions perf_pmu__is_hybrid and perf_pmu__find_hybrid_pmu
      can be used to identify the hybrid platform and return the matching
      hybrid cpu pmu. All the detected hybrid pmus have been saved in the
      'perf_pmu__hybrid_pmus' list, so we just need to search this list.
      
      perf_pmu__hybrid_type_to_pmu converts the user-specified string
      to a hybrid pmu name. This is used to support the '--cputype' option
      in the next patches.
      
      perf_pmu__has_hybrid checks for the existence of a hybrid pmu. Note that
      it has to be defined in pmu.c (so that pmu-hybrid.c adds no further symbol
      dependency); otherwise 'perf test python' would fail.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-7-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c5a26ea4
    • J
      perf pmu: Save detected hybrid pmus to a global pmu list · 44462430
      Jin Yao committed
      We identify the cpu_core pmu and cpu_atom pmu by explicitly
      checking the following files:
      
      For cpu_core, checks:
      "/sys/bus/event_source/devices/cpu_core/cpus"
      
      For cpu_atom, checks:
      "/sys/bus/event_source/devices/cpu_atom/cpus"
      
      If the 'cpus' file exists and it has data, the pmu exists.
      
      But to avoid hardcoding "cpu_core" and "cpu_atom", the check is made
      generic.
      
      If the path "/sys/bus/event_source/devices/cpu_xxx/cpus" exists, the
      hybrid pmu exists. All the detected hybrid pmus are linked to a global
      list 'perf_pmu__hybrid_pmus', and afterwards we just need to iterate the
      list with perf_pmu__for_each_hybrid_pmu to get every hybrid pmu.
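
      The detection rule above can be sketched roughly as follows. This is an
      illustration in Python rather than the C implementation: the function
      name is hypothetical, and the demo uses a temporary directory in place
      of the real sysfs path. The 'cpu_empty' and 'uncore_imc' entries are
      made-up names that show what gets filtered out.

```python
import glob
import os
import tempfile


def find_hybrid_pmus(devices_dir="/sys/bus/event_source/devices"):
    """Return sorted names of cpu_* PMUs whose 'cpus' file exists and has data."""
    found = []
    for cpus_path in glob.glob(os.path.join(devices_dir, "cpu_*", "cpus")):
        with open(cpus_path) as f:
            if f.read().strip():
                found.append(os.path.basename(os.path.dirname(cpus_path)))
    return sorted(found)


# Demo on a fake device tree so the sketch runs without hybrid hardware.
with tempfile.TemporaryDirectory() as fake_devices:
    layout = {"cpu_core": "0-15\n", "cpu_atom": "16-23\n", "cpu_empty": ""}
    for name, cpus in layout.items():
        os.makedirs(os.path.join(fake_devices, name))
        with open(os.path.join(fake_devices, name, "cpus"), "w") as f:
            f.write(cpus)
    # A PMU directory that neither matches cpu_* nor has a 'cpus' file.
    os.makedirs(os.path.join(fake_devices, "uncore_imc"))
    detected = find_hybrid_pmus(fake_devices)

print(detected)
```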
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-6-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      44462430
    • J
      perf pmu: Save pmu name · 32705de7
      Jin Yao committed
      On a hybrid platform, an event is available on one pmu
      (for example, on cpu_core or on cpu_atom).
      
      This patch saves the pmu name to the pmu field of struct perf_pmu_alias,
      so that later code knows which pmu the event can be enabled on.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-5-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      32705de7
    • J
      perf pmu: Simplify arguments of __perf_pmu__new_alias · eab35953
      Jin Yao committed
      Simplify the arguments of __perf_pmu__new_alias() by passing the whole
      'struct pme_event' pointer.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      eab35953
    • J
      perf jevents: Support unit value "cpu_core" and "cpu_atom" · 6b64833b
      Jin Yao committed
      Some Intel platforms, such as Alderlake, are hybrid platforms that
      consist of atom cpus and core cpus. Each cpu type has a dedicated event
      list: some events are available on the core cpu, others on the atom cpu.
      
      The kernel exports new cpu pmus: cpu_core and cpu_atom. Events in the
      json files gain a new field "Unit" to indicate which pmu the event
      is available on.
      
      For example, one event in cache.json,
      
          {
              "BriefDescription": "Counts the number of load ops retired that",
              "CollectPEBSRecord": "2",
              "Counter": "0,1,2,3",
              "EventCode": "0xd2",
              "EventName": "MEM_LOAD_UOPS_RETIRED_MISC.MMIO",
              "PEBScounters": "0,1,2,3",
              "SampleAfterValue": "1000003",
              "UMask": "0x80",
              "Unit": "cpu_atom"
          },
      
      The unit "cpu_atom" indicates this event is only available on "cpu_atom".
      
      In generated pmu-events.c, we can see:
      
      {
              .name = "mem_load_uops_retired_misc.mmio",
              .event = "period=1000003,umask=0x80,event=0xd2",
              .desc = "Counts the number of load ops retired that. Unit: cpu_atom ",
              .topic = "cache",
              .pmu = "cpu_atom",
      },
      
      Without this patch, the "uncore_" prefix is added before "cpu_atom",
      such as:
              .pmu = "uncore_cpu_atom"
      
      That would be the wrong pmu.
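
      The difference can be modelled with a small sketch. It is a simplified
      Python model of the behaviour described in this message, not the jevents
      C code. The function names are hypothetical, and 'imc' is a made-up unit
      name used only to show the prefixing path.

```python
# Units that name a cpu pmu directly and must not get the "uncore_" prefix.
CPU_PMU_UNITS = {"cpu_core", "cpu_atom"}


def unit_to_pmu_before(unit):
    """Behaviour described for the unpatched tool: always prefix."""
    return "uncore_" + unit


def unit_to_pmu_after(unit):
    """Behaviour described for the patched tool: keep cpu pmu names as-is."""
    if unit in CPU_PMU_UNITS:
        return unit
    return "uncore_" + unit


for unit in ("cpu_atom", "cpu_core", "imc"):
    print(unit, "->", unit_to_pmu_before(unit), "|", unit_to_pmu_after(unit))
```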
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6b64833b
    • J
      tools headers uapi: Update tools's copy of linux/perf_event.h · 41273611
      Jin Yao committed
      To get the changes in:
      
      Liang Kan's patch
      
        55bcf6ef ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
      
      Kan's patch is in the tip/perf/core branch.
      
      So the next perf tool patches need this interface for hybrid support.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427070139.25256-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      41273611
    • N
      perf report: Print percentage of each event statistics · 462f57db
      Namhyung Kim committed
      It's sometimes useful to see how the number of samples compares with other
      events in the data file, shown as percent values.
      
        $ perf report --stat
      
        Aggregated stats:
                   TOTAL events:      20064
                    MMAP events:        239  ( 1.2%)
                    COMM events:       1518  ( 7.6%)
                    EXIT events:          1  ( 0.0%)
                    FORK events:       1517  ( 7.6%)
                  SAMPLE events:       4015  (20.0%)
                   MMAP2 events:      12769  (63.6%)
          FINISHED_ROUND events:          2  ( 0.0%)
              THREAD_MAP events:          1  ( 0.0%)
                 CPU_MAP events:          1  ( 0.0%)
               TIME_CONV events:          1  ( 0.0%)
        cycles stats:
                  SAMPLE events:       2475
        instructions stats:
                  SAMPLE events:       1540
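
      The percentages in the output above are each event count divided by the
      TOTAL count. A small sketch of that arithmetic, using numbers copied
      from the example (the function name and the exact format string are
      assumptions chosen to match the displayed output):

```python
def percent_label(count, total):
    """Format count as a percentage of total, e.g. '( 1.2%)'."""
    return f"({100.0 * count / total:4.1f}%)"


TOTAL = 20064
counts = {"MMAP": 239, "COMM": 1518, "SAMPLE": 4015, "MMAP2": 12769}

for name, count in counts.items():
    print(f"{name:>8} events: {count:>10}  {percent_label(count, TOTAL)}")
```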
      Suggested-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427013717.1651674-7-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      462f57db
    • N
      perf report: Make --skip-empty as default · 8f08cf33
      Namhyung Kim committed
      so that the compact output is shown by default.  Also add 'report.skip-empty'
      config option to override the default.  Users can also use --no-skip-empty
      command line option to change the behavior anytime.
      
      Committer testing:
      
        $ perf report --stat
      
        Aggregated stats:
                   TOTAL events:         19
                    COMM events:          2
                    EXIT events:          1
                  SAMPLE events:          8
                   MMAP2 events:          4
          FINISHED_ROUND events:          1
              THREAD_MAP events:          1
                 CPU_MAP events:          1
               TIME_CONV events:          1
        cycles:u stats:
                  SAMPLE events:          8
        $ perf config report.skip-empty=false
        $ perf report --stat
      
        Aggregated stats:
                   TOTAL events:         19
                    MMAP events:          0
                    LOST events:          0
                    COMM events:          2
                    EXIT events:          1
                THROTTLE events:          0
              UNTHROTTLE events:          0
                    FORK events:          0
                    READ events:          0
                  SAMPLE events:          8
                   MMAP2 events:          4
                     AUX events:          0
            ITRACE_START events:          0
            LOST_SAMPLES events:          0
                  SWITCH events:          0
         SWITCH_CPU_WIDE events:          0
              NAMESPACES events:          0
                 KSYMBOL events:          0
               BPF_EVENT events:          0
                  CGROUP events:          0
               TEXT_POKE events:          0
                    ATTR events:          0
              EVENT_TYPE events:          0
            TRACING_DATA events:          0
                BUILD_ID events:          0
          FINISHED_ROUND events:          1
                ID_INDEX events:          0
           AUXTRACE_INFO events:          0
                AUXTRACE events:          0
          AUXTRACE_ERROR events:          0
              THREAD_MAP events:          1
                 CPU_MAP events:          1
             STAT_CONFIG events:          0
                    STAT events:          0
              STAT_ROUND events:          0
            EVENT_UPDATE events:          0
               TIME_CONV events:          1
                 FEATURE events:          0
              COMPRESSED events:          0
        cycles:u stats:
                  SAMPLE events:          8
        $ perf config report.skip-empty
        report.skip-empty=false
        $
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427013717.1651674-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8f08cf33
    • N
      perf report: Add --skip-empty option to suppress 0 event stat · 2775de0b
      Namhyung Kim committed
      To make the output more readable, I think it's better to remove 0's in
      the output.  Also the dummy event has no event stats so it just wastes
      the space.  Let's use the --skip-empty option to suppress it.
      
        $ perf report --stat --skip-empty
      
        Aggregated stats:
                   TOTAL events:      16530
                    MMAP events:        226
                    COMM events:       1596
                    EXIT events:          2
                THROTTLE events:        121
              UNTHROTTLE events:        117
                    FORK events:       1595
                  SAMPLE events:        719
                   MMAP2 events:      12147
                  CGROUP events:          2
          FINISHED_ROUND events:          2
              THREAD_MAP events:          1
                 CPU_MAP events:          1
               TIME_CONV events:          1
        cycles stats:
                  SAMPLE events:        719
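
      The effect of the option can be pictured with a short sketch. This is a
      hypothetical helper that models only the filtering rule, with a few
      counts taken from the example above plus zero entries for illustration.

```python
def visible_stats(counts, skip_empty=True):
    """Drop zero-count entries when skip_empty is set; keep input order."""
    return {name: n for name, n in counts.items() if n or not skip_empty}


counts = {"MMAP": 226, "LOST": 0, "COMM": 1596, "READ": 0, "SAMPLE": 719}

for name, n in visible_stats(counts).items():
    print(f"{name:>8} events: {n:>10}")
```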
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427013717.1651674-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2775de0b
    • N
      perf report: Show event sample counts in --stat output · 55f75444
      Namhyung Kim committed
      To make the output identical to perf report -D, it needs to show
      per-event sample counts along with the aggregated stat at the end.
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427013717.1651674-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      55f75444
    • N
      perf hists: Split hists_stats from events_stats · 0f0abbac
      Namhyung Kim committed
      Each struct hists has an events_stats, but most of its fields were not
      used.  hists only needs to count the number of samples and periods,
      whether filtered or not; the other fields are used only by evlist.
      
      So it'd be better to split hists_stats and events_stats to reduce
      wasted memory in the struct hists.  This makes the output of event
      statistics in the perf report compact by skipping 0 events in each
      evsel/hists.
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427013717.1651674-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0f0abbac
    • N
      perf top: Use evlist->events_stat to count events · bf8f8587
      Namhyung Kim committed
      It's mainly to count lost events for the warning so it should be ok
      to use the evlist->stats instead.  This is needed for changes in the
      next commit.
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210427013717.1651674-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bf8f8587
    • N
      perf data: Add JSON export · d0713d4c
      Nicholas Fraser committed
      This adds a feature to export perf data to JSON.
      
      The resolved symbols are exported into the JSON so that external tools
      don't need to load the dsos themselves (or even have access to them at
      all.) This makes it easy to load and analyze perf data with standalone
      tools where direct perf or libbabeltrace integration is impractical.
      
      The exporter uses a minimal inline JSON encoding without any external
      dependencies. Currently it only outputs some headers and sample metadata
      but it's easily extensible.
      
      Use it like this:
      
        $ perf data convert --to-json out.json
      
      Committer notes:
      
      Fixup a __printf() bug that broke the build:
      
        util/data-convert-json.c:103:11: error: expected ‘)’ before numeric constant
          103 | __(printf, 5, 6)
              |           ^~
              |           )
        util/data-convert-json.c: In function ‘output_sample_callchain_entry’:
        util/data-convert-json.c:124:2: error: implicit declaration of function ‘output_json_key_format’; did you mean ‘output_json_format’? [-Werror=implicit-function-declaration]
          124 |  output_json_key_format(out, false, 5, "ip", "\"0x%" PRIx64 "\"", ip);
              |  ^~~~~~~~~~~~~~~~~~~~~~
              |  output_json_format
      
      Also had to add this patch to fix errors reported by various versions of
      clang:
      
        -       if (al && al->sym && al->sym->name && strlen(al->sym->name) > 0) {
        +       if (al && al->sym && al->sym->namelen) {
      
      al->sym->name is a zero sized array, to avoid one extra alloc in the
      symbol__new() constructor, sym->namelen carries its strlen.
      
      Committer testing:
      
        $ ls -la out.json
        ls: cannot access 'out.json': No such file or directory
        $ perf record sleep 0.1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
        $ perf report --stats | grep -w SAMPLE
                  SAMPLE events:          8
        $ perf data convert --to-json out.json
        [ perf data convert: Converted 'perf.data' into JSON data 'out.json' ]
        [ perf data convert: Converted and wrote 0.002 MB (8 samples) ]
        $ ls -la out.json
        -rw-rw-r--. 1 acme acme 2017 Apr 26 17:29 out.json
        $ cat out.json
        {
        	"linux-perf-json-version": 1,
        	"headers": {
        		"header-version": 1,
        		"captured-on": "2021-04-26T20:28:57Z",
        		"data-offset": 432,
        		"data-size": 1016,
        		"feat-offset": 1448,
        		"hostname": "five",
        		"os-release": "5.11.14-200.fc33.x86_64",
        		"arch": "x86_64",
        		"cpu-desc": "AMD Ryzen 9 3900X 12-Core Processor",
        		"cpuid": "AuthenticAMD,23,113,0",
        		"nrcpus-online": 24,
        		"nrcpus-avail": 24,
        		"perf-version": "5.12.gee134f3189bd",
        		"cmdline": [
        			"/home/acme/bin/perf",
        			"record",
        			"sleep",
        			"0.1"
        		]
        	},
        	"samples": [
        		{
        			"timestamp": 170517539043684,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa6268827"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539048443,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa661359d"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539051018,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa6311e18"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539053652,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0x7fdb77b4812b",
        					"symbol": "_dl_start",
        					"dso": "ld-2.32.so"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539055306,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa6269286"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539057590,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0xffffffffa62abd8b"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539067559,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0x7fdb77b5e9e9",
        					"symbol": "__GI___tunables_init",
        					"dso": "ld-2.32.so"
        				}
        			]
        		},
        		{
        			"timestamp": 170517539282452,
        			"pid": 375844,
        			"tid": 375844,
        			"comm": "sleep",
        			"callchain": [
        				{
        					"ip": "0x7fdb779978d2",
        					"symbol": "getenv",
        					"dso": "libc-2.32.so"
        				}
        			]
        		}
        	]
        }
        $
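      As a rough illustration of how the JSON shown above might be consumed,
      the sketch below tallies samples by the first callchain frame. It assumes
      only the fields visible in the sample ("samples", "callchain", "ip",
      "symbol", "dso"); the embedded document is a trimmed stand-in for the
      real output, and the helper name count_first_frames is hypothetical.

```python
import json
from collections import Counter

# Trimmed stand-in for the converter output: three of the samples shown
# above, keeping only the fields that appear in the commit message.
EXAMPLE = """
{
  "samples": [
    {"timestamp": 170517539043684, "pid": 375844, "tid": 375844, "comm": "sleep",
     "callchain": [{"ip": "0xffffffffa6268827"}]},
    {"timestamp": 170517539053652, "pid": 375844, "tid": 375844, "comm": "sleep",
     "callchain": [{"ip": "0x7fdb77b4812b", "symbol": "_dl_start", "dso": "ld-2.32.so"}]},
    {"timestamp": 170517539282452, "pid": 375844, "tid": 375844, "comm": "sleep",
     "callchain": [{"ip": "0x7fdb779978d2", "symbol": "getenv", "dso": "libc-2.32.so"}]}
  ]
}
"""


def count_first_frames(doc):
    """Count samples keyed by (dso, symbol) of the first callchain frame."""
    counts = Counter()
    for sample in doc.get("samples", []):
        chain = sample.get("callchain") or [{}]
        frame = chain[0]
        # Unresolved frames carry only "ip", so fall back to that address.
        key = (frame.get("dso", "[unknown]"),
               frame.get("symbol", frame.get("ip", "?")))
        counts[key] += 1
    return counts


counts = count_first_frames(json.loads(EXAMPLE))
for (dso, symbol), n in counts.most_common():
    print(f"{n:4d}  {dso}  {symbol}")
```

      Frames without a resolved symbol are grouped under their raw address, so
      kernel addresses remain distinguishable in the summary.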
      Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tan Xiaojun <tanxiaojun@huawei.com>
      Cc: Ulrich Czekalla <uczekalla@codeweavers.com>
      Link: http://lore.kernel.org/lkml/3884969f-804d-2f53-c648-e2b0bd85edff@codeweavers.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d0713d4c
    • S
      perf stat: Introduce bpf_counter_ops->disable() · 5508c9da
      Song Liu committed
      Introduce bpf_counter_ops->disable(), which is used to stop counting the
      event.
      
      Committer notes:
      
      Added a dummy bpf_counter__disable() to the python binding to avoid
      having 'perf test python' failing.
      
      bpf_counter isn't supported in the python binding.
      Signed-off-by: Song Liu <song@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: kernel-team@fb.com
      Link: https://lore.kernel.org/r/20210425214333.1090950-6-song@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5508c9da
    • S
      perf stat: Introduce ':b' modifier · 01bd8efc
      Song Liu committed
      Introduce the 'b' modifier to the event parser, which means a BPF program
      is used to manage this event. This is the same as the --bpf-counters
      option, but only applies to this event. For example:
      
        perf stat -e cycles:b,cs               # use bpf for cycles, but not cs
        perf stat -e cycles,cs --bpf-counters  # use bpf for both cycles and cs
      Suggested-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Song Liu <song@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/r/20210425214333.1090950-5-song@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      01bd8efc
    • S
      perf stat: Introduce config stat.bpf-counter-events · 112cb561
      Song Liu committed
      Currently, to use BPF to aggregate perf event counters, the user passes
      the --bpf-counters option. Enable "use bpf by default" events with a config
      option, stat.bpf-counter-events. Events whose names are listed in the
      option will use BPF.
      
      This also enables mixing BPF events and regular events in the same session.
      For example:
      
         perf config stat.bpf-counter-events=instructions
         perf stat -e instructions,cs
      
      The second command will use BPF for "instructions" but not "cs".
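
      The selection rule described above can be modelled with a small toy
      function. This is only a sketch of the stated behaviour, not perf's
      implementation: the function name uses_bpf and its parameters are
      hypothetical, and treating the config value as a comma-separated list of
      event names is an assumption.

```python
def uses_bpf(events, bpf_counter_events="", bpf_counters=False):
    """Toy model: map each event name to whether BPF would manage it.

    bpf_counter_events stands in for the stat.bpf-counter-events value
    (assumed here to be a comma-separated list of event names), and
    bpf_counters stands in for the global --bpf-counters option.
    """
    configured = {name.strip()
                  for name in bpf_counter_events.split(",")
                  if name.strip()}
    # An event uses BPF if the global option is set or its name is listed.
    return {ev: bpf_counters or ev in configured for ev in events}


# Mirrors the example above: only "instructions" is listed in the config.
plan = uses_bpf(["instructions", "cs"], bpf_counter_events="instructions")
print(plan)  # {'instructions': True, 'cs': False}
```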
      Signed-off-by: Song Liu <song@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/r/20210425214333.1090950-4-song@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      112cb561
    • S
      perf bpf: check perf_attr_map is compatible with the perf binary · fe3dd826
      Song Liu committed
      perf_attr_map could be shared among different versions of the perf binary.
      Add bperf_attr_map_compatible() to check whether the existing attr_map is
      compatible with the current perf binary.
      Signed-off-by: Song Liu <song@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: kernel-team@fb.com
      Link: https://lore.kernel.org/r/20210425214333.1090950-3-song@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      fe3dd826
    • S
      perf util: Move bpf_perf definitions to a libperf header · ec8149fb
      Song Liu committed
      By following the same protocol, other tools can share hardware PMCs with
      perf. Move perf_event_attr_map_entry and BPF_PERF_DEFAULT_ATTR_MAP_PATH to
      bpf_perf.h for other tools to use.
      Signed-off-by: Song Liu <song@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: kernel-team@fb.com
      Link: https://lore.kernel.org/r/20210425214333.1090950-2-song@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ec8149fb
  2. 26 Apr, 2021 2 commits