1. 14 Aug, 2022 5 commits
    • perf pmu-events: Hide the pmu_events · 1ba3752a
      Committed by Ian Rogers
      Hide that the pmu_event structs are an array with a new wrapper struct.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-12-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu-events: Don't assume pmu_event is an array · 660842e4
      Committed by Ian Rogers
      The current code assumes that a struct pmu_event can be iterated over
      forward until a NULL pmu_event is encountered.
      
      This makes it difficult to refactor pmu_event.
      
      Add a loop function taking a callback function that's passed the struct
      pmu_event.
      
      This way the pmu_event is only needed for one element and not an entire
      array.
      
      Switch existing code iterating over the pmu_event arrays to use the new
      loop function pmu_events_table_for_each_event.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-11-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
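The callback-based loop described in 660842e4 can be modelled in a few lines. This is an illustrative Python sketch, not the perf C code: the names `PmuEvent` and `table_for_each_event`, the made-up event encodings, and the convention that a non-zero return value stops the walk are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class PmuEvent:
    """Hypothetical stand-in for one event entry; the fields are illustrative."""
    name: str
    event: str


# A callback receives one event plus caller-supplied state. It returns 0 to
# continue, or any non-zero value to stop the walk (assumed convention).
EventCallback = Callable[[PmuEvent, object], int]


def table_for_each_event(table: Iterable[PmuEvent], fn: EventCallback, data: object) -> int:
    """Call fn on each event so callers never index or sentinel-check the table."""
    for pe in table:
        ret = fn(pe, data)
        if ret != 0:
            return ret
    return 0


def collect_names(pe: PmuEvent, data: object) -> int:
    """Example callback: append each event name to the list passed as data."""
    assert isinstance(data, list)
    data.append(pe.name)
    return 0


def stop_at_cycles(pe: PmuEvent, data: object) -> int:
    """Example callback: stop as soon as an event named 'cycles' is seen."""
    return 1 if pe.name == "cycles" else 0


# Made-up table contents, used only to exercise the loop.
EVENTS: List[PmuEvent] = [
    PmuEvent("instructions", "event=0x01"),
    PmuEvent("cycles", "event=0x02"),
    PmuEvent("branches", "event=0x03"),
]
```

Because the caller only ever sees one event at a time, the storage behind `table` could later change without touching any callback.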
    • perf pmu-events: Hide pmu_events_map · 29be2fe0
      Committed by Ian Rogers
      Move usage of the table to pmu-events.c so it may be hidden. By
      abstracting the table the implementation can later be changed.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-8-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu-events: Avoid passing pmu_events_map · eeac7730
      Committed by Ian Rogers
      Preparation for hiding pmu_events_map as an implementation detail. While
      the map is passed, the table of events is all that is normally wanted.
      
      While modifying the function's types, rename pmu_events_map__find to
      pmu_events_table__find to match later encapsulation. Similarly rename
      pmu_add_cpu_aliases_map to pmu_add_cpu_aliases_table.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-7-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu-events: Hide pmu_sys_event_tables · 2519db2a
      Committed by Ian Rogers
      Move usage of the table to pmu-events.c so it may be hidden. By
      abstracting the table the implementation can later be changed.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 13 Aug, 2022 2 commits
  3. 12 Aug, 2022 9 commits
    • perf build-id: Print debuginfod queries if -v option is used · a072a7a0
      Committed by Martin Liška
      When ending a 'perf record' session, the querying of a debuginfod server
      can take quite some time. Inform the user about it when the -v option is
      used.
      Signed-off-by: Martin Liška <mliska@suse.cz>
      Link: http://lore.kernel.org/lkml/325871cf-b71f-6237-8793-82182272ece8@suse.cz
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • M
      34575ded
    • perf mem: Add statistics for peer snooping · e843dec5
      Committed by Leo Yan
      The flag PERF_MEM_SNOOPX_PEER was added to report data snooped from a
      peer cache line, which can come from a peer core, a peer cluster, or
      a remote NUMA node.
      
      This patch adds statistics for the flag PERF_MEM_SNOOPX_PEER.  Note that
      PERF_MEM_SNOOPX_PEER is treated as affiliated information that
      complements the cache level statistics.  Therefore, when the flag is
      set, a load operation is accounted both in the cache level's metrics
      (e.g. ld_l2hit, ld_llchit, etc.) and in the peer related metrics.
      
      Three new metrics are introduced: 'lcl_peer' counts local cache
      accesses, 'rmt_peer' counts remote accesses (including remote DRAM
      and any caches in a remote node), and 'tot_peer' is the sum of
      'lcl_peer' and 'rmt_peer'.
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-5-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
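The double accounting described in e843dec5 can be sketched as follows. This is a simplified Python model under stated assumptions, not the perf implementation: the `PeerStats` class, the `level` labels, and the `remote` boolean input are hypothetical. Only the metric names come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    """Counters named after the metrics mentioned in the commit message."""
    lcl_peer: int = 0
    rmt_peer: int = 0
    tot_peer: int = 0
    ld_l2hit: int = 0
    ld_llchit: int = 0


def account_load(stats: PeerStats, level: str, peer: bool, remote: bool) -> None:
    """Account one load sample.

    level  -- 'l2' or 'llc' (hypothetical labels for the cache level that hit)
    peer   -- True when the sample carries the peer snoop flag
    remote -- True when the data came from a remote node (assumed input)
    """
    # The cache-level metric is always updated ...
    if level == "l2":
        stats.ld_l2hit += 1
    elif level == "llc":
        stats.ld_llchit += 1

    # ... and the peer metrics are updated in addition when the flag is set.
    if peer:
        if remote:
            stats.rmt_peer += 1
        else:
            stats.lcl_peer += 1
        stats.tot_peer = stats.lcl_peer + stats.rmt_peer
```

A peer-flagged load therefore shows up twice by design: once in its cache-level counter and once in a peer counter.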
    • perf arm-spe: Use SPE data source for neoverse cores · 4e6430cb
      Committed by Ali Saidi
      When synthesizing data from SPE, augment the type with source information
      for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
      the same encoding. I can't find encoding information for any other SPE
      implementations to unify their choices with Arm's thus that is left for
      future work.
      
      This change populates the mem_lvl_num for Neoverse cores as well as the
      deprecated mem_lvl namespace.
      Reviewed-by: German Gomez <german.gomez@arm.com>
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Ali Saidi <alisaidi@amazon.com>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-4-leo.yan@linaro.org
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf mem: Print snoop peer flag · f78d6250
      Committed by Leo Yan
      Since PERF_MEM_SNOOPX_PEER flag is a new snoop type, print this flag if
      it is set.
      
      Before:
             memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
      
      After:
      
             memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
      Reviewed-by: Ali Saidi <alisaidi@amazon.com>
      Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Tested-by: Ali Saidi <alisaidi@amazon.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-3-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm64: Add missing -I for tools/arch/arm64/include/ to find asm/sysreg.h... · 4a88c4ec
      Committed by Leo Yan
      perf arm64: Add missing -I for tools/arch/arm64/include/ to find asm/sysreg.h when building arm_spe.h
      
      This cures a current problem where tools/perf/util/arm-spe.c isn't
      finding an ARM64-specific asm header, so let's add it for now to make
      progress.
      
      Adding a .o-specific rule seems clunky; let's try to find out whether
      this is really the right solution.
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Link: https://lore.kernel.org/lkml/20220811124825.GA868014@leoy-huanghe.lan
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf offcpu: Track child processes · d2347763
      Committed by Namhyung Kim
      When the -p option is used or a workload is given, perf needs to handle
      child processes.  The perf_event can inherit those task events
      automatically.  We can add a new BPF program on the task_newtask
      tracepoint to track child processes.
      
      Before:
        $ sudo perf record --off-cpu -- perf bench sched messaging
        $ sudo perf report --stat | grep -A1 offcpu
        offcpu-time stats:
                  SAMPLE events:        1
      
      After:
        $ sudo perf record -a --off-cpu -- perf bench sched messaging
        $ sudo perf report --stat | grep -A1 offcpu
        offcpu-time stats:
                  SAMPLE events:      856
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20220811185456.194721-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
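The bookkeeping behind child tracking in d2347763 can be modelled with a set of tracked ids. According to the commit message the real change is a BPF program on the task_newtask tracepoint. The Python function below, its name `on_new_task`, and the plain `set` standing in for a BPF map are assumptions made purely for illustration.

```python
from typing import Set


def on_new_task(tracked: Set[int], parent_id: int, child_id: int) -> bool:
    """Model of a handler that runs whenever a new task is created.

    If the parent is already tracked, start tracking the child too.
    Returns True when the child was added to the tracked set.
    """
    if parent_id in tracked:
        tracked.add(child_id)
        return True
    return False
```

Because children are added as they appear, grandchildren are picked up as well: by the time a child creates its own child, the child is already in the set.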
    • perf offcpu: Parse process id separately · d6f415ca
      Committed by Namhyung Kim
      The current target code uses thread id for tracking tasks because
      perf_events need to be opened for each task.  But we can use tgid in
      BPF maps and check it easily.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20220811185456.194721-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf offcpu: Check process id for the given workload · 07fc958b
      Committed by Namhyung Kim
      Current task filter checks task->pid which is different for each
      thread.  But we want to profile all the threads in the process.  So
      let's compare process id (or thread-group id: tgid) instead.
      
      Before:
        $ sudo perf record --off-cpu -- perf bench sched messaging -t
        $ sudo perf report --stat | grep -A1 offcpu
        offcpu-time stats:
                  SAMPLE events:        2
      
      After:
        $ sudo perf record --off-cpu -- perf bench sched messaging -t
        $ sudo perf report --stat | grep -A1 offcpu
        offcpu-time stats:
                  SAMPLE events:      850
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20220811185456.194721-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
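The difference between filtering on task->pid and on tgid, as described in 07fc958b, can be shown with a small model. This is a Python sketch with hypothetical names (`Task`, `filter_by_pid`, `filter_by_tgid`) and made-up ids; the actual filter runs in a BPF program.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Task:
    pid: int   # per-thread id (what the kernel calls task->pid)
    tgid: int  # thread-group id, shared by all threads of one process


def filter_by_pid(tasks: Iterable[Task], target: int) -> List[Task]:
    """Old behaviour in this model: matches only the one thread whose pid equals target."""
    return [t for t in tasks if t.pid == target]


def filter_by_tgid(tasks: Iterable[Task], target: int) -> List[Task]:
    """New behaviour in this model: matches every thread of the target process."""
    return [t for t in tasks if t.tgid == target]
```

With a process of three threads, the pid filter sees one task while the tgid filter sees all three, which mirrors the jump in sample counts in the before/after output above.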
  4. 11 Aug, 2022 2 commits
  5. 10 Aug, 2022 4 commits
    • perf machine: Fix missing free of machine->kallsyms_filename · b39c9e1b
      Committed by Adrian Hunter
      Add missing free of machine->kallsyms_filename to machine__exit().
      
      Fixes: a5367ecb ("perf tools: Automatically use guest kcore_dir if present")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20220809130758.12800-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Fix an error handling path in 'parse_perf_probe_command()' · 4bf6dcaa
      Committed by Christophe JAILLET
      If a memory allocation fails, we should branch to the error handling path
      in order to free some resources allocated a few lines above.
      
      Fixes: 15354d54 ("perf probe: Generate event name with line number")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: kernel-janitors@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/b71bcb01fa0c7b9778647235c3ab490f699ba278.1659797452.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject jit: Ignore memfd and anonymous mmap events if jitdump present · 46f7bd5e
      Committed by Brian Robbins
      Some processes store jitted code in memfd mappings to avoid having rwx
      mappings.  These processes map the code with a writeable mapping and a
      read-execute mapping.  They write the code using the writeable mapping
      and then unmap the writeable mapping.  All subsequent execution is
      through the read-execute mapping.
      
      perf inject --jit ignores //anon* mappings for each process where a
      jitdump is present because it expects to inject mmap events for each
      jitted code range, and said jitted code ranges will overlap with the
      //anon* mappings.
      
      Ignore /memfd: and [anon:* mappings so that jitted code contained in
      /memfd: and [anon:* mappings is treated the same way as jitted code
      contained in //anon* mappings.
      Signed-off-by: Brian Robbins <brianrob@linux.microsoft.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220805220645.95855-1-brianrob@linux.microsoft.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
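The mapping-name rule from 46f7bd5e amounts to a prefix check. The sketch below is a Python model, not the perf code: the function name `is_ignored_jit_mapping` and the `has_jitdump` parameter are hypothetical. The three prefixes are the ones named in the commit message.

```python
# Mapping-name prefixes named in the commit message. str.startswith accepts a
# tuple, so a single call checks all three.
JIT_IGNORED_PREFIXES = ("//anon", "/memfd:", "[anon:")


def is_ignored_jit_mapping(filename: str, has_jitdump: bool) -> bool:
    """Return True when an mmap event for `filename` should be dropped.

    Mappings are only ignored for processes where a jitdump is present,
    because the injected jitted-code ranges would overlap with them.
    """
    return has_jitdump and filename.startswith(JIT_IGNORED_PREFIXES)
```

File-backed mappings such as shared libraries never match, and nothing is ignored for a process that has no jitdump.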
    • perf stat: Add JSON output option · df936cad
      Committed by Claire Jensen
      CSV output is tricky to format and column layout changes are susceptible
      to breaking parsers. New JSON-formatted output has variable names to
      identify fields that are consistent and informative, making the output
      parseable.
      
      CSV output example:
      
        1.20,msec,task-clock:u,1204272,100.00,0.697,CPUs utilized
        0,,context-switches:u,1204272,100.00,0.000,/sec
        0,,cpu-migrations:u,1204272,100.00,0.000,/sec
        70,,page-faults:u,1204272,100.00,58.126,K/sec
      
      JSON output example:
      
        {"counter-value" : "3805.723968", "unit" : "msec", "event" :
        "cpu-clock", "event-runtime" : 3805731510100.00, "pcnt-running"
        : 100.00, "metric-value" : 4.007571, "metric-unit" : "CPUs utilized"}
        {"counter-value" : "6166.000000", "unit" : "", "event" :
        "context-switches", "event-runtime" : 3805723045100.00, "pcnt-running"
        : 100.00, "metric-value" : 1.620191, "metric-unit" : "K/sec"}
        {"counter-value" : "466.000000", "unit" : "", "event" :
        "cpu-migrations", "event-runtime" : 3805727613100.00, "pcnt-running"
        : 100.00, "metric-value" : 122.447136, "metric-unit" : "/sec"}
        {"counter-value" : "208.000000", "unit" : "", "event" :
        "page-faults", "event-runtime" : 3805726799100.00, "pcnt-running"
        : 100.00, "metric-value" : 54.654516, "metric-unit" : "/sec"}
      
      Also added documentation for JSON option.
      
      There is some tidy-up of the CSV code, including a fix for a potential
      memory overrun in the os.nfields setup. To facilitate this, an AGGR_MAX
      value is added.
      
      Committer notes:
      
      Fixed up using PRIu64 to format u64 values, not %lu.
      
      Committer testing:
      
        ⬢[acme@toolbox perf]$ perf stat -j sleep 1
        {"counter-value" : "0.731750", "unit" : "msec", "event" : "task-clock:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000731, "metric-unit" : "CPUs utilized"}
        {"counter-value" : "0.000000", "unit" : "", "event" : "context-switches:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
        {"counter-value" : "0.000000", "unit" : "", "event" : "cpu-migrations:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
        {"counter-value" : "75.000000", "unit" : "", "event" : "page-faults:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 102.494021, "metric-unit" : "K/sec"}
        {"counter-value" : "578765.000000", "unit" : "", "event" : "cycles:u", "event-runtime" : 379366, "pcnt-running" : 49.00, "metric-value" : 0.790933, "metric-unit" : "GHz"}
        {"counter-value" : "1298.000000", "unit" : "", "event" : "stalled-cycles-frontend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.224271, "metric-unit" : "frontend cycles idle"}
        {"counter-value" : "21984.000000", "unit" : "", "event" : "stalled-cycles-backend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 3.798433, "metric-unit" : "backend cycles idle"}
        {"counter-value" : "468197.000000", "unit" : "", "event" : "instructions:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.808959, "metric-unit" : "insn per cycle"}
        {"metric-value" : 0.046955, "metric-unit" : "stalled cycles per insn"}
        {"counter-value" : "103335.000000", "unit" : "", "event" : "branches:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 141.216262, "metric-unit" : "M/sec"}
        {"counter-value" : "2381.000000", "unit" : "", "event" : "branch-misses:u", "event-runtime" : 388654, "pcnt-running" : 50.00, "metric-value" : 2.304156, "metric-unit" : "of all branches"}
        ⬢[acme@toolbox perf]$
      Signed-off-by: Claire Jensen <cjense@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alyssa Ross <hi@alyssa.is>
      Cc: Claire Jensen <clairej735@gmail.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20220805200105.2020995-2-irogers@google.com
      Signed-off-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
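Since the stated motivation for df936cad is parser robustness, here is a minimal Python sketch of consuming that output. It assumes one JSON object per line, as in the committer's test run above; the helper names `parse_perf_stat_json` and `counters_by_event` are hypothetical, and the sample lines are copied from that run.

```python
import json
from typing import Dict, List

# Three lines copied from the committer testing output above.
SAMPLE = '''\
{"counter-value" : "0.731750", "unit" : "msec", "event" : "task-clock:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000731, "metric-unit" : "CPUs utilized"}
{"counter-value" : "75.000000", "unit" : "", "event" : "page-faults:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 102.494021, "metric-unit" : "K/sec"}
{"metric-value" : 0.046955, "metric-unit" : "stalled cycles per insn"}
'''


def parse_perf_stat_json(text: str) -> List[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def counters_by_event(records: List[dict]) -> Dict[str, float]:
    """Map event name to counter value, skipping metric-only records."""
    return {
        r["event"]: float(r["counter-value"])
        for r in records
        if "event" in r
    }
```

Fields are looked up by name, so a record that carries only a metric (like the third sample line) is handled without any column counting.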
  6. 03 Aug, 2022 7 commits
  7. 02 Aug, 2022 3 commits
    • tools perf: Fix compilation error with new binutils · 83aa0120
      Committed by Andres Freund
      binutils changed the signature of init_disassemble_info(), which now causes
      compilation failures for tools/perf/util/annotate.c, e.g. on debian
      unstable.
      
      Relevant binutils commit:
      
        https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=60a3da00bd5407f07
      
      Wire up the feature test and switch to init_disassemble_info_compat(),
      which were introduced in prior commits, fixing the compilation failure.
      
      I verified that perf can still disassemble bpf programs by using bpftrace
      under load, recording a perf trace, and then annotating the bpf "function"
      with and without the changes. With old binutils there's no change in output
      before/after this patch. When comparing the output from old binutils (2.35)
      to new binutils with the patch (upstream snapshot) there are a few output
      differences, but they are unrelated to this patch. An example hunk is:
      
             1.15 :   55:mov    %rbp,%rdx
             0.00 :   58:add    $0xfffffffffffffff8,%rdx
             0.00 :   5c:xor    %ecx,%ecx
        -    1.03 :   5e:callq  0xffffffffe12aca3c
        +    1.03 :   5e:call   0xffffffffe12aca3c
             0.00 :   63:xor    %eax,%eax
        -    2.18 :   65:leaveq
        -    2.82 :   66:retq
        +    2.18 :   65:leave
        +    2.82 :   66:ret
      Signed-off-by: Andres Freund <andres@anarazel.de>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Ben Hutchings <benh@debian.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: bpf@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20220622181918.ykrs5rsnmx3og4sv@alap3.anarazel.de
      Link: https://lore.kernel.org/r/20220801013834.156015-5-andres@anarazel.de
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Rework prologue generation code · 5f4e821c
      Committed by Jiri Olsa
      Some functions we use for bpf prologue generation are going to be
      deprecated. This change reworks current code not to use them.
      
      We need to replace following functions/struct:
         bpf_program__set_prep
         bpf_program__nth_fd
         struct bpf_prog_prep_result
      
      Currently we use bpf_program__set_prep to hook perf callback before
      program is loaded and provide new instructions with the prologue.
      
      We replace this functionality by taking the instructions for a specific
      program, attaching the prologue to them, and loading such new eBPF
      programs with the prologue using separate bpf_prog_load calls (outside
      the libbpf load machinery).
      
      Before we can take and use program instructions, we need libbpf to
      actually load it. This way we get the final shape of its instructions
      (with all relocations and verifier adjustments).
      
      There's one glitch though.. perf kprobe program already assumes
      generated prologue code with proper values in argument registers,
      so loading such program directly will fail in the verifier.
      
      That's where the fallback pre-load handler fits in and prepends
      the initialization code to the program. Once such program is loaded
      we take its instructions, cut off the initialization code and prepend
      the prologue.
      
      I know.. sorry ;-)
      
      To have access to the program when loading, this patch adds support for
      registering a 'fallback' section handler to take care of perf kprobe
      programs. The fallback means that it handles any section definition
      besides the ones that libbpf handles.
      
      The handler serves two purposes:
        - allows perf programs to have special arguments in section name
        - allows perf to use pre-load callback where we can attach init
          code (zeroing all argument registers) to each perf program
      Suggested-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/r/20220616202214.70359-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5f4e821c
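      The splice step described in the commit above (cut off the
      initialization code, then prepend the prologue) can be sketched in
      plain C. This is an illustration only, not the perf implementation:
      struct insn and splice_prologue() are hypothetical names, and real code
      would work on arrays of struct bpf_insn. The sketch assumes the result
      holds at least one instruction.

      ```c
      #include <assert.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical stand-in for one BPF instruction. */
      struct insn {
      	unsigned long long raw;
      };

      /*
       * Build "prologue + (loaded program without its init code)".
       * 'loaded' starts with 'init_cnt' initialization instructions that
       * were only there to get the program through the verifier.
       * Returns a malloc'ed array holding *out_cnt instructions, or NULL
       * if the allocation fails. The caller frees the result.
       */
      static struct insn *splice_prologue(const struct insn *loaded, size_t loaded_cnt,
      				    size_t init_cnt,
      				    const struct insn *prologue, size_t prologue_cnt,
      				    size_t *out_cnt)
      {
      	struct insn *out;
      	size_t body_cnt;

      	/* The init code must be a prefix of the loaded program. */
      	assert(init_cnt <= loaded_cnt);
      	body_cnt = loaded_cnt - init_cnt;

      	out = malloc((prologue_cnt + body_cnt) * sizeof(*out));
      	if (!out)
      		return NULL;

      	/* Prologue first, then the program body after the init code. */
      	memcpy(out, prologue, prologue_cnt * sizeof(*out));
      	memcpy(out + prologue_cnt, loaded + init_cnt, body_cnt * sizeof(*out));
      	*out_cnt = prologue_cnt + body_cnt;
      	return out;
      }
      ```

      The resulting array is what would then be handed to a separate load
      call, outside the libbpf load machinery.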
    • J
      perf bpf: Convert legacy map definition to BTF-defined · 8b1e1a03
      Jiri Olsa authored
      libbpf is switching off support for legacy map definitions [1],
      which will break the perf llvm tests.

      Move the base source map definition to BTF-defined. This requires
      the -g compile option to add debug/BTF info.
      
      [1] https://lore.kernel.org/bpf/20220627211527.2245459-1-andrii@kernel.org/
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20220704152721.352046-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8b1e1a03
  8. 01 Aug, 2022 3 commits
    • I
      perf symbol: Fail to read phdr workaround · 6d518ac7
      Ian Rogers authored
      The perf jvmti agent doesn't create program headers; in this case
      fall back on section headers, as happened previously.
      
      Committer notes:
      
      To test this, from a public post by Ian:
      
      1) download a Java workload dacapo-9.12-MR1-bach.jar from
      https://sourceforge.net/projects/dacapobench/
      
      2) build perf, e.g. with "make -C tools/perf O=/tmp/perf NO_LIBBFD=1";
      it should detect Java and create /tmp/perf/libperf-jvmti.so
      
      3) run perf with the jvmti agent:
      
        perf record -k 1 java -agentpath:/tmp/perf/libperf-jvmti.so -jar dacapo-9.12-MR1-bach.jar -n 10 fop
      
      4) run perf inject:
      
        perf inject -i perf.data -o perf-injected.data -j
      
      5) run perf report
      
        perf report -i perf-injected.data | grep org.apache.fop
      
      With this patch reverted I see lots of symbols like:
      
           0.00%  java             jitted-388040-4656.so  [.] org.apache.fop.fo.FObj.bind(org.apache.fop.fo.PropertyList)
      
      With the patch (2d86612a ("perf symbol: Correct address for bss
      symbols")) I see lots of:
      
        dso__load_sym_internal: failed to find program header for symbol:
        Lorg/apache/fop/fo/FObj;bind(Lorg/apache/fop/fo/PropertyList;)V
        st_value: 0x40
      
      Fixes: 2d86612a ("perf symbol: Correct address for bss symbols")
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20220731164923.691193-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6d518ac7
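      The fallback described in the commit above can be sketched as follows:
      look for a program header covering the symbol address and, if none
      exists, adjust by the section header instead. This is a minimal sketch
      under assumptions, not the perf code: the structs, find_phdr() and
      sym_file_offset() are hypothetical, and the exact adjustment formulas
      are an assumption about how an address is turned into a file offset.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical, reduced views of ELF program and section headers. */
      struct phdr {
      	uint64_t p_vaddr, p_offset, p_memsz;
      };

      struct shdr {
      	uint64_t sh_addr, sh_offset;
      };

      /* Return the program header whose address range covers 'addr', if any. */
      static const struct phdr *find_phdr(const struct phdr *phdrs, size_t n,
      				    uint64_t addr)
      {
      	for (size_t i = 0; i < n; i++) {
      		if (addr >= phdrs[i].p_vaddr &&
      		    addr < phdrs[i].p_vaddr + phdrs[i].p_memsz)
      			return &phdrs[i];
      	}
      	return NULL;
      }

      /*
       * Translate a symbol value into a file offset. Prefer the program
       * header; fall back on the symbol's section header when the file
       * has no matching program header (as with the jvmti agent output).
       */
      static uint64_t sym_file_offset(uint64_t st_value, const struct phdr *phdrs,
      				size_t n, const struct shdr *shdr)
      {
      	const struct phdr *p = find_phdr(phdrs, n, st_value);

      	assert(shdr != NULL);
      	if (p)
      		return st_value - (p->p_vaddr - p->p_offset);
      	return st_value - (shdr->sh_addr - shdr->sh_offset);
      }
      ```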
    • N
      perf lock: Implement cpu and task filters for BPF · 6fda2405
      Namhyung Kim authored
      Add -a/--all-cpus and -C/--cpu options for cpu filtering.  Also -p/--pid
      and --tid options are added for task filtering.  The short -t option is
      taken for --threads already.  Tracking the command line workload is
      possible as well.
      
        $ sudo perf lock contention -a -b sleep 1
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20220729200756.666106-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6fda2405
    • N
      perf lock: Use BPF for lock contention analysis · 407b36f6
      Namhyung Kim authored
      Add -b/--use-bpf option to use BPF to collect lock contention stats.
      For simplicity it now runs system-wide and requires C-c to stop.
      Upcoming changes will add the usual filtering.
      
        $ sudo perf lock con -b
        ^C
         contended   total wait     max wait     avg wait         type   caller
      
                42    192.67 us     13.64 us      4.59 us     spinlock   queue_work_on+0x20
                23     85.54 us     10.28 us      3.72 us     spinlock   worker_thread+0x14a
                 6     13.92 us      6.51 us      2.32 us        mutex   kernfs_iop_permission+0x30
                 3     11.59 us     10.04 us      3.86 us        mutex   kernfs_dop_revalidate+0x3c
                 1      7.52 us      7.52 us      7.52 us     spinlock   kthread+0x115
                 1      7.24 us      7.24 us      7.24 us     rwlock:W   sys_epoll_wait+0x148
                 2      7.08 us      3.99 us      3.54 us     spinlock   delayed_work_timer_fn+0x1b
                 1      6.41 us      6.41 us      6.41 us     spinlock   idle_balance+0xa06
                 2      2.50 us      1.83 us      1.25 us        mutex   kernfs_iop_lookup+0x2f
                 1      1.71 us      1.71 us      1.71 us        mutex   kernfs_iop_getattr+0x2c
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20220729200756.666106-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      407b36f6
  9. 30 Jul, 2022 4 commits
    • Z
      perf stat: Add topdown metrics in the default perf stat on the hybrid machine · 9a0b3626
      Zhengjun Xing authored
      Topdown metrics are missing from the default perf stat on hybrid
      machines; add them to the default perf stat for hybrid systems.

      Currently, the default perf stat supports the perf metrics Topdown for
      the p-core PMU only; perf metrics Topdown support for the e-core PMU will
      be implemented separately later. Refactor the code and add two
      x86-specific functions. Widen the event name column by 7 chars, so that
      all metrics after the "#" become aligned again.
      
      The perf metrics topdown feature is supported on the cpu_core of ADL. The
      dedicated perf metrics counter and the fixed counter 3 are used for the
      topdown events. Adding the topdown metrics doesn't trigger multiplexing.
      
      Before:
      
       # ./perf  stat  -a true
      
       Performance counter stats for 'system wide':
      
                   53.70 msec cpu-clock                 #   25.736 CPUs utilized
                      80      context-switches          #    1.490 K/sec
                      24      cpu-migrations            #  446.951 /sec
                      52      page-faults               #  968.394 /sec
               2,788,555      cpu_core/cycles/          #   51.931 M/sec
                 851,129      cpu_atom/cycles/          #   15.851 M/sec
               2,974,030      cpu_core/instructions/    #   55.385 M/sec
                 416,919      cpu_atom/instructions/    #    7.764 M/sec
                 586,136      cpu_core/branches/        #   10.916 M/sec
                  79,872      cpu_atom/branches/        #    1.487 M/sec
                  14,220      cpu_core/branch-misses/   #  264.819 K/sec
                   7,691      cpu_atom/branch-misses/   #  143.229 K/sec
      
             0.002086438 seconds time elapsed
      
      After:
      
       # ./perf stat  -a true
      
       Performance counter stats for 'system wide':
      
                   61.39 msec cpu-clock                        #   24.874 CPUs utilized
                      76      context-switches                 #    1.238 K/sec
                      24      cpu-migrations                   #  390.968 /sec
                      52      page-faults                      #  847.097 /sec
               2,753,695      cpu_core/cycles/                 #   44.859 M/sec
                 903,899      cpu_atom/cycles/                 #   14.725 M/sec
               2,927,529      cpu_core/instructions/           #   47.690 M/sec
                 428,498      cpu_atom/instructions/           #    6.980 M/sec
                 581,299      cpu_core/branches/               #    9.470 M/sec
                  83,409      cpu_atom/branches/               #    1.359 M/sec
                  13,641      cpu_core/branch-misses/          #  222.216 K/sec
                   8,008      cpu_atom/branch-misses/          #  130.453 K/sec
              14,761,308      cpu_core/slots/                  #  240.466 M/sec
               3,288,625      cpu_core/topdown-retiring/       #     22.3% retiring
               1,323,323      cpu_core/topdown-bad-spec/       #      9.0% bad speculation
               5,477,470      cpu_core/topdown-fe-bound/       #     37.1% frontend bound
               4,679,199      cpu_core/topdown-be-bound/       #     31.7% backend bound
                 646,194      cpu_core/topdown-heavy-ops/      #      4.4% heavy operations       #     17.9% light operations
               1,244,999      cpu_core/topdown-br-mispredict/  #      8.4% branch mispredict      #      0.5% machine clears
               3,891,800      cpu_core/topdown-fetch-lat/      #     26.4% fetch latency          #     10.7% fetch bandwidth
               1,879,034      cpu_core/topdown-mem-bound/      #     12.7% memory bound           #     19.0% Core bound
      
             0.002467839 seconds time elapsed
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220721065706.2886112-6-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9a0b3626
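      The percentages in the "After" output of the commit above are
      consistent with dividing each topdown count by cpu_core/slots/, and the
      second metric on each line with the parent minus the child. The sketch
      below reproduces that arithmetic; the function names are hypothetical
      and this is not how perf itself evaluates its metric expressions.

      ```c
      #include <assert.h>

      /* Share of pipeline slots attributed to one topdown event, in percent. */
      static double topdown_pct(unsigned long long count, unsigned long long slots)
      {
      	assert(slots != 0);
      	return 100.0 * (double)count / (double)slots;
      }

      /*
       * Share of the parent category not covered by the child event,
       * e.g. light operations = retiring - heavy operations.
       */
      static double topdown_rest_pct(unsigned long long parent,
      			       unsigned long long child,
      			       unsigned long long slots)
      {
      	assert(child <= parent);
      	return topdown_pct(parent - child, slots);
      }
      ```

      With the numbers from the output, topdown_pct(3288625, 14761308) gives
      roughly 22.3 (retiring), and topdown_rest_pct(3288625, 646194, 14761308)
      gives roughly 17.9 (light operations).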
    • K
      perf x86 evlist: Add default hybrid events for perf stat · cdb204ad
      Kan Liang authored
      Provide a new solution to replace the reverted commit ac2dc29e
      ("perf stat: Add default hybrid events")
      
      For the default software attrs, nothing is changed.
      
      For the default hardware attrs, create a new evsel for each hybrid pmu.
      
      With the new solution, adding a new default attr will not require the
      special support for the hybrid platform anymore.
      
      Also, "--detailed" is now supported on the hybrid platform.
      
      With the patch,
      
        $ perf stat -a -ddd sleep 1
      
         Performance counter stats for 'system wide':
      
               32,231.06 msec cpu-clock                 #   32.056 CPUs utilized
                     529      context-switches          #   16.413 /sec
                      32      cpu-migrations            #    0.993 /sec
                      69      page-faults               #    2.141 /sec
             176,754,151      cpu_core/cycles/          #    5.484 M/sec          (41.65%)
             161,695,280      cpu_atom/cycles/          #    5.017 M/sec          (49.92%)
              48,595,992      cpu_core/instructions/    #    1.508 M/sec          (49.98%)
              32,363,337      cpu_atom/instructions/    #    1.004 M/sec          (58.26%)
              10,088,639      cpu_core/branches/        #  313.010 K/sec          (58.31%)
               6,390,582      cpu_atom/branches/        #  198.274 K/sec          (58.26%)
                 846,201      cpu_core/branch-misses/   #   26.254 K/sec          (66.65%)
                 676,477      cpu_atom/branch-misses/   #   20.988 K/sec          (58.27%)
              14,290,070      cpu_core/L1-dcache-loads/ #  443.363 K/sec          (66.66%)
               9,983,532      cpu_atom/L1-dcache-loads/ #  309.749 K/sec          (58.27%)
                 740,725      cpu_core/L1-dcache-load-misses/ #   22.982 K/sec    (66.66%)
         <not supported>      cpu_atom/L1-dcache-load-misses/
                 480,441      cpu_core/LLC-loads/       #   14.906 K/sec          (66.67%)
                 326,570      cpu_atom/LLC-loads/       #   10.132 K/sec          (58.27%)
                     329      cpu_core/LLC-load-misses/ #   10.208 /sec           (66.68%)
                       0      cpu_atom/LLC-load-misses/ #    0.000 /sec           (58.32%)
         <not supported>      cpu_core/L1-icache-loads/
              21,982,491      cpu_atom/L1-icache-loads/ #  682.028 K/sec          (58.43%)
               4,493,189      cpu_core/L1-icache-load-misses/ #  139.406 K/sec    (33.34%)
               4,711,404      cpu_atom/L1-icache-load-misses/ #  146.176 K/sec    (50.08%)
              13,713,090      cpu_core/dTLB-loads/      #  425.462 K/sec          (33.34%)
               9,384,727      cpu_atom/dTLB-loads/      #  291.170 K/sec          (50.08%)
                 157,387      cpu_core/dTLB-load-misses/ #    4.883 K/sec         (33.33%)
                 108,328      cpu_atom/dTLB-load-misses/ #    3.361 K/sec         (50.08%)
         <not supported>      cpu_core/iTLB-loads/
         <not supported>      cpu_atom/iTLB-loads/
                  37,655      cpu_core/iTLB-load-misses/ #    1.168 K/sec         (33.32%)
                  61,661      cpu_atom/iTLB-load-misses/ #    1.913 K/sec         (50.03%)
         <not supported>      cpu_core/L1-dcache-prefetches/
         <not supported>      cpu_atom/L1-dcache-prefetches/
         <not supported>      cpu_core/L1-dcache-prefetch-misses/
         <not supported>      cpu_atom/L1-dcache-prefetch-misses/
      
               1.005466919 seconds time elapsed
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220721065706.2886112-5-zhengjun.xing@linux.intel.com
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cdb204ad
    • K
      perf evlist: Always use arch_evlist__add_default_attrs() · a9c1ecda
      Kan Liang authored
      Current perf stat uses the evlist__add_default_attrs() to add the
      generic default attrs, and uses arch_evlist__add_default_attrs() to add
      the Arch specific default attrs, e.g., Topdown for x86.
      
      It works well for non-hybrid platforms. However, for a hybrid
      platform, the hard-coded generic default attrs don't work.

      Use arch_evlist__add_default_attrs() in place of
      evlist__add_default_attrs(). arch_evlist__add_default_attrs() is
      modified to invoke the same __evlist__add_default_attrs() for the
      generic default attrs. No functional change.
      
      Add default_null_attrs[] to indicate the arch specific attrs.
      No functional change for the arch specific default attrs either.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220721065706.2886112-4-zhengjun.xing@linux.intel.com
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a9c1ecda
    • K
      perf evsel: Add arch_evsel__hw_name() · ff4207f7
      Kan Liang authored
      The commit 55bcf6ef ("perf: Extend PERF_TYPE_HARDWARE and
      PERF_TYPE_HW_CACHE") extends the two types to become PMU aware types for
      a hybrid system. However, current evsel__hw_name doesn't take the PMU
      type into account. It mistakenly returns the "unknown-hardware" for the
      hardware event with a specific PMU type.
      
      Add an arch specific arch_evsel__hw_name() to specially handle the PMU
      aware hardware event.
      
      Currently, the extended PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE are
      only supported on x86, so this patch only implements a specific
      arch_evsel__hw_name() for x86.
      
      Nothing is changed for the other archs.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220721065706.2886112-3-zhengjun.xing@linux.intel.com
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ff4207f7
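      A PMU-aware hardware event name, as described in the commit above, can
      be derived roughly as follows. This is a sketch under assumptions, not
      the perf source: it assumes the PMU type sits in the upper 32 bits of
      the event config and the generic event id in the lower 32 bits (my
      reading of PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK in the perf_event
      uapi header), and hw_names[] and hw_name() are hypothetical, shortened
      stand-ins.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Assumed layout of the extended hardware event config. */
      #define PMU_TYPE_SHIFT	32
      #define HW_EVENT_MASK	0xffffffffULL

      /* Hypothetical, shortened table indexed by generic hardware event id. */
      static const char *const hw_names[] = {
      	"cycles", "instructions", "cache-references",
      	"cache-misses", "branches", "branch-misses",
      };

      /*
       * Format the event name into 'buf'. Without a PMU type the plain
       * generic name is used; with one, the name becomes "<pmu>/<event>/".
       * Returns what snprintf() returns.
       */
      static int hw_name(uint64_t config, const char *pmu_name,
      		   char *buf, size_t size)
      {
      	uint64_t event = config & HW_EVENT_MASK;
      	uint64_t pmu_type = config >> PMU_TYPE_SHIFT;
      	const char *name = "unknown-hardware";

      	assert(buf != NULL && size > 0);
      	if (event < sizeof(hw_names) / sizeof(hw_names[0]))
      		name = hw_names[event];

      	/* Non-hybrid case: no PMU type encoded in the config. */
      	if (!pmu_type)
      		return snprintf(buf, size, "%s", name);

      	return snprintf(buf, size, "%s/%s/", pmu_name ? pmu_name : "cpu", name);
      }
      ```

      Masking off the PMU type before the table lookup is the point: looking
      up the full config value would run past the table and yield
      "unknown-hardware", which is the symptom the commit describes.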
  10. 27 Jul, 2022 1 commit
    • I
      perf bpf: Remove undefined behavior from bpf_perf_object__next() · 9a241805
      Ian Rogers authored
      bpf_perf_object__next() folded the last element in the list test with the
      empty list test. However, this meant that offsets were computed against
      null and that a struct list_head was compared against a 'struct
      bpf_perf_object'.
      
      Working around this with clang's undefined behavior sanitizer required
      -fno-sanitize=null and -fno-sanitize=object-size.
      
      Remove the undefined behavior by using the regular Linux list APIs and
      handling the starting case separately from the end testing case.
      
      Looking at uses like bpf_perf_object__for_each(), as the constant NULL
      or non-NULL argument can be constant propagated, the code is no less
      efficient.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Christy Lee <christylee@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Miaoqian Lin <linmq006@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: bpf@vger.kernel.org
      Cc: llvm@lists.linux.dev
      Link: https://lore.kernel.org/r/20220726220921.2567761-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9a241805
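      The shape of the fix described in the commit above, handling the
      starting case separately from the end test, can be sketched with a
      minimal circular list. The list helpers below are simplified
      re-implementations for illustration, and struct obj and obj_next() are
      hypothetical names; the real code uses the Linux list APIs on
      struct bpf_perf_object.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Minimal circular doubly linked list, in the style of the Linux one. */
      struct list_head {
      	struct list_head *next, *prev;
      };

      #define container_of(ptr, type, member) \
      	((type *)((char *)(ptr) - offsetof(type, member)))

      static void list_init(struct list_head *head)
      {
      	head->next = head;
      	head->prev = head;
      }

      static void list_add_tail(struct list_head *node, struct list_head *head)
      {
      	node->prev = head->prev;
      	node->next = head;
      	head->prev->next = node;
      	head->prev = node;
      }

      /* Hypothetical list element. */
      struct obj {
      	int id;
      	struct list_head list;
      };

      /*
       * Return the element after 'prev', or the first element when 'prev'
       * is NULL. No offset is ever computed from a null pointer, and the
       * list head is never treated as if it were embedded in a struct obj.
       */
      static struct obj *obj_next(struct list_head *head, struct obj *prev)
      {
      	assert(head != NULL);

      	/* Starting case: empty list, or hand out the first entry. */
      	if (!prev) {
      		if (head->next == head)
      			return NULL;
      		return container_of(head->next, struct obj, list);
      	}

      	/* End case: 'prev' was the last entry. */
      	if (prev->list.next == head)
      		return NULL;

      	return container_of(prev->list.next, struct obj, list);
      }
      ```

      A loop such as "for (o = obj_next(&head, NULL); o; o = obj_next(&head, o))"
      then visits every element once and stops cleanly at the end.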