1. 13 1月, 2022 15 次提交
    • I
      perf cpumap: Simplify equal function name · 3ac23d19
      Ian Rogers 提交于
      Rename cpu_map__compare_aggr_cpu_id() to aggr_cpu_id__equal(), the
      cpu_map part of the name is misleading. Equal better describes the
      function than compare.
      
      Switch to const pointer rather than value as struct given the number of
      variables in aggr_cpu_id().
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-14-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3ac23d19
    • I
      perf cpumap: Remove unused cpu_map__socket() · 63e0fa87
      Ian Rogers 提交于
      Unused function so remove.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-13-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      63e0fa87
    • I
      perf cpumap: Add comments to aggr_cpu_id() · 49679da3
      Ian Rogers 提交于
      This code is already tested in topology.c.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-12-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      49679da3
    • I
      perf cpumap: Remove map+index get_node() · 86d94048
      Ian Rogers 提交于
      Migrate final users to appropriate cpu variant.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-11-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      86d94048
    • I
      perf cpumap: Remove map+index get_core() · 3f6233dc
      Ian Rogers 提交于
      Migrate final users to appropriate cpu variant.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-10-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3f6233dc
    • I
      perf cpumap: Remove map+index get_die() · 1cdae3d6
      Ian Rogers 提交于
      Migrate final users to appropriate cpu variant.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-9-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1cdae3d6
    • I
      perf cpumap: Remove map+index get_socket() · 448a69d9
      Ian Rogers 提交于
      Migrate final users to appropriate cpu variant.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-8-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      448a69d9
    • I
      perf cpumap: Switch cpu_map__build_map() to cpu function · eff54c24
      Ian Rogers 提交于
      Avoid error prone cpu_map + idx variant. Remove now unused functions.
      
      Committer notes:
      
      Remove by now unused perf_env__get_cpu().
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-7-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      eff54c24
    • I
      perf stat: Switch to cpu version of cpu_map__get() · 88031a0d
      Ian Rogers 提交于
      Avoid possible bugs where the wrong index is passed with the cpu_map.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-6-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      88031a0d
    • I
      perf stat: Switch aggregation to use for_each loop · a023283f
      Ian Rogers 提交于
      Tidy up the use of cpu and index to hopefully make the code less error
      prone. Avoid unused warnings with (void) which will be removed in a
      later patch.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-5-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a023283f
    • I
      perf stat: Correct aggregation CPU map · 01843ca0
      Ian Rogers 提交于
      Switch the perf_cpu_map in aggr_update_shadow from
      the evlist to the counter's cpu map, so the index is appropriate. This
      addresses a problem where uncore counts, with a cpumap like:
      $ cat /sys/devices/uncore_imc_0/cpumask
      0,18
      Don't aggregate counts in CPUs based on the index of those values in the
      cpumap (0 and 1) but on the actual CPU (0 and 18). Thereby correcting
      metric calculations in per-socket mode for counters without a full
      cpumask.
      
      On a SkylakeX with a tweaked DRAM_BW_Use metric, to remove unnecessary
      scaling, this gives:
      
      Before:
      $ /perf stat --per-socket -M DRAM_BW_Use -I 1000
           1.001102293 S0        1              27.01 MiB  uncore_imc/cas_count_write/ #   103.00 DRAM_BW_Use
           1.001102293 S0        1              30.22 MiB  uncore_imc/cas_count_read/
           1.001102293 S0        1      1,001,102,293 ns   duration_time
           1.001102293 S1        1              20.10 MiB  uncore_imc/cas_count_write/ #     0.00 DRAM_BW_Use
           1.001102293 S1        1              32.74 MiB  uncore_imc/cas_count_read/
           1.001102293 S1        0      <not counted> ns   duration_time
           2.003517973 S0        1              83.04 MiB  uncore_imc/cas_count_write/ #   920.00 DRAM_BW_Use
           2.003517973 S0        1             145.95 MiB  uncore_imc/cas_count_read/
           2.003517973 S0        1      1,002,415,680 ns   duration_time
           2.003517973 S1        1             302.45 MiB  uncore_imc/cas_count_write/ #     0.00 DRAM_BW_Use
           2.003517973 S1        1             290.99 MiB  uncore_imc/cas_count_read/
           2.003517973 S1        0      <not counted> ns   duration_time
      
      After:
      $ perf stat --per-socket -M DRAM_BW_Use -I 1000
           1.001080840 S0        1              24.96 MiB  uncore_imc/cas_count_write/ #    54.00 DRAM_BW_Use
           1.001080840 S0        1              33.64 MiB  uncore_imc/cas_count_read/
           1.001080840 S0        1      1,001,080,840 ns   duration_time
           1.001080840 S1        1              42.43 MiB  uncore_imc/cas_count_write/ #    84.00 DRAM_BW_Use
           1.001080840 S1        1              47.05 MiB  uncore_imc/cas_count_read/
           1.001080840 S1        0      <not counted> ns   duration_time
      Signed-off-by: NIan Rogers <irogers@google.com>
      Tested-by: NJohn Garry <john.garry@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-4-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      01843ca0
    • I
      perf stat: Add aggr creators that are passed a cpu · ca2c9b76
      Ian Rogers 提交于
      The cpu_map and index can get confused. Add variants of the cpu_map__get
      routines that are passed a cpu. Make the existing cpu_map__get routines
      use the new functions with a view to remove them when no longer used.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-3-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ca2c9b76
    • I
      libperf: Add comments to 'struct perf_cpu_map' · 818ab78c
      Ian Rogers 提交于
      A particular observed problem is confusing the index with the CPU value,
      documentation should hopefully reduce this type of problem.
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Reviewed-by: NJohn Garry <john.garry@huawei.com>
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Vineet Singh <vineet.singh@intel.com>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220105061351.120843-2-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      818ab78c
    • I
      perf evsel: Improve error message for uncore events · dcffc5eb
      Ian Rogers 提交于
      When a group has multiple events and the leader fails it can yield
      errors like:
      
        $ perf stat -e '{uncore_imc/cas_count_read/},instructions' /bin/true
        Error:
        The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_imc/cas_count_read/).
        /bin/dmesg | grep -i perf may provide additional information.
      
      However, when not the group leader <not supported> is given:
      
        $ perf stat -e '{instructions,uncore_imc/cas_count_read/}' /bin/true
        ...
                 1,619,057      instructions
           <not supported> MiB  uncore_imc/cas_count_read/
      
      This is necessary because get_group_fd will fail if the leader fails and
      is the direct result of the check on line 750 of builtin-stat.c in
      stat_handle_error that returns COUNTER_SKIP for the latter case.
      
      This patch improves the error message to:
      
        $ perf stat -e '{uncore_imc/cas_count_read/},instructions' /bin/true
        Error:
        Invalid event (uncore_imc/cas_count_read/) in per-thread mode, enable system wide with '-a'.
      
      v2. Changed the test to use !target__has_cpu as suggested by Namhyung Kim.
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20211223183948.3423989-2-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      dcffc5eb
    • A
      perf script: Fix hex dump character output · 62942e9f
      Adrian Hunter 提交于
      Using grep -C with perf script -D can give erroneous results as grep loses
      lines due to non-printable characters, for example, below the 0020, 0060
      and 0070 lines are missing:
      
       $ perf script -D | grep -C10 AUX | head
       .  0010:  08 00 00 00 00 00 00 00 1f 00 00 00 00 00 00 00  ................
       .  0030:  01 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00  ................
       .  0040:  00 08 00 00 00 00 00 00 02 00 00 00 00 00 00 00  ................
       .  0050:  00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
       .  0080:  02 00 00 00 00 00 00 00 1b 00 00 00 00 00 00 00  ................
       .  0090:  00 00 00 00 00 00 00 00                          ........
      
       0 0 0x450 [0x98]: PERF_RECORD_AUXTRACE_INFO type: 1
         PMU Type            8
         Time Shift          31
      
      perf's isprint() is a custom implementation from the kernel, but the
      kernel's _ctype appears to include characters from Latin-1 Supplement which
      is not compatible with, for example, UTF-8. Fix by checking also isascii().
      
      After:
      
       $ tools/perf/perf script -D | grep -C10 AUX | head
       .  0010:  08 00 00 00 00 00 00 00 1f 00 00 00 00 00 00 00  ................
       .  0020:  03 84 32 2f 00 00 00 00 63 7c 4f d2 fa ff ff ff  ..2/....c|O.....
       .  0030:  01 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00  ................
       .  0040:  00 08 00 00 00 00 00 00 02 00 00 00 00 00 00 00  ................
       .  0050:  00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
       .  0060:  00 02 00 00 00 00 00 00 00 c0 03 00 00 00 00 00  ................
       .  0070:  e2 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00  ................
       .  0080:  02 00 00 00 00 00 00 00 1b 00 00 00 00 00 00 00  ................
       .  0090:  00 00 00 00 00 00 00 00                          ........
      
      Fixes: 3052ba56 ("tools perf: Move from sane_ctype.h obtained from git to the Linux's original")
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20220112085057.277205-1-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      62942e9f
  2. 12 1月, 2022 1 次提交
  3. 11 1月, 2022 3 次提交
  4. 08 1月, 2022 2 次提交
    • A
      Revert "libtraceevent: Increase libtraceevent logging when verbose" · dc9f2dd5
      Arnaldo Carvalho de Melo 提交于
      This reverts commit 08efcb4a.
      
      This breaks the build as it will prefer using libbpf-devel header files,
      even when not using LIBBPF_DYNAMIC=1, breaking the build.
      
      This was detected on OpenSuSE Tumbleweed with libtraceevent-devel 1.3.0,
      as described by Jiri Slaby:
      
      =======================================================================
      It breaks build with LIBTRACEEVENT_DYNAMIC and version 1.3.0:
      > util/debug.c: In function ‘perf_debug_option’:
      > util/debug.c:243:17: error: implicit declaration of function
      ‘tep_set_loglevel’ [-Werror=implicit-function-declaration]
      >   243 |                 tep_set_loglevel(TEP_LOG_INFO);
      >       |                 ^~~~~~~~~~~~~~~~
      > util/debug.c:243:34: error: ‘TEP_LOG_INFO’ undeclared (first use in this
      function); did you mean ‘TEP_PRINT_INFO’?
      >   243 |                 tep_set_loglevel(TEP_LOG_INFO);
      >       |                                  ^~~~~~~~~~~~
      >       |                                  TEP_PRINT_INFO
      > util/debug.c:243:34: note: each undeclared identifier is reported only once
      for each function it appears in
      > util/debug.c:245:34: error: ‘TEP_LOG_DEBUG’ undeclared (first use in this
      function)
      >   245 |                 tep_set_loglevel(TEP_LOG_DEBUG);
      >       |                                  ^~~~~~~~~~~~~
      > util/debug.c:247:34: error: ‘TEP_LOG_ALL’ undeclared (first use in this
      function)
      >   247 |                 tep_set_loglevel(TEP_LOG_ALL);
      >       |                                  ^~~~~~~~~~~
      
      It is because the gcc's command line looks like:
      gcc
      ...
      -I/home/abuild/rpmbuild/BUILD/tools/lib/
      ...
      -DLIBTRACEEVENT_VERSION=65790
      ...
      =======================================================================
      
      The proper way to fix this is more involved and so not suitable for this
      late in the 5.16-rc stage.
      Reported-by: NJiri Slaby <jirislaby@kernel.org>
      Link: https://lore.kernel.org/lkml/bc2b0786-8965-1bcd-2316-9d9bb37b9c31@kernel.org
      Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: https://lore.kernel.org/lkml/YddGjjmlMZzxUZbN@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      dc9f2dd5
    • J
      perf trace: Avoid early exit due to running SIGCHLD handler before it makes sense to · f06a82f9
      Jiri Olsa 提交于
      When running 'perf trace' with an BPF object like:
      
        # perf trace -e openat,tools/perf/examples/bpf/hello.c
      
      the event parsing eventually calls llvm__get_kbuild_opts() that runs a
      script and that ends up with SIGCHLD delivered to the 'perf trace'
      handler, which assumes the workload process is done and quits 'perf
      trace'.
      
      Move the SIGCHLD handler setup directly to trace__run(), where the event
      is parsed and the object is already compiled.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Christy Lee <christyc.y.lee@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20220106222030.227499-1-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f06a82f9
  5. 07 1月, 2022 3 次提交
  6. 06 1月, 2022 1 次提交
  7. 02 1月, 2022 2 次提交
    • Y
      perf top: Fix TUI exit screen refresh race condition · 64f18d2d
      yaowenbin 提交于
      When the following command is executed several times, a coredump file is
      generated.
      
      	$ timeout -k 9 5 perf top -e task-clock
      	*******
      	*******
      	*******
      	0.01%  [kernel]                  [k] __do_softirq
      	0.01%  libpthread-2.28.so        [.] __pthread_mutex_lock
      	0.01%  [kernel]                  [k] __ll_sc_atomic64_sub_return
      	double free or corruption (!prev) perf top --sort comm,dso
      	timeout: the monitored command dumped core
      
      When we terminate "perf top" using sending signal method,
      SLsmg_reset_smg() called. SLsmg_reset_smg() resets the SLsmg screen
      management routines by freeing all memory allocated while it was active.
      
      However SLsmg_reinit_smg() maybe be called by another thread.
      
      SLsmg_reinit_smg() will free the same memory accessed by
      SLsmg_reset_smg(), thus it results in a double free.
      
      SLsmg_reinit_smg() is called already protected by ui__lock, so we fix
      the problem by adding pthread_mutex_trylock of ui__lock when calling
      SLsmg_reset_smg().
      Signed-off-by: NWenyu Liu <liuwenyu7@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: wuxu.wu@huawei.com
      Link: http://lore.kernel.org/lkml/a91e3943-7ddc-f5c0-a7f5-360f073c20e6@huawei.comSigned-off-by: NHewenliang <hewenliang4@huawei.com>
      Signed-off-by: Nyaowenbin <yaowenbin1@huawei.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      64f18d2d
    • J
      perf pmu: Fix alias events list · e0257a01
      John Garry 提交于
      Commit 0e0ae874 ("perf list: Display hybrid PMU events with cpu
      type") changes the event list for uncore PMUs or arm64 heterogeneous CPU
      systems, such that duplicate aliases are incorrectly listed per PMU
      (which they should not be), like:
      
        # perf list
        ...
        unc_cbo_cache_lookup.any_es
        [Unit: uncore_cbox L3 Lookup any request that access cache and found
        line in E or S-state]
        unc_cbo_cache_lookup.any_es
        [Unit: uncore_cbox L3 Lookup any request that access cache and found
        line in E or S-state]
        unc_cbo_cache_lookup.any_i
        [Unit: uncore_cbox L3 Lookup any request that access cache and found
        line in I-state]
        unc_cbo_cache_lookup.any_i
        [Unit: uncore_cbox L3 Lookup any request that access cache and found
        line in I-state]
        ...
      
      Notice how the events are listed twice.
      
      The named commit changed how we remove duplicate events, in that events
      for different PMUs are not treated as duplicates. I suppose this is to
      handle how "Each hybrid pmu event has been assigned with a pmu name".
      
      Fix PMU alias listing by restoring behaviour to remove duplicates for
      non-hybrid PMUs.
      
      Fixes: 0e0ae874 ("perf list: Display hybrid PMU events with cpu type")
      Signed-off-by: NJohn Garry <john.garry@huawei.com>
      Tested-by: NZhengjun Xing <zhengjun.xing@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/1640103090-140490-1-git-send-email-john.garry@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e0257a01
  8. 01 1月, 2022 2 次提交
    • J
      selftests: net: udpgro_fwd.sh: explicitly checking the available ping feature · 5e75d0b2
      Jianguo Wu 提交于
      As Paolo pointed out, the result of ping IPv6 address depends on
      the running distro. So explicitly checking the available ping feature,
      as e.g. do the bareudp.sh self-tests.
      
      Fixes: 8b3170e0 ("selftests: net: using ping6 for IPv6 in udpgro_fwd.sh")
      Signed-off-by: NJianguo Wu <wujianguo@chinatelecom.cn>
      Link: https://lore.kernel.org/r/825ee22b-4245-dbf7-d2f7-a230770d6e21@163.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      5e75d0b2
    • M
      userfaultfd/selftests: fix hugetlb area allocations · f5c73297
      Mike Kravetz 提交于
      Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh
      or any environment where there are 'just enough' hugetlb pages will
      always fail with:
      
        testing events (fork, remap, remove):
      		ERROR: UFFDIO_COPY error: -12 (errno=12, line=616)
      
      The ENOMEM error code implies there are not enough hugetlb pages.
      However, there are free hugetlb pages but they are all reserved.  There
      is a basic problem with the way the test allocates hugetlb pages which
      has existed since the test was originally written.
      
      Due to the way 'cleanup' was done between different phases of the test,
      this issue was masked until recently.  The issue was uncovered by commit
      8ba6e864 ("userfaultfd/selftests: reinitialize test context in each
      test").
      
      For the hugetlb test, src and dst areas are allocated as PRIVATE
      mappings of a hugetlb file.  This means that at mmap time, pages are
      reserved for the src and dst areas.  At the start of event testing (and
      other tests) the src area is populated which results in allocation of
      huge pages to fill the area and consumption of reserves associated with
      the area.  Then, a child is forked to fault in the dst area.  Note that
      the dst area was allocated in the parent and hence the parent owns the
      reserves associated with the mapping.  The child has normal access to
      the dst area, but can not use the reserves created/owned by the parent.
      Thus, if there are no other huge pages available allocation of a page
      for the dst by the child will fail.
      
      Fix by not creating reserves for the dst area.  In this way the child
      can use free (non-reserved) pages.
      
      Also, MAP_PRIVATE of a file only makes sense if you are interested in
      the contents of the file before making a COW copy.  The test does not do
      this.  So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous
      hugetlb mapping.  There is no need to create a hugetlb file in the
      non-shared case.
      
      Link: https://lkml.kernel.org/r/20211217172919.7861-1-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f5c73297
  9. 30 12月, 2021 3 次提交
  10. 29 12月, 2021 4 次提交
  11. 26 12月, 2021 1 次提交
  12. 24 12月, 2021 1 次提交
  13. 22 12月, 2021 2 次提交
    • K
      tools headers UAPI: Add new macros for mem_hops field to perf_event.h · 7fbddf40
      Kajol Jain 提交于
      Add new macros for mem_hops field which can be used to represent
      remote-node, socket and board level details.
      
      Currently the code had macro for HOPS_0 which, corresponds to data
      coming from another core but same node.  Add new macros for HOPS_1 to
      HOPS_3 to represent remote-node, socket and board level data.
      
      Also add corresponding strings in the mem_hops array to represent
      mem_hop field data in perf_mem__lvl_scnprintf function
      
      Incase mem_hops field is used, PERF_MEM_LVLNUM field also need to be set
      inorder to represent the data source. Hence printing data source via
      PERF_MEM_LVL field can be skip in that scenario.
      
      For ex: Encodings for mem_hops fields with L2 cache:
      
        L2                      - local L2
        L2 | REMOTE | HOPS_0    - remote core, same node L2
        L2 | REMOTE | HOPS_1    - remote node, same socket L2
        L2 | REMOTE | HOPS_2    - remote socket, same board L2
        L2 | REMOTE | HOPS_3    - remote board L2
      Signed-off-by: NKajol Jain <kjain@linux.ibm.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20211206091749.87585-3-kjain@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7fbddf40
    • A
      perf arm64: Inject missing frames when using 'perf record --call-graph=fp' · b9f6fbb3
      Alexandre Truong 提交于
      When unwinding using frame pointers on ARM64, the return address of the
      current function may not have been pushed into the stack when a function
      was interrupted, which makes perf show an incorrect call graph to the
      user.
      
      Consider the following example program:
      
        void leaf() {
            /* long computation */
        }
      
        void parent() {
            // (1)
            leaf();
            // (2)
        }
      
        ... could be compiled into (using gcc -fno-inline -fno-omit-frame-pointer):
      
        leaf:
            /* long computation */
            nop
            ret
        parent:
            // (1)
            stp     x29, x30, [sp, -16]!
            mov     x29, sp
            bl      parent
            nop
            ldp     x29, x30, [sp], 16
            // (2)
            ret
      
      If the program is interrupted at (1), (2), or any point in "leaf:", the
      call graph will skip the callers of the current function. We can unwind
      using the dwarf info and check if the return addr is the same as the LR
      register, and inject the missing frame into the call graph.
      
      Before this patch, the above example shows the following call-graph when
      recording using "--call-graph fp" mode in ARM64:
      
        # Children      Self  Command   Shared Object     Symbol
        # ........  ........  ........  ................  ......................
        #
            99.86%    99.86%  program3  program3          [.] leaf
        	    |
        	    ---_start
        	       __libc_start_main
        	       main
        	       leaf
      
      As can be seen, the "parent" function is missing. This is specially
      problematic in "leaf" because for leaf functions the compiler may always
      omit pushing the return addr into the stack. After this patch, it shows
      the correct graph:
      
        # Children      Self  Command   Shared Object     Symbol
        # ........  ........  ........  ................  ......................
        #
            99.86%    99.86%  program3  program3          [.] leaf
        	    |
        	    ---_start
        	       __libc_start_main
        	       main
        	       parent
        	       leaf
      Reviewed-by: NJames Clark <james.clark@arm.com>
      Signed-off-by: NAlexandre Truong <alexandre.truong@arm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20211217154521.80603-7-german.gomez@arm.comSigned-off-by: NGerman Gomez <german.gomez@arm.com>
      [ Rename machine__normalize_is() to machine__normalized_is(), as suggested by James Clark ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b9f6fbb3