1. 14 Jan 2020, 1 commit
  2. 06 Jan 2020, 3 commits
  3. 21 Dec 2019, 2 commits
  4. 11 Dec 2019, 3 commits
    • perf header: Fix false warning when there are no duplicate cache entries · 28707826
      Committed by Michael Petlan
      Before this patch, perf expected that there might be at most NPROC*4
      unique cache entries; however, it also expected that some of them would
      be shared and/or of the same size, so the final number of entries would
      be reduced below NPROC*4. If the number of entries had not been reduced
      (that is, it was still NPROC*4), the warning was printed.
      
      However, some systems might have unusual cache topology, such as the
      following two-processor KVM guest:
      
      	cpu  level  shared_cpu_list  size
      	  0     1         0           32K
      	  0     1         0           64K
      	  0     2         0           512K
      	  0     3         0           8192K
      	  1     1         1           32K
      	  1     1         1           64K
      	  1     2         1           512K
      	  1     3         1           8192K
      
      This KVM guest has 8 (NPROC*4) unique cache entries, which used to make
      perf print the message, even though there actually aren't "way too many
      cpu caches".
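      The corrected idea can be sketched in C: warn only when a unique entry
      actually has to be dropped, not when the count merely reaches NPROC*4.
      This is a minimal sketch under stated assumptions: `struct cache_entry`,
      `same_cache()` and `collect_unique_caches()` are hypothetical names with
      a simplified notion of "same cache" (level, size, shared CPU list), not
      perf's actual data structures or code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified cache description (not perf's real struct). */
struct cache_entry {
    unsigned int level;
    unsigned int size_kb;
    char shared_cpu_list[16];
};

/* Two entries describe the same cache if level, size and sharing match. */
static bool same_cache(const struct cache_entry *a, const struct cache_entry *b)
{
    return a->level == b->level && a->size_kb == b->size_kb &&
           strcmp(a->shared_cpu_list, b->shared_cpu_list) == 0;
}

/*
 * Copy the unique entries of in[] into out[], which holds at most max
 * entries (e.g. nr_cpus * 4). Returns the number of entries stored.
 * *truncated is set only when a unique entry had to be dropped because
 * out[] was already full; merely reaching max is not an error.
 */
static size_t collect_unique_caches(const struct cache_entry *in, size_t n,
                                    struct cache_entry *out, size_t max,
                                    bool *truncated)
{
    size_t cnt = 0;

    *truncated = false;
    for (size_t i = 0; i < n; i++) {
        bool dup = false;

        for (size_t j = 0; j < cnt && !dup; j++)
            dup = same_cache(&in[i], &out[j]);
        if (dup)
            continue;
        if (cnt == max) {
            /* A genuinely extra entry: this is what deserves a warning. */
            *truncated = true;
            break;
        }
        out[cnt++] = in[i];
    }
    return cnt;
}
```

      With the two-CPU KVM table above, max is 2 * 4 = 8: all eight entries
      are stored and *truncated stays false, whereas a check of the form
      "cnt == max" would have fired.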
      
      v2: Removed unused argument.
      
      v3: Unified the way we obtain the number of CPUs.
      
      v4: Removed the '& UINT_MAX' construct, which is redundant.
      Signed-off-by: Michael Petlan <mpetlan@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      LPU-Reference: 20191208162056.20772-1-mpetlan@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metricgroup: Fix printing event names of metric group with multiple events · eb573e74
      Committed by Kajol Jain
      Commit f01642e4 ("perf metricgroup: Support multiple events for
      metricgroup") introduced support for multiple events in a metric group.
      However, with the current upstream code, metric event names are not
      printed properly.
      
      On the POWER9 platform:
      
      command:# ./perf stat --metric-only -M translation -C 0 -I 1000 sleep 2
           1.000208486
           2.000368863
           2.001400558
      
      Similarly, on the Skylake platform:
      
      command:./perf stat --metric-only -M Power -I 1000
           1.000579994
           2.002189493
      
      With the current upstream version, the issue is in the event name
      comparison logic in find_evsel_group(). The current logic compares the
      events belonging to a metric group with the events in perf_evlist. Since
      a break statement is missing in the loop that does this comparison, the
      loop keeps executing even after finding a pattern match, and ends up
      discarding the matches.
      
      When only a single metric event belongs to the metric group, it works
      fine, because with a single event the loop reaches the end of
      perf_evlist once it has compared all events.
      
      Example of a single metric event on the POWER9 platform:
      
      command:# ./perf stat --metric-only  -M branches_per_inst -I 1000 sleep 1
           1.000094653                  0.2
           1.001337059                  0.0
      
      This patch fixes the issue by adding a condition in find_evsel_group()
      that breaks out of the loop once all events belonging to the metric
      event have been matched.
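      The matching step can be modelled in a few lines of C. This is a
      hypothetical, simplified model (event names as plain strings, a made-up
      find_event_group() helper), not the real find_evsel_group(); it only
      illustrates the point of the fix: stop as soon as all idnum events of
      the group have been matched consecutively.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified model of the group lookup: evlist[] holds the
 * event names in perf_evlist order and ids[] the idnum events a metric
 * needs, which must appear consecutively. Returns the index of the first
 * event of the group, or -1 if no complete match exists.
 */
static int find_event_group(const char **evlist, size_t nr_events,
                            const char **ids, size_t idnum)
{
    if (idnum == 0)
        return -1;

    for (size_t start = 0; start + idnum <= nr_events; start++) {
        size_t i = 0;

        /* Count how many of the metric's events match from this position. */
        while (i < idnum && strcmp(evlist[start + i], ids[i]) == 0)
            i++;

        /*
         * All idnum events matched: report the match immediately. In the
         * buggy loop described above, scanning carried on past this point
         * and a later mismatch threw the completed match away.
         */
        if (i == idnum)
            return (int)start;
    }
    return -1;
}
```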
      
      With this patch, on the POWER9 platform:
      
      command:# ./perf stat --metric-only -M translation -C 0 -I 1000 sleep 2
      result:#
                  time  derat_4k_miss_rate_percent  derat_4k_miss_ratio derat_miss_ratio derat_64k_miss_rate_percent  derat_64k_miss_ratio dslb_miss_rate_percent islb_miss_rate_percent
           1.000135672                         0.0                  0.3              1.0                         0.0                   0.2                    0.0                    0.0
           2.000380617                         0.0                  0.0              0.0                         0.0                   0.0                    0.0                    0.0
      
      Similarly, on the Skylake platform:
      
      command:# ./perf stat --metric-only -M Power -I 1000
      result:#
                  time    Turbo_Utilization    C3_Core_Residency  C6_Core_Residency  C7_Core_Residency    C2_Pkg_Residency  C3_Pkg_Residency     C6_Pkg_Residency   C7_Pkg_Residency
           1.000563580                  0.3                  0.0                2.6               44.2                21.9               0.0                  0.0               0.0
           2.002235027                  0.4                  0.0                2.7               43.0                20.7               0.0                  0.0               0.0
      
      Committer testing:
      
        Before:
      
        [root@seventh ~]# perf stat --metric-only -M Power -I 1000
        #           time
             1.000383223
             2.001168182
             3.001968545
             4.002741200
             5.003442022
        ^C     5.777687244
      
        [root@seventh ~]#
      
        After the patch:
      
        [root@seventh ~]# perf stat --metric-only -M Power -I 1000
        #           time    Turbo_Utilization    C3_Core_Residency    C6_Core_Residency    C7_Core_Residency     C2_Pkg_Residency     C3_Pkg_Residency     C6_Pkg_Residency     C7_Pkg_Residency
             1.000406577                  0.4                  0.1                  1.4                 97.0                  0.0                  0.0                  0.0                  0.0
             2.001481572                  0.3                  0.0                  0.6                 97.9                  0.0                  0.0                  0.0                  0.0
             3.002332585                  0.2                  0.0                  1.0                 97.5                  0.0                  0.0                  0.0                  0.0
             4.003196624                  0.2                  0.0                  0.3                 98.6                  0.0                  0.0                  0.0                  0.0
             5.004063851                  0.3                  0.0                  0.7                 97.7                  0.0                  0.0                  0.0                  0.0
        ^C     5.471260276                  0.2                  0.0                  0.5                 49.3                  0.0                  0.0                  0.0                  0.0
      
        [root@seventh ~]#
        [root@seventh ~]# dmesg | grep -i skylake
        [    0.187807] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
        [root@seventh ~]#
      
      Fixes: f01642e4 ("perf metricgroup: Support multiple events for metricgroup")
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20191120084059.24458-1-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arch: Make the default get_cpuid() return compatible error · 05267c7e
      Committed by Arnaldo Carvalho de Melo
      Some of the functions calling get_cpuid() propagate the error it
      returns back to their callers, and all of them use (positive) errno
      values. Make the weak default get_cpuid() function return ENOSYS, to be
      consistent and to allow checking whether this is an arch that does not
      provide this function or whether a provided one is having trouble
      getting the cpuid. Callers can then decide whether to show the warning
      to the user or just emit a debug message.
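      A minimal sketch of the pattern, assuming GCC/Clang's weak-symbol
      attribute and a get_cpuid(char *buffer, size_t sz) signature modelled on
      perf's (treat the exact signature as an assumption). The
      cpuid_should_warn() policy helper is hypothetical and only illustrates
      how a caller can tell "not implemented" (ENOSYS) apart from a real
      failure.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Weak default: an architecture that implements get_cpuid() provides a
 * strong definition, which replaces this one at link time.
 */
__attribute__((weak)) int get_cpuid(char *buffer, size_t sz)
{
    (void)buffer;
    (void)sz;
    return ENOSYS; /* positive errno: not implemented on this arch */
}

/*
 * Hypothetical caller policy: returns 1 if the user should see a warning,
 * 0 if at most a debug message is warranted.
 */
static int cpuid_should_warn(int ret)
{
    if (ret == 0)
        return 0; /* success, nothing to report */
    if (ret == ENOSYS)
        return 0; /* arch lacks get_cpuid(): debug message only */
    return 1;     /* arch provides it, but it failed: warn */
}
```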
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: John Garry <john.garry@huawei.com> # arm64
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/n/tip-lxwjr0cd2eggzx04a780ffrv@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 04 Dec 2019, 2 commits
  6. 02 Dec 2019, 1 commit
    • perf bench: Update the copies of x86's mem{cpy,set}_64.S · bd5c6b81
      Committed by Arnaldo Carvalho de Melo
      And update linux/linkage.h, which requires in turn that we make these
      files switch from ENTRY()/ENDPROC() to SYM_FUNC_START()/SYM_FUNC_END():
      
        tools/perf/arch/arm64/tests/regs_load.S
        tools/perf/arch/arm/tests/regs_load.S
        tools/perf/arch/powerpc/tests/regs_load.S
        tools/perf/arch/x86/tests/regs_load.S
      
      We also need to switch SYM_FUNC_START_LOCAL() to SYM_FUNC_START() for
      the functions used directly by 'perf bench', and update
      tools/perf/check_headers.sh to ignore those changes when checking if the
      kernel original files drifted from the copies we carry.
      
      This is to get the changes from:
      
        6dcc5627 ("x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_*")
        ef1e0315 ("x86/asm: Make some functions local")
        e9b9d020 ("x86/asm: Annotate aliases")
      
      And address these tools/perf build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
        diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
        Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S'
        diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-tay3l8x8k11p7y3qcpqh9qh5@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 30 Nov 2019, 1 commit
  8. 29 Nov 2019, 7 commits
  9. 28 Nov 2019, 3 commits
    • perf affinity: Add infrastructure to save/restore affinity · 267ed5d8
      Committed by Andi Kleen
      The kernel perf subsystem has to send an IPI to the target CPU for many
      operations. On systems with many CPUs, and when managing many events,
      the overhead can be dominated by lots of IPIs.
      
      An alternative is to set up CPU affinity in the perf tool, then set up
      all the events for that CPU, and then move on to the next CPU.
      
      Add some affinity management infrastructure to enable such a model.
      It is used in follow-on patches.
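      The save, move, restore model described above can be sketched on Linux
      with glibc's sched_getaffinity()/sched_setaffinity() applied to the
      calling thread. This is a minimal sketch: struct saved_affinity and the
      affinity_*() helpers are hypothetical names, not the infrastructure this
      patch adds.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Hypothetical state: the original mask plus whether we ever moved. */
struct saved_affinity {
    cpu_set_t orig; /* mask in effect before we started migrating */
    bool changed;   /* true once the thread has been pinned elsewhere */
};

/* Remember the calling thread's current CPU mask. */
static int affinity_save(struct saved_affinity *a)
{
    a->changed = false;
    return sched_getaffinity(0, sizeof(a->orig), &a->orig);
}

/* Pin the calling thread to a single CPU before setting up its events. */
static int affinity_move_to(struct saved_affinity *a, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;
    a->changed = true;
    return 0;
}

/* Put the original mask back, only if we actually changed it. */
static int affinity_restore(struct saved_affinity *a)
{
    if (!a->changed)
        return 0;
    a->changed = false;
    return sched_setaffinity(0, sizeof(a->orig), &a->orig);
}
```

      Intended use, roughly: call affinity_save() once, then for each CPU call
      affinity_move_to() and set up that CPU's events locally, and finally
      call affinity_restore().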
      
      Committer notes:
      
      Use zfree() in some places, add missing stdbool.h header, some minor
      coding style changes.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lore.kernel.org/lkml/20191121001522.180827-3-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Use file system cache to optimize sysfs access · d9664582
      Committed by Andi Kleen
      pmu.c does a lot of redundant /sys accesses while parsing aliases
      and probing for PMUs. On large systems with a lot of PMUs this
      can get expensive (>2s):
      
        % time     seconds  usecs/call     calls    errors syscall
        ------ ----------- ----------- --------- --------- ----------------
         27.25    1.227847           8    160888     16976 openat
         26.42    1.190481           7    164224    164077 stat
      
      Add a cache to remember if specific file names exist or don't
      exist, which eliminates most of this overhead.
      
      Also replace some stat() calls with the slightly cheaper access().
      
      Resulting in:
      
          0.18    0.004166           2      1851       305 open
          0.08    0.001970           2       829       622 access
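      A minimal sketch of such an existence cache, assuming the probed files
      do not appear or disappear while the tool runs. The names
      (fcache_exists(), FCACHE_MAX, fcache_probes) are hypothetical, and a
      linear scan is used for brevity where a real implementation would more
      likely use a hash table. Negative results are cached too, which is what
      removes the flood of failing stat()/openat() calls.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FCACHE_MAX 256

/* Hypothetical cache entry: one path and whether it existed when probed. */
struct fcache_entry {
    char *path;
    bool exists;
};

static struct fcache_entry fcache[FCACHE_MAX];
static size_t fcache_len;
static unsigned long fcache_probes; /* real access() calls performed */

/*
 * Return whether path exists, asking the file system only the first time a
 * given path is seen; later lookups are answered from the cache.
 */
static bool fcache_exists(const char *path)
{
    for (size_t i = 0; i < fcache_len; i++)
        if (strcmp(fcache[i].path, path) == 0)
            return fcache[i].exists;

    fcache_probes++;
    bool exists = access(path, F_OK) == 0; /* cheaper than stat() */

    if (fcache_len < FCACHE_MAX) {
        size_t len = strlen(path) + 1;
        char *copy = malloc(len);

        if (copy) {
            memcpy(copy, path, len);
            fcache[fcache_len].path = copy;
            fcache[fcache_len].exists = exists;
            fcache_len++;
        }
    }
    return exists;
}
```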
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lore.kernel.org/lkml/20191121001522.180827-2-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf regs: Make perf_reg_name() return "unknown" instead of NULL · 5b596e0f
      Committed by Arnaldo Carvalho de Melo
      This avoids breaking the build on arches where this is not wired up: at
      least all the other features should be made available, and when this
      specific routine is used, the "unknown" string should point the
      user/developer to the need to wire it up on that particular hardware
      architecture.
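      A minimal sketch of the "never return NULL" approach. The register
      table, reg_name(), reg_name_known() and format_reg() are hypothetical
      stand-ins, not perf's per-arch code; the point is that the name lookup
      always yields a valid string, so a "%-5s" conversion like the one in the
      errors below never receives NULL.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical register-name table for an arch that has wired names up. */
static const char *const reg_names[] = { "AX", "BX", "CX", "DX" };

/* Never returns NULL, so passing the result to "%-5s" is always safe. */
static const char *reg_name(int id)
{
    if (id < 0 || (size_t)id >= sizeof(reg_names) / sizeof(reg_names[0]))
        return "unknown";
    return reg_names[id];
}

/* Lets a caller detect that this register has no name wired up. */
static bool reg_name_known(int id)
{
    return strcmp(reg_name(id), "unknown") != 0;
}

/* Format one register dump line into buf using a "%-5s" name column. */
static int format_reg(char *buf, size_t sz, int id, unsigned long long val)
{
    return snprintf(buf, sz, "%-5s 0x%llx", reg_name(id), val);
}
```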
      
      Detected in a mipsel Debian cross-build container environment, where it
      shows up as:
      
        In file included from /usr/mipsel-linux-gnu/include/stdio.h:867,
                         from /git/linux/tools/perf/lib/include/perf/cpumap.h:6,
                         from util/session.c:13:
        In function 'printf',
            inlined from 'regs_dump__printf' at util/session.c:1103:3,
            inlined from 'regs__printf' at util/session.c:1131:2:
        /usr/mipsel-linux-gnu/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
          107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      cross compiler details:
      
        mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909
      
      Also on mips64:
      
        In file included from /usr/mips64-linux-gnuabi64/include/stdio.h:867,
                         from /git/linux/tools/perf/lib/include/perf/cpumap.h:6,
                         from util/session.c:13:
        In function 'printf',
            inlined from 'regs_dump__printf' at util/session.c:1103:3,
            inlined from 'regs__printf' at util/session.c:1131:2,
            inlined from 'regs_user__printf' at util/session.c:1139:3,
            inlined from 'dump_sample' at util/session.c:1246:3,
            inlined from 'machines__deliver_event' at util/session.c:1421:3:
        /usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
          107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        In function 'printf',
            inlined from 'regs_dump__printf' at util/session.c:1103:3,
            inlined from 'regs__printf' at util/session.c:1131:2,
            inlined from 'regs_intr__printf' at util/session.c:1147:3,
            inlined from 'dump_sample' at util/session.c:1249:3,
            inlined from 'machines__deliver_event' at util/session.c:1421:3:
        /usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
          107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      cross compiler details:
      
        mips64-linux-gnuabi64-gcc (Debian 9.2.1-8) 9.2.1 20190909
      
      Fixes: 2bcd355b ("perf tools: Add interface to arch registers sets")
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-95wjyv4o65nuaeweq31t7l1s@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 26 Nov 2019, 10 commits
  11. 22 Nov 2019, 7 commits