1. 18 Dec, 2018 30 commits
  2. 22 Nov, 2018 10 commits
    • perf pmu: Move *_cpuid_str() weak functions to header.c · f4a0742b
      Committed by Kan Liang
      The weak functions, strcmp_cpuid_str() and get_cpuid_str(), are defined
      in pmu.c.
      
      Most of the cpuid related functions, including *_cpuid_str()'s
      declaration and platform specific definition, are in header.c/h.
      
      To keep the declarations and definitions of all cpuid related functions
      in one consistent place, move the weak functions to header.c.
      
      There is no functional change.
      Suggested-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20181121164939.13482-1-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Fix slowness due to -ffunction-section · 1e628569
      Committed by Eric Saint-Etienne
      Perf can take minutes to parse an image when -ffunction-sections is used.
      This is especially true with the kernel image when it is compiled this
      way, which is the arm64 default since the patchset "Enable deadcode
      elimination at link time".

      Perf organizes maps using an rbtree. Whenever perf finds a new symbol, it
      first searches this rbtree for the map it belongs to, by strcmp()'ing
      section names.  When it finds the map with the right name, it uses it to
      add the symbol. With a usual image there aren't so many maps, but when
      using -ffunction-sections there's basically one map per function.  With
      the kernel image that's north of 40,000 maps. For most symbols perf has
      to walk the entire rbtree before eventually creating a new map and adding
      it. Consequently perf spends most of its time browsing an rbtree that
      keeps getting larger.
      
      This performance fix introduces a secondary rbtree that indexes maps
      based on the section name.
      Signed-off-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
      Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Reviewed-by: David Aldridge <david.aldridge@oracle.com>
      Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1542822679-25591-1-git-send-email-eric.saint.etienne@oracle.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
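
The idea behind the commit above, looking a map up by section name through a name-ordered tree instead of comparing against every existing map, can be sketched as follows. This is not the code from the commit: `struct map_node`, `map_find_or_create()` and `add_symbol()` are hypothetical names, and a plain unbalanced binary search tree stands in for the rbtree to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a perf map: one entry per ELF section. */
struct map_node {
	const char *name;              /* section name, e.g. ".text.foo" */
	struct map_node *left, *right; /* children in the name-ordered tree */
	unsigned long nr_symbols;      /* symbols added to this map so far */
};

/*
 * Find the map for a section name, creating it if it is missing.
 * The walk costs one strcmp() per tree level instead of one per map.
 */
static struct map_node *map_find_or_create(struct map_node **root,
					   const char *name)
{
	struct map_node **link = root;

	while (*link) {
		int cmp = strcmp(name, (*link)->name);

		if (cmp == 0)
			return *link;
		link = cmp < 0 ? &(*link)->left : &(*link)->right;
	}

	*link = calloc(1, sizeof(**link));
	if (*link)
		(*link)->name = name; /* caller keeps the string alive */
	return *link;
}

/* Add one symbol to the map of its section; returns that map's count. */
static unsigned long add_symbol(struct map_node **root, const char *section)
{
	struct map_node *map = map_find_or_create(root, section);

	assert(map != NULL);
	return ++map->nr_symbols;
}
```

A balanced tree such as an rbtree keeps the depth logarithmic in the number of sections, which is what matters once there is roughly one section per function.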
    • perf jvmti: Separate jvmti cmlr check · dd1d0044
      Committed by Jiri Olsa
      The Compiled Method Load Record (cmlr) is a JDK-specific interface to
      access JVM stack info. This makes the jvmti agent code fail to compile
      under another JDK that does not support it.

      Separate the jvmti cmlr check into a dedicated feature check, and add the
      HAVE_JVMTI_CMLR macro to indicate that it is available.

      Mark cmlr code in jvmti/libjvmti.c with HAVE_JVMTI_CMLR, so we can
      compile it on systems without cmlr support.

      This change makes the jvmti agent compile with the java-1.8.0-ibm
      package. Line number support is missing there, but the rest works.

      Add the NO_JVMTI_CMLR compile variable for testing.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Gustavo Luiz Duarte <gduarte@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20181121154341.21521-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
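
Guarding optional code with a feature macro, as the commit above does with HAVE_JVMTI_CMLR, might look roughly like this. Only the macro name comes from the commit message; `have_line_numbers()` and `describe_method()` are hypothetical helpers, and the sketch assumes the build system passes `-DHAVE_JVMTI_CMLR` when its feature check succeeds.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: 1 when the cmlr-dependent code is compiled in. */
static int have_line_numbers(void)
{
#ifdef HAVE_JVMTI_CMLR
	return 1;
#else
	return 0;
#endif
}

/*
 * Hypothetical stand-in for per-method reporting in an agent.
 * Returns the snprintf() result for the formatted description.
 */
static int describe_method(char *buf, size_t len, const char *name)
{
	assert(buf != NULL && name != NULL);
#ifdef HAVE_JVMTI_CMLR
	/* A real agent would read the compiled method load records here. */
	return snprintf(buf, len, "%s (with line numbers)", name);
#else
	/* Without cmlr the method is still reported, minus line info. */
	return snprintf(buf, len, "%s (no line numbers)", name);
#endif
}
```

Compiling the same file with and without `-DHAVE_JVMTI_CMLR` exercises both paths, which is the kind of testing a switch like NO_JVMTI_CMLR makes convenient.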
    • perf vendor events: Add JSON metrics for Cascadelake server · ecd94f1b
      Committed by Kan Liang
      Add JSON metrics (based on event list v1) for Cascadelake server
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/3ab97c73-c197-8555-1a35-b54636e667e6@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events: Add stepping in CPUID string for x86 · 3b54411a
      Committed by Kan Liang
      The perf tools cannot find the proper event list for the Cascadelake
      server, because the Cascadelake server and the Skylake server have the
      same CPU model number, which the perf tools use to find the event list.
      
      The stepping for Skylake server is up to 4.
      
      The stepping for Cascadelake server starts from 5.
      
      The stepping can be used to distinguish between them.
      
      The stepping is added in get_cpuid_str().
      
      The stepping information for Skylake server is updated in mapfile.csv.
      
      An x86-specific strcmp_cpuid_str() function is added to handle two CPUID
      formats in mapfile.csv, "vendor-family-model-stepping" and
      "vendor-family-model":

      - If a cpuid-regular-expression from mapfile.csv uses the new
        stepping format, the cpuid-string generated on the machine must include
        the stepping. Otherwise, it is a mismatch.

      - If the cpuid-regular-expression uses the old non-stepping format,
        the stepping in the cpuid-string is ignored.
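
The two matching rules above can be sketched with POSIX regular expressions. This is an illustration under assumptions, not the function from the commit: `has_stepping()` and `cpuid_match()` are hypothetical names, and the sketch assumes patterns contain no '-' other than the field separators.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* True when the string has three '-' separators: vendor-family-model-stepping. */
static bool has_stepping(const char *s)
{
	int dashes = 0;

	for (; *s; s++)
		if (*s == '-')
			dashes++;
	return dashes == 3;
}

/*
 * Hypothetical matcher: returns 0 on match, 1 on mismatch.
 * 'pattern' is an extended regex from the map file, 'cpuid' the local string.
 */
static int cpuid_match(const char *pattern, const char *cpuid)
{
	regex_t re;
	regmatch_t m;
	size_t want;
	int found;

	assert(pattern != NULL && cpuid != NULL);

	/* New-format pattern: the local string must carry a stepping too. */
	if (has_stepping(pattern) && !has_stepping(cpuid))
		return 1;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return 1;
	found = regexec(&re, cpuid, 1, &m, 0) == 0;
	regfree(&re);
	if (!found)
		return 1;

	/* Old-format pattern: compare only up to the last '-', dropping the stepping. */
	if (!has_stepping(pattern) && has_stepping(cpuid))
		want = (size_t)(strrchr(cpuid, '-') - cpuid);
	else
		want = strlen(cpuid);

	/* The match must cover the whole compared portion. */
	return (size_t)(m.rm_eo - m.rm_so) == want ? 0 : 1;
}
```

With an illustrative pattern such as `"GenuineIntel-6-55-[01234]"`, the string `"GenuineIntel-6-55-4"` matches and `"GenuineIntel-6-55-5"` does not, while an old-format pattern such as `"GenuineIntel-6-4F"` still matches `"GenuineIntel-6-4F-1"`.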
      
      Scripts that set the "PERF_CPUID" environment variable without a stepping
      on Skylake server will break. Users must fix such scripts.
      
      Committer notes:
      
      Fixed this build error on centos:6 and debian:7:
      
        arch/x86/util/header.c: In function 'is_full_cpuid':
        arch/x86/util/header.c:82:39: error: declaration of 'cpuid' shadows a global declaration [-Werror=shadow]
        arch/x86/util/header.c:12:1: error: shadowed declaration is here [-Werror=shadow]
        arch/x86/util/header.c: In function 'strcmp_cpuid_str':
        arch/x86/util/header.c:98:56: error: declaration of 'cpuid' shadows a global declaration [-Werror=shadow]
        arch/x86/util/header.c:12:1: error: shadowed declaration is here [-Werror=shadow]
        cc1: all warnings being treated as errors
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20181114212416.15665-1-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Use perf_evsel__is_clocki() for clock events · eb08d006
      Committed by Ravi Bangoria
      We already have a function to check whether a given event is either
      SW_CPU_CLOCK or SW_TASK_CLOCK. Use it.
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: yuzhoujian@didichuxing.com
      Link: http://lkml.kernel.org/r/20181115095533.16930-1-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Suppress potential format-truncation warning · 11a64a05
      Committed by Ben Hutchings
      Depending on which functions are inlined in util/pmu.c, the snprintf()
      calls in perf_pmu__parse_{scale,unit,per_pkg,snapshot}() might trigger a
      warning:
      
        util/pmu.c: In function 'pmu_aliases':
        util/pmu.c:178:31: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
          snprintf(path, PATH_MAX, "%s/%s.unit", dir, name);
                                     ^~
      
      I found this when trying to build perf from Linux 3.16 with gcc 8.
      However I can reproduce the problem in mainline if I force
      __perf_pmu__new_alias() to be inlined.
      
      Suppress this by using scnprintf() as has been done elsewhere in perf.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181111184524.fux4taownc6ndbx6@decadent.org.uk
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
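
scnprintf(), which the commit above switches to, is a helper from the kernel and perf sources rather than the C library. A userspace sketch of its semantics follows, under the assumption that the main difference from snprintf() is the return value: the number of characters actually stored rather than the length the full output would have had. The name `sketch_scnprintf()` is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Like snprintf(), but returns the number of characters actually stored
 * in 'buf' (excluding the terminating NUL), even when output is truncated.
 */
static int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && fmt != NULL);
	if (size == 0)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;               /* formatting error: report nothing stored */
	if ((size_t)n >= size)
		return (int)(size - 1); /* output was truncated to fit */
	return n;
}
```

For example, formatting `"%s/%s.unit"` with `"dir"` and `"name"` into an 8-byte buffer stores `"dir/nam"` and returns 7, where snprintf() would return 13.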
    • perf tools: Add Hygon Dhyana support · 4787eff3
      Committed by Pu Wen
      perf is useful for performance analysis on the Hygon Dhyana platform,
      but right now it has no Hygon support for analyzing KVM guest OS data.
      Add Hygon Dhyana support by checking the vendor string so that Hygon
      shares the AMD code path.
      Signed-off-by: Pu Wen <puwen@hygon.cn>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1542008451-31735-1-git-send-email-puwen@hygon.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
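
Sharing a code path by vendor string, as described in the commit above, could be as small as the check below. The helper name `uses_amd_code_path()` is hypothetical and the CPUID strings in the usage note are made up for illustration; "AuthenticAMD" and "HygonGenuine" are the vendor IDs reported by AMD and Hygon processors.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true when a CPUID string belongs to a vendor
 * that should be handled by the AMD-specific code.
 */
static bool uses_amd_code_path(const char *cpuid)
{
	assert(cpuid != NULL);
	return strstr(cpuid, "AuthenticAMD") != NULL ||
	       strstr(cpuid, "HygonGenuine") != NULL;
}
```

A caller would then branch once, for example `if (uses_amd_code_path(cpuid)) { ... }`, instead of duplicating the AMD handling for Hygon.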
    • perf bench: Add epoll_ctl(2) benchmark · 231457ec
      Committed by Davidlohr Bueso
      Benchmark the various operations allowed for epoll_ctl(2).  The idea is
      to concurrently stress a single epoll instance doing add/mod/del
      operations.
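
For readers unfamiliar with the three operations, a single-threaded sketch of one add/mod/del cycle on one epoll instance is shown here. It is not the benchmark's code: `ctl_cycle()` and `run_cycles()` are hypothetical names, the descriptor is an eventfd chosen for convenience, and the real benchmark runs such operations from several threads at once.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Run one add/mod/del cycle for 'fd' on the epoll instance 'epfd'.
 * Returns 0 on success, -1 as soon as any epoll_ctl() call fails.
 */
static int ctl_cycle(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
		return -1;

	/* Change the interest set of the already registered descriptor. */
	ev.events = EPOLLIN | EPOLLOUT;
	if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) < 0)
		return -1;

	/* A non-NULL event pointer keeps this portable to very old kernels. */
	if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &ev) < 0)
		return -1;
	return 0;
}

/* Repeat the cycle up to 'rounds' times; returns the cycles completed. */
static long run_cycles(long rounds)
{
	int epfd = epoll_create1(0);
	int fd = eventfd(0, EFD_NONBLOCK);
	long done = 0;

	assert(epfd >= 0 && fd >= 0);
	while (done < rounds && ctl_cycle(epfd, fd) == 0)
		done++;

	close(fd);
	close(epfd);
	return done;
}
```

Because every cycle performs exactly one ADD, one MOD and one DEL, the three per-thread counters in the benchmark output below advance together.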
      
      Committer testing:
      
        # perf bench epoll ctl
        # Running 'epoll/ctl' benchmark:
        Run summary [PID 20344]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs.
      
        [thread  0] fdmap: 0x21a46b0 ... 0x21a47ac [ add: 1680960 ops; mod: 1680960 ops; del: 1680960 ops ]
        [thread  1] fdmap: 0x21a4960 ... 0x21a4a5c [ add: 1685440 ops; mod: 1685440 ops; del: 1685440 ops ]
        [thread  2] fdmap: 0x21a4c10 ... 0x21a4d0c [ add: 1674368 ops; mod: 1674368 ops; del: 1674368 ops ]
        [thread  3] fdmap: 0x21a4ec0 ... 0x21a4fbc [ add: 1677568 ops; mod: 1677568 ops; del: 1677568 ops ]
      
        Averaged 1679584 ADD operations (+- 0.14%)
        Averaged 1679584 MOD operations (+- 0.14%)
        Averaged 1679584 DEL operations (+- 0.14%)
        #
      
      Let's measure those calls with 'perf trace' to get a glimpse of what this
      benchmark is doing in terms of syscalls:
      
        # perf trace -m32768 -s perf bench epoll ctl
        # Running 'epoll/ctl' benchmark:
        Run summary [PID 20405]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs.
      
        [thread  0] fdmap: 0x21764e0 ... 0x21765dc [ add: 1100480 ops; mod: 1100480 ops; del: 1100480 ops ]
        [thread  1] fdmap: 0x2176790 ... 0x217688c [ add: 1250176 ops; mod: 1250176 ops; del: 1250176 ops ]
        [thread  2] fdmap: 0x2176a40 ... 0x2176b3c [ add: 1022464 ops; mod: 1022464 ops; del: 1022464 ops ]
        [thread  3] fdmap: 0x2176cf0 ... 0x2176dec [ add: 705472 ops; mod: 705472 ops; del: 705472 ops ]
      
        Averaged 1019648 ADD operations (+- 11.27%)
        Averaged 1019648 MOD operations (+- 11.27%)
        Averaged 1019648 DEL operations (+- 11.27%)
      
        Summary of events:
      
        epoll-ctl (20405), 1264 events, 0.0%
      
         syscall            calls    total       min       avg       max      stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
         --------------- -------- --------- --------- --------- ---------     ------
         eventfd2             256     9.514     0.001     0.037     5.243     68.00%
         clone                  4     1.245     0.204     0.311     0.531     24.13%
         mprotect              66     0.345     0.002     0.005     0.021      7.43%
         openat                45     0.313     0.004     0.007     0.073     21.93%
         mmap                  88     0.302     0.002     0.003     0.013      5.02%
         futex                  4     0.160     0.002     0.040     0.140     83.43%
         sched_setaffinity      4     0.124     0.005     0.031     0.070     49.39%
         read                  44     0.103     0.001     0.002     0.013     15.54%
         fstat                 40     0.052     0.001     0.001     0.003      5.43%
         close                 39     0.039     0.001     0.001     0.001      1.48%
         stat                   9     0.034     0.003     0.004     0.006      7.30%
         access                 3     0.023     0.007     0.008     0.008      4.25%
         open                   2     0.021     0.008     0.011     0.013     22.60%
         getdents               4     0.019     0.001     0.005     0.009     37.15%
         write                  2     0.013     0.004     0.007     0.009     38.48%
         munmap                 1     0.010     0.010     0.010     0.010      0.00%
         brk                    3     0.006     0.001     0.002     0.003     26.34%
         rt_sigprocmask         2     0.004     0.001     0.002     0.003     43.95%
         rt_sigaction           3     0.004     0.001     0.001     0.002     16.07%
         prlimit64              3     0.004     0.001     0.001     0.001      5.39%
         prctl                  1     0.003     0.003     0.003     0.003      0.00%
         epoll_create           1     0.003     0.003     0.003     0.003      0.00%
         lseek                  2     0.002     0.001     0.001     0.001     11.42%
         sched_getaffinity        1     0.002     0.002     0.002     0.002      0.00%
         arch_prctl             1     0.002     0.002     0.002     0.002      0.00%
         set_tid_address        1     0.001     0.001     0.001     0.001      0.00%
         getpid                 1     0.001     0.001     0.001     0.001      0.00%
         set_robust_list        1     0.001     0.001     0.001     0.001      0.00%
         execve                 1     0.000     0.000     0.000     0.000      0.00%
      
       epoll-ctl (20406), 1245480 events, 14.6%
      
         syscall            calls    total       min       avg       max      stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
         --------------- -------- --------- --------- --------- ---------     ------
         epoll_ctl         619511  1034.927     0.001     0.002     6.691      0.67%
         nanosleep           3226   616.114     0.006     0.191    10.376      7.57%
         futex                  2    11.336     0.002     5.668    11.334     99.97%
         set_robust_list        1     0.001     0.001     0.001     0.001      0.00%
         clone                  1     0.000     0.000     0.000     0.000      0.00%
      
       epoll-ctl (20407), 1243151 events, 14.5%
      
         syscall            calls    total       min       avg       max      stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
         --------------- -------- --------- --------- --------- ---------     ------
         epoll_ctl         618350  1042.181     0.001     0.002     2.512      0.40%
         nanosleep           3220   366.261     0.012     0.114    18.162      9.59%
         futex                  4     5.463     0.001     1.366     5.427     99.12%
         set_robust_list        1     0.002     0.002     0.002     0.002      0.00%
      
       epoll-ctl (20408), 1801690 events, 21.1%
      
         syscall            calls    total       min       avg       max      stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
         --------------- -------- --------- --------- --------- ---------     ------
         epoll_ctl         896174  1540.581     0.001     0.002     6.987      0.74%
         nanosleep           4667   783.393     0.006     0.168    10.419      7.10%
         futex                  2     4.682     0.002     2.341     4.681     99.93%
         set_robust_list        1     0.002     0.002     0.002     0.002      0.00%
         clone                  1     0.000     0.000     0.000     0.000      0.00%
      
       epoll-ctl (20409), 4254890 events, 49.8%
      
         syscall            calls    total       min       avg       max      stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
         --------------- -------- --------- --------- --------- ---------     ------
         epoll_ctl        2116416  3768.097     0.001     0.002     9.956      0.41%
         nanosleep          11023  1141.778     0.006     0.104     9.447      4.95%
         futex                  3     0.037     0.002     0.012     0.029     70.50%
         set_robust_list        1     0.008     0.008     0.008     0.008      0.00%
         madvise                1     0.005     0.005     0.005     0.005      0.00%
         clone                  1     0.000     0.000     0.000     0.000      0.00%
        #
      
      Committer notes:
      
      Fix build on fedora:24-x-ARC-uClibc, debian:experimental-x-mips,
      debian:experimental-x-mipsel, ubuntu:16.04-x-arm and ubuntu:16.04-x-powerpc
      
          CC       /tmp/build/perf/bench/epoll-ctl.o
        bench/epoll-ctl.c: In function 'init_fdmaps':
        bench/epoll-ctl.c:214:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
          for (i = 0; i < nfds; i+=inc) {
                        ^
        bench/epoll-ctl.c: In function 'bench_epoll_ctl':
        bench/epoll-ctl.c:377:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
          for (i = 0; i < nthreads; i++) {
                        ^
        bench/epoll-ctl.c:388:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
          for (i = 0; i < nthreads; i++) {
                        ^
        cc1: all warnings being treated as errors
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Link: http://lkml.kernel.org/r/20181106152226.20883-3-dave@stgolabs.net
      [ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
      [ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bench: Add epoll parallel epoll_wait benchmark · 121dd9ea
      Committed by Davidlohr Bueso
      This program benchmarks concurrent epoll_wait(2) for file descriptors
      that are monitored with EPOLLIN along various semantics, by a
      single epoll instance. Such conditions can be found when using
      single/combined or multiple queuing when load balancing.

      Each thread has a number of private, nonblocking file descriptors,
      referred to as fdmap. A writer thread will constantly be writing to the
      fdmaps of all threads, minimizing each thread's chances of epoll_wait
      not finding any ready read events and blocking, as this is not what we
      want to stress. Full details are at the start of the C file.
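
The writer/waiter interaction described above can be reduced to a single descriptor for illustration. This is a minimal sketch under assumptions, not the benchmark: `wait_once()` is a hypothetical name, the nonblocking descriptor is an eventfd, and writer and waiter run in the same thread.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Make one nonblocking eventfd readable, then collect it with epoll_wait().
 * Returns the number of ready events seen, or -1 on error.
 */
static int wait_once(void)
{
	struct epoll_event ev = { .events = EPOLLIN }, ready;
	uint64_t val = 1;
	int epfd, fd, n = -1;

	epfd = epoll_create1(0);
	fd = eventfd(0, EFD_NONBLOCK);
	assert(epfd >= 0 && fd >= 0);

	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
		goto out;

	/* Writer side: bump the counter so the descriptor becomes readable. */
	if (write(fd, &val, sizeof(val)) != (ssize_t)sizeof(val))
		goto out;

	/* Waiter side: the event is already pending; 1000 ms is a safety net. */
	n = epoll_wait(epfd, &ready, 1, 1000);

	/* Drain the counter so the descriptor stops being readable. */
	if (n == 1 && ready.data.fd == fd &&
	    read(fd, &val, sizeof(val)) != (ssize_t)sizeof(val))
		n = -1;
out:
	close(fd);
	close(epfd);
	return n;
}
```

Keeping the descriptor readable before the waiter calls epoll_wait() is what lets the call return immediately instead of blocking, which is the condition the benchmark tries to maintain.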
      
      Committer testing:
      
        # perf bench
        Usage:
      	perf bench [<common options>] <collection> <benchmark> [<options>]
      
              # List of all available benchmark collections:
      
               sched: Scheduler and IPC benchmarks
                 mem: Memory access benchmarks
                numa: NUMA scheduling and MM benchmarks
               futex: Futex stressing benchmarks
               epoll: Epoll stressing benchmarks
                 all: All benchmarks
      
        # perf bench epoll
      
              # List of available benchmarks for collection 'epoll':
      
                wait: Benchmark epoll concurrent epoll_waits
                 all: Run all futex benchmarks
      
        # perf bench epoll wait
        # Running 'epoll/wait' benchmark:
        Run summary [PID 19295]: 3 threads monitoring on 64 file-descriptors for 8 secs.
      
        [thread  0] fdmap: 0xdaa650 ... 0xdaa74c [ 328241 ops/sec ]
        [thread  1] fdmap: 0xdaa900 ... 0xdaa9fc [ 351695 ops/sec ]
        [thread  2] fdmap: 0xdaabb0 ... 0xdaacac [ 381423 ops/sec ]
      
        Averaged 353786 operations/sec (+- 4.35%), total secs = 8
        #
      
      Committer notes:
      
      Fix the build on debian:experimental-x-mips, debian:experimental-x-mipsel
      and others:
      
          CC       /tmp/build/perf/bench/epoll-wait.o
        bench/epoll-wait.c: In function 'writerfn':
        bench/epoll-wait.c:399:12: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
          printinfo("exiting writer-thread (total full-loops: %ld)\n", iter);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~
        bench/epoll-wait.c:86:31: note: in definition of macro 'printinfo'
          do { if (__verbose) { printf(fmt, ## arg); fflush(stdout); } } while (0)
                                       ^~~
        cc1: all warnings being treated as errors
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Link: http://lkml.kernel.org/r/20181106152226.20883-2-dave@stgolabs.net
      Link: http://lkml.kernel.org/r/20181106182349.thdkpvshkna5vd7o@linux-r8p5
      [ Applied above fixup as per Davidlohr's request ]
      [ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
      [ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>