1. 10 Aug, 2022 (1 commit)
    • perf stat: Add JSON output option · df936cad
      Authored by Claire Jensen
      CSV output is tricky to format and column layout changes are susceptible
      to breaking parsers. New JSON-formatted output has variable names to
      identify fields that are consistent and informative, making the output
      parseable.
      
      CSV output example:
      
        1.20,msec,task-clock:u,1204272,100.00,0.697,CPUs utilized
        0,,context-switches:u,1204272,100.00,0.000,/sec
        0,,cpu-migrations:u,1204272,100.00,0.000,/sec
        70,,page-faults:u,1204272,100.00,58.126,K/sec
      
      JSON output example:
      
        {"counter-value" : "3805.723968", "unit" : "msec", "event" :
        "cpu-clock", "event-runtime" : 3805731510100.00, "pcnt-running"
        : 100.00, "metric-value" : 4.007571, "metric-unit" : "CPUs utilized"}
        {"counter-value" : "6166.000000", "unit" : "", "event" :
        "context-switches", "event-runtime" : 3805723045100.00, "pcnt-running"
        : 100.00, "metric-value" : 1.620191, "metric-unit" : "K/sec"}
        {"counter-value" : "466.000000", "unit" : "", "event" :
        "cpu-migrations", "event-runtime" : 3805727613100.00, "pcnt-running"
        : 100.00, "metric-value" : 122.447136, "metric-unit" : "/sec"}
        {"counter-value" : "208.000000", "unit" : "", "event" :
        "page-faults", "event-runtime" : 3805726799100.00, "pcnt-running"
        : 100.00, "metric-value" : 54.654516, "metric-unit" : "/sec"}
      
      Also added documentation for JSON option.
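
      Because each counter is emitted as one JSON object per line (as in the
      committer testing output below), a consumer can parse the output line by
      line and look fields up by name. A minimal Python sketch, assuming the
      output has been captured into a string; the helper name
      parse_perf_stat_json is hypothetical and the sample lines are copied from
      this commit message:

```python
import json

def parse_perf_stat_json(text):
    """Parse `perf stat -j` output: one JSON object per non-empty line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and anything that is not a JSON object
        records.append(json.loads(line))
    return records

# Two lines copied from the committer testing output in this commit message.
sample = """\
{"counter-value" : "75.000000", "unit" : "", "event" : "page-faults:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 102.494021, "metric-unit" : "K/sec"}
{"metric-value" : 0.046955, "metric-unit" : "stalled cycles per insn"}
"""

records = parse_perf_stat_json(sample)

# Fields are looked up by name, so column order no longer matters.
# In this output "counter-value" is a string, and metric-only lines
# carry no "event" key, so both cases are handled explicitly.
by_event = {r["event"]: float(r["counter-value"]) for r in records if "event" in r}
print(by_event)
```

      Lines that only carry a metric (such as "stalled cycles per insn") are
      kept in the parsed list but skipped when indexing by event name.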
      
      There is some tidy-up of the CSV code, including a fix for a potential
      memory overrun in the os.nfields setup. To facilitate this, an AGGR_MAX
      value is added.
      
      Committer notes:
      
      Fixed up using PRIu64 to format u64 values, not %lu.
      
      Committer testing:
      
        ⬢[acme@toolbox perf]$ perf stat -j sleep 1
        {"counter-value" : "0.731750", "unit" : "msec", "event" : "task-clock:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000731, "metric-unit" : "CPUs utilized"}
        {"counter-value" : "0.000000", "unit" : "", "event" : "context-switches:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
        {"counter-value" : "0.000000", "unit" : "", "event" : "cpu-migrations:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
        {"counter-value" : "75.000000", "unit" : "", "event" : "page-faults:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 102.494021, "metric-unit" : "K/sec"}
        {"counter-value" : "578765.000000", "unit" : "", "event" : "cycles:u", "event-runtime" : 379366, "pcnt-running" : 49.00, "metric-value" : 0.790933, "metric-unit" : "GHz"}
        {"counter-value" : "1298.000000", "unit" : "", "event" : "stalled-cycles-frontend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.224271, "metric-unit" : "frontend cycles idle"}
        {"counter-value" : "21984.000000", "unit" : "", "event" : "stalled-cycles-backend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 3.798433, "metric-unit" : "backend cycles idle"}
        {"counter-value" : "468197.000000", "unit" : "", "event" : "instructions:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.808959, "metric-unit" : "insn per cycle"}
        {"metric-value" : 0.046955, "metric-unit" : "stalled cycles per insn"}
        {"counter-value" : "103335.000000", "unit" : "", "event" : "branches:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 141.216262, "metric-unit" : "M/sec"}
        {"counter-value" : "2381.000000", "unit" : "", "event" : "branch-misses:u", "event-runtime" : 388654, "pcnt-running" : 50.00, "metric-value" : 2.304156, "metric-unit" : "of all branches"}
        ⬢[acme@toolbox perf]$

      Signed-off-by: Claire Jensen <cjense@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alyssa Ross <hi@alyssa.is>
      Cc: Claire Jensen <clairej735@gmail.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20220805200105.2020995-2-irogers@google.com
      Signed-off-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 03 Aug, 2022 (7 commits)
  3. 02 Aug, 2022 (3 commits)
    • tools perf: Fix compilation error with new binutils · 83aa0120
      Authored by Andres Freund
      binutils changed the signature of init_disassemble_info(), which now causes
      compilation failures for tools/perf/util/annotate.c, e.g. on debian
      unstable.
      
      Relevant binutils commit:
      
        https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=60a3da00bd5407f07
      
      Wire up the feature test and switch to init_disassemble_info_compat(),
      which were introduced in prior commits, fixing the compilation failure.
      
      I verified that perf can still disassemble bpf programs by using bpftrace
      under load, recording a perf trace, and then annotating the bpf "function"
      with and without the changes. With old binutils there's no change in output
      before/after this patch. When comparing the output from old binutils (2.35)
      to new binutils with the patch (upstream snapshot) there are a few output
      differences, but they are unrelated to this patch. An example hunk is:
      
             1.15 :   55:mov    %rbp,%rdx
             0.00 :   58:add    $0xfffffffffffffff8,%rdx
             0.00 :   5c:xor    %ecx,%ecx
        -    1.03 :   5e:callq  0xffffffffe12aca3c
        +    1.03 :   5e:call   0xffffffffe12aca3c
             0.00 :   63:xor    %eax,%eax
        -    2.18 :   65:leaveq
        -    2.82 :   66:retq
        +    2.18 :   65:leave
        +    2.82 :   66:ret

      Signed-off-by: Andres Freund <andres@anarazel.de>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Ben Hutchings <benh@debian.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: bpf@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20220622181918.ykrs5rsnmx3og4sv@alap3.anarazel.de
      Link: https://lore.kernel.org/r/20220801013834.156015-5-andres@anarazel.de
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Rework prologue generation code · 5f4e821c
      Authored by Jiri Olsa
      Some functions we use for bpf prologue generation are going to be
      deprecated. This change reworks the current code so that it does not
      use them.
      
      We need to replace the following functions/struct:
         bpf_program__set_prep
         bpf_program__nth_fd
         struct bpf_prog_prep_result
      
      Currently we use bpf_program__set_prep to hook perf callback before
      program is loaded and provide new instructions with the prologue.
      
      We replace this function/ality by taking instructions for specific
      program, attaching prologue to them and load such new ebpf programs
      with prologue using separate bpf_prog_load calls (outside libbpf
      load machinery).
      
      Before we can take and use program instructions, we need libbpf to
      actually load it. This way we get the final shape of its instructions
      (with all relocations and verifier adjustments).
      
      There's one glitch though.. perf kprobe program already assumes
      generated prologue code with proper values in argument registers,
      so loading such program directly will fail in the verifier.
      
      That's where the fallback pre-load handler fits in and prepends
      the initialization code to the program. Once such program is loaded
      we take its instructions, cut off the initialization code and prepend
      the prologue.
      
      I know.. sorry ;-)
      
      To have access to the program when loading this patch adds support to
      register 'fallback' section handler to take care of perf kprobe programs.
      The fallback means that it handles any section definition besides the
      ones that libbpf handles.
      
      The handler serves two purposes:
        - allows perf programs to have special arguments in section name
        - allows perf to use pre-load callback where we can attach init
          code (zeroing all argument registers) to each perf program

      Suggested-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/r/20220616202214.70359-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bpf: Convert legacy map definition to BTF-defined · 8b1e1a03
      Authored by Jiri Olsa
      libbpf is switching off support for legacy map definitions [1],
      which will break the perf llvm tests.
      
      Move the base source map definition to a BTF-defined one, which
      requires the -g compile option to add debug/BTF info.
      
      [1] https://lore.kernel.org/bpf/20220627211527.2245459-1-andrii@kernel.org/

      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20220704152721.352046-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 01 Aug, 2022 (3 commits)
    • perf symbol: Fail to read phdr workaround · 6d518ac7
      Authored by Ian Rogers
      The perf jvmti agent doesn't create program headers; in this case, fall
      back on section headers, as happened previously.
      
      Committer notes:
      
      To test this, from a public post by Ian:
      
      1) download a Java workload dacapo-9.12-MR1-bach.jar from
      https://sourceforge.net/projects/dacapobench/
      
      2) build perf such as "make -C tools/perf O=/tmp/perf NO_LIBBFD=1" it
      should detect Java and create /tmp/perf/libperf-jvmti.so
      
      3) run perf with the jvmti agent:
      
        perf record -k 1 java -agentpath:/tmp/perf/libperf-jvmti.so -jar dacapo-9.12-MR1-bach.jar -n 10 fop
      
      4) run perf inject:
      
        perf inject -i perf.data -o perf-injected.data -j
      
      5) run perf report
      
        perf report -i perf-injected.data | grep org.apache.fop
      
      With this patch reverted I see lots of symbols like:
      
           0.00%  java             jitted-388040-4656.so  [.] org.apache.fop.fo.FObj.bind(org.apache.fop.fo.PropertyList)
      
      With the patch (2d86612a ("perf symbol: Correct address for bss
      symbols")) I see lots of:
      
        dso__load_sym_internal: failed to find program header for symbol:
        Lorg/apache/fop/fo/FObj;bind(Lorg/apache/fop/fo/PropertyList;)V
        st_value: 0x40
      
      Fixes: 2d86612a ("perf symbol: Correct address for bss symbols")
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Leo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20220731164923.691193-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock: Implement cpu and task filters for BPF · 6fda2405
      Authored by Namhyung Kim
      Add -a/--all-cpus and -C/--cpu options for cpu filtering.  Also -p/--pid
      and --tid options are added for task filtering.  The short -t option is
      taken for --threads already.  Tracking the command line workload is
      possible as well.
      
        $ sudo perf lock contention -a -b sleep 1

      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20220729200756.666106-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock: Use BPF for lock contention analysis · 407b36f6
      Authored by Namhyung Kim
      Add the -b/--use-bpf option to use BPF to collect lock contention stats.
      For simplicity, it currently runs system-wide and requires Ctrl-C to stop.
      Upcoming changes will add the usual filtering.
      
        $ sudo perf lock con -b
        ^C
         contended   total wait     max wait     avg wait         type   caller
      
                42    192.67 us     13.64 us      4.59 us     spinlock   queue_work_on+0x20
                23     85.54 us     10.28 us      3.72 us     spinlock   worker_thread+0x14a
                 6     13.92 us      6.51 us      2.32 us        mutex   kernfs_iop_permission+0x30
                 3     11.59 us     10.04 us      3.86 us        mutex   kernfs_dop_revalidate+0x3c
                 1      7.52 us      7.52 us      7.52 us     spinlock   kthread+0x115
                 1      7.24 us      7.24 us      7.24 us     rwlock:W   sys_epoll_wait+0x148
                 2      7.08 us      3.99 us      3.54 us     spinlock   delayed_work_timer_fn+0x1b
                 1      6.41 us      6.41 us      6.41 us     spinlock   idle_balance+0xa06
                 2      2.50 us      1.83 us      1.25 us        mutex   kernfs_iop_lookup+0x2f
                 1      1.71 us      1.71 us      1.71 us        mutex   kernfs_iop_getattr+0x2c
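
      The columns in this table are related: in the rows shown, avg wait equals
      total wait divided by the contended count. A small Python check of that
      relationship, using three rows copied from the output above (the tuple
      layout and the helper name avg_wait_us are only for illustration):

```python
# (contended, total wait in us, reported avg wait in us), copied from the table above
rows = [
    (42, 192.67, 4.59),
    (23, 85.54, 3.72),
    (6, 13.92, 2.32),
]

def avg_wait_us(contended, total_wait_us):
    """Average wait per contention event, in microseconds."""
    return total_wait_us / contended

for contended, total, reported_avg in rows:
    computed = avg_wait_us(contended, total)
    print(f"{contended:>3} {total:>8.2f} us -> {computed:.2f} us (reported {reported_avg:.2f} us)")
```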

      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Blake Jones <blakejones@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20220729200756.666106-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 30 Jul, 2022 (4 commits)
    • perf stat: Add topdown metrics in the default perf stat on the hybrid machine · 9a0b3626
      Authored by Zhengjun Xing
      Topdown metrics are missing from the default perf stat output on hybrid
      machines. Add Topdown metrics to the default perf stat for hybrid systems.
      
      Currently, the perf metrics Topdown is supported only for the p-core PMU in
      the perf stat default; perf metrics Topdown support for the e-core PMU will
      be implemented separately later. Refactor the code by adding two
      x86-specific functions. Widen the event name column by 7 chars so that all
      metrics after the "#" are aligned again.
      
      The perf metrics topdown feature is supported on the cpu_core of ADL. The
      dedicated perf metrics counter and the fixed counter 3 are used for the
      topdown events. Adding the topdown metrics doesn't trigger multiplexing.
      
      Before:
      
       # ./perf  stat  -a true
      
       Performance counter stats for 'system wide':
      
                   53.70 msec cpu-clock                 #   25.736 CPUs utilized
                      80      context-switches          #    1.490 K/sec
                      24      cpu-migrations            #  446.951 /sec
                      52      page-faults               #  968.394 /sec
               2,788,555      cpu_core/cycles/          #   51.931 M/sec
                 851,129      cpu_atom/cycles/          #   15.851 M/sec
               2,974,030      cpu_core/instructions/    #   55.385 M/sec
                 416,919      cpu_atom/instructions/    #    7.764 M/sec
                 586,136      cpu_core/branches/        #   10.916 M/sec
                  79,872      cpu_atom/branches/        #    1.487 M/sec
                  14,220      cpu_core/branch-misses/   #  264.819 K/sec
                   7,691      cpu_atom/branch-misses/   #  143.229 K/sec
      
             0.002086438 seconds time elapsed
      
      After:
      
       # ./perf stat  -a true
      
       Performance counter stats for 'system wide':
      
                   61.39 msec cpu-clock                        #   24.874 CPUs utilized
                      76      context-switches                 #    1.238 K/sec
                      24      cpu-migrations                   #  390.968 /sec
                      52      page-faults                      #  847.097 /sec
               2,753,695      cpu_core/cycles/                 #   44.859 M/sec
                 903,899      cpu_atom/cycles/                 #   14.725 M/sec
               2,927,529      cpu_core/instructions/           #   47.690 M/sec
                 428,498      cpu_atom/instructions/           #    6.980 M/sec
                 581,299      cpu_core/branches/               #    9.470 M/sec
                  83,409      cpu_atom/branches/               #    1.359 M/sec
                  13,641      cpu_core/branch-misses/          #  222.216 K/sec
                   8,008      cpu_atom/branch-misses/          #  130.453 K/sec
              14,761,308      cpu_core/slots/                  #  240.466 M/sec
               3,288,625      cpu_core/topdown-retiring/       #     22.3% retiring
               1,323,323      cpu_core/topdown-bad-spec/       #      9.0% bad speculation
               5,477,470      cpu_core/topdown-fe-bound/       #     37.1% frontend bound
               4,679,199      cpu_core/topdown-be-bound/       #     31.7% backend bound
                 646,194      cpu_core/topdown-heavy-ops/      #      4.4% heavy operations       #     17.9% light operations
               1,244,999      cpu_core/topdown-br-mispredict/  #      8.4% branch mispredict      #      0.5% machine clears
               3,891,800      cpu_core/topdown-fetch-lat/      #     26.4% fetch latency          #     10.7% fetch bandwidth
               1,879,034      cpu_core/topdown-mem-bound/      #     12.7% memory bound           #     19.0% Core bound
      
             0.002467839 seconds time elapsed
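
      The percentages in the "After" output can be reproduced from the raw
      counts: each metric is its topdown event count divided by
      cpu_core/slots/, and the second figure on the last four lines matches the
      difference between a level-1 metric and the sub-metric listed with it.
      The Python sketch below only checks that arithmetic against the numbers
      above; it is not perf's implementation, and the variable names are
      illustrative.

```python
# Raw counts copied from the "After" output above.
slots = 14_761_308
counts = {
    "retiring": 3_288_625,
    "bad-spec": 1_323_323,
    "fe-bound": 5_477_470,
    "be-bound": 4_679_199,
    "heavy-ops": 646_194,
    "br-mispredict": 1_244_999,
    "fetch-lat": 3_891_800,
    "mem-bound": 1_879_034,
}

# Each metric expressed as a percentage of the slots count.
pct = {name: 100.0 * count / slots for name, count in counts.items()}

# The second value printed on the level-2 lines is the remainder of the parent.
derived = {
    "light operations": pct["retiring"] - pct["heavy-ops"],
    "machine clears": pct["bad-spec"] - pct["br-mispredict"],
    "fetch bandwidth": pct["fe-bound"] - pct["fetch-lat"],
    "core bound": pct["be-bound"] - pct["mem-bound"],
}

for name, value in {**pct, **derived}.items():
    print(f"{name:>16}: {value:5.1f}%")
```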

      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220721065706.2886112-6-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf x86 evlist: Add default hybrid events for perf stat · cdb204ad
      Authored by Kan Liang
      Provide a new solution to replace the reverted commit ac2dc29e
      ("perf stat: Add default hybrid events")
      
      For the default software attrs, nothing is changed.
      
      For the default hardware attrs, create a new evsel for each hybrid pmu.
      
      With the new solution, adding a new default attr will not require the
      special support for the hybrid platform anymore.
      
      Also, "--detailed" is now supported on the hybrid platform.
      
      With the patch:
      
        $ perf stat -a -ddd sleep 1
      
         Performance counter stats for 'system wide':
      
               32,231.06 msec cpu-clock                 #   32.056 CPUs utilized
                     529      context-switches          #   16.413 /sec
                      32      cpu-migrations            #    0.993 /sec
                      69      page-faults               #    2.141 /sec
             176,754,151      cpu_core/cycles/          #    5.484 M/sec          (41.65%)
             161,695,280      cpu_atom/cycles/          #    5.017 M/sec          (49.92%)
              48,595,992      cpu_core/instructions/    #    1.508 M/sec          (49.98%)
              32,363,337      cpu_atom/instructions/    #    1.004 M/sec          (58.26%)
              10,088,639      cpu_core/branches/        #  313.010 K/sec          (58.31%)
               6,390,582      cpu_atom/branches/        #  198.274 K/sec          (58.26%)
                 846,201      cpu_core/branch-misses/   #   26.254 K/sec          (66.65%)
                 676,477      cpu_atom/branch-misses/   #   20.988 K/sec          (58.27%)
              14,290,070      cpu_core/L1-dcache-loads/ #  443.363 K/sec          (66.66%)
               9,983,532      cpu_atom/L1-dcache-loads/ #  309.749 K/sec          (58.27%)
                 740,725      cpu_core/L1-dcache-load-misses/ #   22.982 K/sec    (66.66%)
         <not supported>      cpu_atom/L1-dcache-load-misses/
                 480,441      cpu_core/LLC-loads/       #   14.906 K/sec          (66.67%)
                 326,570      cpu_atom/LLC-loads/       #   10.132 K/sec          (58.27%)
                     329      cpu_core/LLC-load-misses/ #   10.208 /sec           (66.68%)
                       0      cpu_atom/LLC-load-misses/ #    0.000 /sec           (58.32%)
         <not supported>      cpu_core/L1-icache-loads/
              21,982,491      cpu_atom/L1-icache-loads/ #  682.028 K/sec          (58.43%)
               4,493,189      cpu_core/L1-icache-load-misses/ #  139.406 K/sec    (33.34%)
               4,711,404      cpu_atom/L1-icache-load-misses/ #  146.176 K/sec    (50.08%)
              13,713,090      cpu_core/dTLB-loads/      #  425.462 K/sec          (33.34%)
               9,384,727      cpu_atom/dTLB-loads/      #  291.170 K/sec          (50.08%)
                 157,387      cpu_core/dTLB-load-misses/ #    4.883 K/sec         (33.33%)
                 108,328      cpu_atom/dTLB-load-misses/ #    3.361 K/sec         (50.08%)
         <not supported>      cpu_core/iTLB-loads/
         <not supported>      cpu_atom/iTLB-loads/
                  37,655      cpu_core/iTLB-load-misses/ #    1.168 K/sec         (33.32%)
                  61,661      cpu_atom/iTLB-load-misses/ #    1.913 K/sec         (50.03%)
         <not supported>      cpu_core/L1-dcache-prefetches/
         <not supported>      cpu_atom/L1-dcache-prefetches/
         <not supported>      cpu_core/L1-dcache-prefetch-misses/
         <not supported>      cpu_atom/L1-dcache-prefetch-misses/
      
               1.005466919 seconds time elapsed

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220721065706.2886112-5-zhengjun.xing@linux.intel.com
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Always use arch_evlist__add_default_attrs() · a9c1ecda
      Authored by Kan Liang
      Currently, perf stat uses evlist__add_default_attrs() to add the
      generic default attrs, and uses arch_evlist__add_default_attrs() to add
      the arch-specific default attrs, e.g., Topdown for x86.
      
      This works well for non-hybrid platforms. However, for a hybrid
      platform, the hard-coded generic default attrs don't work.
      
      Use arch_evlist__add_default_attrs() to replace
      evlist__add_default_attrs(). arch_evlist__add_default_attrs() is
      modified to invoke the same __evlist__add_default_attrs() for the
      generic default attrs. No functional change.
      
      Add default_null_attrs[] to indicate the arch specific attrs.
      No functional change for the arch specific default attrs either.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220721065706.2886112-4-zhengjun.xing@linux.intel.com
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Add arch_evsel__hw_name() · ff4207f7
      Authored by Kan Liang
      The commit 55bcf6ef ("perf: Extend PERF_TYPE_HARDWARE and
      PERF_TYPE_HW_CACHE") extends the two types to become PMU aware types for
      a hybrid system. However, current evsel__hw_name doesn't take the PMU
      type into account. It mistakenly returns the "unknown-hardware" for the
      hardware event with a specific PMU type.
      
      Add an arch specific arch_evsel__hw_name() to specially handle the PMU
      aware hardware event.
      
      Currently, the extended PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE are only
      supported by X86. Only implement the specific arch_evsel__hw_name() for
      X86 in this patch.
      
      Nothing is changed for the other archs.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220721065706.2886112-3-zhengjun.xing@linux.intel.com
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 27 Jul, 2022 (14 commits)
    • perf bpf: Remove undefined behavior from bpf_perf_object__next() · 9a241805
      Authored by Ian Rogers
      bpf_perf_object__next() folded the last element in the list test with the
      empty list test. However, this meant that offsets were computed against
      null and that a struct list_head was compared against a 'struct
      bpf_perf_object'.
      
      Working around this with clang's undefined behavior sanitizer required
      -fno-sanitize=null and -fno-sanitize=object-size.
      
      Remove the undefined behavior by using the regular Linux list APIs and
      handling the starting case separately from the end testing case.
      
      Looking at uses like bpf_perf_object__for_each(), as the constant NULL
      or non-NULL argument can be constant propagated, the code is no less
      efficient.

      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Christy Lee <christylee@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Miaoqian Lin <linmq006@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: bpf@vger.kernel.org
      Cc: llvm@lists.linux.dev
      Link: https://lore.kernel.org/r/20220726220921.2567761-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbol: Skip symbols if SHF_ALLOC flag is not set · 882528d2
      Authored by Leo Yan
      Some symbols are observed with the 'st_value' field zeroed.  E.g.
      libc.so.6 in Ubuntu contains a symbol '__evoke_link_warning_getwd' which
      resides in the '.gnu.warning.getwd' section.
      
      Unlike normal sections, such sections are used for linker warnings when
      a file calls deprecated functions. They are not part of memory images,
      so the symbols in these sections should be dropped.
      
      This patch checks the SHF_ALLOC bit of the section attributes; if the bit
      is not set, it skips the symbols to avoid spurious ones.
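
      The check described here can be sketched in Python. SHF_ALLOC (0x2) is
      the standard ELF section flag; the Symbol tuple, the keep_symbol helper
      and the sample entries (including the flags and addresses assigned to
      them) are hypothetical stand-ins for perf's C data structures, not its
      actual code.

```python
from collections import namedtuple

# Standard ELF section flag values.
SHF_WRITE = 0x1
SHF_ALLOC = 0x2      # section occupies memory during process execution
SHF_EXECINSTR = 0x4

# Hypothetical, simplified view of a symbol together with its section's sh_flags.
Symbol = namedtuple("Symbol", "name st_value section sh_flags")

def keep_symbol(sym):
    """Keep only symbols whose section is part of the memory image."""
    return bool(sym.sh_flags & SHF_ALLOC)

# Illustrative entries: a normal text symbol and a linker-warning symbol.
symbols = [
    Symbol("getwd", 0x1000, ".text", SHF_ALLOC | SHF_EXECINSTR),
    Symbol("__evoke_link_warning_getwd", 0x0, ".gnu.warning.getwd", 0),
]

kept = [s.name for s in symbols if keep_symbol(s)]
print(kept)
```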

      Suggested-by: Fangrui Song <maskray@google.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chang Rui <changruinj@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220724060013.171050-3-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbol: Correct address for bss symbols · 2d86612a
      Authored by Leo Yan
      When using 'perf mem' and 'perf c2c', the tool is observed to report
      the wrong offset for global data symbols.  This is a common issue on
      both x86 and Arm64 platforms.
      
      As an example, below is the disassembly of a test program's .bss
      section, dumped with objdump:
      
        ...
      
        Disassembly of section .bss:
      
        0000000000004040 <completed.0>:
        	...
      
        0000000000004080 <buf1>:
        	...
      
        00000000000040c0 <buf2>:
        	...
      
        0000000000004100 <thread>:
        	...
      
      First we used 'perf mem record' to run the test program, and then used
      'perf --debug verbose=4 mem report' to inspect the symbol info for the
      'buf1' and 'buf2' structures.
      
        # ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8
        # ./perf --debug verbose=4 mem report
          ...
          dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028
          symbol__new: buf2 0x30a8-0x30e8
          ...
          dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028
          symbol__new: buf1 0x3068-0x30a8
          ...
      
      The perf tool relies on libelf to parse symbols.  In executable and
      shared object files, 'st_value' holds a virtual address, 'sh_addr' is
      the address at which the section's first byte should reside in memory,
      and 'sh_offset' is the byte offset from the beginning of the file to
      the first byte in the section.  The perf tool uses the formula below to
      convert a symbol's memory address to a file address:
      
        file_address = st_value - sh_addr + sh_offset
                          ^
                          ` Memory address
      
      The final adjusted address ranges for buf1 and buf2 are
      [0x3068-0x30a8) and [0x30a8-0x30e8) respectively.  This is clearly
      incorrect: in the code, the 'buf1' and 'buf2' structures are declared
      with a compiler attribute requesting 64-byte alignment, yet neither
      start address is 64-byte aligned.
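As a quick check of the arithmetic, the section-based formula can be replayed with the values from the debug log above. The function name is made up for this sketch; the numbers are the ones reported by 'perf --debug verbose=4 mem report'.

```python
def file_address_from_section(st_value, sh_addr, sh_offset):
    """Section-based conversion: file_address = st_value - sh_addr + sh_offset."""
    return st_value - sh_addr + sh_offset


# Values taken from the debug log for the .bss section.
SH_ADDR = 0x4040
SH_OFFSET = 0x3028

buf1 = file_address_from_section(0x4080, SH_ADDR, SH_OFFSET)
buf2 = file_address_from_section(0x40C0, SH_ADDR, SH_OFFSET)

print(hex(buf1), hex(buf2))   # 0x3068 0x30a8
print(buf1 % 64, buf2 % 64)   # 40 40
```

Both results leave a remainder of 40 when divided by 64, so neither matches the 64-byte alignment the structures were declared with.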
      
      The problem lies in 'sh_offset': libelf returns 0x3028, which is not
      64-byte aligned.  Combined with the disassembly, it is likely that
      libelf doesn't respect the alignment of the .bss section and therefore
      doesn't return an aligned value for 'sh_offset'.
      
      As suggested by Fangrui Song, an ELF file contains a program header
      with PT_LOAD segments, and the p_vaddr and p_offset fields of those
      segments carry the execution info.  A better choice for converting a
      memory address to a file address is the formula:
      
        file_address = st_value - p_vaddr + p_offset
      
      This patch introduces elf_read_program_header(), which returns the
      program header based on the passed 'st_value'.  The formula above is
      then used to calculate the symbol's file address, and the debugging log
      is updated accordingly.
      
      After applying the change:
      
        # ./perf --debug verbose=4 mem report
          ...
          dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28
          symbol__new: buf2 0x30c0-0x3100
          ...
          dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28
          symbol__new: buf1 0x3080-0x30c0
          ...
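The segment-based conversion can be sketched the same way. This is a rough model under stated assumptions, not the code of elf_read_program_header(): the dictionary layout, the first segment, and both `p_memsz` values are hypothetical, while `p_vaddr` 0x3d28 and `p_offset` 0x2d28 are the values shown in the log above.

```python
PT_LOAD = 1  # ELF program header type for loadable segments


def find_load_segment(segments, st_value):
    """Return the PT_LOAD segment whose memory range contains st_value."""
    for seg in segments:
        if seg["p_type"] != PT_LOAD:
            continue
        if seg["p_vaddr"] <= st_value < seg["p_vaddr"] + seg["p_memsz"]:
            return seg
    return None


def file_address_from_segment(st_value, seg):
    """Segment-based conversion: file_address = st_value - p_vaddr + p_offset."""
    return st_value - seg["p_vaddr"] + seg["p_offset"]


segments = [
    # Hypothetical first segment, present only to exercise the lookup.
    {"p_type": PT_LOAD, "p_vaddr": 0x0, "p_offset": 0x0, "p_memsz": 0x1000},
    # p_vaddr and p_offset come from the debug log; p_memsz is assumed.
    {"p_type": PT_LOAD, "p_vaddr": 0x3D28, "p_offset": 0x2D28, "p_memsz": 0x400},
]

buf1 = file_address_from_segment(0x4080, find_load_segment(segments, 0x4080))
buf2 = file_address_from_segment(0x40C0, find_load_segment(segments, 0x40C0))

print(hex(buf1), hex(buf2))   # 0x3080 0x30c0
print(buf1 % 64, buf2 % 64)   # 0 0
```

The computed start addresses match the 'symbol__new' lines in the log and are both 64-byte aligned.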
      
      Fixes: f17e04af ("perf report: Fix ELF symbol parsing")
      Reported-by: Chang Rui <changruinj@gmail.com>
      Suggested-by: Fangrui Song <maskray@google.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220724060013.171050-2-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Y
      perf kwork: Add workqueue trace BPF support · acfb65fe
      Yang Jihong committed
      Implement the BPF tracing function for workqueue events.
      
      Test cases:
      
        # perf kwork -k workqueue lat -b
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
         --------------------------------------------------------------------------------------------------------------------------------
          (w)addrconf_verify_work        | 0002 |      5.856 ms |         1 |      5.856 ms |     111994.634313 s |     111994.640169 s |
          (w)vmstat_update               | 0001 |      1.247 ms |         1 |      1.247 ms |     111996.462651 s |     111996.463899 s |
          (w)neigh_periodic_work         | 0001 |      1.183 ms |         1 |      1.183 ms |     111996.462789 s |     111996.463973 s |
          (w)neigh_managed_work          | 0001 |      0.989 ms |         2 |      1.635 ms |     111996.462820 s |     111996.464455 s |
          (w)wb_workfn                   | 0000 |      0.667 ms |         1 |      0.667 ms |     111996.384273 s |     111996.384940 s |
          (w)bpf_prog_free_deferred      | 0001 |      0.495 ms |         1 |      0.495 ms |     111986.314201 s |     111986.314696 s |
          (w)mix_interrupt_randomness    | 0002 |      0.421 ms |         6 |      0.749 ms |     111995.927750 s |     111995.928499 s |
          (w)vmstat_shepherd             | 0000 |      0.374 ms |         2 |      0.385 ms |     111991.265242 s |     111991.265627 s |
          (w)e1000_watchdog              | 0002 |      0.356 ms |         5 |      0.390 ms |     111994.528380 s |     111994.528770 s |
          (w)vmstat_update               | 0000 |      0.231 ms |         2 |      0.365 ms |     111996.384407 s |     111996.384772 s |
          (w)flush_to_ldisc              | 0006 |      0.165 ms |         1 |      0.165 ms |     111995.930606 s |     111995.930771 s |
          (w)flush_to_ldisc              | 0000 |      0.094 ms |         2 |      0.095 ms |     111996.460453 s |     111996.460548 s |
         --------------------------------------------------------------------------------------------------------------------------------
      
        # perf kwork -k workqueue rep -b
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
         --------------------------------------------------------------------------------------------------------------------------------
          (w)e1000_watchdog              | 0002 |      0.627 ms |         2 |      0.324 ms |     112002.720665 s |     112002.720989 s |
          (w)flush_to_ldisc              | 0007 |      0.598 ms |         2 |      0.534 ms |     112000.875226 s |     112000.875761 s |
          (w)wq_barrier_func             | 0007 |      0.492 ms |         1 |      0.492 ms |     112000.876981 s |     112000.877473 s |
          (w)flush_to_ldisc              | 0007 |      0.281 ms |         1 |      0.281 ms |     112005.826882 s |     112005.827163 s |
          (w)mix_interrupt_randomness    | 0002 |      0.229 ms |         3 |      0.102 ms |     112005.825671 s |     112005.825774 s |
          (w)vmstat_shepherd             | 0000 |      0.202 ms |         1 |      0.202 ms |     112001.504511 s |     112001.504713 s |
          (w)bpf_prog_free_deferred      | 0001 |      0.181 ms |         1 |      0.181 ms |     112000.883251 s |     112000.883432 s |
          (w)wb_workfn                   | 0007 |      0.130 ms |         1 |      0.130 ms |     112001.505195 s |     112001.505325 s |
          (w)vmstat_update               | 0000 |      0.053 ms |         1 |      0.053 ms |     112001.504763 s |     112001.504815 s |
         --------------------------------------------------------------------------------------------------------------------------------
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-18-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Y
      perf kwork: Add softirq trace BPF support · 5a81927a
      Yang Jihong committed
      Implement the BPF tracing function for softirq events.
      
      Test cases:
      Trace softirq latency without filter:
      
        # perf kwork -k softirq lat -b
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
         --------------------------------------------------------------------------------------------------------------------------------
          (s)RCU:9                       | 0005 |      0.281 ms |         3 |      0.338 ms |     111295.752222 s |     111295.752560 s |
          (s)RCU:9                       | 0002 |      0.262 ms |        24 |      1.400 ms |     111301.335986 s |     111301.337386 s |
          (s)SCHED:7                     | 0005 |      0.177 ms |        14 |      0.212 ms |     111295.752270 s |     111295.752481 s |
          (s)RCU:9                       | 0007 |      0.161 ms |        47 |      2.022 ms |     111295.402159 s |     111295.404181 s |
          (s)NET_RX:3                    | 0003 |      0.149 ms |        12 |      1.261 ms |     111301.192964 s |     111301.194225 s |
          (s)TIMER:1                     | 0001 |      0.105 ms |         9 |      0.198 ms |     111301.180191 s |     111301.180389 s |
          ... <SNIP> ...
          (s)NET_RX:3                    | 0002 |      0.098 ms |         6 |      0.124 ms |     111295.403760 s |     111295.403884 s |
          (s)SCHED:7                     | 0001 |      0.093 ms |        19 |      0.242 ms |     111301.180256 s |     111301.180498 s |
          (s)SCHED:7                     | 0007 |      0.078 ms |        15 |      0.188 ms |     111300.064226 s |     111300.064415 s |
          (s)SCHED:7                     | 0004 |      0.077 ms |        11 |      0.213 ms |     111301.361759 s |     111301.361973 s |
          (s)SCHED:7                     | 0000 |      0.063 ms |        33 |      0.805 ms |     111295.401811 s |     111295.402616 s |
          (s)SCHED:7                     | 0003 |      0.063 ms |        14 |      0.085 ms |     111301.192255 s |     111301.192340 s |
         --------------------------------------------------------------------------------------------------------------------------------
      
      Trace softirq latency with cpu filter:
      
        # perf kwork -k softirq lat -b -C 1
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
         --------------------------------------------------------------------------------------------------------------------------------
          (s)RCU:9                       | 0001 |      0.178 ms |         5 |      0.572 ms |     111435.534135 s |     111435.534707 s |
         --------------------------------------------------------------------------------------------------------------------------------
      
      Trace softirq latency with name filter:
      
        # perf kwork -k softirq lat -b -n SCHED
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
         --------------------------------------------------------------------------------------------------------------------------------
          (s)SCHED:7                     | 0001 |      0.295 ms |        15 |      2.183 ms |     111452.534950 s |     111452.537133 s |
          (s)SCHED:7                     | 0002 |      0.215 ms |        10 |      0.315 ms |     111460.000238 s |     111460.000553 s |
          (s)SCHED:7                     | 0005 |      0.190 ms |        29 |      0.338 ms |     111457.032538 s |     111457.032876 s |
          (s)SCHED:7                     | 0003 |      0.097 ms |        10 |      0.319 ms |     111452.434351 s |     111452.434670 s |
          (s)SCHED:7                     | 0006 |      0.089 ms |         1 |      0.089 ms |     111450.737450 s |     111450.737539 s |
          (s)SCHED:7                     | 0007 |      0.085 ms |        17 |      0.169 ms |     111452.471333 s |     111452.471502 s |
          (s)SCHED:7                     | 0004 |      0.071 ms |        15 |      0.221 ms |     111452.535252 s |     111452.535473 s |
          (s)SCHED:7                     | 0000 |      0.044 ms |        32 |      0.130 ms |     111460.001982 s |     111460.002112 s |
         --------------------------------------------------------------------------------------------------------------------------------
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-17-yangjihong1@huawei.com
      [ Add {} for multiline if blocks ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Y
      perf kwork: Add IRQ trace BPF support · 420298ae
      Yang Jihong committed
      Implement the BPF tracing function for irq events.
      
      Test cases:
      Trace irq without filter:
      
        # perf kwork -k irq rep -b
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
         --------------------------------------------------------------------------------------------------------------------------------
          virtio0-requests:25            | 0000 |     31.026 ms |       285 |      1.493 ms |     110326.049963 s |     110326.051456 s |
          eth0:10                        | 0002 |      7.875 ms |        96 |      1.429 ms |     110313.916835 s |     110313.918264 s |
          ata_piix:14                    | 0002 |      2.510 ms |        28 |      0.396 ms |     110331.367987 s |     110331.368383 s |
         --------------------------------------------------------------------------------------------------------------------------------
      
      Trace irq with cpu filter:
      
        # perf kwork -k irq rep -b -C 0
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
         --------------------------------------------------------------------------------------------------------------------------------
          virtio0-requests:25            | 0000 |     34.288 ms |       282 |      2.061 ms |     110358.078968 s |     110358.081029 s |
         --------------------------------------------------------------------------------------------------------------------------------
      
      Trace irq with name filter:
      
        # perf kwork -k irq rep -b -n eth0
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
         --------------------------------------------------------------------------------------------------------------------------------
          eth0:10                        | 0002 |      2.184 ms |        21 |      0.572 ms |     110386.541699 s |     110386.542271 s |
         --------------------------------------------------------------------------------------------------------------------------------
      
      Trace irq with summary:
      
        # perf kwork -k irq rep -b -S
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
         --------------------------------------------------------------------------------------------------------------------------------
          virtio0-requests:25            | 0000 |     42.923 ms |       285 |      1.181 ms |     110418.128867 s |     110418.130049 s |
          eth0:10                        | 0002 |      2.085 ms |        20 |      0.668 ms |     110416.002935 s |     110416.003603 s |
          ata_piix:14                    | 0002 |      0.970 ms |         4 |      0.656 ms |     110424.034482 s |     110424.035138 s |
         --------------------------------------------------------------------------------------------------------------------------------
          Total count            :       309
          Total runtime   (msec) :    45.977 (0.003% load average)
          Total time span (msec) : 17017.655
         --------------------------------------------------------------------------------------------------------------------------------
      
      Committer testing:
      
        # perf kwork -k irq rep -b
        Starting trace, Hit <Ctrl+C> to stop and report
        ^C
          Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
         --------------------------------------------------------------------------------------------------------------------------------
          nvme0q20:145                   | 0019 |      0.570 ms |        28 |      0.064 ms |      26966.635102 s |      26966.635167 s |
          amdgpu:162                     | 0002 |      0.568 ms |        29 |      0.068 ms |      26966.644346 s |      26966.644414 s |
          nvme0q4:129                    | 0003 |      0.565 ms |        31 |      0.037 ms |      26966.614830 s |      26966.614866 s |
          nvme0q16:141                   | 0015 |      0.205 ms |        66 |      0.012 ms |      26967.145161 s |      26967.145174 s |
          nvme0q29:154                   | 0028 |      0.154 ms |        44 |      0.014 ms |      26967.078970 s |      26967.078984 s |
          nvme0q10:135                   | 0009 |      0.134 ms |        43 |      0.011 ms |      26967.132093 s |      26967.132104 s |
          nvme0q2:127                    | 0001 |      0.132 ms |        26 |      0.011 ms |      26966.883584 s |      26966.883595 s |
          nvme0q25:150                   | 0024 |      0.127 ms |        32 |      0.014 ms |      26966.631419 s |      26966.631433 s |
          nvme0q14:139                   | 0013 |      0.110 ms |        21 |      0.017 ms |      26966.760843 s |      26966.760861 s |
          nvme0q30:155                   | 0029 |      0.102 ms |        30 |      0.022 ms |      26966.677171 s |      26966.677193 s |
          nvme0q13:138                   | 0012 |      0.088 ms |        20 |      0.015 ms |      26966.738733 s |      26966.738748 s |
          nvme0q6:131                    | 0005 |      0.087 ms |        13 |      0.020 ms |      26966.648445 s |      26966.648465 s |
          nvme0q28:153                   | 0027 |      0.066 ms |        12 |      0.015 ms |      26966.771431 s |      26966.771447 s |
          nvme0q26:151                   | 0025 |      0.060 ms |        13 |      0.012 ms |      26966.704266 s |      26966.704278 s |
          nvme0q21:146                   | 0020 |      0.054 ms |        20 |      0.011 ms |      26967.322082 s |      26967.322094 s |
          nvme0q1:126                    | 0000 |      0.046 ms |        11 |      0.013 ms |      26966.859754 s |      26966.859767 s |
          nvme0q17:142                   | 0016 |      0.046 ms |        10 |      0.011 ms |      26967.114513 s |      26967.114524 s |
          xhci_hcd:74                    | 0015 |      0.041 ms |         3 |      0.016 ms |      26967.086004 s |      26967.086020 s |
          nvme0q8:133                    | 0007 |      0.039 ms |        12 |      0.008 ms |      26966.712056 s |      26966.712063 s |
          nvme0q32:157                   | 0031 |      0.036 ms |        10 |      0.014 ms |      26966.627054 s |      26966.627068 s |
          nvme0q9:134                    | 0008 |      0.036 ms |        11 |      0.011 ms |      26967.258452 s |      26967.258462 s |
          nvme0q7:132                    | 0006 |      0.024 ms |         3 |      0.014 ms |      26966.767404 s |      26966.767418 s |
          nvme0q11:136                   | 0010 |      0.023 ms |         5 |      0.006 ms |      26966.935455 s |      26966.935461 s |
          nvme0q31:156                   | 0030 |      0.018 ms |         5 |      0.006 ms |      26966.627517 s |      26966.627524 s |
          nvme0q12:137                   | 0011 |      0.015 ms |         2 |      0.014 ms |      26966.799588 s |      26966.799602 s |
          enp5s0-rx-0:164                | 0006 |      0.009 ms |         2 |      0.005 ms |      26966.742024 s |      26966.742028 s |
          enp5s0-rx-1:165                | 0007 |      0.006 ms |         2 |      0.004 ms |      26966.939486 s |      26966.939490 s |
          enp5s0-tx-0:166                | 0008 |      0.005 ms |         1 |      0.005 ms |      26966.939484 s |      26966.939489 s |
          enp5s0-tx-1:167                | 0009 |      0.005 ms |         1 |      0.005 ms |      26966.939484 s |      26966.939489 s |
         --------------------------------------------------------------------------------------------------------------------------------
      
        #
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-16-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Y
      perf kwork: Implement BPF trace · daf07d22
      Yang Jihong committed
      'perf record' generates perf.data, which causes extra hard disk
      interrupts, and the amount of data to be collected grows over time.
      
      Tracing with eBPF processes the data in the kernel, which solves both
      of these problems.
      
      Add the -b/--use-bpf option to 'latency' and 'report' to support
      tracing kwork events using eBPF:
      
      1. Create the BPF program and attach it to tracepoints,
      2. Start tracing after the command is entered,
      3. After the user hits "ctrl+c", stop tracing and report,
      4. Support CPU and name filtering.
      
      This commit implements the framework code and
      does not add specific event support.
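To illustrate why in-kernel processing keeps the data volume bounded, the sketch below models the aggregation idea in plain Python: per-work statistics are updated as entry/exit events arrive, so only one small record per (work, CPU) pair is kept rather than every event. This is a conceptual model only, not the BPF program from this series; the event tuple layout, the function name, and the exact-match name filter are assumptions.

```python
from collections import defaultdict


def aggregate_runtime(events, cpus=None, name=None):
    """Aggregate runtime per (work_name, cpu) from entry/exit events.

    events: iterable of (timestamp_ns, cpu, work_name, kind) tuples, where
            kind is "entry" or "exit" (hypothetical layout).
    cpus:   optional set of CPUs to keep (models the -C filter).
    name:   optional work name to keep (models the -n filter).
    """
    start = {}
    stats = defaultdict(lambda: {"total": 0, "count": 0, "max": 0})
    for ts, cpu, work, kind in events:
        # Filter early so unwanted events never reach the aggregation.
        if cpus is not None and cpu not in cpus:
            continue
        if name is not None and work != name:
            continue
        key = (work, cpu)
        if kind == "entry":
            start[key] = ts
        elif kind == "exit" and key in start:
            runtime = ts - start.pop(key)
            entry = stats[key]
            entry["total"] += runtime
            entry["count"] += 1
            entry["max"] = max(entry["max"], runtime)
    return dict(stats)


events = [
    (100, 0, "eth0", "entry"),
    (150, 0, "eth0", "exit"),
    (200, 2, "ata_piix", "entry"),
    (230, 2, "ata_piix", "exit"),
    (300, 0, "eth0", "entry"),
    (320, 0, "eth0", "exit"),
]

report_all = aggregate_runtime(events)
report_eth0 = aggregate_runtime(events, name="eth0")
report_cpu2 = aggregate_runtime(events, cpus={2})
print(report_all)
```

The report step then only has to read out the accumulated totals, counts, and maxima once tracing stops.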
      
      Test cases:
      
        # perf kwork rep -h
      
         Usage: perf kwork report [<options>]
      
            -b, --use-bpf         Use BPF to measure kwork runtime
            -C, --cpu <cpu>       list of cpus to profile
            -i, --input <file>    input file name
            -n, --name <name>     event name to profile
            -s, --sort <key[,key2...]>
                                  sort by key(s): runtime, max, count
            -S, --with-summary    Show summary with statistics
                --time <str>      Time span for analysis (start,stop)
      
        # perf kwork lat -h
      
         Usage: perf kwork latency [<options>]
      
            -b, --use-bpf         Use BPF to measure kwork latency
            -C, --cpu <cpu>       list of cpus to profile
            -i, --input <file>    input file name
            -n, --name <name>     event name to profile
            -s, --sort <key[,key2...]>
                                  sort by key(s): avg, max, count
                --time <str>      Time span for analysis (start,stop)
      
        # perf kwork lat -b
        Unsupported bpf trace class irq
      
        # perf kwork rep -b
        Unsupported bpf trace class irq
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-15-yangjihong1@huawei.com
      [ Simplify work_findnew() ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Y
      perf kwork: Implement perf kwork timehist · bcc8b3e8
      Yang Jihong committed
      Implement the framework of 'perf kwork timehist', which provides an
      analysis of kernel work events.
      
      Test cases:
      
        # perf kwork tim
         Runtime start      Runtime end        Cpu     Kwork name                      Runtime     Delaytime
                                                       (TYPE)NAME:NUM                  (msec)      (msec)
         -----------------  -----------------  ------  ------------------------------  ----------  ----------
              91576.060290       91576.060344  [0000]  (s)RCU:9                             0.055       0.111
              91576.061470       91576.061547  [0000]  (s)SCHED:7                           0.077       0.073
              91576.062604       91576.062697  [0001]  (s)RCU:9                             0.094       0.409
              91576.064443       91576.064517  [0002]  (s)RCU:9                             0.074       0.114
              91576.065144       91576.065211  [0000]  (s)SCHED:7                           0.067       0.058
              91576.066564       91576.066609  [0003]  (s)RCU:9                             0.045       0.110
              91576.068495       91576.068559  [0000]  (s)SCHED:7                           0.064       0.059
              91576.068900       91576.068996  [0004]  (s)RCU:9                             0.096       0.726
              91576.069364       91576.069420  [0002]  (s)RCU:9                             0.056       0.082
              91576.069649       91576.069701  [0004]  (s)RCU:9                             0.052       0.111
              91576.070147       91576.070206  [0000]  (s)SCHED:7                           0.060       0.057
              91576.073147       91576.073202  [0000]  (s)SCHED:7                           0.054       0.060
        <SNIP>
      
        # perf kwork tim --max-stack 2 -g
         Runtime start      Runtime end        Cpu     Kwork name                      Runtime     Delaytime
                                                       (TYPE)NAME:NUM                  (msec)      (msec)
         -----------------  -----------------  ------  ------------------------------  ----------  ----------
              91576.060290       91576.060344  [0000]  (s)RCU:9                             0.055       0.111   irq_exit_rcu <- sysvec_apic_timer_interrupt
              91576.061470       91576.061547  [0000]  (s)SCHED:7                           0.077       0.073   irq_exit_rcu <- sysvec_call_function_single
              91576.062604       91576.062697  [0001]  (s)RCU:9                             0.094       0.409   irq_exit_rcu <- sysvec_apic_timer_interrupt
              91576.064443       91576.064517  [0002]  (s)RCU:9                             0.074       0.114   irq_exit_rcu <- sysvec_apic_timer_interrupt
              91576.065144       91576.065211  [0000]  (s)SCHED:7                           0.067       0.058   irq_exit_rcu <- sysvec_call_function_single
              91576.066564       91576.066609  [0003]  (s)RCU:9                             0.045       0.110   irq_exit_rcu <- sysvec_apic_timer_interrupt
              91576.068495       91576.068559  [0000]  (s)SCHED:7                           0.064       0.059   irq_exit_rcu <- sysvec_call_function_single
              91576.068900       91576.068996  [0004]  (s)RCU:9                             0.096       0.726   irq_exit_rcu <- sysvec_apic_timer_interrupt
              91576.069364       91576.069420  [0002]  (s)RCU:9                             0.056       0.082   irq_exit_rcu <- sysvec_apic_timer_interrupt
              91576.069649       91576.069701  [0004]  (s)RCU:9                             0.052       0.111   irq_exit_rcu <- sysvec_apic_timer_interrupt
        <SNIP>
      
      Committer testing:
      
        # perf kwork -k workqueue timehist | head -40
         Runtime start      Runtime end        Cpu     Kwork name                      Runtime     Delaytime
                                                       (TYPE)NAME:NUM                  (msec)      (msec)
         -----------------  -----------------  ------  ------------------------------  ----------  ----------
              26520.211825       26520.211832  [0019]  (w)free_work                         0.007       0.004
              26520.212929       26520.212934  [0020]  (w)free_work                         0.005       0.004
              26520.213226       26520.213228  [0014]  (w)kfree_rcu_work                    0.002       0.004
              26520.214057       26520.214061  [0021]  (w)free_work                         0.004       0.004
              26520.221239       26520.221241  [0007]  (w)kfree_rcu_work                    0.002       0.009
              26520.223232       26520.223238  [0013]  (w)psi_avgs_work                     0.005       0.006
              26520.230057       26520.230060  [0020]  (w)free_work                         0.003       0.003
              26520.270428       26520.270434  [0015]  (w)free_work                         0.006       0.004
              26520.270546       26520.270550  [0014]  (w)free_work                         0.004       0.003
              26520.281626       26520.281629  [0015]  (w)free_work                         0.003       0.002
              26520.287225       26520.287230  [0012]  (w)psi_avgs_work                     0.005       0.008
              26520.287231       26520.287235  [0001]  (w)psi_avgs_work                     0.004       0.011
              26520.287236       26520.287239  [0001]  (w)psi_avgs_work                     0.003       0.012
              26520.329488       26520.329492  [0024]  (w)free_work                         0.004       0.004
              26520.330600       26520.330605  [0007]  (w)free_work                         0.005       0.004
              26520.334218       26520.334218  [0007]  (w)kfree_rcu_monitor                 0.001       0.002
              26520.335220       26520.335221  [0005]  (w)kfree_rcu_monitor                 0.001       0.004
              26520.343980       26520.343985  [0007]  (w)free_work                         0.005       0.002
              26520.345093       26520.345097  [0006]  (w)free_work                         0.004       0.003
              26520.351233       26520.351238  [0027]  (w)psi_avgs_work                     0.005       0.008
              26520.353228       26520.353229  [0007]  (w)kfree_rcu_work                    0.001       0.002
              26520.353229       26520.353231  [0005]  (w)kfree_rcu_work                    0.001       0.006
              26520.382381       26520.382383  [0006]  (w)free_work                         0.003       0.002
              26520.386547       26520.386548  [0006]  (w)free_work                         0.002       0.001
              26520.391243       26520.391245  [0015]  (w)console_callback                  0.002       0.016
              26520.415369       26520.415621  [0027]  (w)btrfs_work_helper                 0.252
              26520.415351       26520.416174  [0002]  (w)btrfs_work_helper                 0.823       0.037
              26520.415343       26520.416304  [0031]  (w)btrfs_work_helper                 0.961
              26520.415335       26520.417078  [0001]  (w)btrfs_work_helper                 1.743
              26520.415250       26520.417564  [0002]  (w)wb_workfn                         2.314
              26520.424777       26520.424787  [0002]  (w)btrfs_work_helper                 0.010
              26520.424788       26520.424798  [0002]  (w)btrfs_work_helper                 0.010
              26520.424790       26520.424805  [0001]  (w)btrfs_work_helper                 0.016       0.016
              26520.424801       26520.424807  [0002]  (w)btrfs_work_helper                 0.006
              26520.424809       26520.424831  [0002]  (w)btrfs_work_helper                 0.022       0.030
              26520.424824       26520.424835  [0027]  (w)btrfs_work_helper                 0.011
              26520.424809       26520.424867  [0001]  (w)btrfs_work_helper                 0.059       0.032
        #
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-14-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bcc8b3e8
    • Y
      perf kwork: Implement perf kwork latency · ad3d9f7a
      Committed by Yang Jihong
      Implements the framework of 'perf kwork latency', which is used to
      report time properties such as delay time and frequency.
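
As a rough illustration of the "delay time" reported here, the following is a minimal Python sketch. It assumes that delay is the gap between a work item's raise timestamp and its entry timestamp, and that results are aggregated per work name and CPU into average, maximum and count (the column headers and sort keys shown in the test cases below). The function name and the tuple layout are hypothetical and are not taken from builtin-kwork.c.

```python
from collections import defaultdict

def summarize_latency(events):
    """Aggregate raise->entry delays per (name, cpu).

    `events` is an iterable of (timestamp_sec, cpu, name, phase) tuples,
    where phase is "raise" or "entry". This layout is an assumption made
    for illustration; it is not perf's on-disk format.
    """
    pending = {}                 # (name, cpu) -> timestamp of unmatched raise
    delays = defaultdict(list)   # (name, cpu) -> list of delays in msec
    for ts, cpu, name, phase in sorted(events):
        key = (name, cpu)
        if phase == "raise":
            pending.setdefault(key, ts)   # keep the earliest unmatched raise
        elif phase == "entry" and key in pending:
            delays[key].append((ts - pending.pop(key)) * 1000.0)
    return {
        key: {"avg": sum(d) / len(d), "max": max(d), "count": len(d)}
        for key, d in delays.items()
    }
```

An entry with no preceding raise is simply ignored in this sketch, which loosely mirrors the "skipped events" line in the output below.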
      
      Test cases:
      
        # perf kwork lat -h
      
         Usage: perf kwork latency [<options>]
      
            -C, --cpu <cpu>       list of cpus to profile
            -i, --input <file>    input file name
            -n, --name <name>     event name to profile
            -s, --sort <key[,key2...]>
                                  sort by key(s): avg, max, count
                --time <str>      Time span for analysis (start,stop)
      
        # perf kwork lat -C 199
        Requested CPU 199 too large. Consider raising MAX_NR_CPUS
        Invalid cpu bitmap
      
        # perf kwork lat -i perf_no_exist.data
        failed to open perf_no_exist.data: No such file or directory
      
        # perf kwork lat -s avg1
          Error: Unknown --sort key: `avg1'
      
         Usage: perf kwork latency [<options>]
      
            -C, --cpu <cpu>       list of cpus to profile
            -i, --input <file>    input file name
            -n, --name <name>     event name to profile
            -s, --sort <key[,key2...]>
                                  sort by key(s): avg, max, count
                --time <str>      Time span for analysis (start,stop)
      
        # perf kwork lat --time FFFF,
        Invalid time span
      
        # perf kwork lat
      
          Kwork Name                     | Cpu  | Avg delay     | Count    | Max delay     | Max delay start     | Max delay end       |
         --------------------------------------------------------------------------------------------------------------------------------
         --------------------------------------------------------------------------------------------------------------------------------
          INFO: 36.570% skipped events (31537 including 0 raise, 31537 entry, 0 exit)
      
      Since there are no latency-enabled events, the output is empty.
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-11-yangjihong1@huawei.com
      [ Add {} for multiline if blocks ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ad3d9f7a
    • Y
      perf kwork: Implement 'report' subcommand · f98919ec
      Committed by Yang Jihong
      Implements the framework of 'perf kwork report', which is used to
      report time properties such as run time and frequency.
      
      Test cases:
      
        # perf kwork
      
         Usage: perf kwork [<options>] {record|report}
      
            -D, --dump-raw-trace  dump raw trace in ASCII
            -f, --force           don't complain, do it
            -k, --kwork <kwork>   list of kwork to profile (irq, softirq, workqueue, etc)
            -v, --verbose         be more verbose (show symbol address, etc)
      
        # perf kwork report -h
      
         Usage: perf kwork report [<options>]
      
            -C, --cpu <cpu>       list of cpus to profile
            -i, --input <file>    input file name
            -n, --name <name>     event name to profile
            -s, --sort <key[,key2...]>
                                  sort by key(s): runtime, max, count
            -S, --with-summary    Show summary with statistics
                --time <str>      Time span for analysis (start,stop)
      
        # perf kwork report
      
          Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
         --------------------------------------------------------------------------------------------------------------------------------
         --------------------------------------------------------------------------------------------------------------------------------
      
        # perf kwork report -S
      
          Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
         --------------------------------------------------------------------------------------------------------------------------------
         --------------------------------------------------------------------------------------------------------------------------------
          Total count            :         0
          Total runtime   (msec) :     0.000 (0.000% load average)
          Total time span (msec) :     0.000
         --------------------------------------------------------------------------------------------------------------------------------
      
        # perf kwork report -C 0,100
        Requested CPU 100 too large. Consider raising MAX_NR_CPUS
        Invalid cpu bitmap
      
        # perf kwork report -s runtime1
          Error: Unknown --sort key: `runtime1'
      
         Usage: perf kwork report [<options>]
      
            -C, --cpu <cpu>       list of cpus to profile
            -i, --input <file>    input file name
            -n, --name <name>     event name to profile
            -s, --sort <key[,key2...]>
                                  sort by key(s): runtime, max, count
            -S, --with-summary    Show summary with statistics
                --time <str>      Time span for analysis (start,stop)
      
        # perf kwork report -i perf_no_exist.data
        failed to open perf_no_exist.data: No such file or directory
      
        # perf kwork report --time 00FFF,
        Invalid time span
      
      Since there are no report-supported events, the output is empty.
      
      A brief description of the data structure:
      
      1. "class" indicates the event type. For example, irq and softirq
      correspond to different types.
      
      2. "cluster" refers to a specific event corresponding to a type. For
      example, RCU and TIMER in softirq correspond to different clusters,
      each of which contains three types of events: raise, entry, and exit.
      
      3. "atom" includes time of each sample and sample of the previous phase.
      (For example, exit corresponds to entry, which is used for timehist.)
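
The three levels described above can be pictured with a small Python model. This is only a sketch of the relationships (class contains clusters, cluster contains atoms per phase, each atom links to the atom of the previous phase); the class names, fields and the `add_sample` helper are hypothetical and do not reproduce the C structures in builtin-kwork.c.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PHASES = ["raise", "entry", "exit"]

@dataclass
class Atom:
    """One sample, linked to the sample of the previous phase."""
    time: float
    prev: Optional["Atom"] = None   # e.g. an exit atom points at its entry

@dataclass
class Cluster:
    """A specific piece of work within a class, e.g. RCU or TIMER."""
    name: str
    atoms: Dict[str, List[Atom]] = field(
        default_factory=lambda: {phase: [] for phase in PHASES})

@dataclass
class KworkClass:
    """An event type such as irq, softirq or workqueue."""
    name: str
    clusters: Dict[str, Cluster] = field(default_factory=dict)

    def add_sample(self, cluster_name, phase, time):
        """Record a sample and link it to the latest previous-phase atom."""
        cluster = self.clusters.setdefault(cluster_name, Cluster(cluster_name))
        idx = PHASES.index(phase)
        prev_atoms = cluster.atoms[PHASES[idx - 1]] if idx > 0 else []
        atom = Atom(time, prev_atoms[-1] if prev_atoms else None)
        cluster.atoms[phase].append(atom)
        return atom
```

With such links in place, a runtime is the exit atom's time minus its `prev` (entry) time, which is the kind of pairing the timehist remark in item 3 refers to.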
      
      Committer notes:
      
      - Add {} for multiline if blocks.
      
      - report_print_work() should either return that ret variable that
        accounts how many bytes were printed or stop accounting and be void.
        Do the former for now to avoid this:
      
      builtin-kwork.c:534:6: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable]
              int ret = 0;
                  ^
      1 error generated.
      
        When building with:
      
        ⬢[acme@toolbox perf]$ clang --version
        clang version 13.0.0 (https://github.com/llvm/llvm-project e8991caea8690ec2d17b0b7e1c29bf0da6609076)
      
      Also:
      
        -       if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) {
        +       if (dst_type < KWORK_TRACE_MAX) {
      
      This is needed because several versions of clang, and at least this gcc, fail with:
      
         3    51.40 alpine:3.9                    : FAIL gcc version 8.3.0 (Alpine 8.3.0)
          builtin-kwork.c:411:16: error: comparison of unsigned enum expression >= 0 is
                always true [-Werror,-Wtautological-compare]
                  if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) {
      
      As the first entry in an enum is zero.
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-7-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f98919ec
    • Y
      perf kwork: Add workqueue kwork record support · 97179d9d
      Committed by Yang Jihong
      Record workqueue events workqueue:workqueue_activate_work,
      workqueue:workqueue_execute_start & workqueue:workqueue_execute_end.
      
      Test cases:
      Record all events:
      
        # perf kwork record -o perf_kwork.date -- sleep 1
        [ perf record: Woken up 0 times to write data ]
        [ perf record: Captured and wrote 0.857 MB perf_kwork.date ]
        #
        # perf evlist -i perf_kwork.date
        irq:irq_handler_entry
        irq:irq_handler_exit
        irq:softirq_raise
        irq:softirq_entry
        irq:softirq_exit
        workqueue:workqueue_activate_work
        workqueue:workqueue_execute_start
        workqueue:workqueue_execute_end
        dummy:HG
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      
      Record workqueue events:
      
        # perf kwork -k workqueue record -o perf_kwork.date -- sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.081 MB perf_kwork.date ]
        #
        # perf evlist -i perf_kwork.date
        workqueue:workqueue_activate_work
        workqueue:workqueue_execute_start
        workqueue:workqueue_execute_end
        dummy:HG
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      
      Committer testing:
      
        # perf kwork record sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 3.430 MB perf.data (24130 samples) ]
        # perf evlist -v
        irq:irq_handler_entry: type: 2, size: 128, config: 0x97, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        irq:irq_handler_exit: type: 2, size: 128, config: 0x96, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        irq:softirq_raise: type: 2, size: 128, config: 0x93, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        irq:softirq_entry: type: 2, size: 128, config: 0x95, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        irq:softirq_exit: type: 2, size: 128, config: 0x94, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        workqueue:workqueue_activate_work: type: 2, size: 128, config: 0x106, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        workqueue:workqueue_execute_start: type: 2, size: 128, config: 0x105, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        workqueue:workqueue_execute_end: type: 2, size: 128, config: 0x104, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        dummy:HG: type: 1, size: 128, config: 0x9, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|RAW|IDENTIFIER, read_format: ID, inherit: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
        # perf script | grep workqueue | head
                 swapper     0 [018] 26035.043289: workqueue:workqueue_activate_work: work struct 0xffff8b8ffeeae368
         kworker/18:2-ev 70440 [018] 26035.043293: workqueue:workqueue_execute_start: work struct 0xffff8b8ffeeae368: function free_work
         kworker/18:2-ev 70440 [018] 26035.043301:   workqueue:workqueue_execute_end: work struct 0xffff8b8ffeeae368: function free_work
                 swapper     0 [021] 26035.044704: workqueue:workqueue_activate_work: work struct 0xffff8b8ffef6e368
         kworker/21:0-ev 4080535 [021] 26035.044709: workqueue:workqueue_execute_start: work struct 0xffff8b8ffef6e368: function free_work
         kworker/21:0-ev 4080535 [021] 26035.044716:   workqueue:workqueue_execute_end: work struct 0xffff8b8ffef6e368: function free_work
                 swapper     0 [018] 26035.045230: workqueue:workqueue_activate_work: work struct 0xffff8b8ffeeae368
         kworker/18:2-ev 70440 [018] 26035.045232: workqueue:workqueue_execute_start: work struct 0xffff8b8ffeeae368: function free_work
         kworker/18:2-ev 70440 [018] 26035.045235:   workqueue:workqueue_execute_end: work struct 0xffff8b8ffeeae368: function free_work
                 swapper     0 [001] 26035.052046: workqueue:workqueue_activate_work: work struct 0xffff8b8108901590
        #
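
The `perf script` lines above already carry enough information to pair each workqueue_execute_start with its workqueue_execute_end. The following Python sketch does that pairing by work struct address and CPU and yields a per-item runtime in milliseconds. The regular expression is written against the line layout shown in this transcript only, and the function name is hypothetical; it is not how 'perf kwork' itself processes samples.

```python
import re

LINE_RE = re.compile(
    r"\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):\s+"
    r"workqueue:workqueue_execute_(?P<phase>start|end): "
    r"work struct (?P<work>0x[0-9a-f]+): function (?P<func>\S+)"
)

def workqueue_runtimes(lines):
    """Yield (function, cpu, runtime_msec) for each start/end pair."""
    started = {}                       # (work struct, cpu) -> start timestamp
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue                   # e.g. workqueue_activate_work lines
        key = (m["work"], int(m["cpu"]))
        ts = float(m["ts"])
        if m["phase"] == "start":
            started[key] = ts
        elif key in started:
            yield m["func"], key[1], (ts - started.pop(key)) * 1000.0
```

It could be fed with something like `workqueue_runtimes(open("script.txt"))`, where script.txt is a saved copy of the `perf script` output.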
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-5-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      97179d9d
    • Y
      perf kwork: Add softirq kwork record support · e6439321
      Committed by Yang Jihong
      Record softirq events irq:softirq_raise, irq:softirq_entry &
      irq:softirq_exit.
      
      Test cases:
      Record all events:
      
        # perf kwork record -o perf_kwork.date -- sleep 1
        [ perf record: Woken up 0 times to write data ]
        [ perf record: Captured and wrote 0.897 MB perf_kwork.date ]
        #
        # perf evlist -i perf_kwork.date
        irq:irq_handler_entry
        irq:irq_handler_exit
        irq:softirq_raise
        irq:softirq_entry
        irq:softirq_exit
        dummy:HG
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      
      Record softirq events:
      
        # perf kwork -k softirq record -o perf_kwork.date -- sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.141 MB perf_kwork.date ]
        #
        # perf evlist -i perf_kwork.date
        irq:softirq_raise
        irq:softirq_entry
        irq:softirq_exit
        dummy:HG
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      
      Committer testing:
      
        # perf kwork record sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 3.078 MB perf.data (17433 samples) ]
        # perf evlist -v
        irq:irq_handler_entry: type: 2, size: 128, config: 0x97, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        irq:irq_handler_exit: type: 2, size: 128, config: 0x96, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        irq:softirq_raise: type: 2, size: 128, config: 0x93, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        irq:softirq_entry: type: 2, size: 128, config: 0x95, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        irq:softirq_exit: type: 2, size: 128, config: 0x94, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
        dummy:HG: type: 1, size: 128, config: 0x9, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|RAW|IDENTIFIER, read_format: ID, inherit: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
        # perf script | head
            migration/12    73 [012] 25884.940992:     irq:softirq_raise: vec=9 [action=RCU]
            migration/12    73 [012] 25884.940994:     irq:softirq_entry: vec=9 [action=RCU]
            migration/12    73 [012] 25884.940995:      irq:softirq_exit: vec=9 [action=RCU]
                 swapper     0 [004] 25884.940995:     irq:softirq_raise: vec=9 [action=RCU]
                 swapper     0 [004] 25884.940998:     irq:softirq_entry: vec=9 [action=RCU]
                 swapper     0 [004] 25884.940999:      irq:softirq_exit: vec=9 [action=RCU]
                     cc1 71212 [021] 25884.941990:     irq:softirq_raise: vec=9 [action=RCU]
                 swapper     0 [004] 25884.941991:     irq:softirq_raise: vec=9 [action=RCU]
                     cc1 71212 [021] 25884.941992:     irq:softirq_raise: vec=7 [action=SCHED]
               perf-exec 71208 [013] 25884.941992:     irq:softirq_raise: vec=9 [action=RCU]
        #
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-4-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e6439321
    • Y
      perf kwork: Add irq kwork record support · 4f8ae962
      Committed by Yang Jihong
      Record interrupt events irq:irq_handler_entry & irq:irq_handler_exit.
      
      Test cases:
      
        # perf kwork record -o perf_kwork.date -- sleep 1
        [ perf record: Woken up 0 times to write data ]
        [ perf record: Captured and wrote 0.556 MB perf_kwork.date ]
        #
        # perf evlist -i perf_kwork.date
        irq:irq_handler_entry
        irq:irq_handler_exit
        dummy:HG
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
        #
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-3-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4f8ae962
    • Y
      perf kwork: New tool to trace time properties of kernel work (such as softirq, and workqueue) · 0f70d8e9
      Committed by Yang Jihong
      The 'perf kwork' tool is used to trace time properties of kernel work
      (such as irq, softirq, and workqueue), including runtime, latency, and
      timehist, using the infrastructure in the perf tools to allow tracing
      extra targets.
      
      This first commit reuses the 'perf record' framework code to implement
      a simple record function; kwork is not supported yet.
      
      Test cases:
      
        # perf
      
         usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
      
         The most commonly used perf commands are:
        <SNIP>
           iostat          Show I/O performance metrics
           kallsyms        Searches running kernel for symbols
           kmem            Tool to trace/measure kernel memory properties
           kvm             Tool to trace/measure kvm guest os
           kwork           Tool to trace/measure kernel work properties (latencies)
           list            List all symbolic event types
           lock            Analyze lock events
           mem             Profile memory accesses
           record          Run a command and record its profile into perf.data
        <SNIP>
         See 'perf help COMMAND' for more information on a specific command.
      
        # perf kwork
      
         Usage: perf kwork [<options>] {record}
      
            -D, --dump-raw-trace  dump raw trace in ASCII
            -f, --force           don't complain, do it
            -k, --kwork <kwork>   list of kwork to profile
            -v, --verbose         be more verbose (show symbol address, etc)
      
        # perf kwork record -- sleep 1
        [ perf record: Woken up 0 times to write data ]
        [ perf record: Captured and wrote 1.787 MB perf.data ]
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220709015033.38326-2-yangjihong1@huawei.com
      [ Add {} for multiline if blocks ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0f70d8e9
  7. 26 Jul, 2022 1 commit
  8. 25 Jul, 2022 1 commit
    • K
      perf tsc: Add arch TSC frequency information · bc2373a5
      Committed by Kan Liang
      The TSC frequency information is required for event metrics that use
      the literal system_tsc_freq. On newer Intel platforms, the TSC
      frequency information can be retrieved from CPUID leaf 0x15. If the
      TSC frequency information isn't present, the /proc/cpuinfo approach is
      used.
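
For reference, the arithmetic behind CPUID leaf 0x15 can be sketched in a few lines of Python. On that leaf, EAX holds the denominator and EBX the numerator of the TSC to core crystal clock ratio, and ECX holds the nominal crystal frequency in Hz. The function below takes already-read register values as plain integers; its name and the `None` fallback signal are hypothetical, and the sample values in the usage note are made up.

```python
def tsc_freq_from_cpuid15(eax, ebx, ecx):
    """Return the TSC frequency in Hz from CPUID leaf 0x15 register values.

    eax: denominator of the TSC / core crystal clock ratio
    ebx: numerator of that ratio
    ecx: nominal core crystal clock frequency in Hz
    Returns None when the leaf does not enumerate the frequency, in which
    case a caller would fall back to another source such as /proc/cpuinfo.
    """
    if eax == 0 or ebx == 0 or ecx == 0:
        return None
    return ecx * ebx // eax
```

For example, a 24 MHz crystal with a 250/2 ratio gives `tsc_freq_from_cpuid15(2, 250, 24_000_000)`, i.e. 3 GHz.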
      
      Refactor cpuid() for this use. Note that the previous stack
      pushing/popping approach was broken on x86-64, where it would clobber
      the stack red zone.
      
      Committer testing:
      
      Before:
      
        $ perf record sleep 0.0001
        [ perf record: Woken up 1 times to write data ]
        $ perf report --header-only |& grep cpuid
        # cpuid : AuthenticAMD,25,33,0
        $
      
      After the patch:
      
        $ perf record sleep 0.0001
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.002 MB perf.data (8 samples) ]
        $ perf report --header-only |& grep cpuid
        # cpuid : AuthenticAMD,25,33,0
        $
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20220718164312.3994191-2-irogers@google.com
      Signed-off-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bc2373a5
  9. 21 Jul, 2022 1 commit
  10. 20 Jul, 2022 5 commits
    • J
      perf cs-etm: Fix duplicated 'the' in comment · 2c91cd88
      Committed by Jason Wang
      The word `the' is duplicated in the comment; remove one.
      Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20220716044040.43123-1-wangborong@cdjrlc.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2c91cd88
    • J
      perf probe: Fix duplicated 'the' in comment · c69d33eb
      Committed by Jason Wang
      The word `the' is duplicated in the comment; remove one.
      Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Zechuan Chen <chenzechuan1@huawei.com>
      Link: http://lore.kernel.org/lkml/20220716043957.42829-1-wangborong@cdjrlc.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c69d33eb
    • A
      perf scripting perl: Ignore some warnings to keep building with perl headers · 63a4354a
      Committed by Arnaldo Carvalho de Melo
      On gcc 12 we started seeing this:
      
        In file included from /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/perl.h:2999,
                         from util/scripting-engines/trace-event-perl.c:35:
        /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/inline.h: In function 'Perl_is_utf8_valid_partial_char_flags':
        /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/handy.h:125:23: error: cast from function call of type 'STRLEN' {aka 'long unsigned int'} to non-matching type '_Bool' [-Werror=bad-function-cast]
          125 | #define cBOOL(cbool) ((bool) (cbool))
              |                       ^
        /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/inline.h:2363:12: note: in expansion of macro 'cBOOL'
         2363 |     return cBOOL(is_utf8_char_helper_(s0, e, flags));
              |            ^~~~~
        In file included from /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/perl.h:7242:
        /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/inline.h: In function 'Perl_cop_file_avn':
        /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/inline.h:3489:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
         3489 |     const char *file = CopFILE(cop);
              |     ^~~~~
        In file included from /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/perl.h:7243:
        /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/sv_inline.h: In function 'Perl_newSV_type':
        /usr/lib/perl5/5.36.0/x86_64-linux-thread-multi/CORE/sv_inline.h:376:5: error: enumeration value 'SVt_LAST' not handled in switch [-Werror=switch-enum]
          376 |     switch (type) {
              |     ^~~~~~
      
      So disable those warnings to keep building with perl devel headers.
      
      Noticed, among other distros, on opensuse tumbleweed:
      
      gcc version 12.1.1 20220629 [revision 7811663964aa7e31c3939b859bbfa2e16919639f] (SUSE Linux)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      63a4354a
    • I
      perf python: Avoid deprecation warning on distutils · ee87a084
      Committed by Ian Rogers
      Fix the following DeprecationWarning:
      
        tools/perf/util/setup.py:31: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
      
      Note: the setuptools module may need installing, for example:
      
        $ sudo apt install python-setuptools
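
Before building, it can be useful to check which packaging module the chosen interpreter actually offers. The following Python sketch does that without importing either module, so it does not itself trigger the DeprecationWarning. It is only an illustration: the helper name is hypothetical and it does not reproduce the change made to tools/perf/util/setup.py.

```python
import importlib.util

def pick_build_backend():
    """Return the name of the first available packaging module, or None.

    setuptools is preferred; distutils is only a fallback for older
    interpreters, since it is deprecated (PEP 632).
    """
    for name in ("setuptools", "distutils"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

A result of `None` would mean neither module is installed for that interpreter, which is the situation the install note above addresses.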
      
      Reviewer comments:
      
      James said:
      
      Tested it with python 2.7 and 3.8 by running "make install-python_ext PYTHON=..."
      
      Committer notes:
      
      Tested with:
      
       $ make -k BUILD_BPF_SKEL=1 PYTHON=python3 O=/tmp/build/perf -C tools/perf install-bin ; perf test python
      
       $ make -k BUILD_BPF_SKEL=1 O=/tmp/build/perf -C tools/perf install-bin ; perf test python
      Reviewed-by: James Clark <james.clark@arm.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220615014206.26651-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ee87a084
    • A
      perf intel-pt: Use guest pid/tid etc in guest samples · 98759cca
      Committed by Adrian Hunter
      When decoding with guest sideband information, for VMX non-root (NR)
      events, i.e. guest events, replace the host (hypervisor) pid/tid with
      guest values, and also provide the new machine_pid and vcpu values.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: kvm@vger.kernel.org
      Link: https://lore.kernel.org/r/20220711093218.10967-35-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      98759cca