1. 23 4月, 2018 3 次提交
  2. 19 4月, 2018 3 次提交
  3. 17 4月, 2018 2 次提交
    • T
      perf list: Add s390 support for detailed/verbose PMU event description · 038586c3
      Thomas Richter 提交于
      'perf list' with flags -d and -v print a description (-d) or a very
      verbose explanation (-v) of CPU specific counter events.  These
      descriptions are provided with the json files in directory
      pmu-events/arch/s390/*.json.
      
      Display of these descriptions on s390 requires the corresponding json
      files.
      
      On s390 this does not work because function is_pmu_core() does not
      detect the s390 directory name where the CPU specific events are listed.
      On x86 it is:
      
        /sys/bus/event_source/devices/cpu
      
      whereas on s390 it is:
      
        /sys/bus/event_source/devices/cpum_cf
        /sys/bus/event_source/devices/cpum_sf
      
      Fix this by adding s390 directory name testing to function
      is_pmu_core(). This is the same approach as taken for the ARM platform.
      
      Output before:
      
      [root@s35lp76 perf]# ./perf list -d pmu
      List of pre-defined events (to be used in -e):
      
        cpum_cf/AES_BLOCKED_CYCLES/      [Kernel PMU event]
        cpum_cf/AES_BLOCKED_FUNCTIONS/   [Kernel PMU event]
        cpum_cf/AES_CYCLES/              [Kernel PMU event]
        cpum_cf/AES_FUNCTIONS/           [Kernel PMU event]
        ....
        cpum_cf/TX_NC_TEND/              [Kernel PMU event]
        cpum_cf/VX_BCD_EXECUTION_SLOTS/  [Kernel PMU event]
        cpum_sf/SF_CYCLES_BASIC/         [Kernel PMU event]
      
      Output after:
      
      [root@s35lp76 perf]# ./perf list -d pmu
      List of pre-defined events (to be used in -e):
      
        cpum_cf/AES_BLOCKED_CYCLES/      [Kernel PMU event]
        cpum_cf/AES_BLOCKED_FUNCTIONS/   [Kernel PMU event]
        cpum_cf/AES_CYCLES/              [Kernel PMU event]
        cpum_cf/AES_FUNCTIONS/           [Kernel PMU event]
        ....
        cpum_cf/TX_NC_TEND/              [Kernel PMU event]
        cpum_cf/VX_BCD_EXECUTION_SLOTS/  [Kernel PMU event]
        cpum_sf/SF_CYCLES_BASIC/         [Kernel PMU event]
      
      3906:
        bcd_dfp_execution_slots
             [BCD DFP Execution Slots]
        decimal_instructions
             [Decimal Instructions]
        dtlb2_gpage_writes
             [DTLB2 GPAGE Writes]
        dtlb2_hpage_writes
             [DTLB2 HPAGE Writes]
        dtlb2_misses
             [DTLB2 Misses]
        dtlb2_writes
             [DTLB2 Writes]
        itlb2_misses
             [ITLB2 Misses]
        itlb2_writes
             [ITLB2 Writes]
        l1c_tlb2_misses
             [L1C TLB2 Misses]
        .....
      
      cfvn 3:
        cpu_cycles
             [CPU Cycles]
        instructions
             [Instructions]
        l1d_dir_writes
             [L1D Directory Writes]
        l1d_penalty_cycles
             [L1D Penalty Cycles]
        l1i_dir_writes
             [L1I Directory Writes]
        l1i_penalty_cycles
             [L1I Penalty Cycles]
        problem_state_cpu_cycles
             [Problem State CPU Cycles]
        problem_state_instructions
             [Problem State Instructions]
        ....
      
      csvn generic:
        aes_blocked_cycles
             [AES Blocked Cycles]
        aes_blocked_functions
             [AES Blocked Functions]
        aes_cycles
             [AES Cycles]
        aes_functions
             [AES Functions]
        dea_blocked_cycles
             [DEA Blocked Cycles]
        dea_blocked_functions
             [DEA Blocked Functions]
        ....
      Signed-off-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180416132314.33249-1-tmricht@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      038586c3
    • A
      perf report: Extend raw dump (-D) out with switch out event type · b3f35b5d
      Alexey Budankov 提交于
      Print additional 'preempt' tag for PERF_RECORD_SWITCH[_CPU_WIDE] OUT records when
      event header misc field contains PERF_RECORD_MISC_SWITCH_OUT_PREEMPT bit set
      designating preemption context switch out event:
      
      tools/perf/perf report -D -i perf.data | grep _SWITCH
      
      0 768361415226 0x27f076 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:     8/8
      4 768362216813 0x28f45e [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:     0/0
      4 768362217824 0x28f486 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:  4073/4073
      0 768362414027 0x27f0ce [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid:     8/8
      0 768362414367 0x27f0f6 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:     0/0
      Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/6f5aebb9-b96c-f304-f08f-8f046d38de4f@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b3f35b5d
  4. 13 4月, 2018 3 次提交
    • A
      perf annotate: Allow setting the offset level in .perfconfig · 43c40231
      Arnaldo Carvalho de Melo 提交于
      The default is 1 (jump_target):
      
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
        _raw_spin_lock_irqsave() /proc/kcore
          0.26        nop
          4.61        push   %rbx
         19.33        pushfq
          7.97        pop    %rax
          0.32        nop
          0.06        mov    %rax,%rbx
         14.63        cli
          0.06        nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         49.94        lock   cmpxchg %edx,(%rdi)
          0.16        test   %eax,%eax
                    ↓ jne    2b
          2.66        mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                    → callq  *ffffffffb30eaed0
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
        #
      
      But one can ask for showing offsets for call instructions by setting
      this:
      
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
        _raw_spin_lock_irqsave() /proc/kcore
          0.26        nop
          4.61        push   %rbx
         19.33        pushfq
          7.97        pop    %rax
          0.32        nop
          0.06        mov    %rax,%rbx
         14.63        cli
          0.06        nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         49.94        lock   cmpxchg %edx,(%rdi)
          0.16        test   %eax,%eax
                    ↓ jne    2b
          2.66        mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                2d: → callq  *ffffffffb30eaed0
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
        #
      
      Or using a big value to ask for all offsets to be shown:
      
        # cat ~/.perfconfig
        [annotate]
      
      	offset_level = 100
      
      	hide_src_code = true
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
        _raw_spin_lock_irqsave() /proc/kcore
          0.26   0:   nop
          4.61   5:   push   %rbx
         19.33   6:   pushfq
          7.97   7:   pop    %rax
          0.32   8:   nop
          0.06   d:   mov    %rax,%rbx
         14.63  10:   cli
          0.06  11:   nop
                17:   xor    %eax,%eax
                19:   mov    $0x1,%edx
         49.94  1e:   lock   cmpxchg %edx,(%rdi)
          0.16  22:   test   %eax,%eax
                24: ↓ jne    2b
          2.66  26:   mov    %rbx,%rax
                29:   pop    %rbx
                2a: ← retq
                2b:   mov    %eax,%esi
                2d: → callq  *ffffffffb30eaed0
                32:   mov    %rbx,%rax
                35:   pop    %rbx
                36: ← retq
         #
      
      This also affects the TUI, i.e. the default 'perf annotate' and 'perf
      top/report' -> A hotkey -> annotate interfaces, when slang-devel is present
      in the build, i.e.:
      
        # perf version --build-options | grep slang
                    libslang: [ on  ]  # HAVE_SLANG_SUPPORT
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-venm6x5zrt40eu8hxdsmqxz6@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      43c40231
    • A
      perf report: Fix switching to another perf.data file · 7b366142
      Arnaldo Carvalho de Melo 提交于
      In the TUI the 's' hotkey can be used to switch to another perf.data
      file in the current directory, but that got broken in Fixes:
      b01141f4 ("perf annotate: Initialize the priv are in symbol__new()"),
      that would show this once another file was chosen:
      
          ┌─Fatal Error─────────────────────────────────────┐
          │Annotation needs to be init before symbol__init()│
          │                                                 │
          │                                                 │
          │Press any key...                                 │
          └─────────────────────────────────────────────────┘
      
      Fix it by just silently bailing out if symbol__annotation_init() was already
      called, just like is done with symbol__init(), i.e. they are done just once at
      session start, not when switching to a new perf.data file.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: b01141f4 ("perf annotate: Initialize the priv are in symbol__new()")
      Link: https://lkml.kernel.org/n/tip-ogppdtpzfax7y1h6gjdv5s6u@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7b366142
    • T
      perf record: Change warning for missing sysfs entry to debug · 4f75f1cb
      Thomas Richter 提交于
      Using perf on 4.16.0 kernel on s390 shows this warning:
      
         failed: can't open node sysfs data
      
      each time I run command perf record ... for example:
      
        [root@s35lp76 perf]# ./perf record -e rB0000 -- sleep 1
        [ perf record: Woken up 1 times to write data ]
        failed: can't open node sysfs data
        [ perf record: Captured and wrote 0.001 MB perf.data (4 samples) ]
        [root@s35lp76 perf]#
      
      It turns out commit e2091ced ("perf tools: Add MEM_TOPOLOGY feature
      to perf data file") tries to open directory named /sys/devices/system/node/
      which does not exist on s390.
      
      This is the call stack:
       __cmd_record
       +---> perf_session__write_header
             +---> perf_header__adds_write
                   +---> do_write_feat
      	           +---> write_mem_topology
      		         +---> build_mem_topology
      			       prints warning
      
      The issue starts in do_write_feat() which unconditionally loops over all
      features and now includes HEADER_MEM_TOPOLOGY and calls write_mem_topology().
      
      Function record__init_features() at the beginning of __cmd_record() sets
      all features and then turns off some of them.
      
      Fix this by changing the warning to a level 2 debug output statement.
      
      So it is only shown when debug level 2 or higher is set.
      Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180412133246.92801-1-tmricht@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4f75f1cb
  5. 12 4月, 2018 3 次提交
  6. 09 4月, 2018 3 次提交
    • S
      perf tests clang: Fix function name for clang IR test · fcbd8fa4
      Sandipan Das 提交于
      As stated in tests/llvm-src-base.c, the name of the bpf function should
      be "bpf_func__SyS_epoll_pwait" but this clang test fails as it tries to
      lookup "bpf_func__SyS_epoll_wait".
      
      Before applying patch:
      
      55: builtin clang support                                 :
      55.1: builtin clang compile C source to IR                : FAILED!
      55.2: builtin clang compile C source to ELF object        : Skip
      
      After applying patch:
      
      55: builtin clang support                                 :
      55.1: builtin clang compile C source to IR                : Ok
      55.2: builtin clang compile C source to ELF object        : Ok
      Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Fixes: e67d52d4 ("perf clang: Update test case to use real BPF script")
      Link: http://lkml.kernel.org/r/20180404180419.19056-3-sandipan@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fcbd8fa4
    • S
      perf clang: Add support for recent clang versions · 7854e499
      Sandipan Das 提交于
      The clang API calls used by perf have changed in recent releases and
      builds succeed with libclang-3.9 only. This introduces compatibility
      with libclang-4.0 and above.
      
      Without this patch, we will see the following compilation errors with
      libclang-4.0+:
      
       util/c++/clang.cpp: In function ‘clang::CompilerInvocation* perf::createCompilerInvocation(llvm::opt::ArgStringList, llvm::StringRef&, clang::DiagnosticsEngine&)’:
       util/c++/clang.cpp:62:33: error: ‘IK_C’ was not declared in this scope
         Opts.Inputs.emplace_back(Path, IK_C);
                                        ^~~~
       util/c++/clang.cpp: In function ‘std::unique_ptr<llvm::Module> perf::getModuleFromSource(llvm::opt::ArgStringList, llvm::StringRef, llvm::IntrusiveRefCntPtr<clang::vfs::FileSystem>)’:
       util/c++/clang.cpp:75:26: error: no matching function for call to ‘clang::CompilerInstance::setInvocation(clang::CompilerInvocation*)’
         Clang.setInvocation(&*CI);
                                 ^
       In file included from util/c++/clang.cpp:14:0:
       /usr/include/clang/Frontend/CompilerInstance.h:231:8: note: candidate: void clang::CompilerInstance::setInvocation(std::shared_ptr<clang::CompilerInvocation>)
          void setInvocation(std::shared_ptr<CompilerInvocation> Value);
               ^~~~~~~~~~~~~
      
      Committer testing:
      
      Tested on Fedora 27 after installing the clang-devel and llvm-devel
      packages, versions:
      
        # rpm -qa | egrep llvm\|clang
        llvm-5.0.1-6.fc27.x86_64
        clang-libs-5.0.1-5.fc27.x86_64
        clang-5.0.1-5.fc27.x86_64
        clang-tools-extra-5.0.1-5.fc27.x86_64
        llvm-libs-5.0.1-6.fc27.x86_64
        llvm-devel-5.0.1-6.fc27.x86_64
        clang-devel-5.0.1-5.fc27.x86_64
        #
      
      Make sure you don't have some older version lying around in /usr/local,
      etc, then:
      
        $ make LIBCLANGLLVM=1 -C tools/perf install-bin
      
      And in the end perf will be linked agains these libraries:
      
        # ldd ~/bin/perf | egrep -i llvm\|clang
      	libclangAST.so.5 => /lib64/libclangAST.so.5 (0x00007f8bb2eb4000)
      	libclangBasic.so.5 => /lib64/libclangBasic.so.5 (0x00007f8bb29e3000)
      	libclangCodeGen.so.5 => /lib64/libclangCodeGen.so.5 (0x00007f8bb23f7000)
      	libclangDriver.so.5 => /lib64/libclangDriver.so.5 (0x00007f8bb2060000)
      	libclangFrontend.so.5 => /lib64/libclangFrontend.so.5 (0x00007f8bb1d06000)
      	libclangLex.so.5 => /lib64/libclangLex.so.5 (0x00007f8bb1a3e000)
      	libclangTooling.so.5 => /lib64/libclangTooling.so.5 (0x00007f8bb17d4000)
      	libclangEdit.so.5 => /lib64/libclangEdit.so.5 (0x00007f8bb15c5000)
      	libclangSema.so.5 => /lib64/libclangSema.so.5 (0x00007f8bb0cc9000)
      	libclangAnalysis.so.5 => /lib64/libclangAnalysis.so.5 (0x00007f8bb0a23000)
      	libclangParse.so.5 => /lib64/libclangParse.so.5 (0x00007f8bb0725000)
      	libclangSerialization.so.5 => /lib64/libclangSerialization.so.5 (0x00007f8bb039a000)
      	libLLVM-5.0.so => /lib64/libLLVM-5.0.so (0x00007f8bace98000)
      	libclangASTMatchers.so.5 => /lib64/../lib64/libclangASTMatchers.so.5 (0x00007f8bab735000)
      	libclangFormat.so.5 => /lib64/../lib64/libclangFormat.so.5 (0x00007f8bab4b2000)
      	libclangRewrite.so.5 => /lib64/../lib64/libclangRewrite.so.5 (0x00007f8bab2a1000)
      	libclangToolingCore.so.5 => /lib64/../lib64/libclangToolingCore.so.5 (0x00007f8bab08e000)
        #
      Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Fixes: 00b86691 ("perf clang: Add builtin clang support ant test case")
      Link: http://lkml.kernel.org/r/20180404180419.19056-2-sandipan@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7854e499
    • A
      perf tools: No need to include namespaces.h in util.h · ad0902e0
      Arnaldo Carvalho de Melo 提交于
      The only thing that is needed there is a forward declaration for 'struct
      nsinfo', so disentanble this, which in turns allows built-in clang
      builds, i.e. 'make LIBCLANGLLVM=1 -C tools/perf'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-vq26rsuwq1cqylpcyvq89c84@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ad0902e0
  7. 06 4月, 2018 2 次提交
  8. 05 4月, 2018 2 次提交
  9. 04 4月, 2018 2 次提交
  10. 03 4月, 2018 3 次提交
  11. 02 4月, 2018 1 次提交
    • K
      perf tools: Add a "dso_size" sort order · b74d12d5
      Kim Phillips 提交于
      Add DSO size to perf report/top sort output list.
      
      This includes adding a map__size fn to map.h, which is
      approximately equal to the DSO data file_size:
      
        DSO				file size	map (end-start)	file / (end-start)
        libwebkit2gtk-4.0.so.37.24.9	43260072	41295872	95%
        libglib-2.0.so.0.5400.1		 1125680	 1118208	99%
        libc-2.26.so			 1960656 	 1925120	101%
        libdbus-1.so.3.14.13		  309456 	  303104	102%
      
      Sample output:
      
        $ ./perf report -s dso_size,dso
        Samples: 2K of event 'cycles:uppp', Event count (approx.): 128373340
        Overhead  DSO size  Shared Object
          90.62%   unknown  [unknown]
           2.87%   1118208  libglib-2.0.so.0.5400.1
           1.92%    303104  libdbus-1.so.3.14.13
           1.42%   1925120  libc-2.26.so
           0.77%  41295872  libwebkit2gtk-4.0.so.37.24.9
           0.61%    335872  libgobject-2.0.so.0.5400.1
           0.41%   1052672  libgdk-3.so.0.2200.25
           0.36%    106496  libpthread-2.26.so
           0.29%    221184  dbus-daemon
           0.17%    159744  ld-2.26.so
           0.13%     49152  libwayland-client.so.0.3.0
           0.12%   1642496  libgio-2.0.so.0.5400.1
           0.09%   73277443  libgtk-3.so.0.2200.25
           0.09%  12324864  libmozjs-52.so.0.0.0
           0.05%   4796416  perf
           0.04%    843776  libgjs.so.0.0.0
           0.03%   1409024  libmutter-clutter-1.so
      
      Committer testing:
      
      To sort by DSO size, use:
      
        # perf report -F dso_size,dso,overhead -s dso_size
        <SNIP>
           3465216  libdns-export.so.174.0.1   0.00%
           3522560  libgc.so.1.0.3             0.00%
           3538944  libbfd-2.29-13.fc27.so     0.59%
           3670016  libunistring.so.2.1.0      0.00%
           3723264  libguile-2.0.so.22.8.1     0.00%
           3776512  libgio-2.0.so.0.5400.3     0.00%
           3891200  libc-2.26.so               0.96%
           3944448  libmozjs-17.0.so           0.00%
           4218880  libperl.so.5.26.1          0.18%
           4452352  libpython2.7.so.1.0        0.02%
           4472832  perf                       0.02%
           4603904  git                        0.01%
           4751360  libcrypto.so.1.1.0g        0.00%
           5005312  libslang.so.2.3.1          0.00%
           7315456  libgtk-3.so.0.2200.26      0.09%
           8818688  i965_dri.so                2.46%
           8818688  i965_dri.so (deleted)      1.26%
          12414976  libmozjs-52.so.0.0.0       0.03%
          23642112  cc1                        2.02%
          27889664  [kernel.kallsyms]         25.41%
          80834560  libxul.so (deleted)       15.68%
          98078720  chrome                    32.03%
        1056964608  [kernel.kallsyms]          1.59%
        #
      Signed-off-by: NKim Phillips <kim.phillips@arm.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180327060956.1c01ebe67a2a941bb4468c6f@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b74d12d5
  12. 28 3月, 2018 2 次提交
  13. 24 3月, 2018 5 次提交
    • A
      perf annotate: Use absolute addresses to calculate jump target offsets · 980b68ec
      Arnaldo Carvalho de Melo 提交于
      These types of jumps were confusing the annotate browser:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
        Percent│ffffffff81a00020:   swapgs
        <SNIP>
               │ffffffff81a00128: ↓ jae    ffffffff81a00139 <syscall_return_via_sysret+0x53>
        <SNIP>
               │ffffffff81a00155: → jmpq   *0x825d2d(%rip)   # ffffffff82225e88 <pv_cpu_ops+0xe8>
      
      I.e. the syscall_return_via_sysret function is actually "inside" the
      entry_SYSCALL_64 function, and the offsets in jumps like these (+0x53)
      are relative to syscall_return_via_sysret, not to syscall_return_via_sysret.
      
      Or this may be some artifact in how the assembler marks the start and
      end of a function and how this ends up in the ELF symtab for vmlinux,
      i.e. syscall_return_via_sysret() isn't "inside" entry_SYSCALL_64, but
      just right after it.
      
      From readelf -sw vmlinux:
      
       80267: ffffffff81a00020   315 NOTYPE  GLOBAL DEFAULT    1 entry_SYSCALL_64
         316: ffffffff81a000e6     0 NOTYPE  LOCAL  DEFAULT    1 syscall_return_via_sysret
      
       0xffffffff81a00020 + 315 > 0xffffffff81a000e6
      
      So instead of looking for offsets after that last '+' sign, calculate
      offsets for jump target addresses that are inside the function being
      disassembled from the absolute address, 0xffffffff81a00139 in this case,
      subtracting from it the objdump address for the start of the function
      being disassembled, entry_SYSCALL_64() in this case.
      
      So, before this patch:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
      Percent│       pop    %r10
             │       pop    %r9
             │       pop    %r8
             │       pop    %rax
             │       pop    %rsi
             │       pop    %rdx
             │       pop    %rsi
             │       mov    %rsp,%rdi
             │       mov    %gs:0x5004,%rsp
             │       pushq  0x28(%rdi)
             │       pushq  (%rdi)
             │       push   %rax
             │     ↑ jmp    6c
             │       mov    %cr3,%rdi
             │     ↑ jmp    62
             │       mov    %rdi,%rax
             │       and    $0x7ff,%rdi
             │       bt     %rdi,%gs:0x2219a
             │     ↑ jae    53
             │       btr    %rdi,%gs:0x2219a
             │       mov    %rax,%rdi
             │     ↑ jmp    5b
      
      After:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
        0.65 │     → jne    swapgs_restore_regs_and_return_to_usermode
             │       pop    %r10
             │       pop    %r9
             │       pop    %r8
             │       pop    %rax
             │       pop    %rsi
             │       pop    %rdx
             │       pop    %rsi
             │       mov    %rsp,%rdi
             │       mov    %gs:0x5004,%rsp
             │       pushq  0x28(%rdi)
             │       pushq  (%rdi)
             │       push   %rax
             │     ↓ jmp    132
             │       mov    %cr3,%rdi
             │    ┌──jmp    128
             │    │  mov    %rdi,%rax
             │    │  and    $0x7ff,%rdi
             │    │  bt     %rdi,%gs:0x2219a
             │    │↓ jae    119
             │    │  btr    %rdi,%gs:0x2219a
             │    │  mov    %rax,%rdi
             │    │↓ jmp    121
             │119:│  mov    %rax,%rdi
             │    │  bts    $0x3f,%rdi
             │121:│  or     $0x800,%rdi
             │128:└─→or     $0x1000,%rdi
             │       mov    %rdi,%cr3
             │132:   pop    %rax
             │       pop    %rdi
             │       pop    %rsp
             │     → jmpq   *0x825d2d(%rip)        # ffffffff82225e88 <pv_cpu_ops+0xe8>
      
      With those at least navigating to the right destination, an improvement
      for these cases seems to be to be to somehow mark those inner functions,
      which in this case could be:
      
      entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
             │syscall_return_via_sysret:
             │       pop    %r15
             │       pop    %r14
             │       pop    %r13
             │       pop    %r12
             │       pop    %rbp
             │       pop    %rbx
             │       pop    %rsi
             │       pop    %r10
             │       pop    %r9
             │       pop    %r8
             │       pop    %rax
             │       pop    %rsi
             │       pop    %rdx
             │       pop    %rsi
             │       mov    %rsp,%rdi
             │       mov    %gs:0x5004,%rsp
             │       pushq  0x28(%rdi)
             │       pushq  (%rdi)
             │       push   %rax
             │     ↓ jmp    132
             │       mov    %cr3,%rdi
             │    ┌──jmp    128
             │    │  mov    %rdi,%rax
             │    │  and    $0x7ff,%rdi
             │    │  bt     %rdi,%gs:0x2219a
             │    │↓ jae    119
             │    │  btr    %rdi,%gs:0x2219a
             │    │  mov    %rax,%rdi
             │    │↓ jmp    121
             │119:│  mov    %rax,%rdi
             │    │  bts    $0x3f,%rdi
             │121:│  or     $0x800,%rdi
             │128:└─→or     $0x1000,%rdi
             │       mov    %rdi,%cr3
             │132:   pop    %rax
             │       pop    %rdi
             │       pop    %rsp
             │     → jmpq   *0x825d2d(%rip)        # ffffffff82225e88 <pv_cpu_ops+0xe8>
      
      This all gets much better viewed if one uses 'perf report --ignore-vmlinux'
      forcing the usage of /proc/kcore + /proc/kallsyms, when the above
      actually gets down to:
      
        # perf report --ignore-vmlinux
        ## do '/64', will show the function names containing '64',
        ## navigate to /entry_SYSCALL_64_after_hwframe.annotation,
        ## press 'A' to annotate, then 'P' to print that annotation
        ## to a file
        ## From another xterm (or see on screen, this 'P' thing is for
        ## getting rid of those right side scroll bars/spaces):
        # cat /entry_SYSCALL_64_after_hwframe.annotation
        entry_SYSCALL_64_after_hwframe() /proc/kcore
        Event: cycles:ppp
      
        Percent
                    Disassembly of section load0:
      
                    ffffffff9aa00044 <load0>:
         11.97        push   %rax
          4.85        push   %rdi
                      push   %rsi
          2.59        push   %rdx
          2.27        push   %rcx
          0.32        pushq  $0xffffffffffffffda
          1.29        push   %r8
                      xor    %r8d,%r8d
          1.62        push   %r9
          0.65        xor    %r9d,%r9d
          1.62        push   %r10
                      xor    %r10d,%r10d
          5.50        push   %r11
                      xor    %r11d,%r11d
          3.56        push   %rbx
                      xor    %ebx,%ebx
          4.21        push   %rbp
                      xor    %ebp,%ebp
          2.59        push   %r12
          0.97        xor    %r12d,%r12d
          3.24        push   %r13
                      xor    %r13d,%r13d
          2.27        push   %r14
                      xor    %r14d,%r14d
          4.21        push   %r15
                      xor    %r15d,%r15d
          0.97        mov    %rsp,%rdi
          5.50      → callq  do_syscall_64
         14.56        mov    0x58(%rsp),%rcx
          7.44        mov    0x80(%rsp),%r11
          0.32        cmp    %rcx,%r11
                    → jne    swapgs_restore_regs_and_return_to_usermode
          0.32        shl    $0x10,%rcx
          0.32        sar    $0x10,%rcx
          3.24        cmp    %rcx,%r11
                    → jne    swapgs_restore_regs_and_return_to_usermode
          2.27        cmpq   $0x33,0x88(%rsp)
          1.29      → jne    swapgs_restore_regs_and_return_to_usermode
                      mov    0x30(%rsp),%r11
          8.74        cmp    %r11,0x90(%rsp)
                    → jne    swapgs_restore_regs_and_return_to_usermode
          0.32        test   $0x10100,%r11
                    → jne    swapgs_restore_regs_and_return_to_usermode
          0.32        cmpq   $0x2b,0xa0(%rsp)
          0.65      → jne    swapgs_restore_regs_and_return_to_usermode
      
      I.e. using kallsyms makes the function start/end be done differently
      than using what is in the vmlinux ELF symtab and actually the hits
      goes to entry_SYSCALL_64_after_hwframe, which is a GLOBAL() after the
      start of entry_SYSCALL_64:
      
        ENTRY(entry_SYSCALL_64)
                UNWIND_HINT_EMPTY
        <SNIP>
                pushq   $__USER_CS                      /* pt_regs->cs */
                pushq   %rcx                            /* pt_regs->ip */
        GLOBAL(entry_SYSCALL_64_after_hwframe)
                pushq   %rax                            /* pt_regs->orig_ax */
      
                PUSH_AND_CLEAR_REGS rax=$-ENOSYS
      
      And it goes and ends at:
      
                cmpq    $__USER_DS, SS(%rsp)            /* SS must match SYSRET */
                jne     swapgs_restore_regs_and_return_to_usermode
      
                /*
                 * We win! This label is here just for ease of understanding
                 * perf profiles. Nothing jumps here.
                 */
        syscall_return_via_sysret:
                /* rcx and r11 are already restored (see code above) */
                UNWIND_HINT_EMPTY
                POP_REGS pop_rdi=0 skip_r11rcx=1
      
      So perhaps some people should really just play with '--ignore-vmlinux'
      to force /proc/kcore + kallsyms.
      
      One idea is to do both, i.e. have a vmlinux annotation and a
      kcore+kallsyms one, when possible, and even show the patched location,
      etc.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-r11knxv8voesav31xokjiuo6@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      980b68ec
    • A
      perf annotate: Defer searching for comma in raw line till it is needed · c448234c
      Arnaldo Carvalho de Melo 提交于
      That strchr() in jump__scnprintf() needs to be nuked somehow, as it,
      IIRC is already done in jump__parse() and if needed at scnprintf() time,
      should be stashed in the struct filled in parse() time.
      
      For now jus defer it to just before where it is used.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-j0t5hagnphoz9xw07bh3ha3g@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c448234c
    • A
      perf annotate: Support jumping from one function to another · e4cc91b8
      Arnaldo Carvalho de Melo 提交于
      For instance:
      
        entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f/build/vmlinux
          5.50 │     → callq  do_syscall_64
         14.56 │       mov    0x58(%rsp),%rcx
          7.44 │       mov    0x80(%rsp),%r11
          0.32 │       cmp    %rcx,%r11
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          0.32 │       shl    $0x10,%rcx
          0.32 │       sar    $0x10,%rcx
          3.24 │       cmp    %rcx,%r11
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          2.27 │       cmpq   $0x33,0x88(%rsp)
          1.29 │     → jne    swapgs_restore_regs_and_return_to_usermode
               │       mov    0x30(%rsp),%r11
          8.74 │       cmp    %r11,0x90(%rsp)
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          0.32 │       test   $0x10100,%r11
               │     → jne    swapgs_restore_regs_and_return_to_usermode
          0.32 │       cmpq   $0x2b,0xa0(%rsp)
          0.65 │     → jne    swapgs_restore_regs_and_return_to_usermode
      
      It'll behave just like a "call" instruction, i.e. press enter or right
      arrow over one such line and the browser will navigate to the annotated
      disassembly of that function, which when exited, via left arrow or esc,
      will come back to the calling function.
      
      Now to support jump to an offset on a different function...
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-78o508mqvr8inhj63ddtw7mo@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e4cc91b8
    • A
      perf annotate: Add "_local" to jump/offset validation routines · 2eff0611
      Arnaldo Carvalho de Melo 提交于
      Because they all really check if we can access data structures/visual
      constructs where a "jump" instruction targets code in the same function,
      i.e. things like:
      
        __pthread_mutex_lock  /usr/lib64/libpthread-2.26.so
        1.95 │       mov    __pthread_force_elision,%ecx
             │    ┌──test   %ecx,%ecx
        0.07 │    ├──je     60
             │    │  test   $0x300,%esi
             │    │↓ jne    60
             │    │  or     $0x100,%esi
             │    │  mov    %esi,0x10(%rdi)
             │ 42:│  mov    %esi,%edx
             │    │  lea    0x16(%r8),%rsi
             │    │  mov    %r8,%rdi
             │    │  and    $0x80,%edx
             │    │  add    $0x8,%rsp
             │    │→ jmpq   __lll_lock_elision
             │    │  nop
        0.29 │ 60:└─→and    $0x80,%esi
        0.07 │       mov    $0x1,%edi
        0.29 │       xor    %eax,%eax
        2.53 │       lock   cmpxchg %edi,(%r8)
      
      And not things like that "jmpq __lll_lock_elision", that instead should behave
      like a "call" instruction and "jump" to the disassembly of "___lll_lock_elision".
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-3cwx39u3h66dfw9xjrlt7ca2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2eff0611
    • P
      perf python: Reference Py_None before returning it · 83428f2f
      Petr Machata 提交于
      Python None objects are handled just like all the other objects with
      respect to their reference counting. Before returning Py_None, its
      reference count thus needs to be bumped.
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Machata <petrm@mellanox.com>
      Link: http://lkml.kernel.org/r/b1e565ecccf68064d8d54f37db5d028dda8fa522.1521675563.git.petrm@mellanox.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      83428f2f
  14. 22 3月, 2018 2 次提交
  15. 21 3月, 2018 4 次提交
    • A
      perf annotate: No need to calculate notes->start twice · 425859ff
      Arnaldo Carvalho de Melo 提交于
      Since we already set notes->start to map__rip_2objdump(map, sym->start)
      in symbol__annotate2(), no need to calculate that address again in
      symbol__calc_lines(), just use notes->start.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ycxlg8mm5ueuj21w6gi62l7g@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      425859ff
    • A
      perf annotate browser: Add 'P' hotkey to dump annotation to file · d9bd7665
      Arnaldo Carvalho de Melo 提交于
      Just like we have in the histograms browser used as the main screen for
      'perf top --tui' and 'perf report --tui', to print the current
      annotation to a file with a named composed by the symbol name and the
      ".annotation" suffix.
      
      Here is one example of pressing 'A' on 'perf top' to live annotate a
      kernel function and then press 'P' to dump that annotation, the
      resulting file:
      
        # cat _raw_spin_lock_irqsave.annotation
        _raw_spin_lock_irqsave() /proc/kcore
        Event: cycles:ppp
      
          7.14        nop
         21.43        push   %rbx
          7.14        pushfq
                      pop    %rax
                      nop
                      mov    %rax,%rbx
                      cli
                      nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         64.29        lock   cmpxchg %edx,(%rdi)
                      test   %eax,%eax
                    ↓ jne    2b
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                    → callq  queued_spin_lock_slowpath
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-zzmnrwugb5vtk7bvg0rbx150@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d9bd7665
    • A
      perf annotate: Add function header to --stdio2 · 864298f2
      Arnaldo Carvalho de Melo 提交于
        # perf annotate --stdio2 _raw_spin_lock_irqsave
        _raw_spin_lock_irqsave() /lib/modules/4.16.0-rc4/build/vmlinux
        Event: anon group { cycles, instructions }
      
          0.00   3.17      → callq  __fentry__
          0.00   7.94        push   %rbx
          7.69  36.51      → callq  __page_file_index
                             mov    %rax,%rbx
          7.69   3.17      → callq  *ffffffff82225cd0
                             xor    %eax,%eax
                             mov    $0x1,%edx
         80.77  49.21        lock   cmpxchg %edx,(%rdi)
                             test   %eax,%eax
                           ↓ jne    2b
          3.85   0.00        mov    %rbx,%rax
                             pop    %rbx
                           ← retq
                       2b:   mov    %eax,%esi
                           → callq  queued_spin_lock_slowpath
                             mov    %rbx,%rax
                             pop    %rbx
                           ← retq
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-i86yfyzl8m194ioxgj1jo32f@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      864298f2
    • A
      perf annotate: Use the default annotation options for --stdio2 · 35632892
      Arnaldo Carvalho de Melo 提交于
      With an empty '[annotate]' section in ~/.perfconfig:
      
        # perf record -a --all-kernel -e '{cycles,instructions}:P' sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.243 MB perf.data (5513 samples) ]
        # perf annotate --stdio2 _raw_spin_lock | head -20
      
                           Disassembly of section .text:
      
                           ffffffff81868790 <_raw_spin_lock>:
                           _raw_spin_lock():
                           EXPORT_SYMBOL(_raw_spin_trylock_bh);
                           #endif
      
                           #ifndef CONFIG_INLINE_SPIN_LOCK
                           void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
                           {
                           → callq  __fentry__
                           atomic_cmpxchg():
                                   return xadd(&v->counter, -i);
                           }
      
                           static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
                           {
        # perf annotate --stdio2 _raw_spin_lock | head -20
                           → callq  __fentry__
                             xor    %eax,%eax
                             mov    $0x1,%edx
         87.50 100.00        lock   cmpxchg %edx,(%rdi)
          6.25   0.00        test   %eax,%eax
                           ↓ jne    16
          6.25   0.00        repz   retq
                       16:   mov    %eax,%esi
                           ↑ jmpq   ffffffff810e96b0 <queued_spin_lock_slowpath>
        #
        # cat ~/.perfconfig
        [annotate]
      
          hide_src_code = false
          show_linenr = true
        # perf annotate --stdio2 _raw_spin_lock | head -20
      
                       3   Disassembly of section .text:
      
                       5   ffffffff81868790 <_raw_spin_lock>:
                       6   _raw_spin_lock():
                       143 EXPORT_SYMBOL(_raw_spin_trylock_bh);
                       144 #endif
      
                       146 #ifndef CONFIG_INLINE_SPIN_LOCK
                       147 void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
                       148 {
                           → callq  __fentry__
                       150 atomic_cmpxchg():
                       187         return xadd(&v->counter, -i);
                       188 }
      
                       190 static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
                       191 {
        #
        # cat ~/.perfconfig
        [annotate]
      
          hide_src_code = true
          show_total_period = true
        # perf annotate --stdio2 _raw_spin_lock | head -20
                                     → callq  __fentry__
                                       xor    %eax,%eax
                                       mov    $0x1,%edx
            1411316      152339        lock   cmpxchg %edx,(%rdi)
             344694           0        test   %eax,%eax
                                     ↓ jne    16
              80806           0        repz   retq
                                 16:   mov    %eax,%esi
                                     ↑ jmpq   ffffffff810e96b0 <queued_spin_lock_slowpath>
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-nu4rxg5zkdtgs1b2gc40p7v7@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      35632892