1. 04 Jul, 2017 1 commit
    • A
      perf evsel: Set attr.exclude_kernel when probing max attr.precise_ip · 97365e81
      Arnaldo Carvalho de Melo committed
      We should set attr.exclude_kernel when probing for attr.precise_ip
      level, otherwise !CAP_SYS_ADMIN users will not default to skidless
      samples in capable hardware.
      
      The increase in the paranoid level in commit 0161028b ("perf/core:
      Change the default paranoia level to 2") broke this, fix it by excluding
      kernel samples when probing.
      
      Before:
      
        $ perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.018 MB perf.data (6 samples) ]
        $ perf evlist -v
        cycles:u: sample_freq: 4000, sample_type: IP|TID|TIME|PERIOD, exclude_kernel: 1
      
      After:
      
        $ perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.018 MB perf.data (8 samples) ]
        $ perf evlist -v
        cycles:ppp: sample_freq: 4000, sample_type: IP|TID|TIME|PERIOD, exclude_kernel: 1, precise_ip: 3
                                                                                           ^^^^^^^^^^^^^
                                                                                           ^^^^^^^^^^^^^
                                                                                           ^^^^^^^^^^^^^
        $
      
      To further clarify: we always set .exclude_kernel when !CAP_SYS_ADMIN
      users profile; it's just in the attr.precise_ip probing that we weren't
      doing so. Fix it.
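
The probing described above can be sketched as follows. This is a minimal model, not the perf source: `struct probe_attr`, `try_open_fn` and `unprivileged_open()` are hypothetical stand-ins for `struct perf_event_attr`, the perf_event_open() call and the kernel's permission check at paranoid level 2.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct perf_event_attr. */
struct probe_attr {
	unsigned int precise_ip;   /* requested skid level, 0..3 */
	bool exclude_kernel;       /* do not sample kernel addresses */
};

/* Hypothetical callback: returns true if the event could be opened. */
typedef bool (*try_open_fn)(const struct probe_attr *attr);

/*
 * Walk precise_ip down from 3 until an open succeeds, probing with the
 * same exclude_kernel setting that the real event will use.
 */
unsigned int probe_max_precise_ip(try_open_fn try_open, bool exclude_kernel)
{
	struct probe_attr attr = { .precise_ip = 3, .exclude_kernel = exclude_kernel };

	while (attr.precise_ip != 0) {
		if (try_open(&attr))
			break;
		attr.precise_ip--;
	}
	return attr.precise_ip;
}

/* Toy model of an unprivileged user: requests that include kernel samples are refused. */
bool unprivileged_open(const struct probe_attr *attr)
{
	return attr->exclude_kernel;
}
```

In this model, probing without exclude_kernel falls all the way down to level 0 for the unprivileged user, while probing with it set keeps level 3, which mirrors the before/after output above.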
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 7f8d1ade ("perf tools: By default use the most precise "cycles" hw counter available")
      Link: http://lkml.kernel.org/n/tip-t2qttwhbnua62o5gt75cueml@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 26 Jun, 2017 1 commit
    • J
      perf machine: Fix segfault for kernel.kptr_restrict=2 · 3f938ee2
      Jiri Olsa committed
      Michael reported a segfault when kernel.kptr_restrict=2 is set.
      
        $ perf record ls
        ...
        perf: Segmentation fault
        Obtained 16 stack frames.
        ./perf(dump_stack+0x2d) [0x5068df]
        ./perf(sighandler_dump_stack+0x2d) [0x5069bf]
        ./perf() [0x43e47b]
        /lib64/libc.so.6(+0x3594f) [0x7f762004794f]
        /lib64/libc.so.6(strlen+0x26) [0x7f762009ef86]
        /lib64/libc.so.6(__strdup+0xd) [0x7f762009ecbd]
        ./perf(maps__set_kallsyms_ref_reloc_sym+0x4d) [0x51590f]
        ./perf(machine__create_kernel_maps+0x136) [0x50a7de]
        ./perf(perf_session__create_kernel_maps+0x2c) [0x510a81]
        ./perf(perf_session__new+0x13d) [0x510e23]
        ./perf() [0x43fd61]
        ./perf(cmd_record+0x704) [0x441823]
        ./perf() [0x4bc1a0]
        ./perf() [0x4bc40d]
        ./perf() [0x4bc55f]
        ./perf(main+0x2d5) [0x4bc939]
        Segmentation fault (core dumped)
      
      The reason is that with kernel.kptr_restrict=2, we don't get
      the symbol from machine__get_running_kernel_start, which we
      want to use in maps__set_kallsyms_ref_reloc_sym and we crash.
      
      Check the symbol name value before calling
      maps__set_kallsyms_ref_reloc_sym() and succeed without ref_reloc_sym
      being set. It's safe because we check its existence before we use it.
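
The guard described above amounts to a NULL check before the name is copied. A minimal sketch, assuming a hypothetical `struct reloc_sym` and `set_reloc_sym()` in place of perf's own data structures:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for perf's ref_reloc_sym bookkeeping. */
struct reloc_sym {
	char *name;      /* NULL while no reference symbol is known */
	uint64_t addr;
};

/*
 * Record the reference relocation symbol, if one was found.
 * Returns 0 on success (including "nothing to record") and -1 on
 * allocation failure.
 */
int set_reloc_sym(struct reloc_sym *ref, const char *name, uint64_t addr)
{
	size_t len;
	char *copy;

	/* No symbol name available: succeed without touching ref. */
	if (name == NULL)
		return 0;

	/* Copy the name, as strdup() would. */
	len = strlen(name) + 1;
	copy = malloc(len);
	if (copy == NULL)
		return -1;
	memcpy(copy, name, len);

	free(ref->name);
	ref->name = copy;
	ref->addr = addr;
	return 0;
}
```

Callers then have to treat a NULL `name` as "no relocation reference available", which is the existence check the commit message relies on.
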
      Reported-by: Michael Petlan <mpetlan@redhat.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170626095153.553-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 23 Jun, 2017 1 commit
    • B
      perf probe: Fix probe definition for inlined functions · 7598f8bc
      Björn Töpel committed
      In commit 613f050d ("perf probe: Fix to probe on gcc generated
      functions in modules"), the offset from symbol is, incorrectly, added
      to the trace point address. This leads to incorrect probe trace points
      for inlined functions and when using relative line number on symbols.
      
      Prior to this patch:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626
      
      After:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665
      
      Committer testing:
      
      Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask which
      functions, while not explicitly marked as inline, were inlined by the
      compiler:
      
        # pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
        __ew32
        e1000_regdump
        e1000e_dump_ps_pages
        e1000_desc_unused
        e1000e_systim_to_hwtstamp
        e1000e_rx_hwtstamp
        e1000e_update_rdt_wa
        e1000e_update_tdt_wa
        e1000_put_txbuf
        e1000_consume_page
      
      Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
      them:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
      
      Now let's concentrate on the 'e1000_consume_page' one, which was inlined twice in
      e1000_clean_jumbo_rx_irq(). Let's see what readelf says about the DWARF tags for
      that function:
      
        $ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
        <SNIP>
        <1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram)
          <13e27c>   DW_AT_name        : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
          <13e287>   DW_AT_low_pc      : 0x17a30
        <3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e6f0>   DW_AT_abstract_origin: <0x13ed2c>
          <13e6f4>   DW_AT_low_pc      : 0x17be6
        <SNIP>
        <1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram)
           <13ed2e>   DW_AT_name        : (indirect string, offset: 0xa54c3): e1000_consume_page
      
      So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
      inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
      address, gives us the offset we should use in the probe definition:
      
        0x17be6 - 0x17a30 = 438
      
      but above we have 876, which is twice as much.
      
      Let's see the second inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
        <3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e86f>   DW_AT_abstract_origin: <0x13ed2c>
          <13e873>   DW_AT_low_pc      : 0x17d21
      
        0x17d21 - 0x17a30 = 753
      
      So we were adding it at twice the offset from the containing function that
      we should have used.
      
      And then after this patch:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
        #
      
      Which matches the first two expansions and shows that, because we were
      doubling the offset, it would spill over into the next function:
      
        readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
         673: 0000000000017a30  1626 FUNC    LOCAL  DEFAULT    2 e1000_clean_jumbo_rx_irq
         674: 0000000000018090  2013 FUNC    LOCAL  DEFAULT    2 e1000_clean_rx_irq_ps
      
      This is the 3rd inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
         <3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13ec78>   DW_AT_abstract_origin: <0x13ed2c>
          <13ec7c>   DW_AT_low_pc      : 0x17f79
      
        0x17f79 - 0x17a30 = 1353
      
      So:
      
         0x17a30 + 2 * 1353 = 0x184c2
      
      And:
      
         0x184c2 - 0x18090 = 1074
      
      Which explains the bogus third expansion for e1000_consume_page() to end up at:
      
         p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
      
      All fixed now :-)
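
The arithmetic in the committer testing above can be reproduced with a few lines of C. The helper names `probe_offset()` and `doubled_address()` are made up for this sketch; the addresses are the ones quoted from readelf.

```c
#include <stdint.h>

/* Function start addresses copied from the readelf output quoted above. */
#define JUMBO_RX_IRQ_START 0x17a30u  /* e1000_clean_jumbo_rx_irq */
#define RX_IRQ_PS_START    0x18090u  /* e1000_clean_rx_irq_ps */

/* Correct probe offset: the inlined instance's low_pc minus the containing function's low_pc. */
uint64_t probe_offset(uint64_t inline_low_pc, uint64_t func_low_pc)
{
	return inline_low_pc - func_low_pc;
}

/* Address the buggy definitions effectively pointed at: the offset counted twice. */
uint64_t doubled_address(uint64_t inline_low_pc, uint64_t func_low_pc)
{
	return func_low_pc + 2 * probe_offset(inline_low_pc, func_low_pc);
}
```

For the third expansion, `doubled_address(0x17f79, JUMBO_RX_IRQ_START)` lands past `RX_IRQ_PS_START`, so the bogus definition was expressed relative to the next function instead.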
      
      [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
      
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 613f050d ("perf probe: Fix to probe on gcc generated functions in modules")
      Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 20 Jun, 2017 1 commit
  5. 17 Jun, 2017 1 commit
    • M
      perf unwind: Report module before querying isactivation in dwfl unwind · 9126cbba
      Milian Wolff committed
      The PC returned by dwfl_frame_pc() may map into a not-yet-reported
      module. We have to report it before we continue unwinding. But when we
      query for the isactivation flag in dwfl_frame_pc, libdw will actually do
      one more unwinding step internally which can then break and lead to
      missed frames or broken stacks.
      
      With libunwind we get e.g.:
      
      ~~~~~
        heaptrack_gui  2228 135073.400474:     613969 cycles:
      	          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
      	          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
      	          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
      	           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
      	          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
      	           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      
        heaptrack_gui  2228 135073.401156:     569521 cycles:
      	          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
      	          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
      	          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
      	          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
      	          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
      	           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
      	           f5a1c QGuiApplicationPrivate::createPlatformIntegration (/usr/lib/libQt5Gui.so.5.8.0)
      	           f650c QGuiApplicationPrivate::createEventDispatcher (/usr/lib/libQt5Gui.so.5.8.0)
      	          298524 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
      	           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      ~~~~~
      
      Note the two frames 1589e8 and 78622 in the first sample. These are
      missing when unwinding with libdw. The second sample's breakage is
      more obvious:
      
      ~~~~~
        heaptrack_gui  2228 135073.400474:     613969 cycles:
      	          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
      	          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
      	          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
      	           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
      	          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      
      heaptrack_gui  2228 135073.401156:     569521 cycles:
      	          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
      	          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
      	          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
      	          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
      	          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
      	           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
      	          723dbf [unknown] ([unknown])
      ~~~~~
      
      This patch fixes the issue, and the libdw unwinder now mimics the
      libunwind behavior more closely.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170602143753.16907-2-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 16 Jun, 2017 1 commit
  7. 15 Jun, 2017 2 commits
    • J
      perf tools: Fix build with ARCH=x86_64 · 7a759cd8
      Jiada Wang committed
      With commit 0a943cb1 ("tools build: Add HOSTARCH Makefile variable"),
      when building for ARCH=x86_64, ARCH=x86_64 is passed to perf instead of
      ARCH=x86, so the perf build process searches for header files in
      tools/arch/x86_64/include, which doesn't exist.
      
      The following build failure is seen:
      
        In file included from util/event.c:2:0:
          tools/include/uapi/linux/mman.h:4:27: fatal error: uapi/asm/mman.h: No such file or directory
          compilation terminated.
      
      Fix this issue by using SRCARCH instead of ARCH in perf, just like the
      main kernel Makefile and tools/objtool's.
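
The real mapping lives in Makefiles; the C function below is only a model of the idea, with a made-up name (`srcarch_for`), and it covers just the x86 aliases mentioned here.

```c
#include <string.h>

/*
 * Map an ARCH value to the source directory name (SRCARCH).
 * Only the x86 aliases are modelled; other names pass through unchanged.
 */
const char *srcarch_for(const char *arch)
{
	if (strcmp(arch, "x86_64") == 0 || strcmp(arch, "i386") == 0)
		return "x86";
	return arch;
}
```

With this mapping, ARCH=x86_64 resolves to the existing tools/arch/x86/include directory instead of the nonexistent tools/arch/x86_64/include.
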
      Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
      Cc: Jan Stancek <jstancek@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Rui Teng <rui.teng@linux.vnet.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 0a943cb1 ("tools build: Add HOSTARCH Makefile variable")
      Link: http://lkml.kernel.org/r/1491793357-14977-2-git-send-email-jiada_wang@mentor.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf evsel: Fix probing of precise_ip level for default cycles event · 7a1ac110
      Arnaldo Carvalho de Melo committed
      Commit 18e7a45a ("perf/x86: Reject non sampling events with
      precise_ip") makes sys_perf_event_open() return -EINVAL for an attribute
      with (attr.precise_ip > 0 && attr.sample_period == 0), which is exactly
      what the routine used to probe the max precise level does when no events
      are passed to 'perf record' or 'perf top', i.e.:
      
      	perf_evsel__new_cycles()
      		perf_event_attr__set_max_precise_ip()
      
      Starting with the aforementioned commit, the x86 code in
      x86_pmu_hw_config(), which is reached from sys_perf_event_open(), does:
      
                      /* There's no sense in having PEBS for non sampling events: */
                      if (!is_sampling_event(event))
                              return -EINVAL;
      
      Which makes it fail for cycles:ppp, cycles:pp and cycles:p, always using
      just the non precise cycles variant.
      
      To make sure that this is the case, I tested it, before this patch,
      with:
      
        # perf probe -L x86_pmu_hw_config
        <x86_pmu_hw_config@/home/acme/git/linux/arch/x86/events/core.c:0>
              0  int x86_pmu_hw_config(struct perf_event *event)
              1  {
              2         if (event->attr.precise_ip) {
      <SNIP>
             17                 if (event->attr.precise_ip > precise)
             18                         return -EOPNOTSUPP;
      
                                /* There's no sense in having PEBS for non sampling events: */
             21                 if (!is_sampling_event(event))
             22                         return -EINVAL;
                        }
      <SNIP>
        # perf probe x86_pmu_hw_config:22
        Added new events:
          probe:x86_pmu_hw_config (on x86_pmu_hw_config:22)
          probe:x86_pmu_hw_config_1 (on x86_pmu_hw_config:22)
      
        You can now use it in all perf tools, such as:
      
              perf record -e probe:x86_pmu_hw_config_1 -aR sleep 1
      
        # perf trace -e perf_event_open,probe:x86_pmu_hwconfig*/max-stack=16/ perf record usleep 1
           0.000 ( 0.015 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.015 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.000 ( 0.021 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.023 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.025 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.023 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.028 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.030 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.028 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
          41.018 ( 0.012 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8b5dd0, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.065 ( 0.011 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.080 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.103 ( 0.010 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
          41.115 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
          41.122 ( 0.004 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
          41.128 ( 0.008 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
        #
      
      I.e. that return -EINVAL in x86_pmu_hw_config() is hit three times.
      
      So fix it by just setting attr.sample_period.
      
      Now, after this patch:
      
        # perf trace --max-stack=2 -e perf_event_open,probe:x86_pmu_hw_config* perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
           0.000 ( 0.017 ms): perf/8469 perf_event_open(attr_uptr: 0x7ffe36c27d10, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_open_cloexec_flag (/home/acme/bin/perf)
           0.050 ( 0.031 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.092 ( 0.040 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.143 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, cpu: -1, group_fd: -1           ) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
           0.161 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.171 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.180 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.190 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
        [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
        #
      
      The probe call made from perf_event_attr__set_max_precise_ip() now works
      the first time, with attr.precise_ip = 3, with the next ones being the
      per-CPU ones for the cycles:ppp event.
      
      And here is the text from a report and alternative proposed patch by
      Thomas-Mich Richter:
      
       ---
      
      On s390 the counter and sampling facilities do not support a precise IP
      skid level and sometimes return EOPNOTSUPP when structure member
      precise_ip in struct perf_event_attr is not set to zero.
      
      On s390 the command 'perf record -- true' fails with error EOPNOTSUPP.  This
      happens only when no events are specified on the command line.
      
      The functions called are
      ...
        --> perf_evlist__add_default
            --> perf_evsel__new_cycles
                --> perf_event_attr__set_max_precise_ip
      
      The last function determines the value of structure member precise_ip by
      invoking the perf_event_open() system call and checking the return code.
      The first successful open is the value for precise_ip.
      
      However the value is determined without setting member sample_period and
      indicates no sampling.
      
      On s390 the counter facility and sampling facility are different.  The
      above procedure determines a precise_ip value of 3 using the counter
      facility. Later it uses the sampling facility with a value of 3 and
      fails with EOPNOTSUPP.
      
       ---
      
      v2: Older compilers (e.g. gcc 4.4.7) don't support referencing members
          of unnamed union members in the container struct initialization, so
          move from:
      
      	struct perf_event_attr attr = {
      		...
      		.sample_period = 1,
      	};
      
      to right after it as:
      
      	struct perf_event_attr attr = {
      		...
      	};
      
      	attr.sample_period = 1;
      
      v3: We need to reset .sample_period to 0 to let the users of
      perf_evsel__new_cycles() properly set up attr.sample_period or
      attr.sample_freq. Reported by Ingo Molnar.
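
The interaction between the kernel-side check and the placeholder period from v2/v3 can be modelled in a few lines. `struct cycles_attr`, `toy_hw_config()` and `probe_level()` are hypothetical names for this sketch; the real code operates on `struct perf_event_attr` and goes through perf_event_open().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct perf_event_attr. */
struct cycles_attr {
	unsigned int precise_ip;
	uint64_t sample_period;
};

/* Toy model of the kernel-side check quoted above, assuming a max supported level of 3. */
int toy_hw_config(const struct cycles_attr *attr)
{
	if (attr->precise_ip > 3)
		return -EOPNOTSUPP;
	/* A precise level makes no sense for non sampling events. */
	if (attr->precise_ip > 0 && attr->sample_period == 0)
		return -EINVAL;
	return 0;
}

/* Try one precise_ip level as a sampling event; sample_period is left at 0 afterwards. */
int probe_level(struct cycles_attr *attr, unsigned int level)
{
	int ret;

	attr->precise_ip = level;
	attr->sample_period = 1;   /* placeholder so the probe counts as a sampling event */
	ret = toy_hw_config(attr);
	attr->sample_period = 0;   /* callers set sample_period or sample_freq later */
	return ret;
}
```

In this model a direct check with precise_ip = 3 and no period fails with -EINVAL, while `probe_level(&attr, 3)` succeeds and still hands back an attr whose sample_period is 0.
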
      Reported-and-Acked-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 18e7a45a ("perf/x86: Reject non sampling events with precise_ip")
      Link: http://lkml.kernel.org/n/tip-yv6nnkl7tzqocrm0hl3x7vf1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 09 Jun, 2017 10 commits
  9. 08 Jun, 2017 6 commits
  10. 06 Jun, 2017 7 commits
    • M
      perf report: Ensure the perf DSO mapping matches what libdw sees · 2538b9e2
      Milian Wolff committed
      In some situations the libdw unwinder stopped working properly.  I.e.
      with libunwind we see:
      
      ~~~~~
      heaptrack_gui  2228 135073.400112:     641314 cycles:
      	            e8ed _dl_fixup (/usr/lib/ld-2.25.so)
      	           15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so)
      	           ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0)
      	           608f3 _GLOBAL__sub_I_kdynamicjobtracker.cpp (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0)
      	            f199 call_init.part.0 (/usr/lib/ld-2.25.so)
      	            f2a5 _dl_init (/usr/lib/ld-2.25.so)
      	             db9 _dl_start_user (/usr/lib/ld-2.25.so)
      ~~~~~
      
      But with libdw and without this patch this sample is not properly
      unwound:
      
      ~~~~~
      heaptrack_gui  2228 135073.400112:     641314 cycles:
      	            e8ed _dl_fixup (/usr/lib/ld-2.25.so)
      	           15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so)
      	           ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0)
      ~~~~~
      
      Debug output showed me that libdw found a module for the last frame
      address, but it thinks it belongs to /usr/lib/ld-2.25.so. This patch
      double-checks what libdw sees and what perf knows. If the mappings
      mismatch, we now report the elf known to perf. This fixes the situation
      above, and the libdw unwinder produces the same stack as libunwind.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170602143753.16907-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2538b9e2
    • M
      perf report: Include partial stacks unwound with libdw · 5ea0416f
      Milian Wolff committed
      So far the whole stack was thrown away when any error occurred before
      the maximum stack depth was unwound. This is actually a very common
      scenario though. The stacks that got unwound so far are still
      interesting. This removes a large chunk of differences when comparing
      perf script output for libunwind and libdw perf unwinding.
      
      E.g. with libunwind:
      
      ~~~~~
      heaptrack_gui  2228 135073.388524:     479408 cycles:
              ffffffff811749ed perf_iterate_ctx ([kernel.kallsyms])
              ffffffff81181662 perf_event_mmap ([kernel.kallsyms])
              ffffffff811cf5ed mmap_region ([kernel.kallsyms])
              ffffffff811cfe6b do_mmap ([kernel.kallsyms])
              ffffffff811b0dca vm_mmap_pgoff ([kernel.kallsyms])
              ffffffff811cdb0c sys_mmap_pgoff ([kernel.kallsyms])
              ffffffff81033acb sys_mmap ([kernel.kallsyms])
              ffffffff81631d37 entry_SYSCALL_64_fastpath ([kernel.kallsyms])
                         192ca mmap64 (/usr/lib/ld-2.25.so)
                          59a9 _dl_map_object_from_fd (/usr/lib/ld-2.25.so)
                          83d0 _dl_map_object (/usr/lib/ld-2.25.so)
                          cda1 openaux (/usr/lib/ld-2.25.so)
                         1834f _dl_catch_error (/usr/lib/ld-2.25.so)
                          cfe2 _dl_map_object_deps (/usr/lib/ld-2.25.so)
                          3481 dl_main (/usr/lib/ld-2.25.so)
                         17387 _dl_sysdep_start (/usr/lib/ld-2.25.so)
                          4d37 _dl_start (/usr/lib/ld-2.25.so)
                           d87 _start (/usr/lib/ld-2.25.so)
      
      heaptrack_gui  2228 135073.388677:     611329 cycles:
                         1a3e0 strcmp (/usr/lib/ld-2.25.so)
                          82b2 _dl_map_object (/usr/lib/ld-2.25.so)
                          cda1 openaux (/usr/lib/ld-2.25.so)
                         1834f _dl_catch_error (/usr/lib/ld-2.25.so)
                          cfe2 _dl_map_object_deps (/usr/lib/ld-2.25.so)
                          3481 dl_main (/usr/lib/ld-2.25.so)
                         17387 _dl_sysdep_start (/usr/lib/ld-2.25.so)
                          4d37 _dl_start (/usr/lib/ld-2.25.so)
                           d87 _start (/usr/lib/ld-2.25.so)
      ~~~~~
      
      With libdw without this patch:
      
      ~~~~~
      heaptrack_gui  2228 135073.388524:     479408 cycles:
              ffffffff811749ed perf_iterate_ctx ([kernel.kallsyms])
              ffffffff81181662 perf_event_mmap ([kernel.kallsyms])
              ffffffff811cf5ed mmap_region ([kernel.kallsyms])
              ffffffff811cfe6b do_mmap ([kernel.kallsyms])
              ffffffff811b0dca vm_mmap_pgoff ([kernel.kallsyms])
              ffffffff811cdb0c sys_mmap_pgoff ([kernel.kallsyms])
              ffffffff81033acb sys_mmap ([kernel.kallsyms])
              ffffffff81631d37 entry_SYSCALL_64_fastpath ([kernel.kallsyms])
      
      heaptrack_gui  2228 135073.388677:     611329 cycles:
      ~~~~~
      
      With this patch applied, the libdw unwinder will produce the same
      output as the libunwind unwinder.
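The idea of keeping whatever was unwound before an error can be sketched as below. This is only an illustration under stated assumptions: `unwind_step_fn`, `unwind_keep_partial()` and `three_then_fail()` are hypothetical names and do not correspond to the libdw API or to perf's unwinder code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-step unwinder: stores the next frame address in *ip
 * and returns 0, or returns -1 when it cannot continue. */
typedef int (*unwind_step_fn)(void *ctx, uint64_t *ip);

/* Collect up to max_depth frames and return how many were stored.
 * An error after some frames no longer discards what was collected. */
size_t unwind_keep_partial(unwind_step_fn step, void *ctx,
			   uint64_t *frames, size_t max_depth)
{
	size_t n = 0;

	while (n < max_depth) {
		uint64_t ip;

		if (step(ctx, &ip) != 0)
			break;		/* stop, but keep frames[0..n) */
		frames[n++] = ip;
	}
	return n;
}

/* Example stepper that yields three frames and then fails. */
int three_then_fail(void *ctx, uint64_t *ip)
{
	int *calls = ctx;

	if (*calls >= 3)
		return -1;
	*ip = 0x1000u + (uint64_t)*calls;
	*calls += 1;
	return 0;
}
```

A caller asking for eight frames from the example stepper gets the three frames that were available rather than an empty stack.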
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170601210021.20046-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5ea0416f
    • K
      perf annotate: Add missing powerpc triplet · 6db47fde
      Kim Phillips committed
      On an Ubuntu xenial system, 'perf annotate' says to install powerpc
      objdump on a system that already has binutils-powerpc-linux-gnu
      installed.  Make perf aware of the missing triplet for the
      powerpc-linux-gnu target.
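A triplet lookup of this kind might look like the sketch below. Everything here is hypothetical: the table contents other than the powerpc-linux-gnu prefix, the `tool_exists_fn` predicate and `find_objdump_prefix()` are illustrative and are not taken from perf's sources.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative prefix list; only "powerpc-linux-gnu-" comes from the text. */
const char *const example_powerpc_triplets[] = {
	"powerpc64-linux-gnu-",
	"powerpc-linux-gnu-",
	NULL
};

/* Hypothetical predicate: non-zero if the named tool is installed. */
typedef int (*tool_exists_fn)(const char *tool);

/* Return the first prefix for which "<prefix>objdump" exists, else NULL. */
const char *find_objdump_prefix(const char *const *triplets,
				tool_exists_fn exists)
{
	char tool[128];

	for (; *triplets != NULL; triplets++) {
		snprintf(tool, sizeof(tool), "%sobjdump", *triplets);
		if (exists(tool))
			return *triplets;
	}
	return NULL;
}

/* Example predicate modelling a system where only the
 * powerpc-linux-gnu binutils are installed. */
int xenial_like_exists(const char *tool)
{
	return strcmp(tool, "powerpc-linux-gnu-objdump") == 0;
}
```

If the installed prefix is absent from the table the lookup returns NULL, which corresponds to the misleading "install objdump" message described above.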
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20170529142754.7fbfb1152fd8f2663de0ea70@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6db47fde
    • J
      perf test: Disable breakpoint signal tests for powerpc · 598762cf
      Jiri Olsa committed
      The following tests are failing on powerpc:
      
        # perf test break
        18: Breakpoint overflow signal handler  : FAILED!
        19: Breakpoint overflow sampling        : FAILED!
      
      The powerpc kernel so far does not have support to even create
      instruction breakpoints using the perf event interface, so those tests
      fail early in the config phase.
      
      I added a '->is_supported()' callback to the test struct to be able to
      disable specific tests. It seems better than putting ifdefs directly into
      the test array.
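The callback approach can be illustrated with the sketch below. It is a simplified, hypothetical model: `struct example_test`, the `EX_*` result codes and `run_example_test()` are made up for illustration and differ from the real test struct in perf.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified test descriptor. */
struct example_test {
	const char *desc;
	int (*func)(void);		/* returns 0 on success */
	bool (*is_supported)(void);	/* optional; NULL means supported */
};

enum { EX_OK = 0, EX_FAIL = -1, EX_SKIP = -2 };

/* Run one test, skipping it when its callback says it is unsupported. */
int run_example_test(const struct example_test *t)
{
	if (t->is_supported && !t->is_supported())
		return EX_SKIP;
	return t->func() == 0 ? EX_OK : EX_FAIL;
}

/* Example: a test that would fail, gated by a callback returning false. */
int always_fails(void) { return -1; }
bool never_supported(void) { return false; }
```

Keeping the decision in a per-test callback leaves the test array itself free of architecture ifdefs, which is the design choice the message argues for.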
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170601205450.GA398@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      598762cf
    • N
      perf symbols: Use correct filename for compressed modules in build-id cache · a09935b8
      Namhyung Kim committed
      decompress_kmodule() decompresses kernel modules in order to load
      symbols from them.  In the DSO_BINARY_TYPE__BUILD_ID_CACHE case, it needs
      the full file path to extract the file extension to determine the
      decompression method.  But overwriting 'name' makes the decompression
      fail, since it might point to a non-existent old file.
      
      Instead, use dso->long_name for having the correct extension and use the
      real filename to decompress.
      
      In the DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP case, both names should
      be the same.  This allows resolving symbols in the old modules.
      
      Before:
      
        $ perf report -i perf.data.old | grep scsi_mod
           0.00%  cc1      [scsi_mod]    [k] 0x0000000000004aa6
           0.00%  as       [scsi_mod]    [k] 0x00000000000099e1
           0.00%  cc1      [scsi_mod]    [k] 0x0000000000009830
           0.00%  cc1      [scsi_mod]    [k] 0x0000000000001b8f
      
      After:
      
           0.00%  cc1      [scsi_mod]    [k] scsi_handle_queue_ramp_up
           0.00%  as       [scsi_mod]    [k] scsi_sg_alloc
           0.00%  cc1      [scsi_mod]    [k] scsi_setup_cmnd
           0.00%  cc1      [scsi_mod]    [k] scsi_get_command
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170531120105.21731-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a09935b8
    • N
      perf symbols: Set module info when build-id event found · 6b335e8f
      Namhyung Kim committed
      Like machine__findnew_module_dso(), it should set necessary info for
      kernel modules to find symbol info from the file.  Factor out
      dso__set_module_info() to do it.
      
      This is needed for dso__needs_decompress() to detect such DSOs.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170531120105.21731-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6b335e8f
    • N
      perf header: Set proper module name when build-id event found · 1deec1bd
      Namhyung Kim committed
      When perf processes a build-id event, it creates DSOs with the build-id.
      But it didn't set the module short name (like '[module-name]'), so when
      processing a kernel mmap event of the module, it cannot find the DSO, as
      it only checks the short names.
      
      That leads perf to create a duplicate DSO without the build-id info, and
      it'll look up the system path even if the DSO is already in the build-id
      cache.  After the kernel is updated, perf cannot find the DSO and cannot
      show symbols in it anymore.
      
      You can see this if you have an old data file (w/ old kernel version):
      
        $ perf report -i perf.data.old -v |& grep scsi_mod
        build id event received for /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz : cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1
        Failed to open /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz, continuing without symbols
        ...
      
      The second message didn't show the build-id.  With this patch:
      
        $ perf report -i perf.data.old -v |& grep scsi_mod
        build id event received for /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz: cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1
        /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz with build id cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1 not found, continuing without symbols
        ...
      
      Now it shows the build-id but still cannot load the symbol table.  This
      is a different problem which will be fixed in the next patch.
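For illustration, deriving a bracketed short name such as '[scsi_mod]' from a module path could be sketched as below. `module_short_name()` is a hypothetical helper, not a perf function, and it assumes the module name is simply the basename up to the first ".ko".

```c
#include <stdio.h>
#include <string.h>

/* Build "[name]" from a module path such as ".../scsi_mod.ko.gz".
 * Returns 0 on success, -1 if the basename has no ".ko" part or the
 * result does not fit in out. */
int module_short_name(const char *path, char *out, size_t outlen)
{
	const char *base = strrchr(path, '/');
	const char *ko;
	int len;

	base = base ? base + 1 : path;
	ko = strstr(base, ".ko");
	if (ko == NULL || ko == base)
		return -1;

	len = snprintf(out, outlen, "[%.*s]", (int)(ko - base), base);
	return (len < 0 || (size_t)len >= outlen) ? -1 : 0;
}
```

Matching on this short name is what lets a later kernel mmap event find the DSO that the build-id event already created.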
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170531120105.21731-1-namhyung@kernel.org
      [ Fix the build on older compilers (debian <= 8, fedora <= 21, etc) wrt kmod_path var init ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1deec1bd
  11. 02 Jun, 2017 2 commits
    • A
      perf stat: Only print NMI watchdog hint when enabled · 918c7b06
      Andi Kleen committed
      Only print the NMI watchdog hint when that watchdog is actually enabled.
      
      This avoids printing the hint unnecessarily.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/n/tip-lnw7edxnqsphkmeew857wz1i@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      918c7b06
    • K
      perf annotate: Fix branch instruction with multiple operands · b13bbeee
      Kim Phillips committed
      'perf annotate' is dropping the cr* fields from branch instructions.
      
      Fix it by adding support to display branch instructions having
      multiple operands.
      
      Power Arch objdump of int_sqrt:
      
       20.36 | c0000000004d2694:   subf   r10,r10,r3
             | c0000000004d2698: v bgt    cr6,c0000000004d26a0 <int_sqrt+0x40>
        1.82 | c0000000004d269c:   mr     r3,r10
       29.18 | c0000000004d26a0:   mr     r10,r8
             | c0000000004d26a4: v bgt    cr7,c0000000004d26ac <int_sqrt+0x4c>
             | c0000000004d26a8:   mr     r10,r7
      
      Power Arch Before Patch:
      
       20.36 |       subf   r10,r10,r3
             |     v bgt    40
        1.82 |       mr     r3,r10
       29.18 | 40:   mr     r10,r8
             |     v bgt    4c
             |       mr     r10,r7
      
      Power Arch After patch:
      
       20.36 |       subf   r10,r10,r3
             |     v bgt    cr6,40
        1.82 |       mr     r3,r10
       29.18 | 40:   mr     r10,r8
             |     v bgt    cr7,4c
             |       mr     r10,r7
      
      Also support AArch64 conditional branch instructions, which can
      have up to three operands:
      
      Aarch64 Non-simplified (raw objdump) view:
      
             │ffff0000083cd11c: ↑ cbz    w0, ffff0000083cd100 <security_fil▒
      ...
        4.44 │ffff000│083cd134: ↓ tbnz   w0, #26, ffff0000083cd190 <securit▒
      ...
        1.37 │ffff000│083cd144: ↓ tbnz   w22, #5, ffff0000083cd1a4 <securit▒
             │ffff000│083cd148:   mov    w19, #0x20000                   //▒
        1.02 │ffff000│083cd14c: ↓ tbz    w22, #2, ffff0000083cd1ac <securit▒
      ...
        0.68 │ffff000└──3cd16c: ↑ cbnz   w0, ffff0000083cd120 <security_fil▒
      
      Aarch64 Simplified, before this patch:
      
             │    ↑ cbz    40
      ...
        4.44 │   │↓ tbnz   w0, #26, ffff0000083cd190 <security_file_permiss▒
      ...
        1.37 │   │↓ tbnz   w22, #5, ffff0000083cd1a4 <security_file_permiss▒
             │   │  mov    w19, #0x20000                   // #131072
        1.02 │   │↓ tbz    w22, #2, ffff0000083cd1ac <security_file_permiss▒
      ...
        0.68 │   └──cbnz   60
      
      The cbz operand is missing, and the tbz doesn't get simplified processing
      at all because the parsing function failed to match an address.
      
      Aarch64 Simplified, After this patch applied:
      
             │    ↑ cbz    w0, 40
      ...
        4.44 │   │↓ tbnz   w0, #26, d0
      ...
        1.37 │   │↓ tbnz   w22, #5, e4
             │   │  mov    w19, #0x20000                   // #131072
        1.02 │   │↓ tbz    w22, #2, ec
      ...
        0.68 │   └──cbnz   w0, 60
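One way to handle branches with several operands is to treat the text after the last comma as the target address. The sketch below shows that idea only; `branch_target()` is a hypothetical helper and the "last comma" rule is an assumption for illustration, not necessarily how the patch parses operands.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse the branch target from an operand string. The target is assumed
 * to be the hexadecimal address after the last comma, or the whole string
 * when there is no comma. */
uint64_t branch_target(const char *operands)
{
	const char *last = strrchr(operands, ',');
	const char *addr = last ? last + 1 : operands;

	/* strtoull() skips leading whitespace and stops at the first
	 * character that is not a hex digit. */
	return strtoull(addr, NULL, 16);
}
```

With this rule "cr6,c0000000004d26a0 <int_sqrt+0x40>" and "w0, #26, ffff0000083cd190" both yield their final address, so one, two and three operand branches are handled alike.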
      Originally-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Reported-by: Anton Blanchard <anton@samba.org>
      Reported-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Kim Phillips <kim.phillips@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Link: http://lkml.kernel.org/r/20170601092959.f60d98912e8a1b66fd1e4c0e@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b13bbeee
  12. 01 Jun, 2017 1 commit
    • J
      perf trace: Add mmap alias for s390 · 54265664
      Jiri Olsa committed
      The s390 architecture maps sys_mmap (nr 90) into sys_old_mmap.  For this
      reason perf trace can't find the proper syscall event to get args format
      from and displays it wrongly as 'continued'.
      
      To fix that, fill the "alias" field with "old_mmap" for trace's mmap record
      to get the correct translation.
      
      Before:
           0.042 ( 0.011 ms): vest/43052 fstat(statbuf: 0x3ffff89fd90                ) = 0
           0.042 ( 0.028 ms): vest/43052  ... [continued]: mmap()) = 0x3fffd6e2000
           0.072 ( 0.025 ms): vest/43052 read(buf: 0x3fffd6e2000, count: 4096        ) = 6
      
      After:
           0.045 ( 0.011 ms): fstat(statbuf: 0x3ffff8a0930                           ) = 0
           0.057 ( 0.018 ms): mmap(arg: 0x3ffff8a0858                                ) = 0x3fffd14a000
           0.076 ( 0.025 ms): read(buf: 0x3fffd14a000, count: 4096                   ) = 6
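The alias fallback can be pictured with the sketch below. It is hypothetical throughout: `struct example_syscall_fmt`, `tp_exists_fn` and `resolve_tracepoint()` are illustrative names, and the "sys_enter_" tracepoint prefix is an assumption about how the event name is formed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified syscall formatting entry. */
struct example_syscall_fmt {
	const char *name;
	const char *alias;	/* alternative event name, or NULL */
};

/* Hypothetical predicate: non-zero if the tracepoint exists. */
typedef int (*tp_exists_fn)(const char *tracepoint);

/* Try "sys_enter_<name>" first, then "sys_enter_<alias>".
 * Returns 0 and fills out on success, -1 if neither exists. */
int resolve_tracepoint(const struct example_syscall_fmt *fmt,
		       tp_exists_fn exists, char *out, size_t outlen)
{
	snprintf(out, outlen, "sys_enter_%s", fmt->name);
	if (exists(out))
		return 0;

	if (fmt->alias) {
		snprintf(out, outlen, "sys_enter_%s", fmt->alias);
		if (exists(out))
			return 0;
	}
	return -1;
}

/* Example predicate modelling a system where only old_mmap exists. */
int s390_like_tp_exists(const char *tracepoint)
{
	return strcmp(tracepoint, "sys_enter_old_mmap") == 0;
}
```

Without the alias the lookup for "mmap" fails on such a system, which matches the '[continued]' output shown in the Before listing.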
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170531113557.19175-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      54265664
  13. 27 May, 2017 2 commits
  14. 26 May, 2017 1 commit
  15. 24 May, 2017 3 commits
    • I
      tools/include: Sync kernel ABI headers with tooling headers · 6e30437b
      Ingo Molnar committed
      Sync (copy) the following v4.12 kernel headers to the tooling headers:
      
        arch/x86/include/asm/disabled-features.h:
        arch/x86/include/uapi/asm/kvm.h:
        arch/powerpc/include/uapi/asm/kvm.h:
        arch/s390/include/uapi/asm/kvm.h:
        arch/arm/include/uapi/asm/kvm.h:
        arch/arm64/include/uapi/asm/kvm.h:
      
         - 'struct kvm_sync_regs' got changed in an ABI-incompatible way,
           fortunately none of the (in-kernel) tooling relied on it
      
         - new KVM_DEV calls added
      
        arch/x86/include/asm/required-features.h:
      
         - 5-level paging hardware ABI detail added
      
        arch/x86/include/asm/cpufeatures.h:
      
         - new CPU feature added
      
        arch/x86/include/uapi/asm/vmx.h:
      
         - new VMX exit conditions
      
      None of the changes requires fixes in the tooling source code.
      
      This addresses the following warnings:
      
        Warning: include/uapi/linux/stat.h differs from kernel
        Warning: arch/x86/include/asm/disabled-features.h differs from kernel
        Warning: arch/x86/include/asm/required-features.h differs from kernel
        Warning: arch/x86/include/asm/cpufeatures.h differs from kernel
        Warning: arch/x86/include/uapi/asm/kvm.h differs from kernel
        Warning: arch/x86/include/uapi/asm/vmx.h differs from kernel
        Warning: arch/powerpc/include/uapi/asm/kvm.h differs from kernel
        Warning: arch/s390/include/uapi/asm/kvm.h differs from kernel
        Warning: arch/arm/include/uapi/asm/kvm.h differs from kernel
        Warning: arch/arm64/include/uapi/asm/kvm.h differs from kernel
      
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524065721.j2mlch6bgk5klgbc@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6e30437b
    • N
      perf tools: Put caller above callee in --children mode · 7111ffff
      Namhyung Kim committed
      __hpp__sort_acc() sorts entries using callchain depth in order to
      put callers above in children mode.  But it assumed the callchain order
      was callee-first.  Now the default (for children) is caller-first, so the
      order of entries is reversed.
      
      For example, consider following case:
      
        $ perf report --no-children
        ...
        # Overhead  Command  Shared Object        Symbol
        # ........  .......  ...................  ..........................
        #
            99.44%  a.out    a.out                [.] main
                    |
                    ---main
                       __libc_start_main
                       _start
      
      Then children mode should show '_start' above '__libc_start_main' since
      it's the caller (parent) of __libc_start_main.  But it's reversed:
      
        # Children      Self  Command  Shared Object    Symbol
        # ........  ........  .......  ...............  .....................
        #
            99.61%     0.00%  a.out    libc-2.25.so     [.] __libc_start_main
            99.61%     0.00%  a.out    a.out            [.] _start
            99.54%    99.44%  a.out    a.out            [.] main
      
      This patch fixes it.
      
        # Children      Self  Command  Shared Object    Symbol
        # ........  ........  .......  ...............  .....................
        #
            99.61%     0.00%  a.out    a.out            [.] _start
            99.61%     0.00%  a.out    libc-2.25.so     [.] __libc_start_main
            99.54%    99.44%  a.out    a.out            [.] main
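The tie-break on callchain depth can be sketched with a comparator like the one below. This is a hypothetical model, not the code of __hpp__sort_acc(): `struct example_entry`, `example_caller_first` and `cmp_children()` are illustrative, and depth is assumed to count positions in the callchain as recorded.

```c
#include <stdlib.h>

/* Hypothetical, simplified report entry. */
struct example_entry {
	const char *sym;
	double children;	/* accumulated overhead, in percent */
	int depth;		/* position in the callchain as recorded */
};

/* Non-zero when callchains are recorded caller-first. */
int example_caller_first = 1;

/* Sort by accumulated overhead, descending; break ties so that callers
 * come before callees whichever way depth was counted. */
int cmp_children(const void *a, const void *b)
{
	const struct example_entry *l = a, *r = b;

	if (l->children != r->children)
		return l->children < r->children ? 1 : -1;

	/* caller-first: smaller depth is the caller; callee-first: larger. */
	return example_caller_first ? l->depth - r->depth
				    : r->depth - l->depth;
}
```

Passing this comparator to qsort() over the three entries from the example places _start first, then __libc_start_main, then main.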
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-8-namhyung@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7111ffff
    • M
      perf report: Do not drop last inlined frame · 4d53b9d5
      Milian Wolff committed
      The very last inlined frame, i.e. the one furthest away from the
      non-inlined frame, was silently dropped. This is apparent when
      comparing the output of `perf script` and `addr2line`:
      
      ~~~~~
        $ perf script --inline
        ...
        a.out 26722 80836.309329:      72425 cycles:
                           21561 __hypot_finite (/usr/lib/libm-2.25.so)
                            ace3 hypot (/usr/lib/libm-2.25.so)
                             a4a main (a.out)
                                 std::abs<double>
                                 std::_Norm_helper<true>::_S_do_it<double>
                                 std::norm<double>
                                 main
                           20510 __libc_start_main (/usr/lib/libc-2.25.so)
                             bd9 _start (a.out)
      
        $ addr2line -a -f -i -e /tmp/a.out a4a | c++filt
        0x0000000000000a4a
        std::__complex_abs(doublecomplex )
        /usr/include/c++/6.3.1/complex:589
        double std::abs<double>(std::complex<double> const&)
        /usr/include/c++/6.3.1/complex:597
        double std::_Norm_helper<true>::_S_do_it<double>(std::complex<double> const&)
        /usr/include/c++/6.3.1/complex:654
        double std::norm<double>(std::complex<double> const&)
        /usr/include/c++/6.3.1/complex:664
        main
        /tmp/inlining.cpp:14
      ~~~~~
      
      Note how `std::__complex_abs` is missing from the `perf script`
      output. This is similarly showing up in `perf report`. The patch
      here fixes this issue, and the output becomes:
      
      ~~~~~
        a.out 26722 80836.309329:      72425 cycles:
                           21561 __hypot_finite (/usr/lib/libm-2.25.so)
                            ace3 hypot (/usr/lib/libm-2.25.so)
                             a4a main (a.out)
                                 std::__complex_abs
                                 std::abs<double>
                                 std::_Norm_helper<true>::_S_do_it<double>
                                 std::norm<double>
                                 main
                           20510 __libc_start_main (/usr/lib/libc-2.25.so)
                             bd9 _start (a.out)
      ~~~~~
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-7-namhyung@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4d53b9d5