1. 17 11月, 2017 3 次提交
  2. 25 9月, 2017 1 次提交
    • A
      perf evsel: Fix attr.exclude_kernel setting for default cycles:p · f1e52f14
      Arnaldo Carvalho de Melo 提交于
      Yet another fix for probing the max attr.precise_ip setting: it is not
      enough settting attr.exclude_kernel for !root users, as they _can_
      profile the kernel if the kernel.perf_event_paranoid sysctl is set to
      -1, so check that as well.
      
      Testing it:
      
      As non root:
      
        $ sysctl kernel.perf_event_paranoid
        kernel.perf_event_paranoid = 2
        $ perf record sleep 1
        $ perf evlist -v
        cycles:uppp: ..., exclude_kernel: 1, ... precise_ip: 3, ...
      
      Now as non-root, but with kernel.perf_event_paranoid set set to the
      most permissive value, -1:
      
        $ sysctl kernel.perf_event_paranoid
        kernel.perf_event_paranoid = -1
        $ perf record sleep 1
        $ perf evlist -v
        cycles:ppp: ..., exclude_kernel: 0, ... precise_ip: 3, ...
        $
      
      I.e. non-root, default kernel.perf_event_paranoid: :uppp modifier = not allowed to sample the kernel,
           non-root, most permissible kernel.perf_event_paranoid: :ppp = allowed to sample the kernel.
      
      In both cases, use the highest available precision: attr.precise_ip = 3.
      Reported-and-Tested-by: NIngo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: d37a3697 ("perf evsel: Fix attr.exclude_kernel setting for default cycles:p")
      Link: http://lkml.kernel.org/n/tip-nj2qkf75xsd6pw6hhjzfqqdx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f1e52f14
  3. 13 9月, 2017 1 次提交
  4. 02 9月, 2017 1 次提交
  5. 28 8月, 2017 1 次提交
  6. 23 8月, 2017 1 次提交
  7. 22 8月, 2017 1 次提交
    • A
      perf evsel: Fix buffer overflow while freeing events · 475fb533
      Andi Kleen 提交于
      Fix buffer overflow for:
      
        % perf stat -e msr/tsc/,cstate_core/c7-residency/ true
      
      that causes glibc free list corruption. For some reason it doesn't
      trigger in valgrind, but it is visible in AS:
      
        =================================================================
        ==32681==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003f5c at pc 0x0000005671ef bp 0x7ffdaaac9ac0 sp 0x7ffdaaac9ab0
        READ of size 4 at 0x603000003f5c thread T0
          #0 0x5671ee in perf_evsel__close_fd util/evsel.c:1196
          #1 0x56c57a in perf_evsel__close util/evsel.c:1717
          #2 0x55ed5f in perf_evlist__close util/evlist.c:1631
          #3 0x4647e1 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:749
          #4 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
          #5 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
          #6 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
          #7 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
          #8 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
          #9 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
          #10 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
          #11 0x428419 in _start (/home/ak/hle/obj-perf/perf+0x428419)
      
        0x603000003f5c is located 0 bytes to the right of 28-byte region [0x603000003f40,0x603000003f5c)
        allocated by thread T0 here:
          #0 0x7f0675139020 in calloc (/lib64/libasan.so.3+0xc7020)
          #1 0x648a2d in zalloc util/util.h:23
          #2 0x648a88 in xyarray__new util/xyarray.c:9
          #3 0x566419 in perf_evsel__alloc_fd util/evsel.c:1039
          #4 0x56b427 in perf_evsel__open util/evsel.c:1529
          #5 0x56c620 in perf_evsel__open_per_thread util/evsel.c:1730
          #6 0x461dea in create_perf_stat_counter /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:263
          #7 0x4637d7 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:600
          #8 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
          #9 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
          #10 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
          #11 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
          #12 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
          #13 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
          #14 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
      
      The event is allocated with cpus == 1, but freed with cpus == real number
      When the evsel close function walks the file descriptors it exceeds the
      fd xyarray boundaries and reads random memory.
      
      v2:
      
      Now that xyarrays save their original dimensions we can use these to
      iterate the two dimensional fd arrays. Fix some users (close, ioctl) in
      evsel.c to use these fields directly. This allows simplifying the code
      and dropping quite a few function arguments. Adjust all callers by
      removing the unneeded arguments.
      
      The actual perf event reading still uses the original values from the
      evsel list.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170811232634.30465-2-andi@firstfloor.org
      [ Fix up xy_max_[xy]() -> xyarray__max_[xy]() ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      475fb533
  8. 27 7月, 2017 3 次提交
  9. 25 7月, 2017 1 次提交
    • J
      perf evsel: Add verbose output for sys_perf_event_open fallback · 2b04e0f8
      Jiri Olsa 提交于
      Adding info about what is being switched off in the sys_perf_event_open
      fallback.
      
      New output (notice the 'switching off' lines):
      
        $ perf stat -e '{cycles,instructions}' -vvv ls
        Using CPUID GenuineIntel-6-3D
        intel_pt default config: tsc
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
          disabled                         1
          inherit                          1
          enable_on_exec                   1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid 3591  cpu -1  group_fd -1  flags 0x8
        sys_perf_event_open failed, error -22
        switching off cloexec flag
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
          disabled                         1
          inherit                          1
          enable_on_exec                   1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid 3591  cpu -1  group_fd -1  flags 0
        sys_perf_event_open failed, error -22
        switching off exclude_guest, exclude_host
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
          disabled                         1
          inherit                          1
          enable_on_exec                   1
        ------------------------------------------------------------
        sys_perf_event_open: pid 3591  cpu -1  group_fd -1  flags 0
        sys_perf_event_open failed, error -22
        switching off sample_id_all
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          sample_type                      IDENTIFIER
          read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
          disabled                         1
          inherit                          1
          enable_on_exec                   1
        ...
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170721121212.21414-2-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2b04e0f8
  10. 19 7月, 2017 3 次提交
    • J
      perf tests attr: Add test_attr__ready function · 10213e2f
      Jiri Olsa 提交于
      We create many test events before the real ones just to test specific
      features. But there's no way for attr tests to separate those test
      events from those it needs to check.
      
      Adding 'ready' call from the events open interface to trigger/start
      events collection for attr test.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20170703145030.12903-4-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      10213e2f
    • A
      perf evsel: Allow asking for max precise_ip in new_cycles() · 30269dc1
      Arnaldo Carvalho de Melo 提交于
      There are cases where we want to leave attr.precise_ip as zero, such
      as when using 'perf record --no-samples', where this would make the
      kernel return -EINVAL.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-4zq1udecxa51gsapyfwej5fj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      30269dc1
    • J
      perf annotate: Check for fused instructions · 69fb09f6
      Jin Yao 提交于
      Macro fusion merges two instructions to a single micro-op. Intel core
      platform performs this hardware optimization under limited
      circumstances.
      
      For example, CMP + JCC can be "fused" and executed /retired together.
      While with sampling this can result in the sample sometimes being on the
      JCC and sometimes on the CMP.  So for the fused instruction pair, they
      could be considered together.
      
      On Nehalem, fused instruction pairs:
      
        cmp/test + jcc.
      
      On other new CPU:
      
        cmp/test/add/sub/and/inc/dec + jcc.
      
      This patch adds an x86-specific function which checks if 2 instructions
      are in a "fused" pair. For non-x86 arch, the function is just NULL.
      
      Changelog:
      
      v4: Move the CPU model checking to symbol__disassemble and save the CPU
          family/model in arch structure.
      
          It avoids checking every time when jump arrow printed.
      
      v3: Add checking for Nehalem (CMP, TEST). For other newer Intel CPUs
          just check it by default (CMP, TEST, ADD, SUB, AND, INC, DEC).
      
      v2: Remove the original weak function. Arnaldo points out that doing it
          as a weak function that will be overridden by the host arch doesn't
          work. So now it's implemented as an arch-specific function.
      
      Committer fix:
      
      Do not access evsel->evlist->env->cpuid, ->env can be null, introduce
      perf_evsel__env_cpuid(), just like perf_evsel__env_arch(), also used in
      this function call.
      
      The original patch was segfaulting 'perf top' + annotation.
      
      But this essentially disables this fused instructions augmentation in
      'perf top', the right thing is to get the cpuid from the running kernel,
      left for a later patch tho.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1499403995-19857-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      69fb09f6
  11. 11 7月, 2017 2 次提交
  12. 04 7月, 2017 1 次提交
    • A
      perf evsel: Set attr.exclude_kernel when probing max attr.precise_ip · 97365e81
      Arnaldo Carvalho de Melo 提交于
      We should set attr.exclude_kernel when probing for attr.precise_ip
      level, otherwise !CAP_SYS_ADMIN users will not default to skidless
      samples in capable hardware.
      
      The increase in the paranoid level in commit 0161028b ("perf/core:
      Change the default paranoia level to 2") broke this, fix it by excluding
      kernel samples when probing.
      
      Before:
      
        $ perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.018 MB perf.data (6 samples) ]
        $ perf evlist -v
        cycles:u: sample_freq: 4000, sample_type: IP|TID|TIME|PERIOD, exclude_kernel: 1
      
      After:
      
        $ perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.018 MB perf.data (8 samples) ]
        $ perf evlist -v
        cycles:ppp: sample_freq: 4000, sample_type: IP|TID|TIME|PERIOD, exclude_kernel: 1, precise_ip: 3
                                                                                           ^^^^^^^^^^^^^
                                                                                           ^^^^^^^^^^^^^
                                                                                           ^^^^^^^^^^^^^
        $
      
      To further clarify: we always set .exclude_kernel when non !CAP_SYS_ADMIN
      users profile, its just on the attr.precise_ip probing that we weren't doing
      so, fix it.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 7f8d1ade ("perf tools: By default use the most precise "cycles" hw counter available")
      Link: http://lkml.kernel.org/n/tip-t2qttwhbnua62o5gt75cueml@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      97365e81
  13. 20 6月, 2017 2 次提交
  14. 15 6月, 2017 1 次提交
    • A
      perf evsel: Fix probing of precise_ip level for default cycles event · 7a1ac110
      Arnaldo Carvalho de Melo 提交于
      Since commit 18e7a45a ("perf/x86: Reject non sampling events with
      precise_ip") returns -EINVAL for sys_perf_event_open() with an attribute
      with (attr.precise_ip > 0 && attr.sample_period == 0), just like is done
      in the routine used to probe the max precise level when no events were
      passed to 'perf record' or 'perf top', i.e.:
      
      	perf_evsel__new_cycles()
      		perf_event_attr__set_max_precise_ip()
      
      The x86 code, in x86_pmu_hw_config(), which is called all the way from
      sys_perf_event_open() did, starting with the aforementioned commit:
      
                      /* There's no sense in having PEBS for non sampling events: */
                      if (!is_sampling_event(event))
                              return -EINVAL;
      
      Which makes it fail for cycles:ppp, cycles:pp and cycles:p, always using
      just the non precise cycles variant.
      
      To make sure that this is the case, I tested it, before this patch,
      with:
      
        # perf probe -L x86_pmu_hw_config
        <x86_pmu_hw_config@/home/acme/git/linux/arch/x86/events/core.c:0>
              0  int x86_pmu_hw_config(struct perf_event *event)
              1  {
              2         if (event->attr.precise_ip) {
      <SNIP>
             17                 if (event->attr.precise_ip > precise)
             18                         return -EOPNOTSUPP;
      
                                /* There's no sense in having PEBS for non sampling events: */
             21                 if (!is_sampling_event(event))
             22                         return -EINVAL;
                        }
      <SNIP>
        # perf probe x86_pmu_hw_config:22
        Added new events:
          probe:x86_pmu_hw_config (on x86_pmu_hw_config:22)
          probe:x86_pmu_hw_config_1 (on x86_pmu_hw_config:22)
      
        You can now use it in all perf tools, such as:
      
              perf record -e probe:x86_pmu_hw_config_1 -aR sleep 1
      
        # perf trace -e perf_event_open,probe:x86_pmu_hwconfig*/max-stack=16/ perf record usleep 1
           0.000 ( 0.015 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.015 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.000 ( 0.021 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.023 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.025 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.023 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.028 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.030 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.028 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
          41.018 ( 0.012 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8b5dd0, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.065 ( 0.011 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.080 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.103 ( 0.010 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
          41.115 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
          41.122 ( 0.004 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
          41.128 ( 0.008 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
        #
      
      I.e. that return -EINVAL in x86_pmu_hw_config() is hit three times.
      
      So fix it by just setting attr.sample_period
      
      Now, after this patch:
      
        # perf trace --max-stack=2 -e perf_event_open,probe:x86_pmu_hw_config* perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
           0.000 ( 0.017 ms): perf/8469 perf_event_open(attr_uptr: 0x7ffe36c27d10, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_open_cloexec_flag (/home/acme/bin/perf)
           0.050 ( 0.031 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.092 ( 0.040 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.143 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, cpu: -1, group_fd: -1           ) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
           0.161 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.171 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.180 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.190 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
        [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
        #
      
      The probe one called from perf_event_attr__set_max_precise_ip() works
      the first time, with attr.precise_ip = 3, wit hthe next ones being the
      per cpu ones for the cycles:ppp event.
      
      And here is the text from a report and alternative proposed patch by
      Thomas-Mich Richter:
      
       ---
      
      On s390 the counter and sampling facility do not support a precise IP
      skid level and sometimes returns EOPNOTSUPP when structure member
      precise_ip in struct perf_event_attr is not set to zero.
      
      On s390 commnd 'perf record -- true' fails with error EOPNOTSUPP.  This
      happens only when no events are specified on command line.
      
      The functions called are
      ...
        --> perf_evlist__add_default
            --> perf_evsel__new_cycles
                --> perf_event_attr__set_max_precise_ip
      
      The last function determines the value of structure member precise_ip by
      invoking the perf_event_open() system call and checking the return code.
      The first successful open is the value for precise_ip.
      
      However the value is determined without setting member sample_period and
      indicates no sampling.
      
      On s390 the counter facility and sampling facility are different.  The
      above procedure determines a precise_ip value of 3 using the counter
      facility. Later it uses the sampling facility with a value of 3 and
      fails with EOPNOTSUPP.
      
       ---
      
      v2: Older compilers (e.g. gcc 4.4.7) don't support referencing members
          of unnamed union members in the container struct initialization, so
          move from:
      
      	struct perf_event_attr attr = {
      		...
      		.sample_period = 1,
      	};
      
      to right after it as:
      
      	struct perf_event_attr attr = {
      		...
      	};
      
      	attr.sample_period = 1;
      
      v3: We need to reset .sample_period to 0 to let the users of
      perf_evsel__new_cycles() to properly setup attr.sample_period or
      attr.sample_freq. Reported by Ingo Molnar.
      Reported-and-Acked-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 18e7a45a ("perf/x86: Reject non sampling events with precise_ip")
      Link: http://lkml.kernel.org/n/tip-yv6nnkl7tzqocrm0hl3x7vf1@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7a1ac110
  15. 26 4月, 2017 1 次提交
  16. 25 4月, 2017 1 次提交
  17. 20 4月, 2017 3 次提交
  18. 13 4月, 2017 1 次提交
  19. 11 4月, 2017 1 次提交
    • J
      perf evsel: Return exact sub event which failed with EPERM for wildcards · 32ccb130
      Jin Yao 提交于
      The kernel has a special check for a specific irq_vectors trace event.
      
      TRACE_EVENT_PERF_PERM(irq_work_exit,
      	is_sampling_event(p_event) ? -EPERM : 0);
      
      The perf-record fails for this irq_vectors event when it is present,
      like when using a wildcard:
      
        root@skl:/tmp# perf record -a -e irq_vectors:* sleep 2
        Error:
        You may not have permission to collect system-wide stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        which controls use of the performance events system by
        unprivileged users (without CAP_SYS_ADMIN).
      
        The current value is 2:
      
          -1: Allow use of (almost) all events by all users
        >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
        >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
        >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
      
        To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
      
              kernel.perf_event_paranoid = -1
      
      This patch prints out the exact sub event that failed with EPERM for
      wildcards to help in understanding what went wrong when this event is
      present:
      
      After the patch:
      
        root@skl:/tmp# perf record -a -e irq_vectors:* sleep 2
        Error:
        No permission to enable irq_vectors:irq_work_exit event.
      
        You may not have permission to collect system-wide stats.
        ......
      
      Committer notes:
      
      So we have a lot of irq_vectors events:
      
        [root@jouet ~]# perf list irq_vectors:*
      
        List of pre-defined events (to be used in -e):
      
          irq_vectors:call_function_entry                    [Tracepoint event]
          irq_vectors:call_function_exit                     [Tracepoint event]
          irq_vectors:call_function_single_entry             [Tracepoint event]
          irq_vectors:call_function_single_exit              [Tracepoint event]
          irq_vectors:deferred_error_apic_entry              [Tracepoint event]
          irq_vectors:deferred_error_apic_exit               [Tracepoint event]
          irq_vectors:error_apic_entry                       [Tracepoint event]
          irq_vectors:error_apic_exit                        [Tracepoint event]
          irq_vectors:irq_work_entry                         [Tracepoint event]
          irq_vectors:irq_work_exit                          [Tracepoint event]
          irq_vectors:local_timer_entry                      [Tracepoint event]
          irq_vectors:local_timer_exit                       [Tracepoint event]
          irq_vectors:reschedule_entry                       [Tracepoint event]
          irq_vectors:reschedule_exit                        [Tracepoint event]
          irq_vectors:spurious_apic_entry                    [Tracepoint event]
          irq_vectors:spurious_apic_exit                     [Tracepoint event]
          irq_vectors:thermal_apic_entry                     [Tracepoint event]
          irq_vectors:thermal_apic_exit                      [Tracepoint event]
          irq_vectors:threshold_apic_entry                   [Tracepoint event]
          irq_vectors:threshold_apic_exit                    [Tracepoint event]
          irq_vectors:x86_platform_ipi_entry                 [Tracepoint event]
          irq_vectors:x86_platform_ipi_exit                  [Tracepoint event]
        #
      
      And some may be sampled:
      
        [root@jouet ~]# perf record -e irq_vectors:local* sleep 20s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.020 MB perf.data (2 samples) ]
        [root@jouet ~]# perf report -D | egrep 'stats:|events:'
        Aggregated stats:
                   TOTAL events:        155
                    MMAP events:        144
                    COMM events:          2
                    EXIT events:          1
                  SAMPLE events:          2
                   MMAP2 events:          4
          FINISHED_ROUND events:          1
               TIME_CONV events:          1
        irq_vectors:local_timer_entry stats:
                   TOTAL events:          1
                  SAMPLE events:          1
        irq_vectors:local_timer_exit stats:
                   TOTAL events:          1
                  SAMPLE events:          1
        [root@jouet ~]#
      
      But, as shown in the tracepoint definition at the start of this message,
      some, like "irq_vectors:irq_work_exit", may not be sampled, just counted,
      i.e. if we try to sample, as when using 'perf record', we get an error:
      
        [root@jouet ~]# perf record -e irq_vectors:irq_work_exit
        Error:
        You may not have permission to collect system-wide stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
      <SNIP>
      
      The error message is misleading, this patch will help in pointing out
      what is the event causing such an error, but the error message needs
      improvement, i.e. we need to figure out a way to check if a tracepoint
      is counting only, like this one, when all we can do is to count it with
      'perf stat', at most printing the delta using interval printing, as in:
      
         [root@jouet ~]# perf stat -I 5000 -e irq_vectors:irq_work_*
        #           time             counts unit events
             5.000168871                  0      irq_vectors:irq_work_entry
             5.000168871                  0      irq_vectors:irq_work_exit
            10.000676730                  0      irq_vectors:irq_work_entry
            10.000676730                  0      irq_vectors:irq_work_exit
            15.001122415                  0      irq_vectors:irq_work_entry
            15.001122415                  0      irq_vectors:irq_work_exit
            20.001298051                  0      irq_vectors:irq_work_entry
            20.001298051                  0      irq_vectors:irq_work_exit
            25.001485020                  1      irq_vectors:irq_work_entry
            25.001485020                  1      irq_vectors:irq_work_exit
            30.001658706                  0      irq_vectors:irq_work_entry
            30.001658706                  0      irq_vectors:irq_work_exit
        ^C    32.045711878                  0      irq_vectors:irq_work_entry
            32.045711878                  0      irq_vectors:irq_work_exit
      
        [root@jouet ~]#
      
      But at least, when we use a wildcard, this patch helps a bit.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1491566932-503-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      32ccb130
  20. 23 3月, 2017 2 次提交
    • A
      perf pmu: Add support for MetricName JSON attribute · 96284814
      Andi Kleen 提交于
      Add support for a new JSON event attribute to name MetricExpr for better
      output in perf stat.
      
      If the event has no MetricName it uses the normal event name instead to
      describe the metric.
      
      Before
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
                 time unc_p_freq_max_os_cycles
           1.000149775     15.7
           2.000344807     19.3
           3.000502544     16.7
           4.000640656      6.6
           5.000779955      9.9
      
      After
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
                 time freq_max_os_cycles %
           1.000149775     15.7
           2.000344807     19.3
           3.000502544     16.7
           4.000640656      6.6
           5.000779955      9.9
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-13-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      96284814
    • A
      perf stat: Output JSON MetricExpr metric · 37932c18
      Andi Kleen 提交于
      Add generic infrastructure to perf stat to output ratios for
      "MetricExpr" entries in the event lists. Many events are more useful as
      ratios than in raw form, typically some count in relation to total
      ticks.
      
      Transfer the MetricExpr information from the alias to the evsel.
      
      We mark the events that need to be collected for MetricExpr, and also
      link the events using them with a pointer. The code is careful to always
      prefer the right event in the same group to minimize multiplexing
      errors. At the moment only a single relation is supported.
      
      Then add a rblist to the stat shadow code that remembers stats based on
      the cpu and context.
      
      Then finally update and retrieve and print these values similarly to the
      existing hardcoded perf metrics. We use the simple expression parser
      added earlier to evaluate the expression.
      
      Normally we just output the result without further commentary, but for
      --metric-only this would lead to empty columns. So for this case use the
      original event as description.
      
      There is no attempt to automatically add the MetricExpr event, if it is
      missing, however we suggest it to the user, because the user tool
      doesn't have enough information to reliably construct a group that is
      guaranteed to schedule. So we leave that to the user.
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}'
             1.000147889        800,085,181      unc_p_clockticks
             1.000147889         93,126,241      unc_p_freq_max_os_cycles  #     11.6
             2.000448381        800,218,217      unc_p_clockticks
             2.000448381        142,516,095      unc_p_freq_max_os_cycles  #     17.8
             3.000639852        800,243,057      unc_p_clockticks
             3.000639852        162,292,689      unc_p_freq_max_os_cycles  #     20.3
      
        % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
        #    time         freq_max_os_cycles %
             1.000127077      0.9
             2.000301436      0.7
             3.000456379      0.0
      
      v2: Change from DivideBy to MetricExpr
      v3: Use expr__ prefix.  Support more than one other event.
      v4: Update description
      v5: Only print warning message once for multiple PMUs.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      37932c18
  21. 14 3月, 2017 1 次提交
    • H
      perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info · f3b3614a
      Hari Bathini 提交于
      Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
      by the kernel when fork, clone, setns or unshare are invoked. And update
      perf-record documentation with the new option to record namespace
      events.
      
      Committer notes:
      
      Combined it with a later patch to allow printing it via 'perf report -D'
      and be able to test the feature introduced in this patch. Had to move
      here also perf_ns__name(), that was introduced in another later patch.
      
      Also used PRIu64 and PRIx64 to fix the build in some enfironments wrt:
      
        util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
           ret  += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
                                               ^
      Testing it:
      
        # perf record --namespaces -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
        #
        # perf report -D
        <SNIP>
        3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
                      [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
                       4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
      
        0x1151e0 [0x30]: event: 9
        .
        . ... raw event: size 48 bytes
        .  0000:  09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00  ......0..q.h....
        .  0010:  a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00  .9...9...(.c....
        .  0020:  03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00  ................
        <SNIP>
              NAMESPACES events:          1
        <SNIP>
        #
      Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f3b3614a
  22. 15 2月, 2017 1 次提交
    • A
      perf evsel: Do not put a variable sized type not at the end of a struct · c24ae6d9
      Arnaldo Carvalho de Melo 提交于
      As this is a GNU extension and while harmless in this case, we can do
      the same thing in a more clearer way by using a existing thread_map and
      cpu_map constructors:
      
      With this we avoid this while compiling with clang:
      
        util/evsel.c:1659:17: error: field 'map' with variable sized type 'struct cpu_map' not at the end of a struct or class is a GNU extension
              [-Werror,-Wgnu-variable-sized-type-not-at-end]
                struct cpu_map map;
                               ^
        util/evsel.c:1667:20: error: field 'map' with variable sized type 'struct thread_map' not at the end of a struct or class is a GNU extension
              [-Werror,-Wgnu-variable-sized-type-not-at-end]
                struct thread_map map;
                                  ^
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-207juvrqjiar7uvas2s83v5i@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c24ae6d9
  23. 14 2月, 2017 1 次提交
  24. 16 12月, 2016 2 次提交
  25. 23 11月, 2016 1 次提交
  26. 24 10月, 2016 1 次提交
  27. 03 10月, 2016 1 次提交
    • A
      perf tools: Experiment with cppcheck · 18ef15c6
      Arnaldo Carvalho de Melo 提交于
      Experimenting a bit using cppcheck[1], a static checker brought to my
      attention by Colin, reducing the scope of some variables, reducing the
      line of source code lines in the process:
      
        $ cppcheck --enable=style tools/perf/util/thread.c
        Checking tools/perf/util/thread.c...
        [tools/perf/util/thread.c:17]: (style) The scope of the variable 'leader' can be reduced.
        [tools/perf/util/thread.c:133]: (style) The scope of the variable 'err' can be reduced.
        [tools/perf/util/thread.c:273]: (style) The scope of the variable 'err' can be reduced.
      
      Will continue later, but these are already useful, keep them.
      
      1: https://sourceforge.net/p/cppcheck/wiki/Home/
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-ixws7lbycihhpmq9cc949ti6@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      18ef15c6
  28. 29 9月, 2016 1 次提交