1. 06 May, 2020 1 commit
  2. 23 Apr, 2020 1 commit
    • perf record: Add num-synthesize-threads option · d99c22ea
      Committed by Stephane Eranian
      Add an option to control the degree of parallelism of the
      synthesize_mmap() code, which scans /proc/PID/task/PID/maps and can be
      time consuming. Mimic the way perf top handles the option.
      If not specified, it defaults to 1 thread, i.e. the behavior before
      this option existed.
      
      On a desktop computer the processing of /proc/PID/task/PID/maps isn't
      slow enough to warrant parallel processing and the thread creation has
      some cost - hence the default of 1. On a loaded server with
      >100 cores it is possible to see synthesis times in the order of
      seconds and in this case having the option is desirable.
      
      As the processing is a synchronization point, it is legitimate to worry if
      Amdahl's law will apply to this patch. Profiling with this patch in
      place:
      https://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com/
      shows:
      ...
            - 32.59% __perf_event__synthesize_threads
               - 32.54% __event__synthesize_thread
                  + 22.13% perf_event__synthesize_mmap_events
                  + 6.68% perf_event__get_comm_ids.constprop.0
                  + 1.49% process_synthesized_event
                  + 1.29% __GI___readdir64
                  + 0.60% __opendir
      ...
      That is, the processing is 1.49% of execution time and there is plenty to
      make parallel. This is shown in the benchmark in this patch:
      
      https://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com/
      
        Computing performance of multi threaded perf event synthesis by
        synthesizing events on CPU 0:
         Number of synthesis threads: 1
           Average synthesis took: 127729.000 usec (+- 3372.880 usec)
           Average num. events: 21548.600 (+- 0.306)
           Average time per event 5.927 usec
         Number of synthesis threads: 2
           Average synthesis took: 88863.500 usec (+- 385.168 usec)
           Average num. events: 21552.800 (+- 0.327)
           Average time per event 4.123 usec
         Number of synthesis threads: 3
           Average synthesis took: 83257.400 usec (+- 348.617 usec)
           Average num. events: 21553.200 (+- 0.327)
           Average time per event 3.863 usec
         Number of synthesis threads: 4
           Average synthesis took: 75093.000 usec (+- 422.978 usec)
           Average num. events: 21554.200 (+- 0.200)
           Average time per event 3.484 usec
         Number of synthesis threads: 5
           Average synthesis took: 64896.600 usec (+- 353.348 usec)
           Average num. events: 21558.000 (+- 0.000)
           Average time per event 3.010 usec
         Number of synthesis threads: 6
           Average synthesis took: 59210.200 usec (+- 342.890 usec)
           Average num. events: 21560.000 (+- 0.000)
           Average time per event 2.746 usec
         Number of synthesis threads: 7
           Average synthesis took: 54093.900 usec (+- 306.247 usec)
           Average num. events: 21562.000 (+- 0.000)
           Average time per event 2.509 usec
         Number of synthesis threads: 8
           Average synthesis took: 48938.700 usec (+- 341.732 usec)
           Average num. events: 21564.000 (+- 0.000)
           Average time per event 2.269 usec
      
      The average time per synthesized event goes from 5.927 usec with 1
      thread to 2.269 usec with 8. This isn't a linear speedup, as not all of
      the synthesis code has been made parallel. If the synthesis time were about
      10 seconds, then using 8 threads might bring this down to less than 4.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tony Jones <tonyj@suse.de>
      Cc: yuzhoujian <yuzhoujian@didichuxing.com>
      Link: http://lore.kernel.org/lkml/20200422155038.9380-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
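The benchmark figures in the commit message above can be turned into a speedup and a rough Amdahl's-law estimate. The following is a minimal sketch, not part of the patch: the timings are copied from the message, while the function names are hypothetical and the single parallel fraction is only an approximation, since the measurements do not follow the model exactly.

```python
# Average synthesis time (usec) per thread count, copied from the commit message.
SYNTHESIS_USEC = {
    1: 127729.000,
    2: 88863.500,
    3: 83257.400,
    4: 75093.000,
    5: 64896.600,
    6: 59210.200,
    7: 54093.900,
    8: 48938.700,
}


def speedup(threads, timings=SYNTHESIS_USEC):
    """Speedup relative to the single-threaded run."""
    return timings[1] / timings[threads]


def amdahl_parallel_fraction(observed_speedup, threads):
    """Solve S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / threads)


def amdahl_time(serial_time, parallel_fraction, threads):
    """Predicted run time on `threads` threads under Amdahl's law."""
    return serial_time * ((1.0 - parallel_fraction) + parallel_fraction / threads)


if __name__ == "__main__":
    s8 = speedup(8)
    p = amdahl_parallel_fraction(s8, 8)
    print(f"speedup with 8 threads: {s8:.2f}x")
    print(f"implied parallel fraction: {p:.2f}")
    print(f"10 s of synthesis on 8 threads: {amdahl_time(10.0, p, 8):.2f} s")
```

With these numbers the 8-thread speedup is roughly 2.6x, which is consistent with the message's estimate that a 10 second synthesis could drop to under 4 seconds.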
  3. 03 Apr, 2020 2 commits
  4. 06 Jan, 2020 1 commit
  5. 29 Nov, 2019 1 commit
  6. 22 Nov, 2019 1 commit
  7. 18 Nov, 2019 1 commit
  8. 07 Nov, 2019 3 commits
    • perf record: Add support for limit perf output file size · 6d575816
      Committed by Jiwei Sun
      The patch adds a new option to limit the output file size. Based on
      it, we can create a wrapper around the perf command that uses the option
      to keep an unwary user from exhausting the disk space.
      
      To keep perf.data parsable, we limit only the sample data size. Since
      perf.data consists of many headers, sample data and other data, the
      actual size of the recorded file will be bigger than the configured
      value.
      
      Testing it:
      
        # ./perf record -a -g --max-size=10M
        Couldn't synthesize bpf events.
        [ perf record: perf size limit reached (10249 KB), stopping session ]
        [ perf record: Woken up 32 times to write data ]
        [ perf record: Captured and wrote 10.133 MB perf.data (71964 samples) ]
      
        # ls -lh perf.data
        -rw------- 1 root root 11M Oct 22 14:32 perf.data
      
        # ./perf record -a -g --max-size=10K
        [ perf record: perf size limit reached (10 KB), stopping session ]
        Couldn't synthesize bpf events.
        [ perf record: Woken up 0 times to write data ]
        [ perf record: Captured and wrote 1.546 MB perf.data (69 samples) ]
      
        # ls -l perf.data
        -rw------- 1 root root 1626952 Oct 22 14:36 perf.data
      
      Committer notes:
      
      Fixed the build in multiple distros by using PRIu64 to print u64 struct
      members, fixing this:
      
        builtin-record.c: In function 'record__write':
        builtin-record.c:150:5: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'u64' [-Werror=format=]
             rec->bytes_written >> 10);
             ^
          CC       /tmp/build/pe
      Signed-off-by: Jiwei Sun <jiwei.sun@windriver.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Danter <richard.danter@windriver.com>
      Link: http://lore.kernel.org/lkml/20191022080901.3841-1-jiwei.sun@windriver.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
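The size-limit behavior described in the commit above can be sketched as follows. This is an illustrative model only, not perf's implementation: `parse_size`, `SizeLimitedWriter` and the accepted suffixes are assumptions made for the example. It checks the limit after each write, which is one plausible reason the reported size (10249 KB) slightly exceeds the requested 10M.

```python
# Assumed suffix table for the example; perf's own parser may differ.
UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse strings such as '10K' or '10M' into bytes (hypothetical helper)."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


class SizeLimitedWriter:
    """Counts sample bytes and signals when the configured limit is reached."""

    def __init__(self, max_size):
        self.max_size = max_size  # 0 means "no limit"
        self.bytes_written = 0
        self.stopped = False

    def write(self, chunk):
        """Account for one chunk; return True while recording may continue."""
        if self.stopped:
            return False
        self.bytes_written += len(chunk)
        # The check runs after the write, so the total can overshoot the limit.
        if self.max_size and self.bytes_written >= self.max_size:
            self.stopped = True
            print(f"[ size limit reached ({self.bytes_written >> 10} KB), stopping session ]")
            return False
        return True


if __name__ == "__main__":
    writer = SizeLimitedWriter(parse_size("10K"))
    while writer.write(b"\0" * 4096):
        pass
```

Only sample data is counted in this sketch, mirroring the commit's note that headers and other data make the final file larger than the limit.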
    • perf record: Put a copy of kcore into the perf.data directory · eeb399b5
      Committed by Adrian Hunter
      Add a new 'perf record' option '--kcore' which will put a copy of
      /proc/kcore, kallsyms and modules into a perf.data directory. Note, that
      without the --kcore option, output goes to a file as previously.  The
      tools' -o and -i options work with either a file name or directory name.
      
      Example:
      
        $ sudo perf record --kcore uname
      
        $ sudo tree perf.data
        perf.data
        ├── kcore_dir
        │   ├── kallsyms
        │   ├── kcore
        │   └── modules
        └── data
      
        $ sudo perf script -v
        build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de
        build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541
        Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field.
        Using CPUID GenuineIntel-6-8E-A
        Using perf.data/kcore_dir/kcore for kernel data
        Using perf.data/kcore_dir/kallsyms for symbols
                   perf 19058 506778.423729:          1 cycles:  ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
                   perf 19058 506778.423733:          1 cycles:  ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
                   perf 19058 506778.423734:          7 cycles:  ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
                   perf 19058 506778.423736:        117 cycles:  ffffffffa2caa54a native_write_msr+0xa (vmlinux)
                   perf 19058 506778.423738:       2092 cycles:  ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux)
                   perf 19058 506778.423740:      37380 cycles:  ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux)
                  uname 19058 506778.423751:     582673 cycles:  ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux)
                  uname 19058 506778.423892:    2241841 cycles:  ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux)
                  uname 19058 506778.424430:    2457397 cycles:  ffffffffa3019232 check_memory_region+0x52 (vmlinux)
      
      Committer testing:
      
        # rm -rf perf.data*
        # perf record sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
        # ls -l perf.data
        -rw-------. 1 root root 34772 Oct 21 11:08 perf.data
        # perf record --kcore uname
        Linux
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
        # ls -lad perf.data*
        drwx------. 3 root root  4096 Oct 21 11:08 perf.data
        -rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        # perf evlist -v -i perf.data/data
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        #
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lore.kernel.org/lkml/20191004083121.12182-6-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf data: Support single perf.data file directory · 46e201ef
      Committed by Adrian Hunter
      Support directory output that contains a regular perf.data file, named
      "data". By default the directory is named perf.data i.e.
      	perf.data
      	└── data
      
      Most of the infrastructure to support a directory is already there. This
      patch makes the changes needed to support the format above.
      
      Presently there is no 'perf record' option to output a directory.
      
      This is preparation for adding support for putting a copy of /proc/kcore in
      the directory.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lore.kernel.org/lkml/20191004083121.12182-5-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
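The two commits above let a perf.data path name either a plain file or a directory that holds a file named "data" and, with --kcore, a "kcore_dir" subdirectory. A minimal sketch of how a reader might resolve such a path is shown below. The entry names come from the tree output in the commit messages; the helper `resolve_perf_data` is hypothetical and is not how the perf tools are structured internally.

```python
import tempfile
from pathlib import Path


def resolve_perf_data(path):
    """Return (data_file, kcore_dir) for a perf.data path (hypothetical helper).

    A directory is expected to contain a regular file named "data" and,
    optionally, a "kcore_dir" subdirectory. A plain file is returned as-is.
    """
    p = Path(path)
    if p.is_dir():
        kcore_dir = p / "kcore_dir"
        return p / "data", (kcore_dir if kcore_dir.is_dir() else None)
    return p, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build the directory layout shown in the commit message.
        root = Path(tmp) / "perf.data"
        (root / "kcore_dir").mkdir(parents=True)
        (root / "data").touch()
        data_file, kcore_dir = resolve_perf_data(root)
        print(data_file.name, kcore_dir.name if kcore_dir else None)
```

Returning `None` for a missing `kcore_dir` keeps the single-file directory format from the earlier commit valid without any kcore copy.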
  9. 10 Oct, 2019 3 commits
  10. 25 Sep, 2019 10 commits
  11. 23 Sep, 2019 1 commit
    • perf record: Move restricted maps check to after a possible fallback to not collect kernel samples · c8b567c8
      Committed by Arnaldo Carvalho de Melo
      Before:
      
        [acme@quaco ~]$ perf record -b -e cycles date
        WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
        check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.
      
        Samples in kernel functions may not be resolved if a suitable vmlinux
        file is not found in the buildid cache or in the vmlinux path.
      
        Samples in kernel modules won't be resolved at all.
      
        If some relocation was applied (e.g. kexec) symbols may be misresolved
        even with a suitable vmlinux or kallsyms file.
      
        Mon 23 Sep 2019 11:00:59 AM -03
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.005 MB perf.data (14 samples) ]
        [acme@quaco ~]$
      
      But we did a fallback and exclude_kernel was set, so no need for
      resolving kernel symbols:
      
        $ perf evlist -v
        cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        $
      
      After:
      
        [acme@quaco ~]$ perf record -b -e cycles date
        Mon 23 Sep 2019 11:07:18 AM -03
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.007 MB perf.data (16 samples) ]
        [acme@quaco ~]$ perf evlist -v
        cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        [acme@quaco ~]$
      
      No needless warning is emitted.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lkml.kernel.org/n/tip-5yqnr8xcqwhr15xktj2097ac@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
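The decision described in the commit above, warning about restricted kernel maps only when kernel samples are actually being collected, can be sketched as a small predicate. This is an illustration under assumptions: the `Event` class and both function names are hypothetical, and the real check in builtin-record.c may be expressed differently.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in for an event and its exclude_kernel attribute (hypothetical)."""
    name: str
    exclude_kernel: bool


def all_exclude_kernel(events):
    """True when no event in the list collects kernel samples."""
    return all(ev.exclude_kernel for ev in events)


def should_warn_restricted_maps(kptr_restricted, events):
    """Warn only if maps are restricted and some event still samples the kernel.

    The commit's point is ordering: this must be evaluated after any
    fallback that sets exclude_kernel, not before it.
    """
    return kptr_restricted and not all_exclude_kernel(events)


if __name__ == "__main__":
    # After the fallback the event is cycles:u, so no warning is needed.
    after_fallback = [Event("cycles:u", exclude_kernel=True)]
    print(should_warn_restricted_maps(True, after_fallback))
```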
  12. 21 Sep, 2019 1 commit
  13. 20 Sep, 2019 2 commits
  14. 01 Sep, 2019 1 commit
  15. 30 Aug, 2019 1 commit
  16. 29 Aug, 2019 2 commits
  17. 26 Aug, 2019 1 commit
  18. 14 Aug, 2019 1 commit
  19. 30 Jul, 2019 6 commits