1. 25 9月, 2019 7 次提交
  2. 23 9月, 2019 1 次提交
    • A
      perf record: Move restricted maps check to after a possible fallback to not collect kernel samples · c8b567c8
      Arnaldo Carvalho de Melo 提交于
      Before:
      
        [acme@quaco ~]$ perf record -b -e cycles date
        WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
        check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.
      
        Samples in kernel functions may not be resolved if a suitable vmlinux
        file is not found in the buildid cache or in the vmlinux path.
      
        Samples in kernel modules won't be resolved at all.
      
        If some relocation was applied (e.g. kexec) symbols may be misresolved
        even with a suitable vmlinux or kallsyms file.
      
        Mon 23 Sep 2019 11:00:59 AM -03
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.005 MB perf.data (14 samples) ]
        [acme@quaco ~]$
      
      But we did a fallback and exclude_kernel was set, so no need for
      resolving kernel symbols:
      
        $ perf evlist -v
        cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        $
      
      After:
      
        [acme@quaco ~]$ perf record -b -e cycles date
        Mon 23 Sep 2019 11:07:18 AM -03
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.007 MB perf.data (16 samples) ]
        [acme@quaco ~]$ perf evlist -v
        cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        [acme@quaco ~]$
      
      No needless warning is emitted.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lkml.kernel.org/n/tip-5yqnr8xcqwhr15xktj2097ac@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c8b567c8
  3. 21 9月, 2019 1 次提交
  4. 20 9月, 2019 2 次提交
  5. 01 9月, 2019 1 次提交
  6. 30 8月, 2019 1 次提交
  7. 29 8月, 2019 2 次提交
  8. 26 8月, 2019 1 次提交
  9. 14 8月, 2019 1 次提交
  10. 30 7月, 2019 16 次提交
  11. 09 7月, 2019 1 次提交
  12. 11 6月, 2019 1 次提交
    • Y
      perf record: Add support to collect callchains from kernel or user space only · 53651b28
      yuzhoujian 提交于
      One can just record callchains in the kernel or user space with this new
      options.
      
      We can use it together with "--all-kernel" options.
      
      This two options is used just like print_stack(sys) or print_ustack(usr)
      for systemtap.
      
      Shown below is the usage of this new option combined with "--all-kernel"
      options:
      
      1. Configure all used events to run in kernel space and just collect
         kernel callchains.
      
        $ perf record -a -g --all-kernel --kernel-callchains
      
      2. Configure all used events to run in kernel space and just collect
         user callchains.
      
        $ perf record -a -g --all-kernel --user-callchains
      
      Committer notes:
      
      Improved documentation to state that asking for kernel callchains really
      is asking for excluding user callchains, and vice versa.
      
      Further mentioned that using both won't get both, but nothing, as both
      will be excluded.
      Signed-off-by: Nyuzhoujian <yuzhoujian@didichuxing.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1559222962-22891-1-git-send-email-ufo19890607@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      53651b28
  13. 16 5月, 2019 5 次提交
    • K
      perf parse-regs: Split parse_regs · aeea9062
      Kan Liang 提交于
      The available registers for --int-regs and --user-regs may be different,
      e.g. XMM registers.
      
      Split parse_regs into two dedicated functions for --int-regs and
      --user-regs respectively.
      
      Modify the warning message. "--user-regs=?" should be applied to show
      the available registers for --user-regs.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Tested-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1557865174-56264-1-git-send-email-kan.liang@linux.intel.com
      [ Changed docs as suggested by Ravi and agreed by Kan ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      aeea9062
    • A
      perf record: Implement -z,--compression_level[=<n>] option · 504c1ad1
      Alexey Budankov 提交于
      Implemented -z,--compression_level[=<n>] option that enables compression
      of mmaped kernel data buffers content in runtime during perf record mode
      collection. Default option value is 1 (fastest compression).
      
      Compression overhead has been measured for serial and AIO streaming when
      profiling matrix multiplication workload:
      
            -------------------------------------------------------------
            | SERIAL			  | AIO-1                       |
        ----------------------------------------------------------------|
        |-z | OVH(x) | ratio(x) size(MiB) | OVH(x) | ratio(x) size(MiB) |
        |---------------------------------------------------------------|
        | 0 | 1,00   | 1,000    179,424   | 1,00   | 1,000    187,527   |
        | 1 | 1,04   | 8,427    181,148   | 1,01   | 8,474    188,562   |
        | 2 | 1,07   | 8,055    186,953   | 1,03   | 7,912    191,773   |
        | 3 | 1,04   | 8,283    181,908   | 1,03   | 8,220    191,078   |
        | 5 | 1,09   | 8,101    187,705   | 1,05   | 7,780    190,065   |
        | 8 | 1,05   | 9,217    179,191   | 1,12   | 6,111    193,024   |
        -----------------------------------------------------------------
      
      OVH = (Execution time with -z N) / (Execution time with -z 0)
      
      ratio - compression ratio
      size  - number of bytes that was compressed
      
      	size ~= trace size x ratio
      
      Committer notes:
      
      Testing it I noticed that it failed to disable build id processing when
      compression is enabled, and as we'd have to uncompress everything to
      look for the PERF_RECORD_{MMAP,SAMPLE,etc} to figure out which build ids
      to read from DSOs, we better disable build id processing when
      compression is enabled, logging with pr_debug() when doing so:
      
      Original patch:
      
        # perf record -z2
        ^C[ perf record: Woken up 1 times to write data ]
        0x1746e0 [0x76]: failed to process type: 81 [Invalid argument]
        [ perf record: Captured and wrote 1.568 MB perf.data, compressed (original 0.452 MB, ratio is 3.995) ]
        #
      
      After auto-disabling build id processing when compression is enabled:
      
        $ perf record -z2 sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.292) ]
        $ perf record -v -z2 sleep 1
        Compression enabled, disabling build id collection at the end of the session.
        <SNIP extra -v pr_debug() messages>
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.305) ]
        $
      
      Also, with parts of the patch originally after this one moved to just
      before this one we get:
      
        $ perf record -z2 sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.371) ]
        $ perf report -D | grep COMPRESS
        0 0x1b8 [0x155]: PERF_RECORD_COMPRESSED: unhandled!
        0 0x30d [0x80]: PERF_RECORD_COMPRESSED: unhandled!
              COMPRESSED events:          2
              COMPRESSED events:          0
        $
      
      I.e. when faced with PERF_RECORD_COMPRESSED that we still have no code
      to process, we just show it as not being handled, skip them and
      continue, while before we had:
      
        $ perf report -D | grep COMPRESS
        0x1b8 [0x169]: failed to process type: 81 [Invalid argument]
        Error:
        failed to process sample
        0 0x1b8 [0x169]: PERF_RECORD_COMPRESSED
        $
      Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/9ff06518-ae63-a908-e44d-5d9e56dd66d9@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      504c1ad1
    • A
      perf record: Implement compression for AIO trace streaming · ef781128
      Alexey Budankov 提交于
      Compression is implemented using the functions from zstd.c. As the memory
      to operate on the compression uses mmap->aio.data[] buffers. If Zstd
      streaming compression API fails for some reason the data to be compressed
      are just copied into the memory buffers using plain memcpy().
      
      Compressed trace frame consists of an array of PERF_RECORD_COMPRESSED
      records. Each element of the array is not longer that PERF_SAMPLE_MAX_SIZE
      and consists of perf_event_header followed by the compressed chunk
      that is decompressed on the loading stage.
      
      perf_mmap__aio_push() is replaced by perf_mmap__push() which is now used
      in the both serial and AIO streaming cases. perf_mmap__push() is extended
      with positive return values to signify absence of data ready for
      processing.
      Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/77db2b2c-5d03-dbb0-aeac-c4dd92129ab9@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ef781128
    • A
      perf record: Implement compression for serial trace streaming · 5d7f4116
      Alexey Budankov 提交于
      Compression is implemented using the functions from zstd.c. As the
      memory to operate on the compression uses mmap->data buffer.
      
      If Zstd streaming compression API fails for some reason the data to be
      compressed are just copied into the memory buffers using plain memcpy().
      
      Compressed trace frame consists of an array of PERF_RECORD_COMPRESSED
      records. Each element of the array is not longer that
      PERF_SAMPLE_MAX_SIZE and consists of perf_event_header followed by the
      compressed chunk that is decompressed on the loading stage.
      
      Comitter notes:
      
      Undo some unnecessary line breaks, remove some unnecessary () around
      zstd_data to then just get its address, and fix conflicts with
      BPF_PROG_INFO/BPF_BTF patchkits.
      Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/744df43f-3932-2594-ddef-1e99a3cad03a@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5d7f4116
    • A
      perf mmap: Implement dedicated memory buffer for data compression · 51255a8a
      Alexey Budankov 提交于
      Implemented mmap data buffer that is used as the memory to operate
      on when compressing data in case of serial trace streaming.
      Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/49b31321-0f70-392b-9a4f-649d3affe090@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      51255a8a