1. 11 Mar, 2019 12 commits
    • perf header: Add DIR_FORMAT feature to describe directory data · 258031c0
      Committed by Jiri Olsa
      The data files layout is described by HEADER_DIR_FORMAT feature.
      Currently it holds only version number (1):
      
           uint64_t version;
      
      Version 1 means that the data files:
      
        - Follow the 'data.*' name format.
      
        - Contain raw events data in standard perf format as read from kernel
          (and need to be sorted)
      
      Future versions are expected to describe different data files layout
      according to special needs.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20190308134745.5057-6-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      258031c0
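The version check implied by the DIR_FORMAT commit above can be sketched as follows. This is a minimal illustration, not the perf implementation: `struct dir_format`, `DIR_FORMAT_VERSION_1`, `dir_format_check()` and `is_data_file_name()` are hypothetical names, and the only facts taken from the commit message are the single `uint64_t version` field, the value 1, and the 'data.*' file naming.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical constant: the only layout version the commit describes. */
#define DIR_FORMAT_VERSION_1 1ULL

/* Hypothetical in-memory view of the feature payload: one 64-bit version. */
struct dir_format {
	uint64_t version;
};

/* Returns 0 if the layout version is one this sketch understands, -1 otherwise. */
static int dir_format_check(const struct dir_format *fmt)
{
	return fmt->version == DIR_FORMAT_VERSION_1 ? 0 : -1;
}

/* Returns 1 if 'name' starts with the "data." prefix used by version 1. */
static int is_data_file_name(const char *name)
{
	return strncmp(name, "data.", 5) == 0;
}
```

A reader built this way would reject any version it does not know, which leaves room for the future layouts the commit message anticipates.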
    • perf data: Make perf_data__size() work over directory · 29583c17
      Committed by Jiri Olsa
      Make perf_data__size() return proper size for directory data, summing up
      all the individual file sizes.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20190308134745.5057-5-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      29583c17
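Summing per-file sizes for directory data, as the perf_data__size() commit above describes, might look roughly like this. The structures and the function name are hypothetical stand-ins rather than the perf definitions, and counting the header file in the total is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file record: only the size matters here. */
struct file_entry {
	uint64_t size;
};

/* Hypothetical container: a header file plus optional directory files. */
struct data_dir {
	struct file_entry header;
	struct file_entry *files;
	size_t nr;
	int is_dir;
};

/* Total size: the header alone for a single file, header plus all files for a directory. */
static uint64_t data_dir_size(const struct data_dir *d)
{
	uint64_t total = d->header.size;
	size_t i;

	if (!d->is_dir)
		return total;

	for (i = 0; i < d->nr; i++)
		total += d->files[i].size;

	return total;
}
```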
    • perf data: Add perf_data__update_dir() function · e8be1357
      Committed by Jiri Olsa
      Add perf_data__update_dir() to update the size for every file within the
      perf.data directory.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20190308134745.5057-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e8be1357
    • perf data: Don't store auxtrace index for directory data file · cd3dd8dd
      Committed by Jiri Olsa
      We can't store the auxtrace index when we store into multiple files,
      because we keep only offset for it, not the file.
      
      The auxtrace data will be processed correctly in the 'pipe' mode.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20190308134745.5057-3-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cd3dd8dd
    • perf data: Support having perf.data stored as a directory · ec65def1
      Committed by Jiri Olsa
      The caller needs to set the 'struct perf_data::is_dir' flag and the path
      will be treated as a directory.
      
      The 'struct perf_data::file' is initialized and opened as the
      'path/header' file.
      
      Add a check of the is_dir flag to the directory interface functions.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20190308134745.5057-2-jolsa@kernel.org
      [ Be consistent on how to signal failure, i.e. use -1 and let users check errno ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ec65def1
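The path handling described in the commit above (open 'path/header' when is_dir is set) can be illustrated with a small helper. `header_path()` is a hypothetical function written for this sketch, not part of perf; it only builds the file name and signals failure with -1, in the spirit of the committer note.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the name of the file to open into 'buf'.
 * For directory data this is "<path>/header", otherwise just "<path>".
 * Returns 0 on success, -1 if the name does not fit into 'buf'.
 */
static int header_path(char *buf, size_t len, const char *path, int is_dir)
{
	int n;

	if (is_dir)
		n = snprintf(buf, len, "%s/header", path);
	else
		n = snprintf(buf, len, "%s", path);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass the resulting name to open(2); the real code also has to create the directory and manage the data files, which this sketch leaves out.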
    • perf vendor events amd: perf PMU events for AMD Family 17h · 98c07a8f
      Committed by Martin Liška
      This patch adds PMC events for AMD Family 17h CPUs as defined in [1].  It
      covers events described in section: 2.1.13. Regex pattern in mapfile.csv
      covers all CPUs of the family.
      
      [1] https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
      Signed-off-by: Martin Liška <mliska@suse.cz>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jon Grimm <jon.grimm@amd.com>
      Cc: Martin Jambor <mjambor@suse.cz>
      Cc: William Cohen <wcohen@redhat.com>
      Link: https://lkml.kernel.org/r/d65873ca-e402-b198-4fe9-8c4af81258c8@suse.cz
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      98c07a8f
    • perf probe: Fix getting the kernel map · eaeffeb9
      Committed by Adrian Hunter
      Since commit 4d99e413 ("perf machine: Workaround missing maps for
      x86 PTI entry trampolines"), perf tools has been creating more than one
      kernel map, however 'perf probe' assumed there could be only one.
      
      Fix by using machine__kernel_map() to get the main kernel map.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Cc: Xu Yu <xuyu@linux.alibaba.com>
      Fixes: 4d99e413 ("perf machine: Workaround missing maps for x86 PTI entry trampolines")
      Fixes: d83212d5 ("kallsyms, x86: Export addresses of PTI entry trampolines")
      Link: http://lkml.kernel.org/r/2ed432de-e904-85d2-5c36-5897ddc5b23b@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      eaeffeb9
    • perf report: Parse time quantum · 2a1292cb
      Committed by Andi Kleen
      Many workloads change over time. 'perf report' currently aggregates the
      whole time range reported in perf.data.
      
      This patch adds an option for a time quantum to quantize the perf.data
      over time.
      
      This just adds the option; it will be used in follow-on patches for a
      time sort key.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20190305144758.12397-6-andi@firstfloor.org
      [ Use NSEC_PER_[MU]SEC ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2a1292cb
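Parsing a time quantum into nanoseconds, in the spirit of the commit above and its "Use NSEC_PER_[MU]SEC" note, could be sketched like this. The function name `parse_quantum()`, the accepted suffixes (s, ms, us, ns) and the requirement that a suffix is always present are assumptions of this sketch, not a description of the option's actual syntax. Overflow of the multiplication is not checked.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NSEC_PER_USEC 1000ULL
#define NSEC_PER_MSEC 1000000ULL
#define NSEC_PER_SEC  1000000000ULL

/*
 * Hypothetical parser: "<number><unit>" -> nanoseconds.
 * Returns 0 on success, -1 on a malformed or zero quantum.
 */
static int parse_quantum(const char *arg, uint64_t *ns)
{
	char *end;
	unsigned long long val;

	/* Reject empty strings, signs and leading spaces up front. */
	if (!isdigit((unsigned char)arg[0]))
		return -1;

	val = strtoull(arg, &end, 10);
	if (val == 0)
		return -1;	/* a zero quantum makes no sense */

	if (strcmp(end, "s") == 0)
		*ns = val * NSEC_PER_SEC;
	else if (strcmp(end, "ms") == 0)
		*ns = val * NSEC_PER_MSEC;
	else if (strcmp(end, "us") == 0)
		*ns = val * NSEC_PER_USEC;
	else if (strcmp(end, "ns") == 0)
		*ns = val;
	else
		return -1;	/* missing or unknown unit */

	return 0;
}
```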
    • perf time-utils: Add utility function to print time stamps in nanoseconds · f8c856cb
      Committed by Andi Kleen
      Add a utility function to print nanosecond timestamps.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20190305144758.12397-11-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f8c856cb
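A nanosecond timestamp printer of the kind this commit adds can be sketched as below. `format_nsec()` is a hypothetical name chosen for the example, and the "seconds.nanoseconds" output layout is an assumption, not taken from the commit message.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: format a nanosecond timestamp as
 * "<seconds>.<nanoseconds>" with the fraction zero-padded to 9 digits.
 * Returns what snprintf() returns.
 */
static int format_nsec(uint64_t timestamp, char *buf, size_t sz)
{
	uint64_t sec = timestamp / NSEC_PER_SEC;
	uint64_t nsec = timestamp % NSEC_PER_SEC;

	return snprintf(buf, sz, "%" PRIu64 ".%09" PRIu64, sec, nsec);
}
```

For example, a timestamp of 1500000042 ns is formatted as `1.500000042`.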
    • perf report: Support output in nanoseconds · 52bab886
      Committed by Andi Kleen
      Upcoming changes add timestamp output in perf report. Add a --ns
      argument similar to perf script to support nanoseconds resolution when
      needed.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20190305144758.12397-5-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      52bab886
    • perf script: Support insn output for normal samples · 3ab481a1
      Committed by Andi Kleen
      perf script -F +insn was only working for PT traces because the PT
      instruction decoder was filling in the insn/insn_len sample attributes.
      Support it for non PT samples too on x86 using the existing x86
      instruction decoder.
      
      This adds some extra checking to ensure that we don't try to decode
      instructions when using perf.data from a different architecture.
      
        % perf record -a sleep 1
        % perf script -F ip,sym,insn --xed
         ffffffff811704c9 remote_function               movl  %eax, 0x18(%rbx)
         ffffffff8100bb50 intel_bts_enable_local                retq
         ffffffff81048612 native_apic_mem_write                 movl  %esi, -0xa04000(%rdi)
         ffffffff81048612 native_apic_mem_write                 movl  %esi, -0xa04000(%rdi)
         ffffffff81048612 native_apic_mem_write                 movl  %esi, -0xa04000(%rdi)
         ffffffff810f1f79 generic_exec_single           xor %eax, %eax
         ffffffff811704c9 remote_function               movl  %eax, 0x18(%rbx)
         ffffffff8100bb34 intel_bts_enable_local                movl  0x2000(%rax), %edx
         ffffffff81048610 native_apic_mem_write                 mov %edi, %edi
        ...
      
      Committer testing:
      
      Before:
      
        # perf script -F ip,sym,insn --xed | head -5
         ffffffffa4068804 native_write_msr 		addb  %al, (%rax)
         ffffffffa4068804 native_write_msr 		addb  %al, (%rax)
         ffffffffa4068804 native_write_msr 		addb  %al, (%rax)
         ffffffffa4068806 native_write_msr 		addb  %al, (%rax)
         ffffffffa4068806 native_write_msr 		addb  %al, (%rax)
        # perf script -F ip,sym,insn --xed | grep -v "addb  %al, (%rax)"
        #
      
      After:
      
        # perf script -F ip,sym,insn --xed | head -5
         ffffffffa4068804 native_write_msr 		wrmsr
         ffffffffa4068804 native_write_msr 		wrmsr
         ffffffffa4068804 native_write_msr 		wrmsr
         ffffffffa4068806 native_write_msr 		nopl  %eax, (%rax,%rax,1)
         ffffffffa4068806 native_write_msr 		nopl  %eax, (%rax,%rax,1)
        # perf script -F ip,sym,insn --xed | grep -v "addb  %al, (%rax)" | head -5
         ffffffffa4068804 native_write_msr 		wrmsr
         ffffffffa4068804 native_write_msr 		wrmsr
         ffffffffa4068804 native_write_msr 		wrmsr
         ffffffffa4068806 native_write_msr 		nopl  %eax, (%rax,%rax,1)
         ffffffffa4068806 native_write_msr 		nopl  %eax, (%rax,%rax,1)
        #
      
      More examples:
      
        # perf script -F ip,sym,insn --xed | grep -v native_write_msr | head
         ffffffffa416b90e tick_check_broadcast_expired 		btq  %rax, 0x1a5f42a(%rip)
         ffffffffa4956bd0 nmi_cpu_backtrace 		pushq  %r13
         ffffffffa415b95e __hrtimer_next_event_base 		movq  0x18(%rax), %rdx
         ffffffffa4956bf3 nmi_cpu_backtrace 		popq  %r12
         ffffffffa4171d5c smp_call_function_single 		pause
         ffffffffa4956bdd nmi_cpu_backtrace 		mov %ebp, %r12d
         ffffffffa4797e4d menu_select 		cmp $0x190, %rax
         ffffffffa4171d5c smp_call_function_single 		pause
         ffffffffa405a7d8 nmi_cpu_backtrace_handler 		callq  0xffffffffa4956bd0
         ffffffffa4797f7a menu_select 		shr $0x3, %rax
        #
      
      Which matches the annotate output modulo resolving callqs:
      
        # perf annotate --stdio2 nmi_cpu_backtrace_handler
        Samples: 4  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 35908, [percent: local period]
        nmi_cpu_backtrace_handler() /lib/modules/5.0.0+/build/vmlinux
        Percent
                    Disassembly of section .text:
      
                    ffffffff8105a7d0 <nmi_cpu_backtrace_handler>:
                    nmi_cpu_backtrace_handler():
                            nmi_trigger_cpumask_backtrace(mask, exclude_self,
                                                          nmi_raise_cpu_backtrace);
                    }
      
                    static int nmi_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
                    {
         24.45      → callq  __fentry__
                            if (nmi_cpu_backtrace(regs))
                      mov    %rsi,%rdi
         75.55      → callq  nmi_cpu_backtrace
                                    return NMI_HANDLED;
                      movzbl %al,%eax
      
                            return NMI_DONE;
                    }
                    ← retq
          #
      
        # perf annotate --stdio2 __hrtimer_next_event_base
        Samples: 4  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 767977, [percent: local period]
        __hrtimer_next_event_base() /lib/modules/5.0.0+/build/vmlinux
        Percent
                    Disassembly of section .text:
      
                    ffffffff8115b910 <__hrtimer_next_event_base>:
                    __hrtimer_next_event_base():
      
                    static ktime_t __hrtimer_next_event_base(struct hrtimer_cpu_base *cpu_base,
                                                             const struct hrtimer *exclude,
                                                             unsigned int active,
                                                             ktime_t expires_next)
                    {
                    → callq  __fentry__
      <SNIP>
                4a:   add    $0x1,%r14
         77.31        mov    0x18(%rax),%rdx
                      shl    $0x6,%r14
                      sub    0x38(%rbx,%r14,1),%rdx
                                    if (expires < expires_next) {
                      cmp    %r12,%rdx
                    ↓ jge    68
      <SNIP>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20190305144758.12397-3-andi@firstfloor.org
      [ Converted fetch_exe() to use the name it ended up having when merged: thread__memcpy() ]
      [ archinsn.c needs the instruction decoder that is only built when CONFIG_AUXTRACE=y, fix that ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3ab481a1
    • perf/core: Restore mmap record type correctly · d9c1bb2f
      Committed by Stephane Eranian
      On mmap(), perf_events generates a RECORD_MMAP record and then checks
      which events are interested in this record. There are currently 2
      versions of mmap records: RECORD_MMAP and RECORD_MMAP2. MMAP2 is larger.
      The event configuration controls which version the user level tool
      accepts.
      
      If event->attr.mmap2 is set, an MMAP2 record is returned.
      perf_event_mmap_output() takes care of this: it checks attr->mmap2 and
      adjusts the record fields before putting the record in the sampling
      buffer of the event. At the end, the function restores the modified MMAP
      record fields.
      
      The problem is that the function restores the size but not the type.
      Thus, if a subsequent event only accepts the MMAP type, it would
      instead receive an MMAP2 record with the size of an MMAP record.
      
      This patch fixes the problem by restoring the record type on exit.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Fixes: 13d7a241 ("perf: Add attr->mmap2 attribute to an event")
      Link: http://lkml.kernel.org/r/20190307185233.225521-1-eranian@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d9c1bb2f
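The save/modify/restore pattern behind the mmap record fix above can be shown with a small model. Everything here is a simplified stand-in: the structures, `MMAP2_EXTRA` and `output_for_event()` are hypothetical, and the "output" is just a copy of the header fields the event would have seen.

```c
#include <stdint.h>

/* Illustrative record types; the names and layout are not the kernel's. */
enum { REC_MMAP = 1, REC_MMAP2 = 10 };

/* Hypothetical number of extra bytes an MMAP2 record carries. */
#define MMAP2_EXTRA 24

struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct mmap_rec {
	struct rec_header header;
};

/*
 * Emit the shared record for one event. The record is temporarily turned
 * into an MMAP2 record if the event asks for it, and BOTH modified fields
 * are restored afterwards so the next event starts from a clean record.
 */
static void output_for_event(struct mmap_rec *rec, int wants_mmap2,
			     uint32_t *seen_type, uint16_t *seen_size)
{
	uint32_t type = rec->header.type;
	uint16_t size = rec->header.size;

	if (wants_mmap2) {
		rec->header.type = REC_MMAP2;
		rec->header.size += MMAP2_EXTRA;
	}

	/* Stands in for copying the record into the event's buffer. */
	*seen_type = rec->header.type;
	*seen_size = rec->header.size;

	rec->header.size = size;
	rec->header.type = type;	/* the restore that was missing */
}
```

Without the final assignment, an event that only accepts MMAP records and is processed after an MMAP2 event would see type MMAP2 paired with the smaller MMAP size.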
  2. 10 Mar, 2019 1 commit
    • Merge tag 'perf-core-for-mingo-5.1-20190307' of... · b339da48
      Committed by Ingo Molnar
      Merge tag 'perf-core-for-mingo-5.1-20190307' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core changes from Arnaldo Carvalho de Melo:
      
      perf bpf:
      
        Arnaldo Carvalho de Melo:
      
        - Automatically add BTF ELF markers to 'perf trace' BPF programs, so that
          tools such as 'bpftool map dump' can pretty print map keys and values.
      
      perf c2c:
      
        Jiri Olsa:
      
        - Fix report for empty NUMA node.
      
      perf diff:
      
        Jin Yao:
      
        - Support --time, --cpu, --pid and --tid filter options.
      
      perf probe:
      
        Arnaldo Carvalho de Melo:
      
        - Clarify error message about not finding kernel modules debuginfo.
      
      perf record:
      
        Jiri Olsa:
      
        - Fixup probing for max attr.precise_ip.
      
      perf trace:
      
        Arnaldo Carvalho de Melo:
      
        - Add missing %s lost in the 'msg_flags' recvmmsg arg when adding prefix suppression logic.
      
      perf annotate:
      
        Arnaldo Carvalho de Melo:
      
        - Calculate the max instruction name, align column to that, removing the
          hardcoded max 6 chars and cope with instructions with names longer than that,
          such as vpmovmskb, vpcmpeqb, etc.
      
      kernel:
      
        Song Liu:
      
        - Consider events with attr.bpf_event set as side-band.
      
        Gustavo A. R. Silva:
      
        - Mark expected switch fall-through in perf_event_parse_addr_filter().
      
      Libraries:
      
        Jiri Olsa:
      
        - Fix leaks and double frees on error paths.
      
      libtraceevent:
      
        Tony Jones:
      
        - Fix buffer overflow in arg_eval().
      
      python scripting:
      
        Tony Jones:
      
        - More python3 fixes.
      
      Trivial:
      
        Yang Wei:
      
        - Remove needless extra semicolon in clang C++ glue code.
      
      Intel PT/BTS:
      
        Adrian Hunter:
      
        - Improve auxtrace address filter error message when there is no DSO.
      
        - Fix divide by zero when TSC is not available.
      
        - Further improvements to the export to sqlite/postgresql python scripts
          and to the GUI sqlviewer, exporting 'parent_id' so that we can enable
          the creation of call trees.
      
        Andi Kleen:
      
        - Generalize function to copy from thread addr space from intel-bts code.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b339da48
  3. 09 Mar, 2019 3 commits
    • perf/core: Mark expected switch fall-through · 43aa378b
      Committed by Gustavo A. R. Silva
      In preparation for enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      This patch fixes the following warning:
      
        kernel/events/core.c: In function ‘perf_event_parse_addr_filter’:
        kernel/events/core.c:9154:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
            kernel = 1;
            ~~~~~~~^~~
        kernel/events/core.c:9156:3: note: here
           case IF_SRC_FILEADDR:
           ^~~~
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190212205430.GA8446@embeddedor
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      43aa378b
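The kind of annotation this commit adds can be illustrated with a small switch. The enum, struct and function below are hypothetical and only mimic the shape of the warning quoted above (a `kernel = 1;` assignment running into the next case); with GCC's -Wimplicit-fallthrough=3, a comment such as `/* fall through */` marks the fall-through as intended.

```c
/* Hypothetical filter source kinds, loosely modelled on the warning above. */
enum src_kind { SRC_KERNELADDR, SRC_FILEADDR, SRC_OTHER };

struct parsed {
	int kernel;
	int has_addr;
};

static struct parsed parse_src(enum src_kind kind)
{
	struct parsed p = { 0, 0 };

	switch (kind) {
	case SRC_KERNELADDR:
		p.kernel = 1;
		/* fall through */
	case SRC_FILEADDR:
		p.has_addr = 1;
		break;
	default:
		break;
	}

	return p;
}
```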
    • perf/x86/intel/uncore: Fix client IMC events return huge result · 8041ffd3
      Committed by Kan Liang
      The client IMC bandwidth events currently return very large values:
      
        $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a
      
        10.000117222 34,788.76 MiB uncore_imc/data_reads/
        10.000117222 8.26 MiB uncore_imc/data_writes/
        20.000374584 34,842.89 MiB uncore_imc/data_reads/
        20.000374584 10.45 MiB uncore_imc/data_writes/
        30.000633299 37,965.29 MiB uncore_imc/data_reads/
        30.000633299 323.62 MiB uncore_imc/data_writes/
        40.000891548 41,012.88 MiB uncore_imc/data_reads/
        40.000891548 6.98 MiB uncore_imc/data_writes/
        50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
        50.001142480 6.97 MiB uncore_imc/data_writes/
      
      The client IMC events are freerunning counters. They still use the
      old event encoding format (0x1 for data_reads and 0x2 for data_writes).
      The counter bit width is calculated by common code, which assumes that
      the standard encoding format is used for the freerunning counters.
      As a result, the wrong bit width is calculated.
      
      This patch converts the old client IMC event encoding to the
      standard encoding format.
      
      The current common code uses event->attr.config, which is copied
      directly from user space. We should not implicitly modify it for a
      converted event, so event->hw.config replaces event->attr.config in
      the common code.
      
      For client IMC events, event->attr.config is used to calculate a
      converted event with the standard encoding format in the custom
      event_init(). The converted event is stored in event->hw.config.
      Other freerunning counter events already use the standard encoding
      format, so the common event_init() assigns the same value as
      event->attr.config to event->hw.config.
      Reported-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@kernel.org # v4.18+
      Fixes: 9aae1780 ("perf/x86/intel/uncore: Clean up client IMC uncore")
      Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8041ffd3
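Why a wrong counter bit width produces a huge value, as in the IMC report above, can be seen from a generic wrap-safe delta computation. This is an illustrative sketch, not the uncore driver code: `counter_delta()` is a hypothetical helper, and the widths 32 and 48 in the usage note are example numbers, not a statement about the IMC hardware.

```c
#include <stdint.h>

/*
 * Hypothetical helper: difference between two raw readings of a counter
 * that is 'bits' wide (1..64). Shifting both values to the top of a
 * 64-bit word makes the subtraction wrap at the counter's own width.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned int bits)
{
	unsigned int shift = 64 - bits;

	return ((now << shift) - (prev << shift)) >> shift;
}
```

If a counter that really wraps at 32 bits goes from 0xFFFFFFF0 to 0x10, the delta with `bits = 32` is 0x20. Computing the same delta with `bits = 48` yields 0xFFFF00000020, a value of the same flavor as the absurd reading in the last `data_reads` sample above.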
    • perf/ring_buffer: Use high order allocations for AUX buffers optimistically · 5768402f
      Committed by Alexander Shishkin
      Currently, the AUX buffer allocator will use high-order allocations
      for PMUs that don't support hardware scatter-gather chaining to ensure
      large contiguous blocks of pages, and always use an array of single
      pages otherwise.
      
      There is, however, a tangible performance benefit in using larger chunks
      of contiguous memory even in the latter case, that comes from not having
      to fetch the next page's address at every page boundary. In particular,
      a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
      penalty with a single multi-page output region in snapshot mode (no PMI)
      than with multiple single-page output regions, from ~6% down to ~4%. For
      the snapshot mode it does make a difference as it is intended to run over
      long periods of time.
      
      For this reason, change the allocation policy to always optimistically
      start with the highest possible order when allocating pages for the AUX
      buffer, descending until the allocation succeeds or the order-zero
      allocation fails.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190215114727.62648-2-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5768402f
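The "start high, descend until something succeeds" policy from the commit above can be modelled in user space as follows. This sketch covers a single chunk only and uses a hypothetical allocator callback; `alloc_descending()`, `alloc_fn` and the demo allocator are invented for illustration and do not correspond to kernel functions.

```c
#include <stddef.h>

/* Hypothetical allocator hook: returns a block of 2^order pages or NULL. */
typedef void *(*alloc_fn)(unsigned int order);

/*
 * Try 'max_order' first and step down one order at a time. Returns the
 * block and stores the order that worked, or NULL if order 0 fails too.
 */
static void *alloc_descending(unsigned int max_order, alloc_fn alloc,
			      unsigned int *got_order)
{
	unsigned int order = max_order;

	for (;;) {
		void *p = alloc(order);

		if (p) {
			*got_order = order;
			return p;
		}
		if (order == 0)
			return NULL;	/* even a single page failed */
		order--;
	}
}

/* Demo allocator: pretends every order above 'demo_max_ok' is unavailable. */
static int demo_max_ok = 2;
static char demo_block;

static void *demo_alloc(unsigned int order)
{
	return (int)order <= demo_max_ok ? (void *)&demo_block : NULL;
}
```

With `demo_max_ok` set to 2, a request starting at order 5 fails three times and then succeeds at order 2.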
  4. 07 Mar, 2019 19 commits
  5. 06 Mar, 2019 5 commits
    • Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 203b6609
      Committed by Linus Torvalds
      Pull perf updates from Ingo Molnar:
       "Lots of tooling updates - too many to list, here's a few highlights:
      
         - Various subcommand updates to 'perf trace', 'perf report', 'perf
           record', 'perf annotate', 'perf script', 'perf test', etc.
      
         - CPU and NUMA topology and affinity handling improvements,
      
         - HW tracing and HW support updates:
            - Intel PT updates
            - ARM CoreSight updates
            - vendor HW event updates
      
         - BPF updates
      
         - Tons of infrastructure updates, both on the build system and the
           library support side
      
         - Documentation updates.
      
         - ... and lots of other changes, see the changelog for details.
      
        Kernel side updates:
      
         - Tighten up kprobes blacklist handling, reduce the number of places
           where developers can install a kprobe and hang/crash the system.
      
         - Fix/enhance vma address filter handling.
      
         - Various PMU driver updates, small fixes and additions.
      
         - refcount_t conversions
      
         - BPF updates
      
         - error code propagation enhancements
      
         - misc other changes"
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
        perf script python: Add Python3 support to syscall-counts-by-pid.py
        perf script python: Add Python3 support to syscall-counts.py
        perf script python: Add Python3 support to stat-cpi.py
        perf script python: Add Python3 support to stackcollapse.py
        perf script python: Add Python3 support to sctop.py
        perf script python: Add Python3 support to powerpc-hcalls.py
        perf script python: Add Python3 support to net_dropmonitor.py
        perf script python: Add Python3 support to mem-phys-addr.py
        perf script python: Add Python3 support to failed-syscalls-by-pid.py
        perf script python: Add Python3 support to netdev-times.py
        perf tools: Add perf_exe() helper to find perf binary
        perf script: Handle missing fields with -F +..
        perf data: Add perf_data__open_dir_data function
        perf data: Add perf_data__(create_dir|close_dir) functions
        perf data: Fail check_backup in case of error
        perf data: Make check_backup work over directories
        perf tools: Add rm_rf_perf_data function
        perf tools: Add pattern name checking to rm_rf
        perf tools: Add depth checking to rm_rf
        perf data: Add global path holder
        ...
      203b6609
    • Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3478588b
      Committed by Linus Torvalds
      Pull locking updates from Ingo Molnar:
       "The biggest part of this tree is the new auto-generated atomics API
        wrappers by Mark Rutland.
      
        The primary motivation was to allow instrumentation without uglifying
        the primary source code.
      
        The linecount increase comes from adding the auto-generated files to
        the Git space as well:
      
          include/asm-generic/atomic-instrumented.h     | 1689 ++++++++++++++++--
          include/asm-generic/atomic-long.h             | 1174 ++++++++++---
          include/linux/atomic-fallback.h               | 2295 +++++++++++++++++++++++++
          include/linux/atomic.h                        | 1241 +------------
      
        I preferred this approach, so that the full call stack of the (already
        complex) locking APIs is still fully visible in 'git grep'.
      
        But if this is excessive we could certainly hide them.
      
        There's a separate build-time mechanism to determine whether the
        headers are out of date (they should never be stale if we do our job
        right).
      
        Anyway, nothing from this should be visible to regular kernel
        developers.
      
        Other changes:
      
         - Add support for dynamic keys, which removes a source of false
           positives in the workqueue code, among other things (Bart Van
           Assche)
      
         - Updates to tools/memory-model (Andrea Parri, Paul E. McKenney)
      
         - qspinlock, wake_q and lockdep micro-optimizations (Waiman Long)
      
         - misc other updates and enhancements"
      
      * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
        locking/lockdep: Shrink struct lock_class_key
        locking/lockdep: Add module_param to enable consistency checks
        lockdep/lib/tests: Test dynamic key registration
        lockdep/lib/tests: Fix run_tests.sh
        kernel/workqueue: Use dynamic lockdep keys for workqueues
        locking/lockdep: Add support for dynamic keys
        locking/lockdep: Verify whether lock objects are small enough to be used as class keys
        locking/lockdep: Check data structure consistency
        locking/lockdep: Reuse lock chains that have been freed
        locking/lockdep: Fix a comment in add_chain_cache()
        locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count()
        locking/lockdep: Reuse list entries that are no longer in use
        locking/lockdep: Free lock classes that are no longer in use
        locking/lockdep: Update two outdated comments
        locking/lockdep: Make it easy to detect whether or not inside a selftest
        locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock()
        locking/lockdep: Initialize the locks_before and locks_after lists earlier
        locking/lockdep: Make zap_class() remove all matching lock order entries
        locking/lockdep: Reorder struct lock_class members
        locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cache
        ...
      3478588b
    • L
      Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c8f5ed6e
      Linus Torvalds committed
      Pull EFI updates from Ingo Molnar:
       "The main EFI changes in this cycle were:
      
         - Use 32-bit alignment for efi_guid_t
      
         - Allow the SetVirtualAddressMap() call to be omitted
      
         - Implement earlycon=efifb based on existing earlyprintk code
      
         - Various minor fixes and code cleanups from Sai, Ard and me"
      
      * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi: Fix build error due to enum collision between efi.h and ima.h
        efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation
        x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol
        efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted
        efi: Replace GPL license boilerplate with SPDX headers
        efi/fdt: Apply more cleanups
        efi: Use 32-bit alignment for efi_guid_t
        efi/memattr: Don't bail on zero VA if it equals the region's PA
        x86/efi: Mark can_free_region() as an __init function
      c8f5ed6e
    • Y
      perf clang: Remove needless extra semicolon · a53837a5
      Yang Wei committed
      Delete a superfluous semicolon in getBPFObjectFromModule().
      Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yang Wei <albin_yang@163.com>
      Link: http://lkml.kernel.org/r/1551710174-3349-1-git-send-email-albin_yang@163.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a53837a5
    • A
      perf bpf: Automatically add BTF ELF markers · 3163613c
      Arnaldo Carvalho de Melo committed
      The libbpf loader expects ____btf_map_<MAP_NAME> structs to be in
      place, carrying the key and value types of each map, so that the
      struct definitions can be stored and sent to the kernel via
      sys_bpf(fd, cmd = BPF_BTF_LOAD) and later be retrieved via
      sys_bpf(fd, cmd = BPF_OBJ_GET_INFO_BY_FD) for use by tools such as
      'bpftool map dump id MAP_ID'.
      
      Since we already have this for defining maps in 'perf trace' BPF events:
      
         bpf_map(name, _type, type_key, type_val, _max_entries)
      
      As used in the tools/perf/examples/bpf/augmented_raw_syscalls.c:
      
       --- 8< ---
      
      struct syscall {
              bool    enabled;
      };
      
      bpf_map(syscalls, ARRAY, int, struct syscall, 512);
      
       --- 8< ---
      
      All we need is to take that already available info and piggyback on
      the 'bpf_map' define in tools/perf/include/bpf/bpf.h, which is
      included by 'perf trace' BPF programs, so that no changes are
      required to the BPF programs already defining maps using 'bpf_map()'.
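
      A minimal sketch of that idea, not the actual change to bpf.h: the
      macro keeps the bpf_map() argument list and, next to the map
      definition, declares a ____btf_map_<name> struct with 'key' and
      'value' members, matching the struct names pahole lists further
      down in this message. 'struct map_def_sketch', the SKETCH_MAP_TYPE_*
      constants and 'bpf_map_sketch' are made-up names so that the
      example builds on a host compiler, and the ELF section placement
      the real define needs is left out:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the map definition struct used by perf's bpf.h. */
struct map_def_sketch {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Hypothetical map type constants, only here so the sketch compiles. */
enum { SKETCH_MAP_TYPE_HASH = 1, SKETCH_MAP_TYPE_ARRAY = 2 };

/*
 * Same arguments as bpf_map(name, _type, type_key, type_val, _max_entries).
 * Besides the map definition, declare a marker struct whose members carry
 * the key and value types, plus one instance so the type is really used
 * and therefore shows up in the generated type information.
 */
#define bpf_map_sketch(name, _type, type_key, type_val, _max_entries)	\
struct map_def_sketch name = {						\
	.type	     = SKETCH_MAP_TYPE_##_type,				\
	.key_size    = sizeof(type_key),				\
	.value_size  = sizeof(type_val),				\
	.max_entries = _max_entries,					\
};									\
struct ____btf_map_##name {						\
	type_key key;							\
	type_val value;							\
};									\
struct ____btf_map_##name ____btf_map_##name = { 0 }

/* Usage, mirroring the map from augmented_raw_syscalls.c shown above. */
struct syscall {
	bool enabled;
};

bpf_map_sketch(syscalls, ARRAY, int, struct syscall, 512);
```

      The point of the design is that the map name and both types are
      already macro arguments, so existing bpf_map() users would pick up
      the marker struct without any source change.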
      
      So this is what we have before this patch:
      
      1) With this in ~/.perfconfig to dump .c events as .o, aka save a copy
         so that we can use the .o later as a pre-compiled BPF bytecode:
      
        # grep '\[llvm\]' -A2 ~/.perfconfig
        [llvm]
      	dump-obj = true
      	clang-opt = -g
      
        #
        # clang --version
        clang version 9.0.0 (https://git.llvm.org/git/clang.git/ 7906282d3afec5dfdc2b27943fd6c0309086c507) (https://git.llvm.org/git/llvm.git/ a1b5de1ff8ae8bc79dc8e86e1f82565229bd0500)
        Target: x86_64-unknown-linux-gnu
        Thread model: posix
        InstalledDir: /opt/llvm/bin
      
      2) Note the -g there so that we get clang to generate debuginfo, and
         since the target is 'bpf' it will generate the BTF info in this
         clang version (9.0).
      
      3) Run a simple 'perf record' specifying as an event the augmented_raw_syscalls.c
         source code:
      
        # perf record -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
        LLVM: dumping /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.025 MB perf.data ]
      
        # file /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o: ELF 64-bit LSB relocatable, eBPF, version 1 (SYSV), with debug_info, not stripped
      
      4) Look at the BTF structs encoded in it:
      
        # pahole -F btf --sizes /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        syscall_enter_args	64	0
        augmented_filename	264	0
        syscall	1	0
        syscall_exit_args	24	0
        bpf_map	28	0
        #
        # pahole -F btf -C syscalls /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        # pahole -F btf -C syscall /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        struct syscall {
      	  bool                       enabled;              /*     0     1 */
      
      	  /* size: 1, cachelines: 1, members: 1 */
      	  /* last cacheline: 1 bytes */
        };
        #
      
      5) OK, with just this we don't have the markers expected by the libbpf
         loader. 'perf trace' will run with this BPF bytecode, because we have:
      
        # grep '\[trace\]' -A1 ~/.perfconfig
        [trace]
      	add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        #
      
      6) Let's do a system wide 'perf trace' session using this BPF program:
      
         # perf trace -e *mmsg,open*
        Cache2 I/O/6885 openat(AT_FDCWD, "/home/acme/.cache/mozilla/firefox/ina67tev.default/cache2/entries/BA220AB2914006A7AE96D27BE6EA13DD77519FCA", O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR) = 106
        Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
        Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
        Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
        Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
        DNS Res~ver #3/23340 openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 106
        DNS Res~ver #3/23340 sendmmsg(106<socket:[3482690]>, 0x7f252f1fcaf0, 2, MSG_NOSIGNAL) = 2
        Cache2 I/O/6885 openat(AT_FDCWD, "/home/acme/.cache/mozilla/firefox/ina67tev.default/cache2/entries/BA220AB2914006A7AE96D27BE6EA13DD77519FCA", O_RDWR) = 106
        lighttpd/18915 openat(AT_FDCWD, "/proc/loadavg", O_RDONLY) = 12
      
      7) While it runs, let's see the maps that 'perf trace' + libbpf's BPF
         loader loaded into the kernel via sys_bpf(fd, BPF_BTF_LOAD, ...):
      
        # bpftool map list | tail -6
        149: perf_event_array  name __augmented_sys  flags 0x0
      	  key 4B  value 4B  max_entries 8  memlock 4096B
        150: array  name syscalls  flags 0x0
      	  key 4B  value 1B  max_entries 512  memlock 8192B
        151: hash  name pids_filtered  flags 0x0
      	  key 4B  value 1B  max_entries 64  memlock 8192B
        #
      
      8) Dump the "pids_filtered" map, which has one entry per PID that
         'perf trace' wants filtered. That includes its own, to avoid a
         tracing feedback loop (perf trace shows the syscalls it does, which
         generates more syscalls that it has to show...), and it also
         auto-filters the 'gnome-terminal' and 'sshd' parent PIDs, for the
         same reason:
      
        # bpftool map dump id 151
        key: a5 0c 00 00  value: 01
        key: 14 63 00 00  value: 01
        Found 2 elements
        #
      
      9) Since there is no BTF info available, it does a generic hex dump :-\
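
      To read such a hex dump by hand: each key is a 4-byte PID printed
      in memory order, which is least significant byte first on the
      x86_64 machine shown in step 1. A small sketch of the conversion
      (the helper name is made up for this illustration):

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine 4 bytes, least significant first, the way
 * they appear in the hex dump above on a little-endian machine.
 */
static uint32_t key_from_le_bytes(const uint8_t b[4])
{
	return (uint32_t)b[0] |
	       (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 |
	       (uint32_t)b[3] << 24;
}
```

      With the two keys above, 'a5 0c 00 00' is 0x0ca5 = 3237 and
      '14 63 00 00' is 0x6314 = 25364.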
      
      10) Now, with this patch applied, we'll do steps 3 to 6 again and
          check with pahole whether there are extra structs encoded in BTF:
      
        # pahole -F btf --sizes /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        syscall_enter_args	64	0
        augmented_filename	264	0
        syscall	1	0
        syscall_exit_args	24	0
        bpf_map	28	0
        ____btf_map___augmented_syscalls__	8	0
        ____btf_map_syscalls	8	0
        ____btf_map_pids_filtered	8	0
        #
      
      11) Yes, those are ____btf_map_ + the map names. Let's see what they look like:
      
        # pahole -F btf -C ____btf_map_syscalls /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        struct ____btf_map_syscalls {
      	  int                        key;                  /*     0     4 */
      	  struct syscall             value;                /*     4     1 */
      
      	  /* size: 8, cachelines: 1, members: 2 */
      	  /* padding: 3 */
      	  /* last cacheline: 8 bytes */
        };
        #
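
      The layout pahole reports above can be double checked with
      sizeof/offsetof. A sketch, assuming a C11 compiler and the usual
      x86_64 ABI where int is 4 bytes with 4-byte alignment; the struct
      is re-declared by hand here instead of being generated by
      bpf_map(), and tail_padding() is a made-up helper:

```c
#include <stdbool.h>
#include <stddef.h>

struct syscall {
	bool enabled;
};

/* Hand-written copy of the struct shown by pahole above. */
struct ____btf_map_syscalls {
	int            key;
	struct syscall value;
};

/* key: offset 0, 4 bytes */
_Static_assert(offsetof(struct ____btf_map_syscalls, key) == 0, "key offset");
_Static_assert(sizeof(int) == 4, "key size");

/* value: offset 4, 1 byte */
_Static_assert(offsetof(struct ____btf_map_syscalls, value) == 4, "value offset");
_Static_assert(sizeof(struct syscall) == 1, "value size");

/* 4 + 1 = 5 bytes of members, rounded up to the alignment of 'key' */
_Static_assert(sizeof(struct ____btf_map_syscalls) == 8, "total size");

/* Bytes between the end of the last member and the end of the struct. */
static size_t tail_padding(void)
{
	return sizeof(struct ____btf_map_syscalls) -
	       (offsetof(struct ____btf_map_syscalls, value) +
		sizeof(struct syscall));
}
```

      tail_padding() evaluates to 3 under those assumptions, which is the
      'padding: 3' line in the pahole output.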
      
      12) Let's repeat step 7 to get the new map ids:
      
        # bpftool map list | tail -6
        155: perf_event_array  name __augmented_sys  flags 0x0
      	  key 4B  value 4B  max_entries 8  memlock 4096B
        156: array  name syscalls  flags 0x0
      	  key 4B  value 1B  max_entries 512  memlock 8192B
        157: hash  name pids_filtered  flags 0x0
      	  key 4B  value 1B  max_entries 64  memlock 8192B
        #
      
      13) And finally let's dump the 'pids_filtered' map:
      
        # bpftool map dump id 157
        [{
              "key": 3237,
              "value": true
          },{
              "key": 26435,
              "value": true
          }
        ]
        #
      
      Looks much better! BTF info was used to interpret the key as an integer
      and the value as a struct with just one boolean member. To keep the
      output compact, bpftool shows just 'true' for that value where we
      previously saw '01'.
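
      What the type information buys can be illustrated with a short
      sketch. This is not bpftool code; format_entry() is a made-up
      helper that renders one raw entry once the key is known to be a
      32-bit integer and the value a single bool, and its output format
      only loosely follows the dump above:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn the raw bytes of one map entry into text,
 * given that the key is an int and the value a one-byte boolean.
 * Returns what snprintf() returns.
 */
static int format_entry(char *buf, size_t len,
			const unsigned char key[4],
			const unsigned char value[1])
{
	int32_t k;

	/* The key bytes are in host byte order, as stored in the map. */
	memcpy(&k, key, sizeof(k));

	return snprintf(buf, len, "{\"key\": %d, \"value\": %s}",
			(int)k, value[0] ? "true" : "false");
}
```

      Without the types, all a tool can do with the same bytes is print
      them one by one, as in step 8.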
      
      Next step: make 'perf trace --dump-map' use BTF!
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lkml.kernel.org/n/tip-ybuf9wpkm30xk28iq7jbwb40@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3163613c