1. 09 11月, 2017 2 次提交
    • J
      perf tools: Fix eBPF event specification parsing · a271bfaf
      Jiri Olsa 提交于
      Looks like I've reached the new level of stupidity, adding missing braces.
      
      Committer testing:
      
      Given the following eBPF C filter, that will add a record when it
      returns true, i.e. when the tv_nsec variable is > 2000ns, should be
      built and installed via sys_bpf(), but fails to do so before this patch:
      
        # cat filter.c
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
      
        SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
        int func(void *ctx, int err, long nsec)
        {
      	  return nsec > 1000;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        #
      
        # perf trace -e nanosleep,filter.c usleep 1
        invalid or unsupported event: 'filter.c'
        Run 'perf list' for a list of valid events
      
         Usage: perf trace [<options>] [<command>]
            or: perf trace [<options>] -- <command> [<options>]
            or: perf trace record [<options>] [<command>]
            or: perf trace record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event/syscall selector. use 'perf list' to list available events
        #
      
      And works again after it is applied, the nothing is inserted when the co
      
        # perf trace -e *sleep,filter.c usleep 1
           0.000 ( 0.066 ms): usleep/23994 nanosleep(rqtp: 0x7ffead94a0d0) = 0
        # perf trace -e *sleep,filter.c usleep 2
           0.000 ( 0.008 ms): usleep/24378 nanosleep(rqtp: 0x7fffa021ba50) ...
           0.008 (         ): perf_bpf_probe:func:(ffffffffb410cb30) tv_nsec=2000)
           0.000 ( 0.066 ms): usleep/24378  ... [continued]: nanosleep()) = 0
        #
      
      The intent of 9445464b is kept:
      
        # perf stat -e 'cpu/uops_executed.core,krava/'  true
        event syntax error: '..cuted.core,krava/'
                                          \___ unknown term
      
        valid terms: cmask,pc,event,edge,in_tx,any,ldlat,inv,umask,in_tx_cp,offcore_rsp,config,config1,config2,name,period
        Run 'perf list' for a list of valid events
      
         Usage: perf stat [<options>] [<command>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        #
        # perf stat -e 'cpu/uops_executed.core,period=1/'  true
      
         Performance counter stats for 'true':
      
                 808,332      cpu/uops_executed.core,period=1/
      
             0.002997237 seconds time elapsed
      
        #
      Reported-by: NArnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 9445464b ("perf tools: Unwind properly location after REJECT")
      Link: http://lkml.kernel.org/n/tip-diea0ihbwpxfw6938huv3whj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a271bfaf
    • J
      perf tools: Add "reject" option for parse-events.l · b6af53b7
      Jiri Olsa 提交于
      Arnaldo reported broken builds in some distros using a newer flex
      release, 2.6.4, found in Alpine Linux 3.6 and Edge, with flex not
      spotting the REJECT macro:
      
        CC       /tmp/build/perf/util/parse-events-flex.o
        util/parse-events.l: In function 'parse_events_lex':
        /tmp/build/perf/util/parse-events-flex.c:4734:16: error: \
        'reject_used_but_not_detected' undeclared (first use in this function)
      
      It's happening because we put the REJECT under another USER_REJECT macro
      in following commit:
      
        9445464b perf tools: Unwind properly location after REJECT
      
      Fortunately flex provides option for force it to use REJECT, adding it
      to parse-events.l.
      Reported-by: NArnaldo Carvalho de Melo <acme@kernel.org>
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Reviewed-by: NAndi Kleen <andi@firstfloor.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 9445464b ("perf tools: Unwind properly location after REJECT")
      Link: http://lkml.kernel.org/n/tip-7kdont984mw12ijk7rji6b8p@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b6af53b7
  2. 27 10月, 2017 1 次提交
  3. 14 10月, 2017 1 次提交
    • J
      perf tools: Check wether the eBPF file exists in event parsing · 29479bfe
      Jiri Olsa 提交于
      Adding the check wether the eBPF file exists, to consider it
      as eBPF input file. This way we can differentiate eBPF events
      from events that end up with same suffix as eBPF file.
      
      Before:
      
        $ perf stat -e 'cpu/uops_executed.core/'  true
        bpf: builtin compilation failed: -95, try external compiler
        WARNING:        unable to get correct kernel building directory.
        Hint:   Set correct kbuild directory using 'kbuild-dir' option in [llvm]
                section of ~/.perfconfig or set it to "" to suppress kbuild
                detection.
      
        event syntax error: 'cpu/uops_executed.core/'
                             \___ Failed to load cpu/uops_executed.c from source: 'version' section incorrect or lost
      
      After:
      
        $ perf stat -e 'cpu/uops_executed.core/'  true
      
         Performance counter stats for 'true':
      
                   181,533      cpu/uops_executed.core/:u
      
               0.002795447 seconds time elapsed
      
      If user makes type in the eBPF file, we prioritize the event syntax
      and show following warning:
      
        $ perf stat -e 'krava.c//'  true
        event syntax error: 'krava.c//'
                             \___ Cannot find PMU `krava.c'. Missing kernel support?
      Reported-and-Tested-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20171013083736.15037-9-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      29479bfe
  4. 13 9月, 2017 2 次提交
    • A
      perf stat: Support duration_time for metrics · fd48aad9
      Andi Kleen 提交于
      Some of the metrics formulas (like GFLOPs) need to know how long the
      measurement period is. Support an internal event called duration_time,
      which reports time in second. It maps to the dummy event, but is special
      cased for statistics to report the walltime duration.
      
      So far it is not printed, but only used internally for metrics.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170831194036.30146-10-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fd48aad9
    • A
      perf tools: Support weak groups in 'perf stat' · 5a5dfe4b
      Andi Kleen 提交于
      Setting up groups can be complicated due to the complicated scheduling
      restrictions of different PMUs.
      
      User tools usually don't understand all these restrictions.
      
      Still in many cases it is useful to set up groups and they work most of
      the time. However if the group is set up wrong some members will not
      report any value because they never get scheduled.
      
      Add a concept of a 'weak group': try to set up a group, but if it's not
      schedulable fallback to not using a group. That gives us the best of
      both worlds: groups if they work, but still a usable fallback if they
      don't.
      
      In theory it would be possible to have more complex fallback strategies
      (e.g. try to split the group in half), but the simple fallback of not
      using a group seems to work for now.
      
      So far the weak group is only implemented for perf stat, not for record.
      
      Here's an unschedulable group (on IvyBridge with SMT on)
      
        % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
      
              73,806,067      branches
               4,848,144      branch-misses             #    6.57% of all branches
              14,754,458      l1d.replacement
              24,905,558      l2_lines_in.all
         <not supported>      l2_rqsts.all_code_rd         <------- will never report anything
      
      With the weak group:
      
        % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1
      
             125,366,055      branches                                                      (80.02%)
               9,208,402      branch-misses             #    7.35% of all branches          (80.01%)
              24,560,249      l1d.replacement                                               (80.00%)
              43,174,971      l2_lines_in.all                                               (80.05%)
              31,891,457      l2_rqsts.all_code_rd                                          (79.92%)
      
      The extra event scheduled with some extra multiplexing
      
      v2: Move fallback code to separate function.
      Add comment on for_each_group_member
      Adjust to new perf_evsel__close interface
      v3: Fix debug print out.
      
      Committer testing:
      
      Before:
      
        # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
      
         Performance counter stats for 'system wide':
      
           <not counted>      branches
           <not counted>      branch-misses
           <not counted>      l1d.replacement
           <not counted>      l2_lines_in.all
         <not supported>      l2_rqsts.all_code_rd
      
             1.002147212 seconds time elapsed
      
        # perf stat -e '{branches,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
      
         Performance counter stats for 'system wide':
      
              83,207,892      branches
              11,065,444      l1d.replacement
              28,484,024      l2_lines_in.all
              12,186,179      l2_rqsts.all_code_rd
      
             1.001739493 seconds time elapsed
      
      After:
      
        # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}':W -a sleep 1
      
         Performance counter stats for 'system wide':
      
             543,323,909      branches                                                      (80.01%)
              27,100,512      branch-misses             #    4.99% of all branches          (80.02%)
              50,402,905      l1d.replacement                                               (80.03%)
              67,385,892      l2_lines_in.all                                               (80.01%)
              21,352,885      l2_rqsts.all_code_rd                                          (79.94%)
      
             1.001086658 seconds time elapsed
      
        #
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/20170831194036.30146-2-andi@firstfloor.org
      [ Add a "'perf stat' only, for now" comment in the man page, suggested by Jiri ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5a5dfe4b
  5. 22 8月, 2017 1 次提交
    • A
      perf bpf: Tighten detection of BPF events · 77d0871c
      Andi Kleen 提交于
        perf stat -e cpu/uops_executed.core,cmask=1/
      
      would be detected as a BPF source event because the .c matches the .c
      source BPF pattern.
      
      v2:
      
      Originally I tried to use lex lookahead, but it doesn't seem to work.
      
      This now extends the BPF pattern to match longer events, but then does
      an extra check in the C code to reject BPF matches that do not end with
      .c/.o/.obj
      
      This uses REJECT, which makes the flex scanner slower, but that
      shouldn't be a big problem for the perf events.
      
      Committer testing:
      
        # perf trace -e write -e /home/acme/bpf/tracepoint.c cat /etc/passwd > /dev/null
           0.000 ( 0.006 ms): cat/18485 write(fd: 1, buf: 0x7f59eebe1000, count: 3494                         ) ...
           0.006 (         ): raw_syscalls:sys_enter:NR 1 (1, 7f59eebe1000, da6, 22, 7f59eebe0010, 0))
           0.008 (         ): perf_bpf_probe:_write:(ffffffff9626b2c0))
           0.000 ( 0.010 ms): cat/18485  ... [continued]: write()) = 3494
        #
      
      It continues doing what was expected, i.e. identifying
      /home/acme/bpf/tracepoint.c as a BPF event and activates the clang
      machinery to build an eBPF object and then uses sys_bpf() to hook it up
      to the raw_syscalls:sys_enter tracepoint, etc.
      
      Andi forgot to add Wang to the CC list, fix it.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20170811232634.30465-4-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      77d0871c
  6. 17 10月, 2016 1 次提交
    • W
      perf jevents: Handle events including .c and .o · 2d470b62
      Wang Nan 提交于
      This patch helps with Sukadev's vendor event tree where such events can happen.
      
      >From Andi Kleen:
       Any event including a .c/.o/.bpf currently triggers BPF compilation or loading
       and then an error. This can happen for some Intel vendor events, which cannot
       be used.
      
      This patch fixes this problem by forbidding BPF file patch containing '{', '}'
      and ',', make sure flex consumes the leading '{', instead of matching it using
      a BPF file path.
      
      Tested result:
      
        $ perf stat -e '{unc_p_clockticks,unc_p_power_state_occupancy.cores_c0}' -a -I 1000
        invalid or unsupported event: '{unc_p_clockticks,unc_p_power_state_occupancy.cores_c0}'
        Run 'perf list' for a list of valid events
        (as expected, interperted as event)
      
        $ perf stat -e 'aaa.c' -a -I 1000
        ERROR: problems with path aaa.c: No such file or directory
        (as expected, interpreted as BPF source)
      
        $ perf stat -e 'aaa.ccc' -a -I 1000
        invalid or unsupported event: 'aaa.ccc'
        (as expected, interpreted as event)
      
        $ perf stat -e '{aaa.c}' -a -I 1000
        ERROR: problems with path aaa.c: No such file or directory
        event syntax error: '{aaa.c}'
        <SKIP>
        (as expected, interpreted as BPF source)
      
        $ perf stat -e '{cycles,aaa.c}' -a -I 1000
        ERROR: problems with path aaa.c: No such file or directory
        event syntax error: '{cycles,aaa.c}'
        (as expected, interpreted as BPF source)
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Reported-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1475900185-37967-1-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2d470b62
  7. 14 9月, 2016 1 次提交
  8. 16 7月, 2016 1 次提交
    • W
      perf tools: Enable overwrite settings · 626a6b78
      Wang Nan 提交于
      This patch allows following config terms and option:
      
      Globally setting events to overwrite;
      
        # perf record --overwrite ...
      
      Set specific events to be overwrite or no-overwrite.
      
        # perf record --event cycles/overwrite/ ...
        # perf record --event cycles/no-overwrite/ ...
      
      Add missing config terms and update the config term array size because
      the longest string length has changed.
      
      For overwritable events, it automatically selects attr.write_backward
      since perf requires it to be backward for reading.
      
      Test result:
      
        # perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
        # perf evlist -v
        syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1468485287-33422-14-git-send-email-wangnan0@huawei.comSigned-off-by: NHe Kuang <hekuang@huawei.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      626a6b78
  9. 07 6月, 2016 1 次提交
    • A
      perf stat: Basic support for TopDown in perf stat · 44b1e60a
      Andi Kleen 提交于
      Add basic plumbing for TopDown in perf stat
      
      TopDown is intended to replace the frontend cycles idle/ backend cycles
      idle metrics in standard perf stat output.  These metrics are not
      reliable in many workloads, due to out of order effects.
      
      This implements a new --topdown mode in perf stat (similar to
      --transaction) that measures the pipe line bottlenecks using
      standardized formulas. The measurement can be all done with 5 counters
      (one fixed counter)
      
      The result are four metrics:
      
      FrontendBound, BackendBound, BadSpeculation, Retiring
      
      that describe the CPU pipeline behavior on a high level.
      
      The full top down methology has many hierarchical metrics.  This
      implementation only supports level 1 which can be collected without
      multiplexing. A full implementation of top down on top of perf is
      available in pmu-tools toplev.  (http://github.com/andikleen/pmu-tools)
      
      The current version works on Intel Core CPUs starting with Sandy Bridge,
      and Atom CPUs starting with Silvermont.  In principle the generic
      metrics should be also implementable on other out of order CPUs.
      
      TopDown level 1 uses a set of abstracted metrics which are generic to
      out of order CPU cores (although some CPUs may not implement all of
      them):
      
        topdown-total-slots       Available slots in the pipeline
        topdown-slots-issued      Slots issued into the pipeline
        topdown-slots-retired     Slots successfully retired
        topdown-fetch-bubbles     Pipeline gaps in the frontend
        topdown-recovery-bubbles  Pipeline gaps during recovery
                                  from misspeculation
      
      These metrics then allow to compute four useful metrics:
      
      FrontendBound, BackendBound, Retiring, BadSpeculation.
      
      Add a new --topdown options to enable events.  When --topdown is
      specified set up events for all topdown events supported by the kernel.
      Add topdown-* as a special case to the event parser, as is needed for
      all events containing -.
      
      The actual code to compute the metrics is in follow-on patches.
      
      v2: Use standard sysctl read function.
      v3: Move x86 specific code to arch/
      v4: Enable --metric-only implicitly for topdown.
      v5: Add --single-thread option to not force per core mode
      v6: Fix output order of topdown metrics
      v7: Allow combining with -d
      v8: Remove --single-thread again
      v9: Rename functions, adding arch_ and topdown_.
      v10: Expand man page and describe TopDown better
      Paste intro into commit description.
      Print error when malloc fails.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1464119559-17203-1-git-send-email-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      44b1e60a
  10. 30 5月, 2016 1 次提交
    • A
      perf tools: Per event max-stack settings · 792d48b4
      Arnaldo Carvalho de Melo 提交于
      The tooling counterpart, now it is possible to do:
      
        # perf record -e sched:sched_switch/max-stack=10/ -e cycles/call-graph=dwarf,max-stack=4/ -e cpu-cycles/call-graph=dwarf,max-stack=1024/ usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.052 MB perf.data (5 samples) ]
        # perf evlist -v
        sched:sched_switch: type: 2, size: 112, config: 0x110, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, sample_max_stack: 10
        cycles/call-graph=dwarf,max-stack=4/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 4
        cpu-cycles/call-graph=dwarf,max-stack=1024/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 1024
        # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
      
      Using just /max-stack=N/ means /call-graph=fp,max-stack=N/, that should
      be further configurable by means of some .perfconfig knob.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      792d48b4
  11. 23 2月, 2016 1 次提交
    • W
      perf tools: Introduce bpf-output event · 03e0a7df
      Wang Nan 提交于
      Commit a43eec30 ("bpf: introduce bpf_perf_event_output() helper")
      adds a helper to enable a BPF program to output data to a perf ring
      buffer through a new type of perf event, PERF_COUNT_SW_BPF_OUTPUT. This
      patch enables perf to create events of that type. Now a perf user can
      use the following cmdline to receive output data from BPF programs:
      
        # perf record -a -e bpf-output/no-inherit,name=evt/ \
                          -e ./test_bpf_output.c/map:channel.event=evt/ ls /
        # perf script
           perf 1560 [004] 347747.086295:  evt: ffffffff811fd201 sys_write ...
           perf 1560 [004] 347747.086300:  evt: ffffffff811fd201 sys_write ...
           perf 1560 [004] 347747.086315:  evt: ffffffff811fd201 sys_write ...
                  ...
      
      Test result:
      
        # cat test_bpf_output.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        struct bpf_map_def {
       	unsigned int type;
       	unsigned int key_size;
       	unsigned int value_size;
       	unsigned int max_entries;
        };
      
        #define SEC(NAME) __attribute__((section(NAME), used))
        static u64 (*ktime_get_ns)(void) =
       	(void *)BPF_FUNC_ktime_get_ns;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
       	(void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
       	(void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
       	(void *)BPF_FUNC_perf_event_output;
      
        struct bpf_map_def SEC("maps") channel = {
       	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
       	.key_size = sizeof(int),
       	.value_size = sizeof(u32),
       	.max_entries = __NR_CPUS__,
        };
      
        SEC("func_write=sys_write")
        int func_write(void *ctx)
        {
       	struct {
       		u64 ktime;
       		int cpuid;
       	} __attribute__((packed)) output_data;
       	char error_data[] = "Error: failed to output: %d\n";
      
       	output_data.cpuid = get_smp_processor_id();
       	output_data.ktime = ktime_get_ns();
       	int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
       				    &output_data, sizeof(output_data));
       	if (err)
       		trace_printk(error_data, sizeof(error_data), err);
       	return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************ END ***************************/
      
        # perf record -a -e bpf-output/no-inherit,name=evt/ \
                          -e ./test_bpf_output.c/map:channel.event=evt/ ls /
        # perf script | grep ls
           ls  2242 [003] 347851.557563:   evt: ffffffff811fd201 sys_write ...
           ls  2242 [003] 347851.557571:   evt: ffffffff811fd201 sys_write ...
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-11-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      03e0a7df
  12. 22 2月, 2016 2 次提交
    • W
      perf tools: Enable indices setting syntax for BPF map · e571e029
      Wang Nan 提交于
      This patch introduces a new syntax to perf event parser:
      
       # perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
      
      By utilizing the basic facilities in bpf-loader.c which allow setting
      different slots in a BPF map separately, the newly introduced syntax
      allows perf to control specific elements in a BPF map.
      
      Test result:
      
        # cat ./test_bpf_map_3.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct bpf_map_def {
      	unsigned int type;
      	unsigned int key_size;
      	unsigned int value_size;
      	unsigned int max_entries;
        };
        static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
       	(void *)BPF_FUNC_map_lookup_elem;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
       	(void *)BPF_FUNC_trace_printk;
        struct bpf_map_def SEC("maps") channel = {
       	.type = BPF_MAP_TYPE_ARRAY,
       	.key_size = sizeof(int),
       	.value_size = sizeof(unsigned char),
       	.max_entries = 100,
        };
        SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
        int func(void *ctx, int err, long nsec)
        {
       	char fmt[] = "%ld\n";
       	long usec = nsec * 0x10624dd3 >> 38; // nsec / 1000
       	int key = (int)usec;
       	unsigned char *pval = map_lookup_elem(&channel, &key);
      
       	if (!pval)
       		return 0;
       	trace_printk(fmt, sizeof(fmt), (unsigned char)*pval);
       	return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
      Normal case:
      
        # echo "" > /sys/kernel/debug/tracing/trace
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace | grep usleep
                  usleep-405   [004] d... 2745423.547822: : 101
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 3
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 15
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace | grep usleep
                  usleep-405   [004] d... 2745423.547822: : 101
                  usleep-655   [006] d... 2745434.122814: : 102
                  usleep-904   [006] d... 2745439.916264: : 103
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[all]=104/' usleep 99
        # cat /sys/kernel/debug/tracing/trace | grep usleep
                  usleep-405   [004] d... 2745423.547822: : 101
                  usleep-655   [006] d... 2745434.122814: : 102
                  usleep-904   [006] d... 2745439.916264: : 103
                  usleep-1537  [003] d... 2745538.053737: : 104
      
      Error case:
      
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[10...1000]=104/' usleep 99
        event syntax error: '..annel.value[10...1000]=104/'
                                         \___ Index too large
        Hint:	Valid config terms:
            	map:[<arraymap>].value<indices>=[value]
            	map:[<eventmap>].event<indices>=[event]
      
            	where <indices> is something like [0,3...5] or [all]
            	(add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-9-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e571e029
    • W
      perf tools: Enable BPF object configure syntax · a34f3be7
      Wang Nan 提交于
      This patch adds the final step for BPF map configuration. A new syntax
      is appended into parser so user can config BPF objects through '/' '/'
      enclosed config terms.
      
      After this patch, following syntax is available:
      
        # perf record -e ./test_bpf_map_1.c/map:channel.value=10/ ...
      
      It would takes effect after appling following commits.
      
      Test result:
      
        # cat ./test_bpf_map_1.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct bpf_map_def {
            unsigned int type;
            unsigned int key_size;
            unsigned int value_size;
            unsigned int max_entries;
        };
        static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
            (void *)BPF_FUNC_map_lookup_elem;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
            (void *)BPF_FUNC_trace_printk;
        struct bpf_map_def SEC("maps") channel = {
            .type = BPF_MAP_TYPE_ARRAY,
            .key_size = sizeof(int),
            .value_size = sizeof(int),
            .max_entries = 1,
        };
        SEC("func=sys_nanosleep")
        int func(void *ctx)
        {
            int key = 0;
            char fmt[] = "%d\n";
            int *pval = map_lookup_elem(&channel, &key);
            if (!pval)
                return 0;
            trace_printk(fmt, sizeof(fmt), *pval);
            return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
       - Normal case:
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
      
       - Error case:
      
        # ./perf record -e './test_bpf_map_1.c/map:channel.value/' usleep 10
        event syntax error: '..ps:channel:value/'
                                         \___ Config value not set (missing '=')
        Hint:	Valid config term:
               map:[<arraymap>]:value=[value]
               (add -v to see detail)
        Run 'perf list' for a list of valid events
      
        Usage: perf record [<options>] [<command>]
           or: perf record [<options>] -- <command> [<options>]
      
           -e, --event <event>   event selector. use 'perf list' to list available events
      
        # ./perf record -e './test_bpf_map_1.c/xmap:channel.value=10/' usleep 10
        event syntax error: '..pf_map_1.c/xmap:channel.value=10/'
                                          \___ Invalid object config option
        [SNIP]
      
        # ./perf record -e './test_bpf_map_1.c/map:xchannel.value=10/' usleep 10
        event syntax error: '..p_1.c/map:xchannel.value=10/'
                                          \___ Target map not exist
        [SNIP]
      
        # ./perf record -e './test_bpf_map_1.c/map:channel.xvalue=10/' usleep 10
        event syntax error: '..ps:channel.xvalue=10/'
                                          \___ Invalid object map config option
        [SNIP]
      
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=x10/' usleep 10
        event syntax error: '..nnel.value=x10/'
                                          \___ Incorrect value type for map
        [SNIP]
      
        Change BPF_MAP_TYPE_ARRAY to '1' in test_bpf_map_1.c:
      
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
        event syntax error: '..ps:channel.value=10/'
                                          \___ Can't use this config term to this type of map
      
        Hint:	Valid config term:
            	map:[<arraymap>].value=[value]
            	(add -v to see detail)
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      [for parser part]
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-5-git-send-email-wangnan0@huawei.comSigned-off-by: NHe Kuang <hekuang@huawei.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a34f3be7
  13. 20 2月, 2016 1 次提交
    • W
      perf tools: Create config_term_names array · 17cb5f84
      Wang Nan 提交于
      config_term_names[] is introduced for future commits which will be able
      to retrieve the config name through the config term.
      
      Utilize this array in parse_events_formats_error_string() so the missing
      '{,no-}inherit' terms are added.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1455882283-79592-10-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      17cb5f84
  14. 30 10月, 2015 1 次提交
    • W
      perf tools: Compile scriptlets to BPF objects when passing '.c' to --event · d509db04
      Wang Nan 提交于
      This patch provides infrastructure for passing source files to --event
      directly using:
      
       # perf record --event bpf-file.c command
      
      This patch does following works:
      
       1) Allow passing '.c' file to '--event'. parse_events_load_bpf() is
          expanded to allow caller tell it whether the passed file is source
          file or object.
      
       2) llvm__compile_bpf() is called to compile the '.c' file, the result
          is saved into memory. Use bpf_object__open_buffer() to load the
          in-memory object.
      
      Introduces a bpf-script-example.c so we can manually test it:
      
       # perf record --clang-opt "-DLINUX_VERSION_CODE=0x40200" --event ./bpf-script-example.c sleep 1
      
      Note that '--clang-opt' must put before '--event'.
      
      Futher patches will merge it into a testcase so can be tested automatically.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1444826502-49291-10-git-send-email-wangnan0@huawei.comSigned-off-by: NHe Kuang <hekuang@huawei.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d509db04
  15. 28 10月, 2015 2 次提交
    • W
      perf tools: Enable passing bpf object file to --event · 84c86ca1
      Wang Nan 提交于
      By introducing new rules in tools/perf/util/parse-events.[ly], this
      patch enables 'perf record --event bpf_file.o' to select events by an
      eBPF object file. It calls parse_events_load_bpf() to load that file,
      which uses bpf__prepare_load() and finally calls bpf_object__open() for
      the object files.
      
      After applying this patch, commands like:
      
       # perf record --event foo.o sleep
      
      become possible.
      
      However, at this point it is unable to link any useful things onto the
      evsel list because the creating of probe points and BPF program
      attaching have not been implemented.  Before real events are possible to
      be extracted, to avoid perf report error because of empty evsel list,
      this patch link a dummy evsel. The dummy event related code will be
      removed when probing and extracting code is ready.
      
      Commiter notes:
      
      Using it:
      
        $ ls -la foo.o
        ls: cannot access foo.o: No such file or directory
        $ perf record --event foo.o sleep
        libbpf: failed to open foo.o: No such file or directory
        event syntax error: 'foo.o'
                             \___ BPF object file 'foo.o' is invalid
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        $
      
        $ file /tmp/build/perf/perf.o
        /tmp/build/perf/perf.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
        $ perf record --event /tmp/build/perf/perf.o sleep
        libbpf: /tmp/build/perf/perf.o is not an eBPF object file
        event syntax error: '/tmp/build/perf/perf.o'
                             \___ BPF object file '/tmp/build/perf/perf.o' is invalid
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        $
      
        $ file /tmp/foo.o
        /tmp/foo.o: ELF 64-bit LSB relocatable, no machine, version 1 (SYSV), not stripped
        $ perf record --event /tmp/foo.o sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.013 MB perf.data ]
        $ perf evlist
        /tmp/foo.o
        $ perf evlist  -v
        /tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        $
      
      So, type 1 is PERF_TYPE_SOFTWARE, config 0x9 is PERF_COUNT_SW_DUMMY, ok.
      
        $ perf report --stdio
        Error:
        The perf.data file has no samples!
        # To display the perf.data header info, please use --header/--header-only options.
        #
        $
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1444826502-49291-4-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      84c86ca1
    • W
      perf tools: Enable pre-event inherit setting by config terms · 374ce938
      Wang Nan 提交于
      This patch allows perf record setting event's attr.inherit bit by
      config terms like:
      
        # perf record -e cycles/no-inherit/ ...
        # perf record -e cycles/inherit/ ...
      
      So user can control inherit bit for each event separately.
      
      In following example, a.out fork()s in main then do some complex
      CPU intensive computations in both of its children.
      
      Basic result with and without inherit:
      
        # perf record -e cycles -e instructions ./a.out
        [ perf record: Woken up 9 times to write data ]
        [ perf record: Captured and wrote 2.205 MB perf.data (47920 samples) ]
        # perf report --stdio
        # ...
        # Samples: 23K of event 'cycles'
        # Event count (approx.): 23641752891
        ...
        # Samples: 24K of event 'instructions'
        # Event count (approx.): 30428312415
      
        # perf record -i -e cycles -e instructions ./a.out
        [ perf record: Woken up 5 times to write data ]
        [ perf record: Captured and wrote 1.111 MB perf.data (24019 samples) ]
        ...
        # Samples: 12K of event 'cycles'
        # Event count (approx.): 11699501775
        ...
        # Samples: 12K of event 'instructions'
        # Event count (approx.): 15058023559
      
      Cancel inherit for one event when globally enable:
      
        # perf record -e cycles/no-inherit/ -e instructions ./a.out
        [ perf record: Woken up 7 times to write data ]
        [ perf record: Captured and wrote 1.660 MB perf.data (36004 samples) ]
        ...
        # Samples: 12K of event 'cycles/no-inherit/'
        # Event count (approx.): 11895759282
       ...
        # Samples: 24K of event 'instructions'
        # Event count (approx.): 30668000441
      
      Enable inherit for one event when globally disable:
      
        # perf record -i -e cycles/inherit/ -e instructions ./a.out
        [ perf record: Woken up 7 times to write data ]
        [ perf record: Captured and wrote 1.654 MB perf.data (35868 samples) ]
        ...
        # Samples: 23K of event 'cycles/inherit/'
        # Event count (approx.): 23285400229
        ...
        # Samples: 11K of event 'instructions'
        # Event count (approx.): 14969050259
      
      Committer note:
      
      One can check if the bit was set, in addition to seeing the result in
      the perf.data file size as above by doing one of:
      
        # perf record -e cycles -e instructions -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.911 MB perf.data (63 samples) ]
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
        #
      
      So, the inherit bit was set in both, now, if we disable it globally using
      --no-inherit:
      
        # perf record --no-inherit -e cycles -e instructions -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.910 MB perf.data (56 samples) ]
        # perf evlist -v
        cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
      
      No inherit bit set, then disabling it and setting just on the cycles event:
      
        # perf record --no-inherit -e cycles/inherit/ -e instructions -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.909 MB perf.data (48 samples) ]
        # perf evlist -v
        cycles/inherit/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
        #
      
      We can see it as well in by using a more verbose level of debug messages in
      the tool that sets up the perf_event_attr, 'perf record' in this case:
      
        [root@zoo ~]# perf record -vv --no-inherit -e cycles/inherit/ -e instructions -a usleep 1
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          inherit                          1
          mmap                             1
          comm                             1
          freq                             1
          task                             1
          sample_id_all                    1
          exclude_guest                    1
          mmap2                            1
          comm_exec                        1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
        sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
        sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8
        sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8
        ------------------------------------------------------------
        perf_event_attr:
          size                             112
          config                           0x1
          { sample_period, sample_freq }   4000
          sample_type                      IP|TID|TIME|ID|CPU|PERIOD
          read_format                      ID
          disabled                         1
          freq                             1
          sample_id_all                    1
          exclude_guest                    1
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
      
      <SNIP>
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1446029705-199659-2-git-send-email-wangnan0@huawei.com
      [ s/u64/bool/ for the perf_evsel_config_term inherit field - jolsa]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      374ce938
  16. 06 10月, 2015 1 次提交
  17. 29 9月, 2015 1 次提交
    • H
      perf tools: Show proper error message for wrong terms of hw/sw events · ffeb883e
      He Kuang 提交于
      Show proper error message and show valid terms when wrong config terms
      is specified for hw/sw type perf events.
      
      This patch makes the original error format function formats_error_string()
      more generic, which only outputs the static config terms for hw/sw perf
      events, and prepends pmu formats for pmu events.
      
      Before this patch:
      
        $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
        invalid or unsupported event: 'cpu-clock/freqx=200/'
        Run 'perf list' for a list of valid events
      
         usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      After this patch:
      
        $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
        event syntax error: 'cpu-clock/freqx=200/'
                                       \___ unknown term
      
        valid terms: config,config1,config2,name,period,freq,branch_type,time,call-graph,stack-size
      
        Run 'perf list' for a list of valid events
      
         usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      Signed-off-by: NHe Kuang <hekuang@huawei.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1443412336-120050-2-git-send-email-hekuang@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ffeb883e
  18. 13 8月, 2015 1 次提交
    • K
      perf callchain: Per-event type selection support · d457c963
      Kan Liang 提交于
      This patchkit adds the ability to set callgraph mode (fp, dwarf, lbr) per
      event. This in term can reduce sampling overhead and the size of the
      perf.data.
      
      Here is an example.
      
        perf record -e 'cpu/cpu-cycles,period=1000,call-graph=fp,time=1/,cpu/instructions,call-graph=lbr/' sleep 1
      
       perf evlist -v
       cpu/cpu-cycles,period=1000,call-graph=fp,time=1/: type: 4, size: 112,
       config: 0x3c, { sample_period, sample_freq }: 1000, sample_type:
       IP|TID|TIME|CALLCHAIN|PERIOD|IDENTIFIER, read_format: ID, disabled: 1,
       inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all:
       1, exclude_guest: 1, mmap2: 1, comm_exec: 1
       cpu/instructions,call-graph=lbr/: type: 4, size: 112, config: 0xc0, {
       sample_period, sample_freq }: 4000, sample_type:
       IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID,
       disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1,
       exclude_guest: 1
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1439289050-40510-1-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d457c963
  19. 11 8月, 2015 1 次提交
  20. 05 8月, 2015 1 次提交
  21. 24 6月, 2015 1 次提交
  22. 29 4月, 2015 2 次提交
  23. 08 4月, 2015 1 次提交
  24. 03 12月, 2014 1 次提交
  25. 16 10月, 2014 1 次提交
  26. 11 10月, 2013 1 次提交
  27. 03 9月, 2013 1 次提交
  28. 08 8月, 2013 2 次提交
    • M
      perf tools: Add support for pinned modifier · e9a7c414
      Michael Ellerman 提交于
      This commit adds support for a new modifier "D", which requests that the
      event, or group of events, be pinned to the PMU.
      
      The "p" modifier is already taken for precise, and "P" may be used in
      future to mean "fully precise".
      
      So we use "D", which stands for pinneD - and looks like a padlock, or if
      you're using the ":D" syntax perf smiles at you.
      
      This is an oft-requested feature from our HW folks, who want to be able
      to run a large number of events, but also want 100% accurate results for
      instructions per cycle.
      
      Comparison of results with and without pinning:
      
      $ perf stat -e '{cycles,instructions}:D' -e cycles,instructions,...
      
        79,590,480,683 cycles         #  0.000 GHz
       166,123,716,524 instructions   #  2.09  insns per cycle
                                      #  0.11  stalled cycles per insn
      
        79,352,134,463 cycles         #  0.000 GHz                     [11.11%]
       165,178,301,818 instructions   #  2.08  insns per cycle
                                      #  0.11  stalled cycles per insn [11.13%]
      
      As you can see although perf does a very good job of scaling the values
      in the non-pinned case, there is some small discrepancy.
      
      The patch is fairly straight forward, the one detail is that we need to
      make sure we only request pinning for the group leader when we have a
      group.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Tested-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1375795686-4226-1-git-send-email-michael@ellerman.id.au
      [ Use perf_evsel__is_group_leader instead of open coded equivalent, as
        suggested by Jiri Olsa ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e9a7c414
    • J
      perf tools: Add 'S' event/group modifier to read sample value · 3c176311
      Jiri Olsa 提交于
      Adding 'S' event/group modifier to specify that the event value/s are
      read by PERF_SAMPLE_READ sample type processing, instead of the period
      value offered by lower layers.
      
      There's additional behaviour change for 'S' modifier being specified on
      event group:
      
      Currently all the events within a group makes samples. If user now
      specifies 'S' within group modifier, only the leader will trigger
      samples. The rest of events in the group will have sampling disabled.
      
      And same as for single events, values of all events within the group
      (including leader) are read by PERF_SAMPLE_READ sample type processing.
      
      Following example will create event group with cycles and cache-misses
      events, setting the cycles as group leader and the only event to
      actually sample. Both cycles and cache-misses event period values are
      read by PERF_SAMPLE_READ sample type processing with PERF_FORMAT_GROUP
      read format.
      
      Example:
      
        $ perf record -e '{cycles,cache-misses}:S' ls
        ...
        $ perf report --group --show-total-period --stdio
        ...
        # Samples: 36  of event 'anon group { cycles, cache-misses }'
        # Event count (approx.): 12585593
        #
        #       Overhead          Period  Command      Shared Object                      Symbol
        # ..............  ..............  .......  .................  ..........................
        #
          19.92%   1.20%  2505936     31       ls  [kernel.kallsyms]  [k] mark_held_locks
          13.74%   0.47%  1729327     12       ls  [kernel.kallsyms]  [k] sched_clock_local
          13.64%  23.72%  1716147    612       ls  ld-2.14.90.so      [.] check_match.10805
          13.12%  23.22%  1650778    599       ls  libc-2.14.90.so    [.] _nl_intern_locale_data
          11.24%  29.19%  1414554    753       ls  [kernel.kallsyms]  [k] sched_clock_cpu
           8.50%   0.35%  1070150      9       ls  [kernel.kallsyms]  [k] check_chain_key
        ...
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-iyoinu3axi11mymwnh2b7fxj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3c176311
  29. 15 11月, 2012 1 次提交
  30. 09 11月, 2012 1 次提交
  31. 11 9月, 2012 1 次提交
    • I
      perf tools: Use __maybe_used for unused variables · 1d037ca1
      Irina Tirdea 提交于
      perf defines both __used and __unused variables to use for marking
      unused variables. The variable __used is defined to
      __attribute__((__unused__)), which contradicts the kernel definition to
      __attribute__((__used__)) for new gcc versions. On Android, __used is
      also defined in system headers and this leads to warnings like: warning:
      '__used__' attribute ignored
      
      __unused is not defined in the kernel and is not a standard definition.
      If __unused is included everywhere instead of __used, this leads to
      conflicts with glibc headers, since glibc has a variables with this name
      in its headers.
      
      The best approach is to use __maybe_unused, the definition used in the
      kernel for __attribute__((unused)). In this way there is only one
      definition in perf sources (instead of 2 definitions that point to the
      same thing: __used and __unused) and it works on both Linux and Android.
      This patch simply replaces all instances of __used and __unused with
      __maybe_unused.
      Signed-off-by: NIrina Tirdea <irina.tirdea@intel.com>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com
      [ committer note: fixed up conflict with a116e05d in builtin-sched.c ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1d037ca1
  32. 23 8月, 2012 1 次提交
  33. 15 8月, 2012 1 次提交
    • J
      perf tools: Add support to parse event group syntax · 89efb029
      Jiri Olsa 提交于
      Adding scanner/parser bits to parse event groups.
      
      The grammar for group is:
        groups:      groups ',' group | group
        group:       group_name '{' events '}' group_mod
        group_name:  name | empty
        group_mod:   ':' group_mods | empty
        group_mods:  event_mod
      
      It's possible to use standard event modifier as a modifier
      for group. It'll be used as an update to existing event
      modifiers.
      
      It's necessary to use quoting ("'\) when specifying group on
      command line, since {} characters are interpreted by most of
      the shells.
      
      It is now possible to specify groups in event syntax like:
      
        '{cycles,faults}'
         - anonymous group
      
        'group1{cycles,faults}
         - group with name 'group1'
      
        '{cycles,faults}:k
         - anonymous group with event modifier 'k'
      
        '{cpu-clock,task-clock},{minor-faults,major-faults}'
         - two anonymous groups
      
      The grouping functionality itself is coming shortly.
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/n/tip-p4j8bnvo879uokum4k4zk5q6@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      89efb029
  34. 08 8月, 2012 1 次提交