1. 07 6月, 2016 7 次提交
  2. 12 4月, 2016 2 次提交
    • W
      perf bpf: Automatically create bpf-output event __bpf_stdout__ · 72c08098
      Wang Nan 提交于
      This patch removes the need to set a bpf-output event in cmdline.  By
      referencing a map named '__bpf_stdout__', perf automatically creates an
      event for it.
      
      For example:
      
        # perf record -e ./test_bpf_trace.c usleep 100000
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]
        # perf script
                 usleep  4639 [000] 261895.307826:        0            __bpf_stdout__:  ffffffff810eb9a1 ...
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
      
                 usleep  4639 [000] 261895.407883:        0            __bpf_stdout__:  ffffffff8105d609 ...
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
      
        perf record -e ./test_bpf_trace.c usleep 100000
      
        equals to:
      
        perf record -e bpf-output/no-inherit=1,name=__bpf_stdout__/ \
                    -e ./test_bpf_trace.c/map:__bpf_stdout__.event=__bpf_stdout__/ \
                    usleep 100000
      
      Where test_bpf_trace.c is:
      
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        struct bpf_map_def {
               unsigned int type;
               unsigned int key_size;
               unsigned int value_size;
               unsigned int max_entries;
        };
        #define SEC(NAME) __attribute__((section(NAME), used))
        static u64 (*ktime_get_ns)(void) =
               (void *)BPF_FUNC_ktime_get_ns;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
               (void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
               (void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
               (void *)BPF_FUNC_perf_event_output;
      
        struct bpf_map_def SEC("maps") __bpf_stdout__ = {
               .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
               .key_size = sizeof(int),
               .value_size = sizeof(u32),
               .max_entries = __NR_CPUS__,
        };
      
        static inline int __attribute__((always_inline))
        func(void *ctx, int type)
        {
      	char output_str[] = "Raise a BPF event!";
      	char err_str[] = "BAD %d\n";
      	int err;
      
              err = perf_event_output(ctx, &__bpf_stdout__, get_smp_processor_id(),
      			        &output_str, sizeof(output_str));
      	if (err)
      		trace_printk(err_str, sizeof(err_str), err);
              return 1;
        }
        SEC("func_begin=sys_nanosleep")
        int func_begin(void *ctx) {return func(ctx, 1);}
        SEC("func_end=sys_nanosleep%return")
        int func_end(void *ctx) { return func(ctx, 2);}
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
      Committer note:
      
      Testing with 'perf trace':
      
        # trace -e nanosleep --ev test_bpf_stdout.c usleep 1
           0.007 ( 0.007 ms): usleep/729 nanosleep(rqtp: 0x7ffc5bbc5fe0) ...
           0.007 (         ): __bpf_stdout__:Raise a BPF event!..)
           0.008 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
           0.069 (         ): __bpf_stdout__:Raise a BPF event!..)
           0.070 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
           0.072 ( 0.072 ms): usleep/729  ... [continued]: nanosleep()) = 0
        #
      Suggested-and-Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460128045-97310-5-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      72c08098
    • W
      perf bpf: Clone bpf stdout events in multiple bpf scripts · d7888573
      Wang Nan 提交于
      This patch allows cloning bpf-output event configuration among multiple
      bpf scripts. If there exist a map named '__bpf_output__' and not
      configured using 'map:__bpf_output__.event=', this patch clones the
      configuration of another '__bpf_stdout__' map. For example, following
      command:
      
        # perf trace --ev bpf-output/no-inherit,name=evt/ \
                     --ev ./test_bpf_trace.c/map:__bpf_stdout__.event=evt/ \
                     --ev ./test_bpf_trace2.c usleep 100000
      
      equals to:
      
        # perf trace --ev bpf-output/no-inherit,name=evt/ \
                     --ev ./test_bpf_trace.c/map:__bpf_stdout__.event=evt/  \
                     --ev ./test_bpf_trace2.c/map:__bpf_stdout__.event=evt/ \
                     usleep 100000
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Suggested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460128045-97310-4-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d7888573
  3. 23 2月, 2016 1 次提交
    • W
      perf tools: Introduce bpf-output event · 03e0a7df
      Wang Nan 提交于
      Commit a43eec30 ("bpf: introduce bpf_perf_event_output() helper")
      adds a helper to enable a BPF program to output data to a perf ring
      buffer through a new type of perf event, PERF_COUNT_SW_BPF_OUTPUT. This
      patch enables perf to create events of that type. Now a perf user can
      use the following cmdline to receive output data from BPF programs:
      
        # perf record -a -e bpf-output/no-inherit,name=evt/ \
                          -e ./test_bpf_output.c/map:channel.event=evt/ ls /
        # perf script
           perf 1560 [004] 347747.086295:  evt: ffffffff811fd201 sys_write ...
           perf 1560 [004] 347747.086300:  evt: ffffffff811fd201 sys_write ...
           perf 1560 [004] 347747.086315:  evt: ffffffff811fd201 sys_write ...
                  ...
      
      Test result:
      
        # cat test_bpf_output.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        struct bpf_map_def {
       	unsigned int type;
       	unsigned int key_size;
       	unsigned int value_size;
       	unsigned int max_entries;
        };
      
        #define SEC(NAME) __attribute__((section(NAME), used))
        static u64 (*ktime_get_ns)(void) =
       	(void *)BPF_FUNC_ktime_get_ns;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
       	(void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
       	(void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
       	(void *)BPF_FUNC_perf_event_output;
      
        struct bpf_map_def SEC("maps") channel = {
       	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
       	.key_size = sizeof(int),
       	.value_size = sizeof(u32),
       	.max_entries = __NR_CPUS__,
        };
      
        SEC("func_write=sys_write")
        int func_write(void *ctx)
        {
       	struct {
       		u64 ktime;
       		int cpuid;
       	} __attribute__((packed)) output_data;
       	char error_data[] = "Error: failed to output: %d\n";
      
       	output_data.cpuid = get_smp_processor_id();
       	output_data.ktime = ktime_get_ns();
       	int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
       				    &output_data, sizeof(output_data));
       	if (err)
       		trace_printk(error_data, sizeof(error_data), err);
       	return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************ END ***************************/
      
        # perf record -a -e bpf-output/no-inherit,name=evt/ \
                          -e ./test_bpf_output.c/map:channel.event=evt/ ls /
        # perf script | grep ls
           ls  2242 [003] 347851.557563:   evt: ffffffff811fd201 sys_write ...
           ls  2242 [003] 347851.557571:   evt: ffffffff811fd201 sys_write ...
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-11-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      03e0a7df
  4. 22 2月, 2016 4 次提交
    • W
      perf tools: Support setting different slots in a BPF map separately · 2d055bf2
      Wang Nan 提交于
      This patch introduces basic facilities to support config different slots
      in a BPF map one by one.
      
      array.nr_ranges and array.ranges are introduced into 'struct
      parse_events_term', where ranges is an array of indices range (start,
      length) which will be configured by this config term. nr_ranges is the
      size of the array. The array is passed to 'struct bpf_map_priv'.  To
      indicate the new type of configuration, BPF_MAP_KEY_RANGES is added as a
      new key type. bpf_map_config_foreach_key() is extended to iterate over
      those indices instead of all possible keys.
      
      Code in this commit will be enabled by following commit which enables
      the indices syntax for array configuration.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-8-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2d055bf2
    • W
      perf tools: Enable passing event to BPF object · 7630b3e2
      Wang Nan 提交于
      A new syntax is added to the parser so that the user can access
      predefined perf events in BPF objects.
      
      After this patch, BPF programs for perf are finally able to utilize
      bpf_perf_event_read() introduced in commit 35578d79 ("bpf: Implement
      function bpf_perf_event_read() that get the selected hardware PMU
      counter").
      
      Test result:
      
        # cat test_bpf_map_2.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct bpf_map_def {
            unsigned int type;
            unsigned int key_size;
            unsigned int value_size;
            unsigned int max_entries;
        };
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
            (void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
            (void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_read)(struct bpf_map_def *, int) =
            (void *)BPF_FUNC_perf_event_read;
      
        struct bpf_map_def SEC("maps") pmu_map = {
            .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
            .key_size = sizeof(int),
            .value_size = sizeof(int),
            .max_entries = __NR_CPUS__,
        };
        SEC("func_write=sys_write")
        int func_write(void *ctx)
        {
            unsigned long long val;
            char fmt[] = "sys_write:        pmu=%llu\n";
            val = perf_event_read(&pmu_map, get_smp_processor_id());
            trace_printk(fmt, sizeof(fmt), val);
            return 0;
        }
      
        SEC("func_write_return=sys_write%return")
        int func_write_return(void *ctx)
        {
            unsigned long long val = 0;
            char fmt[] = "sys_write_return: pmu=%llu\n";
            val = perf_event_read(&pmu_map, get_smp_processor_id());
            trace_printk(fmt, sizeof(fmt), val);
            return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
      Normal case:
      
        # echo "" > /sys/kernel/debug/tracing/trace
        # perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
        [SNIP]
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.013 MB perf.data (7 samples) ]
        # cat /sys/kernel/debug/tracing/trace | grep ls
                      ls-17066 [000] d... 938449.863301: : sys_write:        pmu=1157327
                      ls-17066 [000] dN.. 938449.863342: : sys_write_return: pmu=1225218
                      ls-17066 [000] d... 938449.863349: : sys_write:        pmu=1241922
                      ls-17066 [000] dN.. 938449.863369: : sys_write_return: pmu=1267445
      
      Normal case (system wide):
      
        # echo "" > /sys/kernel/debug/tracing/trace
        # perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.811 MB perf.data (120 samples) ]
      
        # cat /sys/kernel/debug/tracing/trace | grep -v '18446744073709551594' | grep -v perf | head -n 20
        [SNIP]
        #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
        #              | |       |   ||||       |         |
                   gmain-30828 [002] d... 2740551.068992: : sys_write:        pmu=84373
                   gmain-30828 [002] d... 2740551.068992: : sys_write_return: pmu=87696
                   gmain-30828 [002] d... 2740551.068996: : sys_write:        pmu=100658
                   gmain-30828 [002] d... 2740551.068997: : sys_write_return: pmu=102572
      
      Error case 1:
      
        # perf record -e './test_bpf_map_2.c' ls /
        [SNIP]
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.014 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace | grep ls
                      ls-17115 [007] d... 2724279.665625: : sys_write:        pmu=18446744073709551614
                      ls-17115 [007] dN.. 2724279.665651: : sys_write_return: pmu=18446744073709551614
                      ls-17115 [007] d... 2724279.665658: : sys_write:        pmu=18446744073709551614
                      ls-17115 [007] dN.. 2724279.665677: : sys_write_return: pmu=18446744073709551614
      
        (18446744073709551614 is 0xfffffffffffffffe (-2))
      
      Error case 2:
      
        # perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=evt/' -a
        event syntax error: '..ps:pmu_map.event=evt/'
                                          \___ Event not found for map setting
      
        Hint:	Valid config terms:
             	map:[<arraymap>].value=[value]
             	map:[<eventmap>].event=[event]
        [SNIP]
      
      Error case 3:
        # ls /proc/2348/task/
        2348  2505  2506  2507  2508
        # perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -p 2348
        ERROR: Apply config to BPF failed: Cannot set event to BPF map in multi-thread tracing
      
      Error case 4:
        # perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
        ERROR: Apply config to BPF failed: Doesn't support inherit event (Hint: use -i to turn off inherit)
      
      Error case 5:
        # perf record -i -e raw_syscalls:sys_enter -e './test_bpf_map_2.c/map:pmu_map.event=raw_syscalls:sys_enter/' ls
        ERROR: Apply config to BPF failed: Can only put raw, hardware and BPF output event into a BPF map
      
      Error case 6:
        # perf record -i -e './test_bpf_map_2.c/map:pmu_map.event=123/' ls /
        event syntax error: '.._map.event=123/'
                                          \___ Incorrect value type for map
        [SNIP]
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-7-git-send-email-wangnan0@huawei.comSigned-off-by: NHe Kuang <hekuang@huawei.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7630b3e2
    • W
      perf record: Apply config to BPF objects before recording · 8690a2a7
      Wang Nan 提交于
      bpf__apply_obj_config() is introduced as the core API to apply object
      config options to all BPF objects. This patch also does the real work
      for setting values for BPF_MAP_TYPE_PERF_ARRAY maps by inserting value
      stored in map's private field into the BPF map.
      
      This patch is required because we are not always able to set all BPF
      config during parsing. Further patch will set events created by perf to
      BPF_MAP_TYPE_PERF_EVENT_ARRAY maps, which is not exist until
      perf_evsel__open().
      
      bpf_map_foreach_key() is introduced to iterate over each key needs to be
      configured. This function would be extended to support more map types
      and different key settings.
      
      In perf record, before start recording, call bpf__apply_config() to turn
      on all BPF config options.
      
      Test result:
      
        # cat ./test_bpf_map_1.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct bpf_map_def {
            unsigned int type;
            unsigned int key_size;
            unsigned int value_size;
            unsigned int max_entries;
        };
        static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
            (void *)BPF_FUNC_map_lookup_elem;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
            (void *)BPF_FUNC_trace_printk;
        struct bpf_map_def SEC("maps") channel = {
            .type = BPF_MAP_TYPE_ARRAY,
            .key_size = sizeof(int),
            .value_size = sizeof(int),
            .max_entries = 1,
        };
        SEC("func=sys_nanosleep")
        int func(void *ctx)
        {
            int key = 0;
            char fmt[] = "%d\n";
            int *pval = map_lookup_elem(&channel, &key);
            if (!pval)
                return 0;
            trace_printk(fmt, sizeof(fmt), *pval);
            return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
        # echo "" > /sys/kernel/debug/tracing/trace
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=11/' usleep 10
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace
        # tracer: nop
        #
        # entries-in-buffer/entries-written: 1/1   #P:8
        [SNIP]
        #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
        #              | |       |   ||||       |         |
                   usleep-18593 [007] d... 2394714.395539: : 11
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=101/' usleep 10
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace
        # tracer: nop
        #
        # entries-in-buffer/entries-written: 1/1   #P:8
        [SNIP]
        #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
        #              | |       |   ||||       |         |
                   usleep-18593 [007] d... 2394714.395539: : 11
                   usleep-19000 [006] d... 2394831.057840: : 101
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-6-git-send-email-wangnan0@huawei.comSigned-off-by: NHe Kuang <hekuang@huawei.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8690a2a7
    • W
      perf bpf: Add API to set values to map entries in a bpf object · 066dacbf
      Wang Nan 提交于
      bpf__config_obj() is introduced as a core API to config BPF object after
      loading. One configuration option of maps is introduced. After this
      patch BPF object can accept assignments like:
      
        map:my_map.value=1234
      
      (map.my_map.value looks pretty. However, there's a small but hard to fix
      problem related to flex's greedy matching. Please see [1].  Choose ':'
      to avoid it in a simpler way.)
      
      This patch is more complex than the work it does because the
      consideration of extension. In designing BPF map configuration, the
      following things should be considered:
      
       1. Array indices selection: perf should allow user setting different
          value for different slots in an array, with syntax like:
          map:my_map.value[0,3...6]=1234;
      
       2. A map should be set by different config terms, each for a part
          of it. For example, set each slot to the pid of a thread;
      
       3. Type of value: integer is not the only valid value type. A perf
          counter can also be put into a map after commit 35578d79
          ("bpf: Implement function bpf_perf_event_read() that get the
            selected hardware PMU counter")
      
       4. For a hash table, it should be possible to use a string or other
          value as a key;
      
       5. It is possible that map configuration is unable to be setup
          during parsing. A perf counter is an example.
      
      Therefore, this patch does the following:
      
       1. Instead of updating map element during parsing, this patch stores
          map config options in 'struct bpf_map_priv'. Following patches
          will apply those configs at an appropriate time;
      
       2. Link map operations in a list so a map can have multiple config
          terms attached, so different parts can be configured separately;
      
       3. Make 'struct bpf_map_priv' extensible so that the following patches
          can add new types of keys and operations;
      
       4. Use bpf_obj_config__map_funcs array to support more map config options.
      
      Since the patch changing the event parser to parse BPF object config is
      relative large, I've put it in another commit. Code in this patch can be
      tested after applying the next patch.
      
      [1] http://lkml.kernel.org/g/564ED621.4050500@huawei.comSigned-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-4-git-send-email-wangnan0@huawei.comSigned-off-by: NHe Kuang <hekuang@huawei.com>
      [ Changes "maps:my_map.value" to "map:my_map.value", improved error messages ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      066dacbf
  5. 20 2月, 2016 1 次提交
    • W
      perf bpf: Rename bpf_prog_priv__clear() to clear_prog_priv() · 80cdce76
      Wang Nan 提交于
      The name of bpf_prog_priv__clear() doesn't follow perf's naming
      convention. bpf_prog_priv__delete() seems to be a better name. However,
      bpf_prog_priv__delete() should be a method of 'struct bpf_prog_priv',
      but its first parameter is 'struct bpf_program'.
      
      It is callback from libbpf to clear priv structures when destroying a
      bpf program. It is actually a method of bpf_program (libbpf object), but
      bpf_program__ functions should be provided by libbpf.
      
      This patch removes the prefix of that function.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1455882283-79592-4-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      80cdce76
  6. 28 11月, 2015 1 次提交
  7. 19 11月, 2015 6 次提交
    • W
      perf bpf: Use same BPF program if arguments are identical · d35b3289
      Wang Nan 提交于
      This patch allows creating only one BPF program for different
      'probe_trace_event'(tev) entries generated by one
      'perf_probe_event'(pev) if their prologues are identical.
      
      This is done by comparing the argument list of different tev instances,
      and the maps type of prologue and tev using a mapping array. This patch
      utilizes qsort to sort the tevs. After sorting, tevs with identical
      argument lists will be grouped together.
      
      Test result:
      
      Sample BPF program:
      
        #define SEC(NAME) __attribute__((section(NAME), used))
        SEC("inlines=no;"
            "func=SyS_dup? oldfd")
        int func(void *ctx)
        {
            return 1;
        }
      
      It would probe at SyS_dup2 and SyS_dup3, obtaining oldfd as its
      argument.
      
      The following cmdline shows a BPF program being loaded into the kernel
      by perf:
      
       # perf record -e ./test_bpf_arg.c sleep 4 & sleep 1 && ls /proc/$!/fd/ -l | grep bpf-prog
      
      Before this patch:
      
        # perf record -e ./test_bpf_arg.c sleep 4 & sleep 1 && ls /proc/$!/fd/ -l | grep bpf-prog
        [1] 24858
        lrwx------ 1 root root 64 Nov 14 04:09 3 -> anon_inode:bpf-prog
        lrwx------ 1 root root 64 Nov 14 04:09 4 -> anon_inode:bpf-prog
        ...
      
      After this patch:
      
        # perf record -e ./test_bpf_arg.c sleep 4 & sleep 1 && ls /proc/$!/fd/ -l | grep bpf-prog
        [1] 25699
        lrwx------ 1 root root 64 Nov 14 04:10 3 -> anon_inode:bpf-prog
        ...
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1447749170-175898-3-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d35b3289
    • W
      perf bpf: Generate prologue for BPF programs · a08357d8
      Wang Nan 提交于
      This patch generates a prologue for each 'struct probe_trace_event' for
      fetching arguments for BPF programs.
      
      After bpf__probe(), iterate over each program to check whether prologues are
      required. If none of the 'struct perf_probe_event' programs will attach to have
      at least one argument, simply skip preprocessor hooking. For those who a
      prologue is required, call bpf__gen_prologue() and paste the original
      instruction after the prologue.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1447675815-166222-12-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a08357d8
    • H
      perf bpf: Add prologue for BPF programs for fetching arguments · bfc077b4
      He Kuang 提交于
      This patch generates a prologue for a BPF program which fetches arguments for
      it.  With this patch, the program can have arguments as follow:
      
        SEC("lock_page=__lock_page page->flags")
        int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
        {
       	 return 1;
        }
      
      This patch passes at most 3 arguments from r3, r4 and r5. r1 is still the ctx
      pointer. r2 is used to indicate if dereferencing was done successfully.
      
      This patch uses r6 to hold ctx (struct pt_regs) and r7 to hold stack pointer
      for result. Result of each arguments first store on stack:
      
       low address
       BPF_REG_FP - 24  ARG3
       BPF_REG_FP - 16  ARG2
       BPF_REG_FP - 8   ARG1
       BPF_REG_FP
       high address
      
      Then loaded into r3, r4 and r5.
      
      The output prologue for offn(...off2(off1(reg)))) should be:
      
           r6 <- r1			// save ctx into a callee saved register
           r7 <- fp
           r7 <- r7 - stack_offset	// pointer to result slot
           /* load r3 with the offset in pt_regs of 'reg' */
           (r7) <- r3			// make slot valid
           r3 <- r3 + off1		// prepare to read unsafe pointer
           r2 <- 8
           r1 <- r7			// result put onto stack
           call probe_read		// read unsafe pointer
           jnei r0, 0, err		// error checking
           r3 <- (r7)			// read result
           r3 <- r3 + off2		// prepare to read unsafe pointer
           r2 <- 8
           r1 <- r7
           call probe_read
           jnei r0, 0, err
           ...
           /* load r2, r3, r4 from stack */
           goto success
      err:
           r2 <- 1
           /* load r3, r4, r5 with 0 */
           goto usercode
      success:
           r2 <- 0
      usercode:
           r1 <- r6	// restore ctx
           // original user code
      
      If all of arguments reside in register (dereferencing is not
      required), gen_prologue_fastpath() will be used to create
      fast prologue:
      
           r3 <- (r1 + offset of reg1)
           r4 <- (r1 + offset of reg2)
           r5 <- (r1 + offset of reg3)
           r2 <- 0
      
      P.S.
      
      eBPF calling convention is defined as:
      
      * r0		- return value from in-kernel function, and exit value
                        for eBPF program
      * r1 - r5	- arguments from eBPF program to in-kernel function
      * r6 - r9	- callee saved registers that in-kernel function will
                        preserve
      * r10		- read-only frame pointer to access stack
      
      Committer note:
      
      At least testing if it builds and loads:
      
        # cat test_probe_arg.c
        struct pt_regs;
      
        __attribute__((section("lock_page=__lock_page page->flags"), used))
        int func(struct pt_regs *ctx, int err, unsigned long flags)
        {
        	return 1;
        }
      
        char _license[] __attribute__((section("license"), used)) = "GPL";
        int _version __attribute__((section("version"), used)) = 0x40300;
        # perf record -e ./test_probe_arg.c usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.016 MB perf.data ]
        # perf evlist
        perf_bpf_probe:lock_page
        #
      Signed-off-by: NHe Kuang <hekuang@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1447675815-166222-11-git-send-email-wangnan0@huawei.comSigned-off-by: NWang Nan <wangnan0@huawei.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bfc077b4
    • W
      perf bpf: Allow BPF program config probing options · 03e01f56
      Wang Nan 提交于
      By extending the syntax of BPF object section names, this patch allows users to
      config probing options like what they can do in 'perf probe'.
      
      The error message in 'perf probe' is also updated.
      
      Test result:
      
      For following BPF file test_probe_glob.c:
      
        # cat test_probe_glob.c
        __attribute__((section("inlines=no;func=SyS_dup?"), used))
      
        int func(void *ctx)
        {
      	  return 1;
        }
      
        char _license[] __attribute__((section("license"), used)) = "GPL";
        int _version __attribute__((section("version"), used)) = 0x40300;
        #
        # ./perf record  -e ./test_probe_glob.c ls /
        ...
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.013 MB perf.data ]
        # ./perf evlist
        perf_bpf_probe:func_1
        perf_bpf_probe:func
      
      After changing "inlines=no" to "inlines=yes":
      
        # ./perf record  -e ./test_probe_glob.c ls /
        ...
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.013 MB perf.data ]
        # ./perf evlist
        perf_bpf_probe:func_3
        perf_bpf_probe:func_2
        perf_bpf_probe:func_1
        perf_bpf_probe:func
      
      Then test 'force':
      
      Use following program:
      
        # cat test_probe_force.c
        __attribute__((section("func=sys_write"), used))
      
        int funca(void *ctx)
        {
      	  return 1;
        }
      
        __attribute__((section("force=yes;func=sys_write"), used))
      
        int funcb(void *ctx)
        {
        	return 1;
        }
      
        char _license[] __attribute__((section("license"), used)) = "GPL";
        int _version __attribute__((section("version"), used)) = 0x40300;
        #
      
        # perf record -e ./test_probe_force.c usleep 1
        Error: event "func" already exists.
         Hint: Remove existing event by 'perf probe -d'
             or force duplicates by 'perf probe -f'
             or set 'force=yes' in BPF source.
        event syntax error: './test_probe_force.c'
                             \___ Probe point exist. Try 'perf probe -d "*"' and set 'force=yes'
      
        (add -v to see detail)
        ...
      
      Then replace 'force=no' to 'force=yes':
      
        # vim test_probe_force.c
        # perf record -e ./test_probe_force.c usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.017 MB perf.data ]
        # perf evlist
        perf_bpf_probe:func_1
        perf_bpf_probe:func
        #
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1447675815-166222-7-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      03e01f56
    • W
      perf bpf: Allow attaching BPF programs to modules symbols · 5dbd16c0
      Wang Nan 提交于
      By extending the syntax of BPF object section names, this patch allows
      users to attach BPF programs to symbols in modules. For example:
      
        SEC("module=i915;"
            "parse_cmds=i915_parse_cmds")
        int parse_cmds(void *ctx)
        {
            return 1;
        }
      
      The implementation is very simple: like what 'perf probe' does, for module,
      fill 'uprobe' field in 'struct perf_probe_event'. Other parts will be done
      automatically.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1447675815-166222-5-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5dbd16c0
    • W
      perf bpf: Allow BPF program attach to uprobe events · 361f2b1d
      Wang Nan 提交于
      This patch adds a new syntax to the BPF object section name to support
      probing at uprobe event. Now we can use BPF program like this:
      
        SEC(
        "exec=/lib64/libc.so.6;"
        "libcwrite=__write"
        )
        int libcwrite(void *ctx)
        {
            return 1;
        }
      
      Where, in section name of a program, before the main config string, we
      can use 'key=value' style options. Now the only option key is "exec",
      for uprobes.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1447675815-166222-4-git-send-email-wangnan0@huawei.com
      [ Changed the separator from \n to ; ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      361f2b1d
  8. 07 11月, 2015 3 次提交
    • W
      perf test: Add 'perf test BPF' · ba1fae43
      Wang Nan 提交于
      This patch adds BPF testcase for testing BPF event filtering.
      
      By utilizing the result of 'perf test LLVM', this patch compiles the
      eBPF sample program then test its ability. The BPF script in 'perf test
      LLVM' lets only 50% samples generated by epoll_pwait() to be captured.
      This patch runs that system call for 111 times, so the result should
      contain 56 samples.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1446817783-86722-8-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ba1fae43
    • W
      perf bpf: Improve BPF related error messages · d3e0ce39
      Wang Nan 提交于
      A series of bpf loader related error codes were introduced to help error
      reporting. Functions were improved to return these new error codes.
      
      Functions which return pointers were adjusted to encode error codes into
      return value using the ERR_PTR() interface.
      
      bpf_loader_strerror() was improved to convert these error messages to
      strings. It checks the error codes and calls libbpf_strerror() and
      strerror_r() accordingly, so caller don't need to consider checking the
      range of the error code.
      
      In bpf__strerror_load(), print kernel version of running kernel and the
      object's 'version' section to notify user how to fix his/her program.
      
      v1 -> v2:
       Use macro for error code.
      
       Fetch error message based on array index, eliminate for-loop.
      
       Print version strings.
      
      Before:
      
        # perf record -e ./test_kversion_nomatch_program.o sleep 1
        event syntax error: './test_kversion_nomatch_program.o'
                             \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object
        SKIP
      
        After:
      
        # perf record -e ./test_kversion_nomatch_program.o ls
        event syntax error: './test_kversion_nomatch_program.o'
                             \___ 'version' (4.4.0) doesn't match running kernel (4.3.0)
        SKIP
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1446818289-87444-1-git-send-email-wangnan0@huawei.com
      [ Add 'static inline' to bpf__strerror_prepare_load() when LIBBPF is disabled ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d3e0ce39
    • W
      bpf tools: Improve libbpf error reporting · 6371ca3b
      Wang Nan 提交于
      In this patch, a series of libbpf specific error numbers and
      libbpf_strerror() are introduced to help reporting errors.
      
      Functions are updated to pass correct the error number through the
      CHECK_ERR() macro.
      
      All users of bpf_object__open{_buffer}() and bpf_program__title() in
      perf are modified accordingly. In addition, due to the error codes
      changing, bpf__strerror_load() is also modified to use them.
      
      bpf__strerror_head() is also changed accordingly so it can parse libbpf
      errors. bpf_loader_strerror() is introduced for that purpose, and will
      be improved by the following patch.
      
      load_program() is improved not to dump log buffer if it is empty. log
      buffer is also used to deduce whether the error was caused by an invalid
      program or other problem.
      
      v1 -> v2:
      
       - Using macro for error code.
      
       - Fetch error message based on array index, eliminate for-loop.
      
       - Use log buffer to detect the reason of failure. 3 new error code
         are introduced to replace LIBBPF_ERRNO__LOAD.
      
      In v1:
      
        # perf record -e ./test_ill_program.o ls
        event syntax error: './test_ill_program.o'
                             \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object
        SKIP
      
        # perf record -e ./test_kversion_nomatch_program.o ls
        event syntax error: './test_kversion_nomatch_program.o'
                             \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object
        SKIP
      
        # perf record -e ./test_big_program.o ls
        event syntax error: './test_big_program.o'
                             \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object
        SKIP
      
        In v2:
      
        # perf record -e ./test_ill_program.o ls
        event syntax error: './test_ill_program.o'
                             \___ Kernel verifier blocks program loading
        SKIP
      
        # perf record -e ./test_kversion_nomatch_program.o
        event syntax error: './test_kversion_nomatch_program.o'
                             \___ Incorrect kernel version
        SKIP
        (Will be further improved by following patches)
      
        # perf record -e ./test_big_program.o
        event syntax error: './test_big_program.o'
                             \___ Program too big
        SKIP
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1446817783-86722-2-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      6371ca3b
  9. 03 11月, 2015 1 次提交
    • W
      perf bpf: Mute libbpf when '-v' not set · 7a011946
      Wang Nan 提交于
      According to [1], libbpf should be muted. This patch reset info and
      warning message level to ensure libbpf doesn't output anything even
      if error happened.
      
      [1] http://lkml.kernel.org/r/20151020151255.GF5119@kernel.org
      
      Committer note:
      
      Before:
      
      Testing it with an incompatible kernel version in the .c file that
      generated foo.o:
      
        [root@zoo ~]# perf record -e /tmp/foo.o sleep 1
        libbpf: load bpf program failed: Invalid argument
        libbpf: -- BEGIN DUMP LOG ---
        libbpf:
      
        libbpf: -- END LOG --
        libbpf: failed to load program 'fork=_do_fork'
        libbpf: failed to load object '/tmp/foo.o'
        event syntax error: '/tmp/foo.o'
                             \___ Invalid argument: Are you root and runing a CONFIG_BPF_SYSCALL kernel?
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        [root@zoo ~]#
      
      After:
      
        [root@zoo ~]# perf record -e /tmp/foo.o sleep 1
        event syntax error: '/tmp/foo.o'
                             \___ Invalid argument: Are you root and runing a CONFIG_BPF_SYSCALL kernel?
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        [root@zoo ~]#
      
      This, BTW, need fixing to emit a proper message by validating the
      version in the foo.o "version" ELF section against the running kernel,
      warning the user instead of asking the kernel to load a binary that it
      will refuse due to unmatching kernel version.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1446547486-229499-3-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7a011946
  10. 30 10月, 2015 1 次提交
    • W
      perf tools: Compile scriptlets to BPF objects when passing '.c' to --event · d509db04
      Wang Nan 提交于
      This patch provides infrastructure for passing source files to --event
      directly using:
      
       # perf record --event bpf-file.c command
      
      This patch does following works:
      
       1) Allow passing '.c' file to '--event'. parse_events_load_bpf() is
          expanded to allow caller tell it whether the passed file is source
          file or object.
      
       2) llvm__compile_bpf() is called to compile the '.c' file, the result
          is saved into memory. Use bpf_object__open_buffer() to load the
          in-memory object.
      
      Introduces a bpf-script-example.c so we can manually test it:
      
       # perf record --clang-opt "-DLINUX_VERSION_CODE=0x40200" --event ./bpf-script-example.c sleep 1
      
      Note that '--clang-opt' must put before '--event'.
      
      Futher patches will merge it into a testcase so can be tested automatically.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1444826502-49291-10-git-send-email-wangnan0@huawei.comSigned-off-by: NHe Kuang <hekuang@huawei.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d509db04
  11. 29 10月, 2015 2 次提交
    • W
      perf bpf: Collect perf_evsel in BPF object files · 4edf30e3
      Wang Nan 提交于
      This patch creates a 'struct perf_evsel' for every probe in a BPF object
      file(s) and fills 'struct evlist' with them. The previously introduced
      dummy event is now removed. After this patch, the following command:
      
       # perf record --event filter.o ls
      
      Can trace on each of the probes defined in filter.o.
      
      The core of this patch is bpf__foreach_tev(), which calls a callback
      function for each 'struct probe_trace_event' event for a bpf program
      with each associated file descriptors. The add_bpf_event() callback
      creates evsels by calling parse_events_add_tracepoint().
      
      Since bpf-loader.c will not be built if libbpf is turned off, an empty
      bpf__foreach_tev() is defined in bpf-loader.h to avoid build errors.
      
      Committer notes:
      
      Before:
      
        # /tmp/oldperf record --event /tmp/foo.o -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.198 MB perf.data ]
        # perf evlist
        /tmp/foo.o
        # perf evlist -v
        /tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period,
        sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, disabled: 1,
        inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1,
        exclude_guest: 1, mmap2: 1, comm_exec: 1
      
      I.e. we create just the PERF_TYPE_SOFTWARE (type: 1),
      PERF_COUNT_SW_DUMMY(config 0x9) event, now, with this patch:
      
        # perf record --event /tmp/foo.o -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.210 MB perf.data ]
        # perf evlist -v
        perf_bpf_probe:fork: type: 2, size: 112, config: 0x6bd, { sample_period,
        sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1,
        inherit: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest:
        1, mmap2: 1, comm_exec: 1
        #
      
      We now have a PERF_TYPE_SOFTWARE (type: 1), but the config states 0x6bd,
      which is how, after setting up the event via the kprobes interface, the
      'perf_bpf_probe:fork' event is accessible via the perf_event_open
      syscall. This is all transient, as soon as the 'perf record' session
      ends, these probes will go away.
      
      To see how it looks like, lets try doing a neverending session, one that
      expects a control+C to end:
      
        # perf record --event /tmp/foo.o -a
      
      So, with that in place, we can use 'perf probe' to see what is in place:
      
        # perf probe -l
          perf_bpf_probe:fork  (on _do_fork@acme/git/linux/kernel/fork.c)
      
      We also can use debugfs:
      
        [root@felicio ~]# cat /sys/kernel/debug/tracing/kprobe_events
        p:perf_bpf_probe/fork _text+638512
      
      Ok, now lets stop and see if we got some forks:
      
        [root@felicio linux]# perf record --event /tmp/foo.o -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.325 MB perf.data (111 samples) ]
      
        [root@felicio linux]# perf script
            sshd  1271 [003] 81797.507678: perf_bpf_probe:fork: (ffffffff8109be30)
            sshd 18309 [000] 81797.524917: perf_bpf_probe:fork: (ffffffff8109be30)
            sshd 18309 [001] 81799.381603: perf_bpf_probe:fork: (ffffffff8109be30)
            sshd 18309 [001] 81799.408635: perf_bpf_probe:fork: (ffffffff8109be30)
        <SNIP>
      
      Sure enough, we have 111 forks :-)
      
      Callchains seems to work as well:
      
        # perf report --stdio --no-child
        # To display the perf.data header info, please use --header/--header-only options.
        #
        # Total Lost Samples: 0
        #
        # Samples: 562  of event 'perf_bpf_probe:fork'
        # Event count (approx.): 562
        #
        # Overhead  Command   Shared Object     Symbol
        # ........  ........  ................  ............
        #
            44.66%  sh        [kernel.vmlinux]  [k] _do_fork
                          |
                          ---_do_fork
                             entry_SYSCALL_64_fastpath
                             __libc_fork
                             make_child
      
          26.16%  make      [kernel.vmlinux]  [k] _do_fork
      <SNIP>
        #
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1444826502-49291-7-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4edf30e3
    • W
      perf tools: Load eBPF object into kernel · 1e5e3ee8
      Wang Nan 提交于
      This patch utilizes bpf_object__load() provided by libbpf to load all
      objects into kernel.
      
      Committer notes:
      
      Testing it:
      
      When using an incorrect kernel version number, i.e., having this in your
      eBPF proggie:
      
        int _version __attribute__((section("version"), used)) = 0x40100;
      
      For a 4.3.0-rc6+ kernel, say, this happens and needs checking at event
      parsing time, to provide a better error report to the user:
      
        # perf record --event /tmp/foo.o sleep 1
        libbpf: load bpf program failed: Invalid argument
        libbpf: -- BEGIN DUMP LOG ---
        libbpf:
      
        libbpf: -- END LOG --
        libbpf: failed to load program 'fork=_do_fork'
        libbpf: failed to load object '/tmp/foo.o'
        event syntax error: '/tmp/foo.o'
                             \___ Invalid argument: Are you root and runing a CONFIG_BPF_SYSCALL kernel?
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      If we instead make it match, i.e. use 0x40300 on this v4.3.0-rc6+
      kernel, the whole process goes thru:
      
        # perf record --event /tmp/foo.o -a usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.202 MB perf.data ]
        # perf evlist -v
        /tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period,
        sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, disabled: 1,
        inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1,
        exclude_guest: 1, mmap2: 1, comm_exec: 1
        #
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1444826502-49291-6-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1e5e3ee8
  12. 28 10月, 2015 2 次提交
    • W
      perf tools: Create probe points for BPF programs · aa3abf30
      Wang Nan 提交于
      This patch introduces bpf__{un,}probe() functions to enable callers to
      create kprobe points based on section names a BPF program. It parses the
      section names in the program and creates corresponding 'struct
      perf_probe_event' structures. The parse_perf_probe_command() function is
      used to do the main parsing work. The resuling 'struct perf_probe_event'
      is stored into program private data for further using.
      
      By utilizing the new probing API, this patch creates probe points during
      event parsing.
      
      To ensure probe points be removed correctly, register an atexit hook so
      even perf quit through exit() bpf__clear() is still called, so probing
      points are cleared. Note that bpf_clear() should be registered before
      bpf__probe() is called, so failure of bpf__probe() can still trigger
      bpf__clear() to remove probe points which are already probed.
      
      strerror style error reporting scaffold is created by this patch.
      bpf__strerror_probe() is the first error reporting function in
      bpf-loader.c.
      
      Committer note:
      
      Trying it:
      
      To build a test eBPF object file:
      
      I am testing using a script I built from the 'perf test -v LLVM' output:
      
        $ cat ~/bin/hello-ebpf
        export KERNEL_INC_OPTIONS="-nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.8.3/include -I/home/acme/git/linux/arch/x86/include -Iarch/x86/include/generated/uapi -Iarch/x86/include/generated -I/home/acme/git/linux/include -Iinclude -I/home/acme/git/linux/arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -Iinclude/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h"
        export WORKING_DIR=/lib/modules/4.2.0/build
        export CLANG_SOURCE=-
        export CLANG_OPTIONS=-xc
      
        OBJ=/tmp/foo.o
        rm -f $OBJ
        echo '__attribute__((section("fork=do_fork"), used)) int fork(void *ctx) {return 0;} char _license[] __attribute__((section("license"), used)) = "GPL";int _version __attribute__((section("version"), used)) = 0x40100;' | \
        clang -D__KERNEL__ $CLANG_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf -O2 -o /tmp/foo.o && file $OBJ
      
       ---
      
      First asking to put a probe in a function not present in the kernel
      (misses the initial _):
      
        $ perf record --event /tmp/foo.o sleep 1
        Probe point 'do_fork' not found.
        event syntax error: '/tmp/foo.o'
                             \___ You need to check probing points in BPF file
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
        $
      
       ---
      
      Now, with "__attribute__((section("fork=_do_fork"), used)):
      
       $ grep _do_fork /proc/kallsyms
       ffffffff81099ab0 T _do_fork
       $ perf record --event /tmp/foo.o sleep 1
       Failed to open kprobe_events: Permission denied
       event syntax error: '/tmp/foo.o'
                            \___ Permission denied
      
       ---
      
      Cool, we need to provide some better hints, "kprobe_events" is too low
      level, one doesn't strictly need to know the precise details of how
      these things are put in place, so something that shows the command
      needed to fix the permissions would be more helpful.
      
      Lets try as root instead:
      
        # perf record --event /tmp/foo.o sleep 1
        Lowering default frequency rate to 1000.
        Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.013 MB perf.data ]
        # perf evlist
        /tmp/foo.o
        [root@felicio ~]# perf evlist -v
        /tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period,
        sample_freq }: 1000, sample_type: IP|TID|TIME|PERIOD, disabled: 1,
        inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1,
        sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
      
       ---
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1444826502-49291-5-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      aa3abf30
    • W
      perf ebpf: Add the libbpf glue · 69d262a9
      Wang Nan 提交于
      The 'bpf-loader.[ch]' files are introduced in this patch. Which will be
      the interface between perf and libbpf. bpf__prepare_load() resides in
      bpf-loader.c. Following patches will enrich these two files.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1444826502-49291-3-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      69d262a9