1. 10 May, 2020 23 commits
    • bpf, runqslower: include proper uapi/bpf.h · b4563fac
      Song Liu committed
      runqslower doesn't specify include path for uapi/bpf.h. This causes the
      following warning:
      
      In file included from runqslower.c:10:
      .../tools/testing/selftests/bpf/tools/include/bpf/bpf.h:234:38:
      warning: 'enum bpf_stats_type' declared inside parameter list will not
      be visible outside of this definition or declaration
        234 | LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type);
      
      Fix this by adding -I tools/include/uapi to the Makefile.
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'bpf_iter' · 180139dc
      Alexei Starovoitov committed
      Yonghong Song says:
      
      ====================
      Motivation:
        The current ways to dump kernel data structures are mostly:
          1. /proc system
          2. various specific tools like "ss" which require kernel support.
          3. drgn
        The drawback of the first two is that whenever you want to dump more, you
        need to change the kernel. For example, Martin wants to dump socket local
        storage with "ss". A kernel change is needed for it to work ([1]).
        This is also the direct motivation for this work.

        drgn ([2]) solves this problem nicely and no kernel change is needed.
        But since drgn is not able to verify the validity of a particular pointer value,
        it might present wrong results in rare cases.
      
        In this patch set, we introduce bpf iterator. Initial kernel changes are
        still needed for the kernel data of interest, but a later data structure
        change will not require kernel changes any more. The bpf program itself can
        adapt to new data structure changes. This gives a certain flexibility with
        guaranteed correctness.
      
        In this patch set, kernel seq_ops is used to facilitate iterating through
        kernel data, similar to current /proc and many other lossless kernel
        dumping facilities. In the future, different iterators can be
        implemented to trade off losslessness for other criteria e.g. no
        repeated object visits, etc.
      
      User Interface:
        1. Similar to prog/map/link, the iterator can be pinned into a
           path within a bpffs mount point.
        2. The bpftool command can pin an iterator to a file
               bpftool iter pin <bpf_prog.o> <path>
        3. Use `cat <path>` to dump the contents.
           Use `rm -f <path>` to remove the pinned iterator.
        4. The anonymous iterator can be created as well.
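
      From user space, consuming an iterator amounts to opening the pinned path
      and reading until EOF, which is what `cat <path>` does. The following is a
      minimal sketch under that assumption; drain_fd() and dump_pinned_iter() are
      hypothetical helper names, not part of bpftool or libbpf, and the path
      passed in is whatever was given to `bpftool iter pin`.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Read fd until EOF, like `cat`. If out_fd >= 0, copy the bytes there.
 * Returns the number of bytes read, or -1 on error.
 */
static long drain_fd(int fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n == 0)
            return total;           /* EOF: no more output from this fd */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try the read again */
            return -1;
        }
        /* Copy everything we just read, handling short writes. */
        for (ssize_t off = 0; out_fd >= 0 && off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}

/* Open a pinned iterator path (hypothetical, e.g. on a bpffs mount) and dump it. */
static long dump_pinned_iter(const char *path, int out_fd)
{
    int fd = open(path, O_RDONLY);
    long ret;

    if (fd < 0)
        return -1;
    ret = drain_fd(fd, out_fd);
    close(fd);
    return ret;
}
```

      Calling dump_pinned_iter(path, STDOUT_FILENO) would print the iterator
      output; passing -1 as out_fd only counts the bytes.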
      
        Please see patch #19 and #20 for bpf programs and bpf iterator
        output examples.
      
        Note that certain iterators are namespace aware. For example,
        task and task_file targets only iterate through current pid namespace.
        ipv6_route and netlink will iterate through current net namespace.
      
        Please see individual patches for implementation details.
      
      Performance:
        The bpf iterator provides in-kernel aggregation abilities
        for kernel data. This can greatly improve performance
        compared to e.g., iterating all process directories under /proc.
        For example, I did an experiment on my VM with an application forking
        different numbers of tasks, each forked process opening various numbers
        of files. The following results show the latency in microseconds:
      
          # of forked tasks   # of open files    # of bpf_prog calls  # latency (us)
          100                 100                11503                7586
          1000                1000               1013203              709513
          10000               100                1130203              764519
      
        The number of bpf_prog calls may be more than the number of forked tasks
        multiplied by open files since there are other tasks running on the system.
        The bpf program is a do-nothing program. One million bpf calls take
        less than one second.
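
        The per-call cost implied by the table can be checked with a little
        arithmetic. The sketch below only divides the quoted latency by the
        quoted number of bpf_prog calls; the struct and function names are
        made up for illustration.

```c
#include <stddef.h>

/* One row of the table: bpf_prog invocations and total latency in microseconds. */
struct iter_run {
    unsigned long prog_calls;
    unsigned long latency_us;
};

/* Average cost of one bpf_prog invocation, in microseconds. */
static double us_per_call(const struct iter_run *r)
{
    return (double)r->latency_us / (double)r->prog_calls;
}

/* The three measurements quoted in the table. */
static const struct iter_run table_rows[] = {
    { 11503,   7586 },
    { 1013203, 709513 },
    { 1130203, 764519 },
};
```

        All three rows come out between roughly 0.66 and 0.70 us per call,
        which is consistent with one million calls taking less than one second.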
      
        Although the initial motivation is from Martin's sk_local_storage,
        this patch didn't implement tcp6 sockets and sk_local_storage.
        The /proc/net/tcp6 involves three types of sockets, timewait,
        request and tcp6 sockets. Some kind of type casting or other
        mechanism is needed to handle all these socket types in one
        bpf program. This will be addressed in future work.
      
        Currently, we do not support kernel data generated under module.
        This requires some BTF work.
      
        More work for more iterators, e.g., tcp, udp, bpf_map elements, etc.
      
      Changelog:
        v3 -> v4:
          - in bpf_seq_read(), if start() failed with an error, return that
            error to user space (Andrii)
          - in bpf_seq_printf(), if reading kernel memory failed for
            %s and %p{i,I}{4,6}, set buffer to empty string or address 0.
            Documented this behavior in uapi header (Andrii)
          - fix a few error handling issues for bpftool (Andrii)
          - A few other minor fixes and cosmetic changes.
        v2 -> v3:
          - add bpf_iter_unreg_target() to unregister a target, used in the
            error path of the __init functions.
          - handle err != 0 before handling overflow (Andrii)
          - reference count "task" for task_file target (Andrii)
          - remove some redundancy for bpf_map/task/task_file targets
          - add bpf_iter_unreg_target() in ip6_route_cleanup()
          - Handling "%%" format in bpf_seq_printf() (Andrii)
          - implement auto-attach for bpf_iter in libbpf (Andrii)
          - add macros offsetof and container_of in bpf_helpers.h (Andrii)
          - add tests for auto-attach and program-return-1 cases
          - some other minor fixes
        v1 -> v2:
          - removed target_feature, using callback functions instead
          - checking target to ensure program specified btf_id supported (Martin)
          - link_create change with new changes from Andrii
          - better handling of btf_iter vs. seq_file private data (Martin, Andrii)
          - implemented bpf_seq_read() (Andrii, Alexei)
          - percpu buffer for bpf_seq_printf() (Andrii)
          - better syntax for BPF_SEQ_PRINTF macro (Andrii)
          - bpftool fixes (Quentin)
          - a lot of other fixes
        RFC v2 -> v1:
          - rename bpfdump to bpf_iter
          - use bpffs instead of a new file system
          - use bpf_link to streamline and simplify iterator creation.
      
      References:
        [1]: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com
        [2]: https://github.com/osandov/drgn
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: selftests: Add bpf_iter selftests · 6879c042
      Yonghong Song committed
      The added test includes the following subtests:
        - test verifier change for btf_id_or_null
        - test load/create_iter/read for
          ipv6_route/netlink/bpf_map/task/task_file
        - test anon bpf iterator
        - test anon bpf iterator reading one char at a time
        - test file bpf iterator
        - test overflow (single bpf program output does not overflow)
        - test overflow (single bpf program output overflows)
        - test bpf prog returning 1
      
      The ipv6_route tests the following verifier change
        - access fields in the variable length array of the structure.
      
      The netlink load tests the following verifier change
        - put a btf_id ptr value on the stack and make it accessible to
          tracing/iter programs.
      
      The anon bpf iterator also tests link auto attach through skeleton.
      
        $ test_progs -n 2
        #2/1 btf_id_or_null:OK
        #2/2 ipv6_route:OK
        #2/3 netlink:OK
        #2/4 bpf_map:OK
        #2/5 task:OK
        #2/6 task_file:OK
        #2/7 anon:OK
        #2/8 anon-read-one-char:OK
        #2/9 file:OK
        #2/10 overflow:OK
        #2/11 overflow-e2big:OK
        #2/12 prog-ret-1:OK
        #2 bpf_iter:OK
        Summary: 1/12 PASSED, 0 SKIPPED, 0 FAILED
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175923.2477637-1-yhs@fb.com
    • tools/bpf: selftests: Add iter progs for bpf_map/task/task_file · acf61631
      Yonghong Song committed
      The implementation is arbitrary, just to show how the bpf programs
      can be written for bpf_map/task/task_file. They can be customized
      for specific needs.
      
      For example, for bpf_map, the iterator prints out:
        $ cat /sys/fs/bpf/my_bpf_map
            id   refcnt  usercnt  locked_vm
             3        2        0         20
             6        2        0         20
             9        2        0         20
            12        2        0         20
            13        2        0         20
            16        2        0         20
            19        2        0         20
            %%% END %%%
      
      For task, the iterator prints out:
        $ cat /sys/fs/bpf/my_task
          tgid      gid
             1        1
             2        2
          ....
          1944     1944
          1948     1948
          1949     1949
          1953     1953
          === END ===
      
      For task/file, the iterator prints out:
        $ cat /sys/fs/bpf/my_task_file
          tgid      gid       fd      file
             1        1        0 ffffffff95c97600
             1        1        1 ffffffff95c97600
             1        1        2 ffffffff95c97600
          ....
          1895     1895      255 ffffffff95c8fe00
          1932     1932        0 ffffffff95c8fe00
          1932     1932        1 ffffffff95c8fe00
          1932     1932        2 ffffffff95c8fe00
          1932     1932        3 ffffffff95c185c0
      
      This is able to print out all open files (fd and file->f_op), so the user can
      compare f_op against particular kernel file operations to find what it is.
      For example, from /proc/kallsyms, we can find
        ffffffff95c185c0 r eventfd_fops
      so we will know tgid 1932 fd 3 is an eventfd file descriptor.
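
      The lookup against /proc/kallsyms can be sketched in user space as below.
      This assumes the usual "<hex address> <type> <name>" line layout shown in
      the example above; parse_kallsyms_line() and line_matches_fop() are
      hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/kallsyms line of the form "<hex address> <type> <name>".
 * Returns 1 and fills addr/name if the line is well formed, else 0.
 */
static int parse_kallsyms_line(const char *line, unsigned long long *addr,
                               char *name, size_t name_len)
{
    char type;
    char buf[128];

    if (sscanf(line, "%llx %c %127s", addr, &type, buf) != 3)
        return 0;
    snprintf(name, name_len, "%s", buf);
    return 1;
}

/* Returns 1 if the symbol on this line sits at the given f_op address. */
static int line_matches_fop(const char *line, unsigned long long f_op,
                            char *name, size_t name_len)
{
    unsigned long long addr;

    return parse_kallsyms_line(line, &addr, name, name_len) && addr == f_op;
}
```

      Feeding each kallsyms line and the f_op value printed by the iterator to
      line_matches_fop() yields the symbol name on a match.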
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175922.2477576-1-yhs@fb.com
    • tools/bpf: selftests: Add iterator programs for ipv6_route and netlink · 7c128a6b
      Yonghong Song committed
      Two bpf programs are added in this patch for netlink and ipv6_route
      target. On my VM, I am able to achieve identical
      results compared to /proc/net/netlink and /proc/net/ipv6_route.
      
        $ cat /proc/net/netlink
        sk               Eth Pid        Groups   Rmem     Wmem     Dump  Locks    Drops    Inode
        000000002c42d58b 0   0          00000000 0        0        0     2        0        7
        00000000a4e8b5e1 0   1          00000551 0        0        0     2        0        18719
        00000000e1b1c195 4   0          00000000 0        0        0     2        0        16422
        000000007e6b29f9 6   0          00000000 0        0        0     2        0        16424
        ....
        00000000159a170d 15  1862       00000002 0        0        0     2        0        1886
        000000009aca4bc9 15  3918224839 00000002 0        0        0     2        0        19076
        00000000d0ab31d2 15  1          00000002 0        0        0     2        0        18683
        000000008398fb08 16  0          00000000 0        0        0     2        0        27
        $ cat /sys/fs/bpf/my_netlink
        sk               Eth Pid        Groups   Rmem     Wmem     Dump  Locks    Drops    Inode
        000000002c42d58b 0   0          00000000 0        0        0     2        0        7
        00000000a4e8b5e1 0   1          00000551 0        0        0     2        0        18719
        00000000e1b1c195 4   0          00000000 0        0        0     2        0        16422
        000000007e6b29f9 6   0          00000000 0        0        0     2        0        16424
        ....
        00000000159a170d 15  1862       00000002 0        0        0     2        0        1886
        000000009aca4bc9 15  3918224839 00000002 0        0        0     2        0        19076
        00000000d0ab31d2 15  1          00000002 0        0        0     2        0        18683
        000000008398fb08 16  0          00000000 0        0        0     2        0        27
      
        $ cat /proc/net/ipv6_route
        fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
        fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
        ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        $ cat /sys/fs/bpf/my_ipv6_route
        fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
        fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
        ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@fb.com
    • tools/bpftool: Add bpf_iter support for bpftool · 9406b485
      Yonghong Song committed
      Currently, only one command is supported
        bpftool iter pin <bpf_prog.o> <path>
      
      It will pin the trace/iter bpf program in
      the object file <bpf_prog.o> to the <path>
      where <path> should be on a bpffs mount.
      
      For example,
        $ bpftool iter pin ./bpf_iter_ipv6_route.o \
          /sys/fs/bpf/my_route
      User can then do a `cat` to print out the results:
        $ cat /sys/fs/bpf/my_route
          fe800000000000000000000000000000 40 00000000000000000000000000000000 ...
          00000000000000000000000000000000 00 00000000000000000000000000000000 ...
          00000000000000000000000000000001 80 00000000000000000000000000000000 ...
          fe800000000000008c0162fffebdfd57 80 00000000000000000000000000000000 ...
          ff000000000000000000000000000000 08 00000000000000000000000000000000 ...
          00000000000000000000000000000000 00 00000000000000000000000000000000 ...
      
      The implementation of the ipv6_route iterator is in one of the
      subsequent patches.
      
      This patch also added BPF_LINK_TYPE_ITER to link query.
      
      In the future, we may add additional parameters to the pin command
      by parameterizing the bpf iterator. For example, a map_id or pid
      may be added to let the bpf program traverse only a single map or task,
      similar to kernel seq_file single_open().
      
      We may also add introspection command for targets/iterators by
      leveraging the bpf_iter itself.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200509175920.2477247-1-yhs@fb.com
    • tools/libbpf: Add offsetof/container_of macro in bpf_helpers.h · 5fbc2208
      Yonghong Song committed
      These two helpers will be used later in bpf_iter bpf program
      bpf_iter_netlink.c. Put them in bpf_helpers.h since they could
      be useful in other cases.
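
      For readers unfamiliar with the idiom, here is a minimal user-space sketch
      of what container_of does. It is not the definition added to
      bpf_helpers.h; the macro name container_of_sketch and the struct types are
      made up for illustration, and offsetof comes from <stddef.h>.

```c
#include <stddef.h>   /* offsetof */

/* Recover a pointer to the enclosing struct from a pointer to one of its members. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types for illustration only. */
struct inner {
    int value;
};

struct outer {
    long id;
    struct inner in;    /* embedded member we will start from */
};

/* Given a pointer to the embedded member, step back to the enclosing struct. */
static struct outer *outer_from_inner(struct inner *p)
{
    return container_of_sketch(p, struct outer, in);
}
```

      The same pattern lets a program that is handed a pointer to an embedded
      member reach the fields of the structure that contains it.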
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175919.2477104-1-yhs@fb.com
    • tools/libbpf: Add bpf_iter support · c09add2f
      Yonghong Song committed
      Two new libbpf APIs are added to support bpf_iter:
        - bpf_program__attach_iter
          Given a bpf program and additional parameters, which is
          none now, returns a bpf_link.
        - bpf_iter_create
          syscall level API to create a bpf iterator.
      
      The macro BPF_SEQ_PRINTF is also introduced. The format
      looks like:
        BPF_SEQ_PRINTF(seq, "task id %d\n", pid);
      
      This macro can help bpf program writers with
      nicer bpf_seq_printf syntax similar to the kernel one.
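
      The underlying helper takes its arguments as an array plus a byte size
      rather than as C varargs, so a macro of this kind has to pack the
      arguments first. The following user-space sketch illustrates that packing
      step only. model_seq_printf() is a hypothetical stand-in that records what
      it was given, MODEL_SEQ_PRINTF is not the libbpf macro, and the seq
      argument is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* What the stand-in helper saw on its last call. */
struct seen_call {
    size_t fmt_size;     /* sizeof(fmt), including the trailing NUL */
    size_t num_args;     /* data_len / sizeof(uint64_t) */
    uint64_t first_arg;  /* data[0], or 0 when there are no arguments */
};

static struct seen_call last_call;

/* Hypothetical stand-in taking (fmt, fmt_size, data, data_len). */
static int model_seq_printf(const char *fmt, size_t fmt_size,
                            const uint64_t *data, size_t data_len)
{
    (void)fmt;
    last_call.fmt_size = fmt_size;
    last_call.num_args = data_len / sizeof(uint64_t);
    last_call.first_arg = last_call.num_args ? data[0] : 0;
    return 0;
}

/*
 * Pack the variadic arguments into a u64 array, then pass the array and its
 * size in bytes. Needs a string literal format and at least one argument.
 */
#define MODEL_SEQ_PRINTF(fmt, ...)                                   \
    model_seq_printf((fmt), sizeof(fmt),                             \
                     (const uint64_t[]){ __VA_ARGS__ },              \
                     sizeof((const uint64_t[]){ __VA_ARGS__ }))
```

      With this sketch, MODEL_SEQ_PRINTF("task id %d\n", pid) hands the stand-in
      a one-element array holding pid widened to 64 bits.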
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175917.2476936-1-yhs@fb.com
    • bpf: Support variable length array in tracing programs · 9c5f8a10
      Yonghong Song committed
      In /proc/net/ipv6_route, we have
        struct fib6_info {
          struct fib6_table *fib6_table;
          ...
          struct fib6_nh fib6_nh[0];
        };
        struct fib6_nh {
          struct fib_nh_common nh_common;
          struct rt6_info **rt6i_pcpu;
          struct rt6_exception_bucket *rt6i_exception_bucket;
        };
        struct fib_nh_common {
          ...
          u8 nhc_gw_family;
          ...
        };

      The access:
        struct fib6_nh *fib6_nh = &rt->fib6_nh[0];
        ... fib6_nh->nh_common.nhc_gw_family ...
      
      This patch ensures such an access is handled properly.
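
      The access pattern itself is ordinary C and can be tried in user space.
      The sketch below uses reduced, hypothetical stand-ins for the kernel
      structures (the *_model names are not kernel types) and a C99 flexible
      array member in place of the [0] array; it says nothing about how the
      verifier handles the access.

```c
#include <stdlib.h>

/* Reduced, hypothetical stand-ins for the kernel structures quoted above. */
struct nh_common_model {
    unsigned char nhc_gw_family;
};

struct nh_model {
    struct nh_common_model nh_common;
};

struct info_model {
    int table_id;
    struct nh_model nh[];   /* trailing array; its storage follows the struct */
};

/* Allocate the struct plus zeroed room for n trailing elements. */
static struct info_model *info_alloc(size_t n)
{
    return calloc(1, sizeof(struct info_model) + n * sizeof(struct nh_model));
}

/* Take the first trailing element, then read a nested field through it. */
static unsigned char first_gw_family(struct info_model *rt)
{
    struct nh_model *nh = &rt->nh[0];

    return nh->nh_common.nhc_gw_family;
}
```

      The trailing array contributes no size of its own to the struct, so
      element 0 starts at the end of the fixed part, and the nested field is
      read at an offset beyond sizeof of the enclosing type.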
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175916.2476853-1-yhs@fb.com
    • bpf: Handle spilled PTR_TO_BTF_ID properly when checking stack_boundary · 1d68f22b
      Yonghong Song committed
      This is specifically to handle a case like the one below:
         // ptr below is a socket ptr identified by PTR_TO_BTF_ID
         u64 param[2] = { ptr, val };
         bpf_seq_printf(seq, fmt, sizeof(fmt), param, sizeof(param));
      
      In this case, the 16 bytes stack for "param" contains:
         8 bytes for ptr with spilled PTR_TO_BTF_ID
         8 bytes for val as STACK_MISC
      
      The current verifier will complain that the ptr should not be visible
      to the helper.
         ...
         16: (7b) *(u64 *)(r10 -64) = r2
         18: (7b) *(u64 *)(r10 -56) = r1
         19: (bf) r4 = r10
         ;
         20: (07) r4 += -64
         ; BPF_SEQ_PRINTF(seq, fmt1, (long)s, s->sk_protocol);
         21: (bf) r1 = r6
         22: (18) r2 = 0xffffa8d00018605a
         24: (b4) w3 = 10
         25: (b4) w5 = 16
         26: (85) call bpf_seq_printf#125
          R0=inv(id=0) R1_w=ptr_seq_file(id=0,off=0,imm=0)
          R2_w=map_value(id=0,off=90,ks=4,vs=144,imm=0) R3_w=inv10
          R4_w=fp-64 R5_w=inv16 R6=ptr_seq_file(id=0,off=0,imm=0)
          R7=ptr_netlink_sock(id=0,off=0,imm=0) R10=fp0 fp-56_w=mmmmmmmm
          fp-64_w=ptr_
         last_idx 26 first_idx 13
         regs=8 stack=0 before 25: (b4) w5 = 16
         regs=8 stack=0 before 24: (b4) w3 = 10
         invalid indirect read from stack off -64+0 size 16
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175915.2476783-1-yhs@fb.com
    • bpf: Add bpf_seq_printf and bpf_seq_write helpers · 492e639f
      Yonghong Song committed
      Two helpers, bpf_seq_printf and bpf_seq_write, are added for
      writing data to the seq_file buffer.
      
      bpf_seq_printf supports common format string flag/width/type
      fields so at least I can get identical results for
      netlink and ipv6_route targets.
      
      For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW
      specifically indicates a write failure due to overflow, which
      means the object will be repeated in the next bpf invocation
      if the object collection stays the same. Note that if the object
      collection is changed, depending on how collection traversal is
      done, even if the object is still in the collection, it may not
      be visited.
      
      For bpf_seq_printf, the formats %s and %p{i,I}{4,6} need to
      read kernel memory. Reading kernel memory may fail in
      the following two cases:
        - invalid kernel address, or
        - valid kernel address but requiring a major fault
      If reading kernel memory failed, the %s string will be
      an empty string and %p{i,I}{4,6} will be all 0.
      Not returning error to bpf program is consistent with
      what bpf_trace_printk() does for now.
      
      bpf_seq_printf may return -EBUSY, meaning that the internal percpu
      buffer for memory copy of strings or other pointees is
      not available. The bpf program can return 1 to indicate it
      wants the same object to be repeated. Right now, this should not
      happen on non-RT kernels since migrate_disable(), which guards
      the bpf prog call, calls preempt_disable().
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175914.2476661-1-yhs@fb.com
    • bpf: Add PTR_TO_BTF_ID_OR_NULL support · b121b341
      Yonghong Song committed
      Add bpf_reg_type PTR_TO_BTF_ID_OR_NULL support.
      For tracing/iter program, the bpf program context
      definition, e.g., for previous bpf_map target, looks like
        struct bpf_iter__bpf_map {
          struct bpf_iter_meta *meta;
          struct bpf_map *map;
        };
      
      The kernel guarantees that meta is not NULL, but the
      map pointer may be NULL. A NULL map indicates that all
      objects have been traversed, so the bpf program can take
      proper action, e.g., do final aggregation and/or send a
      final report to user space.
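
      The "NULL means end of traversal" convention can be modelled in plain C.
      This is a user-space sketch, not a bpf program: the *_model types,
      aggregate_prog() and run_model() are hypothetical, and max_entries is
      used only as an example of a per-map value worth aggregating.

```c
#include <stddef.h>

/* Hypothetical stand-in for one object the iterator visits. */
struct map_model {
    unsigned int id;
    unsigned int max_entries;
};

/* Context handed to the program; map is NULL once all objects were visited. */
struct iter_ctx_model {
    const struct map_model *map;
};

/* Running totals kept across invocations. */
struct totals {
    unsigned long maps;
    unsigned long entries;
    int finished;            /* set when the final NULL call is seen */
};

/* Per-object program: aggregate, and finalize on the NULL object. */
static int aggregate_prog(const struct iter_ctx_model *ctx, struct totals *t)
{
    if (ctx->map == NULL) {
        t->finished = 1;     /* place for a final report */
        return 0;
    }
    t->maps++;
    t->entries += ctx->map->max_entries;
    return 0;
}

/* Driver: one call per object, then one last call with a NULL object. */
static void run_model(const struct map_model *maps, size_t n, struct totals *t)
{
    struct iter_ctx_model ctx;
    size_t i;

    for (i = 0; i < n; i++) {
        ctx.map = &maps[i];
        aggregate_prog(&ctx, t);
    }
    ctx.map = NULL;
    aggregate_prog(&ctx, t);
}
```

      A program that skipped the NULL check would dereference a NULL pointer on
      the final call, which is why the pointer needs a distinct "or NULL" type.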
      
      Add btf_id_or_null_non0_off to the prog->aux structure to
      indicate that if the context access offset is not 0, the register
      type is set to PTR_TO_BTF_ID_OR_NULL instead of PTR_TO_BTF_ID.
      This bit is set for tracing/iter programs.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175912.2476576-1-yhs@fb.com
    • bpf: Add task and task/file iterator targets · eaaacd23
      Yonghong Song committed
      Only the tasks belonging to "current" pid namespace
      are enumerated.
      
      For task/file target, the bpf program will have access to
        struct task_struct *task
        u32 fd
        struct file *file
      where fd/file is an open file for the task.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175911.2476407-1-yhs@fb.com
    • net: bpf: Add netlink and ipv6_route bpf_iter targets · 138d0be3
      Yonghong Song committed
      This patch added netlink and ipv6_route targets, using
      the same seq_ops (except show() and minor changes for stop())
      for /proc/net/{netlink,ipv6_route}.
      
      The net namespace for these targets is the current net
      namespace at file open stage, similar to
      /proc/net/{netlink,ipv6_route} reference counting
      the net namespace at seq_file open stage.

      Since modules are not supported for now, ipv6_route is
      supported only if IPV6 is built in, i.e., not compiled
      as a module. The restriction can be lifted once modules
      are properly supported for bpf_iter.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com
    • bpf: Add bpf_map iterator · 6086d29d
      Yonghong Song committed
      Implement seq_file operations to traverse all bpf_maps.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175909.2476096-1-yhs@fb.com
    • bpf: Implement common macros/helpers for target iterators · e5158d98
      Yonghong Song committed
      Macro DEFINE_BPF_ITER_FUNC is implemented so target
      can define an init function to capture the BTF type
      which represents the target.
      
      The bpf_iter_meta is a structure holding meta data, common
      to all targets in the bpf program.
      
      Additional marker functions are called before or after
      bpf_seq_read() show()/next()/stop() callback functions
      to help calculate the precise seq_num and decide whether to
      call bpf_prog inside stop().
      
      Two functions, bpf_iter_get_info() and bpf_iter_run_prog(),
      are implemented so target can get needed information from
      bpf_iter infrastructure and can run the program.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175907.2475956-1-yhs@fb.com
    • bpf: Create file bpf iterator · 367ec3e4
      Yonghong Song committed
      To produce a file bpf iterator, the fd must
      correspond to a link_fd associated with a
      trace/iter program. When the pinned file is
      opened, a seq_file will be generated.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175906.2475893-1-yhs@fb.com
    • bpf: Create anonymous bpf iterator · ac51d99b
      Yonghong Song committed
      A new bpf command BPF_ITER_CREATE is added.
      
      The anonymous bpf iterator is seq_file based.
      The seq_file private data are referenced by targets.
      The bpf_iter infrastructure allocated additional space
      at seq_file->private before the space used by targets
      to store some meta data, e.g.,
        prog:       prog to run
        session_id: a unique id for each opened seq_file
        seq_num:    how many times bpf programs are queried in this session
        done_stop:  an internal state to decide whether bpf program
                    should be called in seq_ops->stop() or not
      
      The seq_num will start from 0 for valid objects.
      The bpf program may see the same seq_num more than once if
       - seq_file buffer overflow happens and the same object
         is retried by bpf_seq_read(), or
       - the bpf program explicitly requests a retry of the
         same object
      
      Since modules are not supported for bpf_iter, all target
      registration happens at __init time, so there is no
      need to change bpf_iter_unreg_target(), as it is used
      mostly in the error path of the init function, at which time
      no bpf iterators have been created yet.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175905.2475770-1-yhs@fb.com
    • bpf: Implement bpf_seq_read() for bpf iterator · fd4f12bc
      Yonghong Song committed
      bpf iterator uses seq_file to provide a lossless
      way to transfer data to user space. But we want to call
      bpf program after all objects have been traversed, and
      bpf program may write additional data to the
      seq_file buffer. The current seq_read() does not work
      for this use case.
      
      Besides allowing the stop() function to write to the buffer,
      bpf_seq_read() also fixes the buffer size to one page.
      If any single call of show() or stop() emits more than
      one page of data and so overflows the buffer, the -E2BIG
      error code is returned to user space.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175904.2475468-1-yhs@fb.com
    • bpf: Support bpf tracing/iter programs for BPF_LINK_UPDATE · 2057c92b
      Yonghong Song committed
      Added BPF_LINK_UPDATE support for tracing/iter programs.
      This way, a file based bpf iterator, which holds a reference
      to the link, can have its bpf program updated without
      creating new files.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175902.2475262-1-yhs@fb.com
    • bpf: Support bpf tracing/iter programs for BPF_LINK_CREATE · de4e05ca
      Yonghong Song committed
      Given a bpf program, the steps to create an anonymous bpf iterator are:
        - create a bpf_iter_link, which combines bpf program and the target.
          In the future, there could be more information recorded in the link.
          A link_fd will be returned to the user space.
        - create an anonymous bpf iterator with the given link_fd.
      
      The bpf_iter_link can be pinned to bpffs mount file system to
      create a file based bpf iterator as well.
      
      The benefits of using bpf_iter_link:
        - using bpf link simplifies design and implementation as bpf link
          is used for other tracing bpf programs.
        - for file based bpf iterator, bpf_iter_link provides a standard
          way to replace underlying bpf programs.
        - for both anonymous and file based iterators, bpf link query
          capability can be leveraged.
      
      The patch added support of tracing/iter programs for BPF_LINK_CREATE.
      A new link type BPF_LINK_TYPE_ITER is added to facilitate link
      querying. Currently, only prog_id is needed, so no
      additional in-kernel show_fdinfo() or fill_link_info() hook
      is needed for the BPF_LINK_TYPE_ITER link.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com
    • bpf: Allow loading of a bpf_iter program · 15d83c4d
      Yonghong Song committed
      A bpf_iter program is a tracing program with attach type
      BPF_TRACE_ITER. The load attribute
        attach_btf_id
      is used by the verifier against a particular kernel function,
      which represents a target, e.g., __bpf_iter__bpf_map
      for target bpf_map which is implemented later.
      
      The program return value must be 0 or 1 for now.
        0 : successful, except potential seq_file buffer overflow
            which is handled by seq_file reader.
        1 : request to restart the same object
      
      In the future, other return values may be used for filtering or
      terminating the iterator.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com
    • bpf: Implement an interface to register bpf_iter targets · ae24345d
      Yonghong Song committed
      The target can call bpf_iter_reg_target() to register itself.
      The needed information:
        target:            target name
        seq_ops:           the seq_file operations for the target
        init_seq_private:  target callback to initialize seq_priv during file open
        fini_seq_private:  target callback to clean up seq_priv during file release
        seq_priv_size:     the private_data size needed by the seq_file
                           operations
      
      The target name represents a target which provides a seq_ops
      for iterating objects.
      
      The target can provide two callback functions, init_seq_private
      and fini_seq_private, called during file open/release time.
      For example, for /proc/net/{tcp6, ipv6_route, netlink, ...}, the net
      namespace needs to be set up properly during file open and
      released properly during file release.
      
      Function bpf_iter_unreg_target() is also implemented to unregister
      a particular target.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175859.2474669-1-yhs@fb.com
  2. 09 May, 2020 4 commits
  3. 07 May, 2020 2 commits
  4. 06 May, 2020 6 commits
  5. 05 May, 2020 3 commits
  6. 04 May, 2020 2 commits