1. 10 May, 2020 19 commits
    • Y
      tools/bpf: selftests: Add iterator programs for ipv6_route and netlink · 7c128a6b
      Yonghong Song committed
      Two bpf programs are added in this patch for the netlink and ipv6_route
      targets. On my VM, they produce results identical to
      /proc/net/netlink and /proc/net/ipv6_route.
      
        $ cat /proc/net/netlink
        sk               Eth Pid        Groups   Rmem     Wmem     Dump  Locks    Drops    Inode
        000000002c42d58b 0   0          00000000 0        0        0     2        0        7
        00000000a4e8b5e1 0   1          00000551 0        0        0     2        0        18719
        00000000e1b1c195 4   0          00000000 0        0        0     2        0        16422
        000000007e6b29f9 6   0          00000000 0        0        0     2        0        16424
        ....
        00000000159a170d 15  1862       00000002 0        0        0     2        0        1886
        000000009aca4bc9 15  3918224839 00000002 0        0        0     2        0        19076
        00000000d0ab31d2 15  1          00000002 0        0        0     2        0        18683
        000000008398fb08 16  0          00000000 0        0        0     2        0        27
        $ cat /sys/fs/bpf/my_netlink
        sk               Eth Pid        Groups   Rmem     Wmem     Dump  Locks    Drops    Inode
        000000002c42d58b 0   0          00000000 0        0        0     2        0        7
        00000000a4e8b5e1 0   1          00000551 0        0        0     2        0        18719
        00000000e1b1c195 4   0          00000000 0        0        0     2        0        16422
        000000007e6b29f9 6   0          00000000 0        0        0     2        0        16424
        ....
        00000000159a170d 15  1862       00000002 0        0        0     2        0        1886
        000000009aca4bc9 15  3918224839 00000002 0        0        0     2        0        19076
        00000000d0ab31d2 15  1          00000002 0        0        0     2        0        18683
        000000008398fb08 16  0          00000000 0        0        0     2        0        27
      
        $ cat /proc/net/ipv6_route
        fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
        fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
        ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        $ cat /sys/fs/bpf/my_ipv6_route
        fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
        fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
        ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@fb.com
      7c128a6b
    • Y
      tools/bpftool: Add bpf_iter support for bpftool · 9406b485
      Yonghong Song committed
      Currently, only one command is supported:
        bpftool iter pin <bpf_prog.o> <path>
      
      It will pin the trace/iter bpf program in
      the object file <bpf_prog.o> to the <path>
      where <path> should be on a bpffs mount.
      
      For example,
        $ bpftool iter pin ./bpf_iter_ipv6_route.o \
          /sys/fs/bpf/my_route
      The user can then `cat` the pinned file to print the results:
        $ cat /sys/fs/bpf/my_route
          fe800000000000000000000000000000 40 00000000000000000000000000000000 ...
          00000000000000000000000000000000 00 00000000000000000000000000000000 ...
          00000000000000000000000000000001 80 00000000000000000000000000000000 ...
          fe800000000000008c0162fffebdfd57 80 00000000000000000000000000000000 ...
          ff000000000000000000000000000000 08 00000000000000000000000000000000 ...
          00000000000000000000000000000000 00 00000000000000000000000000000000 ...
      
      The implementation of the ipv6_route iterator is in one of the
      subsequent patches.
      
      This patch also adds BPF_LINK_TYPE_ITER to link query.
      
      In the future, we may add additional parameters to the pin command
      by parameterizing the bpf iterator. For example, a map_id or pid
      may be added to let the bpf program traverse only a single map or task,
      similar to kernel seq_file single_open().
      
      We may also add an introspection command for targets/iterators by
      leveraging the bpf_iter itself.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200509175920.2477247-1-yhs@fb.com
      9406b485
    • Y
      tools/libbpf: Add offsetof/container_of macro in bpf_helpers.h · 5fbc2208
      Yonghong Song committed
      These two helpers will be used later in the bpf_iter bpf program
      bpf_iter_netlink.c. Put them in bpf_helpers.h since they could
      be useful in other cases.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175919.2477104-1-yhs@fb.com
      5fbc2208
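The role of these two macros can be illustrated with a small userspace sketch. The macro body below is the common kernel-style form and may differ in detail from what bpf_helpers.h ships; `mock_sock`, `mock_netlink_sock`, and `to_netlink_sock()` are hypothetical stand-ins, not kernel types.

```c
#include <assert.h>
#include <stddef.h>

/* Common kernel-style definition; an assumption, not copied from bpf_helpers.h. */
#ifndef container_of
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#endif

/* Hypothetical stand-ins for a generic socket and a netlink socket. */
struct mock_sock {
	int sk_protocol;
};

struct mock_netlink_sock {
	unsigned int portid;
	struct mock_sock sk;	/* embedded generic socket */
};

/* Recover the enclosing netlink socket from a pointer to its embedded sk. */
static struct mock_netlink_sock *to_netlink_sock(struct mock_sock *sk)
{
	assert(sk != NULL);
	return container_of(sk, struct mock_netlink_sock, sk);
}
```

Given a pointer to the embedded `sk` member, `to_netlink_sock()` subtracts the member offset to reach the enclosing structure, which is the pattern an iterator program needs when the target hands it only the generic socket.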
    • Y
      tools/libbpf: Add bpf_iter support · c09add2f
      Yonghong Song committed
      Two new libbpf APIs are added to support bpf_iter:
        - bpf_program__attach_iter
          Given a bpf program and additional parameters, of which
          there are none for now, returns a bpf_link.
        - bpf_iter_create
          syscall level API to create a bpf iterator.
      
      The macro BPF_SEQ_PRINTF is also introduced. The format
      looks like:
        BPF_SEQ_PRINTF(seq, "task id %d\n", pid);
      
      This macro gives bpf program writers a nicer
      bpf_seq_printf syntax, similar to the kernel one.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175917.2476936-1-yhs@fb.com
      c09add2f
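How a printf-style macro can sit on top of a helper that takes a format string plus a packed argument array is sketched below in userspace C. This is only an illustration of the packing idea: `MOCK_SEQ_PRINTF`, `mock_seq_printf()`, and `struct mock_call` are hypothetical names, the mock records its inputs instead of formatting anything, and the macro relies on GNU statement expressions (GCC/Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Records what the last mock helper call received. */
struct mock_call {
	uint32_t fmt_size;	/* bytes in the format string, including NUL */
	uint32_t data_len;	/* bytes in the packed argument array */
	uint64_t args[4];
};

static struct mock_call last_call;

/* Hypothetical stand-in for a helper taking (fmt, fmt_size, data, data_len). */
static int mock_seq_printf(const char *fmt, uint32_t fmt_size,
			   const uint64_t *data, uint32_t data_len)
{
	uint32_t n = data_len / sizeof(uint64_t);

	assert(fmt != NULL && n <= 4);
	last_call.fmt_size = fmt_size;
	last_call.data_len = data_len;
	for (uint32_t i = 0; i < n; i++)
		last_call.args[i] = data[i];
	return 0;
}

/* Pack every argument into a 64-bit slot, then make one helper call. */
#define MOCK_SEQ_PRINTF(fmt, args...)				\
	({							\
		static const char ___fmt[] = fmt;		\
		uint64_t ___param[] = { args };			\
		mock_seq_printf(___fmt, sizeof(___fmt),		\
				___param, sizeof(___param));	\
	})
```

Calling `MOCK_SEQ_PRINTF("task id %d\n", pid)` passes a 12-byte format (including the terminating NUL) and one 8-byte argument slot to the mock.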
    • Y
      bpf: Support variable length array in tracing programs · 9c5f8a10
      Yonghong Song committed
      In /proc/net/ipv6_route, we have
        struct fib6_info {
          struct fib6_table *fib6_table;
          ...
          struct fib6_nh fib6_nh[0];
        };
        struct fib6_nh {
          struct fib_nh_common nh_common;
          struct rt6_info **rt6i_pcpu;
          struct rt6_exception_bucket *rt6i_exception_bucket;
        };
        struct fib_nh_common {
          ...
          u8 nhc_gw_family;
          ...
        };
      
      The access:
        struct fib6_nh *fib6_nh = &rt->fib6_nh;
        ... fib6_nh->nh_common.nhc_gw_family ...
      
      This patch ensures such an access is handled properly.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175916.2476853-1-yhs@fb.com
      9c5f8a10
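The access pattern in question, reaching a field through a trailing variable-length array, looks like this in plain userspace C. The `mock_*` structures are simplified, hypothetical stand-ins for the kernel ones and use a C99 flexible array member; the sketch says nothing about how the verifier itself handles the access.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct mock_nh_common {
	unsigned char nhc_gw_family;
};

struct mock_fib6_nh {
	struct mock_nh_common nh_common;
};

struct mock_fib6_info {
	int fib6_metric;
	struct mock_fib6_nh fib6_nh[];	/* trailing variable-length array */
};

/* Allocate a route with room for exactly one trailing nexthop. */
static struct mock_fib6_info *alloc_route(unsigned char gw_family)
{
	struct mock_fib6_info *rt;

	rt = calloc(1, sizeof(*rt) + sizeof(rt->fib6_nh[0]));
	if (!rt)
		return NULL;
	rt->fib6_nh[0].nh_common.nhc_gw_family = gw_family;
	return rt;
}

/* Read a field of the first nexthop through the trailing array. */
static unsigned char route_gw_family(const struct mock_fib6_info *rt)
{
	const struct mock_fib6_nh *nh;

	assert(rt != NULL);
	nh = rt->fib6_nh;	/* the array decays to its first element */
	return nh->nh_common.nhc_gw_family;
}
```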
    • Y
      bpf: Handle spilled PTR_TO_BTF_ID properly when checking stack_boundary · 1d68f22b
      Yonghong Song committed
      This is specifically to handle a case like the one below:
         // ptr below is a socket ptr identified by PTR_TO_BTF_ID
         u64 param[2] = { ptr, val };
         bpf_seq_printf(seq, fmt, sizeof(fmt), param, sizeof(param));
      
      In this case, the 16 bytes stack for "param" contains:
         8 bytes for ptr with spilled PTR_TO_BTF_ID
         8 bytes for val as STACK_MISC
      
      The current verifier complains that the ptr should not be visible
      to the helper:
         ...
         16: (7b) *(u64 *)(r10 -64) = r2
         18: (7b) *(u64 *)(r10 -56) = r1
         19: (bf) r4 = r10
         ;
         20: (07) r4 += -64
         ; BPF_SEQ_PRINTF(seq, fmt1, (long)s, s->sk_protocol);
         21: (bf) r1 = r6
         22: (18) r2 = 0xffffa8d00018605a
         24: (b4) w3 = 10
         25: (b4) w5 = 16
         26: (85) call bpf_seq_printf#125
          R0=inv(id=0) R1_w=ptr_seq_file(id=0,off=0,imm=0)
          R2_w=map_value(id=0,off=90,ks=4,vs=144,imm=0) R3_w=inv10
          R4_w=fp-64 R5_w=inv16 R6=ptr_seq_file(id=0,off=0,imm=0)
          R7=ptr_netlink_sock(id=0,off=0,imm=0) R10=fp0 fp-56_w=mmmmmmmm
          fp-64_w=ptr_
         last_idx 26 first_idx 13
         regs=8 stack=0 before 25: (b4) w5 = 16
         regs=8 stack=0 before 24: (b4) w3 = 10
         invalid indirect read from stack off -64+0 size 16
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175915.2476783-1-yhs@fb.com
      1d68f22b
    • Y
      bpf: Add bpf_seq_printf and bpf_seq_write helpers · 492e639f
      Yonghong Song committed
      Two helpers, bpf_seq_printf and bpf_seq_write, are added for
      writing data to the seq_file buffer.
      
      bpf_seq_printf supports common format string flag/width/type
      fields so at least I can get identical results for
      netlink and ipv6_route targets.
      
      For bpf_seq_printf and bpf_seq_write, the return value -EOVERFLOW
      specifically indicates a write failure due to overflow, which
      means the object will be repeated in the next bpf invocation
      if the object collection stays the same. Note that if the object
      collection has changed, then depending on how collection traversal is
      done, the object may not be visited even if it is still in the
      collection.
      
      For bpf_seq_printf, the formats %s and %p{i,I}{4,6} need to
      read kernel memory. Reading kernel memory may fail in
      the following two cases:
        - invalid kernel address, or
        - valid kernel address but requiring a major fault
      If reading kernel memory fails, the %s string will be
      an empty string and %p{i,I}{4,6} will be all 0.
      Not returning an error to the bpf program is consistent with
      what bpf_trace_printk() does for now.
      
      bpf_seq_printf may return -EBUSY, meaning that the internal percpu
      buffer for memory copy of strings or other pointees is
      not available. The bpf program can return 1 to indicate it
      wants the same object to be repeated. Right now, this should not
      happen on non-RT kernels since migrate_disable(), which guards
      the bpf prog call, calls preempt_disable().
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175914.2476661-1-yhs@fb.com
      492e639f
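One way a program could map these helper results to its own return value is sketched below. It follows the description above (return 1 to ask for the same object again on -EBUSY, otherwise return 0 and let the seq_file reader deal with -EOVERFLOW); `iter_retval_for()` and the `ITER_*` names are hypothetical, and this is a userspace illustration rather than code from the patch.

```c
#include <assert.h>
#include <errno.h>

enum {
	ITER_OK = 0,	/* done with this object */
	ITER_RETRY = 1,	/* ask for the same object to be repeated */
};

/* Map a seq helper's return value to the iterator program's return value. */
static int iter_retval_for(long helper_ret)
{
	assert(helper_ret <= 0);	/* 0 on success, negative errno on failure */

	/* Scratch buffer unavailable: explicitly request the same object again. */
	if (helper_ret == -EBUSY)
		return ITER_RETRY;

	/*
	 * Success, or -EOVERFLOW: an overflow is retried by the seq_file
	 * reader, so the program itself has nothing more to request.
	 */
	return ITER_OK;
}
```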
    • Y
      bpf: Add PTR_TO_BTF_ID_OR_NULL support · b121b341
      Yonghong Song committed
      Add bpf_reg_type PTR_TO_BTF_ID_OR_NULL support.
      For tracing/iter program, the bpf program context
      definition, e.g., for previous bpf_map target, looks like
        struct bpf_iter__bpf_map {
          struct bpf_iter_meta *meta;
          struct bpf_map *map;
        };
      
      The kernel guarantees that meta is not NULL, but the
      map pointer may be NULL. A NULL map indicates that all
      objects have been traversed, so the bpf program can take
      proper action, e.g., do final aggregation and/or send a
      final report to user space.
      
      Add btf_id_or_null_non0_off to the prog->aux structure to
      indicate that a context access at a non-zero offset
      yields PTR_TO_BTF_ID_OR_NULL instead of PTR_TO_BTF_ID.
      This bit is set for tracing/iter programs.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175912.2476576-1-yhs@fb.com
      b121b341
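The NULL-map convention suggests a program shape like the following userspace sketch: aggregate while the map pointer is valid, then emit a final report when it is NULL. All `mock_*` types, `struct totals`, and `visit_map()` are hypothetical; a real iterator program would write its report to the seq_file rather than set a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the iterator context and its members. */
struct mock_map {
	unsigned int id;
	unsigned int max_entries;
};

struct mock_iter_meta {
	unsigned long long seq_num;
};

struct mock_iter_ctx {
	struct mock_iter_meta *meta;	/* never NULL */
	struct mock_map *map;		/* NULL once traversal is complete */
};

struct totals {
	unsigned long long maps;
	unsigned long long entries;
	int reported;			/* set when the final report is produced */
};

/* One invocation per object, plus a final invocation with map == NULL. */
static int visit_map(struct mock_iter_ctx *ctx, struct totals *t)
{
	assert(ctx != NULL && ctx->meta != NULL && t != NULL);

	if (ctx->map == NULL) {
		/* All objects traversed: do the final aggregation/report. */
		t->reported = 1;
		return 0;
	}

	t->maps++;
	t->entries += ctx->map->max_entries;
	return 0;
}
```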
    • Y
      bpf: Add task and task/file iterator targets · eaaacd23
      Yonghong Song committed
      Only the tasks belonging to the "current" pid namespace
      are enumerated.
      
      For task/file target, the bpf program will have access to
        struct task_struct *task
        u32 fd
        struct file *file
      where fd/file is an open file for the task.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175911.2476407-1-yhs@fb.com
      eaaacd23
    • Y
      net: bpf: Add netlink and ipv6_route bpf_iter targets · 138d0be3
      Yonghong Song committed
      This patch adds netlink and ipv6_route targets, using
      the same seq_ops (except show() and minor changes for stop())
      as /proc/net/{netlink,ipv6_route}.
      
      The net namespace for these targets is the current net
      namespace at the file open stage, similar to
      /proc/net/{netlink,ipv6_route} reference counting
      the net namespace at the seq_file open stage.
      
      Since modules are not supported for now, ipv6_route is
      supported only if IPV6 is built-in, i.e., not compiled
      as a module. The restriction can be lifted once modules
      are properly supported for bpf_iter.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com
      138d0be3
    • Y
      bpf: Add bpf_map iterator · 6086d29d
      Yonghong Song committed
      Implement seq_file operations to traverse all bpf_maps.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175909.2476096-1-yhs@fb.com
      6086d29d
    • Y
      bpf: Implement common macros/helpers for target iterators · e5158d98
      Yonghong Song committed
      Macro DEFINE_BPF_ITER_FUNC is implemented so a target
      can define an init function to capture the BTF type
      which represents the target.
      
      The bpf_iter_meta is a structure holding meta data common
      to all targets in the bpf program.
      
      Additional marker functions are called before or after the
      bpf_seq_read() show()/next()/stop() callback functions
      to help calculate a precise seq_num and decide whether to call
      bpf_prog inside stop().
      
      Two functions, bpf_iter_get_info() and bpf_iter_run_prog(),
      are implemented so a target can get needed information from the
      bpf_iter infrastructure and can run the program.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175907.2475956-1-yhs@fb.com
      e5158d98
    • Y
      bpf: Create file bpf iterator · 367ec3e4
      Yonghong Song committed
      To produce a file bpf iterator, the fd must
      correspond to a link_fd associated with a
      trace/iter program. When the pinned file is
      opened, a seq_file will be generated.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175906.2475893-1-yhs@fb.com
      367ec3e4
    • Y
      bpf: Create anonymous bpf iterator · ac51d99b
      Yonghong Song committed
      A new bpf command BPF_ITER_CREATE is added.
      
      The anonymous bpf iterator is seq_file based.
      The seq_file private data are referenced by targets.
      The bpf_iter infrastructure allocates additional space
      at seq_file->private before the space used by targets
      to store some meta data, e.g.,
        prog:       prog to run
        session_id: a unique id for each opened seq_file
        seq_num:    how many times bpf programs are queried in this session
        done_stop:  an internal state to decide whether the bpf program
                    should be called in seq_ops->stop() or not
      
      The seq_num will start from 0 for valid objects.
      The bpf program may see the same seq_num more than once if
       - seq_file buffer overflow happens and the same object
         is retried by bpf_seq_read(), or
       - the bpf program explicitly requests a retry of the
         same object
      
      Since modules are not supported for bpf_iter, all target
      registration happens at __init time, so there is no
      need to change bpf_iter_unreg_target() as it is used
      mostly in the error path of the init function, at which time
      no bpf iterators have been created yet.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175905.2475770-1-yhs@fb.com
      ac51d99b
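The "meta data placed before the target's private area" layout can be sketched in userspace as below. The structure name, the reduced field set, and both helper functions are hypothetical; the in-kernel layout may differ. The alignment attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: infrastructure meta data first, target area after. */
struct mock_iter_priv {
	uint64_t session_id;
	uint64_t seq_num;
	int done_stop;
	unsigned char target_private[] __attribute__((aligned(8)));
};

/* Allocate both areas at once; hand only the target area to the target. */
static void *iter_priv_alloc(size_t target_size, uint64_t session_id)
{
	struct mock_iter_priv *p = calloc(1, sizeof(*p) + target_size);

	if (!p)
		return NULL;
	p->session_id = session_id;
	return p->target_private;
}

/* Step back from the target area to the enclosing meta data. */
static struct mock_iter_priv *iter_priv_of(void *target_private)
{
	assert(target_private != NULL);
	return (struct mock_iter_priv *)((char *)target_private -
			offsetof(struct mock_iter_priv, target_private));
}
```

With this arrangement the target's seq_ops keep using their private pointer unchanged, while the infrastructure can still reach `seq_num` and `session_id` from the same pointer.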
    • Y
      bpf: Implement bpf_seq_read() for bpf iterator · fd4f12bc
      Yonghong Song committed
      The bpf iterator uses seq_file to provide a lossless
      way to transfer data to user space. But we want to call the
      bpf program after all objects have been traversed, and the
      bpf program may write additional data to the
      seq_file buffer. The current seq_read() does not work
      for this use case.
      
      Besides allowing the stop() function to write to the buffer,
      bpf_seq_read() also fixes the buffer size to one page.
      If any single call of show() or stop() emits more than
      one page of data and overflows the buffer, the -E2BIG error
      code is returned to user space.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175904.2475468-1-yhs@fb.com
      fd4f12bc
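The fixed-size-buffer behaviour can be modelled with a toy userspace simulation. This is an assumption-laden sketch, not the kernel's bpf_seq_read(): `sim_read()` and `BUF_SIZE` are hypothetical, objects are reduced to their output sizes, and the rule "an object that does not fit after earlier output is retried on the next read" is this sketch's simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BUF_SIZE 16	/* stands in for the one-page buffer */

/*
 * Simulate one read(). obj_sizes[i] is how many bytes object i emits;
 * *pos is the next object to visit. Returns the bytes produced, 0 at
 * end of the collection, or -E2BIG when a single object cannot fit.
 */
static long sim_read(const size_t *obj_sizes, size_t nobjs, size_t *pos)
{
	size_t used = 0;

	assert(obj_sizes != NULL && pos != NULL);

	while (*pos < nobjs) {
		size_t need = obj_sizes[*pos];

		if (used + need > BUF_SIZE) {
			if (used == 0)
				return -E2BIG;	/* too big even for an empty buffer */
			break;	/* drop this object's output, retry on next read */
		}
		used += need;
		(*pos)++;
	}
	return (long)used;
}
```

With object sizes {10, 10, 20} and a 16-byte buffer, the first two reads each return one 10-byte object, and the third read fails with -E2BIG because the 20-byte object can never fit.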
    • Y
      bpf: Support bpf tracing/iter programs for BPF_LINK_UPDATE · 2057c92b
      Yonghong Song committed
      Add BPF_LINK_UPDATE support for tracing/iter programs.
      This way, a file based bpf iterator, which holds a reference
      to the link, can have its bpf program updated without
      creating new files.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175902.2475262-1-yhs@fb.com
      2057c92b
    • Y
      bpf: Support bpf tracing/iter programs for BPF_LINK_CREATE · de4e05ca
      Yonghong Song committed
      Given a bpf program, the steps to create an anonymous bpf iterator are:
        - create a bpf_iter_link, which combines the bpf program and the target.
          In the future, there could be more information recorded in the link.
          A link_fd will be returned to the user space.
        - create an anonymous bpf iterator with the given link_fd.
      
      The bpf_iter_link can be pinned to a bpffs mount file system to
      create a file based bpf iterator as well.
      
      The benefits of using bpf_iter_link:
        - using bpf link simplifies design and implementation as bpf link
          is used for other tracing bpf programs.
        - for file based bpf iterators, bpf_iter_link provides a standard
          way to replace underlying bpf programs.
        - for both anonymous and file based iterators, bpf link query
          capability can be leveraged.
      
      The patch adds support of tracing/iter programs for BPF_LINK_CREATE.
      A new link type BPF_LINK_TYPE_ITER is added to facilitate link
      querying. Currently, only prog_id is needed, so no
      additional in-kernel show_fdinfo() and fill_link_info() hooks
      are needed for the BPF_LINK_TYPE_ITER link.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com
      de4e05ca
    • Y
      bpf: Allow loading of a bpf_iter program · 15d83c4d
      Yonghong Song committed
      A bpf_iter program is a tracing program with attach type
      BPF_TRACE_ITER. The load attribute
        attach_btf_id
      is used by the verifier against a particular kernel function,
      which represents a target, e.g., __bpf_iter__bpf_map
      for target bpf_map which is implemented later.
      
      The program return value must be 0 or 1 for now.
        0 : successful, except potential seq_file buffer overflow
            which is handled by seq_file reader.
        1 : request to restart the same object
      
      In the future, other return values may be used for filtering or
      terminating the iterator.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com
      15d83c4d
    • Y
      bpf: Implement an interface to register bpf_iter targets · ae24345d
      Yonghong Song committed
      The target can call bpf_iter_reg_target() to register itself.
      The needed information:
        target:           target name
        seq_ops:          the seq_file operations for the target
        init_seq_private: target callback to initialize seq_priv during file open
        fini_seq_private: target callback to clean up seq_priv during file release
        seq_priv_size:    the private_data size needed by the seq_file
                          operations
      
      The target name represents a target which provides a seq_ops
      for iterating objects.
      
      The target can provide two callback functions, init_seq_private
      and fini_seq_private, called during file open/release time.
      For example, for /proc/net/{tcp6, ipv6_route, netlink, ...}, the net
      namespace needs to be set up properly during file open and
      released properly during file release.
      
      Function bpf_iter_unreg_target() is also implemented to unregister
      a particular target.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200509175859.2474669-1-yhs@fb.com
      ae24345d
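The registration information listed above can be pictured as a small descriptor plus a registry. The following is a userspace mock only: the structure, the fixed-size table, and the `mock_*` functions are hypothetical and do not reproduce the kernel's bpf_iter_reg_target() or bpf_iter_unreg_target() signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct mock_seq_ops;	/* opaque here; stands in for the seq_file operations */

/* Hypothetical descriptor carrying the fields named in the commit message. */
struct mock_target_info {
	const char *target;				/* target name */
	const struct mock_seq_ops *seq_ops;
	int (*init_seq_private)(void *priv);		/* called at file open */
	void (*fini_seq_private)(void *priv);		/* called at file release */
	unsigned int seq_priv_size;
};

#define MAX_TARGETS 8
static const struct mock_target_info *targets[MAX_TARGETS];

/* Store the descriptor in the first free slot. */
static int mock_reg_target(const struct mock_target_info *info)
{
	assert(info != NULL && info->target != NULL);
	for (size_t i = 0; i < MAX_TARGETS; i++) {
		if (!targets[i]) {
			targets[i] = info;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Look a target up by name; NULL when it is not registered. */
static const struct mock_target_info *mock_find_target(const char *name)
{
	for (size_t i = 0; i < MAX_TARGETS; i++) {
		if (targets[i] && strcmp(targets[i]->target, name) == 0)
			return targets[i];
	}
	return NULL;
}

/* Remove a target by name; a no-op when it is not registered. */
static void mock_unreg_target(const char *name)
{
	for (size_t i = 0; i < MAX_TARGETS; i++) {
		if (targets[i] && strcmp(targets[i]->target, name) == 0) {
			targets[i] = NULL;
			return;
		}
	}
}
```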
  2. 09 May, 2020 4 commits
  3. 07 May, 2020 2 commits
  4. 06 May, 2020 6 commits
  5. 05 May, 2020 3 commits
  6. 04 May, 2020 6 commits