1. 04 10月, 2018 4 次提交
    • A
      libbpf: Consistent prefixes for interfaces in nlattr.h. · f04bc8a4
      Andrey Ignatov 提交于
      libbpf is used more and more outside kernel tree. That means the library
      should follow good practices in library design and implementation to
      play well with third party code that uses it.
      
      One of such practices is to have a common prefix (or a few) for every
      interface, function or data structure, library provides. I helps to
      avoid name conflicts with other libraries and keeps API consistent.
      
      Inconsistent names in libbpf already cause problems in real life. E.g.
      an application can't use both libbpf and libnl due to conflicting
      symbols.
      
      Having common prefix will help to fix current and avoid future problems.
      
      libbpf already uses the following prefixes for its interfaces:
      * bpf_ for bpf system call wrappers, program/map/elf-object
        abstractions and a few other things;
      * btf_ for BTF related API;
      * libbpf_ for everything else.
      
      The patch adds libbpf_ prefix to interfaces in nlattr.h that use none of
      mentioned above prefixes and doesn't fit well into the first two
      categories.
      
      Since affected part of API is used in bpftool, the patch applies
      corresponding change to bpftool as well. Having it in a separate patch
      will cause a state of tree where bpftool is broken what may not be a
      good idea.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f04bc8a4
    • A
      libbpf: Consistent prefixes for interfaces in libbpf.h. · aae57780
      Andrey Ignatov 提交于
      libbpf is used more and more outside kernel tree. That means the library
      should follow good practices in library design and implementation to
      play well with third party code that uses it.
      
      One of such practices is to have a common prefix (or a few) for every
      interface, function or data structure, library provides. I helps to
      avoid name conflicts with other libraries and keeps API consistent.
      
      Inconsistent names in libbpf already cause problems in real life. E.g.
      an application can't use both libbpf and libnl due to conflicting
      symbols.
      
      Having common prefix will help to fix current and avoid future problems.
      
      libbpf already uses the following prefixes for its interfaces:
      * bpf_ for bpf system call wrappers, program/map/elf-object
        abstractions and a few other things;
      * btf_ for BTF related API;
      * libbpf_ for everything else.
      
      The patch adds libbpf_ prefix to functions and typedef in libbpf.h that
      use none of mentioned above prefixes and doesn't fit well into the first
      two categories.
      
      Since affected part of API is used in bpftool, the patch applies
      corresponding change to bpftool as well. Having it in a separate patch
      will cause a state of tree where bpftool is broken what may not be a
      good idea.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      aae57780
    • A
      libbpf: Move __dump_nlmsg_t from API to implementation · 434fe9d4
      Andrey Ignatov 提交于
      This typedef is used only by implementation in netlink.c. Nothing uses
      it in public API. Move it to netlink.c.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      434fe9d4
    • J
      net: core: Fix build with CONFIG_IPV6=m · d71019b5
      Joe Stringer 提交于
      Stephen Rothwell reports the following link failure with IPv6 as module:
      
        x86_64-linux-gnu-ld: net/core/filter.o: in function `sk_lookup':
        (.text+0x19219): undefined reference to `__udp6_lib_lookup'
      
      Fix the build by only enabling the IPv6 socket lookup if IPv6 support is
      compiled into the kernel.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      d71019b5
  2. 03 10月, 2018 14 次提交
    • D
      Merge branch 'bpf-sk-lookup' · 33d9a7fd
      Daniel Borkmann 提交于
      Joe Stringer says:
      
      ====================
      This series proposes a new helper for the BPF API which allows BPF programs to
      perform lookups for sockets in a network namespace. This would allow programs
      to determine early on in processing whether the stack is expecting to receive
      the packet, and perform some action (eg drop, forward somewhere) based on this
      information.
      
      The series is structured roughly into:
      * Misc refactor
      * Add the socket pointer type
      * Add reference tracking to ensure that socket references are freed
      * Extend the BPF API to add sk_lookup_xxx() / sk_release() functions
      * Add tests/documentation
      
      The helper proposed in this series includes a parameter for a tuple which must
      be filled in by the caller to determine the socket to look up. The simplest
      case would be filling with the contents of the packet, ie mapping the packet's
      5-tuple into the parameter. In common cases, it may alternatively be useful to
      reverse the direction of the tuple and perform a lookup, to find the socket
      that initiates this connection; and if the BPF program ever performs a form of
      IP address translation, it may further be useful to be able to look up
      arbitrary tuples that are not based upon the packet, but instead based on state
      held in BPF maps or hardcoded in the BPF program.
      
      Currently, access into the socket's fields are limited to those which are
      otherwise already accessible, and are restricted to read-only access.
      
      Changes since v3:
      * New patch: "bpf: Reuse canonical string formatter for ctx errs"
      * Add PTR_TO_SOCKET to is_ctx_reg().
      * Add a few new checks to prevent mixing of socket/non-socket pointers.
      * Swap order of checks in sock_filter_is_valid_access().
      * Prefix register spill macros with "bpf_".
      * Add acks from previous round
      * Rebase
      
      Changes since v2:
      * New patch: "selftests/bpf: Generalize dummy program types".
        This enables adding verifier tests for socket lookup with tail calls.
      * Define the semantics of the new helpers more clearly in uAPI header.
      * Fix release of caller_net when netns is not specified.
      * Use skb->sk to find caller net when skb->dev is unavailable.
      * Fix build with !CONFIG_NET.
      * Replace ptr_id defensive coding when releasing reference state with an
        internal error (-EFAULT).
      * Remove flags argument to sk_release().
      * Add several new assembly tests suggested by Daniel.
      * Add a few new C tests.
      * Fix typo in verifier error message.
      
      Changes since v1:
      * Limit netns_id field to 32 bits
      * Reuse reg_type_mismatch() in more places
      * Reduce the number of passes at convert_ctx_access()
      * Replace ptr_id defensive coding when releasing reference state with an
        internal error (-EFAULT)
      * Rework 'struct bpf_sock_tuple' to allow passing a packet pointer
      * Allow direct packet access from helper
      * Fix compile error with CONFIG_IPV6 enabled
      * Improve commit messages
      
      Changes since RFC:
      * Split up sk_lookup() into sk_lookup_tcp(), sk_lookup_udp().
      * Only take references on the socket when necessary.
        * Make sk_release() only free the socket reference in this case.
      * Fix some runtime reference leaks:
        * Disallow BPF_LD_[ABS|IND] instructions while holding a reference.
        * Disallow bpf_tail_call() while holding a reference.
      * Prevent the same instruction being used for reference and other
        pointer type.
      * Simplify locating copies of a reference during helper calls by caching
        the pointer id from the caller.
      * Fix kbuild compilation warnings with particular configs.
      * Improve code comments describing the new verifier pieces.
      * Tested by Nitin
      ====================
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      33d9a7fd
    • J
      Documentation: Describe bpf reference tracking · a610b665
      Joe Stringer 提交于
      Document the new pointer types in the verifier and how the pointer ID
      tracking works to ensure that references which are taken are later
      released.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      a610b665
    • J
      selftests/bpf: Add C tests for reference tracking · de375f4e
      Joe Stringer 提交于
      Add some tests that demonstrate and test the balanced lookup/free
      nature of socket lookup. Section names that start with "fail" represent
      programs that are expected to fail verification; all others should
      succeed.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      de375f4e
    • J
      libbpf: Support loading individual progs · 29cd77f4
      Joe Stringer 提交于
      Allow the individual program load to be invoked. This will help with
      testing, where a single ELF may contain several sections, some of which
      denote subprograms that are expected to fail verification, along with
      some which are expected to pass verification. By allowing programs to be
      iterated and individually loaded, each program can be independently
      checked against its expected verification result.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      29cd77f4
    • J
      selftests/bpf: Add tests for reference tracking · b584ab88
      Joe Stringer 提交于
      reference tracking: leak potential reference
      reference tracking: leak potential reference on stack
      reference tracking: leak potential reference on stack 2
      reference tracking: zero potential reference
      reference tracking: copy and zero potential references
      reference tracking: release reference without check
      reference tracking: release reference
      reference tracking: release reference twice
      reference tracking: release reference twice inside branch
      reference tracking: alloc, check, free in one subbranch
      reference tracking: alloc, check, free in both subbranches
      reference tracking in call: free reference in subprog
      reference tracking in call: free reference in subprog and outside
      reference tracking in call: alloc & leak reference in subprog
      reference tracking in call: alloc in subprog, release outside
      reference tracking in call: sk_ptr leak into caller stack
      reference tracking in call: sk_ptr spill into caller stack
      reference tracking: allow LD_ABS
      reference tracking: forbid LD_ABS while holding reference
      reference tracking: allow LD_IND
      reference tracking: forbid LD_IND while holding reference
      reference tracking: check reference or tail call
      reference tracking: release reference then tail call
      reference tracking: leak possible reference over tail call
      reference tracking: leak checked reference over tail call
      reference tracking: mangle and release sock_or_null
      reference tracking: mangle and release sock
      reference tracking: access member
      reference tracking: write to member
      reference tracking: invalid 64-bit access of member
      reference tracking: access after release
      reference tracking: direct access for lookup
      unpriv: spill/fill of different pointers stx - ctx and sock
      unpriv: spill/fill of different pointers stx - leak sock
      unpriv: spill/fill of different pointers stx - sock and ctx (read)
      unpriv: spill/fill of different pointers stx - sock and ctx (write)
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      b584ab88
    • J
      selftests/bpf: Generalize dummy program types · 0c586079
      Joe Stringer 提交于
      Don't hardcode the dummy program types to SOCKET_FILTER type, as this
      prevents testing bpf_tail_call in conjunction with other program types.
      Instead, use the program type specified in the test case.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      0c586079
    • J
      bpf: Add helper to retrieve socket in BPF · 6acc9b43
      Joe Stringer 提交于
      This patch adds new BPF helper functions, bpf_sk_lookup_tcp() and
      bpf_sk_lookup_udp() which allows BPF programs to find out if there is a
      socket listening on this host, and returns a socket pointer which the
      BPF program can then access to determine, for instance, whether to
      forward or drop traffic. bpf_sk_lookup_xxx() may take a reference on the
      socket, so when a BPF program makes use of this function, it must
      subsequently pass the returned pointer into the newly added sk_release()
      to return the reference.
      
      By way of example, the following pseudocode would filter inbound
      connections at XDP if there is no corresponding service listening for
      the traffic:
      
        struct bpf_sock_tuple tuple;
        struct bpf_sock_ops *sk;
      
        populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
        sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0);
        if (!sk) {
          // Couldn't find a socket listening for this traffic. Drop.
          return TC_ACT_SHOT;
        }
        bpf_sk_release(sk, 0);
        return TC_ACT_OK;
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      6acc9b43
    • J
      bpf: Add reference tracking to verifier · fd978bf7
      Joe Stringer 提交于
      Allow helper functions to acquire a reference and return it into a
      register. Specific pointer types such as the PTR_TO_SOCKET will
      implicitly represent such a reference. The verifier must ensure that
      these references are released exactly once in each path through the
      program.
      
      To achieve this, this commit assigns an id to the pointer and tracks it
      in the 'bpf_func_state', then when the function or program exits,
      verifies that all of the acquired references have been freed. When the
      pointer is passed to a function that frees the reference, it is removed
      from the 'bpf_func_state` and all existing copies of the pointer in
      registers are marked invalid.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      fd978bf7
    • J
      bpf: Macrofy stack state copy · 84dbf350
      Joe Stringer 提交于
      An upcoming commit will need very similar copy/realloc boilerplate, so
      refactor the existing stack copy/realloc functions into macros to
      simplify it.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      84dbf350
    • J
      bpf: Add PTR_TO_SOCKET verifier type · c64b7983
      Joe Stringer 提交于
      Teach the verifier a little bit about a new type of pointer, a
      PTR_TO_SOCKET. This pointer type is accessed from BPF through the
      'struct bpf_sock' structure.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c64b7983
    • J
      bpf: Generalize ptr_or_null regs check · 840b9615
      Joe Stringer 提交于
      This check will be reused by an upcoming commit for conditional jump
      checks for sockets. Refactor it a bit to simplify the later commit.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      840b9615
    • J
      bpf: Reuse canonical string formatter for ctx errs · 9d2be44a
      Joe Stringer 提交于
      The array "reg_type_str" provides canonical formatting of register
      types, however a couple of places would previously check whether a
      register represented the context and write the name "context" directly.
      An upcoming commit will add another pointer type to these statements, so
      to provide more accurate error messages in the verifier, update these
      error messages to use "reg_type_str" instead.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      9d2be44a
    • J
      bpf: Simplify ptr_min_max_vals adjustment · aad2eeaf
      Joe Stringer 提交于
      An upcoming commit will add another two pointer types that need very
      similar behaviour, so generalise this function now.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      aad2eeaf
    • J
      bpf: Add iterator for spilled registers · f3709f69
      Joe Stringer 提交于
      Add this iterator for spilled registers, it concentrates the details of
      how to get the current frame's spilled registers into a single macro
      while clarifying the intention of the code which is calling the macro.
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f3709f69
  3. 02 10月, 2018 4 次提交
  4. 01 10月, 2018 11 次提交
  5. 28 9月, 2018 7 次提交
    • Y
      bpf: permit CGROUP_DEVICE programs accessing helper bpf_get_current_cgroup_id() · 5bf7a60b
      Yonghong Song 提交于
      Currently, helper bpf_get_current_cgroup_id() is not permitted
      for CGROUP_DEVICE type of programs. If the helper is used
      in such cases, the verifier will log the following error:
      
        0: (bf) r6 = r1
        1: (69) r7 = *(u16 *)(r6 +0)
        2: (85) call bpf_get_current_cgroup_id#80
        unknown func bpf_get_current_cgroup_id#80
      
      The bpf_get_current_cgroup_id() is useful for CGROUP_DEVICE
      type of programs in order to customize action based on cgroup id.
      This patch added such a support.
      
      Cc: Roman Gushchin <guro@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      5bf7a60b
    • D
      Merge branch 'bpf-libbpf-attach-by-name' · 78e6e5c1
      Daniel Borkmann 提交于
      Andrey Ignatov says:
      
      ====================
      This patch set introduces libbpf_attach_type_by_name function in libbpf
      to identify attach type by section name.
      
      This is useful to avoid writing same logic over and over again in user
      space applications that leverage libbpf.
      
      Patch 1 has more details on the new function and problem being solved.
      Patches 2 and 3 add support for new section names.
      Patch 4 uses new function in a selftest.
      Patch 5 adds selftest for libbpf_{prog,attach}_type_by_name.
      
      As a side note there are a lot of inconsistencies now between names used
      by libbpf and bpftool (e.g. cgroup/skb vs cgroup_skb, cgroup_device and
      device vs cgroup/dev, sockops vs sock_ops, etc). This patch set does not
      address it but it tries not to make it harder to address it in the future.
      ====================
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      78e6e5c1
    • A
      selftests/bpf: Test libbpf_{prog,attach}_type_by_name · 370920c4
      Andrey Ignatov 提交于
      Add selftest for libbpf functions libbpf_prog_type_by_name and
      libbpf_attach_type_by_name.
      
      Example of output:
        % ./tools/testing/selftests/bpf/test_section_names
        Summary: 35 PASSED, 0 FAILED
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      370920c4
    • A
      selftests/bpf: Use libbpf_attach_type_by_name in test_socket_cookie · c9bf507d
      Andrey Ignatov 提交于
      Use newly introduced libbpf_attach_type_by_name in test_socket_cookie
      selftest.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c9bf507d
    • A
      libbpf: Support sk_skb/stream_{parser, verdict} section names · c6f6851b
      Andrey Ignatov 提交于
      Add section names for BPF_SK_SKB_STREAM_PARSER and
      BPF_SK_SKB_STREAM_VERDICT attach types to be able to identify them in
      libbpf_attach_type_by_name.
      
      "stream_parser" and "stream_verdict" are used instead of simple "parser"
      and "verdict" just to avoid possible confusion in a place where attach
      type is used alone (e.g. in bpftool's show sub-commands) since there is
      another attach point that can be named as "verdict": BPF_SK_MSG_VERDICT.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c6f6851b
    • A
      libbpf: Support cgroup_skb/{e,in}gress section names · bafa7afe
      Andrey Ignatov 提交于
      Add section names for BPF_CGROUP_INET_INGRESS and BPF_CGROUP_INET_EGRESS
      attach types to be able to identify them in libbpf_attach_type_by_name.
      
      "cgroup_skb" is used instead of "cgroup/skb" mostly to easy possible
      unifying of how libbpf and bpftool works with section names:
      * bpftool uses "cgroup_skb" to in "prog list" sub-command;
      * bpftool uses "ingress" and "egress" in "cgroup list" sub-command;
      * having two parts instead of three in a string like "cgroup_skb/ingress"
        can be leveraged to split it to prog_type part and attach_type part,
        or vise versa: use two parts to make a section name.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      bafa7afe
    • A
      libbpf: Introduce libbpf_attach_type_by_name · 956b620f
      Andrey Ignatov 提交于
      There is a common use-case when ELF object contains multiple BPF
      programs and every program has its own section name. If it's cgroup-bpf
      then programs have to be 1) loaded and 2) attached to a cgroup.
      
      It's convenient to have information necessary to load BPF program
      together with program itself. This is where section name works fine in
      conjunction with libbpf_prog_type_by_name that identifies prog_type and
      expected_attach_type and these can be used with BPF_PROG_LOAD.
      
      But there is currently no way to identify attach_type by section name
      and it leads to messy code in user space that reinvents guessing logic
      every time it has to identify attach type to use with BPF_PROG_ATTACH.
      
      The patch introduces libbpf_attach_type_by_name that guesses attach type
      by section name if a program can be attached.
      
      The difference between expected_attach_type provided by
      libbpf_prog_type_by_name and attach_type provided by
      libbpf_attach_type_by_name is the former is used at BPF_PROG_LOAD time
      and can be zero if a program of prog_type X has only one corresponding
      attach type Y whether the latter provides specific attach type to use
      with BPF_PROG_ATTACH.
      
      No new section names were added to section_names array. Only existing
      ones were reorganized and attach_type was added where appropriate.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      956b620f