1. 14 Mar 2020 (7 commits)
  2. 13 Mar 2020 (1 commit)
  3. 11 Mar 2020 (1 commit)
    • bpf: Add bpf_link_new_file that doesn't install FD · babf3164
      Authored by Andrii Nakryiko
      Add a bpf_link_new_file() API for cases when we need to ensure that the
      anon_inode is successfully created before we proceed with the expensive
      BPF program attachment procedure, which would otherwise require an
      equally (if not more) expensive and potentially failing compensating
      detachment procedure just because anon_inode creation failed. This API
      makes it possible to simplify the code by ensuring first that the
      anon_inode is created, and only after the BPF program is attached to
      proceed with fd_install(), which can't fail.
      
      After the anon_inode file is created, the link can't simply be kfree()'d
      anymore, because its destruction will be performed by the deferred
      file_operations->release call. For this, the bpf_link API now requires
      specifying two separate operations: release() and dealloc(); the former
      performs detachment only, while the latter frees the memory used by the
      bpf_link itself. dealloc() needs to be specified because struct bpf_link
      is frequently embedded into a link type-specific container struct (e.g.,
      struct bpf_raw_tp_link), so bpf_link itself doesn't know how to properly
      free the memory. In the case where the anon_inode file was successfully
      created but the subsequent BPF attachment failed, the bpf_link needs to
      be marked as "defunct", so that the file's release() callback will
      perform only memory deallocation, but no detachment.
      
      Convert raw tracepoint and tracing attachment to the new API and
      eliminate detachment from the error handling path.
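
      A rough sketch of the resulting attach flow, per the description above
      (mylink and hook_attach() are illustrative placeholders, and the exact
      name of the defunct-marking helper is an assumption, not necessarily
      the kernel's):

      	struct file *link_file;
      	int link_fd, err;

      	/* 1. Create the anon_inode file first; no FD is installed yet. */
      	link_file = bpf_link_new_file(&mylink->link, &link_fd);
      	if (IS_ERR(link_file)) {
      		kfree(mylink);          /* no file yet: plain kfree() is fine */
      		return PTR_ERR(link_file);
      	}

      	/* 2. Expensive, potentially failing BPF attachment. */
      	err = hook_attach(prog);        /* placeholder for the real attach */
      	if (err) {
      		/* Mark the link defunct: release() will only free memory,
      		 * no compensating detachment is needed. */
      		bpf_link_cleanup(&mylink->link, link_file, link_fd);
      		return err;
      	}

      	/* 3. fd_install() can't fail. */
      	fd_install(link_fd, link_file);
      	return link_fd;
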
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200309231051.1270337-1-andriin@fb.com
      babf3164
  4. 10 Mar 2020 (1 commit)
  5. 05 Mar 2020 (3 commits)
  6. 03 Mar 2020 (1 commit)
    • bpf: Introduce pinnable bpf_link abstraction · 70ed506c
      Authored by Andrii Nakryiko
      Introduce the bpf_link abstraction, representing an attachment of a BPF
      program to a BPF hook point (e.g., tracepoint, perf event, etc).
      bpf_link encapsulates ownership of the attached BPF program and
      reference counting of the link itself (a link can be referenced from
      multiple anonymous inodes), and it ensures that the release callback is
      called from process context, so that users can safely take mutex locks
      and sleep.
      
      Additionally, the new abstraction makes it possible to generalize
      pinning of a link object in BPF FS: pinning a link explicitly prevents
      BPF program detachment on process exit, and an independent process can
      later open the pinned link and keep working with it.
      
      Convert two existing bpf_link-like objects (raw tracepoint and tracing
      BPF program attachments) to the bpf_link framework, making them pinnable
      in BPF FS. More FD-based bpf_links will be added in follow-up patches.
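
      In condensed form, the abstraction looks roughly like this (a paraphrase
      of the description above, not the exact kernel definition; the separate
      dealloc() callback is the later split described under commit babf3164
      above):

      	struct bpf_link {
      		atomic64_t refcnt;              /* may be referenced from
      						 * multiple anonymous inodes */
      		const struct bpf_link_ops *ops; /* type-specific callbacks */
      		struct bpf_prog *prog;          /* owns the attached program */
      	};

      	struct bpf_link_ops {
      		/* called from process context, so implementations can take
      		 * mutex locks and sleep */
      		void (*release)(struct bpf_link *link);
      	};
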
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200303043159.323675-2-andriin@fb.com
      70ed506c
  7. 28 Feb 2020 (2 commits)
    • bpf: INET_DIAG support in bpf_sk_storage · 1ed4d924
      Authored by Martin KaFai Lau
      This patch adds INET_DIAG support to bpf_sk_storage.
      
      1. Although this series adds the bpf_sk_storage diag capability to
         inet sk, bpf_sk_storage is in general applicable to all fullsocks.
         Hence, the bpf_sk_storage logic will operate on SK_DIAG_* nlattrs.
         The caller will pass in its specific nesting nlattr (e.g.
         INET_DIAG_*) as the argument.
      
      2. The request will be like:
      	INET_DIAG_REQ_SK_BPF_STORAGES (nla_nest) (defined in a later patch)
      		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
      		SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
      		......
      
         Considering there could be multiple bpf_sk_storages in a sk,
         instead of reusing INET_DIAG_INFO ("ss -i"), the user can select
         specific bpf_sk_storages to dump by specifying an array of
         SK_DIAG_BPF_STORAGE_REQ_MAP_FD.

         If no SK_DIAG_BPF_STORAGE_REQ_MAP_FD is specified (i.e. an empty
         INET_DIAG_REQ_SK_BPF_STORAGES), it will dump all bpf_sk_storages
         of a sk.
      
      3. The reply will be like:
      	INET_DIAG_BPF_SK_STORAGES (nla_nest) (defined in a later patch)
      		SK_DIAG_BPF_STORAGE (nla_nest)
      			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
      			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
      		SK_DIAG_BPF_STORAGE (nla_nest)
      			SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
      			SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
      		......
      
      4. Unlike other INET_DIAG info of a sk, which is pretty static, the size
         required to dump the bpf_sk_storage(s) of a sk grows dynamically as
         the system adds more bpf_sk_storage_maps.  It is hard to set a static
         min_dump_alloc size.

         Hence, this series learns it at runtime and adjusts the
         cb->min_dump_alloc as it iterates all sk(s) of the system.  The
         "unsigned int *res_diag_size" in bpf_sk_storage_diag_put()
         is for this purpose.
      
         The next patch will update the cb->min_dump_alloc as it
         iterates the sk(s).
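
      For illustration, a hedged userspace sketch of building the request nest
      from point 2 with libmnl (put_sk_storage_req() is an illustrative
      helper; the INET_DIAG_*/SK_DIAG_* constants are the ones defined in
      later patches of this series):

      	#include <stdint.h>
      	#include <libmnl/libmnl.h>

      	/* Append the bpf_sk_storage request nest to an inet_diag request,
      	 * selecting specific maps to dump; an empty nest dumps them all. */
      	static void put_sk_storage_req(struct nlmsghdr *nlh,
      				       const uint32_t *map_fds, int n)
      	{
      		struct nlattr *nest;
      		int i;

      		nest = mnl_attr_nest_start(nlh, INET_DIAG_REQ_SK_BPF_STORAGES);
      		for (i = 0; i < n; i++)
      			mnl_attr_put_u32(nlh, SK_DIAG_BPF_STORAGE_REQ_MAP_FD,
      					 map_fds[i]);
      		mnl_attr_nest_end(nlh, nest);
      	}
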
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200225230421.1975729-1-kafai@fb.com
      1ed4d924
    • bpf: Replace zero-length array with flexible-array member · d7f10df8
      Authored by Gustavo A. R. Silva
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined-behavior bugs from being
      inadvertently introduced[3] into the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
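
      Concretely, allocation code is unaffected by the conversion, since
      sizeof(struct foo) excludes the array in both the old and the new form
      (alloc_foo() is an illustrative helper, not from the patch):

      	#include <stdlib.h>

      	struct boo { int x; };

      	struct foo {
      		int stuff;
      		struct boo array[];     /* flexible array member (C99) */
      	};

      	/* sizeof(struct foo) covers only 'stuff' (plus padding), exactly
      	 * as it did with a zero-length 'array[0]', so allocations keep
      	 * the familiar header-plus-elements form. */
      	struct foo *alloc_foo(size_t n)
      	{
      		return malloc(sizeof(struct foo) + n * sizeof(struct boo));
      	}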
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200227001744.GA3317@embeddedor
      d7f10df8
  8. 25 Feb 2020 (2 commits)
  9. 29 Jan 2020 (1 commit)
  10. 25 Jan 2020 (1 commit)
  11. 23 Jan 2020 (2 commits)
    • bpf: Add BPF_FUNC_jiffies64 · 5576b991
      Authored by Martin KaFai Lau
      This patch adds a helper to read the 64-bit jiffies.  It will be used
      in a later patch to implement bpf_cubic.c.
      
      The helper is inlined when jit_requested is set and BITS_PER_LONG is
      64, as is done for map_gen_lookup().  Other cases could be considered
      together with map_gen_lookup() if needed.
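
      A minimal BPF C sketch using the new helper (the attach point and the
      printout are illustrative):

      	#include <linux/bpf.h>
      	#include <bpf/bpf_helpers.h>

      	char LICENSE[] SEC("license") = "GPL";

      	SEC("tracepoint/sched/sched_switch")
      	int log_jiffies(void *ctx)
      	{
      		/* Coarse, HZ-granularity 64-bit timestamp; this is the kind
      		 * of clock bpf_cubic needs for its epoch bookkeeping. */
      		__u64 now = bpf_jiffies64();

      		bpf_printk("jiffies64=%llu", now);
      		return 0;
      	}
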
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
      5576b991
    • bpf: Introduce dynamic program extensions · be8704ff
      Authored by Alexei Starovoitov
      Introduce dynamic program extensions. The users can load additional BPF
      functions and replace global functions in previously loaded BPF programs while
      these programs are executing.
      
      Global functions are verified individually by the verifier based on
      their types only. Hence a global function in the new program whose type
      matches an older function can safely replace that corresponding
      function.
      
      This new function/program is called 'an extension' of the old program.
      At load time the verifier uses the (attach_prog_fd, attach_btf_id) pair
      to identify the function to be replaced. The extension program's BPF
      program type is derived from the target program; technically,
      bpf_verifier_ops is copied from the target program. The
      BPF_PROG_TYPE_EXT program type is a placeholder with empty verifier_ops.
      The extension program can call the same bpf helper functions as the
      target program. A single BPF_PROG_TYPE_EXT type is used to extend XDP,
      SKB and all other program types. The verifier allows only one level of
      replacement, meaning that an extension program cannot recursively extend
      an extension. That also means that the maximum stack size increases from
      512 to 1024 bytes and the maximum function nesting level from 8 to 16.
      Programs don't always consume that much: the stack usage is determined
      by the number of on-stack variables used by the program. The verifier
      could have enforced the 512-byte limit for the combined original plus
      extension program, but that would make for a difficult user experience.
      The main use case for extensions is to provide a generic mechanism to
      plug external programs into a policy program or function call chaining.
      
      The BPF trampoline is used to track both fentry/fexit and program
      extensions, because both use the same nop slot at the beginning of every
      BPF function. Attaching fentry/fexit to a function that was replaced is
      not allowed. The opposite is true as well: replacing a function that
      currently has fentry/fexit attached is not allowed. The executable page
      allocated by the BPF trampoline is not used by program extensions. This
      inefficiency will be optimized in future patches.
      
      Function-by-function verification of global functions supports scalars
      and pointers to context only. Hence program extensions are supported
      only for that class of global functions. In the future the verifier will
      be extended to support pointers to structures, arrays with sizes, etc.
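
      From the BPF C side, an extension looks roughly like this (a hedged
      sketch: the SEC("freplace/...") convention comes from libbpf, and
      do_policy with an XDP signature is purely illustrative):

      	#include <linux/bpf.h>
      	#include <bpf/bpf_helpers.h>

      	char LICENSE[] SEC("license") = "GPL";

      	/* Replaces the global function 'do_policy' in a previously loaded
      	 * XDP program; the target is identified at load time via the
      	 * (attach_prog_fd, attach_btf_id) pair, and the types must match. */
      	SEC("freplace/do_policy")
      	int new_do_policy(struct xdp_md *ctx)
      	{
      		return XDP_PASS;        /* illustrative replacement behaviour */
      	}
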
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org
      be8704ff
  12. 17 Jan 2020 (1 commit)
    • xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886
      Authored by Toke Høiland-Jørgensen
      Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
      we can re-use the bulking for the non-map version of the bpf_redirect()
      helper. This is a simple matter of having xdp_do_redirect_slow() queue the
      frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
      
      Unfortunately we can't make the bpf_redirect() helper return an error if
      the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
      have a reference to the network namespace of the ingress device at the
      time the helper is called. So we have to leave it as-is and keep the
      device lookup in xdp_do_redirect_slow().
      
      Since this leaves less reason to have the non-map redirect code in a
      separate function, we get rid of the xdp_do_redirect_slow() function
      entirely. This does lose us the tracepoint disambiguation, but fortunately
      the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
      entry structures. This means both can contain a map index, so we can just
      amend the tracepoint definitions so we always emit the xdp_redirect(_err)
      tracepoints, but with the map ID only populated if a map is present. This
      means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
      the definitions around in case someone is still listening for them.
      
      With this change, the performance of the xdp_redirect sample program goes
      from 5Mpps to 8.4Mpps (a 68% increase).
      
      Since the flush functions are no longer map-specific, rename the flush()
      functions to drop _map from their names. One of the renamed functions is
      the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
      keep from having to update all drivers, use a #define to keep the old name
      working, and only update the virtual drivers in this patch.
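
      For context, the helper in question is the non-map variant, used from
      BPF C like this (a minimal sketch; the egress ifindex is illustrative):

      	#include <linux/bpf.h>
      	#include <bpf/bpf_helpers.h>

      	char LICENSE[] SEC("license") = "GPL";

      	SEC("xdp")
      	int xdp_redirect_prog(struct xdp_md *ctx)
      	{
      		/* Frames redirected this way now go through the same
      		 * per-device bulk queue as bpf_redirect_map(), which is
      		 * where the speedup above comes from. */
      		return bpf_redirect(2 /* illustrative egress ifindex */, 0);
      	}
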
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
      1d233886
  13. 16 Jan 2020 (3 commits)
    • bpf: Add batch ops to all htab bpf map · 05799638
      Authored by Yonghong Song
      htab can't use the generic batch support due to some problematic
      behaviours inherent to the data structure: while iterating the bpf map,
      a concurrent program might delete the next entry that the batch was
      about to use, and in that case there's no easy way to retrieve the next
      entry. The issue has been discussed multiple times (see [1] and [2]).

      The only way the htab can be traversed without the problem described
      above is by making sure that the map is traversed in entire buckets.
      This commit implements those strict requirements for htab; the
      implementation follows the same interaction as the generic support, with
      some exceptions:
      
       - If the keys/values buffers are not big enough to traverse a bucket,
         ENOSPC will be returned.
       - out_batch contains the value of the next bucket in the iteration, not
         the next key, but this is transparent for the user since the user
         should never use out_batch for anything other than bpf batch syscalls.
      
      This commit implements BPF_MAP_LOOKUP_BATCH and adds support for the new
      command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete
      batch ops it is possible to use the generic implementations.
      
      [1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
      [2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com
      05799638
    • bpf: Add generic support for update and delete batch ops · aa2e93b8
      Authored by Brian Vazquez
      This commit adds generic support for update and delete batch ops that
      can be used by almost all the bpf maps. These commands share the same
      UAPI attr that the lookup and lookup_and_delete batch ops use, and the
      syscall commands are:
      
        BPF_MAP_UPDATE_BATCH
        BPF_MAP_DELETE_BATCH
      
      The main difference between update/delete and lookup batch ops is that
      for update/delete, keys/values must be specified by userspace and,
      because of that, neither in_batch nor out_batch is used.
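
      A hedged userspace sketch of the two commands via their libbpf wrappers
      (the wrapper names follow the libbpf side of this series; __u32 keys and
      __u64 values are illustrative):

      	#include <bpf/bpf.h>

      	int update_then_delete(int map_fd, __u32 *keys, __u64 *values, __u32 n)
      	{
      		DECLARE_LIBBPF_OPTS(bpf_map_batch_opts, opts);
      		__u32 count = n;
      		int err;

      		/* BPF_MAP_UPDATE_BATCH: keys and values come from userspace. */
      		err = bpf_map_update_batch(map_fd, keys, values, &count, &opts);
      		if (err)
      			return err;

      		/* BPF_MAP_DELETE_BATCH: only keys are needed. */
      		count = n;
      		return bpf_map_delete_batch(map_fd, keys, &count, &opts);
      	}
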
      Suggested-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com
      aa2e93b8
    • bpf: Add generic support for lookup batch op · cb4d03ab
      Authored by Brian Vazquez
      This commit introduces generic support for bpf_map_lookup_batch. This
      implementation can be used by almost all the bpf maps since its core
      relies on the existing map_get_next_key and map_lookup_elem. The bpf
      syscall subcommand introduced is:
      
        BPF_MAP_LOOKUP_BATCH
      
      The UAPI attribute is:
      
        struct { /* struct used by BPF_MAP_*_BATCH commands */
               __aligned_u64   in_batch;       /* start batch,
                                                * NULL to start from beginning
                                                */
               __aligned_u64   out_batch;      /* output: next start batch */
               __aligned_u64   keys;
               __aligned_u64   values;
               __u32           count;          /* input/output:
                                                * input: # of key/value
                                                * elements
                                                * output: # of filled elements
                                                */
               __u32           map_fd;
               __u64           elem_flags;
               __u64           flags;
        } batch;
      
      in_batch/out_batch are opaque values used to communicate between
      user and kernel space; in_batch/out_batch must be of key_size length.

      To start iterating from the beginning, in_batch must be NULL; count is
      the number of key/value elements to retrieve. Note that the 'keys'
      buffer must be a buffer of key_size * count size and the 'values' buffer
      must be value_size * count, where value_size must be aligned to 8 bytes
      by userspace if it's dealing with percpu maps. 'count' will contain the
      number of keys/values successfully retrieved. Note that 'count' is an
      input/output variable and it can contain a lower value after a call.
      
      If there are no more entries to retrieve, ENOENT will be returned. On
      ENOENT, count might be > 0 if some values were copied before the entries
      ran out.

      Note that if the return code is an error other than -EFAULT, count
      indicates the number of elements successfully processed.
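
      Putting the semantics above together, a hedged sketch of a full
      iteration using the libbpf wrapper from this series (__u32 keys,
      __u64 values and the batch size are illustrative):

      	#include <errno.h>
      	#include <bpf/bpf.h>

      	#define BATCH_SZ 64

      	int dump_all(int map_fd)
      	{
      		DECLARE_LIBBPF_OPTS(bpf_map_batch_opts, opts);
      		__u32 out_batch, count;
      		__u32 keys[BATCH_SZ];
      		__u64 values[BATCH_SZ];
      		void *in = NULL;                /* NULL: start from the beginning */
      		int err;

      		for (;;) {
      			count = BATCH_SZ;       /* in: capacity, out: # filled */
      			err = bpf_map_lookup_batch(map_fd, in, &out_batch,
      						   keys, values, &count, &opts);
      			if (err && errno != ENOENT)
      				return err;
      			/* ... consume 'count' key/value pairs here ... */
      			if (err)
      				break;          /* ENOENT: no more entries */
      			in = &out_batch;        /* resume where we left off */
      		}
      		return 0;
      	}
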
      Suggested-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com
      cb4d03ab
  14. 11 Jan 2020 (1 commit)
    • bpf: Introduce function-by-function verification · 51c39bb1
      Authored by Alexei Starovoitov
      New llvm, and old llvm with libbpf help, produce BTF that distinguishes
      global and static functions. Unlike the arguments of a static function,
      the arguments of a global function cannot be removed or optimized away
      by llvm. The compiler has to use exactly the arguments specified in the
      function prototype. The argument type information allows the verifier to
      validate each global function independently. For now the only supported
      argument types are pointer to context and scalars. In the future pointers
      to structures, sizes, and pointers to packet data can be supported as
      well. Consider the following example:
      
      static int f1(int ...)
      {
        ...
      }
      
      int f3(int b);
      
      int f2(int a)
      {
        return f1(a) + f3(a);
      }
      
      int f3(int b)
      {
        ...
      }
      
      int main(...)
      {
        return f1(...) + f2(...) + f3(...);
      }
      
      The verifier will start its safety checks from the first global function
      f2(). It will recursively descend into f1() because it's static. Then it
      will check that the arguments match for the f3() invocation inside f2().
      It will not descend into f3(). It will finish f2(), which has to be
      successfully verified for all possible values of 'a'. Then it will
      proceed with f3(). That function also has to be safe for all possible
      values of 'b'. Then it will start subprog 0 (which is the main()
      function). It will recursively descend into f1() and will skip the full
      check of f2() and f3(), since they are global. The order of processing
      global functions doesn't affect safety, since all global functions must
      be proven safe based on their arguments only.
      
      Such function-by-function verification can drastically improve the speed
      of verification and reduce its complexity.
      
      Note that the stack limit of 512 bytes still applies to the call chain
      regardless of whether the functions are static or global. The nesting
      level limit of 8 also still applies. The same recursion prevention
      checks are in place as well.
      
      The type information and the static/global kind are preserved after
      verification, hence in the above example the global functions f2() and
      f3() can be replaced later by equivalent functions with the same types
      that are loaded and verified later, without affecting the safety of this
      main() program. Such replacement (re-linking) of global functions is the
      subject of future patches.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org
      51c39bb1
  15. 10 Jan 2020 (2 commits)
    • bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS · 85d33df3
      Authored by Martin KaFai Lau
      The patch introduces BPF_MAP_TYPE_STRUCT_OPS.  The map value
      is a kernel struct with its func ptrs implemented in bpf progs.
      This new map is the interface to register/unregister/introspect
      a bpf-implemented kernel struct.
      
      The kernel struct is actually embedded inside another new struct
      (called the "value" struct in the code).  For example,
      "struct tcp_congestion_ops" is embedded in:
      struct bpf_struct_ops_tcp_congestion_ops {
      	refcount_t refcnt;
      	enum bpf_struct_ops_state state;
      	struct tcp_congestion_ops data;  /* <-- kernel subsystem struct here */
      };
      The map value is "struct bpf_struct_ops_tcp_congestion_ops".
      "bpftool map dump" will then be able to show the
      state ("inuse"/"tobefree") and the subsystem's refcnt (e.g. the
      number of tcp_socks in the tcp_congestion_ops case).  This "value"
      struct is created automatically by a macro.  Having a separate "value"
      struct will also make extending "struct bpf_struct_ops_XYZ" easier
      (e.g. adding "void (*init)(void)" to "struct bpf_struct_ops_XYZ" to do
      some initialization work before registering the struct_ops to the
      kernel subsystem).  libbpf will take care of finding and populating
      the "struct bpf_struct_ops_XYZ" from "struct XYZ".
      
      Register a struct_ops to a kernel subsystem:
      1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s)
      2. Create a BPF_MAP_TYPE_STRUCT_OPS map with
         attr->btf_vmlinux_value_type_id set to the btf id of
         "struct bpf_struct_ops_tcp_congestion_ops" of the running kernel.
         Instead of reusing attr->btf_value_type_id,
         btf_vmlinux_value_type_id is added so that attr->btf_fd can still be
         used as the "user" btf, which could store other useful sysadmin/debug
         info that may be introduced in the future,
         e.g. creation date/compiler details/map creator...etc.
      3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described
         in the running kernel btf.  Populate the value of this object.
         The function ptr should be populated with the prog fds.
      4. Call BPF_MAP_UPDATE with the object created in (3) as
         the map value.  The key is always "0".  (A sketch of how libbpf
         drives these steps follows below.)
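
      With libbpf, steps 1-4 above are typically driven from a declarative
      definition in BPF C. A hedged sketch (the SEC conventions come from the
      libbpf side of this series; vmlinux.h and the dctcp names are
      illustrative):

      	#include "vmlinux.h"            /* assumes BTF-generated kernel types */
      	#include <bpf/bpf_helpers.h>
      	#include <bpf/bpf_tracing.h>

      	char LICENSE[] SEC("license") = "GPL";

      	SEC("struct_ops/dctcp_init")
      	void BPF_PROG(dctcp_init, struct sock *sk)
      	{
      		/* ... initialize per-connection congestion state ... */
      	}

      	/* From this definition libbpf finds and populates the "value"
      	 * struct (bpf_struct_ops_tcp_congestion_ops), creates the
      	 * BPF_MAP_TYPE_STRUCT_OPS map, fills the func ptrs with prog fds
      	 * and issues the BPF_MAP_UPDATE with key "0". */
      	SEC(".struct_ops")
      	struct tcp_congestion_ops dctcp = {
      		.init = (void *)dctcp_init,
      		.name = "bpf_dctcp",
      	};
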
      
      During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's
      args as an array of u64 is generated.  BPF_MAP_UPDATE also allows
      the specific struct_ops to do some final checks in "st_ops->init_member()"
      (e.g. ensure all mandatory func ptrs are implemented).
      If everything looks good, it will register this kernel struct
      to the kernel subsystem.  The map will not allow further update
      from this point.
      
      Unregister a struct_ops from the kernel subsystem:
      BPF_MAP_DELETE with key "0".
      
      Introspect a struct_ops:
      BPF_MAP_LOOKUP_ELEM with key "0".  The map value returned will
      have the prog _id_ populated as the func ptr.
      
      The map value state (enum bpf_struct_ops_state) will transit from:
      INIT (map created) =>
      INUSE (map updated, i.e. reg) =>
      TOBEFREE (map value deleted, i.e. unreg)
      
      The kernel subsystem needs to call bpf_struct_ops_get() and
      bpf_struct_ops_put() to manage the "refcnt" in the
      "struct bpf_struct_ops_XYZ".  This patch uses a separate refcnt
      for the purpose of tracking the subsystem usage.  Another approach
      is to reuse the map->refcnt and then "show" (i.e. during map_lookup)
      the subsystem's usage by doing map->refcnt - map->usercnt to filter out
      the map-fd/pinned-map usage.  However, that would also tie down the
      future semantics of map->refcnt and map->usercnt.
      
      The very first subsystem's refcnt (during reg()) holds one
      count to map->refcnt.  When the very last subsystem's refcnt
      is gone, it will also release the map->refcnt.  All bpf_prog will be
      freed when the map->refcnt reaches 0 (i.e. during map_free()).
      
      Here is how the output of the bpftool map command looks:
      [root@arch-fb-vm1 bpf]# bpftool map show
      6: struct_ops  name dctcp  flags 0x0
      	key 4B  value 256B  max_entries 1  memlock 4096B
      	btf_id 6
      [root@arch-fb-vm1 bpf]# bpftool map dump id 6
      [{
              "value": {
                  "refcnt": {
                      "refs": {
                          "counter": 1
                      }
                  },
                  "state": 1,
                  "data": {
                      "list": {
                          "next": 0,
                          "prev": 0
                      },
                      "key": 0,
                      "flags": 2,
                      "init": 24,
                      "release": 0,
                      "ssthresh": 25,
                      "cong_avoid": 30,
                      "set_state": 27,
                      "cwnd_event": 28,
                      "in_ack_event": 26,
                      "undo_cwnd": 29,
                      "pkts_acked": 0,
                      "min_tso_segs": 0,
                      "sndbuf_expand": 0,
                      "cong_control": 0,
                      "get_info": 0,
                      "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0
                      ],
                      "owner": 0
                  }
              }
          }
      ]
      
      Misc Notes:
      * bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup.
        It does an in-place update on "*value" instead of returning a pointer
        to syscall.c.  Otherwise, it would need a separate copy of the "zero"
        value for the BPF_STRUCT_OPS_STATE_INIT to avoid races.
      
      * bpf_struct_ops_map_delete_elem() is also called without
        preempt_disable() from map_delete_elem().  This is because
        "->unreg()" may require a sleepable context, e.g.
        "tcp_unregister_congestion_control()".
      
      * "const" is added to some of the existing "struct btf_func_model *"
        function arg to avoid a compiler warning caused by this patch.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com
      85d33df3
    • bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS · 27ae7997
      Authored by Martin KaFai Lau
      This patch allows the kernel's struct ops (i.e. func ptrs) to be
      implemented in BPF.  The first use case in this series is
      "struct tcp_congestion_ops", which will be introduced in a
      later patch.
      
      This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS.
      The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular
      func ptr of a kernel struct.  The attr->attach_btf_id is the btf id
      of a kernel struct.  The attr->expected_attach_type is the member
      "index" of that kernel struct.  The first member of a struct starts
      with member index 0.  That will avoid ambiguity when a kernel struct
      has multiple func ptrs with the same func signature.
      
      For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written
      to implement the "init" func ptr of the "struct tcp_congestion_ops".
      The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops"
      of the _running_ kernel.  The attr->expected_attach_type is 3.
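
      In BPF C, such a prog looks roughly like the following hedged sketch
      (the SEC("struct_ops/...") convention and the BPF_PROG macro come from
      libbpf; vmlinux.h and example_init are illustrative):

      	#include "vmlinux.h"
      	#include <bpf/bpf_helpers.h>
      	#include <bpf/bpf_tracing.h>

      	char LICENSE[] SEC("license") = "GPL";

      	/* Verified against the 'init' member of struct tcp_congestion_ops
      	 * (member index 3, as in the example above); the ctx is the u64
      	 * args array described below, unpacked by the BPF_PROG macro. */
      	SEC("struct_ops/example_init")
      	void BPF_PROG(example_init, struct sock *sk)
      	{
      		/* ... set up this connection's state ... */
      	}
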
      
      The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved
      by arch_prepare_bpf_trampoline that will be done in the next
      patch when introducing BPF_MAP_TYPE_STRUCT_OPS.
      
      "struct bpf_struct_ops" is introduced as a common interface for the kernel
      struct that supports BPF_PROG_TYPE_STRUCT_OPS prog.  The supporting kernel
      struct will need to implement an instance of the "struct bpf_struct_ops".
      
      The supporting kernel struct also needs to implement a bpf_verifier_ops.
      During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right
      bpf_verifier_ops by searching the attr->attach_btf_id.
      
      A new "btf_struct_access" is also added to the bpf_verifier_ops such
      that the supporting kernel struct can optionally provide its own specific
      check on accessing the func arg (e.g. provide limited write access).
      
      After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called
      to initialize some values (e.g. the btf id of the supporting kernel
      struct) and it can only be done once the btf_vmlinux is available.
      
      The R0 checks at BPF_EXIT are excluded for the BPF_PROG_TYPE_STRUCT_OPS
      prog if the return type of the prog->aux->attach_func_proto is "void".
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com
      27ae7997
  16. 20 Dec 2019 (2 commits)
  17. 17 Dec 2019 (1 commit)
  18. 14 Dec 2019 (4 commits)
  19. 12 Dec 2019 (1 commit)
  20. 25 Nov 2019 (3 commits)