1. 25 Aug 2020, 1 commit
  2. 26 Jul 2020, 1 commit
  3. 25 Jul 2020, 1 commit
  4. 20 Jul 2020, 1 commit
  5. 18 Jul 2020, 3 commits
    • inet6: Run SK_LOOKUP BPF program on socket lookup · 1122702f
      Authored by Jakub Sitnicki
      Following the ipv4 stack changes, run a BPF program attached to the netns
      before looking up a listening socket. The program can return a listening
      socket to use as the result of the socket lookup, fail the lookup, or take
      no action.
      Suggested-by: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200717103536.397595-7-jakub@cloudflare.com
      1122702f
    • inet: Run SK_LOOKUP BPF program on socket lookup · 1559b4aa
      Authored by Jakub Sitnicki
      Run a BPF program before looking up a listening socket on the receive path.
      The program selects a listening socket to yield as the result of the socket
      lookup by calling the bpf_sk_assign() helper and returning SK_PASS. The
      program can revert its decision by assigning a NULL socket with
      bpf_sk_assign().
      
      Alternatively, the BPF program can fail the lookup by returning SK_DROP, or
      let the lookup continue as usual by returning SK_PASS when no socket has
      been selected with bpf_sk_assign().
      
      This lets the user match packets with listening sockets freely at the last
      possible point on the receive path, where we know that packets are destined
      for local delivery after undergoing policing, filtering, and routing.
      
      With BPF code selecting the socket, directing packets destined to an IP
      range or to a port range to a single socket becomes possible.
      
      In case multiple programs are attached, they are run in series in the order
      in which they were attached. The end result is determined from the return
      codes of all the programs according to the following rules (a schematic
      sketch of this resolution follows the list):
      
       1. If any program returned SK_PASS and selected a valid socket, the socket
          is used as result of socket lookup.
       2. If more than one program returned SK_PASS and selected a socket,
          last selection takes effect.
       3. If any program returned SK_DROP, and no program returned SK_PASS and
          selected a socket, socket lookup fails with -ECONNREFUSED.
       4. If all programs returned SK_PASS and none of them selected a socket,
          socket lookup continues to htable-based lookup.
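
      A schematic C sketch of this resolution logic; the names below (run_prog,
      lookup_ctx, sk_lookup_verdict) are illustrative only, not the kernel's
      actual symbols:

        struct lookup_ctx {
                struct sock *selected_sk;       /* last bpf_sk_assign() selection */
        };

        static struct sock *sk_lookup_verdict(struct bpf_prog **progs, int nprogs,
                                              struct lookup_ctx *ctx, int *err)
        {
                bool dropped = false;
                int i;

                for (i = 0; i < nprogs; i++) {
                        /* a program may update ctx->selected_sk via bpf_sk_assign(),
                         * including resetting it to NULL to revert a prior selection */
                        if (run_prog(progs[i], ctx) == SK_DROP)
                                dropped = true;
                }

                *err = 0;
                if (ctx->selected_sk)
                        return ctx->selected_sk;        /* rules 1-2: last selection wins */
                if (dropped) {
                        *err = -ECONNREFUSED;           /* rule 3: fail the lookup */
                        return NULL;
                }
                return NULL;                            /* rule 4: continue to htable lookup */
        }
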
      Suggested-by: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com
      1559b4aa
    • bpf: Introduce SK_LOOKUP program type with a dedicated attach point · e9ddbb77
      Authored by Jakub Sitnicki
      Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
      BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
      when looking up a listening socket for a new connection request for
      connection-oriented protocols, or when looking up an unconnected socket for
      a packet for connection-less protocols.
      
      When called, the SK_LOOKUP BPF program can select a socket that will receive
      the packet. This serves as a mechanism to overcome the limits of what the
      bind() API allows one to express. Two use cases driving this work are:
      
       (1) steer packets destined to an IP range, on fixed port to a socket
      
           192.0.2.0/24, port 80 -> NGINX socket
      
       (2) steer packets destined to an IP address, on any port to a socket
      
           198.51.100.1, any port -> L7 proxy socket
      
      In its run-time context, the program receives information about the packet
      that triggered the socket lookup, namely the IP version, L4 protocol
      identifier, and address 4-tuple. The context can be further extended to
      include the ingress interface identifier.
      
      To select a socket, the BPF program fetches it from a map holding socket
      references, such as SOCKMAP or SOCKHASH, and calls the bpf_sk_assign(ctx, sk, ...)
      helper to record the selection. The transport layer then uses the selected
      socket as the result of the socket lookup.
      
      In its basic form, SK_LOOKUP acts as a filter and hence must return either
      SK_PASS or SK_DROP. If the program returns SK_PASS, the transport should
      look for a socket to receive the packet, or use the one selected by the
      program if available, while SK_DROP informs the transport layer that the
      lookup should fail.
      
      This patch only enables the user to attach an SK_LOOKUP program to a
      network namespace. Subsequent patches hook it up to run on local delivery
      path in ipv4 and ipv6 stacks.
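
      A minimal sketch of such a program, modeled on the selftests added with
      this series; the map name, key layout, and port number are illustrative
      assumptions:

        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        struct {
                __uint(type, BPF_MAP_TYPE_SOCKMAP);
                __uint(max_entries, 1);
                __type(key, __u32);
                __type(value, __u64);
        } redir_map SEC(".maps");

        SEC("sk_lookup")
        int steer_http(struct bpf_sk_lookup *ctx)
        {
                __u32 key = 0;
                struct bpf_sock *sk;
                long err;

                if (ctx->local_port != 80)      /* not our service: regular lookup */
                        return SK_PASS;

                sk = bpf_map_lookup_elem(&redir_map, &key);
                if (!sk)
                        return SK_PASS;

                err = bpf_sk_assign(ctx, sk, 0);        /* record the selection */
                bpf_sk_release(sk);
                return err ? SK_DROP : SK_PASS;
        }

        char _license[] SEC("license") = "GPL";
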
      Suggested-by: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
      e9ddbb77
  6. 09 Jul 2020, 2 commits
    • bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() · 63960260
      Authored by Kees Cook
      When evaluating access control over kallsyms visibility, the credentials at
      open() time need to be used, not the "current" creds (though in BPF's
      case, this has likely always been the same). Plumb access to the associated
      file->f_cred down through bpf_dump_raw_ok() and its callers now that
      kallsyms_show_value() has been refactored to take a struct cred.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: bpf@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 7105e828 ("bpf: allow for correlation of maps and helpers in dump")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      63960260
    • kallsyms: Refactor kallsyms_show_value() to take cred · 16025184
      Authored by Kees Cook
      In order to perform future tests against the cred saved during open(),
      switch kallsyms_show_value() to operate on a cred, and have all current
      callers pass current_cred(). This makes it very obvious where callers
      are checking the wrong credential in their "read" contexts. These will
      be fixed in the coming patches.
      
      Additionally, switch the return value to bool, since it is always used as a
      direct permission check, not as a 0-on-success, negative-on-error style
      function return.
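
      A short sketch of the resulting shape (simplified from the patch):

        /* the visibility check now takes an explicit cred ... */
        bool kallsyms_show_value(const struct cred *cred);

        /* ... "read"-time callers keep today's behaviour by passing current_cred(): */
        if (!kallsyms_show_value(current_cred()))
                value = 0;

        /* ... while later patches can pass the cred saved at open() time instead: */
        if (!kallsyms_show_value(file->f_cred))
                value = 0;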
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      16025184
  7. 08 May 2020, 2 commits
    • crypto: lib/sha1 - fold linux/cryptohash.h into crypto/sha.h · 228c4f26
      Authored by Eric Biggers
      <linux/cryptohash.h> sounds very generic and important, like it's the
      header to include if you're doing cryptographic hashing in the kernel.
      But actually it only includes the library implementation of the SHA-1
      compression function (not even the full SHA-1).  This should basically
      never be used anymore; SHA-1 is no longer considered secure, and there
      are much better ways to do cryptographic hashing in the kernel.
      
      Remove this header and fold it into <crypto/sha.h> which already
      contains constants and functions for SHA-1 (along with SHA-2).
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      228c4f26
    • crypto: lib/sha1 - rename "sha" to "sha1" · 6b0b0fa2
      Authored by Eric Biggers
      The library implementation of the SHA-1 compression function is
      confusingly called just "sha_transform()".  Alongside it are some "SHA_"
      constants and "sha_init()".  Presumably these are left over from a time
      when SHA just meant SHA-1.  But now there are also SHA-2 and SHA-3, and
      moreover SHA-1 is now considered insecure and thus shouldn't be used.
      
      Therefore, rename these functions and constants to make it very clear
      that they are for SHA-1.  Also add a comment to make it clear that these
      shouldn't be used.
      
      For the extra-misleadingly named "SHA_MESSAGE_BYTES", rename it to
      SHA1_BLOCK_SIZE and define it to just '64' rather than '(512/8)' so that
      it matches the same definition in <crypto/sha.h>.  This prepares for
      merging <linux/cryptohash.h> into <crypto/sha.h>.
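
      In diff form, the renames amount to roughly the following (a sketch, not
      the complete patch):

        -void sha_transform(__u32 *digest, const char *data, __u32 *W);
        -void sha_init(__u32 *buf);
        -#define SHA_MESSAGE_BYTES (512 / 8)
        +void sha1_transform(__u32 *digest, const char *data, __u32 *W);
        +void sha1_init(__u32 *buf);
        +#define SHA1_BLOCK_SIZE 64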
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      6b0b0fa2
  8. 05 May 2020, 1 commit
    • bpf: Avoid gcc-10 stringop-overflow warning in struct bpf_prog · d26c0cc5
      Authored by Arnd Bergmann
      gcc-10 warns about accesses to zero-length arrays:
      
      kernel/bpf/core.c: In function 'bpf_patch_insn_single':
      cc1: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
      In file included from kernel/bpf/core.c:21:
      include/linux/filter.h:550:20: note: at offset 0 to object 'insnsi' with size 0 declared here
        550 |   struct bpf_insn  insnsi[0];
            |                    ^~~~~~
      
      In this case, we really want to have two flexible-array members,
      but that is not possible. Removing the union to make insnsi a
      flexible-array member while leaving insns as a zero-length array
      fixes the warning, as nothing writes to the other one in that way.
      
      This trick only works on linux-3.18 or higher, as older versions
      had additional members in the union.
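
      A sketch of the layout change described above (surrounding members of
      struct bpf_prog omitted):

         struct bpf_prog {
                ...
        -       union {
        -               struct sock_filter      insns[0];
        -               struct bpf_insn         insnsi[0];
        -       };
        +       struct sock_filter      insns[0];       /* zero-length, kept for cBPF */
        +       struct bpf_insn         insnsi[];       /* now a flexible-array member */
         };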
      
      Fixes: 60a3b225 ("net: bpf: make eBPF interpreter images read-only")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200430213101.135134-6-arnd@arndb.de
      d26c0cc5
  9. 26 Apr 2020, 1 commit
  10. 14 Mar 2020, 2 commits
  11. 25 Feb 2020, 4 commits
  12. 17 Jan 2020, 1 commit
    • xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886
      Authored by Toke Høiland-Jørgensen
      Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
      we can re-use the bulking for the non-map version of the bpf_redirect()
      helper. This is a simple matter of having xdp_do_redirect_slow() queue the
      frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
      
      Unfortunately we can't make the bpf_redirect() helper return an error if
      the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
      have a reference to the network namespace of the ingress device at the time
      the helper is called. So we have to leave it as-is and keep the device
      lookup in xdp_do_redirect_slow().
      
      Since this leaves less reason to have the non-map redirect code in a
      separate function, we get rid of the xdp_do_redirect_slow() function
      entirely. This does lose us the tracepoint disambiguation, but fortunately
      the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
      entry structures. This means both can contain a map index, so we can just
      amend the tracepoint definitions so we always emit the xdp_redirect(_err)
      tracepoints, but with the map ID only populated if a map is present. This
      means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
      the definitions around in case someone is still listening for them.
      
      With this change, the performance of the xdp_redirect sample program goes
      from 5Mpps to 8.4Mpps (a 68% increase).
      
      Since the flush functions are no longer map-specific, rename the flush()
      functions to drop _map from their names. One of the renamed functions is
      the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
      keep from having to update all drivers, use a #define to keep the old name
      working, and only update the virtual drivers in this patch.
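
      The rename plus compatibility alias boils down to something like this
      (sketch):

        void xdp_do_flush(void);        /* previously xdp_do_flush_map() */

        /* keep unconverted drivers building until they are updated */
        #define xdp_do_flush_map xdp_do_flush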
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
      1d233886
  13. 10 Jan 2020, 1 commit
    • bpf: tcp: Support tcp_congestion_ops in bpf · 0baf26b0
      Authored by Martin KaFai Lau
      This patch makes "struct tcp_congestion_ops" the first user of BPF
      STRUCT_OPS. It allows implementing a tcp_congestion_ops in BPF.
      
      A BPF-implemented tcp_congestion_ops can be used like a regular kernel
      tcp-cc through sysctl and setsockopt, e.g.:
      [root@arch-fb-vm1 bpf]# sysctl -a | egrep congestion
      net.ipv4.tcp_allowed_congestion_control = reno cubic bpf_cubic
      net.ipv4.tcp_available_congestion_control = reno bic cubic bpf_cubic
      net.ipv4.tcp_congestion_control = bpf_cubic
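
      For example, a single socket can opt into the BPF congestion control from
      userspace with setsockopt() (a minimal sketch, assuming the "bpf_cubic"
      struct_ops above has been registered):

        #include <netinet/in.h>
        #include <netinet/tcp.h>
        #include <string.h>
        #include <sys/socket.h>

        static int use_bpf_cubic(int fd)
        {
                const char cc[] = "bpf_cubic";

                /* select the BPF-implemented congestion control for this socket */
                return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc));
        }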
      
      There have been attempts to move TCP CC to user space (e.g. CCP in TCP).
      The common arguments are a faster turnaround, getting away from long-tail
      kernel versions in production, etc., which are legitimate points.
      
      BPF has been a continuous effort to combine the upsides of kernel and
      userspace (e.g. XDP gains the performance advantage without bypassing
      the kernel). Recent BPF advancements (in particular the BTF-aware
      verifier, BPF trampoline, and BPF CO-RE) have made implementing kernel
      struct ops (e.g. tcp cc) possible in BPF. This allows a faster turnaround
      for testing algorithms in production while leveraging the existing (and
      continuously growing) BPF features and framework instead of building one
      specifically for userspace TCP CC.
      
      This patch allows write access to a few fields in tcp-sock
      (in bpf_tcp_ca_btf_struct_access()).
      
      The optional "get_info" is unsupported now.  It can be added
      later.  One possible way is to output the info with a btf-id
      to describe the content.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200109003508.3856115-1-kafai@fb.com
      0baf26b0
  14. 20 Dec 2019, 1 commit
  15. 14 Dec 2019, 1 commit
  16. 10 Dec 2019, 1 commit
  17. 02 Dec 2019, 1 commit
  18. 25 Nov 2019, 2 commits
  19. 16 Nov 2019, 1 commit
  20. 23 Oct 2019, 1 commit
    • bpf: Fix use after free in subprog's jited symbol removal · cd7455f1
      Authored by Daniel Borkmann
      syzkaller managed to trigger the following crash:
      
        [...]
        BUG: unable to handle page fault for address: ffffc90001923030
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
        RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
        [...]
        Call Trace:
         kernel_text_address kernel/extable.c:147 [inline]
         __kernel_text_address+0x9a/0x110 kernel/extable.c:102
         unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
         arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
         stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
         save_stack mm/kasan/common.c:69 [inline]
         set_track mm/kasan/common.c:77 [inline]
         __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
         kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
         slab_post_alloc_hook mm/slab.h:584 [inline]
         slab_alloc mm/slab.c:3319 [inline]
         kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
         getname_flags+0xba/0x640 fs/namei.c:138
         getname+0x19/0x20 fs/namei.c:209
         do_sys_open+0x261/0x560 fs/open.c:1091
         __do_sys_open fs/open.c:1115 [inline]
         __se_sys_open fs/open.c:1110 [inline]
         __x64_sys_open+0x87/0x90 fs/open.c:1110
         do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [...]
      
      After further debugging, it turns out that we walk kallsyms while in
      parallel we tear down a BPF program which contains subprograms that have
      been JITed, though the program itself has not been fully exposed and
      eventually bails out with an error.
      
      The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes
      the symbols, however, bpf_prog_free() tears down the JIT memory too early via
      scheduled work. Instead, it needs to properly respect RCU grace period as the
      kallsyms walk for BPF is under RCU.
      
      Fix it by refactoring __bpf_prog_put()'s teardown and reusing it in our
      error path, where we defer final destruction when we have subprogs in the
      program.
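
      Schematically, the deferred teardown looks like this (hypothetical
      function names, assuming an rcu_head embedded in bpf_prog_aux; the real
      refactoring lives in __bpf_prog_put()'s path):

        static void prog_free_rcu(struct rcu_head *rcu)
        {
                struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);

                /* grace period has elapsed: no kallsyms walker can still see us */
                bpf_prog_free(aux->prog);
        }

        static void prog_put_deferred(struct bpf_prog *prog)
        {
                bpf_prog_kallsyms_del_subprogs(prog);   /* drop subprog symbols first */
                call_rcu(&prog->aux->rcu, prog_free_rcu);       /* defer JIT teardown */
        }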
      
      Fixes: 7d1982b4 ("bpf: fix panic in prog load calls cleanup")
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
      cd7455f1
  21. 17 Oct 2019, 2 commits
  22. 16 Sep 2019, 1 commit
  23. 24 Jul 2019, 1 commit
    • bpf: fix narrower loads on s390 · d9b8aada
      Authored by Ilya Leoshkevich
      The very first check in test_pkt_md_access is failing on s390, which
      happens because loading a part of a struct __sk_buff field produces
      an incorrect result.
      
      The preprocessed code of the check is:
      
      {
      	__u8 tmp = *((volatile __u8 *)&skb->len +
      		((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
      	if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
      };
      
      clang generates the following code for it:
      
            0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
            1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
            2:	57 30 00 00 00 00 00 ff	r3 &= 255
            3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 <LBB0_10>
      
      Finally, verifier transforms it to:
      
        0: (61) r2 = *(u32 *)(r1 +104)
        1: (bc) w2 = w2
        2: (74) w2 >>= 24
        3: (bc) w2 = w2
        4: (54) w2 &= 255
        5: (bc) w2 = w2
      
      The problem is that when the verifier emits the code to replace a partial
      load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of the
      struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift, and a bitwise
      AND, it assumes that the machine is little endian and incorrectly decides
      to use a shift.
      
      Adjust the shift count calculation to account for endianness.
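
      The endianness-aware shift can be expressed roughly as follows
      (illustrative helper, names do not match the verifier code):

        /* shift needed to move a `size`-byte narrow load at byte offset `off`
         * of a `size_default`-byte context field into the low bits, after the
         * full field has been loaded in host byte order */
        static u8 narrow_load_shift(u32 off, u32 size, u32 size_default)
        {
        #ifdef __LITTLE_ENDIAN
                return (off & (size_default - 1)) * 8;
        #else   /* big endian, e.g. s390 */
                return (size_default - size - (off & (size_default - 1))) * 8;
        #endif
        }

      For the example above (off = 3, size = 1, size_default = 4), this yields a
      24-bit shift on little endian and no shift on s390.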
      
      Fixes: 31fd8581 ("bpf: permits narrower load from bpf program context fields")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d9b8aada
  24. 16 Jul 2019, 1 commit
  25. 08 Jul 2019, 1 commit
  26. 29 Jun 2019, 2 commits
  27. 28 Jun 2019, 1 commit
    • bpf: implement getsockopt and setsockopt hooks · 0d01da6a
      Authored by Stanislav Fomichev
      Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
      BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
      
      BPF_CGROUP_SETSOCKOPT can modify user setsockopt arguments before
      passing them down to the kernel, or bypass the kernel completely.
      BPF_CGROUP_GETSOCKOPT can inspect/modify the getsockopt arguments that
      the kernel returns.
      Both hooks reuse existing PTR_TO_PACKET{,_END} infrastructure.
      
      The buffer memory is pre-allocated (because I don't think there is
      a precedent for working with __user memory from bpf). This might be
      slow to do for each {s,g}etsockopt call, which is why I've added
      __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
      attached to a cgroup. Note, however, that there is a race between
      __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
      program layout might have changed; this should not be a problem
      because in general there is a race between multiple calls to
      {s,g}etsockopt and the user adding/removing bpf progs from a cgroup.
      
      The return code of the BPF program is handled as follows (see the sketch
      after this list):
      * 0: EPERM
      * 1: success, continue with next BPF program in the cgroup chain
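
      A minimal sketch of a setsockopt hook; the policy below (blocking
      TCP_CONGESTION changes from inside the cgroup) is purely illustrative:

        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        #ifndef SOL_TCP
        #define SOL_TCP 6
        #endif
        #ifndef TCP_CONGESTION
        #define TCP_CONGESTION 13
        #endif

        SEC("cgroup/setsockopt")
        int deny_cc_change(struct bpf_sockopt *ctx)
        {
                if (ctx->level == SOL_TCP && ctx->optname == TCP_CONGESTION)
                        return 0;       /* reject: caller gets EPERM */
                return 1;               /* continue: kernel handles the option as usual */
        }

        char _license[] SEC("license") = "GPL";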
      
      v9:
      * allow overwriting setsockopt arguments (Alexei Starovoitov):
        * use set_fs (same as kernel_setsockopt)
        * buffer is always kzalloc'd (no small on-stack buffer)
      
      v8:
      * use s32 for optlen (Andrii Nakryiko)
      
      v7:
      * return only 0 or 1 (Alexei Starovoitov)
      * always run all progs (Alexei Starovoitov)
      * use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
        (decided to use optval=-1 instead, optval=0 might be a valid input)
      * call getsockopt hook after kernel handlers (Alexei Starovoitov)
      
      v6:
      * rework cgroup chaining; stop as soon as bpf program returns
        0 or 2; see patch with the documentation for the details
      * drop Andrii's and Martin's Acked-by (not sure they are comfortable
        with the new state of things)
      
      v5:
      * skip copy_to_user() and put_user() when ret == 0 (Martin Lau)
      
      v4:
      * don't export bpf_sk_fullsock helper (Martin Lau)
      * size != sizeof(__u64) for uapi pointers (Martin Lau)
      * offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)
      
      v3:
      * typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
      * reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
        Nakryiko)
      * use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
      * use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
      * new CG_SOCKOPT_ACCESS macro to wrap repeated parts
      
      v2:
      * moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
      * aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
      * bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
      * added [0,2] return code check to verifier (Martin Lau)
      * dropped unused buf[64] from the stack (Martin Lau)
      * use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
      * dropped bpf_target_off from ctx rewrites (Martin Lau)
      * use return code for kernel bypass (Martin Lau & Andrii Nakryiko)
      
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Martin Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      0d01da6a
  28. 01 Jun 2019, 1 commit
    • bpf: cgroup inet skb programs can return 0 to 3 · 5cf1e914
      Authored by brakmo
      Allows cgroup inet skb programs to return values in the range [0, 3].
      The second bit is used to determine whether congestion occurred and the
      higher-level protocol should decrease its rate, e.g. TCP would call
      tcp_enter_cwr().
      
      The bpf_prog must set expected_attach_type to BPF_CGROUP_INET_EGRESS
      at load time if it uses the new return values (i.e. 2 or 3).
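
      A toy sketch of a program using the new values (the length threshold is
      purely illustrative); note it must be loaded with expected_attach_type set
      to BPF_CGROUP_INET_EGRESS:

        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        SEC("cgroup_skb/egress")
        int signal_cwr(struct __sk_buff *skb)
        {
                if (skb->len > 1400)
                        return 3;       /* allow packet, but signal congestion (CWR) */
                return 1;               /* allow packet */
        }

        char _license[] SEC("license") = "GPL";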
      
      The expected_attach_type is currently not enforced for
      BPF_PROG_TYPE_CGROUP_SKB, meaning, for example, that a bpf_prog with
      expected_attach_type set to BPF_CGROUP_INET_EGRESS can attach to
      BPF_CGROUP_INET_INGRESS. Blindly enforcing expected_attach_type would
      break backward compatibility.
      
      This patch adds an enforce_expected_attach_type bit that enforces the
      expected_attach_type only when the program uses the new return values.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      5cf1e914
  29. 25 May 2019, 1 commit
    • bpf: verifier: insert zero extension according to analysis result · a4b1d3c1
      Authored by Jiong Wang
      After the previous patches, the verifier will mark an insn if it really
      needs zero extension on dst_reg.
      
      It is then for back-ends to decide how to use such information to eliminate
      unnecessary zero extension code-gen during JIT compilation.
      
      One approach is for the verifier to insert explicit zero extension, in a
      generic way, for those insns that need it; JIT back-ends then do not
      generate zero extension for sub-register writes by default.
      
      However, only those back-ends which do not have hardware zero extension
      want this optimization. Back-ends like x86_64 and AArch64 have hardware
      zero extension support, so the insertion should be disabled for them.
      
      This patch introduces a new target hook, "bpf_jit_needs_zext", which
      returns false by default, meaning verifier zero extension insertion is
      disabled by default. A back-end can override this hook to return true if
      it doesn't have hardware support and wants the verifier to insert zero
      extension explicitly.
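
      In code this looks roughly like the following sketch:

        /* generic weak default: zero extension insertion stays disabled */
        bool __weak bpf_jit_needs_zext(void)
        {
                return false;
        }

        /* a JIT back-end without hardware zero extension opts in by
         * providing a strong override */
        bool bpf_jit_needs_zext(void)
        {
                return true;
        }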
      
      Offload targets do not use this native target hook; instead, they can get
      the optimization results using bpf_prog_offload_ops.finalize.
      
      NOTE: arches can have diverse features; it is possible for one arch to
      have hardware zero extension support for some sub-register write insns but
      not for all. For example, PowerPC and SPARC have zero-extended loads, but
      not for alu32. So when verifier zero extension insertion is enabled, these
      JIT back-ends need to peephole insns to remove the zero extensions
      inserted for insns that actually have hardware zero extension support. The
      peephole could be as simple as looking at the next insn: if it is a
      special zero extension insn, then it is safe to eliminate it when the
      current insn has hardware zero extension support.
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      a4b1d3c1