- 15 January 2021, 3 commits
-
-
Committed by Brendan Jackman
This adds two atomic opcodes, both of which include the BPF_FETCH flag. XCHG without the BPF_FETCH flag would naturally encode atomic_set. This is not supported because it would be of limited value to userspace (it doesn't imply any barriers). CMPXCHG without BPF_FETCH would be an atomic compare-and-write. We don't have such an operation in the kernel so it isn't provided to BPF either. There are two significant design decisions made for the CMPXCHG instruction:
- This operation fundamentally has 3 operands, but we only have two register fields. Therefore the operand we compare against (the kernel's API calls it 'old') is hard-coded to be R0. x86 has a similar design (and A64 doesn't have this problem). A potential alternative might be to encode the other operand's register number in the immediate field.
- The kernel's atomic_cmpxchg returns the old value, while the C11 userspace APIs return a boolean indicating the comparison result. Which should BPF do? A64 returns the old value. x86 returns the old value in the hard-coded register (and also sets a flag). That means return-old-value is easier to JIT, so that's what we use.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com
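For illustration, such an instruction could be hand-built from userspace roughly as follows (a sketch, assuming the BPF_ATOMIC/BPF_CMPXCHG encoding constants added by this series to the UAPI header; the register choice is arbitrary):

    #include <linux/bpf.h>

    /* Illustrative only: encode "r0 = cmpxchg_64(r1 + 0, r0, r2)".
     * BPF_ATOMIC is the (former XADD) instruction mode; the immediate
     * selects the operation, and the compare value implicitly lives in
     * R0, which also receives the old value.
     */
    static struct bpf_insn encode_cmpxchg64(void)
    {
        struct bpf_insn insn = {
            .code    = BPF_STX | BPF_DW | BPF_ATOMIC,
            .dst_reg = BPF_REG_1,   /* memory address to operate on      */
            .src_reg = BPF_REG_2,   /* value to store if compare matches */
            .off     = 0,
            .imm     = BPF_CMPXCHG, /* operation discriminated via imm   */
        };
        return insn;
    }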
-
Committed by Brendan Jackman
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC instructions, in order to have the previous value of the atomically-modified memory location loaded into the src register after an atomic op is carried out.
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com
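Continuing the sketch above, a fetch-and-add would set BPF_FETCH alongside BPF_ADD in the immediate, with src_reg supplying the addend and receiving the old value (again assuming the UAPI constants from this series):

    #include <linux/bpf.h>

    /* Illustrative only: encode "r2 = atomic_fetch_add_64(r1 + 0, r2)". */
    static struct bpf_insn encode_fetch_add64(void)
    {
        struct bpf_insn insn = {
            .code    = BPF_STX | BPF_DW | BPF_ATOMIC,
            .dst_reg = BPF_REG_1,           /* address of the counter   */
            .src_reg = BPF_REG_2,           /* addend in, old value out */
            .off     = 0,
            .imm     = BPF_ADD | BPF_FETCH, /* add, and fetch old value */
        };
        return insn;
    }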
-
Committed by Brendan Jackman
A subsequent patch will add additional atomic operations. These new operations will use the same opcode field as the existing XADD, with the immediate discriminating different operations. In preparation, rename the instruction mode to BPF_ATOMIC and start calling the zero immediate BPF_ADD. This is possible (it doesn't break existing valid BPF progs) because the immediate field is currently reserved MBZ and BPF_ADD is zero. All uses are removed from the tree but the BPF_XADD definition is kept around to avoid breaking builds for people including kernel headers.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
-
- 13 January 2021, 1 commit
-
-
Committed by Andrii Nakryiko
The BPF interpreter uses an extra input argument, so __bpf_call_base is re-cast into __bpf_call_base_args. Avoid the compiler warning about incompatible function prototypes by casting to void * first.
Fixes: 1ea47e01 ("bpf: add support for bpf_call to interpreter")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-3-andrii@kernel.org
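The underlying pattern is the usual one for re-casting a function pointer to a wider prototype without a prototype-mismatch warning: cast through void * first. A hedged sketch of that pattern (the in-tree macro is __bpf_call_base_args; the typedef and macro names below are invented for the example):

    #include <linux/filter.h>

    /* __bpf_call_base() takes five u64 arguments; the interpreter wants a
     * variant that additionally receives the insn array.  Casting the
     * function pointer straight to the wider prototype warns about
     * incompatible function types; going through (void *) first does not.
     */
    typedef u64 (*call_base_args_fn)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5,
                                     const struct bpf_insn *insn);

    #define CALL_BASE_ARGS_EXAMPLE \
            ((call_base_args_fn)(void *)__bpf_call_base)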
-
- 20 November 2020, 1 commit
-
-
Committed by Eric Biggers
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2, and <crypto/sha3.h> contains declarations for SHA-3. This organization is inconsistent, but more importantly SHA-1 is no longer considered to be cryptographically secure. So to the extent possible, SHA-1 shouldn't be grouped together with any of the other SHA versions, and usage of it should be phased out.

Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and <crypto/sha2.h>, and make everyone explicitly specify whether they want the declarations for SHA-1, SHA-2, or both. This avoids making the SHA-1 declarations visible to files that don't want anything to do with SHA-1. It also prepares for potentially moving sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
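After the split, a file that only needs SHA-2 includes only the SHA-2 header; a minimal sketch of such a caller (assuming the sha256_init/update/final library interface that the header declares):

    #include <crypto/sha2.h>   /* was: <crypto/sha.h> */

    /* Illustrative only: hash a buffer with the SHA-256 library interface.
     * No SHA-1 declaration is visible to this file anymore.
     */
    static void hash_buf(const u8 *data, unsigned int len,
                         u8 digest[SHA256_DIGEST_SIZE])
    {
        struct sha256_state sctx;

        sha256_init(&sctx);
        sha256_update(&sctx, data, len);
        sha256_final(&sctx, digest);
    }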
-
- 27 October 2020, 1 commit
-
-
Committed by Arnd Bergmann
There are thousands of warnings about one macro in a W=2 build:

    include/linux/filter.h:561:6: warning: declaration of 'ret' shadows a previous local [-Wshadow]

Prefix all the locals in that macro with __ to avoid most of these warnings.

Fixes: 492ecee8 ("bpf: enable program stats")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201026162110.3710415-1-arnd@kernel.org
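The fix follows the usual convention for statement-expression macros: give macro-internal locals reserved-looking __ names so they cannot collide with locals at the expansion site. A generic illustration of the idea (not the actual filter.h macro):

    /* Shadow-prone: a caller-side 'ret' collides with the macro's local. */
    #define RUN_ONCE_BAD(fn)   ({ int ret = (fn)(); ret; })

    /* Prefixed: '__ret' is unlikely to clash at the expansion site. */
    #define RUN_ONCE(fn)       ({ int __ret = (fn)(); __ret; })

    static int answer(void) { return 42; }

    static int demo(void)
    {
        int ret = RUN_ONCE(answer); /* no -Wshadow warning here */

        return ret;
    }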
-
- 22 October 2020, 1 commit
-
-
Committed by Toke Høiland-Jørgensen
Based on the discussion in [0], update the bpf_redirect_neigh() helper to accept an optional parameter specifying the nexthop information. This makes it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without incurring a duplicate FIB lookup - since the FIB lookup helper will return the nexthop information even if no neighbour is present, this can simply be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns BPF_FIB_LKUP_RET_NO_NEIGH. Thus fix & extend it before helper API is frozen.

[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/160322915615.32199.1187570224032024535.stgit@toke.dk
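A hedged sketch of the resulting usage pattern in a tc BPF program (packet parsing and MAC rewriting are omitted, the section name and flow are simplified, and the assumption that the FIB result's dst field carries the nexthop on BPF_FIB_LKUP_RET_NO_NEIGH follows the description above):

    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    #ifndef AF_INET
    #define AF_INET 2
    #endif

    SEC("tc")
    int redirect_via_fib(struct __sk_buff *skb)
    {
        struct bpf_fib_lookup fib = {};
        struct bpf_redir_neigh nh = {};
        long ret;

        fib.family  = AF_INET;
        fib.ifindex = skb->ingress_ifindex;
        /* ... fill fib.ipv4_src / fib.ipv4_dst etc. from the packet ... */

        ret = bpf_fib_lookup(skb, &fib, sizeof(fib), 0);
        if (ret == BPF_FIB_LKUP_RET_SUCCESS) {
            /* Neighbour resolved: rewrite MACs (omitted) and redirect. */
            return bpf_redirect(fib.ifindex, 0);
        } else if (ret == BPF_FIB_LKUP_RET_NO_NEIGH) {
            /* Reuse the nexthop from the FIB result - no second lookup. */
            nh.nh_family = AF_INET;
            nh.ipv4_nh   = fib.ipv4_dst;
            return bpf_redirect_neigh(fib.ifindex, &nh, sizeof(nh), 0);
        }
        return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";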
-
- 11 September 2020, 1 commit
-
-
Committed by Lorenz Bauer
As Alexei points out, struct bpf_sk_lookup_kern has two 4-byte holes. This leads to suboptimal instructions being generated (IPv4, x86):

    1372                    struct bpf_sk_lookup_kern ctx = {
       0xffffffff81b87f30 <+624>: xor    %eax,%eax
       0xffffffff81b87f32 <+626>: mov    $0x6,%ecx
       0xffffffff81b87f37 <+631>: lea    0x90(%rsp),%rdi
       0xffffffff81b87f3f <+639>: movl   $0x110002,0x88(%rsp)
       0xffffffff81b87f4a <+650>: rep stos %rax,%es:(%rdi)
       0xffffffff81b87f4d <+653>: mov    0x8(%rsp),%eax
       0xffffffff81b87f51 <+657>: mov    %r13d,0x90(%rsp)
       0xffffffff81b87f59 <+665>: incl   %gs:0x7e4970a0(%rip)
       0xffffffff81b87f60 <+672>: mov    %eax,0x8c(%rsp)
       0xffffffff81b87f67 <+679>: movzwl 0x10(%rsp),%eax
       0xffffffff81b87f6c <+684>: mov    %ax,0xa8(%rsp)
       0xffffffff81b87f74 <+692>: movzwl 0x38(%rsp),%eax
       0xffffffff81b87f79 <+697>: mov    %ax,0xaa(%rsp)

Fix this by moving around sport and dport. pahole confirms there are no more holes:

    struct bpf_sk_lookup_kern {
            u16                      family;        /*  0  2 */
            u16                      protocol;      /*  2  2 */
            __be16                   sport;         /*  4  2 */
            u16                      dport;         /*  6  2 */
            struct {
                    __be32           saddr;         /*  8  4 */
                    __be32           daddr;         /* 12  4 */
            } v4;                                   /*  8  8 */
            struct {
                    const struct in6_addr * saddr;  /* 16  8 */
                    const struct in6_addr * daddr;  /* 24  8 */
            } v6;                                   /* 16 16 */
            struct sock *            selected_sk;   /* 32  8 */
            bool                     no_reuseport;  /* 40  1 */

            /* size: 48, cachelines: 1, members: 8 */
            /* padding: 7 */
            /* last cacheline: 48 bytes */
    };

The assembly also doesn't contain the pesky rep stos anymore:

    1372                    struct bpf_sk_lookup_kern ctx = {
       0xffffffff81b87f60 <+624>: movzwl 0x10(%rsp),%eax
       0xffffffff81b87f65 <+629>: movq   $0x0,0xa8(%rsp)
       0xffffffff81b87f71 <+641>: movq   $0x0,0xb0(%rsp)
       0xffffffff81b87f7d <+653>: mov    %ax,0x9c(%rsp)
       0xffffffff81b87f85 <+661>: movzwl 0x38(%rsp),%eax
       0xffffffff81b87f8a <+666>: movq   $0x0,0xb8(%rsp)
       0xffffffff81b87f96 <+678>: mov    %ax,0x9e(%rsp)
       0xffffffff81b87f9e <+686>: mov    0x8(%rsp),%eax
       0xffffffff81b87fa2 <+690>: movq   $0x0,0xc0(%rsp)
       0xffffffff81b87fae <+702>: movl   $0x110002,0x98(%rsp)
       0xffffffff81b87fb9 <+713>: mov    %eax,0xa0(%rsp)
       0xffffffff81b87fc0 <+720>: mov    %r13d,0xa4(%rsp)

1: https://lore.kernel.org/bpf/CAADnVQKE6y9h2fwX6OS837v-Uf+aBXnT_JXiN_bbo2gitZQ3tA@mail.gmail.com/

Fixes: e9ddbb77 ("bpf: Introduce SK_LOOKUP program type with a dedicated attach point")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20200910110248.198326-1-lmb@cloudflare.com
-
- 25 August 2020, 2 commits
-
-
Committed by Martin KaFai Lau
[ Note: The TCP changes here are mainly to implement the bpf pieces into the bpf_skops_*() functions introduced in the earlier patches. ]

The earlier effort in BPF-TCP-CC allows the TCP Congestion Control algorithm to be written in BPF. It opens up opportunities to allow a faster turnaround time in testing/releasing new congestion control ideas to production environment.

The same flexibility can be extended to writing TCP header options. It is not uncommon that people want to test new TCP header options to improve TCP performance. Another use case is for data-centers that have a more controlled environment and more flexibility in putting header options for internal-only use. For example, we want to test the idea of putting a maximum delay ACK in a TCP header option, which is similar to a draft RFC proposal [1].

This patch introduces the necessary BPF API and uses it in the TCP stack to allow a BPF_PROG_TYPE_SOCK_OPS program to parse and write TCP header options. It currently supports most of the TCP packets except RST.

Supported TCP header option:
───────────────────────────
This patch allows the bpf-prog to write any option kind. Different bpf-progs can write their own option by calling the new helper bpf_store_hdr_opt(). The helper will ensure there is no duplicated option in the header.

By allowing the bpf-prog to write any option kind, this gives a lot of flexibility to the bpf-prog. Different bpf-progs can write their own option kind. It could also allow the bpf-prog to support a recently standardized option on an older kernel.

Sockops Callback Flags:
──────────────────────
The bpf program will only be called to parse/write tcp header options if the following newly added callback flags are enabled in tp->bpf_sock_ops_cb_flags:
- BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG
- BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG
- BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG

A few words on the PARSE CB flags. When the above PARSE CB flags are turned on, the bpf-prog will be called on packets received at a sk that has at least reached the ESTABLISHED state. The parsing of the SYN-SYNACK-ACK will be discussed in the "3 Way HandShake" section.

The default is off for all of the above new CB flags, i.e. the bpf prog will not be called to parse or write bpf hdr options. There are detailed comments on these new cb flags in the UAPI bpf.h.

sock_ops->skb_data and bpf_load_hdr_opt()
─────────────────────────────────────────
sock_ops->skb_data and sock_ops->skb_data_end cover the whole TCP header and its options. They are read only. The new bpf_load_hdr_opt() helps to read a particular option "kind" from the skb_data. Please refer to the comment in UAPI bpf.h. It has details on what skb_data contains under different sock_ops->op.

3 Way HandShake
───────────────
The bpf-prog can learn if it is sending SYN or SYNACK by reading the sock_ops->skb_tcp_flags.

* Passive side

When writing SYNACK (i.e. sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB), the received SYN skb will be available to the bpf prog. The bpf prog can use the SYN skb (which may carry the header option sent from the remote bpf prog) to decide what bpf header option should be written to the outgoing SYNACK skb. The SYN packet can be obtained by getsockopt(TCP_BPF_SYN*). More on this later. Also, the bpf prog can learn if it is in syncookie mode (by checking sock_ops->args[0] == BPF_WRITE_HDR_TCP_SYNACK_COOKIE).

The bpf prog can store the received SYN pkt by using the existing bpf_setsockopt(TCP_SAVE_SYN). The example in a later patch does it.

[ Note that the fullsock here is a listen sk; bpf_sk_storage is not very useful here since the listen sk will be shared by many concurrent connection requests. Extending bpf_sk_storage support to request_sock will add weight to the minisock and it is not necessarily better than storing the whole ~100 bytes SYN pkt. ]

When the connection is established, the bpf prog will be called in the existing PASSIVE_ESTABLISHED_CB callback. At that time, the bpf prog can get the header option from the saved syn and then apply the needed operation to the newly established socket. The later patch will use the max delay ack specified in the SYN header and set the RTO of this newly established connection as an example.

The received ACK (that concludes the 3WHS) will also be available to the bpf prog during PASSIVE_ESTABLISHED_CB through the sock_ops->skb_data. It could be useful in the syncookie scenario. More on this later.

There is an existing getsockopt "TCP_SAVED_SYN" to return the whole saved syn pkt, which includes the IP[46] header and the TCP header. A few "TCP_BPF_SYN*" getsockopts have been added to allow specifying where to start getting from, e.g. starting from the TCP header, or from the IP[46] header.

The new getsockopt(TCP_BPF_SYN*) will also know where it can get the SYN's packet from:
- (a) the just received syn (available when the bpf prog is writing SYNACK), which is the only way to get the SYN during syncookie mode, or
- (b) the saved syn (available in PASSIVE_ESTABLISHED_CB and also other existing CBs).

The bpf prog does not need to know where the SYN pkt is coming from. The getsockopt(TCP_BPF_SYN*) will hide these details. Similarly, a flag "BPF_LOAD_HDR_OPT_TCP_SYN" is also added to bpf_load_hdr_opt() to read a particular header option from the SYN packet.

* Fastopen

Fastopen should work the same as the regular non-fastopen case. This is tested in a later patch.

* Syncookie

For syncookie, the later example patch asks the active side's bpf prog to resend the header options in the ACK. The server can use bpf_load_hdr_opt() to look at the options in this received ACK during PASSIVE_ESTABLISHED_CB.

* Active side

The bpf prog will get a chance to write the bpf header option in the SYN packet during WRITE_HDR_OPT_CB. The received SYNACK pkt will also be available to the bpf prog during the existing ACTIVE_ESTABLISHED_CB callback through the sock_ops->skb_data and bpf_load_hdr_opt().

* Turn off header CB flags after 3WHS

If the bpf prog does not need to write/parse header options beyond the 3WHS, the bpf prog can clear the bpf_sock_ops_cb_flags to avoid being called for header options. Or the bpf-prog can select to leave the UNKNOWN_HDR_OPT_CB_FLAG on so that the kernel will only call it when there is an option that the kernel cannot handle.

[1]: draft-wang-tcpm-low-latency-opt-00
https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200820190104.2885895-1-kafai@fb.com
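A rough sketch of how a BPF_PROG_TYPE_SOCK_OPS program can tie these pieces together (helper and callback names as introduced by this series; the option kind 0xfd, its layout, and the overall flow are illustrative assumptions, not the selftest from the later patches):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Hypothetical experimental TCP option: kind 0xfd, one byte of data. */
    struct my_opt {
        __u8 kind;
        __u8 len;
        __u8 data;
    } __attribute__((packed));

    SEC("sockops")
    int hdr_opt_prog(struct bpf_sock_ops *skops)
    {
        struct my_opt opt = { .kind = 0xfd, .len = sizeof(opt), .data = 42 };

        switch (skops->op) {
        case BPF_SOCK_OPS_TCP_CONNECT_CB:
            /* Opt in to the new header-option callbacks. */
            bpf_sock_ops_cb_flags_set(skops,
                                      BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG |
                                      BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG);
            break;
        case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
            /* Reserve room for the option in the outgoing header. */
            bpf_reserve_hdr_opt(skops, sizeof(opt), 0);
            break;
        case BPF_SOCK_OPS_WRITE_HDR_OPT_CB:
            /* Write it; the helper rejects duplicated option kinds. */
            bpf_store_hdr_opt(skops, &opt, sizeof(opt), 0);
            break;
        case BPF_SOCK_OPS_PARSE_HDR_OPT_CB:
            /* Look up an option by kind in the received header. */
            bpf_load_hdr_opt(skops, &opt, sizeof(opt), 0);
            break;
        }
        return 1;
    }

    char _license[] SEC("license") = "GPL";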
-
Committed by Martin KaFai Lau
A later patch needs to add a few pointers and a few u8 to sock_ops_kern. Hence, this patch saves some space by moving some of the existing members from u32 to u8 so that the later patch can still fit everything in a cacheline.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200820190058.2885640-1-kafai@fb.com
-
- 24 August 2020, 1 commit
-
-
Committed by Gustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough [1]. Also, remove unnecessary fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
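A minimal illustration of the substitution (generic switch, not taken from filter.h; the fallthrough pseudo-keyword comes from <linux/compiler_attributes.h>):

    #include <linux/compiler_attributes.h>

    static int classify(int op)
    {
        int weight = 0;

        switch (op) {
        case 0:
            weight += 1;
            fallthrough;    /* was a "fall through" comment */
        case 1:
            weight += 1;
            break;
        default:
            weight = -1;
            break;
        }
        return weight;
    }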
-
- 26 July 2020, 1 commit
-
-
Committed by Song Liu
bpf_get_[stack|stackid] on perf_events with precise_ip uses the callchain attached to perf_sample_data. If this callchain is not present, do not allow attaching a BPF program that calls bpf_get_[stack|stackid] to this event. In the error case, -EPROTO is returned so that libbpf can identify this error and print a proper hint message.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-3-songliubraving@fb.com
-
- 25 July 2020, 1 commit
-
-
Committed by Christoph Hellwig
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
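For context, sockptr_t wraps either a user or a kernel pointer, so setsockopt-style code can accept kernel buffers without set_fs(). A small sketch of the calling convention (simplified, not the actual bpf-cgroup code):

    #include <linux/errno.h>
    #include <linux/sockptr.h>

    /* Illustrative only: copy an option value that may live in user or
     * kernel memory, with no set_fs() games on either path.
     */
    static int copy_optval(void *dst, sockptr_t optval, unsigned int optlen)
    {
        if (sockptr_is_null(optval) || !optlen)
            return -EINVAL;

        /* Works for both USER_SOCKPTR() and KERNEL_SOCKPTR() sources. */
        if (copy_from_sockptr(dst, optval, optlen))
            return -EFAULT;

        return 0;
    }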
-
- 20 July 2020, 1 commit
-
-
Committed by Christoph Hellwig
Add a helper that copies either a native or compat bpf_fprog from userspace after verifying the length, and remove the compat setsockopt handlers that now aren't required.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 July 2020, 3 commits
-
-
Committed by Jakub Sitnicki
Following ipv4 stack changes, run a BPF program attached to netns before looking up a listening socket. Program can return a listening socket to use as result of socket lookup, fail the lookup, or take no action.
Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-7-jakub@cloudflare.com
-
Committed by Jakub Sitnicki
Run a BPF program before looking up a listening socket on the receive path. Program selects a listening socket to yield as result of socket lookup by calling bpf_sk_assign() helper and returning SK_PASS code. Program can revert its decision by assigning a NULL socket with bpf_sk_assign().

Alternatively, BPF program can also fail the lookup by returning with SK_DROP, or let the lookup continue as usual with SK_PASS on return, when no socket has been selected with bpf_sk_assign().

This lets the user match packets with listening sockets freely at the last possible point on the receive path, where we know that packets are destined for local delivery after undergoing policing, filtering, and routing.

With BPF code selecting the socket, directing packets destined to an IP range or to a port range to a single socket becomes possible.

In case multiple programs are attached, they are run in series in the order in which they were attached. The end result is determined from return codes of all the programs according to following rules:
1. If any program returned SK_PASS and selected a valid socket, the socket is used as result of socket lookup.
2. If more than one program returned SK_PASS and selected a socket, last selection takes effect.
3. If any program returned SK_DROP, and no program returned SK_PASS and selected a socket, socket lookup fails with -ECONNREFUSED.
4. If all programs returned SK_PASS and none of them selected a socket, socket lookup continues to htable-based lookup.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com
-
Committed by Jakub Sitnicki
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer when looking up a listening socket for a new connection request for connection oriented protocols, or when looking up an unconnected socket for a packet for connection-less protocols.

When called, SK_LOOKUP BPF program can select a socket that will receive the packet. This serves as a mechanism to overcome the limits of what bind() API allows to express. Two use-cases driving this work are:

(1) steer packets destined to an IP range, on fixed port to a socket

    192.0.2.0/24, port 80 -> NGINX socket

(2) steer packets destined to an IP address, on any port to a socket

    198.51.100.1, any port -> L7 proxy socket

In its run-time context program receives information about the packet that triggered the socket lookup. Namely IP version, L4 protocol identifier, and address 4-tuple. Context can be further extended to include ingress interface identifier.

To select a socket BPF program fetches it from a map holding socket references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...) helper to record the selection. Transport layer then uses the selected socket as a result of socket lookup.

In its basic form, SK_LOOKUP acts as a filter and hence must return either SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should look for a socket to receive the packet, or use the one selected by the program if available, while SK_DROP informs the transport layer that the lookup should fail.

This patch only enables the user to attach an SK_LOOKUP program to a network namespace. Subsequent patches hook it up to run on local delivery path in ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
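A hedged sketch of what such a program can look like with libbpf (the map name, section naming, and port range are invented for the example; the structure follows the description above):

    #include <linux/bpf.h>
    #include <linux/in.h>
    #include <bpf/bpf_helpers.h>

    /* Userspace populates slot 0 with the one listening socket that
     * should receive the whole port range.
     */
    struct {
        __uint(type, BPF_MAP_TYPE_SOCKMAP);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } target_sock SEC(".maps");

    SEC("sk_lookup")
    int steer_port_range(struct bpf_sk_lookup *ctx)
    {
        const __u32 zero = 0;
        struct bpf_sock *sk;

        /* Steer every TCP port in 7000-8000 to the single listener. */
        if (ctx->protocol != IPPROTO_TCP ||
            ctx->local_port < 7000 || ctx->local_port > 8000)
            return SK_PASS; /* fall back to the regular htable lookup */

        sk = bpf_map_lookup_elem(&target_sock, &zero);
        if (!sk)
            return SK_PASS;

        bpf_sk_assign(ctx, sk, 0); /* record the selection */
        bpf_sk_release(sk);
        return SK_PASS;
    }

    char _license[] SEC("license") = "GPL";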
-
- 09 July 2020, 2 commits
-
-
Committed by Kees Cook
When evaluating access control over kallsyms visibility, credentials at open() time need to be used, not the "current" creds (though in BPF's case, this has likely always been the same). Plumb access to the associated file->f_cred down through bpf_dump_raw_ok() and its callers now that kallsyms_show_value() has been refactored to take struct cred.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7105e828 ("bpf: allow for correlation of maps and helpers in dump")
Signed-off-by: Kees Cook <keescook@chromium.org>
-
Committed by Kees Cook
In order to perform future tests against the cred saved during open(), switch kallsyms_show_value() to operate on a cred, and have all current callers pass current_cred(). This makes it very obvious where callers are checking the wrong credential in their "read" contexts. These will be fixed in the coming patches.

Additionally switch the return value to bool, since it is always used as a direct permission check, not a 0-on-success, negative-on-error style function return.

Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
-
- 08 May 2020, 2 commits
-
-
Committed by Eric Biggers
<linux/cryptohash.h> sounds very generic and important, like it's the header to include if you're doing cryptographic hashing in the kernel. But actually it only includes the library implementation of the SHA-1 compression function (not even the full SHA-1). This should basically never be used anymore; SHA-1 is no longer considered secure, and there are much better ways to do cryptographic hashing in the kernel. Remove this header and fold it into <crypto/sha.h>, which already contains constants and functions for SHA-1 (along with SHA-2).
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Committed by Eric Biggers
The library implementation of the SHA-1 compression function is confusingly called just "sha_transform()". Alongside it are some "SHA_" constants and "sha_init()". Presumably these are left over from a time when SHA just meant SHA-1. But now there are also SHA-2 and SHA-3, and moreover SHA-1 is now considered insecure and thus shouldn't be used.

Therefore, rename these functions and constants to make it very clear that they are for SHA-1. Also add a comment to make it clear that these shouldn't be used.

For the extra-misleadingly named "SHA_MESSAGE_BYTES", rename it to SHA1_BLOCK_SIZE and define it to just '64' rather than '(512/8)' so that it matches the same definition in <crypto/sha.h>. This prepares for merging <linux/cryptohash.h> into <crypto/sha.h>.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
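After the rename the library API reads unambiguously as SHA-1; a minimal usage sketch (the declarations still live in <crypto/sha.h> at this point, before the later split into <crypto/sha1.h> described earlier in this log):

    #include <crypto/sha.h>
    #include <linux/string.h>

    /* Illustrative only: run the bare SHA-1 compression function over one
     * SHA1_BLOCK_SIZE (64-byte) block.  New code should prefer the crypto
     * API and, ideally, not SHA-1 at all.
     */
    static void sha1_one_block(const char *block /* SHA1_BLOCK_SIZE bytes */,
                               u32 digest[SHA1_DIGEST_WORDS])
    {
        u32 workspace[SHA1_WORKSPACE_WORDS];

        sha1_init(digest);                        /* was: sha_init()      */
        sha1_transform(digest, block, workspace); /* was: sha_transform() */
        memzero_explicit(workspace, sizeof(workspace));
    }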
-
- 05 May 2020, 1 commit
-
-
Committed by Arnd Bergmann
gcc-10 warns about accesses to zero-length arrays:

    kernel/bpf/core.c: In function 'bpf_patch_insn_single':
    cc1: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
    In file included from kernel/bpf/core.c:21:
    include/linux/filter.h:550:20: note: at offset 0 to object 'insnsi' with size 0 declared here
      550 |  struct bpf_insn insnsi[0];
          |                  ^~~~~~

In this case, we really want to have two flexible-array members, but that is not possible. Removing the union to make insnsi a flexible-array member while leaving insns as a zero-length array fixes the warning, as nothing writes to the other one in that way.

This trick only works on linux-3.18 or higher, as older versions had additional members in the union.

Fixes: 60a3b225 ("net: bpf: make eBPF interpreter images read-only")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200430213101.135134-6-arnd@arndb.de
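A generic before/after sketch of the kind of change described (member names follow struct bpf_prog; the surrounding fields are omitted and the exact in-tree layout may differ):

    /* Before: a union of two zero-length trailing arrays; gcc-10's
     * -Wstringop-overflow treats writes through insnsi[] as writes into a
     * zero-sized region.
     */
    struct prog_before {
        unsigned int            len;
        union {
            struct sock_filter  insns[0];
            struct bpf_insn     insnsi[0];
        };
    };

    /* After: drop the union.  The member that is written through (insnsi)
     * becomes a real C99 flexible-array member; insns stays a zero-length
     * array at the same offset for read access.
     */
    struct prog_after {
        unsigned int            len;
        struct sock_filter      insns[0];
        struct bpf_insn         insnsi[];
    };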
-
- 26 April 2020, 1 commit
-
-
Committed by Stanislav Fomichev
The linux-next build bot reported a compile issue [1] with one of its configs. It looks like when we have CONFIG_NET=n and CONFIG_BPF{,_SYSCALL}=y, we are missing the bpf_base_func_proto definition (from net/core/filter.c) in cgroup_base_func_proto. I'm reshuffling the code a bit to make it work. The common helpers are moved into kernel/bpf/helpers.c and the bpf_base_func_proto is exported from there. Also, bpf_get_raw_cpu_id goes into kernel/bpf/core.c akin to the existing bpf_user_rnd_u32.

[1] https://lore.kernel.org/linux-next/CAKH8qBsBvKHswiX1nx40LgO+BGeTmb1NX8tiTttt_0uu6T3dCA@mail.gmail.com/T/#mff8b0c083314c68c2e2ef0211cb11bc20dc13c72

Fixes: 0456ea17 ("bpf: Enable more helpers for BPF_PROG_TYPE_CGROUP_{DEVICE,SYSCTL,SOCKOPT}")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200424235941.58382-1-sdf@google.com
-
- 14 March 2020, 2 commits
-
-
Committed by Jiri Olsa
Adding name to the 'struct bpf_ksym' object to carry the name of the symbol for bpf_prog, bpf_trampoline, bpf_dispatcher objects. The current benefit is that the name is now generated only when the symbol is added to the list, so we don't need to generate it every time it's accessed. The future benefit is that we will have all the bpf objects symbols represented by struct bpf_ksym.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200312195610.346362-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Committed by Björn Töpel
Adding bpf_trampoline_ name prefix for DECLARE_BPF_DISPATCHER, so all the dispatchers have the common name prefix. And also a small '_' cleanup for bpf_dispatcher_nopfunc function name.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200312195610.346362-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 25 February 2020, 4 commits
-
-
Committed by David Miller
Replace the preemption disable/enable with migrate_disable/enable() to reflect the actual requirement and to allow PREEMPT_RT to substitute it with an actual migration disable mechanism which does not disable preemption. Including the code paths that go via __bpf_prog_run_save_cb().
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145643.998293311@linutronix.de
-
Committed by David Miller
All of these cases are strictly of the form:

    preempt_disable();
    BPF_PROG_RUN(...);
    preempt_enable();

Replace this with bpf_prog_run_pin_on_cpu() which wraps BPF_PROG_RUN() with:

    migrate_disable();
    BPF_PROG_RUN(...);
    migrate_enable();

On non RT enabled kernels this maps to preempt_disable/enable() and on RT enabled kernels this solely prevents migration, which is sufficient as there is no requirement to prevent reentrancy to any BPF program from a preempting task. The only requirement is that the program stays on the same CPU. Therefore, this is a trivially correct transformation.

The seccomp loop does not need protection over the loop. It only needs protection per BPF filter program.

[ tglx: Converted to bpf_prog_run_pin_on_cpu() ]

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145643.691493094@linutronix.de
-
Committed by Thomas Gleixner
As already discussed in the previous change which introduced BPF_RUN_PROG_PIN_ON_CPU(), BPF only requires to disable migration to guarantee per CPUness. If RT substitutes the preempt disable based migration protection then the cant_sleep() check will obviously trigger as preemption is not disabled. Replace it by cant_migrate() which maps to cant_sleep() on a non RT kernel and will verify that migration is disabled on a full RT kernel.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145643.583038889@linutronix.de
-
Committed by Thomas Gleixner
BPF programs require to run on one CPU to completion as they use per CPU storage, but according to Alexei they don't need reentrancy protection as obviously BPF programs running in thread context can always be 'preempted' by hard and soft interrupts and instrumentation and the same program can run concurrently on a different CPU.

The currently used mechanism to ensure CPUness is to wrap the invocation into a preempt_disable/enable() pair. Disabling preemption is also disabling migration for a task. preempt_disable/enable() is used because there is no explicit way to reliably disable only migration.

Provide a separate macro to invoke a BPF program which can be used in migrateable task context. It wraps BPF_PROG_RUN() in a migrate_disable/enable() pair which maps on non RT enabled kernels to preempt_disable/enable(). On RT enabled kernels this merely disables migration. Both methods ensure that the invoked BPF program runs on one CPU to completion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145643.474592620@linutronix.de
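A sketch of the shape of such a wrapper, per the description above (simplified; the exact name and form in filter.h may differ):

    /* Illustrative only: run a BPF program to completion on the current
     * CPU from migrateable (preemptible) task context.
     */
    #define BPF_PROG_RUN_PIN_ON_CPU_SKETCH(prog, ctx)      \
        ({                                                 \
            u32 __ret;                                     \
            migrate_disable();  /* pin to this CPU only */ \
            __ret = BPF_PROG_RUN(prog, ctx);               \
            migrate_enable();                              \
            __ret;                                         \
        })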
-
- 17 January 2020, 1 commit
-
-
Committed by Toke Høiland-Jørgensen
Since the bulk queue used by XDP_REDIRECT now lives in struct net_device, we can re-use the bulking for the non-map version of the bpf_redirect() helper. This is a simple matter of having xdp_do_redirect_slow() queue the frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

Unfortunately we can't make the bpf_redirect() helper return an error if the ifindex doesn't exist (as bpf_redirect_map() does), because we don't have a reference to the network namespace of the ingress device at the time the helper is called. So we have to leave it as-is and keep the device lookup in xdp_do_redirect_slow().

Since this leaves less reason to have the non-map redirect code in a separate function, we get rid of the xdp_do_redirect_slow() function entirely. This does lose us the tracepoint disambiguation, but fortunately the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint entry structures. This means both can contain a map index, so we can just amend the tracepoint definitions so we always emit the xdp_redirect(_err) tracepoints, but with the map ID only populated if a map is present. This means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep the definitions around in case someone is still listening for them.

With this change, the performance of the xdp_redirect sample program goes from 5Mpps to 8.4Mpps (a 68% increase).

Since the flush functions are no longer map-specific, rename the flush() functions to drop _map from their names. One of the renamed functions is the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To keep from having to update all drivers, use a #define to keep the old name working, and only update the virtual drivers in this patch.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
-
- 10 January 2020, 1 commit
-
-
Committed by Martin KaFai Lau
This patch makes "struct tcp_congestion_ops" to be the first user of BPF STRUCT_OPS. It allows implementing a tcp_congestion_ops in bpf. The BPF implemented tcp_congestion_ops can be used like regular kernel tcp-cc through sysctl and setsockopt. e.g. [root@arch-fb-vm1 bpf]# sysctl -a | egrep congestion net.ipv4.tcp_allowed_congestion_control = reno cubic bpf_cubic net.ipv4.tcp_available_congestion_control = reno bic cubic bpf_cubic net.ipv4.tcp_congestion_control = bpf_cubic There has been attempt to move the TCP CC to the user space (e.g. CCP in TCP). The common arguments are faster turn around, get away from long-tail kernel versions in production...etc, which are legit points. BPF has been the continuous effort to join both kernel and userspace upsides together (e.g. XDP to gain the performance advantage without bypassing the kernel). The recent BPF advancements (in particular BTF-aware verifier, BPF trampoline, BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc) possible in BPF. It allows a faster turnaround for testing algorithm in the production while leveraging the existing (and continue growing) BPF feature/framework instead of building one specifically for userspace TCP CC. This patch allows write access to a few fields in tcp-sock (in bpf_tcp_ca_btf_struct_access()). The optional "get_info" is unsupported now. It can be added later. One possible way is to output the info with a btf-id to describe the content. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003508.3856115-1-kafai@fb.com
-
- 20 December 2019, 1 commit
-
-
Committed by Björn Töpel
Now that all XDP maps that can be used with bpf_redirect_map() track entries to be flushed in a global fashion, there is no need to track that the map has changed and flush from xdp_do_generic_map() anymore. All entries will be flushed in xdp_do_flush_map(). This means that the map_to_flush can be removed, and the corresponding checks. Moving the flush logic to one place, xdp_do_flush_map(), gives a bulking behavior and performance boost.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-8-bjorn.topel@gmail.com
-
- 14 December 2019, 1 commit
-
-
Committed by Björn Töpel
This commit adds a BPF dispatcher for XDP. The dispatcher is updated from the XDP control-path, dev_xdp_install(), and used when an XDP program is run via bpf_prog_run_xdp().
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191213175112.30208-4-bjorn.topel@gmail.com
-
- 10 December 2019, 1 commit
-
-
Committed by Pankaj Bharadiya
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using the following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"
    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do
        if [[ "$file" =~ $EXCLUDE_FILES ]]; then
            continue
        fi
        sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
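For reference, the two spellings expand to the same thing - the size of a named struct member; a one-line illustration (struct sk_buff used purely as an example):

    #include <linux/skbuff.h>
    #include <linux/stddef.h>

    /* Old spelling, removed by this series:  FIELD_SIZEOF(struct sk_buff, cb)
     * New spelling, same result:
     */
    static const size_t skb_cb_size = sizeof_field(struct sk_buff, cb);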
-
- 02 December 2019, 1 commit
-
-
Committed by Daniel Borkmann
For the case where the interpreter is compiled out or when the prog is jited it is completely unnecessary to set the BPF insn pages as read-only. In fact, on frequent churn of BPF programs, it could lead to performance degradation of the system over time since it would break the direct map down to 4k pages when calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no reverse operation. Thus, avoid breaking up large pages for data maps, and only limit this to the module range used by the JIT where it is necessary to set the image read-only and executable.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191129222911.3710-1-daniel@iogearbox.net
-
- 25 November 2019, 2 commits
-
-
Committed by Daniel Borkmann
Add a definition of bpf_jit_blinding_enabled() when CONFIG_BPF_JIT is not set in order to fix a recent build regression:

    [...]
      CC      kernel/bpf/verifier.o
      CC      kernel/bpf/inode.o
    kernel/bpf/verifier.c: In function ‘fixup_bpf_calls’:
    kernel/bpf/verifier.c:9132:25: error: implicit declaration of function ‘bpf_jit_blinding_enabled’; did you mean ‘bpf_jit_kallsyms_enabled’? [-Werror=implicit-function-declaration]
     9132 |  bool expect_blinding = bpf_jit_blinding_enabled(prog);
          |                         ^~~~~~~~~~~~~~~~~~~~~~~~
          |                         bpf_jit_kallsyms_enabled
      CC      kernel/bpf/helpers.o
      CC      kernel/bpf/hashtab.o
    [...]

Fixes: d2e4c1e6 ("bpf: Constant map key tracking for prog array pokes")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/40baf8f3507cac4851a310578edfb98ce73b5605.1574541375.git.daniel@iogearbox.net
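The missing !CONFIG_BPF_JIT definition is presumably a trivial stub along these lines (a sketch of the shape of the fix, not the verbatim patch):

    /* Constant blinding is a JIT-only concept; without a JIT it is off. */
    #ifndef CONFIG_BPF_JIT
    static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
    {
        return false;
    }
    #endif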
-
Committed by Daniel Borkmann
Add initial poke table data structures and management to the BPF prog that can later be used by JITs. Also add an instance of poke specific data for tail call maps; plan for later work is to extend this also for BPF static keys.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1db285ec2ea4207ee0455b3f8e191a4fc58b9ade.1574452833.git.daniel@iogearbox.net
-
- 16 November 2019, 1 commit
-
-
Committed by Ilya Leoshkevich
Currently passing alignment greater than 4 to bpf_jit_binary_alloc does not work: in such cases it silently aligns only to 4 bytes. On s390, in order to load a constant from memory in a large (>512k) BPF program, one must use lgrl instruction, whose memory operand must be aligned on an 8-byte boundary. This patch makes it possible to request 8-byte alignment from bpf_jit_binary_alloc, and also makes it issue a warning when an unsupported alignment is requested.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191115123722.58462-1-iii@linux.ibm.com
-
- 23 October 2019, 1 commit
-
-
Committed by Daniel Borkmann
syzkaller managed to trigger the following crash:

    [...]
    BUG: unable to handle page fault for address: ffffc90001923030
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
    Oops: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
    RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
    RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
    RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
    RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
    RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
    RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
    [...]
    Call Trace:
     kernel_text_address kernel/extable.c:147 [inline]
     __kernel_text_address+0x9a/0x110 kernel/extable.c:102
     unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
     arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
     stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
     save_stack mm/kasan/common.c:69 [inline]
     set_track mm/kasan/common.c:77 [inline]
     __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
     kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
     slab_post_alloc_hook mm/slab.h:584 [inline]
     slab_alloc mm/slab.c:3319 [inline]
     kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
     getname_flags+0xba/0x640 fs/namei.c:138
     getname+0x19/0x20 fs/namei.c:209
     do_sys_open+0x261/0x560 fs/open.c:1091
     __do_sys_open fs/open.c:1115 [inline]
     __se_sys_open fs/open.c:1110 [inline]
     __x64_sys_open+0x87/0x90 fs/open.c:1110
     do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [...]

After further debugging it turns out that we walk kallsyms while in parallel we tear down a BPF program which contains subprograms that have been JITed though the program itself has not been fully exposed and is eventually bailing out with error.

The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes the symbols, however, bpf_prog_free() tears down the JIT memory too early via scheduled work. Instead, it needs to properly respect RCU grace period as the kallsyms walk for BPF is under RCU.

Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error path where we defer final destruction when we have subprogs in the program.

Fixes: 7d1982b4 ("bpf: fix panic in prog load calls cleanup")
Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
-
- 17 October 2019, 1 commit
-
-
Committed by Alexei Starovoitov
Pointer to BTF object is a pointer to kernel object or NULL. The memory access in the interpreter has to be done via probe_kernel_read to avoid page faults.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191016032505.2089704-9-ast@kernel.org
-