提交 5b7fe93d 编写于 作者: D David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-10-27

The following pull-request contains BPF updates for your *net-next* tree.

We've added 52 non-merge commits during the last 11 day(s) which contain
a total of 65 files changed, 2604 insertions(+), 1100 deletions(-).

The main changes are:

 1) Revolutionize BPF tracing by using in-kernel BTF to type check BPF
    assembly code. The work here teaches BPF verifier to recognize
    kfree_skb()'s first argument as 'struct sk_buff *' in tracepoints
    such that verifier allows direct use of bpf_skb_event_output() helper
    used in tc BPF et al (w/o probing memory access) that dumps skb data
    into perf ring buffer. Also add direct loads to probe memory in order
    to speed up/replace bpf_probe_read() calls, from Alexei Starovoitov.

 2) Big batch of changes to improve libbpf and BPF kselftests. Besides
    others: generalization of libbpf's CO-RE relocation support to now
    also include field existence relocations, revamp the BPF kselftest
    Makefile to add test runner concept allowing to exercise various
    ways to build BPF programs, and teach bpf_object__open() and friends
    to automatically derive BPF program type/expected attach type from
    section names to ease their use, from Andrii Nakryiko.

 3) Fix deadlock in stackmap's build-id lookup on rq_lock(), from Song Liu.

 4) Allow to read BTF as raw data from bpftool. Most notable use case
    is to dump /sys/kernel/btf/vmlinux through this, from Jiri Olsa.

 5) Use bpf_redirect_map() helper in libbpf's AF_XDP helper prog which
    manages to improve "rx_drop" performance by ~4%., from Björn Töpel.

 6) Fix to restore the flow dissector after reattach BPF test and also
    fix error handling in bpf_helper_defs.h generation, from Jakub Sitnicki.

 7) Improve verifier's BTF ctx access for use outside of raw_tp, from
    Martin KaFai Lau.

 8) Improve documentation for AF_XDP with new sections and to reflect
    latest features, from Magnus Karlsson.

 9) Add back 'version' section parsing to libbpf for old kernels, from
    John Fastabend.

10) Fix strncat bounds error in libbpf's libbpf_prog_type_by_name(),
    from KP Singh.

11) Turn on -mattr=+alu32 in LLVM by default for BPF kselftests in order
    to improve insn coverage for built BPF progs, from Yonghong Song.

12) Misc minor cleanups and fixes, from various others.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
......@@ -40,13 +40,13 @@ allocates memory for this UMEM using whatever means it feels is most
appropriate (malloc, mmap, huge pages, etc). This memory area is then
registered with the kernel using the new setsockopt XDP_UMEM_REG. The
UMEM also has two rings: the FILL ring and the COMPLETION ring. The
fill ring is used by the application to send down addr for the kernel
FILL ring is used by the application to send down addr for the kernel
to fill in with RX packet data. References to these frames will then
appear in the RX ring once each packet has been received. The
completion ring, on the other hand, contains frame addr that the
COMPLETION ring, on the other hand, contains frame addr that the
kernel has transmitted completely and can now be used again by user
space, for either TX or RX. Thus, the frame addrs appearing in the
completion ring are addrs that were previously transmitted using the
COMPLETION ring are addrs that were previously transmitted using the
TX ring. In summary, the RX and FILL rings are used for the RX path
and the TX and COMPLETION rings are used for the TX path.
......@@ -91,11 +91,16 @@ Concepts
========
In order to use an AF_XDP socket, a number of associated objects need
to be setup.
to be setup. These objects and their options are explained in the
following sections.
Jonathan Corbet has also written an excellent article on LWN,
"Accelerating networking with AF_XDP". It can be found at
https://lwn.net/Articles/750845/.
For an overview on how AF_XDP works, you can also take a look at the
Linux Plumbers paper from 2018 on the subject:
http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf. Do
NOT consult the paper from 2017 on "AF_PACKET v4", the first attempt
at AF_XDP. Nearly everything changed since then. Jonathan Corbet has
also written an excellent article on LWN, "Accelerating networking
with AF_XDP". It can be found at https://lwn.net/Articles/750845/.
UMEM
----
......@@ -113,22 +118,22 @@ the next socket B can do this by setting the XDP_SHARED_UMEM flag in
struct sockaddr_xdp member sxdp_flags, and passing the file descriptor
of A to struct sockaddr_xdp member sxdp_shared_umem_fd.
The UMEM has two single-producer/single-consumer rings, that are used
The UMEM has two single-producer/single-consumer rings that are used
to transfer ownership of UMEM frames between the kernel and the
user-space application.
Rings
-----
There are a four different kind of rings: Fill, Completion, RX and
There are a four different kind of rings: FILL, COMPLETION, RX and
TX. All rings are single-producer/single-consumer, so the user-space
application need explicit synchronization of multiple
processes/threads are reading/writing to them.
The UMEM uses two rings: Fill and Completion. Each socket associated
The UMEM uses two rings: FILL and COMPLETION. Each socket associated
with the UMEM must have an RX queue, TX queue or both. Say, that there
is a setup with four sockets (all doing TX and RX). Then there will be
one Fill ring, one Completion ring, four TX rings and four RX rings.
one FILL ring, one COMPLETION ring, four TX rings and four RX rings.
The rings are head(producer)/tail(consumer) based rings. A producer
writes the data ring at the index pointed out by struct xdp_ring
......@@ -146,7 +151,7 @@ The size of the rings need to be of size power of two.
UMEM Fill Ring
~~~~~~~~~~~~~~
The Fill ring is used to transfer ownership of UMEM frames from
The FILL ring is used to transfer ownership of UMEM frames from
user-space to kernel-space. The UMEM addrs are passed in the ring. As
an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
16 chunks and can pass addrs between 0 and 64k.
......@@ -164,8 +169,8 @@ chunks mode, then the incoming addr will be left untouched.
UMEM Completion Ring
~~~~~~~~~~~~~~~~~~~~
The Completion Ring is used transfer ownership of UMEM frames from
kernel-space to user-space. Just like the Fill ring, UMEM indicies are
The COMPLETION Ring is used transfer ownership of UMEM frames from
kernel-space to user-space. Just like the FILL ring, UMEM indices are
used.
Frames passed from the kernel to user-space are frames that has been
......@@ -181,7 +186,7 @@ The RX ring is the receiving side of a socket. Each entry in the ring
is a struct xdp_desc descriptor. The descriptor contains UMEM offset
(addr) and the length of the data (len).
If no frames have been passed to kernel via the Fill ring, no
If no frames have been passed to kernel via the FILL ring, no
descriptors will (or can) appear on the RX ring.
The user application consumes struct xdp_desc descriptors from this
......@@ -199,8 +204,24 @@ be relaxed in the future.
The user application produces struct xdp_desc descriptors to this
ring.
Libbpf
======
Libbpf is a helper library for eBPF and XDP that makes using these
technologies a lot simpler. It also contains specific helper functions
in tools/lib/bpf/xsk.h for facilitating the use of AF_XDP. It
contains two types of functions: those that can be used to make the
setup of AF_XDP socket easier and ones that can be used in the data
plane to access the rings safely and quickly. To see an example on how
to use this API, please take a look at the sample application in
samples/bpf/xdpsock_usr.c which uses libbpf for both setup and data
plane operations.
We recommend that you use this library unless you have become a power
user. It will make your program a lot simpler.
XSKMAP / BPF_MAP_TYPE_XSKMAP
----------------------------
============================
On XDP side there is a BPF map type BPF_MAP_TYPE_XSKMAP (XSKMAP) that
is used in conjunction with bpf_redirect_map() to pass the ingress
......@@ -216,21 +237,184 @@ queue 17. Only the XDP program executing for eth0 and queue 17 will
successfully pass data to the socket. Please refer to the sample
application (samples/bpf/) in for an example.
Configuration Flags and Socket Options
======================================
These are the various configuration flags that can be used to control
and monitor the behavior of AF_XDP sockets.
XDP_COPY and XDP_ZERO_COPY bind flags
-------------------------------------
When you bind to a socket, the kernel will first try to use zero-copy
copy. If zero-copy is not supported, it will fall back on using copy
mode, i.e. copying all packets out to user space. But if you would
like to force a certain mode, you can use the following flags. If you
pass the XDP_COPY flag to the bind call, the kernel will force the
socket into copy mode. If it cannot use copy mode, the bind call will
fail with an error. Conversely, the XDP_ZERO_COPY flag will force the
socket into zero-copy mode or fail.
XDP_SHARED_UMEM bind flag
-------------------------
This flag enables you to bind multiple sockets to the same UMEM, but
only if they share the same queue id. In this mode, each socket has
their own RX and TX rings, but the UMEM (tied to the fist socket
created) only has a single FILL ring and a single COMPLETION
ring. To use this mode, create the first socket and bind it in the normal
way. Create a second socket and create an RX and a TX ring, or at
least one of them, but no FILL or COMPLETION rings as the ones from
the first socket will be used. In the bind call, set he
XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field. You can attach an arbitrary number of extra
sockets this way.
What socket will then a packet arrive on? This is decided by the XDP
program. Put all the sockets in the XSK_MAP and just indicate which
index in the array you would like to send each packet to. A simple
round-robin example of distributing packets is shown below:
.. code-block:: c
#include <linux/bpf.h>
#include "bpf_helpers.h"
#define MAX_SOCKS 16
struct {
__uint(type, BPF_MAP_TYPE_XSKMAP);
__uint(max_entries, MAX_SOCKS);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
} xsks_map SEC(".maps");
static unsigned int rr;
SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
{
rr = (rr + 1) & (MAX_SOCKS - 1);
return bpf_redirect_map(&xsks_map, rr, 0);
}
Note, that since there is only a single set of FILL and COMPLETION
rings, and they are single producer, single consumer rings, you need
to make sure that multiple processes or threads do not use these rings
concurrently. There are no synchronization primitives in the
libbpf code that protects multiple users at this point in time.
XDP_USE_NEED_WAKEUP bind flag
-----------------------------
This option adds support for a new flag called need_wakeup that is
present in the FILL ring and the TX ring, the rings for which user
space is a producer. When this option is set in the bind call, the
need_wakeup flag will be set if the kernel needs to be explicitly
woken up by a syscall to continue processing packets. If the flag is
zero, no syscall is needed.
If the flag is set on the FILL ring, the application needs to call
poll() to be able to continue to receive packets on the RX ring. This
can happen, for example, when the kernel has detected that there are no
more buffers on the FILL ring and no buffers left on the RX HW ring of
the NIC. In this case, interrupts are turned off as the NIC cannot
receive any packets (as there are no buffers to put them in), and the
need_wakeup flag is set so that user space can put buffers on the
FILL ring and then call poll() so that the kernel driver can put these
buffers on the HW ring and start to receive packets.
If the flag is set for the TX ring, it means that the application
needs to explicitly notify the kernel to send any packets put on the
TX ring. This can be accomplished either by a poll() call, as in the
RX path, or by calling sendto().
An example of how to use this flag can be found in
samples/bpf/xdpsock_user.c. An example with the use of libbpf helpers
would look like this for the TX path:
.. code-block:: c
if (xsk_ring_prod__needs_wakeup(&my_tx_ring))
sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);
I.e., only use the syscall if the flag is set.
We recommend that you always enable this mode as it usually leads to
better performance especially if you run the application and the
driver on the same core, but also if you use different cores for the
application and the kernel driver, as it reduces the number of
syscalls needed for the TX path.
XDP_{RX|TX|UMEM_FILL|UMEM_COMPLETION}_RING setsockopts
------------------------------------------------------
These setsockopts sets the number of descriptors that the RX, TX,
FILL, and COMPLETION rings respectively should have. It is mandatory
to set the size of at least one of the RX and TX rings. If you set
both, you will be able to both receive and send traffic from your
application, but if you only want to do one of them, you can save
resources by only setting up one of them. Both the FILL ring and the
COMPLETION ring are mandatory if you have a UMEM tied to your socket,
which is the normal case. But if the XDP_SHARED_UMEM flag is used, any
socket after the first one does not have a UMEM and should in that
case not have any FILL or COMPLETION rings created.
XDP_UMEM_REG setsockopt
-----------------------
This setsockopt registers a UMEM to a socket. This is the area that
contain all the buffers that packet can recide in. The call takes a
pointer to the beginning of this area and the size of it. Moreover, it
also has parameter called chunk_size that is the size that the UMEM is
divided into. It can only be 2K or 4K at the moment. If you have an
UMEM area that is 128K and a chunk size of 2K, this means that you
will be able to hold a maximum of 128K / 2K = 64 packets in your UMEM
area and that your largest packet size can be 2K.
There is also an option to set the headroom of each single buffer in
the UMEM. If you set this to N bytes, it means that the packet will
start N bytes into the buffer leaving the first N bytes for the
application to use. The final option is the flags field, but it will
be dealt with in separate sections for each UMEM flag.
XDP_STATISTICS getsockopt
-------------------------
Gets drop statistics of a socket that can be useful for debug
purposes. The supported statistics are shown below:
.. code-block:: c
struct xdp_statistics {
__u64 rx_dropped; /* Dropped for reasons other than invalid desc */
__u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
};
XDP_OPTIONS getsockopt
----------------------
Gets options from an XDP socket. The only one supported so far is
XDP_OPTIONS_ZEROCOPY which tells you if zero-copy is on or not.
Usage
=====
In order to use AF_XDP sockets there are two parts needed. The
In order to use AF_XDP sockets two parts are needed. The
user-space application and the XDP program. For a complete setup and
usage example, please refer to the sample application. The user-space
side is xdpsock_user.c and the XDP side is part of libbpf.
The XDP code sample included in tools/lib/bpf/xsk.c is the following::
The XDP code sample included in tools/lib/bpf/xsk.c is the following:
.. code-block:: c
SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
{
int index = ctx->rx_queue_index;
// A set entry here means that the correspnding queue_id
// A set entry here means that the corresponding queue_id
// has an active AF_XDP socket bound to it.
if (bpf_map_lookup_elem(&xsks_map, &index))
return bpf_redirect_map(&xsks_map, index, 0);
......@@ -238,7 +422,10 @@ The XDP code sample included in tools/lib/bpf/xsk.c is the following::
return XDP_PASS;
}
Naive ring dequeue and enqueue could look like this::
A simple but not so performance ring dequeue and enqueue could look
like this:
.. code-block:: c
// struct xdp_rxtx_ring {
// __u32 *producer;
......@@ -287,17 +474,16 @@ Naive ring dequeue and enqueue could look like this::
return 0;
}
For a more optimized version, please refer to the sample application.
But please use the libbpf functions as they are optimized and ready to
use. Will make your life easier.
Sample application
==================
There is a xdpsock benchmarking/test application included that
demonstrates how to use AF_XDP sockets with both private and shared
UMEMs. Say that you would like your UDP traffic from port 4242 to end
up in queue 16, that we will enable AF_XDP on. Here, we use ethtool
for this::
demonstrates how to use AF_XDP sockets with private UMEMs. Say that
you would like your UDP traffic from port 4242 to end up in queue 16,
that we will enable AF_XDP on. Here, we use ethtool for this::
ethtool -N p3p2 rx-flow-hash udp4 fn
ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
......@@ -311,13 +497,18 @@ using::
For XDP_SKB mode, use the switch "-S" instead of "-N" and all options
can be displayed with "-h", as usual.
This sample application uses libbpf to make the setup and usage of
AF_XDP simpler. If you want to know how the raw uapi of AF_XDP is
really used to make something more advanced, take a look at the libbpf
code in tools/lib/bpf/xsk.[ch].
FAQ
=======
Q: I am not seeing any traffic on the socket. What am I doing wrong?
A: When a netdev of a physical NIC is initialized, Linux usually
allocates one Rx and Tx queue pair per core. So on a 8 core system,
allocates one RX and TX queue pair per core. So on a 8 core system,
queue ids 0 to 7 will be allocated, one per core. In the AF_XDP
bind call or the xsk_socket__create libbpf function call, you
specify a specific queue id to bind to and it is only the traffic
......@@ -343,9 +534,21 @@ A: When a netdev of a physical NIC is initialized, Linux usually
sudo ethtool -N <interface> flow-type udp4 src-port 4242 dst-port \
4242 action 2
A number of other ways are possible all up to the capabilitites of
A number of other ways are possible all up to the capabilities of
the NIC you have.
Q: Can I use the XSKMAP to implement a switch betwen different umems
in copy mode?
A: The short answer is no, that is not supported at the moment. The
XSKMAP can only be used to switch traffic coming in on queue id X
to sockets bound to the same queue id X. The XSKMAP can contain
sockets bound to different queue ids, for example X and Y, but only
traffic goming in from queue id Y can be directed to sockets bound
to the same queue id Y. In zero-copy mode, you should use the
switch, or other distribution mechanism, in your NIC to direct
traffic to the correct queue id and socket.
Credits
=======
......
......@@ -9,7 +9,7 @@
#include <linux/filter.h>
#include <linux/if_vlan.h>
#include <linux/bpf.h>
#include <asm/extable.h>
#include <asm/set_memory.h>
#include <asm/nospec-branch.h>
......@@ -123,6 +123,19 @@ static const int reg2hex[] = {
[AUX_REG] = 3, /* R11 temp register */
};
static const int reg2pt_regs[] = {
[BPF_REG_0] = offsetof(struct pt_regs, ax),
[BPF_REG_1] = offsetof(struct pt_regs, di),
[BPF_REG_2] = offsetof(struct pt_regs, si),
[BPF_REG_3] = offsetof(struct pt_regs, dx),
[BPF_REG_4] = offsetof(struct pt_regs, cx),
[BPF_REG_5] = offsetof(struct pt_regs, r8),
[BPF_REG_6] = offsetof(struct pt_regs, bx),
[BPF_REG_7] = offsetof(struct pt_regs, r13),
[BPF_REG_8] = offsetof(struct pt_regs, r14),
[BPF_REG_9] = offsetof(struct pt_regs, r15),
};
/*
* is_ereg() == true if BPF register 'reg' maps to x86-64 r8..r15
* which need extra byte of encoding.
......@@ -377,6 +390,19 @@ static void emit_mov_reg(u8 **pprog, bool is64, u32 dst_reg, u32 src_reg)
*pprog = prog;
}
static bool ex_handler_bpf(const struct exception_table_entry *x,
struct pt_regs *regs, int trapnr,
unsigned long error_code, unsigned long fault_addr)
{
u32 reg = x->fixup >> 8;
/* jump over faulting load and clear dest register */
*(unsigned long *)((void *)regs + reg) = 0;
regs->ip += x->fixup & 0xff;
return true;
}
static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
int oldproglen, struct jit_context *ctx)
{
......@@ -384,7 +410,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
int insn_cnt = bpf_prog->len;
bool seen_exit = false;
u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
int i, cnt = 0;
int i, cnt = 0, excnt = 0;
int proglen = 0;
u8 *prog = temp;
......@@ -778,14 +804,17 @@ stx: if (is_imm8(insn->off))
/* LDX: dst_reg = *(u8*)(src_reg + off) */
case BPF_LDX | BPF_MEM | BPF_B:
case BPF_LDX | BPF_PROBE_MEM | BPF_B:
/* Emit 'movzx rax, byte ptr [rax + off]' */
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB6);
goto ldx;
case BPF_LDX | BPF_MEM | BPF_H:
case BPF_LDX | BPF_PROBE_MEM | BPF_H:
/* Emit 'movzx rax, word ptr [rax + off]' */
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB7);
goto ldx;
case BPF_LDX | BPF_MEM | BPF_W:
case BPF_LDX | BPF_PROBE_MEM | BPF_W:
/* Emit 'mov eax, dword ptr [rax+0x14]' */
if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT2(add_2mod(0x40, src_reg, dst_reg), 0x8B);
......@@ -793,6 +822,7 @@ stx: if (is_imm8(insn->off))
EMIT1(0x8B);
goto ldx;
case BPF_LDX | BPF_MEM | BPF_DW:
case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
/* Emit 'mov rax, qword ptr [rax+0x14]' */
EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
ldx: /*
......@@ -805,6 +835,48 @@ stx: if (is_imm8(insn->off))
else
EMIT1_off32(add_2reg(0x80, src_reg, dst_reg),
insn->off);
if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
struct exception_table_entry *ex;
u8 *_insn = image + proglen;
s64 delta;
if (!bpf_prog->aux->extable)
break;
if (excnt >= bpf_prog->aux->num_exentries) {
pr_err("ex gen bug\n");
return -EFAULT;
}
ex = &bpf_prog->aux->extable[excnt++];
delta = _insn - (u8 *)&ex->insn;
if (!is_simm32(delta)) {
pr_err("extable->insn doesn't fit into 32-bit\n");
return -EFAULT;
}
ex->insn = delta;
delta = (u8 *)ex_handler_bpf - (u8 *)&ex->handler;
if (!is_simm32(delta)) {
pr_err("extable->handler doesn't fit into 32-bit\n");
return -EFAULT;
}
ex->handler = delta;
if (dst_reg > BPF_REG_9) {
pr_err("verifier error\n");
return -EFAULT;
}
/*
* Compute size of x86 insn and its target dest x86 register.
* ex_handler_bpf() will use lower 8 bits to adjust
* pt_regs->ip to jump over this x86 instruction
* and upper bits to figure out which pt_regs to zero out.
* End result: x86 insn "mov rbx, qword ptr [rax+0x14]"
* of 4 bytes will be ignored and rbx will be zero inited.
*/
ex->fixup = (prog - temp) | (reg2pt_regs[dst_reg] << 8);
}
break;
/* STX XADD: lock *(u32*)(dst_reg + off) += src_reg */
......@@ -1058,6 +1130,11 @@ xadd: if (is_imm8(insn->off))
addrs[i] = proglen;
prog = temp;
}
if (image && excnt != bpf_prog->aux->num_exentries) {
pr_err("extable is not populated\n");
return -EFAULT;
}
return proglen;
}
......@@ -1158,12 +1235,24 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
break;
}
if (proglen == oldproglen) {
header = bpf_jit_binary_alloc(proglen, &image,
1, jit_fill_hole);
/*
* The number of entries in extable is the number of BPF_LDX
* insns that access kernel memory via "pointer to BTF type".
* The verifier changed their opcode from LDX|MEM|size
* to LDX|PROBE_MEM|size to make JITing easier.
*/
u32 align = __alignof__(struct exception_table_entry);
u32 extable_size = prog->aux->num_exentries *
sizeof(struct exception_table_entry);
/* allocate module memory for x86 insns and extable */
header = bpf_jit_binary_alloc(roundup(proglen, align) + extable_size,
&image, align, jit_fill_hole);
if (!header) {
prog = orig_prog;
goto out_addrs;
}
prog->aux->extable = (void *) image + roundup(proglen, align);
}
oldproglen = proglen;
cond_resched();
......
......@@ -16,6 +16,7 @@
#include <linux/u64_stats_sync.h>
struct bpf_verifier_env;
struct bpf_verifier_log;
struct perf_event;
struct bpf_prog;
struct bpf_map;
......@@ -23,6 +24,7 @@ struct sock;
struct seq_file;
struct btf;
struct btf_type;
struct exception_table_entry;
extern struct idr btf_idr;
extern spinlock_t btf_idr_lock;
......@@ -211,6 +213,7 @@ enum bpf_arg_type {
ARG_PTR_TO_INT, /* pointer to int */
ARG_PTR_TO_LONG, /* pointer to long */
ARG_PTR_TO_SOCKET, /* pointer to bpf_sock (fullsock) */
ARG_PTR_TO_BTF_ID, /* pointer to in-kernel struct */
};
/* type of values returned from helper functions */
......@@ -233,11 +236,17 @@ struct bpf_func_proto {
bool gpl_only;
bool pkt_access;
enum bpf_return_type ret_type;
enum bpf_arg_type arg1_type;
enum bpf_arg_type arg2_type;
enum bpf_arg_type arg3_type;
enum bpf_arg_type arg4_type;
enum bpf_arg_type arg5_type;
union {
struct {
enum bpf_arg_type arg1_type;
enum bpf_arg_type arg2_type;
enum bpf_arg_type arg3_type;
enum bpf_arg_type arg4_type;
enum bpf_arg_type arg5_type;
};
enum bpf_arg_type arg_type[5];
};
u32 *btf_id; /* BTF ids of arguments */
};
/* bpf_context is intentionally undefined structure. Pointer to bpf_context is
......@@ -281,6 +290,7 @@ enum bpf_reg_type {
PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
PTR_TO_TP_BUFFER, /* reg points to a writable raw tp's buffer */
PTR_TO_XDP_SOCK, /* reg points to struct xdp_sock */
PTR_TO_BTF_ID, /* reg points to kernel struct */
};
/* The information passed from prog-specific *_is_valid_access
......@@ -288,7 +298,11 @@ enum bpf_reg_type {
*/
struct bpf_insn_access_aux {
enum bpf_reg_type reg_type;
int ctx_field_size;
union {
int ctx_field_size;
u32 btf_id;
};
struct bpf_verifier_log *log; /* for verbose logs */
};
static inline void
......@@ -375,8 +389,14 @@ struct bpf_prog_aux {
u32 id;
u32 func_cnt; /* used by non-func prog as the number of func progs */
u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
u32 attach_btf_id; /* in-kernel BTF type id to attach to */
bool verifier_zext; /* Zero extensions has been inserted by verifier. */
bool offload_requested;
bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
const struct btf_type *attach_func_proto;
/* function name for valid attach_btf_id */
const char *attach_func_name;
struct bpf_prog **func;
void *jit_data; /* JIT specific data. arch dependent */
struct latch_tree_node ksym_tnode;
......@@ -416,6 +436,8 @@ struct bpf_prog_aux {
* main prog always has linfo_idx == 0
*/
u32 linfo_idx;
u32 num_exentries;
struct exception_table_entry *extable;
struct bpf_prog_stats __percpu *stats;
union {
struct work_struct work;
......@@ -482,6 +504,7 @@ struct bpf_event_entry {
bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
int bpf_prog_calc_tag(struct bpf_prog *fp);
const char *kernel_type_name(u32 btf_type_id);
const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
......@@ -747,6 +770,15 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
const union bpf_attr *kattr,
union bpf_attr __user *uattr);
bool btf_ctx_access(int off, int size, enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info);
int btf_struct_access(struct bpf_verifier_log *log,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype,
u32 *next_btf_id);
u32 btf_resolve_helper_id(struct bpf_verifier_log *log, void *, int);
#else /* !CONFIG_BPF_SYSCALL */
static inline struct bpf_prog *bpf_prog_get(u32 ufd)
{
......
......@@ -52,6 +52,8 @@ struct bpf_reg_state {
*/
struct bpf_map *map_ptr;
u32 btf_id; /* for PTR_TO_BTF_ID */
/* Max size from any of the above. */
unsigned long raw;
};
......@@ -330,10 +332,12 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
#define BPF_LOG_STATS 4
#define BPF_LOG_LEVEL (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
#define BPF_LOG_MASK (BPF_LOG_LEVEL | BPF_LOG_STATS)
#define BPF_LOG_KERNEL (BPF_LOG_MASK + 1) /* kernel internal flag */
static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
{
return log->level && log->ubuf && !bpf_verifier_log_full(log);
return (log->level && log->ubuf && !bpf_verifier_log_full(log)) ||
log->level == BPF_LOG_KERNEL;
}
#define BPF_MAX_SUBPROGS 256
......@@ -397,6 +401,8 @@ __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
const char *fmt, va_list args);
__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
const char *fmt, ...);
__printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
const char *fmt, ...);
static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
{
......
......@@ -5,6 +5,7 @@
#define _LINUX_BTF_H 1
#include <linux/types.h>
#include <uapi/linux/btf.h>
struct btf;
struct btf_member;
......@@ -53,9 +54,40 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
bool btf_type_is_void(const struct btf_type *t);
static inline bool btf_type_is_ptr(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_PTR;
}
static inline bool btf_type_is_int(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_INT;
}
static inline bool btf_type_is_enum(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM;
}
static inline bool btf_type_is_typedef(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF;
}
static inline bool btf_type_is_func(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC;
}
static inline bool btf_type_is_func_proto(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC_PROTO;
}
#ifdef CONFIG_BPF_SYSCALL
const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
const char *btf_name_by_offset(const struct btf *btf, u32 offset);
struct btf *btf_parse_vmlinux(void);
#else
static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
u32 type_id)
......
......@@ -33,4 +33,14 @@ search_module_extables(unsigned long addr)
}
#endif /*CONFIG_MODULES*/
#ifdef CONFIG_BPF_JIT
const struct exception_table_entry *search_bpf_extables(unsigned long addr);
#else
static inline const struct exception_table_entry *
search_bpf_extables(unsigned long addr)
{
return NULL;
}
#endif
#endif /* _LINUX_EXTABLE_H */
......@@ -65,6 +65,9 @@ struct ctl_table_header;
/* unused opcode to mark special call to bpf_tail_call() helper */
#define BPF_TAIL_CALL 0xf0
/* unused opcode to mark special load instruction. Same as BPF_ABS */
#define BPF_PROBE_MEM 0x20
/* unused opcode to mark call to interpreter with arguments */
#define BPF_CALL_ARGS 0xe0
......@@ -464,10 +467,11 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
#define BPF_CALL_x(x, name, ...) \
static __always_inline \
u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__)); \
u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__)) \
{ \
return ____##name(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
} \
static __always_inline \
u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))
......
......@@ -74,11 +74,12 @@ static inline void bpf_test_probe_##call(void) \
{ \
check_trace_callback_type_##call(__bpf_trace_##template); \
} \
typedef void (*btf_trace_##call)(void *__data, proto); \
static struct bpf_raw_event_map __used \
__attribute__((section("__bpf_raw_tp_map"))) \
__bpf_trace_tp_map_##call = { \
.tp = &__tracepoint_##call, \
.bpf_func = (void *)__bpf_trace_##template, \
.bpf_func = (void *)(btf_trace_##call)__bpf_trace_##template, \
.num_args = COUNT_ARGS(args), \
.writable_size = size, \
};
......
......@@ -22,7 +22,7 @@
#define __XDP_ACT_SYM_FN(x) \
{ XDP_##x, #x },
#define __XDP_ACT_SYM_TAB \
__XDP_ACT_MAP(__XDP_ACT_SYM_FN) { -1, 0 }
__XDP_ACT_MAP(__XDP_ACT_SYM_FN) { -1, NULL }
__XDP_ACT_MAP(__XDP_ACT_TP_FN)
TRACE_EVENT(xdp_exception,
......
......@@ -420,6 +420,7 @@ union bpf_attr {
__u32 line_info_rec_size; /* userspace bpf_line_info size */
__aligned_u64 line_info; /* line info */
__u32 line_info_cnt; /* number of bpf_line_info records */
__u32 attach_btf_id; /* in-kernel BTF type id to attach to */
};
struct { /* anonymous struct used by BPF_OBJ_* commands */
......@@ -2750,6 +2751,30 @@ union bpf_attr {
* **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
*
* **-EPROTONOSUPPORT** IP packet version is not 4 or 6
*
* int bpf_skb_output(void *ctx, struct bpf_map *map, u64 flags, void *data, u64 size)
* Description
* Write raw *data* blob into a special BPF perf event held by
* *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. This perf
* event must have the following attributes: **PERF_SAMPLE_RAW**
* as **sample_type**, **PERF_TYPE_SOFTWARE** as **type**, and
* **PERF_COUNT_SW_BPF_OUTPUT** as **config**.
*
* The *flags* are used to indicate the index in *map* for which
* the value must be put, masked with **BPF_F_INDEX_MASK**.
* Alternatively, *flags* can be set to **BPF_F_CURRENT_CPU**
* to indicate that the index of the current CPU core should be
* used.
*
* The value to write, of *size*, is passed through eBPF stack and
* pointed by *data*.
*
* *ctx* is a pointer to in-kernel struct sk_buff.
*
* This helper is similar to **bpf_perf_event_output**\ () but
* restricted to raw_tracepoint bpf programs.
* Return
* 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
......@@ -2862,7 +2887,8 @@ union bpf_attr {
FN(sk_storage_get), \
FN(sk_storage_delete), \
FN(send_signal), \
FN(tcp_gen_syncookie),
FN(tcp_gen_syncookie), \
FN(skb_output),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
......
......@@ -336,16 +336,6 @@ static bool btf_type_is_fwd(const struct btf_type *t)
return BTF_INFO_KIND(t->info) == BTF_KIND_FWD;
}
static bool btf_type_is_func(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC;
}
static bool btf_type_is_func_proto(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC_PROTO;
}
static bool btf_type_nosize(const struct btf_type *t)
{
return btf_type_is_void(t) || btf_type_is_fwd(t) ||
......@@ -377,16 +367,6 @@ static bool btf_type_is_array(const struct btf_type *t)
return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
}
static bool btf_type_is_ptr(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_PTR;
}
static bool btf_type_is_int(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_INT;
}
static bool btf_type_is_var(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info) == BTF_KIND_VAR;
......@@ -698,6 +678,13 @@ __printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
if (!bpf_verifier_log_needed(log))
return;
/* btf verifier prints all types it is processing via
* btf_verifier_log_type(..., fmt = NULL).
* Skip those prints for in-kernel BTF verification.
*/
if (log->level == BPF_LOG_KERNEL && !fmt)
return;
__btf_verifier_log(log, "[%u] %s %s%s",
env->log_type_id,
btf_kind_str[kind],
......@@ -735,6 +722,8 @@ static void btf_verifier_log_member(struct btf_verifier_env *env,
if (!bpf_verifier_log_needed(log))
return;
if (log->level == BPF_LOG_KERNEL && !fmt)
return;
/* The CHECK_META phase already did a btf dump.
*
* If member is logged again, it must hit an error in
......@@ -777,6 +766,8 @@ static void btf_verifier_log_vsi(struct btf_verifier_env *env,
if (!bpf_verifier_log_needed(log))
return;
if (log->level == BPF_LOG_KERNEL && !fmt)
return;
if (env->phase != CHECK_META)
btf_verifier_log_type(env, datasec_type, NULL);
......@@ -802,6 +793,8 @@ static void btf_verifier_log_hdr(struct btf_verifier_env *env,
if (!bpf_verifier_log_needed(log))
return;
if (log->level == BPF_LOG_KERNEL)
return;
hdr = &btf->hdr;
__btf_verifier_log(log, "magic: 0x%x\n", hdr->magic);
__btf_verifier_log(log, "version: %u\n", hdr->version);
......@@ -2405,7 +2398,8 @@ static s32 btf_enum_check_meta(struct btf_verifier_env *env,
return -EINVAL;
}
if (env->log.level == BPF_LOG_KERNEL)
continue;
btf_verifier_log(env, "\t%s val=%d\n",
__btf_name_by_offset(btf, enums[i].name_off),
enums[i].val);
......@@ -3367,6 +3361,292 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
return ERR_PTR(err);
}
extern char __weak _binary__btf_vmlinux_bin_start[];
extern char __weak _binary__btf_vmlinux_bin_end[];
struct btf *btf_parse_vmlinux(void)
{
struct btf_verifier_env *env = NULL;
struct bpf_verifier_log *log;
struct btf *btf = NULL;
int err;
env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
if (!env)
return ERR_PTR(-ENOMEM);
log = &env->log;
log->level = BPF_LOG_KERNEL;
btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
if (!btf) {
err = -ENOMEM;
goto errout;
}
env->btf = btf;
btf->data = _binary__btf_vmlinux_bin_start;
btf->data_size = _binary__btf_vmlinux_bin_end -
_binary__btf_vmlinux_bin_start;
err = btf_parse_hdr(env);
if (err)
goto errout;
btf->nohdr_data = btf->data + btf->hdr.hdr_len;
err = btf_parse_str_sec(env);
if (err)
goto errout;
err = btf_check_all_metas(env);
if (err)
goto errout;
btf_verifier_env_free(env);
refcount_set(&btf->refcnt, 1);
return btf;
errout:
btf_verifier_env_free(env);
if (btf) {
kvfree(btf->types);
kfree(btf);
}
return ERR_PTR(err);
}
extern struct btf *btf_vmlinux;
bool btf_ctx_access(int off, int size, enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
const struct btf_type *t = prog->aux->attach_func_proto;
const char *tname = prog->aux->attach_func_name;
struct bpf_verifier_log *log = info->log;
const struct btf_param *args;
u32 nr_args, arg;
if (off % 8) {
bpf_log(log, "func '%s' offset %d is not multiple of 8\n",
tname, off);
return false;
}
arg = off / 8;
args = (const struct btf_param *)(t + 1);
nr_args = btf_type_vlen(t);
if (prog->aux->attach_btf_trace) {
/* skip first 'void *__data' argument in btf_trace_##name typedef */
args++;
nr_args--;
}
if (arg >= nr_args) {
bpf_log(log, "func '%s' doesn't have %d-th argument\n",
tname, arg);
return false;
}
t = btf_type_by_id(btf_vmlinux, args[arg].type);
/* skip modifiers */
while (btf_type_is_modifier(t))
t = btf_type_by_id(btf_vmlinux, t->type);
if (btf_type_is_int(t))
/* accessing a scalar */
return true;
if (!btf_type_is_ptr(t)) {
bpf_log(log,
"func '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
tname, arg,
__btf_name_by_offset(btf_vmlinux, t->name_off),
btf_kind_str[BTF_INFO_KIND(t->info)]);
return false;
}
if (t->type == 0)
/* This is a pointer to void.
* It is the same as scalar from the verifier safety pov.
* No further pointer walking is allowed.
*/
return true;
/* this is a pointer to another type */
info->reg_type = PTR_TO_BTF_ID;
info->btf_id = t->type;
t = btf_type_by_id(btf_vmlinux, t->type);
/* skip modifiers */
while (btf_type_is_modifier(t))
t = btf_type_by_id(btf_vmlinux, t->type);
if (!btf_type_is_struct(t)) {
bpf_log(log,
"func '%s' arg%d type %s is not a struct\n",
tname, arg, btf_kind_str[BTF_INFO_KIND(t->info)]);
return false;
}
bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n",
tname, arg, info->btf_id, btf_kind_str[BTF_INFO_KIND(t->info)],
__btf_name_by_offset(btf_vmlinux, t->name_off));
return true;
}
int btf_struct_access(struct bpf_verifier_log *log,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype,
u32 *next_btf_id)
{
const struct btf_member *member;
const struct btf_type *mtype;
const char *tname, *mname;
int i, moff = 0, msize;
again:
tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
if (!btf_type_is_struct(t)) {
bpf_log(log, "Type '%s' is not a struct", tname);
return -EINVAL;
}
for_each_member(i, t, member) {
/* offset of the field in bits */
moff = btf_member_bit_offset(t, member);
if (btf_member_bitfield_size(t, member))
/* bitfields are not supported yet */
continue;
if (off + size <= moff / 8)
/* won't find anything, field is already too far */
break;
/* type of the field */
mtype = btf_type_by_id(btf_vmlinux, member->type);
mname = __btf_name_by_offset(btf_vmlinux, member->name_off);
/* skip modifiers */
while (btf_type_is_modifier(mtype))
mtype = btf_type_by_id(btf_vmlinux, mtype->type);
if (btf_type_is_array(mtype))
/* array deref is not supported yet */
continue;
if (!btf_type_has_size(mtype) && !btf_type_is_ptr(mtype)) {
bpf_log(log, "field %s doesn't have size\n", mname);
return -EFAULT;
}
if (btf_type_is_ptr(mtype))
msize = 8;
else
msize = mtype->size;
if (off >= moff / 8 + msize)
/* no overlap with member, keep iterating */
continue;
/* the 'off' we're looking for is either equal to start
* of this field or inside of this struct
*/
if (btf_type_is_struct(mtype)) {
/* our field must be inside that union or struct */
t = mtype;
/* adjust offset we're looking for */
off -= moff / 8;
goto again;
}
if (msize != size) {
/* field access size doesn't match */
bpf_log(log,
"cannot access %d bytes in struct %s field %s that has size %d\n",
size, tname, mname, msize);
return -EACCES;
}
if (btf_type_is_ptr(mtype)) {
const struct btf_type *stype;
stype = btf_type_by_id(btf_vmlinux, mtype->type);
/* skip modifiers */
while (btf_type_is_modifier(stype))
stype = btf_type_by_id(btf_vmlinux, stype->type);
if (btf_type_is_struct(stype)) {
*next_btf_id = mtype->type;
return PTR_TO_BTF_ID;
}
}
/* all other fields are treated as scalars */
return SCALAR_VALUE;
}
bpf_log(log, "struct %s doesn't have field at offset %d\n", tname, off);
return -EINVAL;
}
u32 btf_resolve_helper_id(struct bpf_verifier_log *log, void *fn, int arg)
{
char fnname[KSYM_SYMBOL_LEN + 4] = "btf_";
const struct btf_param *args;
const struct btf_type *t;
const char *tname, *sym;
u32 btf_id, i;
if (IS_ERR(btf_vmlinux)) {
bpf_log(log, "btf_vmlinux is malformed\n");
return -EINVAL;
}
sym = kallsyms_lookup((long)fn, NULL, NULL, NULL, fnname + 4);
if (!sym) {
bpf_log(log, "kernel doesn't have kallsyms\n");
return -EFAULT;
}
for (i = 1; i <= btf_vmlinux->nr_types; i++) {
t = btf_type_by_id(btf_vmlinux, i);
if (BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF)
continue;
tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
if (!strcmp(tname, fnname))
break;
}
if (i > btf_vmlinux->nr_types) {
bpf_log(log, "helper %s type is not found\n", fnname);
return -ENOENT;
}
t = btf_type_by_id(btf_vmlinux, t->type);
if (!btf_type_is_ptr(t))
return -EFAULT;
t = btf_type_by_id(btf_vmlinux, t->type);
if (!btf_type_is_func_proto(t))
return -EFAULT;
args = (const struct btf_param *)(t + 1);
if (arg >= btf_type_vlen(t)) {
bpf_log(log, "bpf helper %s doesn't have %d-th argument\n",
fnname, arg);
return -EINVAL;
}
t = btf_type_by_id(btf_vmlinux, args[arg].type);
if (!btf_type_is_ptr(t) || !t->type) {
/* anything but the pointer to struct is a helper config bug */
bpf_log(log, "ARG_PTR_TO_BTF is misconfigured\n");
return -EFAULT;
}
btf_id = t->type;
t = btf_type_by_id(btf_vmlinux, t->type);
/* skip modifiers */
while (btf_type_is_modifier(t)) {
btf_id = t->type;
t = btf_type_by_id(btf_vmlinux, t->type);
}
if (!btf_type_is_struct(t)) {
bpf_log(log, "ARG_PTR_TO_BTF is not a struct\n");
return -EFAULT;
}
bpf_log(log, "helper %s arg%d has btf_id %d struct %s\n", fnname + 4,
arg, btf_id, __btf_name_by_offset(btf_vmlinux, t->name_off));
return btf_id;
}
void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
struct seq_file *m)
{
......
......@@ -30,7 +30,7 @@
#include <linux/kallsyms.h>
#include <linux/rcupdate.h>
#include <linux/perf_event.h>
#include <linux/extable.h>
#include <asm/unaligned.h>
/* Registers */
......@@ -712,6 +712,24 @@ bool is_bpf_text_address(unsigned long addr)
return ret;
}
const struct exception_table_entry *search_bpf_extables(unsigned long addr)
{
const struct exception_table_entry *e = NULL;
struct bpf_prog *prog;
rcu_read_lock();
prog = bpf_prog_kallsyms_find(addr);
if (!prog)
goto out;
if (!prog->aux->num_exentries)
goto out;
e = search_extable(prog->aux->extable, prog->aux->num_exentries, addr);
out:
rcu_read_unlock();
return e;
}
int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
char *sym)
{
......@@ -1291,6 +1309,11 @@ bool bpf_opcode_in_insntable(u8 code)
}
#ifndef CONFIG_BPF_JIT_ALWAYS_ON
u64 __weak bpf_probe_read(void * dst, u32 size, const void * unsafe_ptr)
{
memset(dst, 0, size);
return -EFAULT;
}
/**
* __bpf_prog_run - run eBPF program on a given context
* @regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers
......@@ -1310,6 +1333,10 @@ static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u6
/* Non-UAPI available opcodes. */
[BPF_JMP | BPF_CALL_ARGS] = &&JMP_CALL_ARGS,
[BPF_JMP | BPF_TAIL_CALL] = &&JMP_TAIL_CALL,
[BPF_LDX | BPF_PROBE_MEM | BPF_B] = &&LDX_PROBE_MEM_B,
[BPF_LDX | BPF_PROBE_MEM | BPF_H] = &&LDX_PROBE_MEM_H,
[BPF_LDX | BPF_PROBE_MEM | BPF_W] = &&LDX_PROBE_MEM_W,
[BPF_LDX | BPF_PROBE_MEM | BPF_DW] = &&LDX_PROBE_MEM_DW,
};
#undef BPF_INSN_3_LBL
#undef BPF_INSN_2_LBL
......@@ -1542,6 +1569,16 @@ static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u6
LDST(W, u32)
LDST(DW, u64)
#undef LDST
#define LDX_PROBE(SIZEOP, SIZE) \
LDX_PROBE_MEM_##SIZEOP: \
bpf_probe_read(&DST, SIZE, (const void *)(long) SRC); \
CONT;
LDX_PROBE(B, 1)
LDX_PROBE(H, 2)
LDX_PROBE(W, 4)
LDX_PROBE(DW, 8)
#undef LDX_PROBE
STX_XADD_W: /* lock xadd *(u32 *)(dst_reg + off16) += src_reg */
atomic_add((u32) SRC, (atomic_t *)(unsigned long)
(DST + insn->off));
......
......@@ -287,7 +287,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
bool irq_work_busy = false;
struct stack_map_irq_work *work = NULL;
if (in_nmi()) {
if (irqs_disabled()) {
work = this_cpu_ptr(&up_read_work);
if (work->irq_work.flags & IRQ_WORK_BUSY)
/* cannot queue more up_read, fallback */
......@@ -295,8 +295,9 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
}
/*
* We cannot do up_read() in nmi context. To do build_id lookup
* in nmi context, we need to run up_read() in irq_work. We use
* We cannot do up_read() when the irq is disabled, because of
* risk to deadlock with rq_lock. To do build_id lookup when the
* irqs are disabled, we need to run up_read() in irq_work. We use
* a percpu variable to do the irq_work. If the irq_work is
* already used by another lookup, we fall back to report ips.
*
......
......@@ -23,6 +23,7 @@
#include <linux/timekeeping.h>
#include <linux/ctype.h>
#include <linux/nospec.h>
#include <uapi/linux/btf.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
......@@ -1565,9 +1566,21 @@ static void bpf_prog_load_fixup_attach_type(union bpf_attr *attr)
}
static int
bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type,
enum bpf_attach_type expected_attach_type)
bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
enum bpf_attach_type expected_attach_type,
u32 btf_id)
{
switch (prog_type) {
case BPF_PROG_TYPE_RAW_TRACEPOINT:
if (btf_id > BTF_MAX_TYPE)
return -EINVAL;
break;
default:
if (btf_id)
return -EINVAL;
break;
}
switch (prog_type) {
case BPF_PROG_TYPE_CGROUP_SOCK:
switch (expected_attach_type) {
......@@ -1614,7 +1627,7 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type,
}
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD line_info_cnt
#define BPF_PROG_LOAD_LAST_FIELD attach_btf_id
static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
{
......@@ -1656,7 +1669,8 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
return -EPERM;
bpf_prog_load_fixup_attach_type(attr);
if (bpf_prog_load_check_attach_type(type, attr->expected_attach_type))
if (bpf_prog_load_check_attach(type, attr->expected_attach_type,
attr->attach_btf_id))
return -EINVAL;
/* plain bpf_prog allocation */
......@@ -1665,6 +1679,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
return -ENOMEM;
prog->expected_attach_type = attr->expected_attach_type;
prog->aux->attach_btf_id = attr->attach_btf_id;
prog->aux->offload_requested = !!attr->prog_ifindex;
......@@ -1806,17 +1821,50 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
struct bpf_raw_tracepoint *raw_tp;
struct bpf_raw_event_map *btp;
struct bpf_prog *prog;
char tp_name[128];
const char *tp_name;
char buf[128];
int tp_fd, err;
if (strncpy_from_user(tp_name, u64_to_user_ptr(attr->raw_tracepoint.name),
sizeof(tp_name) - 1) < 0)
return -EFAULT;
tp_name[sizeof(tp_name) - 1] = 0;
if (CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN))
return -EINVAL;
prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
if (IS_ERR(prog))
return PTR_ERR(prog);
if (prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT &&
prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE) {
err = -EINVAL;
goto out_put_prog;
}
if (prog->type == BPF_PROG_TYPE_RAW_TRACEPOINT &&
prog->aux->attach_btf_id) {
if (attr->raw_tracepoint.name) {
/* raw_tp name should not be specified in raw_tp
* programs that were verified via in-kernel BTF info
*/
err = -EINVAL;
goto out_put_prog;
}
/* raw_tp name is taken from type name instead */
tp_name = prog->aux->attach_func_name;
} else {
if (strncpy_from_user(buf,
u64_to_user_ptr(attr->raw_tracepoint.name),
sizeof(buf) - 1) < 0) {
err = -EFAULT;
goto out_put_prog;
}
buf[sizeof(buf) - 1] = 0;
tp_name = buf;
}
btp = bpf_get_raw_tracepoint(tp_name);
if (!btp)
return -ENOENT;
if (!btp) {
err = -ENOENT;
goto out_put_prog;
}
raw_tp = kzalloc(sizeof(*raw_tp), GFP_USER);
if (!raw_tp) {
......@@ -1824,38 +1872,27 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
goto out_put_btp;
}
raw_tp->btp = btp;
prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
if (IS_ERR(prog)) {
err = PTR_ERR(prog);
goto out_free_tp;
}
if (prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT &&
prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE) {
err = -EINVAL;
goto out_put_prog;
}
raw_tp->prog = prog;
err = bpf_probe_register(raw_tp->btp, prog);
if (err)
goto out_put_prog;
goto out_free_tp;
raw_tp->prog = prog;
tp_fd = anon_inode_getfd("bpf-raw-tracepoint", &bpf_raw_tp_fops, raw_tp,
O_CLOEXEC);
if (tp_fd < 0) {
bpf_probe_unregister(raw_tp->btp, prog);
err = tp_fd;
goto out_put_prog;
goto out_free_tp;
}
return tp_fd;
out_put_prog:
bpf_prog_put(prog);
out_free_tp:
kfree(raw_tp);
out_put_btp:
bpf_put_raw_tracepoint(btp);
out_put_prog:
bpf_prog_put(prog);
return err;
}
......
......@@ -205,8 +205,11 @@ struct bpf_call_arg_meta {
u64 msize_umax_value;
int ref_obj_id;
int func_id;
u32 btf_id;
};
struct btf *btf_vmlinux;
static DEFINE_MUTEX(bpf_verifier_lock);
static const struct bpf_line_info *
......@@ -243,6 +246,10 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
n = min(log->len_total - log->len_used - 1, n);
log->kbuf[n] = '\0';
if (log->level == BPF_LOG_KERNEL) {
pr_err("BPF:%s\n", log->kbuf);
return;
}
if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
log->len_used += n;
else
......@@ -280,6 +287,19 @@ __printf(2, 3) static void verbose(void *private_data, const char *fmt, ...)
va_end(args);
}
__printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
const char *fmt, ...)
{
va_list args;
if (!bpf_verifier_log_needed(log))
return;
va_start(args, fmt);
bpf_verifier_vlog(log, fmt, args);
va_end(args);
}
static const char *ltrim(const char *s)
{
while (isspace(*s))
......@@ -400,6 +420,7 @@ static const char * const reg_type_str[] = {
[PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
[PTR_TO_TP_BUFFER] = "tp_buffer",
[PTR_TO_XDP_SOCK] = "xdp_sock",
[PTR_TO_BTF_ID] = "ptr_",
};
static char slot_type_char[] = {
......@@ -430,6 +451,12 @@ static struct bpf_func_state *func(struct bpf_verifier_env *env,
return cur->frame[reg->frameno];
}
const char *kernel_type_name(u32 id)
{
return btf_name_by_offset(btf_vmlinux,
btf_type_by_id(btf_vmlinux, id)->name_off);
}
static void print_verifier_state(struct bpf_verifier_env *env,
const struct bpf_func_state *state)
{
......@@ -454,6 +481,8 @@ static void print_verifier_state(struct bpf_verifier_env *env,
/* reg->off should be 0 for SCALAR_VALUE */
verbose(env, "%lld", reg->var_off.value + reg->off);
} else {
if (t == PTR_TO_BTF_ID)
verbose(env, "%s", kernel_type_name(reg->btf_id));
verbose(env, "(id=%d", reg->id);
if (reg_type_may_be_refcounted_or_null(t))
verbose(env, ",ref_obj_id=%d", reg->ref_obj_id);
......@@ -2331,10 +2360,12 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
/* check access to 'struct bpf_context' fields. Supports fixed offsets only */
static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
enum bpf_access_type t, enum bpf_reg_type *reg_type)
enum bpf_access_type t, enum bpf_reg_type *reg_type,
u32 *btf_id)
{
struct bpf_insn_access_aux info = {
.reg_type = *reg_type,
.log = &env->log,
};
if (env->ops->is_valid_access &&
......@@ -2348,7 +2379,10 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
*/
*reg_type = info.reg_type;
env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
if (*reg_type == PTR_TO_BTF_ID)
*btf_id = info.btf_id;
else
env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
/* remember the offset of last byte accessed in ctx */
if (env->prog->aux->max_ctx_offset < off + size)
env->prog->aux->max_ctx_offset = off + size;
......@@ -2774,6 +2808,53 @@ static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)
return 0;
}
static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
struct bpf_reg_state *regs,
int regno, int off, int size,
enum bpf_access_type atype,
int value_regno)
{
struct bpf_reg_state *reg = regs + regno;
const struct btf_type *t = btf_type_by_id(btf_vmlinux, reg->btf_id);
const char *tname = btf_name_by_offset(btf_vmlinux, t->name_off);
u32 btf_id;
int ret;
if (atype != BPF_READ) {
verbose(env, "only read is supported\n");
return -EACCES;
}
if (off < 0) {
verbose(env,
"R%d is ptr_%s invalid negative access: off=%d\n",
regno, tname, off);
return -EACCES;
}
if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
char tn_buf[48];
tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
verbose(env,
"R%d is ptr_%s invalid variable offset: off=%d, var_off=%s\n",
regno, tname, off, tn_buf);
return -EACCES;
}
ret = btf_struct_access(&env->log, t, off, size, atype, &btf_id);
if (ret < 0)
return ret;
if (ret == SCALAR_VALUE) {
mark_reg_unknown(env, regs, value_regno);
return 0;
}
mark_reg_known_zero(env, regs, value_regno);
regs[value_regno].type = PTR_TO_BTF_ID;
regs[value_regno].btf_id = btf_id;
return 0;
}
/* check whether memory at (regno + off) is accessible for t = (read | write)
* if t==write, value_regno is a register which value is stored into memory
* if t==read, value_regno is a register which will receive the value from memory
......@@ -2834,6 +2915,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
}
} else if (reg->type == PTR_TO_CTX) {
enum bpf_reg_type reg_type = SCALAR_VALUE;
u32 btf_id = 0;
if (t == BPF_WRITE && value_regno >= 0 &&
is_pointer_value(env, value_regno)) {
......@@ -2845,7 +2927,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
if (err < 0)
return err;
err = check_ctx_access(env, insn_idx, off, size, t, &reg_type);
err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf_id);
if (err)
verbose_linfo(env, insn_idx, "; ");
if (!err && t == BPF_READ && value_regno >= 0) {
/* ctx access returns either a scalar, or a
* PTR_TO_PACKET[_META,_END]. In the latter
......@@ -2864,6 +2948,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
* a sub-register.
*/
regs[value_regno].subreg_def = DEF_NOT_SUBREG;
if (reg_type == PTR_TO_BTF_ID)
regs[value_regno].btf_id = btf_id;
}
regs[value_regno].type = reg_type;
}
......@@ -2923,6 +3009,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
err = check_tp_buffer_access(env, reg, regno, off, size);
if (!err && t == BPF_READ && value_regno >= 0)
mark_reg_unknown(env, regs, value_regno);
} else if (reg->type == PTR_TO_BTF_ID) {
err = check_ptr_to_btf_access(env, regs, regno, off, size, t,
value_regno);
} else {
verbose(env, "R%d invalid mem access '%s'\n", regno,
reg_type_str[reg->type]);
......@@ -3351,6 +3440,22 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
expected_type = PTR_TO_SOCKET;
if (type != expected_type)
goto err_type;
} else if (arg_type == ARG_PTR_TO_BTF_ID) {
expected_type = PTR_TO_BTF_ID;
if (type != expected_type)
goto err_type;
if (reg->btf_id != meta->btf_id) {
verbose(env, "Helper has type %s got %s in R%d\n",
kernel_type_name(meta->btf_id),
kernel_type_name(reg->btf_id), regno);
return -EACCES;
}
if (!tnum_is_const(reg->var_off) || reg->var_off.value || reg->off) {
verbose(env, "R%d is a pointer to in-kernel struct with non-zero offset\n",
regno);
return -EACCES;
}
} else if (arg_type == ARG_PTR_TO_SPIN_LOCK) {
if (meta->func_id == BPF_FUNC_spin_lock) {
if (process_spin_lock(env, regno, true))
......@@ -3498,6 +3603,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
if (func_id != BPF_FUNC_perf_event_read &&
func_id != BPF_FUNC_perf_event_output &&
func_id != BPF_FUNC_skb_output &&
func_id != BPF_FUNC_perf_event_read_value)
goto error;
break;
......@@ -3585,6 +3691,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
case BPF_FUNC_perf_event_read:
case BPF_FUNC_perf_event_output:
case BPF_FUNC_perf_event_read_value:
case BPF_FUNC_skb_output:
if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
goto error;
break;
......@@ -4039,21 +4146,16 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
meta.func_id = func_id;
/* check args */
err = check_func_arg(env, BPF_REG_1, fn->arg1_type, &meta);
if (err)
return err;
err = check_func_arg(env, BPF_REG_2, fn->arg2_type, &meta);
if (err)
return err;
err = check_func_arg(env, BPF_REG_3, fn->arg3_type, &meta);
if (err)
return err;
err = check_func_arg(env, BPF_REG_4, fn->arg4_type, &meta);
if (err)
return err;
err = check_func_arg(env, BPF_REG_5, fn->arg5_type, &meta);
if (err)
return err;
for (i = 0; i < 5; i++) {
if (fn->arg_type[i] == ARG_PTR_TO_BTF_ID) {
if (!fn->btf_id[i])
fn->btf_id[i] = btf_resolve_helper_id(&env->log, fn->func, i);
meta.btf_id = fn->btf_id[i];
}
err = check_func_arg(env, BPF_REG_1 + i, fn->arg_type[i], &meta);
if (err)
return err;
}
err = record_func_map(env, &meta, func_id, insn_idx);
if (err)
......@@ -7493,6 +7595,7 @@ static bool reg_type_mismatch_ok(enum bpf_reg_type type)
case PTR_TO_TCP_SOCK:
case PTR_TO_TCP_SOCK_OR_NULL:
case PTR_TO_XDP_SOCK:
case PTR_TO_BTF_ID:
return false;
default:
return true;
......@@ -8634,6 +8737,14 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
case PTR_TO_XDP_SOCK:
convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
break;
case PTR_TO_BTF_ID:
if (type == BPF_WRITE) {
verbose(env, "Writes through BTF pointers are not allowed\n");
return -EINVAL;
}
insn->code = BPF_LDX | BPF_PROBE_MEM | BPF_SIZE((insn)->code);
env->prog->aux->num_exentries++;
continue;
default:
continue;
}
......@@ -9261,6 +9372,52 @@ static void print_verification_stats(struct bpf_verifier_env *env)
env->peak_states, env->longest_mark_read_walk);
}
static int check_attach_btf_id(struct bpf_verifier_env *env)
{
struct bpf_prog *prog = env->prog;
u32 btf_id = prog->aux->attach_btf_id;
const struct btf_type *t;
const char *tname;
if (prog->type == BPF_PROG_TYPE_RAW_TRACEPOINT && btf_id) {
const char prefix[] = "btf_trace_";
t = btf_type_by_id(btf_vmlinux, btf_id);
if (!t) {
verbose(env, "attach_btf_id %u is invalid\n", btf_id);
return -EINVAL;
}
if (!btf_type_is_typedef(t)) {
verbose(env, "attach_btf_id %u is not a typedef\n",
btf_id);
return -EINVAL;
}
tname = btf_name_by_offset(btf_vmlinux, t->name_off);
if (!tname || strncmp(prefix, tname, sizeof(prefix) - 1)) {
verbose(env, "attach_btf_id %u points to wrong type name %s\n",
btf_id, tname);
return -EINVAL;
}
tname += sizeof(prefix) - 1;
t = btf_type_by_id(btf_vmlinux, t->type);
if (!btf_type_is_ptr(t))
/* should never happen in valid vmlinux build */
return -EINVAL;
t = btf_type_by_id(btf_vmlinux, t->type);
if (!btf_type_is_func_proto(t))
/* should never happen in valid vmlinux build */
return -EINVAL;
/* remember two read only pointers that are valid for
* the life time of the kernel
*/
prog->aux->attach_func_name = tname;
prog->aux->attach_func_proto = t;
prog->aux->attach_btf_trace = true;
}
return 0;
}
int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
union bpf_attr __user *uattr)
{
......@@ -9294,6 +9451,13 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
env->ops = bpf_verifier_ops[env->prog->type];
is_priv = capable(CAP_SYS_ADMIN);
if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
mutex_lock(&bpf_verifier_lock);
if (!btf_vmlinux)
btf_vmlinux = btf_parse_vmlinux();
mutex_unlock(&bpf_verifier_lock);
}
/* grab the mutex to protect few globals used by verifier */
if (!is_priv)
mutex_lock(&bpf_verifier_lock);
......@@ -9313,6 +9477,17 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
goto err_unlock;
}
if (IS_ERR(btf_vmlinux)) {
/* Either gcc or pahole or kernel are broken. */
verbose(env, "in-kernel BTF is malformed\n");
ret = PTR_ERR(btf_vmlinux);
goto skip_full_check;
}
ret = check_attach_btf_id(env);
if (ret)
goto skip_full_check;
env->strict_alignment = !!(attr->prog_flags & BPF_F_STRICT_ALIGNMENT);
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
env->strict_alignment = true;
......
......@@ -56,6 +56,8 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr)
e = search_kernel_exception_table(addr);
if (!e)
e = search_module_extables(addr);
if (!e)
e = search_bpf_extables(addr);
return e;
}
......
......@@ -995,6 +995,8 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
};
extern const struct bpf_func_proto bpf_skb_output_proto;
BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
struct bpf_map *, map, u64, flags)
{
......@@ -1053,6 +1055,10 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
switch (func_id) {
case BPF_FUNC_perf_event_output:
return &bpf_perf_event_output_proto_raw_tp;
#ifdef CONFIG_NET
case BPF_FUNC_skb_output:
return &bpf_skb_output_proto;
#endif
case BPF_FUNC_get_stackid:
return &bpf_get_stackid_proto_raw_tp;
case BPF_FUNC_get_stack:
......@@ -1074,7 +1080,9 @@ static bool raw_tp_prog_is_valid_access(int off, int size,
return false;
if (off % size != 0)
return false;
return true;
if (!prog->aux->attach_btf_id)
return true;
return btf_ctx_access(off, size, type, prog, info);
}
const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
......
......@@ -218,10 +218,18 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
if (!range_is_zero(__skb, offsetof(struct __sk_buff, cb) +
FIELD_SIZEOF(struct __sk_buff, cb),
offsetof(struct __sk_buff, tstamp)))
return -EINVAL;
/* tstamp is allowed */
if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) +
FIELD_SIZEOF(struct __sk_buff, tstamp),
sizeof(struct __sk_buff)))
return -EINVAL;
skb->priority = __skb->priority;
skb->tstamp = __skb->tstamp;
memcpy(&cb->data, __skb->cb, QDISC_CB_PRIV_LEN);
return 0;
......@@ -235,6 +243,7 @@ static void convert_skb_to___skb(struct sk_buff *skb, struct __sk_buff *__skb)
return;
__skb->priority = skb->priority;
__skb->tstamp = skb->tstamp;
memcpy(__skb->cb, &cb->data, QDISC_CB_PRIV_LEN);
}
......
......@@ -3798,7 +3798,7 @@ BPF_CALL_5(bpf_skb_event_output, struct sk_buff *, skb, struct bpf_map *, map,
if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK)))
return -EINVAL;
if (unlikely(skb_size > skb->len))
if (unlikely(!skb || skb_size > skb->len))
return -EFAULT;
return bpf_event_output(map, flags, meta, meta_size, skb, skb_size,
......@@ -3816,6 +3816,19 @@ static const struct bpf_func_proto bpf_skb_event_output_proto = {
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
};
static u32 bpf_skb_output_btf_ids[5];
const struct bpf_func_proto bpf_skb_output_proto = {
.func = bpf_skb_event_output,
.gpl_only = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_BTF_ID,
.arg2_type = ARG_CONST_MAP_PTR,
.arg3_type = ARG_ANYTHING,
.arg4_type = ARG_PTR_TO_MEM,
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
.btf_id = bpf_skb_output_btf_ids,
};
static unsigned short bpf_tunnel_key_af(u64 flags)
{
return flags & BPF_F_TUNINFO_IPV6 ? AF_INET6 : AF_INET;
......
......@@ -488,8 +488,8 @@ class PrinterHelpers(Printer):
return t
if t in self.mapped_types:
return self.mapped_types[t]
print("")
print("Unrecognized type '%s', please add it to known types!" % t)
print("Unrecognized type '%s', please add it to known types!" % t,
file=sys.stderr)
sys.exit(1)
seen_helpers = set()
......
......@@ -12,6 +12,9 @@
#include <libbpf.h>
#include <linux/btf.h>
#include <linux/hashtable.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include "btf.h"
#include "json_writer.h"
......@@ -388,6 +391,54 @@ static int dump_btf_c(const struct btf *btf,
return err;
}
static struct btf *btf__parse_raw(const char *file)
{
struct btf *btf;
struct stat st;
__u8 *buf;
FILE *f;
if (stat(file, &st))
return NULL;
f = fopen(file, "rb");
if (!f)
return NULL;
buf = malloc(st.st_size);
if (!buf) {
btf = ERR_PTR(-ENOMEM);
goto exit_close;
}
if ((size_t) st.st_size != fread(buf, 1, st.st_size, f)) {
btf = ERR_PTR(-EINVAL);
goto exit_free;
}
btf = btf__new(buf, st.st_size);
exit_free:
free(buf);
exit_close:
fclose(f);
return btf;
}
static bool is_btf_raw(const char *file)
{
__u16 magic = 0;
int fd;
fd = open(file, O_RDONLY);
if (fd < 0)
return false;
read(fd, &magic, sizeof(magic));
close(fd);
return magic == BTF_MAGIC;
}
static int do_dump(int argc, char **argv)
{
struct btf *btf = NULL;
......@@ -465,7 +516,11 @@ static int do_dump(int argc, char **argv)
}
NEXT_ARG();
} else if (is_prefix(src, "file")) {
btf = btf__parse_elf(*argv, NULL);
if (is_btf_raw(*argv))
btf = btf__parse_raw(*argv);
else
btf = btf__parse_elf(*argv, NULL);
if (IS_ERR(btf)) {
err = PTR_ERR(btf);
btf = NULL;
......
......@@ -1091,8 +1091,11 @@ static int do_run(int argc, char **argv)
static int load_with_options(int argc, char **argv, bool first_prog_only)
{
struct bpf_object_load_attr load_attr = { 0 };
enum bpf_prog_type common_prog_type = BPF_PROG_TYPE_UNSPEC;
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, open_opts,
.relaxed_maps = relaxed_maps,
);
struct bpf_object_load_attr load_attr = { 0 };
enum bpf_attach_type expected_attach_type;
struct map_replace *map_replace = NULL;
struct bpf_program *prog = NULL, *pos;
......@@ -1106,9 +1109,6 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
const char *file;
int idx, err;
LIBBPF_OPTS(bpf_object_open_opts, open_opts,
.relaxed_maps = relaxed_maps,
);
if (!REQ_ARGS(2))
return -1;
......
......@@ -420,6 +420,7 @@ union bpf_attr {
__u32 line_info_rec_size; /* userspace bpf_line_info size */
__aligned_u64 line_info; /* line info */
__u32 line_info_cnt; /* number of bpf_line_info records */
__u32 attach_btf_id; /* in-kernel BTF type id to attach to */
};
struct { /* anonymous struct used by BPF_OBJ_* commands */
......@@ -2750,6 +2751,30 @@ union bpf_attr {
* **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
*
* **-EPROTONOSUPPORT** IP packet version is not 4 or 6
*
* int bpf_skb_output(void *ctx, struct bpf_map *map, u64 flags, void *data, u64 size)
* Description
* Write raw *data* blob into a special BPF perf event held by
* *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. This perf
* event must have the following attributes: **PERF_SAMPLE_RAW**
* as **sample_type**, **PERF_TYPE_SOFTWARE** as **type**, and
* **PERF_COUNT_SW_BPF_OUTPUT** as **config**.
*
* The *flags* are used to indicate the index in *map* for which
* the value must be put, masked with **BPF_F_INDEX_MASK**.
* Alternatively, *flags* can be set to **BPF_F_CURRENT_CPU**
* to indicate that the index of the current CPU core should be
* used.
*
* The value to write, of *size*, is passed through eBPF stack and
* pointed by *data*.
*
* *ctx* is a pointer to in-kernel struct sk_buff.
*
* This helper is similar to **bpf_perf_event_output**\ () but
* restricted to raw_tracepoint bpf programs.
* Return
* 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
......@@ -2862,7 +2887,8 @@ union bpf_attr {
FN(sk_storage_get), \
FN(sk_storage_delete), \
FN(send_signal), \
FN(tcp_gen_syncookie),
FN(tcp_gen_syncookie), \
FN(skb_output),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
......
......@@ -167,6 +167,8 @@ enum {
IFLA_NEW_IFINDEX,
IFLA_MIN_MTU,
IFLA_MAX_MTU,
IFLA_PROP_LIST,
IFLA_ALT_IFNAME, /* Alternative ifname */
__IFLA_MAX
};
......
......@@ -300,3 +300,6 @@ tags:
# Declare the contents of the .PHONY variable as phony. We keep that
# information in a variable so we can use it in if_changed and friends.
.PHONY: $(PHONY)
# Delete partially updated (corrupted) files on error
.DELETE_ON_ERROR:
......@@ -228,6 +228,9 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
memset(&attr, 0, sizeof(attr));
attr.prog_type = load_attr->prog_type;
attr.expected_attach_type = load_attr->expected_attach_type;
if (attr.prog_type == BPF_PROG_TYPE_RAW_TRACEPOINT)
/* expected_attach_type is ignored for tracing progs */
attr.attach_btf_id = attr.expected_attach_type;
attr.insn_cnt = (__u32)load_attr->insns_cnt;
attr.insns = ptr_to_u64(load_attr->insns);
attr.license = ptr_to_u64(load_attr->license);
......
......@@ -2,6 +2,28 @@
#ifndef __BPF_CORE_READ_H__
#define __BPF_CORE_READ_H__
/*
* enum bpf_field_info_kind is passed as a second argument into
* __builtin_preserve_field_info() built-in to get a specific aspect of
* a field, captured as a first argument. __builtin_preserve_field_info(field,
* info_kind) returns __u32 integer and produces BTF field relocation, which
* is understood and processed by libbpf during BPF object loading. See
* selftests/bpf for examples.
*/
enum bpf_field_info_kind {
BPF_FIELD_BYTE_OFFSET = 0, /* field byte offset */
BPF_FIELD_EXISTS = 2, /* field existence in target kernel */
};
/*
* Convenience macro to check that field actually exists in target kernel's.
* Returns:
* 1, if matching field is present in target kernel;
* 0, if no matching field found.
*/
#define bpf_core_field_exists(field) \
__builtin_preserve_field_info(field, BPF_FIELD_EXISTS)
/*
* bpf_core_read() abstracts away bpf_probe_read() call and captures offset
* relocation for source address using __builtin_preserve_access_index()
......@@ -12,7 +34,7 @@
* a relocation, which records BTF type ID describing root struct/union and an
* accessor string which describes exact embedded field that was used to take
* an address. See detailed description of this relocation format and
* semantics in comments to struct bpf_offset_reloc in libbpf_internal.h.
* semantics in comments to struct bpf_field_reloc in libbpf_internal.h.
*
* This relocation allows libbpf to adjust BPF instruction to use correct
* actual field offset, based on target kernel BTF type that matches original
......
......@@ -390,14 +390,14 @@ struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext)
GElf_Ehdr ehdr;
if (elf_version(EV_CURRENT) == EV_NONE) {
pr_warning("failed to init libelf for %s\n", path);
pr_warn("failed to init libelf for %s\n", path);
return ERR_PTR(-LIBBPF_ERRNO__LIBELF);
}
fd = open(path, O_RDONLY);
if (fd < 0) {
err = -errno;
pr_warning("failed to open %s: %s\n", path, strerror(errno));
pr_warn("failed to open %s: %s\n", path, strerror(errno));
return ERR_PTR(err);
}
......@@ -405,19 +405,19 @@ struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext)
elf = elf_begin(fd, ELF_C_READ, NULL);
if (!elf) {
pr_warning("failed to open %s as ELF file\n", path);
pr_warn("failed to open %s as ELF file\n", path);
goto done;
}
if (!gelf_getehdr(elf, &ehdr)) {
pr_warning("failed to get EHDR from %s\n", path);
pr_warn("failed to get EHDR from %s\n", path);
goto done;
}
if (!btf_check_endianness(&ehdr)) {
pr_warning("non-native ELF endianness is not supported\n");
pr_warn("non-native ELF endianness is not supported\n");
goto done;
}
if (!elf_rawdata(elf_getscn(elf, ehdr.e_shstrndx), NULL)) {
pr_warning("failed to get e_shstrndx from %s\n", path);
pr_warn("failed to get e_shstrndx from %s\n", path);
goto done;
}
......@@ -427,29 +427,29 @@ struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext)
idx++;
if (gelf_getshdr(scn, &sh) != &sh) {
pr_warning("failed to get section(%d) header from %s\n",
idx, path);
pr_warn("failed to get section(%d) header from %s\n",
idx, path);
goto done;
}
name = elf_strptr(elf, ehdr.e_shstrndx, sh.sh_name);
if (!name) {
pr_warning("failed to get section(%d) name from %s\n",
idx, path);
pr_warn("failed to get section(%d) name from %s\n",
idx, path);
goto done;
}
if (strcmp(name, BTF_ELF_SEC) == 0) {
btf_data = elf_getdata(scn, 0);
if (!btf_data) {
pr_warning("failed to get section(%d, %s) data from %s\n",
idx, name, path);
pr_warn("failed to get section(%d, %s) data from %s\n",
idx, name, path);
goto done;
}
continue;
} else if (btf_ext && strcmp(name, BTF_EXT_ELF_SEC) == 0) {
btf_ext_data = elf_getdata(scn, 0);
if (!btf_ext_data) {
pr_warning("failed to get section(%d, %s) data from %s\n",
idx, name, path);
pr_warn("failed to get section(%d, %s) data from %s\n",
idx, name, path);
goto done;
}
continue;
......@@ -600,9 +600,9 @@ int btf__load(struct btf *btf)
log_buf, log_buf_size, false);
if (btf->fd < 0) {
err = -errno;
pr_warning("Error loading BTF: %s(%d)\n", strerror(errno), errno);
pr_warn("Error loading BTF: %s(%d)\n", strerror(errno), errno);
if (*log_buf)
pr_warning("%s\n", log_buf);
pr_warn("%s\n", log_buf);
goto done;
}
......@@ -707,8 +707,8 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
if (snprintf(container_name, max_name, "____btf_map_%s", map_name) ==
max_name) {
pr_warning("map:%s length of '____btf_map_%s' is too long\n",
map_name, map_name);
pr_warn("map:%s length of '____btf_map_%s' is too long\n",
map_name, map_name);
return -EINVAL;
}
......@@ -721,14 +721,14 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
container_type = btf__type_by_id(btf, container_id);
if (!container_type) {
pr_warning("map:%s cannot find BTF type for container_id:%u\n",
map_name, container_id);
pr_warn("map:%s cannot find BTF type for container_id:%u\n",
map_name, container_id);
return -EINVAL;
}
if (!btf_is_struct(container_type) || btf_vlen(container_type) < 2) {
pr_warning("map:%s container_name:%s is an invalid container struct\n",
map_name, container_name);
pr_warn("map:%s container_name:%s is an invalid container struct\n",
map_name, container_name);
return -EINVAL;
}
......@@ -737,25 +737,25 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
key_size = btf__resolve_size(btf, key->type);
if (key_size < 0) {
pr_warning("map:%s invalid BTF key_type_size\n", map_name);
pr_warn("map:%s invalid BTF key_type_size\n", map_name);
return key_size;
}
if (expected_key_size != key_size) {
pr_warning("map:%s btf_key_type_size:%u != map_def_key_size:%u\n",
map_name, (__u32)key_size, expected_key_size);
pr_warn("map:%s btf_key_type_size:%u != map_def_key_size:%u\n",
map_name, (__u32)key_size, expected_key_size);
return -EINVAL;
}
value_size = btf__resolve_size(btf, value->type);
if (value_size < 0) {
pr_warning("map:%s invalid BTF value_type_size\n", map_name);
pr_warn("map:%s invalid BTF value_type_size\n", map_name);
return value_size;
}
if (expected_value_size != value_size) {
pr_warning("map:%s btf_value_type_size:%u != map_def_value_size:%u\n",
map_name, (__u32)value_size, expected_value_size);
pr_warn("map:%s btf_value_type_size:%u != map_def_value_size:%u\n",
map_name, (__u32)value_size, expected_value_size);
return -EINVAL;
}
......@@ -888,14 +888,14 @@ static int btf_ext_setup_line_info(struct btf_ext *btf_ext)
return btf_ext_setup_info(btf_ext, &param);
}
static int btf_ext_setup_offset_reloc(struct btf_ext *btf_ext)
static int btf_ext_setup_field_reloc(struct btf_ext *btf_ext)
{
struct btf_ext_sec_setup_param param = {
.off = btf_ext->hdr->offset_reloc_off,
.len = btf_ext->hdr->offset_reloc_len,
.min_rec_size = sizeof(struct bpf_offset_reloc),
.ext_info = &btf_ext->offset_reloc_info,
.desc = "offset_reloc",
.off = btf_ext->hdr->field_reloc_off,
.len = btf_ext->hdr->field_reloc_len,
.min_rec_size = sizeof(struct bpf_field_reloc),
.ext_info = &btf_ext->field_reloc_info,
.desc = "field_reloc",
};
return btf_ext_setup_info(btf_ext, &param);
......@@ -975,9 +975,9 @@ struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
goto done;
if (btf_ext->hdr->hdr_len <
offsetofend(struct btf_ext_header, offset_reloc_len))
offsetofend(struct btf_ext_header, field_reloc_len))
goto done;
err = btf_ext_setup_offset_reloc(btf_ext);
err = btf_ext_setup_field_reloc(btf_ext);
if (err)
goto done;
......
......@@ -60,8 +60,8 @@ struct btf_ext_header {
__u32 line_info_len;
/* optional part of .BTF.ext header */
__u32 offset_reloc_off;
__u32 offset_reloc_len;
__u32 field_reloc_off;
__u32 field_reloc_len;
};
LIBBPF_API void btf__free(struct btf *btf);
......
......@@ -428,7 +428,7 @@ static int btf_dump_order_type(struct btf_dump *d, __u32 id, bool through_ptr)
/* type loop, but resolvable through fwd declaration */
if (btf_is_composite(t) && through_ptr && t->name_off != 0)
return 0;
pr_warning("unsatisfiable type cycle, id:[%u]\n", id);
pr_warn("unsatisfiable type cycle, id:[%u]\n", id);
return -ELOOP;
}
......@@ -636,8 +636,8 @@ static void btf_dump_emit_type(struct btf_dump *d, __u32 id, __u32 cont_id)
if (id == cont_id)
return;
if (t->name_off == 0) {
pr_warning("anonymous struct/union loop, id:[%u]\n",
id);
pr_warn("anonymous struct/union loop, id:[%u]\n",
id);
return;
}
btf_dump_emit_struct_fwd(d, id, t);
......@@ -782,7 +782,7 @@ static int btf_align_of(const struct btf *btf, __u32 id)
return align;
}
default:
pr_warning("unsupported BTF_KIND:%u\n", btf_kind(t));
pr_warn("unsupported BTF_KIND:%u\n", btf_kind(t));
return 1;
}
}
......@@ -1067,7 +1067,7 @@ static void btf_dump_emit_type_decl(struct btf_dump *d, __u32 id,
* chain, restore stack, emit warning, and try to
* proceed nevertheless
*/
pr_warning("not enough memory for decl stack:%d", err);
pr_warn("not enough memory for decl stack:%d", err);
d->decl_stack_cnt = stack_start;
return;
}
......@@ -1096,8 +1096,8 @@ static void btf_dump_emit_type_decl(struct btf_dump *d, __u32 id,
case BTF_KIND_TYPEDEF:
goto done;
default:
pr_warning("unexpected type in decl chain, kind:%u, id:[%u]\n",
btf_kind(t), id);
pr_warn("unexpected type in decl chain, kind:%u, id:[%u]\n",
btf_kind(t), id);
goto done;
}
}
......@@ -1323,8 +1323,8 @@ static void btf_dump_emit_type_chain(struct btf_dump *d,
return;
}
default:
pr_warning("unexpected type in decl chain, kind:%u, id:[%u]\n",
kind, id);
pr_warn("unexpected type in decl chain, kind:%u, id:[%u]\n",
kind, id);
return;
}
......
此差异已折叠。
......@@ -75,14 +75,19 @@ struct bpf_object_open_attr {
* have all the padding bytes initialized to zero. It's not guaranteed though,
* when copying literal, that compiler won't copy garbage in literal's padding
* bytes, but that's the best way I've found and it seems to work in practice.
*
* Macro declares opts struct of given type and name, zero-initializes,
* including any extra padding, it with memset() and then assigns initial
* values provided by users in struct initializer-syntax as varargs.
*/
#define LIBBPF_OPTS(TYPE, NAME, ...) \
struct TYPE NAME; \
memset(&NAME, 0, sizeof(struct TYPE)); \
NAME = (struct TYPE) { \
.sz = sizeof(struct TYPE), \
__VA_ARGS__ \
}
#define DECLARE_LIBBPF_OPTS(TYPE, NAME, ...) \
struct TYPE NAME = ({ \
memset(&NAME, 0, sizeof(struct TYPE)); \
(struct TYPE) { \
.sz = sizeof(struct TYPE), \
__VA_ARGS__ \
}; \
})
struct bpf_object_open_opts {
/* size of this struct, for forward/backward compatiblity */
......@@ -96,8 +101,10 @@ struct bpf_object_open_opts {
const char *object_name;
/* parse map definitions non-strictly, allowing extra attributes/data */
bool relaxed_maps;
/* process CO-RE relocations non-strictly, allowing them to fail */
bool relaxed_core_relocs;
};
#define bpf_object_open_opts__last_field relaxed_maps
#define bpf_object_open_opts__last_field relaxed_core_relocs
LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
LIBBPF_API struct bpf_object *
......@@ -300,8 +307,13 @@ LIBBPF_API int bpf_program__set_sched_cls(struct bpf_program *prog);
LIBBPF_API int bpf_program__set_sched_act(struct bpf_program *prog);
LIBBPF_API int bpf_program__set_xdp(struct bpf_program *prog);
LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
LIBBPF_API enum bpf_prog_type bpf_program__get_type(struct bpf_program *prog);
LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
enum bpf_prog_type type);
LIBBPF_API enum bpf_attach_type
bpf_program__get_expected_attach_type(struct bpf_program *prog);
LIBBPF_API void
bpf_program__set_expected_attach_type(struct bpf_program *prog,
enum bpf_attach_type type);
......
......@@ -195,4 +195,6 @@ LIBBPF_0.0.6 {
global:
bpf_object__open_file;
bpf_object__open_mem;
bpf_program__get_expected_attach_type;
bpf_program__get_type;
} LIBBPF_0.0.5;
......@@ -59,7 +59,7 @@ do { \
libbpf_print(level, "libbpf: " fmt, ##__VA_ARGS__); \
} while (0)
#define pr_warning(fmt, ...) __pr(LIBBPF_WARN, fmt, ##__VA_ARGS__)
#define pr_warn(fmt, ...) __pr(LIBBPF_WARN, fmt, ##__VA_ARGS__)
#define pr_info(fmt, ...) __pr(LIBBPF_INFO, fmt, ##__VA_ARGS__)
#define pr_debug(fmt, ...) __pr(LIBBPF_DEBUG, fmt, ##__VA_ARGS__)
......@@ -68,7 +68,7 @@ static inline bool libbpf_validate_opts(const char *opts,
const char *type_name)
{
if (user_sz < sizeof(size_t)) {
pr_warning("%s size (%zu) is too small\n", type_name, user_sz);
pr_warn("%s size (%zu) is too small\n", type_name, user_sz);
return false;
}
if (user_sz > opts_sz) {
......@@ -76,8 +76,8 @@ static inline bool libbpf_validate_opts(const char *opts,
for (i = opts_sz; i < user_sz; i++) {
if (opts[i]) {
pr_warning("%s has non-zero extra bytes",
type_name);
pr_warn("%s has non-zero extra bytes",
type_name);
return false;
}
}
......@@ -126,7 +126,7 @@ struct btf_ext {
};
struct btf_ext_info func_info;
struct btf_ext_info line_info;
struct btf_ext_info offset_reloc_info;
struct btf_ext_info field_reloc_info;
__u32 data_size;
};
......@@ -151,13 +151,23 @@ struct bpf_line_info_min {
__u32 line_col;
};
/* The minimum bpf_offset_reloc checked by the loader
/* bpf_field_info_kind encodes which aspect of captured field has to be
* adjusted by relocations. Currently supported values are:
* - BPF_FIELD_BYTE_OFFSET: field offset (in bytes);
* - BPF_FIELD_EXISTS: field existence (1, if field exists; 0, otherwise);
*/
enum bpf_field_info_kind {
BPF_FIELD_BYTE_OFFSET = 0, /* field byte offset */
BPF_FIELD_EXISTS = 2, /* field existence in target kernel */
};
/* The minimum bpf_field_reloc checked by the loader
*
* Offset relocation captures the following data:
* Field relocation captures the following data:
* - insn_off - instruction offset (in bytes) within a BPF program that needs
* its insn->imm field to be relocated with actual offset;
* its insn->imm field to be relocated with actual field info;
* - type_id - BTF type ID of the "root" (containing) entity of a relocatable
* offset;
* field;
* - access_str_off - offset into corresponding .BTF string section. String
* itself encodes an accessed field using a sequence of field and array
* indicies, separated by colon (:). It's conceptually very close to LLVM's
......@@ -188,15 +198,16 @@ struct bpf_line_info_min {
* bpf_probe_read(&dst, sizeof(dst),
* __builtin_preserve_access_index(&src->a.b.c));
*
* In this case Clang will emit offset relocation recording necessary data to
* In this case Clang will emit field relocation recording necessary data to
* be able to find offset of embedded `a.b.c` field within `src` struct.
*
* [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
*/
struct bpf_offset_reloc {
struct bpf_field_reloc {
__u32 insn_off;
__u32 type_id;
__u32 access_str_off;
enum bpf_field_info_kind kind;
};
#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
......@@ -274,33 +274,55 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
/* This is the C-program:
* SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
* {
* int index = ctx->rx_queue_index;
* int ret, index = ctx->rx_queue_index;
*
* // A set entry here means that the correspnding queue_id
* // has an active AF_XDP socket bound to it.
* ret = bpf_redirect_map(&xsks_map, index, XDP_PASS);
* if (ret > 0)
* return ret;
*
* // Fallback for pre-5.3 kernels, not supporting default
* // action in the flags parameter.
* if (bpf_map_lookup_elem(&xsks_map, &index))
* return bpf_redirect_map(&xsks_map, index, 0);
*
* return XDP_PASS;
* }
*/
struct bpf_insn prog[] = {
/* r1 = *(u32 *)(r1 + 16) */
BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 16),
/* *(u32 *)(r10 - 4) = r1 */
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
/* r2 = *(u32 *)(r1 + 16) */
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 16),
/* *(u32 *)(r10 - 4) = r2 */
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -4),
/* r1 = xskmap[] */
BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
/* r3 = XDP_PASS */
BPF_MOV64_IMM(BPF_REG_3, 2),
/* call bpf_redirect_map */
BPF_EMIT_CALL(BPF_FUNC_redirect_map),
/* if w0 != 0 goto pc+13 */
BPF_JMP32_IMM(BPF_JSGT, BPF_REG_0, 0, 13),
/* r2 = r10 */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
/* r2 += -4 */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
/* r1 = xskmap[] */
BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
/* call bpf_map_lookup_elem */
BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
/* r1 = r0 */
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_MOV32_IMM(BPF_REG_0, 2),
/* if r1 == 0 goto +5 */
/* r0 = XDP_PASS */
BPF_MOV64_IMM(BPF_REG_0, 2),
/* if r1 == 0 goto pc+5 */
BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
/* r2 = *(u32 *)(r10 - 4) */
BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
BPF_MOV32_IMM(BPF_REG_3, 0),
/* r1 = xskmap[] */
BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
/* r3 = 0 */
BPF_MOV64_IMM(BPF_REG_3, 0),
/* call bpf_redirect_map */
BPF_EMIT_CALL(BPF_FUNC_redirect_map),
/* The jumps are to this instruction */
BPF_EXIT_INSN(),
......@@ -311,7 +333,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
"LGPL-2.1 or BSD-2-Clause", 0, log_buf,
log_buf_size);
if (prog_fd < 0) {
pr_warning("BPF log buffer:\n%s", log_buf);
pr_warn("BPF log buffer:\n%s", log_buf);
return prog_fd;
}
......@@ -499,7 +521,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
return -EFAULT;
if (umem->refcount) {
pr_warning("Error: shared umems not supported by libbpf.\n");
pr_warn("Error: shared umems not supported by libbpf.\n");
return -EBUSY;
}
......
......@@ -7,11 +7,10 @@ FEATURE-DUMP.libbpf
fixdep
test_align
test_dev_cgroup
test_progs
/test_progs*
test_tcpbpf_user
test_verifier_log
feature
test_libbpf_open
test_sock
test_sock_addr
test_sock_fields
......@@ -33,9 +32,10 @@ test_tcpnotify_user
test_libbpf
test_tcp_check_syncookie_user
test_sysctl
alu32
libbpf.pc
libbpf.so.*
test_hashmap
test_btf_dump
xdping
/no_alu32
/bpf_gcc
......@@ -50,7 +50,7 @@ void test_attach_probe(void)
const int kprobe_idx = 0, kretprobe_idx = 1;
const int uprobe_idx = 2, uretprobe_idx = 3;
const char *obj_name = "attach_probe";
LIBBPF_OPTS(bpf_object_open_opts, open_opts,
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, open_opts,
.object_name = obj_name,
.relaxed_maps = true,
);
......@@ -99,11 +99,6 @@ void test_attach_probe(void)
"prog '%s' not found\n", uretprobe_name))
goto cleanup;
bpf_program__set_kprobe(kprobe_prog);
bpf_program__set_kprobe(kretprobe_prog);
bpf_program__set_kprobe(uprobe_prog);
bpf_program__set_kprobe(uretprobe_prog);
/* create maps && load programs */
err = bpf_object__load(obj);
if (CHECK(err, "obj_load", "err %d\n", err))
......
......@@ -174,6 +174,21 @@
.fails = true, \
}
#define EXISTENCE_DATA(struct_name) STRUCT_TO_CHAR_PTR(struct_name) { \
.a = 42, \
}
#define EXISTENCE_CASE_COMMON(name) \
.case_name = #name, \
.bpf_obj_file = "test_core_reloc_existence.o", \
.btf_src_file = "btf__core_reloc_" #name ".o", \
.relaxed_core_relocs = true \
#define EXISTENCE_ERR_CASE(name) { \
EXISTENCE_CASE_COMMON(name), \
.fails = true, \
}
struct core_reloc_test_case {
const char *case_name;
const char *bpf_obj_file;
......@@ -183,6 +198,7 @@ struct core_reloc_test_case {
const char *output;
int output_len;
bool fails;
bool relaxed_core_relocs;
};
static struct core_reloc_test_case test_cases[] = {
......@@ -195,8 +211,8 @@ static struct core_reloc_test_case test_cases[] = {
.input_len = 0,
.output = STRUCT_TO_CHAR_PTR(core_reloc_kernel_output) {
.valid = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, },
.comm = "test_progs\0\0\0\0\0",
.comm_len = 11,
.comm = "test_progs",
.comm_len = sizeof("test_progs"),
},
.output_len = sizeof(struct core_reloc_kernel_output),
},
......@@ -283,6 +299,59 @@ static struct core_reloc_test_case test_cases[] = {
},
.output_len = sizeof(struct core_reloc_misc_output),
},
/* validate field existence checks */
{
EXISTENCE_CASE_COMMON(existence),
.input = STRUCT_TO_CHAR_PTR(core_reloc_existence) {
.a = 1,
.b = 2,
.c = 3,
.arr = { 4 },
.s = { .x = 5 },
},
.input_len = sizeof(struct core_reloc_existence),
.output = STRUCT_TO_CHAR_PTR(core_reloc_existence_output) {
.a_exists = 1,
.b_exists = 1,
.c_exists = 1,
.arr_exists = 1,
.s_exists = 1,
.a_value = 1,
.b_value = 2,
.c_value = 3,
.arr_value = 4,
.s_value = 5,
},
.output_len = sizeof(struct core_reloc_existence_output),
},
{
EXISTENCE_CASE_COMMON(existence___minimal),
.input = STRUCT_TO_CHAR_PTR(core_reloc_existence___minimal) {
.a = 42,
},
.input_len = sizeof(struct core_reloc_existence),
.output = STRUCT_TO_CHAR_PTR(core_reloc_existence_output) {
.a_exists = 1,
.b_exists = 0,
.c_exists = 0,
.arr_exists = 0,
.s_exists = 0,
.a_value = 42,
.b_value = 0xff000002u,
.c_value = 0xff000003u,
.arr_value = 0xff000004u,
.s_value = 0xff000005u,
},
.output_len = sizeof(struct core_reloc_existence_output),
},
EXISTENCE_ERR_CASE(existence__err_int_sz),
EXISTENCE_ERR_CASE(existence__err_int_type),
EXISTENCE_ERR_CASE(existence__err_int_kind),
EXISTENCE_ERR_CASE(existence__err_arr_kind),
EXISTENCE_ERR_CASE(existence__err_arr_value_type),
EXISTENCE_ERR_CASE(existence__err_struct_type),
};
struct data {
......@@ -305,11 +374,14 @@ void test_core_reloc(void)
for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
test_case = &test_cases[i];
if (!test__start_subtest(test_case->case_name))
continue;
obj = bpf_object__open(test_case->bpf_obj_file);
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
.relaxed_core_relocs = test_case->relaxed_core_relocs,
);
obj = bpf_object__open_file(test_case->bpf_obj_file, &opts);
if (CHECK(IS_ERR_OR_NULL(obj), "obj_open",
"failed to open '%s': %ld\n",
test_case->bpf_obj_file, PTR_ERR(obj)))
......@@ -319,7 +391,6 @@ void test_core_reloc(void)
if (CHECK(!prog, "find_probe",
"prog '%s' not found\n", probe_name))
goto cleanup;
bpf_program__set_type(prog, BPF_PROG_TYPE_RAW_TRACEPOINT);
load_attr.obj = obj;
load_attr.log_level = 0;
......
......@@ -91,12 +91,18 @@ static void do_flow_dissector_reattach(void)
void test_flow_dissector_reattach(void)
{
int init_net, err;
int init_net, self_net, err;
self_net = open("/proc/self/ns/net", O_RDONLY);
if (CHECK_FAIL(self_net < 0)) {
perror("open(/proc/self/ns/net");
return;
}
init_net = open("/proc/1/ns/net", O_RDONLY);
if (CHECK_FAIL(init_net < 0)) {
perror("open(/proc/1/ns/net)");
return;
goto out_close;
}
err = setns(init_net, CLONE_NEWNET);
......@@ -108,7 +114,7 @@ void test_flow_dissector_reattach(void)
if (is_attached(init_net)) {
test__skip();
printf("Can't test with flow dissector attached to init_net\n");
return;
goto out_setns;
}
/* First run tests in root network namespace */
......@@ -118,10 +124,17 @@ void test_flow_dissector_reattach(void)
err = unshare(CLONE_NEWNET);
if (CHECK_FAIL(err)) {
perror("unshare(CLONE_NEWNET)");
goto out_close;
goto out_setns;
}
do_flow_dissector_reattach();
out_setns:
/* Move back to netns we started in. */
err = setns(self_net, CLONE_NEWNET);
if (CHECK_FAIL(err))
perror("setns(/proc/self/ns/net)");
out_close:
close(init_net);
close(self_net);
}
此差异已折叠。
#include "core_reloc_types.h"
void f(struct core_reloc_existence x) {}
#include "core_reloc_types.h"
void f(struct core_reloc_existence___err_wrong_arr_kind x) {}
#include "core_reloc_types.h"
void f(struct core_reloc_existence___err_wrong_arr_value_type x) {}
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册