Unverified commit d3265aba, authored by openeuler-ci-bot, committed by Gitee

!70 BPF: Add CO-RE support for openEuler-22.09 v3

Merge Pull Request from: @zhengzengkai 
 
This set introduces CO-RE support in the kernel.
There are several reasons to add such support:
1. It's a step toward signed BPF programs.
2. It allows Go-like languages that struggle to adopt libbpf
   to take advantage of CO-RE powers.
3. Currently the field accessed by an 'ldx [R1 + 10]' insn is recognized
   by the verifier purely based on the +10 offset. If R1 points to a union,
   the verifier picks one of the fields at this offset.
   With CO-RE the kernel can disambiguate the field access (a small
   illustration follows below).
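
As a rough illustration of what a CO-RE-relocated field access looks like from
the program side (names and includes assume a libbpf-based program built with a
BTF-enabled clang; this sketch is not taken from the series itself):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    SEC("tracepoint/sched/sched_switch")
    int handle_switch(void *ctx)
    {
            struct task_struct *task;
            pid_t pid;

            task = (struct task_struct *)bpf_get_current_task();
            /* BPF_CORE_READ() emits a relocatable ldx: instead of baking in
             * a fixed offset, the compiler records the field's BTF access
             * string, and the loader (or, with this series, the kernel)
             * patches the final offset for the running kernel.
             */
            pid = BPF_CORE_READ(task, pid);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";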
 
Link: https://gitee.com/openeuler/kernel/pulls/70 
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> 
Signed-off-by: Xu Kuohai <xukuohai@huawei.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> 
@@ -15,3 +15,11 @@ Description:
information with description of all internal kernel types. See
Documentation/bpf/btf.rst for detailed description of format
itself.
What: /sys/kernel/btf/<module-name>
Date: Nov 2020
KernelVersion: 5.11
Contact: bpf@vger.kernel.org
Description:
Read-only binary attribute exposing kernel module's BTF type
information as an add-on to the kernel's BTF (/sys/kernel/btf/vmlinux).
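
For context, a userspace sketch of consuming this attribute through libbpf
(function names from recent libbpf; "nf_conntrack" is just an example module):

    #include <stdio.h>
    #include <bpf/btf.h>

    int main(void)
    {
            struct btf *vmlinux_btf, *mod_btf;

            /* parses /sys/kernel/btf/vmlinux */
            vmlinux_btf = btf__load_vmlinux_btf();
            if (!vmlinux_btf)
                    return 1;

            /* parses /sys/kernel/btf/nf_conntrack, split on top of vmlinux BTF */
            mod_btf = btf__load_module_btf("nf_conntrack", vmlinux_btf);
            if (!mod_btf)
                    return 1;

            printf("module BTF type count: %u\n", btf__type_cnt(mod_btf));
            return 0;
    }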
@@ -84,6 +84,8 @@ sequentially and type id is assigned to each recognized type starting from id
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
#define BTF_KIND_VAR 14 /* Variable */
#define BTF_KIND_DATASEC 15 /* Section */
#define BTF_KIND_DECL_TAG 17 /* Decl Tag */
#define BTF_KIND_TYPE_TAG 18 /* Type Tag */
Note that the type section encodes debug info, not just pure types.
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
@@ -105,7 +107,7 @@ Each type contains the following common data::
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
* FUNC, FUNC_PROTO, DECL_TAG and TYPE_TAG.
* "type" is a type_id referring to another type.
*/
union {
@@ -452,6 +454,42 @@ map definition.
* ``offset``: the in-section offset of the variable
* ``size``: the size of the variable in bytes
2.2.17 BTF_KIND_DECL_TAG
~~~~~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: offset to a non-empty string
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_DECL_TAG
* ``info.vlen``: 0
* ``type``: ``struct``, ``union``, ``func``, ``var`` or ``typedef``
``btf_type`` is followed by ``struct btf_decl_tag``.::
struct btf_decl_tag {
__u32 component_idx;
};
The ``name_off`` encodes btf_decl_tag attribute string.
The ``type`` should be ``struct``, ``union``, ``func``, ``var`` or ``typedef``.
For ``var`` or ``typedef`` type, ``btf_decl_tag.component_idx`` must be ``-1``.
For the other three types, if the btf_decl_tag attribute is
applied to the ``struct``, ``union`` or ``func`` itself,
``btf_decl_tag.component_idx`` must be ``-1``. Otherwise,
the attribute is applied to a ``struct``/``union`` member or
a ``func`` argument, and ``btf_decl_tag.component_idx`` should be a
valid index (starting from 0) pointing to a member or an argument.
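
For illustration, a small C sketch (hypothetical names; assumes a clang that
implements the ``btf_decl_tag`` attribute, e.g. clang 14 or newer) showing
which declarations a tag can hang off and what ``component_idx`` would be::

    #define __tag(x) __attribute__((btf_decl_tag(x)))

    struct pkt_md {
            unsigned long long rx_ts;
            void *data __tag("user_ptr");  /* member 1:   component_idx = 1 */
    } __tag("perf_md");                    /* the struct: component_idx = -1 */

    int do_parse(struct pkt_md *md __tag("ctx"), int len) __tag("exported");
    /* tag on the function itself: component_idx = -1
     * tag on argument 'md':       component_idx = 0
     */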
2.2.18 BTF_KIND_TYPE_TAG
~~~~~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: offset to a non-empty string
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_TYPE_TAG
* ``info.vlen``: 0
* ``type``: the type with ``btf_type_tag`` attribute
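
As a brief sketch (assuming a clang with ``btf_type_tag`` support), this is
how a type tag ends up attached to a pointee type, mimicking the kernel's
``__user`` annotation::

    #define __tag_user __attribute__((btf_type_tag("user")))

    /* The BTF chain for 'uptr' becomes:
     *   PTR -> TYPE_TAG("user") -> INT
     * i.e. the TYPE_TAG's "type" member points at the tagged type.
     */
    int __tag_user *uptr;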
3. BTF Kernel API
*****************
......
@@ -1012,7 +1012,7 @@ Mode modifier is one of::
BPF_MEM 0x60
BPF_LEN 0x80 /* classic BPF only, reserved in eBPF */
BPF_MSH 0xa0 /* classic BPF only, reserved in eBPF */
BPF_ATOMIC 0xc0 /* eBPF only, atomic operations */
eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.
@@ -1044,11 +1044,50 @@ Unlike classic BPF instruction set, eBPF has generic load/store operations::
BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg
BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32
BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off)
Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.
It also includes atomic operations, which use the immediate field for extra
encoding.
.imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
.imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
The basic atomic operations supported are:
BPF_ADD
BPF_AND
BPF_OR
BPF_XOR
Each has equivalent semantics to the ``BPF_ADD`` example; that is, the
memory location addressed by ``dst_reg + off`` is atomically modified, with
``src_reg`` as the other operand. If the ``BPF_FETCH`` flag is set in the
immediate, then these operations also overwrite ``src_reg`` with the
value that was in memory before it was modified.
The more special operations are:
BPF_XCHG
This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg +
off``.
BPF_CMPXCHG
This atomically compares the value addressed by ``dst_reg + off`` with
``R0``. If they match, it is replaced with ``src_reg``. The value that was there
before is loaded back to ``R0``.
Note that 1 and 2 byte atomic operations are not supported.
Except ``BPF_ADD`` _without_ ``BPF_FETCH`` (for legacy reasons), all 4 byte
atomic operations require alu32 mode. Clang enables this mode by default in
architecture v3 (``-mcpu=v3``). For older versions it can be enabled with
``-Xclang -target-feature -Xclang +alu32``.
You may encounter BPF_XADD - this is a legacy name for BPF_ATOMIC, referring to
the exclusive-add operation encoded when the immediate field is zero.
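
As a rough sketch of how these encodings are reached from C (exact instruction
selection depends on the clang version; assumes ``clang -target bpf -mcpu=v3``
and a hypothetical global counter)::

    static long counter;

    int touch(void)
    {
            long old, prev, seen;

            /* imm = BPF_ADD (plain "lock xadd"): result unused */
            __sync_fetch_and_add(&counter, 1);

            /* imm = BPF_ADD | BPF_FETCH: old value comes back in src_reg */
            old = __sync_fetch_and_add(&counter, 5);

            /* imm = BPF_XCHG: unconditionally swap in the new value */
            prev = __sync_lock_test_and_set(&counter, 7);

            /* imm = BPF_CMPXCHG: compares against R0, old value lands in R0 */
            seen = __sync_val_compare_and_swap(&counter, 0, 42);

            return old + prev + seen;
    }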
eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists
of two consecutive ``struct bpf_insn`` 8-byte blocks and interpreted as single
......
...@@ -1642,10 +1642,9 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) ...@@ -1642,10 +1642,9 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
} }
emit_str_r(dst_lo, tmp2, off, ctx, BPF_SIZE(code)); emit_str_r(dst_lo, tmp2, off, ctx, BPF_SIZE(code));
break; break;
/* STX XADD: lock *(u32 *)(dst + off) += src */ /* Atomic ops */
case BPF_STX | BPF_XADD | BPF_W: case BPF_STX | BPF_ATOMIC | BPF_W:
/* STX XADD: lock *(u64 *)(dst + off) += src */ case BPF_STX | BPF_ATOMIC | BPF_DW:
case BPF_STX | BPF_XADD | BPF_DW:
goto notyet; goto notyet;
/* STX: *(size *)(dst + off) = src */ /* STX: *(size *)(dst + off) = src */
case BPF_STX | BPF_MEM | BPF_W: case BPF_STX | BPF_MEM | BPF_W:
......
...@@ -888,10 +888,18 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, ...@@ -888,10 +888,18 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
} }
break; break;
/* STX XADD: lock *(u32 *)(dst + off) += src */ case BPF_STX | BPF_ATOMIC | BPF_W:
case BPF_STX | BPF_XADD | BPF_W: case BPF_STX | BPF_ATOMIC | BPF_DW:
/* STX XADD: lock *(u64 *)(dst + off) += src */ if (insn->imm != BPF_ADD) {
case BPF_STX | BPF_XADD | BPF_DW: pr_err_once("unknown atomic op code %02x\n", insn->imm);
return -EINVAL;
}
/* STX XADD: lock *(u32 *)(dst + off) += src
* and
* STX XADD: lock *(u64 *)(dst + off) += src
*/
if (!off) { if (!off) {
reg = dst; reg = dst;
} else { } else {
......
...@@ -1426,8 +1426,8 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, ...@@ -1426,8 +1426,8 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
case BPF_STX | BPF_H | BPF_MEM: case BPF_STX | BPF_H | BPF_MEM:
case BPF_STX | BPF_W | BPF_MEM: case BPF_STX | BPF_W | BPF_MEM:
case BPF_STX | BPF_DW | BPF_MEM: case BPF_STX | BPF_DW | BPF_MEM:
case BPF_STX | BPF_W | BPF_XADD: case BPF_STX | BPF_W | BPF_ATOMIC:
case BPF_STX | BPF_DW | BPF_XADD: case BPF_STX | BPF_DW | BPF_ATOMIC:
if (insn->dst_reg == BPF_REG_10) { if (insn->dst_reg == BPF_REG_10) {
ctx->flags |= EBPF_SEEN_FP; ctx->flags |= EBPF_SEEN_FP;
dst = MIPS_R_SP; dst = MIPS_R_SP;
...@@ -1441,7 +1441,12 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, ...@@ -1441,7 +1441,12 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp); src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
if (src < 0) if (src < 0)
return src; return src;
if (BPF_MODE(insn->code) == BPF_XADD) { if (BPF_MODE(insn->code) == BPF_ATOMIC) {
if (insn->imm != BPF_ADD) {
pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
return -EINVAL;
}
/* /*
* If mem_off does not fit within the 9 bit ll/sc * If mem_off does not fit within the 9 bit ll/sc
* instruction immediate field, use a temp reg. * instruction immediate field, use a temp reg.
......
...@@ -756,10 +756,18 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, ...@@ -756,10 +756,18 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
break; break;
/* /*
* BPF_STX XADD (atomic_add) * BPF_STX ATOMIC (atomic ops)
*/ */
case BPF_STX | BPF_ATOMIC | BPF_W:
if (insn->imm != BPF_ADD) {
pr_err_ratelimited(
"eBPF filter atomic op code %02x (@%d) unsupported\n",
code, i);
return -ENOTSUPP;
}
/* *(u32 *)(dst + off) += src */ /* *(u32 *)(dst + off) += src */
case BPF_STX | BPF_XADD | BPF_W:
/* Get EA into TMP_REG_1 */ /* Get EA into TMP_REG_1 */
EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], dst_reg, off)); EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], dst_reg, off));
tmp_idx = ctx->idx * 4; tmp_idx = ctx->idx * 4;
...@@ -772,8 +780,15 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, ...@@ -772,8 +780,15 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
/* we're done if this succeeded */ /* we're done if this succeeded */
PPC_BCC_SHORT(COND_NE, tmp_idx); PPC_BCC_SHORT(COND_NE, tmp_idx);
break; break;
case BPF_STX | BPF_ATOMIC | BPF_DW:
if (insn->imm != BPF_ADD) {
pr_err_ratelimited(
"eBPF filter atomic op code %02x (@%d) unsupported\n",
code, i);
return -ENOTSUPP;
}
/* *(u64 *)(dst + off) += src */ /* *(u64 *)(dst + off) += src */
case BPF_STX | BPF_XADD | BPF_DW:
EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], dst_reg, off)); EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], dst_reg, off));
tmp_idx = ctx->idx * 4; tmp_idx = ctx->idx * 4;
EMIT(PPC_RAW_LDARX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1], 0)); EMIT(PPC_RAW_LDARX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1], 0));
......
...@@ -881,7 +881,7 @@ static int emit_store_r64(const s8 *dst, const s8 *src, s16 off, ...@@ -881,7 +881,7 @@ static int emit_store_r64(const s8 *dst, const s8 *src, s16 off,
const s8 *rd = bpf_get_reg64(dst, tmp1, ctx); const s8 *rd = bpf_get_reg64(dst, tmp1, ctx);
const s8 *rs = bpf_get_reg64(src, tmp2, ctx); const s8 *rs = bpf_get_reg64(src, tmp2, ctx);
if (mode == BPF_XADD && size != BPF_W) if (mode == BPF_ATOMIC && size != BPF_W)
return -1; return -1;
emit_imm(RV_REG_T0, off, ctx); emit_imm(RV_REG_T0, off, ctx);
...@@ -899,7 +899,7 @@ static int emit_store_r64(const s8 *dst, const s8 *src, s16 off, ...@@ -899,7 +899,7 @@ static int emit_store_r64(const s8 *dst, const s8 *src, s16 off,
case BPF_MEM: case BPF_MEM:
emit(rv_sw(RV_REG_T0, 0, lo(rs)), ctx); emit(rv_sw(RV_REG_T0, 0, lo(rs)), ctx);
break; break;
case BPF_XADD: case BPF_ATOMIC: /* Only BPF_ADD supported */
emit(rv_amoadd_w(RV_REG_ZERO, lo(rs), RV_REG_T0, 0, 0), emit(rv_amoadd_w(RV_REG_ZERO, lo(rs), RV_REG_T0, 0, 0),
ctx); ctx);
break; break;
...@@ -1264,7 +1264,6 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, ...@@ -1264,7 +1264,6 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
case BPF_STX | BPF_MEM | BPF_H: case BPF_STX | BPF_MEM | BPF_H:
case BPF_STX | BPF_MEM | BPF_W: case BPF_STX | BPF_MEM | BPF_W:
case BPF_STX | BPF_MEM | BPF_DW: case BPF_STX | BPF_MEM | BPF_DW:
case BPF_STX | BPF_XADD | BPF_W:
if (BPF_CLASS(code) == BPF_ST) { if (BPF_CLASS(code) == BPF_ST) {
emit_imm32(tmp2, imm, ctx); emit_imm32(tmp2, imm, ctx);
src = tmp2; src = tmp2;
...@@ -1275,8 +1274,21 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, ...@@ -1275,8 +1274,21 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
return -1; return -1;
break; break;
case BPF_STX | BPF_ATOMIC | BPF_W:
if (insn->imm != BPF_ADD) {
pr_info_once(
"bpf-jit: not supported: atomic operation %02x ***\n",
insn->imm);
return -EFAULT;
}
if (emit_store_r64(dst, src, off, ctx, BPF_SIZE(code),
BPF_MODE(code)))
return -1;
break;
/* No hardware support for 8-byte atomics in RV32. */ /* No hardware support for 8-byte atomics in RV32. */
case BPF_STX | BPF_XADD | BPF_DW: case BPF_STX | BPF_ATOMIC | BPF_DW:
/* Fallthrough. */ /* Fallthrough. */
notsupported: notsupported:
......
...@@ -1031,10 +1031,18 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, ...@@ -1031,10 +1031,18 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx); emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
emit_sd(RV_REG_T1, 0, rs, ctx); emit_sd(RV_REG_T1, 0, rs, ctx);
break; break;
/* STX XADD: lock *(u32 *)(dst + off) += src */ case BPF_STX | BPF_ATOMIC | BPF_W:
case BPF_STX | BPF_XADD | BPF_W: case BPF_STX | BPF_ATOMIC | BPF_DW:
/* STX XADD: lock *(u64 *)(dst + off) += src */ if (insn->imm != BPF_ADD) {
case BPF_STX | BPF_XADD | BPF_DW: pr_err("bpf-jit: not supported: atomic operation %02x ***\n",
insn->imm);
return -EINVAL;
}
/* atomic_add: lock *(u32 *)(dst + off) += src
* atomic_add: lock *(u64 *)(dst + off) += src
*/
if (off) { if (off) {
if (is_12b_int(off)) { if (is_12b_int(off)) {
emit_addi(RV_REG_T1, rd, off, ctx); emit_addi(RV_REG_T1, rd, off, ctx);
......
...@@ -1216,18 +1216,23 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, ...@@ -1216,18 +1216,23 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
jit->seen |= SEEN_MEM; jit->seen |= SEEN_MEM;
break; break;
/* /*
* BPF_STX XADD (atomic_add) * BPF_ATOMIC
*/ */
case BPF_STX | BPF_XADD | BPF_W: /* *(u32 *)(dst + off) += src */ case BPF_STX | BPF_ATOMIC | BPF_DW:
/* laal %w0,%src,off(%dst) */ case BPF_STX | BPF_ATOMIC | BPF_W:
EMIT6_DISP_LH(0xeb000000, 0x00fa, REG_W0, src_reg, if (insn->imm != BPF_ADD) {
dst_reg, off); pr_err("Unknown atomic operation %02x\n", insn->imm);
jit->seen |= SEEN_MEM; return -1;
break; }
case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */
/* laalg %w0,%src,off(%dst) */ /* *(u32/u64 *)(dst + off) += src
EMIT6_DISP_LH(0xeb000000, 0x00ea, REG_W0, src_reg, *
dst_reg, off); * BPF_W: laal %w0,%src,off(%dst)
* BPF_DW: laalg %w0,%src,off(%dst)
*/
EMIT6_DISP_LH(0xeb000000,
BPF_SIZE(insn->code) == BPF_W ? 0x00fa : 0x00ea,
REG_W0, src_reg, dst_reg, off);
jit->seen |= SEEN_MEM; jit->seen |= SEEN_MEM;
break; break;
/* /*
......
...@@ -1369,12 +1369,18 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) ...@@ -1369,12 +1369,18 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
break; break;
} }
/* STX XADD: lock *(u32 *)(dst + off) += src */ case BPF_STX | BPF_ATOMIC | BPF_W: {
case BPF_STX | BPF_XADD | BPF_W: {
const u8 tmp = bpf2sparc[TMP_REG_1]; const u8 tmp = bpf2sparc[TMP_REG_1];
const u8 tmp2 = bpf2sparc[TMP_REG_2]; const u8 tmp2 = bpf2sparc[TMP_REG_2];
const u8 tmp3 = bpf2sparc[TMP_REG_3]; const u8 tmp3 = bpf2sparc[TMP_REG_3];
if (insn->imm != BPF_ADD) {
pr_err_once("unknown atomic op %02x\n", insn->imm);
return -EINVAL;
}
/* lock *(u32 *)(dst + off) += src */
if (insn->dst_reg == BPF_REG_FP) if (insn->dst_reg == BPF_REG_FP)
ctx->saw_frame_pointer = true; ctx->saw_frame_pointer = true;
...@@ -1393,11 +1399,16 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) ...@@ -1393,11 +1399,16 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
break; break;
} }
/* STX XADD: lock *(u64 *)(dst + off) += src */ /* STX XADD: lock *(u64 *)(dst + off) += src */
case BPF_STX | BPF_XADD | BPF_DW: { case BPF_STX | BPF_ATOMIC | BPF_DW: {
const u8 tmp = bpf2sparc[TMP_REG_1]; const u8 tmp = bpf2sparc[TMP_REG_1];
const u8 tmp2 = bpf2sparc[TMP_REG_2]; const u8 tmp2 = bpf2sparc[TMP_REG_2];
const u8 tmp3 = bpf2sparc[TMP_REG_3]; const u8 tmp3 = bpf2sparc[TMP_REG_3];
if (insn->imm != BPF_ADD) {
pr_err_once("unknown atomic op %02x\n", insn->imm);
return -EINVAL;
}
if (insn->dst_reg == BPF_REG_FP) if (insn->dst_reg == BPF_REG_FP)
ctx->saw_frame_pointer = true; ctx->saw_frame_pointer = true;
......
...@@ -205,6 +205,18 @@ static u8 add_2reg(u8 byte, u32 dst_reg, u32 src_reg) ...@@ -205,6 +205,18 @@ static u8 add_2reg(u8 byte, u32 dst_reg, u32 src_reg)
return byte + reg2hex[dst_reg] + (reg2hex[src_reg] << 3); return byte + reg2hex[dst_reg] + (reg2hex[src_reg] << 3);
} }
/* Some 1-byte opcodes for binary ALU operations */
static u8 simple_alu_opcodes[] = {
[BPF_ADD] = 0x01,
[BPF_SUB] = 0x29,
[BPF_AND] = 0x21,
[BPF_OR] = 0x09,
[BPF_XOR] = 0x31,
[BPF_LSH] = 0xE0,
[BPF_RSH] = 0xE8,
[BPF_ARSH] = 0xF8,
};
static void jit_fill_hole(void *area, unsigned int size) static void jit_fill_hole(void *area, unsigned int size)
{ {
/* Fill whole space with INT3 instructions */ /* Fill whole space with INT3 instructions */
...@@ -684,6 +696,42 @@ static void emit_mov_reg(u8 **pprog, bool is64, u32 dst_reg, u32 src_reg) ...@@ -684,6 +696,42 @@ static void emit_mov_reg(u8 **pprog, bool is64, u32 dst_reg, u32 src_reg)
*pprog = prog; *pprog = prog;
} }
/* Emit the suffix (ModR/M etc) for addressing *(ptr_reg + off) and val_reg */
static void emit_insn_suffix(u8 **pprog, u32 ptr_reg, u32 val_reg, int off)
{
u8 *prog = *pprog;
int cnt = 0;
if (is_imm8(off)) {
/* 1-byte signed displacement.
*
* If off == 0 we could skip this and save one extra byte, but
* special case of x86 R13 which always needs an offset is not
* worth the hassle
*/
EMIT2(add_2reg(0x40, ptr_reg, val_reg), off);
} else {
/* 4-byte signed displacement */
EMIT1_off32(add_2reg(0x80, ptr_reg, val_reg), off);
}
*pprog = prog;
}
/*
* Emit a REX byte if it will be necessary to address these registers
*/
static void maybe_emit_mod(u8 **pprog, u32 dst_reg, u32 src_reg, bool is64)
{
u8 *prog = *pprog;
int cnt = 0;
if (is64)
EMIT1(add_2mod(0x48, dst_reg, src_reg));
else if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT1(add_2mod(0x40, dst_reg, src_reg));
*pprog = prog;
}
/* LDX: dst_reg = *(u8*)(src_reg + off) */ /* LDX: dst_reg = *(u8*)(src_reg + off) */
static void emit_ldx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off) static void emit_ldx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
{ {
...@@ -711,15 +759,7 @@ static void emit_ldx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off) ...@@ -711,15 +759,7 @@ static void emit_ldx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B); EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
break; break;
} }
/* emit_insn_suffix(&prog, src_reg, dst_reg, off);
* If insn->off == 0 we can save one extra byte, but
* special case of x86 R13 which always needs an offset
* is not worth the hassle
*/
if (is_imm8(off))
EMIT2(add_2reg(0x40, src_reg, dst_reg), off);
else
EMIT1_off32(add_2reg(0x80, src_reg, dst_reg), off);
*pprog = prog; *pprog = prog;
} }
...@@ -754,11 +794,51 @@ static void emit_stx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off) ...@@ -754,11 +794,51 @@ static void emit_stx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
EMIT2(add_2mod(0x48, dst_reg, src_reg), 0x89); EMIT2(add_2mod(0x48, dst_reg, src_reg), 0x89);
break; break;
} }
if (is_imm8(off)) emit_insn_suffix(&prog, dst_reg, src_reg, off);
EMIT2(add_2reg(0x40, dst_reg, src_reg), off); *pprog = prog;
else }
EMIT1_off32(add_2reg(0x80, dst_reg, src_reg), off);
static int emit_atomic(u8 **pprog, u8 atomic_op,
u32 dst_reg, u32 src_reg, s16 off, u8 bpf_size)
{
u8 *prog = *pprog;
int cnt = 0;
EMIT1(0xF0); /* lock prefix */
maybe_emit_mod(&prog, dst_reg, src_reg, bpf_size == BPF_DW);
/* emit opcode */
switch (atomic_op) {
case BPF_ADD:
case BPF_SUB:
case BPF_AND:
case BPF_OR:
case BPF_XOR:
/* lock *(u32/u64*)(dst_reg + off) <op>= src_reg */
EMIT1(simple_alu_opcodes[atomic_op]);
break;
case BPF_ADD | BPF_FETCH:
/* src_reg = atomic_fetch_add(dst_reg + off, src_reg); */
EMIT2(0x0F, 0xC1);
break;
case BPF_XCHG:
/* src_reg = atomic_xchg(dst_reg + off, src_reg); */
EMIT1(0x87);
break;
case BPF_CMPXCHG:
/* r0 = atomic_cmpxchg(dst_reg + off, r0, src_reg); */
EMIT2(0x0F, 0xB1);
break;
default:
pr_err("bpf_jit: unknown atomic opcode %02x\n", atomic_op);
return -EFAULT;
}
emit_insn_suffix(&prog, dst_reg, src_reg, off);
*pprog = prog; *pprog = prog;
return 0;
} }
bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs) bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
...@@ -790,8 +870,31 @@ static void detect_reg_usage(struct bpf_insn *insn, int insn_cnt, ...@@ -790,8 +870,31 @@ static void detect_reg_usage(struct bpf_insn *insn, int insn_cnt,
} }
} }
static int emit_nops(u8 **pprog, int len)
{
u8 *prog = *pprog;
int i, noplen, cnt = 0;
while (len > 0) {
noplen = len;
if (noplen > ASM_NOP_MAX)
noplen = ASM_NOP_MAX;
for (i = 0; i < noplen; i++)
EMIT1(ideal_nops[noplen][i]);
len -= noplen;
}
*pprog = prog;
return cnt;
}
#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
int oldproglen, struct jit_context *ctx) int oldproglen, struct jit_context *ctx, bool jmp_padding)
{ {
bool tail_call_reachable = bpf_prog->aux->tail_call_reachable; bool tail_call_reachable = bpf_prog->aux->tail_call_reachable;
struct bpf_insn *insn = bpf_prog->insnsi; struct bpf_insn *insn = bpf_prog->insnsi;
...@@ -801,8 +904,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, ...@@ -801,8 +904,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
bool seen_exit = false; bool seen_exit = false;
u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY]; u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
int i, cnt = 0, excnt = 0; int i, cnt = 0, excnt = 0;
int proglen = 0; int ilen, proglen = 0;
u8 *prog = temp; u8 *prog = temp;
int err;
detect_reg_usage(insn, insn_cnt, callee_regs_used, detect_reg_usage(insn, insn_cnt, callee_regs_used,
&tail_call_seen); &tail_call_seen);
...@@ -814,7 +918,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, ...@@ -814,7 +918,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
bpf_prog_was_classic(bpf_prog), tail_call_reachable, bpf_prog_was_classic(bpf_prog), tail_call_reachable,
bpf_prog->aux->func_idx != 0); bpf_prog->aux->func_idx != 0);
push_callee_regs(&prog, callee_regs_used); push_callee_regs(&prog, callee_regs_used);
addrs[0] = prog - temp;
ilen = prog - temp;
if (image)
memcpy(image + proglen, temp, ilen);
proglen += ilen;
addrs[0] = proglen;
prog = temp;
for (i = 1; i <= insn_cnt; i++, insn++) { for (i = 1; i <= insn_cnt; i++, insn++) {
const s32 imm32 = insn->imm; const s32 imm32 = insn->imm;
...@@ -823,8 +933,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, ...@@ -823,8 +933,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
u8 b2 = 0, b3 = 0; u8 b2 = 0, b3 = 0;
s64 jmp_offset; s64 jmp_offset;
u8 jmp_cond; u8 jmp_cond;
int ilen;
u8 *func; u8 *func;
int nops;
switch (insn->code) { switch (insn->code) {
/* ALU */ /* ALU */
...@@ -838,17 +948,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, ...@@ -838,17 +948,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
case BPF_ALU64 | BPF_AND | BPF_X: case BPF_ALU64 | BPF_AND | BPF_X:
case BPF_ALU64 | BPF_OR | BPF_X: case BPF_ALU64 | BPF_OR | BPF_X:
case BPF_ALU64 | BPF_XOR | BPF_X: case BPF_ALU64 | BPF_XOR | BPF_X:
switch (BPF_OP(insn->code)) { maybe_emit_mod(&prog, dst_reg, src_reg,
case BPF_ADD: b2 = 0x01; break; BPF_CLASS(insn->code) == BPF_ALU64);
case BPF_SUB: b2 = 0x29; break; b2 = simple_alu_opcodes[BPF_OP(insn->code)];
case BPF_AND: b2 = 0x21; break;
case BPF_OR: b2 = 0x09; break;
case BPF_XOR: b2 = 0x31; break;
}
if (BPF_CLASS(insn->code) == BPF_ALU64)
EMIT1(add_2mod(0x48, dst_reg, src_reg));
else if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT1(add_2mod(0x40, dst_reg, src_reg));
EMIT2(b2, add_2reg(0xC0, dst_reg, src_reg)); EMIT2(b2, add_2reg(0xC0, dst_reg, src_reg));
break; break;
...@@ -1028,12 +1130,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, ...@@ -1028,12 +1130,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
else if (is_ereg(dst_reg)) else if (is_ereg(dst_reg))
EMIT1(add_1mod(0x40, dst_reg)); EMIT1(add_1mod(0x40, dst_reg));
switch (BPF_OP(insn->code)) { b3 = simple_alu_opcodes[BPF_OP(insn->code)];
case BPF_LSH: b3 = 0xE0; break;
case BPF_RSH: b3 = 0xE8; break;
case BPF_ARSH: b3 = 0xF8; break;
}
if (imm32 == 1) if (imm32 == 1)
EMIT2(0xD1, add_1reg(b3, dst_reg)); EMIT2(0xD1, add_1reg(b3, dst_reg));
else else
...@@ -1067,11 +1164,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, ...@@ -1067,11 +1164,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
else if (is_ereg(dst_reg)) else if (is_ereg(dst_reg))
EMIT1(add_1mod(0x40, dst_reg)); EMIT1(add_1mod(0x40, dst_reg));
switch (BPF_OP(insn->code)) { b3 = simple_alu_opcodes[BPF_OP(insn->code)];
case BPF_LSH: b3 = 0xE0; break;
case BPF_RSH: b3 = 0xE8; break;
case BPF_ARSH: b3 = 0xF8; break;
}
EMIT2(0xD3, add_1reg(b3, dst_reg)); EMIT2(0xD3, add_1reg(b3, dst_reg));
if (src_reg != BPF_REG_4) if (src_reg != BPF_REG_4)
...@@ -1233,21 +1326,56 @@ st: if (is_imm8(insn->off)) ...@@ -1233,21 +1326,56 @@ st: if (is_imm8(insn->off))
} }
break; break;
/* STX XADD: lock *(u32*)(dst_reg + off) += src_reg */ case BPF_STX | BPF_ATOMIC | BPF_W:
case BPF_STX | BPF_XADD | BPF_W: case BPF_STX | BPF_ATOMIC | BPF_DW:
/* Emit 'lock add dword ptr [rax + off], eax' */ if (insn->imm == (BPF_AND | BPF_FETCH) ||
if (is_ereg(dst_reg) || is_ereg(src_reg)) insn->imm == (BPF_OR | BPF_FETCH) ||
EMIT3(0xF0, add_2mod(0x40, dst_reg, src_reg), 0x01); insn->imm == (BPF_XOR | BPF_FETCH)) {
else u8 *branch_target;
EMIT2(0xF0, 0x01); bool is64 = BPF_SIZE(insn->code) == BPF_DW;
goto xadd;
case BPF_STX | BPF_XADD | BPF_DW: /*
EMIT3(0xF0, add_2mod(0x48, dst_reg, src_reg), 0x01); * Can't be implemented with a single x86 insn.
xadd: if (is_imm8(insn->off)) * Need to do a CMPXCHG loop.
EMIT2(add_2reg(0x40, dst_reg, src_reg), insn->off); */
else
EMIT1_off32(add_2reg(0x80, dst_reg, src_reg), /* Will need RAX as a CMPXCHG operand so save R0 */
insn->off); emit_mov_reg(&prog, true, BPF_REG_AX, BPF_REG_0);
branch_target = prog;
/* Load old value */
emit_ldx(&prog, BPF_SIZE(insn->code),
BPF_REG_0, dst_reg, insn->off);
/*
* Perform the (commutative) operation locally,
* put the result in the AUX_REG.
*/
emit_mov_reg(&prog, is64, AUX_REG, BPF_REG_0);
maybe_emit_mod(&prog, AUX_REG, src_reg, is64);
EMIT2(simple_alu_opcodes[BPF_OP(insn->imm)],
add_2reg(0xC0, AUX_REG, src_reg));
/* Attempt to swap in new value */
err = emit_atomic(&prog, BPF_CMPXCHG,
dst_reg, AUX_REG, insn->off,
BPF_SIZE(insn->code));
if (WARN_ON(err))
return err;
/*
* ZF tells us whether we won the race. If it's
* cleared we need to try again.
*/
EMIT2(X86_JNE, -(prog - branch_target) - 2);
/* Return the pre-modification value */
emit_mov_reg(&prog, is64, src_reg, BPF_REG_0);
/* Restore R0 after clobbering RAX */
emit_mov_reg(&prog, true, BPF_REG_0, BPF_REG_AX);
break;
}
err = emit_atomic(&prog, insn->imm, dst_reg, src_reg,
insn->off, BPF_SIZE(insn->code));
if (err)
return err;
break; break;
/* call */ /* call */
...@@ -1298,20 +1426,16 @@ xadd: if (is_imm8(insn->off)) ...@@ -1298,20 +1426,16 @@ xadd: if (is_imm8(insn->off))
case BPF_JMP32 | BPF_JSGE | BPF_X: case BPF_JMP32 | BPF_JSGE | BPF_X:
case BPF_JMP32 | BPF_JSLE | BPF_X: case BPF_JMP32 | BPF_JSLE | BPF_X:
/* cmp dst_reg, src_reg */ /* cmp dst_reg, src_reg */
if (BPF_CLASS(insn->code) == BPF_JMP) maybe_emit_mod(&prog, dst_reg, src_reg,
EMIT1(add_2mod(0x48, dst_reg, src_reg)); BPF_CLASS(insn->code) == BPF_JMP);
else if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT1(add_2mod(0x40, dst_reg, src_reg));
EMIT2(0x39, add_2reg(0xC0, dst_reg, src_reg)); EMIT2(0x39, add_2reg(0xC0, dst_reg, src_reg));
goto emit_cond_jmp; goto emit_cond_jmp;
case BPF_JMP | BPF_JSET | BPF_X: case BPF_JMP | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_X: case BPF_JMP32 | BPF_JSET | BPF_X:
/* test dst_reg, src_reg */ /* test dst_reg, src_reg */
if (BPF_CLASS(insn->code) == BPF_JMP) maybe_emit_mod(&prog, dst_reg, src_reg,
EMIT1(add_2mod(0x48, dst_reg, src_reg)); BPF_CLASS(insn->code) == BPF_JMP);
else if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT1(add_2mod(0x40, dst_reg, src_reg));
EMIT2(0x85, add_2reg(0xC0, dst_reg, src_reg)); EMIT2(0x85, add_2reg(0xC0, dst_reg, src_reg));
goto emit_cond_jmp; goto emit_cond_jmp;
...@@ -1347,10 +1471,8 @@ xadd: if (is_imm8(insn->off)) ...@@ -1347,10 +1471,8 @@ xadd: if (is_imm8(insn->off))
case BPF_JMP32 | BPF_JSLE | BPF_K: case BPF_JMP32 | BPF_JSLE | BPF_K:
/* test dst_reg, dst_reg to save one extra byte */ /* test dst_reg, dst_reg to save one extra byte */
if (imm32 == 0) { if (imm32 == 0) {
if (BPF_CLASS(insn->code) == BPF_JMP) maybe_emit_mod(&prog, dst_reg, dst_reg,
EMIT1(add_2mod(0x48, dst_reg, dst_reg)); BPF_CLASS(insn->code) == BPF_JMP);
else if (is_ereg(dst_reg))
EMIT1(add_2mod(0x40, dst_reg, dst_reg));
EMIT2(0x85, add_2reg(0xC0, dst_reg, dst_reg)); EMIT2(0x85, add_2reg(0xC0, dst_reg, dst_reg));
goto emit_cond_jmp; goto emit_cond_jmp;
} }
...@@ -1412,6 +1534,30 @@ xadd: if (is_imm8(insn->off)) ...@@ -1412,6 +1534,30 @@ xadd: if (is_imm8(insn->off))
} }
jmp_offset = addrs[i + insn->off] - addrs[i]; jmp_offset = addrs[i + insn->off] - addrs[i];
if (is_imm8(jmp_offset)) { if (is_imm8(jmp_offset)) {
if (jmp_padding) {
/* To keep the jmp_offset valid, the extra bytes are
* padded before the jump insn, so we substract the
* 2 bytes of jmp_cond insn from INSN_SZ_DIFF.
*
* If the previous pass already emits an imm8
* jmp_cond, then this BPF insn won't shrink, so
* "nops" is 0.
*
* On the other hand, if the previous pass emits an
* imm32 jmp_cond, the extra 4 bytes(*) is padded to
* keep the image from shrinking further.
*
* (*) imm32 jmp_cond is 6 bytes, and imm8 jmp_cond
* is 2 bytes, so the size difference is 4 bytes.
*/
nops = INSN_SZ_DIFF - 2;
if (nops != 0 && nops != 4) {
pr_err("unexpected jmp_cond padding: %d bytes\n",
nops);
return -EFAULT;
}
cnt += emit_nops(&prog, nops);
}
EMIT2(jmp_cond, jmp_offset); EMIT2(jmp_cond, jmp_offset);
} else if (is_simm32(jmp_offset)) { } else if (is_simm32(jmp_offset)) {
EMIT2_off32(0x0F, jmp_cond + 0x10, jmp_offset); EMIT2_off32(0x0F, jmp_cond + 0x10, jmp_offset);
...@@ -1434,11 +1580,55 @@ xadd: if (is_imm8(insn->off)) ...@@ -1434,11 +1580,55 @@ xadd: if (is_imm8(insn->off))
else else
jmp_offset = addrs[i + insn->off] - addrs[i]; jmp_offset = addrs[i + insn->off] - addrs[i];
if (!jmp_offset) if (!jmp_offset) {
/* Optimize out nop jumps */ /*
* If jmp_padding is enabled, the extra nops will
* be inserted. Otherwise, optimize out nop jumps.
*/
if (jmp_padding) {
/* There are 3 possible conditions.
* (1) This BPF_JA is already optimized out in
* the previous run, so there is no need
* to pad any extra byte (0 byte).
* (2) The previous pass emits an imm8 jmp,
* so we pad 2 bytes to match the previous
* insn size.
* (3) Similarly, the previous pass emits an
* imm32 jmp, and 5 bytes is padded.
*/
nops = INSN_SZ_DIFF;
if (nops != 0 && nops != 2 && nops != 5) {
pr_err("unexpected nop jump padding: %d bytes\n",
nops);
return -EFAULT;
}
cnt += emit_nops(&prog, nops);
}
break; break;
}
emit_jmp: emit_jmp:
if (is_imm8(jmp_offset)) { if (is_imm8(jmp_offset)) {
if (jmp_padding) {
/* To avoid breaking jmp_offset, the extra bytes
* are padded before the actual jmp insn, so
* 2 bytes are subtracted from INSN_SZ_DIFF.
*
* If the previous pass already emits an imm8
* jmp, there is nothing to pad (0 byte).
*
* If it emits an imm32 jmp (5 bytes) previously
* and now an imm8 jmp (2 bytes), then we pad
* (5 - 2 = 3) bytes to stop the image from
* shrinking further.
*/
nops = INSN_SZ_DIFF - 2;
if (nops != 0 && nops != 3) {
pr_err("unexpected jump padding: %d bytes\n",
nops);
return -EFAULT;
}
cnt += emit_nops(&prog, INSN_SZ_DIFF - 2);
}
EMIT2(0xEB, jmp_offset); EMIT2(0xEB, jmp_offset);
} else if (is_simm32(jmp_offset)) { } else if (is_simm32(jmp_offset)) {
EMIT1_off32(0xE9, jmp_offset); EMIT1_off32(0xE9, jmp_offset);
...@@ -1543,17 +1733,25 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, ...@@ -1543,17 +1733,25 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
struct bpf_prog *p, int stack_size, bool save_ret) struct bpf_prog *p, int stack_size, bool save_ret)
{ {
u8 *prog = *pprog; u8 *prog = *pprog;
u8 *jmp_insn;
int cnt = 0; int cnt = 0;
if (p->aux->sleepable) { /* arg1: mov rdi, progs[i] */
if (emit_call(&prog, __bpf_prog_enter_sleepable, prog)) emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
return -EINVAL; if (emit_call(&prog,
} else { p->aux->sleepable ? __bpf_prog_enter_sleepable :
if (emit_call(&prog, __bpf_prog_enter, prog)) __bpf_prog_enter, prog))
return -EINVAL; return -EINVAL;
/* remember prog start time returned by __bpf_prog_enter */ /* remember prog start time returned by __bpf_prog_enter */
emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0); emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
}
/* if (__bpf_prog_enter*(prog) == 0)
* goto skip_exec_of_prog;
*/
EMIT3(0x48, 0x85, 0xC0); /* test rax,rax */
/* emit 2 nops that will be replaced with JE insn */
jmp_insn = prog;
emit_nops(&prog, 2);
/* arg1: lea rdi, [rbp - stack_size] */ /* arg1: lea rdi, [rbp - stack_size] */
EMIT4(0x48, 0x8D, 0x7D, -stack_size); EMIT4(0x48, 0x8D, 0x7D, -stack_size);
...@@ -1577,43 +1775,23 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, ...@@ -1577,43 +1775,23 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
if (save_ret) if (save_ret)
emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8); emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
if (p->aux->sleepable) { /* replace 2 nops with JE insn, since jmp target is known */
if (emit_call(&prog, __bpf_prog_exit_sleepable, prog)) jmp_insn[0] = X86_JE;
return -EINVAL; jmp_insn[1] = prog - jmp_insn - 2;
} else {
/* arg1: mov rdi, progs[i] */ /* arg1: mov rdi, progs[i] */
emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
(u32) (long) p);
/* arg2: mov rsi, rbx <- start time in nsec */ /* arg2: mov rsi, rbx <- start time in nsec */
emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6); emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
if (emit_call(&prog, __bpf_prog_exit, prog)) if (emit_call(&prog,
p->aux->sleepable ? __bpf_prog_exit_sleepable :
__bpf_prog_exit, prog))
return -EINVAL; return -EINVAL;
}
*pprog = prog; *pprog = prog;
return 0; return 0;
} }
static void emit_nops(u8 **pprog, unsigned int len)
{
unsigned int i, noplen;
u8 *prog = *pprog;
int cnt = 0;
while (len > 0) {
noplen = len;
if (noplen > ASM_NOP_MAX)
noplen = ASM_NOP_MAX;
for (i = 0; i < noplen; i++)
EMIT1(ideal_nops[noplen][i]);
len -= noplen;
}
*pprog = prog;
}
static void emit_align(u8 **pprog, u32 align) static void emit_align(u8 **pprog, u32 align)
{ {
u8 *target, *prog = *pprog; u8 *target, *prog = *pprog;
...@@ -2030,6 +2208,9 @@ struct x64_jit_data { ...@@ -2030,6 +2208,9 @@ struct x64_jit_data {
struct jit_context ctx; struct jit_context ctx;
}; };
#define MAX_PASSES 20
#define PADDING_PASSES (MAX_PASSES - 5)
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
{ {
struct bpf_binary_header *header = NULL; struct bpf_binary_header *header = NULL;
...@@ -2039,6 +2220,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) ...@@ -2039,6 +2220,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
struct jit_context ctx = {}; struct jit_context ctx = {};
bool tmp_blinded = false; bool tmp_blinded = false;
bool extra_pass = false; bool extra_pass = false;
bool padding = false;
u8 *image = NULL; u8 *image = NULL;
int *addrs; int *addrs;
int pass; int pass;
...@@ -2075,6 +2257,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) ...@@ -2075,6 +2257,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
image = jit_data->image; image = jit_data->image;
header = jit_data->header; header = jit_data->header;
extra_pass = true; extra_pass = true;
padding = true;
goto skip_init_addrs; goto skip_init_addrs;
} }
addrs = kvmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL); addrs = kvmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
...@@ -2100,8 +2283,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) ...@@ -2100,8 +2283,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
* may converge on the last pass. In such case do one more * may converge on the last pass. In such case do one more
* pass to emit the final image. * pass to emit the final image.
*/ */
for (pass = 0; pass < 20 || image; pass++) { for (pass = 0; pass < MAX_PASSES || image; pass++) {
proglen = do_jit(prog, addrs, image, oldproglen, &ctx); if (!padding && pass >= PADDING_PASSES)
padding = true;
proglen = do_jit(prog, addrs, image, oldproglen, &ctx, padding);
if (proglen <= 0) { if (proglen <= 0) {
out_image: out_image:
image = NULL; image = NULL;
...@@ -2177,3 +2362,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) ...@@ -2177,3 +2362,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
tmp : orig_prog); tmp : orig_prog);
return prog; return prog;
} }
bool bpf_jit_supports_kfunc_call(void)
{
return true;
}
...@@ -1390,6 +1390,19 @@ static inline void emit_push_r64(const u8 src[], u8 **pprog) ...@@ -1390,6 +1390,19 @@ static inline void emit_push_r64(const u8 src[], u8 **pprog)
*pprog = prog; *pprog = prog;
} }
static void emit_push_r32(const u8 src[], u8 **pprog)
{
u8 *prog = *pprog;
int cnt = 0;
/* mov ecx,dword ptr [ebp+off] */
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src_lo));
/* push ecx */
EMIT1(0x51);
*pprog = prog;
}
static u8 get_cond_jmp_opcode(const u8 op, bool is_cmp_lo) static u8 get_cond_jmp_opcode(const u8 op, bool is_cmp_lo)
{ {
u8 jmp_cond; u8 jmp_cond;
...@@ -1459,6 +1472,174 @@ static u8 get_cond_jmp_opcode(const u8 op, bool is_cmp_lo) ...@@ -1459,6 +1472,174 @@ static u8 get_cond_jmp_opcode(const u8 op, bool is_cmp_lo)
return jmp_cond; return jmp_cond;
} }
/* i386 kernel compiles with "-mregparm=3". From gcc document:
*
* ==== snippet ====
* regparm (number)
* On x86-32 targets, the regparm attribute causes the compiler
* to pass arguments number one to (number) if they are of integral
* type in registers EAX, EDX, and ECX instead of on the stack.
* Functions that take a variable number of arguments continue
* to be passed all of their arguments on the stack.
* ==== snippet ====
*
* The first three args of a function will be considered for
* putting into the 32bit register EAX, EDX, and ECX.
*
* Two 32bit registers are used to pass a 64bit arg.
*
* For example,
* void foo(u32 a, u32 b, u32 c, u32 d):
* u32 a: EAX
* u32 b: EDX
* u32 c: ECX
* u32 d: stack
*
* void foo(u64 a, u32 b, u32 c):
* u64 a: EAX (lo32) EDX (hi32)
* u32 b: ECX
* u32 c: stack
*
* void foo(u32 a, u64 b, u32 c):
* u32 a: EAX
* u64 b: EDX (lo32) ECX (hi32)
* u32 c: stack
*
* void foo(u32 a, u32 b, u64 c):
* u32 a: EAX
* u32 b: EDX
* u64 c: stack
*
* The return value will be stored in the EAX (and EDX for 64bit value).
*
* For example,
* u32 foo(u32 a, u32 b, u32 c):
* return value: EAX
*
* u64 foo(u32 a, u32 b, u32 c):
* return value: EAX (lo32) EDX (hi32)
*
* Notes:
* The verifier only accepts function having integer and pointers
* as its args and return value, so it does not have
* struct-by-value.
*
* emit_kfunc_call() finds out the btf_func_model by calling
* bpf_jit_find_kfunc_model(). A btf_func_model
* has the details about the number of args, size of each arg,
* and the size of the return value.
*
* It first decides how many args can be passed by EAX, EDX, and ECX.
* That will decide what args should be pushed to the stack:
* [first_stack_regno, last_stack_regno] are the bpf regnos
* that should be pushed to the stack.
*
* It will first push all args to the stack because the push
* will need to use ECX. Then, it moves
* [BPF_REG_1, first_stack_regno) to EAX, EDX, and ECX.
*
* When emitting a call (0xE8), it needs to figure out
* the jmp_offset relative to the jit-insn address immediately
* following the call (0xE8) instruction. At this point, it knows
* the end of the jit-insn address after completely translated the
* current (BPF_JMP | BPF_CALL) bpf-insn. It is passed as "end_addr"
* to the emit_kfunc_call(). Thus, it can learn the "immediate-follow-call"
* address by figuring out how many jit-insn is generated between
* the call (0xE8) and the end_addr:
* - 0-1 jit-insn (3 bytes each) to restore the esp pointer if there
* is arg pushed to the stack.
* - 0-2 jit-insns (3 bytes each) to handle the return value.
*/
static int emit_kfunc_call(const struct bpf_prog *bpf_prog, u8 *end_addr,
const struct bpf_insn *insn, u8 **pprog)
{
const u8 arg_regs[] = { IA32_EAX, IA32_EDX, IA32_ECX };
int i, cnt = 0, first_stack_regno, last_stack_regno;
int free_arg_regs = ARRAY_SIZE(arg_regs);
const struct btf_func_model *fm;
int bytes_in_stack = 0;
const u8 *cur_arg_reg;
u8 *prog = *pprog;
s64 jmp_offset;
fm = bpf_jit_find_kfunc_model(bpf_prog, insn);
if (!fm)
return -EINVAL;
first_stack_regno = BPF_REG_1;
for (i = 0; i < fm->nr_args; i++) {
int regs_needed = fm->arg_size[i] > sizeof(u32) ? 2 : 1;
if (regs_needed > free_arg_regs)
break;
free_arg_regs -= regs_needed;
first_stack_regno++;
}
/* Push the args to the stack */
last_stack_regno = BPF_REG_0 + fm->nr_args;
for (i = last_stack_regno; i >= first_stack_regno; i--) {
if (fm->arg_size[i - 1] > sizeof(u32)) {
emit_push_r64(bpf2ia32[i], &prog);
bytes_in_stack += 8;
} else {
emit_push_r32(bpf2ia32[i], &prog);
bytes_in_stack += 4;
}
}
cur_arg_reg = &arg_regs[0];
for (i = BPF_REG_1; i < first_stack_regno; i++) {
/* mov e[adc]x,dword ptr [ebp+off] */
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, *cur_arg_reg++),
STACK_VAR(bpf2ia32[i][0]));
if (fm->arg_size[i - 1] > sizeof(u32))
/* mov e[adc]x,dword ptr [ebp+off] */
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, *cur_arg_reg++),
STACK_VAR(bpf2ia32[i][1]));
}
if (bytes_in_stack)
/* add esp,"bytes_in_stack" */
end_addr -= 3;
/* mov dword ptr [ebp+off],edx */
if (fm->ret_size > sizeof(u32))
end_addr -= 3;
/* mov dword ptr [ebp+off],eax */
if (fm->ret_size)
end_addr -= 3;
jmp_offset = (u8 *)__bpf_call_base + insn->imm - end_addr;
if (!is_simm32(jmp_offset)) {
pr_err("unsupported BPF kernel function jmp_offset:%lld\n",
jmp_offset);
return -EINVAL;
}
EMIT1_off32(0xE8, jmp_offset);
if (fm->ret_size)
/* mov dword ptr [ebp+off],eax */
EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
STACK_VAR(bpf2ia32[BPF_REG_0][0]));
if (fm->ret_size > sizeof(u32))
/* mov dword ptr [ebp+off],edx */
EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
STACK_VAR(bpf2ia32[BPF_REG_0][1]));
if (bytes_in_stack)
/* add esp,"bytes_in_stack" */
EMIT3(0x83, add_1reg(0xC0, IA32_ESP), bytes_in_stack);
*pprog = prog;
return 0;
}
static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
int oldproglen, struct jit_context *ctx) int oldproglen, struct jit_context *ctx)
{ {
...@@ -1894,6 +2075,18 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, ...@@ -1894,6 +2075,18 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
if (insn->src_reg == BPF_PSEUDO_CALL) if (insn->src_reg == BPF_PSEUDO_CALL)
goto notyet; goto notyet;
if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
int err;
err = emit_kfunc_call(bpf_prog,
image + addrs[i],
insn, &prog);
if (err)
return err;
break;
}
func = (u8 *) __bpf_call_base + imm32; func = (u8 *) __bpf_call_base + imm32;
jmp_offset = func - (image + addrs[i]); jmp_offset = func - (image + addrs[i]);
...@@ -2249,10 +2442,8 @@ emit_cond_jmp: jmp_cond = get_cond_jmp_opcode(BPF_OP(code), false); ...@@ -2249,10 +2442,8 @@ emit_cond_jmp: jmp_cond = get_cond_jmp_opcode(BPF_OP(code), false);
return -EFAULT; return -EFAULT;
} }
break; break;
/* STX XADD: lock *(u32 *)(dst + off) += src */ case BPF_STX | BPF_ATOMIC | BPF_W:
case BPF_STX | BPF_XADD | BPF_W: case BPF_STX | BPF_ATOMIC | BPF_DW:
/* STX XADD: lock *(u64 *)(dst + off) += src */
case BPF_STX | BPF_XADD | BPF_DW:
goto notyet; goto notyet;
case BPF_JMP | BPF_EXIT: case BPF_JMP | BPF_EXIT:
if (seen_exit) { if (seen_exit) {
...@@ -2410,3 +2601,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) ...@@ -2410,3 +2601,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
tmp : orig_prog); tmp : orig_prog);
return prog; return prog;
} }
bool bpf_jit_supports_kfunc_call(void)
{
return true;
}
...@@ -13,8 +13,8 @@ ...@@ -13,8 +13,8 @@
*/ */
#define NFP_BPF_SCALAR_VALUE 1 #define NFP_BPF_SCALAR_VALUE 1
#define NFP_BPF_MAP_VALUE 4 #define NFP_BPF_MAP_VALUE 4
#define NFP_BPF_STACK 5 #define NFP_BPF_STACK 6
#define NFP_BPF_PACKET_DATA 7 #define NFP_BPF_PACKET_DATA 8
enum bpf_cap_tlv_type { enum bpf_cap_tlv_type {
NFP_BPF_CAP_TYPE_FUNC = 1, NFP_BPF_CAP_TYPE_FUNC = 1,
......
...@@ -3109,13 +3109,19 @@ mem_xadd(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, bool is64) ...@@ -3109,13 +3109,19 @@ mem_xadd(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, bool is64)
return 0; return 0;
} }
static int mem_xadd4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) static int mem_atomic4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{ {
if (meta->insn.imm != BPF_ADD)
return -EOPNOTSUPP;
return mem_xadd(nfp_prog, meta, false); return mem_xadd(nfp_prog, meta, false);
} }
static int mem_xadd8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta) static int mem_atomic8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{ {
if (meta->insn.imm != BPF_ADD)
return -EOPNOTSUPP;
return mem_xadd(nfp_prog, meta, true); return mem_xadd(nfp_prog, meta, true);
} }
...@@ -3475,8 +3481,8 @@ static const instr_cb_t instr_cb[256] = { ...@@ -3475,8 +3481,8 @@ static const instr_cb_t instr_cb[256] = {
[BPF_STX | BPF_MEM | BPF_H] = mem_stx2, [BPF_STX | BPF_MEM | BPF_H] = mem_stx2,
[BPF_STX | BPF_MEM | BPF_W] = mem_stx4, [BPF_STX | BPF_MEM | BPF_W] = mem_stx4,
[BPF_STX | BPF_MEM | BPF_DW] = mem_stx8, [BPF_STX | BPF_MEM | BPF_DW] = mem_stx8,
[BPF_STX | BPF_XADD | BPF_W] = mem_xadd4, [BPF_STX | BPF_ATOMIC | BPF_W] = mem_atomic4,
[BPF_STX | BPF_XADD | BPF_DW] = mem_xadd8, [BPF_STX | BPF_ATOMIC | BPF_DW] = mem_atomic8,
[BPF_ST | BPF_MEM | BPF_B] = mem_st1, [BPF_ST | BPF_MEM | BPF_B] = mem_st1,
[BPF_ST | BPF_MEM | BPF_H] = mem_st2, [BPF_ST | BPF_MEM | BPF_H] = mem_st2,
[BPF_ST | BPF_MEM | BPF_W] = mem_st4, [BPF_ST | BPF_MEM | BPF_W] = mem_st4,
......
...@@ -428,9 +428,9 @@ static inline bool is_mbpf_classic_store_pkt(const struct nfp_insn_meta *meta) ...@@ -428,9 +428,9 @@ static inline bool is_mbpf_classic_store_pkt(const struct nfp_insn_meta *meta)
return is_mbpf_classic_store(meta) && meta->ptr.type == PTR_TO_PACKET; return is_mbpf_classic_store(meta) && meta->ptr.type == PTR_TO_PACKET;
} }
static inline bool is_mbpf_xadd(const struct nfp_insn_meta *meta) static inline bool is_mbpf_atomic(const struct nfp_insn_meta *meta)
{ {
return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_XADD); return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_ATOMIC);
} }
static inline bool is_mbpf_mul(const struct nfp_insn_meta *meta) static inline bool is_mbpf_mul(const struct nfp_insn_meta *meta)
......
...@@ -479,7 +479,7 @@ nfp_bpf_check_ptr(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, ...@@ -479,7 +479,7 @@ nfp_bpf_check_ptr(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
pr_vlog(env, "map writes not supported\n"); pr_vlog(env, "map writes not supported\n");
return -EOPNOTSUPP; return -EOPNOTSUPP;
} }
if (is_mbpf_xadd(meta)) { if (is_mbpf_atomic(meta)) {
err = nfp_bpf_map_mark_used(env, meta, reg, err = nfp_bpf_map_mark_used(env, meta, reg,
NFP_MAP_USE_ATOMIC_CNT); NFP_MAP_USE_ATOMIC_CNT);
if (err) if (err)
...@@ -523,12 +523,17 @@ nfp_bpf_check_store(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, ...@@ -523,12 +523,17 @@ nfp_bpf_check_store(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
} }
static int static int
nfp_bpf_check_xadd(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, nfp_bpf_check_atomic(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
struct bpf_verifier_env *env) struct bpf_verifier_env *env)
{ {
const struct bpf_reg_state *sreg = cur_regs(env) + meta->insn.src_reg; const struct bpf_reg_state *sreg = cur_regs(env) + meta->insn.src_reg;
const struct bpf_reg_state *dreg = cur_regs(env) + meta->insn.dst_reg; const struct bpf_reg_state *dreg = cur_regs(env) + meta->insn.dst_reg;
if (meta->insn.imm != BPF_ADD) {
pr_vlog(env, "atomic op not implemented: %d\n", meta->insn.imm);
return -EOPNOTSUPP;
}
if (dreg->type != PTR_TO_MAP_VALUE) { if (dreg->type != PTR_TO_MAP_VALUE) {
pr_vlog(env, "atomic add not to a map value pointer: %d\n", pr_vlog(env, "atomic add not to a map value pointer: %d\n",
dreg->type); dreg->type);
...@@ -655,8 +660,8 @@ int nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, ...@@ -655,8 +660,8 @@ int nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx,
if (is_mbpf_store(meta)) if (is_mbpf_store(meta))
return nfp_bpf_check_store(nfp_prog, meta, env); return nfp_bpf_check_store(nfp_prog, meta, env);
if (is_mbpf_xadd(meta)) if (is_mbpf_atomic(meta))
return nfp_bpf_check_xadd(nfp_prog, meta, env); return nfp_bpf_check_atomic(nfp_prog, meta, env);
if (is_mbpf_alu(meta)) if (is_mbpf_alu(meta))
return nfp_bpf_check_alu(nfp_prog, meta, env); return nfp_bpf_check_alu(nfp_prog, meta, env);
......
...@@ -14,13 +14,13 @@ ...@@ -14,13 +14,13 @@
#include <linux/numa.h> #include <linux/numa.h>
#include <linux/mm_types.h> #include <linux/mm_types.h>
#include <linux/wait.h> #include <linux/wait.h>
#include <linux/u64_stats_sync.h>
#include <linux/refcount.h> #include <linux/refcount.h>
#include <linux/mutex.h> #include <linux/mutex.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/kallsyms.h> #include <linux/kallsyms.h>
#include <linux/capability.h> #include <linux/capability.h>
#include <linux/percpu-refcount.h> #include <linux/percpu-refcount.h>
#include <linux/bpfptr.h>
struct bpf_verifier_env; struct bpf_verifier_env;
struct bpf_verifier_log; struct bpf_verifier_log;
...@@ -37,9 +37,12 @@ struct seq_operations; ...@@ -37,9 +37,12 @@ struct seq_operations;
struct bpf_iter_aux_info; struct bpf_iter_aux_info;
struct bpf_local_storage; struct bpf_local_storage;
struct bpf_local_storage_map; struct bpf_local_storage_map;
struct kobject;
struct bpf_func_state;
extern struct idr btf_idr; extern struct idr btf_idr;
extern spinlock_t btf_idr_lock; extern spinlock_t btf_idr_lock;
extern struct kobject *btf_kobj;
typedef int (*bpf_iter_init_seq_priv_t)(void *private_data, typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
struct bpf_iter_aux_info *aux); struct bpf_iter_aux_info *aux);
...@@ -126,6 +129,13 @@ struct bpf_map_ops { ...@@ -126,6 +129,13 @@ struct bpf_map_ops {
bool (*map_meta_equal)(const struct bpf_map *meta0, bool (*map_meta_equal)(const struct bpf_map *meta0,
const struct bpf_map *meta1); const struct bpf_map *meta1);
int (*map_set_for_each_callback_args)(struct bpf_verifier_env *env,
struct bpf_func_state *caller,
struct bpf_func_state *callee);
int (*map_for_each_callback)(struct bpf_map *map, void *callback_fn,
void *callback_ctx, u64 flags);
/* BTF name and id of struct allocated by map_alloc */ /* BTF name and id of struct allocated by map_alloc */
const char * const map_btf_name; const char * const map_btf_name;
int *map_btf_id; int *map_btf_id;
...@@ -323,6 +333,9 @@ enum bpf_arg_type { ...@@ -323,6 +333,9 @@ enum bpf_arg_type {
ARG_CONST_ALLOC_SIZE_OR_ZERO, /* number of allocated bytes requested */ ARG_CONST_ALLOC_SIZE_OR_ZERO, /* number of allocated bytes requested */
ARG_PTR_TO_BTF_ID_SOCK_COMMON, /* pointer to in-kernel sock_common or bpf-mirrored bpf_sock */ ARG_PTR_TO_BTF_ID_SOCK_COMMON, /* pointer to in-kernel sock_common or bpf-mirrored bpf_sock */
ARG_PTR_TO_PERCPU_BTF_ID, /* pointer to in-kernel percpu type */ ARG_PTR_TO_PERCPU_BTF_ID, /* pointer to in-kernel percpu type */
ARG_PTR_TO_FUNC, /* pointer to a bpf program function */
ARG_PTR_TO_STACK_OR_NULL, /* pointer to stack or NULL */
ARG_PTR_TO_CONST_STR, /* pointer to a null terminated read-only string */
__BPF_ARG_TYPE_MAX, __BPF_ARG_TYPE_MAX,
/* Extended arg_types. */ /* Extended arg_types. */
...@@ -427,6 +440,7 @@ enum bpf_reg_type { ...@@ -427,6 +440,7 @@ enum bpf_reg_type {
PTR_TO_CTX, /* reg points to bpf_context */ PTR_TO_CTX, /* reg points to bpf_context */
CONST_PTR_TO_MAP, /* reg points to struct bpf_map */ CONST_PTR_TO_MAP, /* reg points to struct bpf_map */
PTR_TO_MAP_VALUE, /* reg points to map element value */ PTR_TO_MAP_VALUE, /* reg points to map element value */
PTR_TO_MAP_KEY, /* reg points to a map element key */
PTR_TO_STACK, /* reg == frame_pointer + offset */ PTR_TO_STACK, /* reg == frame_pointer + offset */
PTR_TO_PACKET_META, /* skb->data - meta_len */ PTR_TO_PACKET_META, /* skb->data - meta_len */
PTR_TO_PACKET, /* reg points to skb->data */ PTR_TO_PACKET, /* reg points to skb->data */
...@@ -454,6 +468,7 @@ enum bpf_reg_type { ...@@ -454,6 +468,7 @@ enum bpf_reg_type {
*/ */
PTR_TO_MEM, /* reg points to valid memory region */ PTR_TO_MEM, /* reg points to valid memory region */
PTR_TO_BUF, /* reg points to a read/write buffer */ PTR_TO_BUF, /* reg points to a read/write buffer */
PTR_TO_FUNC, /* reg points to a bpf program function */
PTR_TO_PERCPU_BTF_ID, /* reg points to a percpu kernel variable */ PTR_TO_PERCPU_BTF_ID, /* reg points to a percpu kernel variable */
__BPF_REG_TYPE_MAX, __BPF_REG_TYPE_MAX,
...@@ -478,8 +493,11 @@ struct bpf_insn_access_aux { ...@@ -478,8 +493,11 @@ struct bpf_insn_access_aux {
enum bpf_reg_type reg_type; enum bpf_reg_type reg_type;
union { union {
int ctx_field_size; int ctx_field_size;
struct {
struct btf *btf;
u32 btf_id; u32 btf_id;
}; };
};
struct bpf_verifier_log *log; /* for verbose logs */ struct bpf_verifier_log *log; /* for verbose logs */
}; };
...@@ -515,9 +533,11 @@ struct bpf_verifier_ops { ...@@ -515,9 +533,11 @@ struct bpf_verifier_ops {
struct bpf_insn *dst, struct bpf_insn *dst,
struct bpf_prog *prog, u32 *target_size); struct bpf_prog *prog, u32 *target_size);
int (*btf_struct_access)(struct bpf_verifier_log *log, int (*btf_struct_access)(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off, int size, const struct btf_type *t, int off, int size,
enum bpf_access_type atype, enum bpf_access_type atype,
u32 *next_btf_id); u32 *next_btf_id);
bool (*check_kfunc_call)(u32 kfunc_btf_id, struct module *owner);
}; };
struct bpf_prog_offload_ops { struct bpf_prog_offload_ops {
...@@ -568,11 +588,10 @@ enum bpf_cgroup_storage_type { ...@@ -568,11 +588,10 @@ enum bpf_cgroup_storage_type {
*/ */
#define MAX_BPF_FUNC_ARGS 12 #define MAX_BPF_FUNC_ARGS 12
struct bpf_prog_stats { /* The maximum number of arguments passed through registers
u64 cnt; * a single function may have.
u64 nsecs; */
struct u64_stats_sync syncp; #define MAX_BPF_FUNC_REG_ARGS 5
} __aligned(2 * sizeof(u64));
struct btf_func_model { struct btf_func_model {
u8 ret_size; u8 ret_size;
...@@ -599,7 +618,7 @@ struct btf_func_model { ...@@ -599,7 +618,7 @@ struct btf_func_model {
/* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
* bytes on x86. Pick a number to fit into BPF_IMAGE_SIZE / 2 * bytes on x86. Pick a number to fit into BPF_IMAGE_SIZE / 2
*/ */
#define BPF_MAX_TRAMP_PROGS 40 #define BPF_MAX_TRAMP_PROGS 38
struct bpf_tramp_progs { struct bpf_tramp_progs {
struct bpf_prog *progs[BPF_MAX_TRAMP_PROGS]; struct bpf_prog *progs[BPF_MAX_TRAMP_PROGS];
...@@ -632,10 +651,10 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *i ...@@ -632,10 +651,10 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *i
struct bpf_tramp_progs *tprogs, struct bpf_tramp_progs *tprogs,
void *orig_call); void *orig_call);
/* these two functions are called from generated trampoline */ /* these two functions are called from generated trampoline */
u64 notrace __bpf_prog_enter(void); u64 notrace __bpf_prog_enter(struct bpf_prog *prog);
void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start); void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start);
void notrace __bpf_prog_enter_sleepable(void); u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog);
void notrace __bpf_prog_exit_sleepable(void); void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start);
void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr); void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr);
void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr); void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr);
...@@ -842,9 +861,17 @@ struct bpf_ctx_arg_aux { ...@@ -842,9 +861,17 @@ struct bpf_ctx_arg_aux {
u32 btf_id; u32 btf_id;
}; };
struct btf_mod_pair {
struct btf *btf;
struct module *module;
};
struct bpf_kfunc_desc_tab;
struct bpf_prog_aux { struct bpf_prog_aux {
atomic64_t refcnt; atomic64_t refcnt;
u32 used_map_cnt; u32 used_map_cnt;
u32 used_btf_cnt;
u32 max_ctx_offset; u32 max_ctx_offset;
u32 max_pkt_offset; u32 max_pkt_offset;
u32 max_tp_access; u32 max_tp_access;
...@@ -856,6 +883,7 @@ struct bpf_prog_aux { ...@@ -856,6 +883,7 @@ struct bpf_prog_aux {
u32 ctx_arg_info_size; u32 ctx_arg_info_size;
u32 max_rdonly_access; u32 max_rdonly_access;
u32 max_rdwr_access; u32 max_rdwr_access;
struct btf *attach_btf;
const struct bpf_ctx_arg_aux *ctx_arg_info; const struct bpf_ctx_arg_aux *ctx_arg_info;
struct mutex dst_mutex; /* protects dst_* pointers below, *after* prog becomes visible */ struct mutex dst_mutex; /* protects dst_* pointers below, *after* prog becomes visible */
struct bpf_prog *dst_prog; struct bpf_prog *dst_prog;
...@@ -876,14 +904,18 @@ struct bpf_prog_aux { ...@@ -876,14 +904,18 @@ struct bpf_prog_aux {
struct bpf_prog **func; struct bpf_prog **func;
void *jit_data; /* JIT specific data. arch dependent */ void *jit_data; /* JIT specific data. arch dependent */
struct bpf_jit_poke_descriptor *poke_tab; struct bpf_jit_poke_descriptor *poke_tab;
struct bpf_kfunc_desc_tab *kfunc_tab;
struct bpf_kfunc_btf_tab *kfunc_btf_tab;
u32 size_poke_tab; u32 size_poke_tab;
struct bpf_ksym ksym; struct bpf_ksym ksym;
const struct bpf_prog_ops *ops; const struct bpf_prog_ops *ops;
struct bpf_map **used_maps; struct bpf_map **used_maps;
struct mutex used_maps_mutex; /* mutex for used_maps and used_map_cnt */ struct mutex used_maps_mutex; /* mutex for used_maps and used_map_cnt */
struct btf_mod_pair *used_btfs;
struct bpf_prog *prog; struct bpf_prog *prog;
struct user_struct *user; struct user_struct *user;
u64 load_time; /* ns since boottime */ u64 load_time; /* ns since boottime */
u32 verified_insns;
struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]; struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
char name[BPF_OBJ_NAME_LEN]; char name[BPF_OBJ_NAME_LEN];
#ifdef CONFIG_SECURITY #ifdef CONFIG_SECURITY
...@@ -917,7 +949,6 @@ struct bpf_prog_aux { ...@@ -917,7 +949,6 @@ struct bpf_prog_aux {
u32 linfo_idx; u32 linfo_idx;
u32 num_exentries; u32 num_exentries;
struct exception_table_entry *extable; struct exception_table_entry *extable;
struct bpf_prog_stats __percpu *stats;
union { union {
struct work_struct work; struct work_struct work;
struct rcu_head rcu; struct rcu_head rcu;
...@@ -1092,7 +1123,6 @@ struct bpf_event_entry { ...@@ -1092,7 +1123,6 @@ struct bpf_event_entry {
bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp); bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
int bpf_prog_calc_tag(struct bpf_prog *fp); int bpf_prog_calc_tag(struct bpf_prog *fp);
const char *kernel_type_name(u32 btf_type_id);
const struct bpf_func_proto *bpf_get_trace_printk_proto(void); const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
...@@ -1445,7 +1475,7 @@ struct bpf_iter__bpf_map_elem { ...@@ -1445,7 +1475,7 @@ struct bpf_iter__bpf_map_elem {
int bpf_iter_reg_target(const struct bpf_iter_reg *reg_info); int bpf_iter_reg_target(const struct bpf_iter_reg *reg_info);
void bpf_iter_unreg_target(const struct bpf_iter_reg *reg_info); void bpf_iter_unreg_target(const struct bpf_iter_reg *reg_info);
bool bpf_iter_prog_supported(struct bpf_prog *prog); bool bpf_iter_prog_supported(struct bpf_prog *prog);
int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_prog *prog);
int bpf_iter_new_fd(struct bpf_link *link); int bpf_iter_new_fd(struct bpf_link *link);
bool bpf_link_is_iter(struct bpf_link *link); bool bpf_link_is_iter(struct bpf_link *link);
struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop); struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop);
...@@ -1472,7 +1502,7 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file, ...@@ -1472,7 +1502,7 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value); int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value);
int bpf_get_file_flag(int flags); int bpf_get_file_flag(int flags);
int bpf_check_uarg_tail_zero(void __user *uaddr, size_t expected_size, int bpf_check_uarg_tail_zero(bpfptr_t uaddr, size_t expected_size,
size_t actual_size); size_t actual_size);
/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
...@@ -1492,8 +1522,7 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size) ...@@ -1492,8 +1522,7 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
} }
/* verify correctness of eBPF program */ /* verify correctness of eBPF program */
int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr);
union bpf_attr __user *uattr);
#ifndef CONFIG_BPF_JIT_ALWAYS_ON #ifndef CONFIG_BPF_JIT_ALWAYS_ON
void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth); void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
...@@ -1545,15 +1574,20 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, ...@@ -1545,15 +1574,20 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
int bpf_prog_test_run_raw_tp(struct bpf_prog *prog, int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
const union bpf_attr *kattr, const union bpf_attr *kattr,
union bpf_attr __user *uattr); union bpf_attr __user *uattr);
int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog,
const union bpf_attr *kattr,
union bpf_attr __user *uattr);
bool bpf_prog_test_check_kfunc_call(u32 kfunc_id, struct module *owner);
bool btf_ctx_access(int off, int size, enum bpf_access_type type, bool btf_ctx_access(int off, int size, enum bpf_access_type type,
const struct bpf_prog *prog, const struct bpf_prog *prog,
struct bpf_insn_access_aux *info); struct bpf_insn_access_aux *info);
int btf_struct_access(struct bpf_verifier_log *log, int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
const struct btf_type *t, int off, int size, const struct btf_type *t, int off, int size,
enum bpf_access_type atype, enum bpf_access_type atype,
u32 *next_btf_id); u32 *next_btf_id);
bool btf_struct_ids_match(struct bpf_verifier_log *log, bool btf_struct_ids_match(struct bpf_verifier_log *log,
int off, u32 id, u32 need_type_id); const struct btf *btf, u32 id, int off,
const struct btf *need_btf, u32 need_type_id);
int btf_distill_func_proto(struct bpf_verifier_log *log, int btf_distill_func_proto(struct bpf_verifier_log *log,
struct btf *btf, struct btf *btf,
...@@ -1562,7 +1596,10 @@ int btf_distill_func_proto(struct bpf_verifier_log *log, ...@@ -1562,7 +1596,10 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
struct btf_func_model *m); struct btf_func_model *m);
struct bpf_reg_state; struct bpf_reg_state;
int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog, int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *regs);
int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
const struct btf *btf, u32 func_id,
struct bpf_reg_state *regs); struct bpf_reg_state *regs);
int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog, int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *reg); struct bpf_reg_state *reg);
...@@ -1573,6 +1610,18 @@ struct bpf_prog *bpf_prog_by_id(u32 id); ...@@ -1573,6 +1610,18 @@ struct bpf_prog *bpf_prog_by_id(u32 id);
struct bpf_link *bpf_link_by_id(u32 id); struct bpf_link *bpf_link_by_id(u32 id);
const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id); const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
const struct btf_func_model *
bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
const struct bpf_insn *insn);
struct bpf_core_ctx {
struct bpf_verifier_log *log;
const struct btf *btf;
};
int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
int relo_idx, void *insn);
static inline bool unprivileged_ebpf_enabled(void) static inline bool unprivileged_ebpf_enabled(void)
{ {
...@@ -1759,6 +1808,19 @@ static inline int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, ...@@ -1759,6 +1808,19 @@ static inline int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
return -ENOTSUPP; return -ENOTSUPP;
} }
static inline int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog,
const union bpf_attr *kattr,
union bpf_attr __user *uattr)
{
return -ENOTSUPP;
}
static inline bool bpf_prog_test_check_kfunc_call(u32 kfunc_id,
struct module *owner)
{
return false;
}
static inline void bpf_map_put(struct bpf_map *map) static inline void bpf_map_put(struct bpf_map *map)
{ {
} }
...@@ -1774,6 +1836,18 @@ bpf_base_func_proto(enum bpf_func_id func_id) ...@@ -1774,6 +1836,18 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return NULL; return NULL;
} }
static inline bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog)
{
return false;
}
static inline const struct btf_func_model *
bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
const struct bpf_insn *insn)
{
return NULL;
}
static inline bool unprivileged_ebpf_enabled(void) static inline bool unprivileged_ebpf_enabled(void)
{ {
return false; return false;
...@@ -1781,6 +1855,9 @@ static inline bool unprivileged_ebpf_enabled(void) ...@@ -1781,6 +1855,9 @@ static inline bool unprivileged_ebpf_enabled(void)
#endif /* CONFIG_BPF_SYSCALL */ #endif /* CONFIG_BPF_SYSCALL */
void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
struct btf_mod_pair *used_btfs, u32 len);
static inline struct bpf_prog *bpf_prog_get_type(u32 ufd, static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
enum bpf_prog_type type) enum bpf_prog_type type)
{ {
...@@ -1833,6 +1910,9 @@ static inline bool bpf_map_is_dev_bound(struct bpf_map *map) ...@@ -1833,6 +1910,9 @@ static inline bool bpf_map_is_dev_bound(struct bpf_map *map)
struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr); struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr);
void bpf_map_offload_map_free(struct bpf_map *map); void bpf_map_offload_map_free(struct bpf_map *map);
int bpf_prog_test_run_syscall(struct bpf_prog *prog,
const union bpf_attr *kattr,
union bpf_attr __user *uattr);
#else #else
static inline int bpf_prog_offload_init(struct bpf_prog *prog, static inline int bpf_prog_offload_init(struct bpf_prog *prog,
union bpf_attr *attr) union bpf_attr *attr)
...@@ -1858,6 +1938,13 @@ static inline struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr) ...@@ -1858,6 +1938,13 @@ static inline struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr)
static inline void bpf_map_offload_map_free(struct bpf_map *map) static inline void bpf_map_offload_map_free(struct bpf_map *map)
{ {
} }
static inline int bpf_prog_test_run_syscall(struct bpf_prog *prog,
const union bpf_attr *kattr,
union bpf_attr __user *uattr)
{
return -ENOTSUPP;
}
#endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */ #endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */
#if defined(CONFIG_BPF_STREAM_PARSER) #if defined(CONFIG_BPF_STREAM_PARSER)
...@@ -1973,8 +2060,12 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto; ...@@ -1973,8 +2060,12 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto;
extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto; extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto;
extern const struct bpf_func_proto bpf_copy_from_user_proto; extern const struct bpf_func_proto bpf_copy_from_user_proto;
extern const struct bpf_func_proto bpf_snprintf_btf_proto; extern const struct bpf_func_proto bpf_snprintf_btf_proto;
extern const struct bpf_func_proto bpf_snprintf_proto;
extern const struct bpf_func_proto bpf_per_cpu_ptr_proto; extern const struct bpf_func_proto bpf_per_cpu_ptr_proto;
extern const struct bpf_func_proto bpf_this_cpu_ptr_proto; extern const struct bpf_func_proto bpf_this_cpu_ptr_proto;
extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
const struct bpf_func_proto *bpf_tracing_func_proto( const struct bpf_func_proto *bpf_tracing_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog); enum bpf_func_id func_id, const struct bpf_prog *prog);
...@@ -2092,4 +2183,24 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, ...@@ -2092,4 +2183,24 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
struct btf_id_set; struct btf_id_set;
bool btf_id_set_contains(const struct btf_id_set *set, u32 id); bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
enum bpf_printf_mod_type {
BPF_PRINTF_INT,
BPF_PRINTF_LONG,
BPF_PRINTF_LONG_LONG,
};
/* Workaround for getting va_list handling working with different argument type
* combinations generically for 32 and 64 bit archs.
*/
#define BPF_CAST_FMT_ARG(arg_nb, args, mod) \
(mod[arg_nb] == BPF_PRINTF_LONG_LONG || \
(mod[arg_nb] == BPF_PRINTF_LONG && __BITS_PER_LONG == 64) \
? (u64)args[arg_nb] \
: (u32)args[arg_nb])
int bpf_printf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
u64 *final_args, enum bpf_printf_mod_type *mod,
u32 num_args);
void bpf_printf_cleanup(void);
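Taken together, bpf_printf_prepare() validates the format string, copies the raw u64 arguments into *final_args* and records each argument's width in *mod*, while BPF_CAST_FMT_ARG() casts an entry back to the width a printf-family call expects. A minimal, hypothetical caller sketch (names and the three-argument limit are illustrative, error paths trimmed)::

    /* Hypothetical sketch: format at most three arguments into `out`. */
    static int example_emit(char *out, u32 out_size, char *fmt, u32 fmt_size,
                            const u64 *raw_args)
    {
            enum bpf_printf_mod_type mod[3];
            u64 args[3];
            int err;

            err = bpf_printf_prepare(fmt, fmt_size, raw_args, args, mod, 3);
            if (err < 0)
                    return err;

            err = snprintf(out, out_size, fmt,
                           BPF_CAST_FMT_ARG(0, args, mod),
                           BPF_CAST_FMT_ARG(1, args, mod),
                           BPF_CAST_FMT_ARG(2, args, mod));

            bpf_printf_cleanup();   /* release the per-CPU buffer taken by prepare */
            return err;
    }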
#endif /* _LINUX_BPF_H */ #endif /* _LINUX_BPF_H */
...@@ -81,6 +81,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm, ...@@ -81,6 +81,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
BPF_PROG_TYPE(BPF_PROG_TYPE_SCHED, bpf_sched, BPF_PROG_TYPE(BPF_PROG_TYPE_SCHED, bpf_sched,
void *, void *) void *, void *)
#endif /* CONFIG_BPF_SCHED */ #endif /* CONFIG_BPF_SCHED */
BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
void *, void *)
BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
......
...@@ -5,6 +5,7 @@ ...@@ -5,6 +5,7 @@
#define _LINUX_BPF_VERIFIER_H 1 #define _LINUX_BPF_VERIFIER_H 1
#include <linux/bpf.h> /* for enum bpf_reg_type */ #include <linux/bpf.h> /* for enum bpf_reg_type */
#include <linux/btf.h> /* for struct btf and btf_id() */
#include <linux/filter.h> /* for MAX_BPF_STACK */ #include <linux/filter.h> /* for MAX_BPF_STACK */
#include <linux/tnum.h> #include <linux/tnum.h>
...@@ -45,6 +46,8 @@ enum bpf_reg_liveness { ...@@ -45,6 +46,8 @@ enum bpf_reg_liveness {
struct bpf_reg_state { struct bpf_reg_state {
/* Ordering of fields matters. See states_equal() */ /* Ordering of fields matters. See states_equal() */
enum bpf_reg_type type; enum bpf_reg_type type;
/* Fixed part of pointer offset, pointer types only */
s32 off;
union { union {
/* valid when type == PTR_TO_PACKET */ /* valid when type == PTR_TO_PACKET */
u16 range; u16 range;
...@@ -54,15 +57,22 @@ struct bpf_reg_state { ...@@ -54,15 +57,22 @@ struct bpf_reg_state {
*/ */
struct bpf_map *map_ptr; struct bpf_map *map_ptr;
u32 btf_id; /* for PTR_TO_BTF_ID */ /* for PTR_TO_BTF_ID */
struct {
struct btf *btf;
u32 btf_id;
};
u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */ u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
/* Max size from any of the above. */ /* Max size from any of the above. */
unsigned long raw; struct {
unsigned long raw1;
unsigned long raw2;
} raw;
u32 subprogno; /* for PTR_TO_FUNC */
}; };
/* Fixed part of pointer offset, pointer types only */
s32 off;
/* For PTR_TO_PACKET, used to find other pointers with the same variable /* For PTR_TO_PACKET, used to find other pointers with the same variable
* offset, so they can share range knowledge. * offset, so they can share range knowledge.
* For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we * For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
...@@ -198,6 +208,7 @@ struct bpf_func_state { ...@@ -198,6 +208,7 @@ struct bpf_func_state {
int acquired_refs; int acquired_refs;
struct bpf_reference_state *refs; struct bpf_reference_state *refs;
int allocated_stack; int allocated_stack;
bool in_callback_fn;
struct bpf_stack_state *stack; struct bpf_stack_state *stack;
}; };
...@@ -321,7 +332,10 @@ struct bpf_insn_aux_data { ...@@ -321,7 +332,10 @@ struct bpf_insn_aux_data {
struct { struct {
enum bpf_reg_type reg_type; /* type of pseudo_btf_id */ enum bpf_reg_type reg_type; /* type of pseudo_btf_id */
union { union {
struct {
struct btf *btf;
u32 btf_id; /* btf_id for struct typed var */ u32 btf_id; /* btf_id for struct typed var */
};
u32 mem_size; /* mem_size for non-struct typed var */ u32 mem_size; /* mem_size for non-struct typed var */
}; };
} btf_var; } btf_var;
...@@ -339,6 +353,7 @@ struct bpf_insn_aux_data { ...@@ -339,6 +353,7 @@ struct bpf_insn_aux_data {
}; };
#define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */ #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
#define MAX_USED_BTFS 64 /* max number of BTFs accessed by one BPF program */
#define BPF_VERIFIER_TMP_LOG_SIZE 1024 #define BPF_VERIFIER_TMP_LOG_SIZE 1024
...@@ -404,7 +419,9 @@ struct bpf_verifier_env { ...@@ -404,7 +419,9 @@ struct bpf_verifier_env {
struct bpf_verifier_state_list **explored_states; /* search pruning optimization */ struct bpf_verifier_state_list **explored_states; /* search pruning optimization */
struct bpf_verifier_state_list *free_list; struct bpf_verifier_state_list *free_list;
struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */ struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
struct btf_mod_pair used_btfs[MAX_USED_BTFS]; /* array of BTF's used by BPF program */
u32 used_map_cnt; /* number of used maps */ u32 used_map_cnt; /* number of used maps */
u32 used_btf_cnt; /* number of used BTF objects */
u32 id_gen; /* used to generate unique reg IDs */ u32 id_gen; /* used to generate unique reg IDs */
bool explore_alu_limits; bool explore_alu_limits;
bool allow_ptr_leaks; bool allow_ptr_leaks;
...@@ -443,6 +460,7 @@ struct bpf_verifier_env { ...@@ -443,6 +460,7 @@ struct bpf_verifier_env {
u32 peak_states; u32 peak_states;
/* longest register parentage chain walked for liveness marking */ /* longest register parentage chain walked for liveness marking */
u32 longest_mark_read_walk; u32 longest_mark_read_walk;
bpfptr_t fd_array;
/* buffer used in reg_type_str() to generate reg_type string */ /* buffer used in reg_type_str() to generate reg_type string */
char type_str_buf[TYPE_STR_BUF_LEN]; char type_str_buf[TYPE_STR_BUF_LEN];
}; };
...@@ -478,12 +496,26 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt); ...@@ -478,12 +496,26 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt);
int check_ptr_off_reg(struct bpf_verifier_env *env, int check_ptr_off_reg(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno); const struct bpf_reg_state *reg, int regno);
int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
u32 regno, u32 mem_size);
/* this lives here instead of in bpf.h because it needs to dereference tgt_prog */ /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog, static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
u32 btf_id) struct btf *btf, u32 btf_id)
{ {
return tgt_prog ? (((u64)tgt_prog->aux->id) << 32 | btf_id) : btf_id; if (tgt_prog)
return ((u64)tgt_prog->aux->id << 32) | btf_id;
else
return ((u64)btf_obj_id(btf) << 32) | 0x80000000 | btf_id;
}
/* unpack the IDs from the key as constructed above */
static inline void bpf_trampoline_unpack_key(u64 key, u32 *obj_id, u32 *btf_id)
{
if (obj_id)
*obj_id = key >> 32;
if (btf_id)
*btf_id = key & 0x7FFFFFFF;
} }
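The 0x80000000 marker keeps a key derived from a raw BTF object (vmlinux or module BTF) from colliding with a key derived from a target program id. A small, hypothetical round-trip check, assuming btf_obj_id(btf) returns 2::

    static bool example_key_roundtrip(struct btf *btf)
    {
            u64 key = bpf_trampoline_compute_key(NULL, btf, 5);
            u32 obj_id, btf_id;

            /* key == ((u64)2 << 32) | 0x80000000 | 5 == 0x280000005ULL */
            bpf_trampoline_unpack_key(key, &obj_id, &btf_id);

            /* obj_id == 2, btf_id == 5; the marker bit is masked off */
            return obj_id == 2 && btf_id == 5;
    }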
int bpf_check_attach_target(struct bpf_verifier_log *log, int bpf_check_attach_target(struct bpf_verifier_log *log,
...@@ -491,6 +523,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log, ...@@ -491,6 +523,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
const struct bpf_prog *tgt_prog, const struct bpf_prog *tgt_prog,
u32 btf_id, u32 btf_id,
struct bpf_attach_target_info *tgt_info); struct bpf_attach_target_info *tgt_info);
void bpf_free_kfunc_btf_tab(struct bpf_kfunc_btf_tab *tab);
#define BPF_BASE_TYPE_MASK GENMASK(BPF_BASE_TYPE_BITS - 1, 0) #define BPF_BASE_TYPE_MASK GENMASK(BPF_BASE_TYPE_BITS - 1, 0)
/* extract base type from bpf_{arg, return, reg}_type. */ /* extract base type from bpf_{arg, return, reg}_type. */
......
/* SPDX-License-Identifier: GPL-2.0-only */
/* A pointer that can point to either kernel or userspace memory. */
#ifndef _LINUX_BPFPTR_H
#define _LINUX_BPFPTR_H
#include <linux/mm.h>
#include <linux/sockptr.h>
typedef sockptr_t bpfptr_t;
static inline bool bpfptr_is_kernel(bpfptr_t bpfptr)
{
return bpfptr.is_kernel;
}
static inline bpfptr_t KERNEL_BPFPTR(void *p)
{
return (bpfptr_t) { .kernel = p, .is_kernel = true };
}
static inline bpfptr_t USER_BPFPTR(void __user *p)
{
return (bpfptr_t) { .user = p };
}
static inline bpfptr_t make_bpfptr(u64 addr, bool is_kernel)
{
if (is_kernel)
return KERNEL_BPFPTR((void*) (uintptr_t) addr);
else
return USER_BPFPTR(u64_to_user_ptr(addr));
}
static inline bool bpfptr_is_null(bpfptr_t bpfptr)
{
if (bpfptr_is_kernel(bpfptr))
return !bpfptr.kernel;
return !bpfptr.user;
}
static inline void bpfptr_add(bpfptr_t *bpfptr, size_t val)
{
if (bpfptr_is_kernel(*bpfptr))
bpfptr->kernel += val;
else
bpfptr->user += val;
}
static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src,
size_t offset, size_t size)
{
return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size);
}
static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size)
{
return copy_from_bpfptr_offset(dst, src, 0, size);
}
static inline int copy_to_bpfptr_offset(bpfptr_t dst, size_t offset,
const void *src, size_t size)
{
return copy_to_sockptr_offset((sockptr_t) dst, offset, src, size);
}
static inline void *memdup_bpfptr(bpfptr_t src, size_t len)
{
return memdup_sockptr((sockptr_t) src, len);
}
static inline long strncpy_from_bpfptr(char *dst, bpfptr_t src, size_t count)
{
return strncpy_from_sockptr(dst, (sockptr_t) src, count);
}
#endif /* _LINUX_BPFPTR_H */
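The point of bpfptr_t is that a single code path can serve both the classic bpf(2) entry, where the attribute block lives in user memory, and the new bpf_sys_bpf() path, where it is a kernel buffer. A hedged sketch with hypothetical names::

    /* Hypothetical helper; union bpf_attr validation trimmed. */
    static int example_copy_attr(union bpf_attr *dst, u64 uaddr, u32 size,
                                 bool from_kernel)
    {
            bpfptr_t src = make_bpfptr(uaddr, from_kernel);

            if (bpfptr_is_null(src))
                    return -EINVAL;

            /* The copy works transparently for kernel and user pointers. */
            return copy_from_bpfptr(dst, src, min_t(u32, size, sizeof(*dst)));
    }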
...@@ -5,6 +5,7 @@ ...@@ -5,6 +5,7 @@
#define _LINUX_BTF_H 1 #define _LINUX_BTF_H 1
#include <linux/types.h> #include <linux/types.h>
#include <linux/bpfptr.h>
#include <uapi/linux/btf.h> #include <uapi/linux/btf.h>
#include <uapi/linux/bpf.h> #include <uapi/linux/bpf.h>
...@@ -18,8 +19,9 @@ struct btf_show; ...@@ -18,8 +19,9 @@ struct btf_show;
extern const struct file_operations btf_fops; extern const struct file_operations btf_fops;
void btf_get(struct btf *btf);
void btf_put(struct btf *btf); void btf_put(struct btf *btf);
int btf_new_fd(const union bpf_attr *attr); int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr);
struct btf *btf_get_by_fd(int fd); struct btf *btf_get_by_fd(int fd);
int btf_get_info_by_fd(const struct btf *btf, int btf_get_info_by_fd(const struct btf *btf,
const union bpf_attr *attr, const union bpf_attr *attr,
...@@ -88,7 +90,11 @@ int btf_type_snprintf_show(const struct btf *btf, u32 type_id, void *obj, ...@@ -88,7 +90,11 @@ int btf_type_snprintf_show(const struct btf *btf, u32 type_id, void *obj,
char *buf, int len, u64 flags); char *buf, int len, u64 flags);
int btf_get_fd_by_id(u32 id); int btf_get_fd_by_id(u32 id);
u32 btf_id(const struct btf *btf); u32 btf_obj_id(const struct btf *btf);
bool btf_is_kernel(const struct btf *btf);
bool btf_is_module(const struct btf *btf);
struct module *btf_try_get_module(const struct btf *btf);
u32 btf_nr_types(const struct btf *btf);
bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s, bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
const struct btf_member *m, const struct btf_member *m,
u32 expected_offset, u32 expected_size); u32 expected_offset, u32 expected_size);
...@@ -104,6 +110,7 @@ const struct btf_type *btf_type_resolve_func_ptr(const struct btf *btf, ...@@ -104,6 +110,7 @@ const struct btf_type *btf_type_resolve_func_ptr(const struct btf *btf,
const struct btf_type * const struct btf_type *
btf_resolve_size(const struct btf *btf, const struct btf_type *type, btf_resolve_size(const struct btf *btf, const struct btf_type *type,
u32 *type_size); u32 *type_size);
const char *btf_type_str(const struct btf_type *t);
#define for_each_member(i, struct_type, member) \ #define for_each_member(i, struct_type, member) \
for (i = 0, member = btf_type_member(struct_type); \ for (i = 0, member = btf_type_member(struct_type); \
...@@ -135,6 +142,58 @@ static inline bool btf_type_is_enum(const struct btf_type *t) ...@@ -135,6 +142,58 @@ static inline bool btf_type_is_enum(const struct btf_type *t)
return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM; return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM;
} }
static inline bool str_is_empty(const char *s)
{
return !s || !s[0];
}
static inline u16 btf_kind(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info);
}
static inline bool btf_is_enum(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_ENUM;
}
static inline bool btf_is_composite(const struct btf_type *t)
{
u16 kind = btf_kind(t);
return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
}
static inline bool btf_is_array(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_ARRAY;
}
static inline bool btf_is_int(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_INT;
}
static inline bool btf_is_ptr(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_PTR;
}
static inline u8 btf_int_offset(const struct btf_type *t)
{
return BTF_INT_OFFSET(*(u32 *)(t + 1));
}
static inline u8 btf_int_encoding(const struct btf_type *t)
{
return BTF_INT_ENCODING(*(u32 *)(t + 1));
}
static inline bool btf_type_is_scalar(const struct btf_type *t)
{
return btf_type_is_int(t) || btf_type_is_enum(t);
}
static inline bool btf_type_is_typedef(const struct btf_type *t) static inline bool btf_type_is_typedef(const struct btf_type *t)
{ {
return BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF; return BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF;
...@@ -170,6 +229,11 @@ static inline u16 btf_type_vlen(const struct btf_type *t) ...@@ -170,6 +229,11 @@ static inline u16 btf_type_vlen(const struct btf_type *t)
return BTF_INFO_VLEN(t->info); return BTF_INFO_VLEN(t->info);
} }
static inline u16 btf_vlen(const struct btf_type *t)
{
return btf_type_vlen(t);
}
static inline u16 btf_func_linkage(const struct btf_type *t) static inline u16 btf_func_linkage(const struct btf_type *t)
{ {
return BTF_INFO_VLEN(t->info); return BTF_INFO_VLEN(t->info);
...@@ -180,25 +244,54 @@ static inline bool btf_type_kflag(const struct btf_type *t) ...@@ -180,25 +244,54 @@ static inline bool btf_type_kflag(const struct btf_type *t)
return BTF_INFO_KFLAG(t->info); return BTF_INFO_KFLAG(t->info);
} }
static inline u32 btf_member_bit_offset(const struct btf_type *struct_type, static inline u32 __btf_member_bit_offset(const struct btf_type *struct_type,
const struct btf_member *member) const struct btf_member *member)
{ {
return btf_type_kflag(struct_type) ? BTF_MEMBER_BIT_OFFSET(member->offset) return btf_type_kflag(struct_type) ? BTF_MEMBER_BIT_OFFSET(member->offset)
: member->offset; : member->offset;
} }
static inline u32 btf_member_bitfield_size(const struct btf_type *struct_type, static inline u32 __btf_member_bitfield_size(const struct btf_type *struct_type,
const struct btf_member *member) const struct btf_member *member)
{ {
return btf_type_kflag(struct_type) ? BTF_MEMBER_BITFIELD_SIZE(member->offset) return btf_type_kflag(struct_type) ? BTF_MEMBER_BITFIELD_SIZE(member->offset)
: 0; : 0;
} }
static inline struct btf_member *btf_members(const struct btf_type *t)
{
return (struct btf_member *)(t + 1);
}
static inline u32 btf_member_bit_offset(const struct btf_type *t, u32 member_idx)
{
const struct btf_member *m = btf_members(t) + member_idx;
return __btf_member_bit_offset(t, m);
}
static inline u32 btf_member_bitfield_size(const struct btf_type *t, u32 member_idx)
{
const struct btf_member *m = btf_members(t) + member_idx;
return __btf_member_bitfield_size(t, m);
}
static inline const struct btf_member *btf_type_member(const struct btf_type *t) static inline const struct btf_member *btf_type_member(const struct btf_type *t)
{ {
return (const struct btf_member *)(t + 1); return (const struct btf_member *)(t + 1);
} }
static inline struct btf_array *btf_array(const struct btf_type *t)
{
return (struct btf_array *)(t + 1);
}
static inline struct btf_enum *btf_enum(const struct btf_type *t)
{
return (struct btf_enum *)(t + 1);
}
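Together with for_each_member() these accessors make BTF introspection fairly compact. A hedged sketch that dumps the members of a struct or union type (purely illustrative)::

    static void example_dump_members(const struct btf *btf,
                                     const struct btf_type *t)
    {
            const struct btf_member *member;
            int i;

            if (!btf_is_composite(t))
                    return;

            for_each_member(i, t, member) {
                    pr_info("%s: bit offset %u, bitfield size %u\n",
                            btf_name_by_offset(btf, member->name_off),
                            __btf_member_bit_offset(t, member),
                            __btf_member_bitfield_size(t, member));
            }
    }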
static inline const struct btf_var_secinfo *btf_type_var_secinfo( static inline const struct btf_var_secinfo *btf_type_var_secinfo(
const struct btf_type *t) const struct btf_type *t)
{ {
...@@ -206,6 +299,8 @@ static inline const struct btf_var_secinfo *btf_type_var_secinfo( ...@@ -206,6 +299,8 @@ static inline const struct btf_var_secinfo *btf_type_var_secinfo(
} }
#ifdef CONFIG_BPF_SYSCALL #ifdef CONFIG_BPF_SYSCALL
struct bpf_prog;
const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id); const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
const char *btf_name_by_offset(const struct btf *btf, u32 offset); const char *btf_name_by_offset(const struct btf *btf, u32 offset);
struct btf *btf_parse_vmlinux(void); struct btf *btf_parse_vmlinux(void);
...@@ -223,4 +318,40 @@ static inline const char *btf_name_by_offset(const struct btf *btf, ...@@ -223,4 +318,40 @@ static inline const char *btf_name_by_offset(const struct btf *btf,
} }
#endif #endif
struct kfunc_btf_id_set {
struct list_head list;
struct btf_id_set *set;
struct module *owner;
};
struct kfunc_btf_id_list;
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
void register_kfunc_btf_id_set(struct kfunc_btf_id_list *l,
struct kfunc_btf_id_set *s);
void unregister_kfunc_btf_id_set(struct kfunc_btf_id_list *l,
struct kfunc_btf_id_set *s);
bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist, u32 kfunc_id,
struct module *owner);
#else
static inline void register_kfunc_btf_id_set(struct kfunc_btf_id_list *l,
struct kfunc_btf_id_set *s)
{
}
static inline void unregister_kfunc_btf_id_set(struct kfunc_btf_id_list *l,
struct kfunc_btf_id_set *s)
{
}
static inline bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist,
u32 kfunc_id, struct module *owner)
{
return false;
}
#endif
#define DEFINE_KFUNC_BTF_ID_SET(set, name) \
struct kfunc_btf_id_set name = { LIST_HEAD_INIT(name.list), (set), \
THIS_MODULE }
extern struct kfunc_btf_id_list prog_test_kfunc_list;
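A module that exposes kfuncs would typically build a BTF ID set with the helpers from <linux/btf_ids.h> and register it on the relevant hook list at load time. A hedged sketch; the kfunc name is hypothetical, and prog_test_kfunc_list is simply the list declared above::

    #include <linux/btf.h>
    #include <linux/btf_ids.h>

    /* BTF ids of the kfuncs this module exposes to BPF programs. */
    BTF_SET_START(example_kfunc_ids)
    BTF_ID(func, bpf_example_test_kfunc)        /* hypothetical kfunc */
    BTF_SET_END(example_kfunc_ids)

    static DEFINE_KFUNC_BTF_ID_SET(&example_kfunc_ids, example_kfunc_set);

    static int __init example_init(void)
    {
            register_kfunc_btf_id_set(&prog_test_kfunc_list, &example_kfunc_set);
            return 0;
    }

    static void __exit example_exit(void)
    {
            unregister_kfunc_btf_id_set(&prog_test_kfunc_list, &example_kfunc_set);
    }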
#endif #endif
...@@ -22,6 +22,7 @@ ...@@ -22,6 +22,7 @@
#include <linux/vmalloc.h> #include <linux/vmalloc.h>
#include <linux/sockptr.h> #include <linux/sockptr.h>
#include <crypto/sha1.h> #include <crypto/sha1.h>
#include <linux/u64_stats_sync.h>
#include <net/sch_generic.h> #include <net/sch_generic.h>
...@@ -264,15 +265,32 @@ static inline bool insn_is_zext(const struct bpf_insn *insn) ...@@ -264,15 +265,32 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
.off = OFF, \ .off = OFF, \
.imm = 0 }) .imm = 0 })
/* Atomic memory add, *(uint *)(dst_reg + off16) += src_reg */
#define BPF_STX_XADD(SIZE, DST, SRC, OFF) \ /*
* Atomic operations:
*
* BPF_ADD *(uint *) (dst_reg + off16) += src_reg
* BPF_AND *(uint *) (dst_reg + off16) &= src_reg
* BPF_OR *(uint *) (dst_reg + off16) |= src_reg
* BPF_XOR *(uint *) (dst_reg + off16) ^= src_reg
* BPF_ADD | BPF_FETCH src_reg = atomic_fetch_add(dst_reg + off16, src_reg);
* BPF_AND | BPF_FETCH src_reg = atomic_fetch_and(dst_reg + off16, src_reg);
* BPF_OR | BPF_FETCH src_reg = atomic_fetch_or(dst_reg + off16, src_reg);
* BPF_XOR | BPF_FETCH src_reg = atomic_fetch_xor(dst_reg + off16, src_reg);
* BPF_XCHG src_reg = atomic_xchg(dst_reg + off16, src_reg)
* BPF_CMPXCHG r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg)
*/
#define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF) \
((struct bpf_insn) { \ ((struct bpf_insn) { \
.code = BPF_STX | BPF_SIZE(SIZE) | BPF_XADD, \ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
.dst_reg = DST, \ .dst_reg = DST, \
.src_reg = SRC, \ .src_reg = SRC, \
.off = OFF, \ .off = OFF, \
.imm = 0 }) .imm = OP })
/* Legacy alias */
#define BPF_STX_XADD(SIZE, DST, SRC, OFF) BPF_ATOMIC_OP(SIZE, BPF_ADD, DST, SRC, OFF)
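For example, the legacy exclusive add and its fetching variant over a 64-bit location could be emitted as (illustrative instruction snippets, as they would appear in an insn array)::

    /* *(u64 *)(R2 + 0) += R1  -- equivalent to the old BPF_STX_XADD */
    BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_2, BPF_REG_1, 0),

    /* R1 = atomic_fetch_add((u64 *)(R2 + 0), R1) */
    BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_2, BPF_REG_1, 0),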
/* Memory store, *(uint *) (dst_reg + off16) = imm32 */ /* Memory store, *(uint *) (dst_reg + off16) = imm32 */
...@@ -537,6 +555,13 @@ struct bpf_binary_header { ...@@ -537,6 +555,13 @@ struct bpf_binary_header {
u8 image[] __aligned(BPF_IMAGE_ALIGNMENT); u8 image[] __aligned(BPF_IMAGE_ALIGNMENT);
}; };
struct bpf_prog_stats {
u64 cnt;
u64 nsecs;
u64 misses;
struct u64_stats_sync syncp;
} __aligned(2 * sizeof(u64));
struct bpf_prog { struct bpf_prog {
u16 pages; /* Number of allocated pages */ u16 pages; /* Number of allocated pages */
u16 jited:1, /* Is our filter JIT'ed? */ u16 jited:1, /* Is our filter JIT'ed? */
...@@ -555,10 +580,12 @@ struct bpf_prog { ...@@ -555,10 +580,12 @@ struct bpf_prog {
u32 len; /* Number of filter blocks */ u32 len; /* Number of filter blocks */
u32 jited_len; /* Size of jited insns in bytes */ u32 jited_len; /* Size of jited insns in bytes */
u8 tag[BPF_TAG_SIZE]; u8 tag[BPF_TAG_SIZE];
struct bpf_prog_aux *aux; /* Auxiliary fields */ struct bpf_prog_stats __percpu *stats;
struct sock_fprog_kern *orig_prog; /* Original BPF program */ int __percpu *active;
unsigned int (*bpf_func)(const void *ctx, unsigned int (*bpf_func)(const void *ctx,
const struct bpf_insn *insn); const struct bpf_insn *insn);
struct bpf_prog_aux *aux; /* Auxiliary fields */
struct sock_fprog_kern *orig_prog; /* Original BPF program */
/* Instructions for interpreter */ /* Instructions for interpreter */
struct sock_filter insns[0]; struct sock_filter insns[0];
struct bpf_insn insnsi[]; struct bpf_insn insnsi[];
...@@ -579,7 +606,7 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key); ...@@ -579,7 +606,7 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
struct bpf_prog_stats *__stats; \ struct bpf_prog_stats *__stats; \
u64 __start = sched_clock(); \ u64 __start = sched_clock(); \
__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \
__stats = this_cpu_ptr(prog->aux->stats); \ __stats = this_cpu_ptr(prog->stats); \
u64_stats_update_begin(&__stats->syncp); \ u64_stats_update_begin(&__stats->syncp); \
__stats->cnt++; \ __stats->cnt++; \
__stats->nsecs += sched_clock() - __start; \ __stats->nsecs += sched_clock() - __start; \
...@@ -864,8 +891,7 @@ void bpf_prog_free_linfo(struct bpf_prog *prog); ...@@ -864,8 +891,7 @@ void bpf_prog_free_linfo(struct bpf_prog *prog);
void bpf_prog_fill_jited_linfo(struct bpf_prog *prog, void bpf_prog_fill_jited_linfo(struct bpf_prog *prog,
const u32 *insn_to_jit_off); const u32 *insn_to_jit_off);
int bpf_prog_alloc_jited_linfo(struct bpf_prog *prog); int bpf_prog_alloc_jited_linfo(struct bpf_prog *prog);
void bpf_prog_free_jited_linfo(struct bpf_prog *prog); void bpf_prog_jit_attempt_done(struct bpf_prog *prog);
void bpf_prog_free_unused_jited_linfo(struct bpf_prog *prog);
struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags); struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags);
struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flags); struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flags);
...@@ -906,6 +932,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); ...@@ -906,6 +932,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog); struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
void bpf_jit_compile(struct bpf_prog *prog); void bpf_jit_compile(struct bpf_prog *prog);
bool bpf_jit_needs_zext(void); bool bpf_jit_needs_zext(void);
bool bpf_jit_supports_kfunc_call(void);
bool bpf_helper_changes_pkt_data(void *func); bool bpf_helper_changes_pkt_data(void *func);
static inline bool bpf_dump_raw_ok(const struct cred *cred) static inline bool bpf_dump_raw_ok(const struct cred *cred)
......
...@@ -481,6 +481,10 @@ struct module { ...@@ -481,6 +481,10 @@ struct module {
unsigned int num_bpf_raw_events; unsigned int num_bpf_raw_events;
struct bpf_raw_event_map *bpf_raw_events; struct bpf_raw_event_map *bpf_raw_events;
#endif #endif
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
unsigned int btf_data_size;
void *btf_data;
#endif
#ifdef CONFIG_JUMP_LABEL #ifdef CONFIG_JUMP_LABEL
struct jump_entry *jump_entries; struct jump_entry *jump_entries;
unsigned int num_jump_entries; unsigned int num_jump_entries;
......
...@@ -19,7 +19,8 @@ ...@@ -19,7 +19,8 @@
/* ld/ldx fields */ /* ld/ldx fields */
#define BPF_DW 0x18 /* double word (64-bit) */ #define BPF_DW 0x18 /* double word (64-bit) */
#define BPF_XADD 0xc0 /* exclusive add */ #define BPF_ATOMIC 0xc0 /* atomic memory ops - op type in immediate */
#define BPF_XADD 0xc0 /* exclusive add - legacy name */
/* alu/jmp fields */ /* alu/jmp fields */
#define BPF_MOV 0xb0 /* mov reg to reg */ #define BPF_MOV 0xb0 /* mov reg to reg */
...@@ -43,6 +44,11 @@ ...@@ -43,6 +44,11 @@
#define BPF_CALL 0x80 /* function call */ #define BPF_CALL 0x80 /* function call */
#define BPF_EXIT 0x90 /* function return */ #define BPF_EXIT 0x90 /* function return */
/* atomic op type fields (stored in immediate) */
#define BPF_FETCH 0x01 /* not an opcode on its own, used to build others */
#define BPF_XCHG (0xe0 | BPF_FETCH) /* atomic exchange */
#define BPF_CMPXCHG (0xf0 | BPF_FETCH) /* atomic compare-and-write */
/* Register numbers */ /* Register numbers */
enum { enum {
BPF_REG_0 = 0, BPF_REG_0 = 0,
...@@ -100,6 +106,7 @@ enum bpf_cmd { ...@@ -100,6 +106,7 @@ enum bpf_cmd {
BPF_PROG_ATTACH, BPF_PROG_ATTACH,
BPF_PROG_DETACH, BPF_PROG_DETACH,
BPF_PROG_TEST_RUN, BPF_PROG_TEST_RUN,
BPF_PROG_RUN = BPF_PROG_TEST_RUN,
BPF_PROG_GET_NEXT_ID, BPF_PROG_GET_NEXT_ID,
BPF_MAP_GET_NEXT_ID, BPF_MAP_GET_NEXT_ID,
BPF_PROG_GET_FD_BY_ID, BPF_PROG_GET_FD_BY_ID,
...@@ -200,6 +207,7 @@ enum bpf_prog_type { ...@@ -200,6 +207,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_LSM, BPF_PROG_TYPE_LSM,
BPF_PROG_TYPE_SK_LOOKUP, BPF_PROG_TYPE_SK_LOOKUP,
BPF_PROG_TYPE_SCHED, BPF_PROG_TYPE_SCHED,
BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
}; };
enum bpf_attach_type { enum bpf_attach_type {
...@@ -360,8 +368,8 @@ enum bpf_link_type { ...@@ -360,8 +368,8 @@ enum bpf_link_type {
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions: * the following extensions:
* *
* insn[0].src_reg: BPF_PSEUDO_MAP_FD * insn[0].src_reg: BPF_PSEUDO_MAP_[FD|IDX]
* insn[0].imm: map fd * insn[0].imm: map fd or fd_idx
* insn[1].imm: 0 * insn[1].imm: 0
* insn[0].off: 0 * insn[0].off: 0
* insn[1].off: 0 * insn[1].off: 0
...@@ -369,8 +377,10 @@ enum bpf_link_type { ...@@ -369,8 +377,10 @@ enum bpf_link_type {
* verifier type: CONST_PTR_TO_MAP * verifier type: CONST_PTR_TO_MAP
*/ */
#define BPF_PSEUDO_MAP_FD 1 #define BPF_PSEUDO_MAP_FD 1
/* insn[0].src_reg: BPF_PSEUDO_MAP_VALUE #define BPF_PSEUDO_MAP_IDX 5
* insn[0].imm: map fd
/* insn[0].src_reg: BPF_PSEUDO_MAP_[IDX_]VALUE
* insn[0].imm: map fd or fd_idx
* insn[1].imm: offset into value * insn[1].imm: offset into value
* insn[0].off: 0 * insn[0].off: 0
* insn[1].off: 0 * insn[1].off: 0
...@@ -378,6 +388,8 @@ enum bpf_link_type { ...@@ -378,6 +388,8 @@ enum bpf_link_type {
* verifier type: PTR_TO_MAP_VALUE * verifier type: PTR_TO_MAP_VALUE
*/ */
#define BPF_PSEUDO_MAP_VALUE 2 #define BPF_PSEUDO_MAP_VALUE 2
#define BPF_PSEUDO_MAP_IDX_VALUE 6
/* insn[0].src_reg: BPF_PSEUDO_BTF_ID /* insn[0].src_reg: BPF_PSEUDO_BTF_ID
* insn[0].imm: kernel btd id of VAR * insn[0].imm: kernel btd id of VAR
* insn[1].imm: 0 * insn[1].imm: 0
...@@ -388,11 +400,24 @@ enum bpf_link_type { ...@@ -388,11 +400,24 @@ enum bpf_link_type {
* is struct/union. * is struct/union.
*/ */
#define BPF_PSEUDO_BTF_ID 3 #define BPF_PSEUDO_BTF_ID 3
/* insn[0].src_reg: BPF_PSEUDO_FUNC
* insn[0].imm: insn offset to the func
* insn[1].imm: 0
* insn[0].off: 0
* insn[1].off: 0
* ldimm64 rewrite: address of the function
* verifier type: PTR_TO_FUNC.
*/
#define BPF_PSEUDO_FUNC 4
/* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
* offset to another bpf function * offset to another bpf function
*/ */
#define BPF_PSEUDO_CALL 1 #define BPF_PSEUDO_CALL 1
/* when bpf_call->src_reg == BPF_PSEUDO_KFUNC_CALL,
* bpf_call->imm == btf_id of a BTF_KIND_FUNC in the running kernel
*/
#define BPF_PSEUDO_KFUNC_CALL 2
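In instruction terms, the two new pseudo values look roughly like this (hypothetical sketch assuming <linux/bpf.h>; the imm values are placeholders that the loader or verifier fills in)::

    /* ld_imm64 of a callback: src_reg == BPF_PSEUDO_FUNC, imm == insn
     * offset of the function; rewritten to its address after verification.
     */
    static const struct bpf_insn ld_callback[2] = {
            { .code = BPF_LD | BPF_DW | BPF_IMM, .dst_reg = BPF_REG_2,
              .src_reg = BPF_PSEUDO_FUNC, .imm = 0 /* offset to func */ },
            { .code = 0 },  /* second half of the 64-bit immediate */
    };

    /* Call into a kernel function (kfunc): imm carries the BTF id of a
     * BTF_KIND_FUNC in the running kernel rather than a helper id.
     */
    static const struct bpf_insn kfunc_call = {
            .code = BPF_JMP | BPF_CALL,
            .src_reg = BPF_PSEUDO_KFUNC_CALL,
            .imm = 0,       /* BTF id of the target kfunc */
    };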
/* flags for BPF_MAP_UPDATE_ELEM command */ /* flags for BPF_MAP_UPDATE_ELEM command */
enum { enum {
...@@ -558,7 +583,16 @@ union bpf_attr { ...@@ -558,7 +583,16 @@ union bpf_attr {
__aligned_u64 line_info; /* line info */ __aligned_u64 line_info; /* line info */
__u32 line_info_cnt; /* number of bpf_line_info records */ __u32 line_info_cnt; /* number of bpf_line_info records */
__u32 attach_btf_id; /* in-kernel BTF type id to attach to */ __u32 attach_btf_id; /* in-kernel BTF type id to attach to */
__u32 attach_prog_fd; /* 0 to attach to vmlinux */ union {
/* valid prog_fd to attach to bpf prog */
__u32 attach_prog_fd;
/* or valid module BTF object fd or 0 to attach to vmlinux */
__u32 attach_btf_obj_fd;
};
__u32 core_relo_cnt; /* number of bpf_core_relo */
__aligned_u64 fd_array; /* array of FDs */
__aligned_u64 core_relos;
__u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */
}; };
struct { /* anonymous struct used by BPF_OBJ_* commands */ struct { /* anonymous struct used by BPF_OBJ_* commands */
...@@ -2444,7 +2478,7 @@ union bpf_attr { ...@@ -2444,7 +2478,7 @@ union bpf_attr {
* running simultaneously. * running simultaneously.
* *
* A user should care about the synchronization by himself. * A user should care about the synchronization by himself.
* For example, by using the **BPF_STX_XADD** instruction to alter * For example, by using the **BPF_ATOMIC** instructions to alter
* the shared data. * the shared data.
* Return * Return
* A pointer to the local storage area. * A pointer to the local storage area.
...@@ -2989,10 +3023,10 @@ union bpf_attr { ...@@ -2989,10 +3023,10 @@ union bpf_attr {
* string length is larger than *size*, just *size*-1 bytes are * string length is larger than *size*, just *size*-1 bytes are
* copied and the last byte is set to NUL. * copied and the last byte is set to NUL.
* *
* On success, the length of the copied string is returned. This * On success, returns the number of bytes that were written,
* makes this helper useful in tracing programs for reading * including the terminal NUL. This makes this helper useful in
* strings, and more importantly to get its length at runtime. See * tracing programs for reading strings, and more importantly to
* the following snippet: * get its length at runtime. See the following snippet:
* *
* :: * ::
* *
...@@ -3020,7 +3054,7 @@ union bpf_attr { ...@@ -3020,7 +3054,7 @@ union bpf_attr {
* **->mm->env_start**: using this helper and the return value, * **->mm->env_start**: using this helper and the return value,
* one can quickly iterate at the right offset of the memory area. * one can quickly iterate at the right offset of the memory area.
* Return * Return
* On success, the strictly positive length of the string, * On success, the strictly positive length of the output string,
* including the trailing NUL character. On error, a negative * including the trailing NUL character. On error, a negative
* value. * value.
* *
...@@ -3312,12 +3346,20 @@ union bpf_attr { ...@@ -3312,12 +3346,20 @@ union bpf_attr {
* of new data availability is sent. * of new data availability is sent.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification * If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally. * of new data availability is sent unconditionally.
* If **0** is specified in *flags*, an adaptive notification
* of new data availability is sent.
*
* An adaptive notification is a notification sent whenever the user-space
* process has caught up and consumed all available payloads. In case the user-space
* process is still processing a previous payload, then no notification is needed
* as it will process the newly added payload automatically.
* Return * Return
* 0 on success, or a negative error in case of failure. * 0 on success, or a negative error in case of failure.
* *
* void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags) * void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
* Description * Description
* Reserve *size* bytes of payload in a ring buffer *ringbuf*. * Reserve *size* bytes of payload in a ring buffer *ringbuf*.
* *flags* must be 0.
* Return * Return
* Valid pointer with *size* bytes of memory available; NULL, * Valid pointer with *size* bytes of memory available; NULL,
* otherwise. * otherwise.
...@@ -3329,6 +3371,10 @@ union bpf_attr { ...@@ -3329,6 +3371,10 @@ union bpf_attr {
* of new data availability is sent. * of new data availability is sent.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification * If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally. * of new data availability is sent unconditionally.
* If **0** is specified in *flags*, an adaptive notification
* of new data availability is sent.
*
* See 'bpf_ringbuf_output()' for the definition of adaptive notification.
* Return * Return
* Nothing. Always succeeds. * Nothing. Always succeeds.
* *
...@@ -3339,6 +3385,10 @@ union bpf_attr { ...@@ -3339,6 +3385,10 @@ union bpf_attr {
* of new data availability is sent. * of new data availability is sent.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification * If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally. * of new data availability is sent unconditionally.
* If **0** is specified in *flags*, an adaptive notification
* of new data availability is sent.
*
* See 'bpf_ringbuf_output()' for the definition of adaptive notification.
* Return * Return
* Nothing. Always succeeds. * Nothing. Always succeeds.
* *
...@@ -3912,6 +3962,94 @@ union bpf_attr { ...@@ -3912,6 +3962,94 @@ union bpf_attr {
* set cpus_ptr in task. * set cpus_ptr in task.
* Return * Return
* 0 on success, or a negative error in case of failure. * 0 on success, or a negative error in case of failure.
* long bpf_for_each_map_elem(struct bpf_map *map, void *callback_fn, void *callback_ctx, u64 flags)
* Description
* For each element in **map**, call **callback_fn** function with
* **map**, **callback_ctx** and other map-specific parameters.
* The **callback_fn** should be a static function and
* the **callback_ctx** should be a pointer to the stack.
* The **flags** is used to control certain aspects of the helper.
* Currently, the **flags** must be 0.
*
* The following are a list of supported map types and their
* respective expected callback signatures:
*
* BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_PERCPU_HASH,
* BPF_MAP_TYPE_LRU_HASH, BPF_MAP_TYPE_LRU_PERCPU_HASH,
* BPF_MAP_TYPE_ARRAY, BPF_MAP_TYPE_PERCPU_ARRAY
*
* long (\*callback_fn)(struct bpf_map \*map, const void \*key, void \*value, void \*ctx);
*
* For per_cpu maps, the map_value is the value on the cpu where the
* bpf_prog is running.
*
* If **callback_fn** return 0, the helper will continue to the next
* element. If return value is 1, the helper will skip the rest of
* elements and return. Other return values are not used now.
*
* Return
* The number of traversed map elements for success, **-EINVAL** for
* invalid **flags**.
*
* long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data, u32 data_len)
* Description
* Outputs a string into the **str** buffer of size **str_size**
* based on a format string stored in a read-only map pointed by
* **fmt**.
*
* Each format specifier in **fmt** corresponds to one u64 element
* in the **data** array. For strings and pointers where pointees
* are accessed, only the pointer values are stored in the *data*
* array. The *data_len* is the size of *data* in bytes.
*
* Formats **%s** and **%p{i,I}{4,6}** require to read kernel
* memory. Reading kernel memory may fail due to either invalid
* address or valid address but requiring a major memory fault. If
* reading kernel memory fails, the string for **%s** will be an
* empty string, and the ip address for **%p{i,I}{4,6}** will be 0.
* Not returning error to bpf program is consistent with what
* **bpf_trace_printk**\ () does for now.
*
* Return
* The strictly positive length of the formatted string, including
* the trailing zero character. If the return value is greater than
* **str_size**, **str** contains a truncated string, guaranteed to
* be zero-terminated except when **str_size** is 0.
*
* Or **-EBUSY** if the per-CPU memory copy buffer is busy.
*
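A hedged BPF C sketch of the calling convention described above; *fmt* has to live in read-only memory, which a global ``static const char[]`` placed in .rodata satisfies. Names and attach point are illustrative::

    static const char fmt[] = "pid %d comm %s\n";

    SEC("tracepoint/syscalls/sys_enter_execve")
    int show_exec(void *ctx)
    {
            char out[64];
            char comm[16] = {};
            __u64 args[2];

            bpf_get_current_comm(comm, sizeof(comm));
            args[0] = bpf_get_current_pid_tgid() >> 32;
            args[1] = (unsigned long)comm;  /* %s: only the pointer is stored */

            bpf_snprintf(out, sizeof(out), fmt, args, sizeof(args));
            return 0;
    }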
* long bpf_sys_bpf(u32 cmd, void *attr, u32 attr_size)
* Description
* Execute bpf syscall with given arguments.
* Return
* A syscall result.
*
* long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags)
* Description
* Find BTF type with given name and kind in vmlinux BTF or in module's BTFs.
* Return
* Returns btf_id and btf_obj_fd in lower and upper 32 bits.
*
* long bpf_sys_close(u32 fd)
* Description
* Execute close syscall for given FD.
* Return
* A syscall result.
*
* long bpf_kallsyms_lookup_name(const char *name, int name_sz, int flags, u64 *res)
* Description
* Get the address of a kernel symbol, returned in *res*. *res* is
* set to 0 if the symbol is not found.
* Return
* On success, zero. On error, a negative value.
*
* **-EINVAL** if *flags* is not zero.
*
* **-EINVAL** if string *name* is not the same size as *name_sz*.
*
* **-ENOENT** if symbol is not found.
*
* **-EPERM** if caller does not have permission to obtain kernel address.
*/ */
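The last four helpers above are primarily meant for the new BPF_PROG_TYPE_SYSCALL programs. A hedged sketch that resolves a kernel symbol and then drives the bpf() syscall from BPF; attribute values, names and the ``SEC("syscall")`` attach convention are illustrative::

    SEC("syscall")
    int example_loader(void *ctx)
    {
            char sym[] = "bpf_prog_put";
            union bpf_attr attr = {};
            __u64 addr = 0;
            int fd;

            /* name_sz here covers the terminating NUL; returns 0 on success. */
            if (bpf_kallsyms_lookup_name(sym, sizeof(sym), 0, &addr))
                    return 1;

            attr.map_type = BPF_MAP_TYPE_ARRAY;
            attr.key_size = 4;
            attr.value_size = 8;
            attr.max_entries = 1;

            fd = bpf_sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
            if (fd < 0)
                    return 1;

            bpf_sys_close(fd);
            return 0;
    }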
#define __BPF_FUNC_MAPPER(FN) \ #define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \ FN(unspec), \
...@@ -4091,6 +4229,12 @@ union bpf_attr { ...@@ -4091,6 +4229,12 @@ union bpf_attr {
FN(cpumask_op), \ FN(cpumask_op), \
FN(cpus_share_cache), \ FN(cpus_share_cache), \
FN(sched_set_task_cpus_ptr), \ FN(sched_set_task_cpus_ptr), \
FN(for_each_map_elem), \
FN(snprintf), \
FN(sys_bpf), \
FN(btf_find_by_name_kind), \
FN(sys_close), \
FN(kallsyms_lookup_name), \
/* */ /* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper /* integer value in 'imm' field of BPF_CALL instruction selects which helper
...@@ -4587,6 +4731,8 @@ struct bpf_prog_info { ...@@ -4587,6 +4731,8 @@ struct bpf_prog_info {
__aligned_u64 prog_tags; __aligned_u64 prog_tags;
__u64 run_time_ns; __u64 run_time_ns;
__u64 run_cnt; __u64 run_cnt;
__u64 recursion_misses;
__u32 verified_insns;
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
struct bpf_map_info { struct bpf_map_info {
...@@ -4610,6 +4756,9 @@ struct bpf_btf_info { ...@@ -4610,6 +4756,9 @@ struct bpf_btf_info {
__aligned_u64 btf; __aligned_u64 btf;
__u32 btf_size; __u32 btf_size;
__u32 id; __u32 id;
__aligned_u64 name;
__u32 name_len;
__u32 kernel_btf;
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
struct bpf_link_info { struct bpf_link_info {
...@@ -4623,6 +4772,8 @@ struct bpf_link_info { ...@@ -4623,6 +4772,8 @@ struct bpf_link_info {
} raw_tracepoint; } raw_tracepoint;
struct { struct {
__u32 attach_type; __u32 attach_type;
__u32 target_obj_id; /* prog_id for PROG_EXT, otherwise btf object id */
__u32 target_btf_id; /* BTF type id inside the object */
} tracing; } tracing;
struct { struct {
__u64 cgroup_id; __u64 cgroup_id;
...@@ -5198,7 +5349,10 @@ struct bpf_pidns_info { ...@@ -5198,7 +5349,10 @@ struct bpf_pidns_info {
/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */ /* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
struct bpf_sk_lookup { struct bpf_sk_lookup {
union {
__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */ __bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
__u64 cookie; /* Non-zero if socket was selected in PROG_TEST_RUN */
};
__u32 family; /* Protocol family (AF_INET, AF_INET6) */ __u32 family; /* Protocol family (AF_INET, AF_INET6) */
__u32 protocol; /* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */ __u32 protocol; /* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
...@@ -5240,4 +5394,78 @@ enum { ...@@ -5240,4 +5394,78 @@ enum {
BTF_F_ZERO = (1ULL << 3), BTF_F_ZERO = (1ULL << 3),
}; };
/* bpf_core_relo_kind encodes which aspect of captured field/type/enum value
* has to be adjusted by relocations. It is emitted by llvm and passed to
* libbpf and later to the kernel.
*/
enum bpf_core_relo_kind {
BPF_CORE_FIELD_BYTE_OFFSET = 0, /* field byte offset */
BPF_CORE_FIELD_BYTE_SIZE = 1, /* field size in bytes */
BPF_CORE_FIELD_EXISTS = 2, /* field existence in target kernel */
BPF_CORE_FIELD_SIGNED = 3, /* field signedness (0 - unsigned, 1 - signed) */
BPF_CORE_FIELD_LSHIFT_U64 = 4, /* bitfield-specific left bitshift */
BPF_CORE_FIELD_RSHIFT_U64 = 5, /* bitfield-specific right bitshift */
BPF_CORE_TYPE_ID_LOCAL = 6, /* type ID in local BPF object */
BPF_CORE_TYPE_ID_TARGET = 7, /* type ID in target kernel */
BPF_CORE_TYPE_EXISTS = 8, /* type existence in target kernel */
BPF_CORE_TYPE_SIZE = 9, /* type size in bytes */
BPF_CORE_ENUMVAL_EXISTS = 10, /* enum value existence in target kernel */
BPF_CORE_ENUMVAL_VALUE = 11, /* enum value integer value */
};
/*
* "struct bpf_core_relo" is used to pass relocation data form LLVM to libbpf
* and from libbpf to the kernel.
*
* CO-RE relocation captures the following data:
* - insn_off - instruction offset (in bytes) within a BPF program that needs
* its insn->imm field to be relocated with actual field info;
* - type_id - BTF type ID of the "root" (containing) entity of a relocatable
* type or field;
* - access_str_off - offset into corresponding .BTF string section. String
* interpretation depends on specific relocation kind:
* - for field-based relocations, string encodes an accessed field using
* a sequence of field and array indices, separated by colon (:). It's
* conceptually very close to LLVM's getelementptr ([0]) instruction's
* arguments for identifying offset to a field.
 * - for type-based relocations, string is expected to be just "0";
* - for enum value-based relocations, string contains an index of enum
* value within its enum type;
* - kind - one of enum bpf_core_relo_kind;
*
* Example:
* struct sample {
* int a;
* struct {
* int b[10];
* };
* };
*
* struct sample *s = ...;
* int *x = &s->a; // encoded as "0:0" (a is field #0)
* int *y = &s->b[5]; // encoded as "0:1:0:5" (anon struct is field #1,
* // b is field #0 inside anon struct, accessing elem #5)
* int *z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
*
* type_id for all relocs in this example will capture BTF type id of
* `struct sample`.
*
* Such relocation is emitted when using __builtin_preserve_access_index()
* Clang built-in, passing expression that captures field address, e.g.:
*
* bpf_probe_read(&dst, sizeof(dst),
* __builtin_preserve_access_index(&src->a.b.c));
*
* In this case Clang will emit field relocation recording necessary data to
* be able to find offset of embedded `a.b.c` field within `src` struct.
*
* [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
*/
struct bpf_core_relo {
__u32 insn_off;
__u32 type_id;
__u32 access_str_off;
enum bpf_core_relo_kind kind;
};
#endif /* _UAPI__LINUX_BPF_H__ */ #endif /* _UAPI__LINUX_BPF_H__ */
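For context, a hedged sketch of the BPF C source that produces the relocations described above; vmlinux.h is assumed to be generated with bpftool, and the probed field is only an example. The BPF_CORE_READ() access emits a BPF_CORE_FIELD_BYTE_OFFSET record whose insn_off/type_id/access_str_off fields follow struct bpf_core_relo:

    /* clang -g -O2 -target bpf -c core_example.bpf.c -o core_example.bpf.o */
    #include "vmlinux.h"            /* assumed: generated via bpftool btf dump */
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    SEC("kprobe/do_nanosleep")
    int trace_nanosleep(void *ctx)
    {
            struct task_struct *task = (void *)bpf_get_current_task();
            int pid;

            /* Relocatable access: the offset of task_struct::pid is fixed up
             * at load time against the running kernel's BTF. */
            pid = BPF_CORE_READ(task, pid);
            bpf_printk("pid %d", pid);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";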
...@@ -43,7 +43,7 @@ struct btf_type { ...@@ -43,7 +43,7 @@ struct btf_type {
* "size" tells the size of the type it is describing. * "size" tells the size of the type it is describing.
* *
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT, * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
* FUNC, FUNC_PROTO and VAR. * FUNC, FUNC_PROTO, VAR, DECL_TAG and TYPE_TAG.
* "type" is a type_id referring to another type. * "type" is a type_id referring to another type.
*/ */
union { union {
...@@ -52,28 +52,34 @@ struct btf_type { ...@@ -52,28 +52,34 @@ struct btf_type {
}; };
}; };
#define BTF_INFO_KIND(info) (((info) >> 24) & 0x0f) #define BTF_INFO_KIND(info) (((info) >> 24) & 0x1f)
#define BTF_INFO_VLEN(info) ((info) & 0xffff) #define BTF_INFO_VLEN(info) ((info) & 0xffff)
#define BTF_INFO_KFLAG(info) ((info) >> 31) #define BTF_INFO_KFLAG(info) ((info) >> 31)
#define BTF_KIND_UNKN 0 /* Unknown */ enum {
#define BTF_KIND_INT 1 /* Integer */ BTF_KIND_UNKN = 0, /* Unknown */
#define BTF_KIND_PTR 2 /* Pointer */ BTF_KIND_INT = 1, /* Integer */
#define BTF_KIND_ARRAY 3 /* Array */ BTF_KIND_PTR = 2, /* Pointer */
#define BTF_KIND_STRUCT 4 /* Struct */ BTF_KIND_ARRAY = 3, /* Array */
#define BTF_KIND_UNION 5 /* Union */ BTF_KIND_STRUCT = 4, /* Struct */
#define BTF_KIND_ENUM 6 /* Enumeration */ BTF_KIND_UNION = 5, /* Union */
#define BTF_KIND_FWD 7 /* Forward */ BTF_KIND_ENUM = 6, /* Enumeration */
#define BTF_KIND_TYPEDEF 8 /* Typedef */ BTF_KIND_FWD = 7, /* Forward */
#define BTF_KIND_VOLATILE 9 /* Volatile */ BTF_KIND_TYPEDEF = 8, /* Typedef */
#define BTF_KIND_CONST 10 /* Const */ BTF_KIND_VOLATILE = 9, /* Volatile */
#define BTF_KIND_RESTRICT 11 /* Restrict */ BTF_KIND_CONST = 10, /* Const */
#define BTF_KIND_FUNC 12 /* Function */ BTF_KIND_RESTRICT = 11, /* Restrict */
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */ BTF_KIND_FUNC = 12, /* Function */
#define BTF_KIND_VAR 14 /* Variable */ BTF_KIND_FUNC_PROTO = 13, /* Function Proto */
#define BTF_KIND_DATASEC 15 /* Section */ BTF_KIND_VAR = 14, /* Variable */
#define BTF_KIND_MAX BTF_KIND_DATASEC BTF_KIND_DATASEC = 15, /* Section */
#define NR_BTF_KINDS (BTF_KIND_MAX + 1) BTF_KIND_FLOAT = 16, /* Floating point */
BTF_KIND_DECL_TAG = 17, /* Decl Tag */
BTF_KIND_TYPE_TAG = 18, /* Type Tag */
NR_BTF_KINDS,
BTF_KIND_MAX = NR_BTF_KINDS - 1,
};
/* For some specific BTF_KIND, "struct btf_type" is immediately /* For some specific BTF_KIND, "struct btf_type" is immediately
* followed by extra data. * followed by extra data.
...@@ -169,4 +175,15 @@ struct btf_var_secinfo { ...@@ -169,4 +175,15 @@ struct btf_var_secinfo {
__u32 size; __u32 size;
}; };
/* BTF_KIND_DECL_TAG is followed by a single "struct btf_decl_tag" to describe
* additional information related to the tag applied location.
* If component_idx == -1, the tag is applied to a struct, union,
* variable or function. Otherwise, it is applied to a struct/union
* member or a func argument, and component_idx indicates which member
* or argument (0 ... vlen-1).
*/
struct btf_decl_tag {
__s32 component_idx;
};
#endif /* _UAPI__LINUX_BTF_H__ */ #endif /* _UAPI__LINUX_BTF_H__ */
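A hedged sketch of how BTF_KIND_DECL_TAG entries typically originate in C source (a clang version with btf_decl_tag support is assumed; tag strings and names are purely illustrative):

    #define __tag(x) __attribute__((btf_decl_tag(x)))

    struct pkt_meta {
            int ifindex __tag("readonly");   /* member tag:   component_idx = 0  */
            int queue;
    } __tag("uapi_visible");                 /* struct tag:   component_idx = -1 */

    int debug_level __tag("config") = 0;     /* variable tag: component_idx = -1 */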
...@@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o ...@@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
obj-${CONFIG_BPF_LSM} += bpf_lsm.o obj-${CONFIG_BPF_LSM} += bpf_lsm.o
endif endif
obj-$(CONFIG_BPF_PRELOAD) += preload/ obj-$(CONFIG_BPF_PRELOAD) += preload/
obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
$(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
$(call if_changed_rule,cc_o_c)
...@@ -459,15 +459,16 @@ bool bpf_link_is_iter(struct bpf_link *link) ...@@ -459,15 +459,16 @@ bool bpf_link_is_iter(struct bpf_link *link)
return link->ops == &bpf_iter_link_lops; return link->ops == &bpf_iter_link_lops;
} }
int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
struct bpf_prog *prog)
{ {
union bpf_iter_link_info __user *ulinfo;
struct bpf_link_primer link_primer; struct bpf_link_primer link_primer;
struct bpf_iter_target_info *tinfo; struct bpf_iter_target_info *tinfo;
union bpf_iter_link_info linfo; union bpf_iter_link_info linfo;
struct bpf_iter_link *link; struct bpf_iter_link *link;
u32 prog_btf_id, linfo_len; u32 prog_btf_id, linfo_len;
bool existed = false; bool existed = false;
bpfptr_t ulinfo;
int err; int err;
if (attr->link_create.target_fd || attr->link_create.flags) if (attr->link_create.target_fd || attr->link_create.flags)
...@@ -475,18 +476,18 @@ int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) ...@@ -475,18 +476,18 @@ int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
memset(&linfo, 0, sizeof(union bpf_iter_link_info)); memset(&linfo, 0, sizeof(union bpf_iter_link_info));
ulinfo = u64_to_user_ptr(attr->link_create.iter_info); ulinfo = make_bpfptr(attr->link_create.iter_info, uattr.is_kernel);
linfo_len = attr->link_create.iter_info_len; linfo_len = attr->link_create.iter_info_len;
if (!ulinfo ^ !linfo_len) if (bpfptr_is_null(ulinfo) ^ !linfo_len)
return -EINVAL; return -EINVAL;
if (ulinfo) { if (!bpfptr_is_null(ulinfo)) {
err = bpf_check_uarg_tail_zero(ulinfo, sizeof(linfo), err = bpf_check_uarg_tail_zero(ulinfo, sizeof(linfo),
linfo_len); linfo_len);
if (err) if (err)
return err; return err;
linfo_len = min_t(u32, linfo_len, sizeof(linfo)); linfo_len = min_t(u32, linfo_len, sizeof(linfo));
if (copy_from_user(&linfo, ulinfo, linfo_len)) if (copy_from_bpfptr(&linfo, ulinfo, linfo_len))
return -EFAULT; return -EFAULT;
} }
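bpfptr_t lets the same attach path accept the iterator link info either from user space (the classic bpf() syscall) or from kernel memory (programs loaded by a BPF_PROG_TYPE_SYSCALL program). A rough user-space model of the idea, not the kernel's exact definition; the type and function names below are illustrative only:

    #include <stdbool.h>
    #include <string.h>

    /* Illustrative model: a pointer tagged with its address space. */
    typedef struct {
            void *ptr;
            bool  is_kernel;
    } bpfptr_model_t;

    static bpfptr_model_t make_ptr(void *addr, bool is_kernel)
    {
            return (bpfptr_model_t){ .ptr = addr, .is_kernel = is_kernel };
    }

    static bool ptr_is_null(bpfptr_model_t p)
    {
            return p.ptr == NULL;
    }

    static int copy_from_ptr(void *dst, bpfptr_model_t src, size_t size)
    {
            /* The real helper would use copy_from_user() for user pointers;
             * a plain memcpy stands in for both cases in this sketch. */
            memcpy(dst, src.ptr, size);
            return 0;
    }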
...@@ -661,3 +662,19 @@ int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx) ...@@ -661,3 +662,19 @@ int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx)
*/ */
return ret == 0 ? 0 : -EAGAIN; return ret == 0 ? 0 : -EAGAIN;
} }
BPF_CALL_4(bpf_for_each_map_elem, struct bpf_map *, map, void *, callback_fn,
void *, callback_ctx, u64, flags)
{
return map->ops->map_for_each_callback(map, callback_fn, callback_ctx, flags);
}
const struct bpf_func_proto bpf_for_each_map_elem_proto = {
.func = bpf_for_each_map_elem,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_FUNC,
.arg3_type = ARG_PTR_TO_STACK_OR_NULL,
.arg4_type = ARG_ANYTHING,
};
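A hedged sketch of a BPF program driving the new helper; the map layout, section name and callback logic are illustrative. The callback returns 0 to continue and 1 to stop, and the helper returns the number of elements visited:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct bpf_map;

    struct {
            __uint(type, BPF_MAP_TYPE_HASH);
            __uint(max_entries, 64);
            __type(key, __u32);
            __type(value, __u64);
    } counters SEC(".maps");

    static long sum_cb(struct bpf_map *map, __u32 *key, __u64 *val, void *ctx)
    {
            *(__u64 *)ctx += *val;
            return 0;               /* keep iterating */
    }

    SEC("xdp")
    int sum_counters(struct xdp_md *ctx)
    {
            __u64 total = 0;

            /* flags must currently be 0 */
            bpf_for_each_map_elem(&counters, sum_cb, &total, 0);
            return XDP_PASS;
    }

    char LICENSE[] SEC("license") = "GPL";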
...@@ -161,7 +161,7 @@ void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) ...@@ -161,7 +161,7 @@ void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log)
break; break;
} }
if (btf_member_bitfield_size(t, member)) { if (__btf_member_bitfield_size(t, member)) {
pr_warn("bit field member %s in struct %s is not supported\n", pr_warn("bit field member %s in struct %s is not supported\n",
mname, st_ops->name); mname, st_ops->name);
break; break;
...@@ -292,7 +292,7 @@ static int check_zero_holes(const struct btf_type *t, void *data) ...@@ -292,7 +292,7 @@ static int check_zero_holes(const struct btf_type *t, void *data)
const struct btf_type *mtype; const struct btf_type *mtype;
for_each_member(i, t, member) { for_each_member(i, t, member) {
moff = btf_member_bit_offset(t, member) / 8; moff = __btf_member_bit_offset(t, member) / 8;
if (moff > prev_mend && if (moff > prev_mend &&
memchr_inv(data + prev_mend, 0, moff - prev_mend)) memchr_inv(data + prev_mend, 0, moff - prev_mend))
return -EINVAL; return -EINVAL;
...@@ -369,7 +369,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, ...@@ -369,7 +369,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
u32 moff; u32 moff;
u32 flags; u32 flags;
moff = btf_member_bit_offset(t, member) / 8; moff = __btf_member_bit_offset(t, member) / 8;
ptype = btf_type_resolve_ptr(btf_vmlinux, member->type, NULL); ptype = btf_type_resolve_ptr(btf_vmlinux, member->type, NULL);
if (ptype == module_type) { if (ptype == module_type) {
if (*(void **)(udata + moff)) if (*(void **)(udata + moff))
......
(Diff collapsed.)
(Diff collapsed.)
...@@ -19,16 +19,23 @@ static const char *__func_get_name(const struct bpf_insn_cbs *cbs, ...@@ -19,16 +19,23 @@ static const char *__func_get_name(const struct bpf_insn_cbs *cbs,
{ {
BUILD_BUG_ON(ARRAY_SIZE(func_id_str) != __BPF_FUNC_MAX_ID); BUILD_BUG_ON(ARRAY_SIZE(func_id_str) != __BPF_FUNC_MAX_ID);
if (insn->src_reg != BPF_PSEUDO_CALL && if (!insn->src_reg &&
insn->imm >= 0 && insn->imm < __BPF_FUNC_MAX_ID && insn->imm >= 0 && insn->imm < __BPF_FUNC_MAX_ID &&
func_id_str[insn->imm]) func_id_str[insn->imm])
return func_id_str[insn->imm]; return func_id_str[insn->imm];
if (cbs && cbs->cb_call) if (cbs && cbs->cb_call) {
return cbs->cb_call(cbs->private_data, insn); const char *res;
res = cbs->cb_call(cbs->private_data, insn);
if (res)
return res;
}
if (insn->src_reg == BPF_PSEUDO_CALL) if (insn->src_reg == BPF_PSEUDO_CALL)
snprintf(buff, len, "%+d", insn->imm); snprintf(buff, len, "%+d", insn->imm);
else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL)
snprintf(buff, len, "kernel-function");
return buff; return buff;
} }
...@@ -80,6 +87,13 @@ const char *const bpf_alu_string[16] = { ...@@ -80,6 +87,13 @@ const char *const bpf_alu_string[16] = {
[BPF_END >> 4] = "endian", [BPF_END >> 4] = "endian",
}; };
static const char *const bpf_atomic_alu_string[16] = {
[BPF_ADD >> 4] = "add",
[BPF_AND >> 4] = "and",
[BPF_OR >> 4] = "or",
[BPF_XOR >> 4] = "or",
};
static const char *const bpf_ldst_string[] = { static const char *const bpf_ldst_string[] = {
[BPF_W >> 3] = "u32", [BPF_W >> 3] = "u32",
[BPF_H >> 3] = "u16", [BPF_H >> 3] = "u16",
...@@ -153,14 +167,44 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs, ...@@ -153,14 +167,44 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
bpf_ldst_string[BPF_SIZE(insn->code) >> 3], bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
insn->dst_reg, insn->dst_reg,
insn->off, insn->src_reg); insn->off, insn->src_reg);
else if (BPF_MODE(insn->code) == BPF_XADD) else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) += r%d\n", (insn->imm == BPF_ADD || insn->imm == BPF_ADD ||
insn->imm == BPF_OR || insn->imm == BPF_XOR)) {
verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) %s r%d\n",
insn->code, insn->code,
bpf_ldst_string[BPF_SIZE(insn->code) >> 3], bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
insn->dst_reg, insn->off, insn->dst_reg, insn->off,
bpf_alu_string[BPF_OP(insn->imm) >> 4],
insn->src_reg); insn->src_reg);
else } else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
(insn->imm == (BPF_ADD | BPF_FETCH) ||
insn->imm == (BPF_AND | BPF_FETCH) ||
insn->imm == (BPF_OR | BPF_FETCH) ||
insn->imm == (BPF_XOR | BPF_FETCH))) {
verbose(cbs->private_data, "(%02x) r%d = atomic%s_fetch_%s((%s *)(r%d %+d), r%d)\n",
insn->code, insn->src_reg,
BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
bpf_atomic_alu_string[BPF_OP(insn->imm) >> 4],
bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
insn->dst_reg, insn->off, insn->src_reg);
} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
insn->imm == BPF_CMPXCHG) {
verbose(cbs->private_data, "(%02x) r0 = atomic%s_cmpxchg((%s *)(r%d %+d), r0, r%d)\n",
insn->code,
BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
insn->dst_reg, insn->off,
insn->src_reg);
} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
insn->imm == BPF_XCHG) {
verbose(cbs->private_data, "(%02x) r%d = atomic%s_xchg((%s *)(r%d %+d), r%d)\n",
insn->code, insn->src_reg,
BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
insn->dst_reg, insn->off, insn->src_reg);
} else {
verbose(cbs->private_data, "BUG_%02x\n", insn->code); verbose(cbs->private_data, "BUG_%02x\n", insn->code);
}
} else if (class == BPF_ST) { } else if (class == BPF_ST) {
if (BPF_MODE(insn->code) == BPF_MEM) { if (BPF_MODE(insn->code) == BPF_MEM) {
verbose(cbs->private_data, "(%02x) *(%s *)(r%d %+d) = %d\n", verbose(cbs->private_data, "(%02x) *(%s *)(r%d %+d) = %d\n",
......
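The new printing above corresponds to the BPF_ATOMIC encodings that clang is expected to emit for the usual GCC-style builtins when targeting BPF with -mcpu=v3; a hedged sketch, with the expected disassembly noted in the comments:

    /* clang -O2 -g -target bpf -mcpu=v3 -c atomics.bpf.c */
    typedef unsigned long long u64;

    void add(u64 *p)          { __sync_fetch_and_add(p, 1); }         /* lock *(u64 *)(rX +0) += rY    */
    u64  fetch_add(u64 *p)    { return __sync_fetch_and_add(p, 1); }  /* rY = atomic64_fetch_add(...)  */
    u64  cmpxchg(u64 *p, u64 o, u64 n)
                              { return __sync_val_compare_and_swap(p, o, n); }  /* r0 = atomic64_cmpxchg(...) */
    u64  xchg(u64 *p, u64 n)  { return __sync_lock_test_and_set(p, n); }        /* rY = atomic64_xchg(...)    */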
...@@ -1122,7 +1122,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key, ...@@ -1122,7 +1122,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
/* unknown flags */ /* unknown flags */
return -EINVAL; return -EINVAL;
WARN_ON_ONCE(!rcu_read_lock_held()); WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size; key_size = map->key_size;
...@@ -1174,7 +1174,7 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, ...@@ -1174,7 +1174,7 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
/* unknown flags */ /* unknown flags */
return -EINVAL; return -EINVAL;
WARN_ON_ONCE(!rcu_read_lock_held()); WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size; key_size = map->key_size;
......
(Diff collapsed.)
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: GPL-2.0
LIBBPF_SRCS = $(srctree)/tools/lib/bpf/ LIBBPF_SRCS = $(srctree)/tools/lib/bpf/
LIBBPF_A = $(obj)/libbpf.a LIBBPF_OUT = $(abspath $(obj))/libbpf
LIBBPF_OUT = $(abspath $(obj)) LIBBPF_A = $(LIBBPF_OUT)/libbpf.a
LIBBPF_DESTDIR = $(LIBBPF_OUT)
LIBBPF_INCLUDE = $(LIBBPF_DESTDIR)/include
# Although not in use by libbpf's Makefile, set $(O) so that the "dummy" test # Although not in use by libbpf's Makefile, set $(O) so that the "dummy" test
# in tools/scripts/Makefile.include always succeeds when building the kernel # in tools/scripts/Makefile.include always succeeds when building the kernel
# with $(O) pointing to a relative path, as in "make O=build bindeb-pkg". # with $(O) pointing to a relative path, as in "make O=build bindeb-pkg".
$(LIBBPF_A): $(LIBBPF_A): | $(LIBBPF_OUT)
$(Q)$(MAKE) -C $(LIBBPF_SRCS) O=$(LIBBPF_OUT)/ OUTPUT=$(LIBBPF_OUT)/ $(LIBBPF_OUT)/libbpf.a $(Q)$(MAKE) -C $(LIBBPF_SRCS) O=$(LIBBPF_OUT)/ OUTPUT=$(LIBBPF_OUT)/ \
DESTDIR=$(LIBBPF_DESTDIR) prefix= \
$(LIBBPF_OUT)/libbpf.a install_headers
libbpf_hdrs: $(LIBBPF_A)
.PHONY: libbpf_hdrs
$(LIBBPF_OUT):
$(call msg,MKDIR,$@)
$(Q)mkdir -p $@
userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi \ userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi \
-I $(srctree)/tools/lib/ -Wno-unused-result -I $(LIBBPF_INCLUDE) -Wno-unused-result
userprogs := bpf_preload_umd userprogs := bpf_preload_umd
clean-files := $(userprogs) bpf_helper_defs.h FEATURE-DUMP.libbpf staticobjs/ feature/ clean-files := $(userprogs) bpf_helper_defs.h FEATURE-DUMP.libbpf staticobjs/ feature/
clean-files += $(LIBBPF_OUT) $(LIBBPF_DESTDIR)
$(obj)/iterators/iterators.o: | libbpf_hdrs
bpf_preload_umd-objs := iterators/iterators.o bpf_preload_umd-objs := iterators/iterators.o
bpf_preload_umd-userldlibs := $(LIBBPF_A) -lelf -lz bpf_preload_umd-userldlibs := $(LIBBPF_A) -lelf -lz
......
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: GPL-2.0
OUTPUT := .output OUTPUT := .output
abs_out := $(abspath $(OUTPUT))
CLANG ?= clang CLANG ?= clang
LLC ?= llc LLC ?= llc
LLVM_STRIP ?= llvm-strip LLVM_STRIP ?= llvm-strip
TOOLS_PATH := $(abspath ../../../../tools)
BPFTOOL_SRC := $(TOOLS_PATH)/bpf/bpftool
BPFTOOL_OUTPUT := $(abs_out)/bpftool
DEFAULT_BPFTOOL := $(OUTPUT)/sbin/bpftool DEFAULT_BPFTOOL := $(OUTPUT)/sbin/bpftool
BPFTOOL ?= $(DEFAULT_BPFTOOL) BPFTOOL ?= $(DEFAULT_BPFTOOL)
LIBBPF_SRC := $(abspath ../../../../tools/lib/bpf)
BPFOBJ := $(OUTPUT)/libbpf.a LIBBPF_SRC := $(TOOLS_PATH)/lib/bpf
BPF_INCLUDE := $(OUTPUT) LIBBPF_OUTPUT := $(abs_out)/libbpf
INCLUDES := -I$(OUTPUT) -I$(BPF_INCLUDE) -I$(abspath ../../../../tools/lib) \ LIBBPF_DESTDIR := $(LIBBPF_OUTPUT)
-I$(abspath ../../../../tools/include/uapi) LIBBPF_INCLUDE := $(LIBBPF_DESTDIR)/include
BPFOBJ := $(LIBBPF_OUTPUT)/libbpf.a
INCLUDES := -I$(OUTPUT) -I$(LIBBPF_INCLUDE) -I$(TOOLS_PATH)/include/uapi
CFLAGS := -g -Wall CFLAGS := -g -Wall
abs_out := $(abspath $(OUTPUT))
ifeq ($(V),1) ifeq ($(V),1)
Q = Q =
msg = msg =
...@@ -44,14 +52,18 @@ $(OUTPUT)/iterators.bpf.o: iterators.bpf.c $(BPFOBJ) | $(OUTPUT) ...@@ -44,14 +52,18 @@ $(OUTPUT)/iterators.bpf.o: iterators.bpf.c $(BPFOBJ) | $(OUTPUT)
-c $(filter %.c,$^) -o $@ && \ -c $(filter %.c,$^) -o $@ && \
$(LLVM_STRIP) -g $@ $(LLVM_STRIP) -g $@
$(OUTPUT): $(OUTPUT) $(LIBBPF_OUTPUT) $(BPFTOOL_OUTPUT):
$(call msg,MKDIR,$@) $(call msg,MKDIR,$@)
$(Q)mkdir -p $(OUTPUT) $(Q)mkdir -p $@
$(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(OUTPUT) $(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(LIBBPF_OUTPUT)
$(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) \ $(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) \
OUTPUT=$(abspath $(dir $@))/ $(abspath $@) OUTPUT=$(abspath $(dir $@))/ prefix= \
DESTDIR=$(LIBBPF_DESTDIR) $(abspath $@) install_headers
$(DEFAULT_BPFTOOL): $(DEFAULT_BPFTOOL): $(BPFOBJ) | $(BPFTOOL_OUTPUT)
$(Q)$(MAKE) $(submake_extras) -C ../../../../tools/bpf/bpftool \ $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOL_SRC) \
prefix= OUTPUT=$(abs_out)/ DESTDIR=$(abs_out) install OUTPUT=$(BPFTOOL_OUTPUT)/ \
LIBBPF_OUTPUT=$(LIBBPF_OUTPUT)/ \
LIBBPF_DESTDIR=$(LIBBPF_DESTDIR)/ \
prefix= DESTDIR=$(abs_out)/ install-bin
...@@ -2,7 +2,6 @@ ...@@ -2,7 +2,6 @@
/* Copyright (c) 2020 Facebook */ /* Copyright (c) 2020 Facebook */
#include <linux/bpf.h> #include <linux/bpf.h>
#include <bpf/bpf_helpers.h> #include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h> #include <bpf/bpf_core_read.h>
#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record) #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
......
(Diff collapsed.)
...@@ -26,7 +26,7 @@ static struct bin_attribute bin_attr_btf_vmlinux __ro_after_init = { ...@@ -26,7 +26,7 @@ static struct bin_attribute bin_attr_btf_vmlinux __ro_after_init = {
.read = btf_vmlinux_read, .read = btf_vmlinux_read,
}; };
static struct kobject *btf_kobj; struct kobject *btf_kobj;
static int __init btf_vmlinux_init(void) static int __init btf_vmlinux_init(void)
{ {
......
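With btf_kobj made non-static, kernel modules' BTF can be published as binary attributes next to /sys/kernel/btf/vmlinux. A small hedged user-space sketch that measures one of them; the module name is only an example:

    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/sys/kernel/btf/nf_conntrack", "rb");
            unsigned char buf[4096];
            size_t n, total = 0;

            if (!f)
                    return 1;
            while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                    total += n;
            printf("module BTF: %zu bytes\n", total);
            fclose(f);
            return 0;
    }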
(Diff collapsed.)
(Diff collapsed.)
...@@ -387,6 +387,35 @@ static void *section_objs(const struct load_info *info, ...@@ -387,6 +387,35 @@ static void *section_objs(const struct load_info *info,
return (void *)info->sechdrs[sec].sh_addr; return (void *)info->sechdrs[sec].sh_addr;
} }
/* Find a module section: 0 means not found. Ignores SHF_ALLOC flag. */
static unsigned int find_any_sec(const struct load_info *info, const char *name)
{
unsigned int i;
for (i = 1; i < info->hdr->e_shnum; i++) {
Elf_Shdr *shdr = &info->sechdrs[i];
if (strcmp(info->secstrings + shdr->sh_name, name) == 0)
return i;
}
return 0;
}
/*
* Find a module section, or NULL. Fill in number of "objects" in section.
* Ignores SHF_ALLOC flag.
*/
static __maybe_unused void *any_section_objs(const struct load_info *info,
const char *name,
size_t object_size,
unsigned int *num)
{
unsigned int sec = find_any_sec(info, name);
/* Section 0 has sh_addr 0 and sh_size 0. */
*num = info->sechdrs[sec].sh_size / object_size;
return (void *)info->sechdrs[sec].sh_addr;
}
/* Provided by the linker */ /* Provided by the linker */
extern const struct kernel_symbol __start___ksymtab[]; extern const struct kernel_symbol __start___ksymtab[];
extern const struct kernel_symbol __stop___ksymtab[]; extern const struct kernel_symbol __stop___ksymtab[];
...@@ -3379,6 +3408,9 @@ static int find_module_sections(struct module *mod, struct load_info *info) ...@@ -3379,6 +3408,9 @@ static int find_module_sections(struct module *mod, struct load_info *info)
sizeof(*mod->bpf_raw_events), sizeof(*mod->bpf_raw_events),
&mod->num_bpf_raw_events); &mod->num_bpf_raw_events);
#endif #endif
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
mod->btf_data = any_section_objs(info, ".BTF", 1, &mod->btf_data_size);
#endif
#ifdef CONFIG_JUMP_LABEL #ifdef CONFIG_JUMP_LABEL
mod->jump_entries = section_objs(info, "__jump_table", mod->jump_entries = section_objs(info, "__jump_table",
sizeof(*mod->jump_entries), sizeof(*mod->jump_entries),
...@@ -3794,6 +3826,10 @@ static noinline int do_init_module(struct module *mod) ...@@ -3794,6 +3826,10 @@ static noinline int do_init_module(struct module *mod)
mod->init_layout.ro_size = 0; mod->init_layout.ro_size = 0;
mod->init_layout.ro_after_init_size = 0; mod->init_layout.ro_after_init_size = 0;
mod->init_layout.text_size = 0; mod->init_layout.text_size = 0;
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
/* .BTF is not SHF_ALLOC and will get removed, so sanitize pointer */
mod->btf_data = NULL;
#endif
/* /*
* We want to free module_init, but be aware that kallsyms may be * We want to free module_init, but be aware that kallsyms may be
* walking this with preempt disabled. In all the failure paths, we * walking this with preempt disabled. In all the failure paths, we
......
(The remaining file diffs in this merge request have been collapsed.)