1. 29 Jan 2021, 3 commits
  2. 28 Jan 2021, 4 commits
  3. 27 Jan 2021, 1 commit
  4. 26 Jan 2021, 17 commits
  5. 23 Jan 2021, 3 commits
  6. 22 Jan 2021, 2 commits
  7. 21 Jan 2021, 10 commits
    • docs: bpf: Clarify -mcpu=v3 requirement for atomic ops · b452ee00
      Committed by Brendan Jackman
      Alexei pointed out [1] that this wording is pretty confusing. Here's
      an attempt to be more explicit and clear.
      
      [1] https://lore.kernel.org/bpf/CAADnVQJVvwoZsE1K+6qRxzF7+6CvZNzygnoBW9tZNWJELk5c=Q@mail.gmail.com/
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210120133946.2107897-3-jackmanb@google.com
      b452ee00
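
      For context, a hedged illustration (not part of the patch; the program and section names are made up) of the kind of BPF C code the clarified -mcpu=v3 note is about. Using the result of __sync_fetch_and_add forces clang to emit the atomic fetch instruction, which needs clang -O2 -g -target bpf -mcpu=v3:

          #include <linux/bpf.h>
          #include <bpf/bpf_helpers.h>

          __u64 hits = 0;

          SEC("tracepoint/syscalls/sys_enter_getpid")
          int count_getpid(void *ctx)
          {
                  /* Using the returned old value forces the fetch form of the
                   * atomic add, which requires -mcpu=v3; discarding it lets
                   * clang fall back to the older exchange-and-add encoding. */
                  __u64 old = __sync_fetch_and_add(&hits, 1);

                  return old == 0;
          }

          char LICENSE[] SEC("license") = "GPL";
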
    • docs: bpf: Fixup atomics markup · 53fe5418
      Committed by Brendan Jackman
      This fixes up the markup to resolve a warning, be more consistent in the
      use of monospace, and use the correct .rst syntax for <em> (* instead
      of _).
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Link: https://lore.kernel.org/bpf/20210120133946.2107897-2-jackmanb@google.com
      53fe5418
    • Merge branch 'bpf: misc performance improvements for cgroup' · 636d549f
      Committed by Alexei Starovoitov
      Stanislav Fomichev says:
      
      ====================
      
      The first patch adds a custom getsockopt for TCP_ZEROCOPY_RECEIVE
      to remove kmalloc and lock_sock overhead from the data path.
      
      The second patch removes kzalloc/kfree from getsockopt for the common cases.

      The third patch switches cgroup_bpf_enabled to be per attach type,
      adding overhead only for the cgroup attach types actually used on the system.
      
      No visible user-side changes.
      
      v9:
      - include linux/tcp.h instead of netinet/tcp.h in sockopt_sk.c
      - note that v9 depends on the commit 4be34f3d ("bpf: Don't leak
        memory in bpf getsockopt when optlen == 0") from bpf tree
      
      v8:
      - add bpi.h to tools/include/uapi in the same patch (Martin KaFai Lau)
      - kmalloc instead of kzalloc when exporting buffer (Martin KaFai Lau)
      - note that v8 depends on the commit 4be34f3d ("bpf: Don't leak
        memory in bpf getsockopt when optlen == 0") from bpf tree
      
      v7:
      - add comment about buffer contents for retval != 0 (Martin KaFai Lau)
      - export tcp.h into tools/include/uapi (Martin KaFai Lau)
      - note that v7 depends on the commit 4be34f3d ("bpf: Don't leak
        memory in bpf getsockopt when optlen == 0") from bpf tree
      
      v6:
      - avoid indirect cost for new bpf_bypass_getsockopt (Eric Dumazet)
      
      v5:
      - reorder patches to reduce the churn (Martin KaFai Lau)
      
      v4:
      - update performance numbers
      - bypass_bpf_getsockopt (Martin KaFai Lau)
      
      v3:
      - remove extra newline, add comment about sizeof tcp_zerocopy_receive
        (Martin KaFai Lau)
      - add another patch to remove lock_sock overhead from
        TCP_ZEROCOPY_RECEIVE; technically, this makes patch #1 obsolete,
        but I'd still prefer to keep it to help with other socket
        options
      
      v2:
      - perf numbers for getsockopt kmalloc reduction (Song Liu)
      - (sk) in BPF_CGROUP_PRE_CONNECT_ENABLED (Song Liu)
      - 128 -> 64 buffer size, BUILD_BUG_ON (Martin KaFai Lau)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      636d549f
    • bpf: Split cgroup_bpf_enabled per attach type · a9ed15da
      Committed by Stanislav Fomichev
      When we attach any cgroup hook, the rest (even if unused/unattached) start
      to contribute a small overhead. In particular, the one we want to avoid is
      __cgroup_bpf_run_filter_skb, which does two indirections to get to
      the cgroup and pushes/pulls the skb.
      
      Let's split cgroup_bpf_enabled to be per attach type, so that only the
      attach types actually in use trigger it (a conceptual sketch follows this entry).
      
      I've dropped the existing high-level cgroup_bpf_enabled checks in some
      places because the BPF_PROG_CGROUP_XXX_RUN macros usually have another
      cgroup_bpf_enabled check.

      I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for
      GETPEERNAME/GETSOCKNAME because the type argument for
      cgroup_bpf_enabled[type] has to be constant and known at compile time.
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210115163501.805133-4-sdf@google.com
      a9ed15da
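
      A conceptual sketch of the split described above (identifiers approximated from the commit message, not the literal kernel diff): one static key per attach type, indexed by a compile-time-constant type, replaces the single global toggle:

          /* Sketch only: names and the exact static_key plumbing are
           * approximations of what the commit message describes. */
          extern struct static_key_false cgroup_bpf_enabled_key[MAX_BPF_ATTACH_TYPE];

          /* 'type' must be a compile-time constant so each call site tests
           * exactly one key; this is why GETPEERNAME/GETSOCKNAME needed
           * their own copy of BPF_CGROUP_RUN_SA_PROG_LOCK. */
          #define cgroup_bpf_enabled(type) \
                  static_branch_unlikely(&cgroup_bpf_enabled_key[type])

          #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)                   \
          ({                                                                  \
                  int __ret = 0;                                              \
                  if (cgroup_bpf_enabled(BPF_CGROUP_INET_INGRESS))            \
                          __ret = __cgroup_bpf_run_filter_skb(sk, skb,        \
                                                  BPF_CGROUP_INET_INGRESS);   \
                  __ret;                                                      \
          })
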
    • bpf: Try to avoid kzalloc in cgroup/{s,g}etsockopt · 20f2505f
      Committed by Stanislav Fomichev
      When we attach a bpf program to cgroup/getsockopt, every other getsockopt()
      syscall starts incurring kzalloc/kfree cost.
      
      Let's add a small buffer on the stack and use it for small (the majority of)
      {s,g}etsockopt values. The buffer is small enough to fit into a cache
      line and covers the majority of simple options (most of them are
      4-byte ints); a sketch follows this entry.
      
      It seems natural to do the same for setsockopt, but it's a bit more
      involved when the BPF program modifies the data (where we have to
      kmalloc). The assumption is that for the majority of setsockopt
      calls (which set pure BPF options or apply policy) this
      will bring some benefit as well.
      
      Without this patch (the patch removes about 1% spent in __kmalloc):
           3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
                  |
                   --3.30%--__cgroup_bpf_run_filter_getsockopt
                             |
                              --0.81%--__kmalloc
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210115163501.805133-3-sdf@google.com
      20f2505f
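
      A compilable user-space analogue of the trick described above (buffer size and names are illustrative; the kernel code differs in detail): keep a small on-stack buffer for the common small options and fall back to the heap only for large ones:

          #include <stdlib.h>

          /* Illustrative only; mirrors the "128 -> 64 buffer size" note from
           * the cover letter, not the kernel's actual constant or helpers. */
          #define SMALL_OPT_BUF 64

          static void *opt_alloc(size_t optlen, unsigned char stack_buf[SMALL_OPT_BUF])
          {
                  if (optlen <= SMALL_OPT_BUF)
                          return stack_buf;   /* common case: 4-byte ints, fits a cache line */
                  return malloc(optlen);      /* rare case: large option values */
          }

          static void opt_free(void *buf, unsigned char stack_buf[SMALL_OPT_BUF])
          {
                  if (buf != (void *)stack_buf)
                          free(buf);
          }
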
    • bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE · 9cacf81f
      Committed by Stanislav Fomichev
      Add a custom implementation of the getsockopt hook for TCP_ZEROCOPY_RECEIVE.
      We skip the generic hooks for TCP_ZEROCOPY_RECEIVE and instead make a custom
      call in do_tcp_getsockopt using the on-stack data. This removes the 3%
      overhead for locking/unlocking the socket (a sketch follows this entry).
      
      Without this patch:
           3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
                  |
                   --3.30%--__cgroup_bpf_run_filter_getsockopt
                             |
                              --0.81%--__kmalloc
      
      With the patch applied:
           0.52%     0.12%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt_kern
      
      Note, exporting uapi/tcp.h requires removing netinet/tcp.h
      from test_progs.h because those headers have conflicting
      definitions.
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210115163501.805133-2-sdf@google.com
      9cacf81f
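
      Roughly, the shape of the change described above (a sketch; function names and the hook-macro argument list are approximations of the commit message, not the exact kernel code):

          /* Sketch only. The generic cgroup getsockopt hook (kmalloc +
           * lock_sock) is skipped for this one optname ... */
          static bool tcp_bpf_bypass_getsockopt(int level, int optname)
          {
                  return level == SOL_TCP && optname == TCP_ZEROCOPY_RECEIVE;
          }

          /* ... and do_tcp_getsockopt() instead runs the BPF program against
           * the on-stack struct tcp_zerocopy_receive, with the socket lock
           * already held by the caller. Approximate shape of that call site
           * (macro name and arguments are an assumption, not verbatim): */
          static int tcp_zerocopy_run_bpf(struct sock *sk,
                                          struct tcp_zerocopy_receive *zc,
                                          int len, int err)
          {
                  return BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, SOL_TCP,
                                                             TCP_ZEROCOPY_RECEIVE,
                                                             zc, &len, err);
          }
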
    • bpf: Permit size-0 datasec · 13ca51d5
      Committed by Yonghong Song
      The llvm patch https://reviews.llvm.org/D84002 made it possible to emit
      an empty rodata datasec if the elf .rodata section contains read-only
      data from local variables. Those local variables are not emitted as
      BTF_KIND_VARs, since llvm converts them into static variables with
      private linkage and no debuginfo types. Such an empty rodata datasec
      makes skeleton code generation easy, since for the skeleton a rodata
      struct is generated whenever there is a .rodata elf section. The
      existence of a rodata btf datasec is also consistent with the existence
      of the rodata map created by libbpf.
      
      BTF with such an empty rodata datasec will fail to load in the kernel,
      though, as the kernel rejects a datasec with zero vlen and zero size.
      For example, for the code below,
          int sys_enter(void *ctx)
          {
             int fmt[6] = {1, 2, 3, 4, 5, 6};
             int dst[6];
      
             bpf_probe_read(dst, sizeof(dst), fmt);
             return 0;
          }
      We got the below btf (bpftool btf dump ./test.o):
          [1] PTR '(anon)' type_id=0
          [2] FUNC_PROTO '(anon)' ret_type_id=3 vlen=1
                  'ctx' type_id=1
          [3] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
          [4] FUNC 'sys_enter' type_id=2 linkage=global
          [5] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
          [6] ARRAY '(anon)' type_id=5 index_type_id=7 nr_elems=4
          [7] INT '__ARRAY_SIZE_TYPE__' size=4 bits_offset=0 nr_bits=32 encoding=(none)
          [8] VAR '_license' type_id=6, linkage=global-alloc
          [9] DATASEC '.rodata' size=0 vlen=0
          [10] DATASEC 'license' size=0 vlen=1
                  type_id=8 offset=0 size=4
      When loading ./test.o into the kernel with bpftool,
      we see the following error:
          libbpf: Error loading BTF: Invalid argument(22)
          libbpf: magic: 0xeb9f
          ...
          [6] ARRAY (anon) type_id=5 index_type_id=7 nr_elems=4
          [7] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
          [8] VAR _license type_id=6 linkage=1
          [9] DATASEC .rodata size=24 vlen=0 vlen == 0
          libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring.
      
      Basically, libbpf changed the .rodata datasec size to 24 since the elf
      .rodata section size is 24. The kernel then rejected the BTF since vlen = 0.
      Note that the above kernel verifier failure can be worked around by
      changing the local variable "fmt" to a static or global, optionally
      const, variable (see the sketch after this entry).

      This patch permits a datasec with vlen = 0 in the kernel.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210119153519.3901963-1-yhs@fb.com
      13ca51d5
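
      As the note above says, on kernels without this patch the failure can be avoided by promoting the local array; a sketch of that variant of the example:

          /* Workaround sketch for pre-patch kernels: a static const array is
           * emitted as a BTF_KIND_VAR, so the .rodata datasec is no longer
           * empty and passes the old vlen/size check. */
          static const int fmt[6] = {1, 2, 3, 4, 5, 6};

          int sys_enter(void *ctx)
          {
                  int dst[6];

                  bpf_probe_read(dst, sizeof(dst), fmt);
                  return 0;
          }
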
    • Merge branch 'Allow attaching to bare tracepoints' · 71ee10e2
      Committed by Alexei Starovoitov
      Qais Yousef says:
      
      ====================
      
      Changes in v3:
      	* Fix not returning error value correctly in
      	  trigger_module_test_write() (Yonghong)
      	* Add Yonghong acked-by to patch 1.
      
      Changes in v2:
      	* Fix compilation error. (Andrii)
      	* Make the new test use write() instead of read() (Andrii)
      
      Add some missing glue logic to teach bpf about bare tracepoints - tracepoints
      without any trace event associated with them.

      Bare tracepoints are declared with DECLARE_TRACE(). Full tracepoints are
      declared with TRACE_EVENT().

      BPF can attach to these tracepoints only as RAW_TRACEPOINT(), since no
      events are created for them in tracefs (a sketch follows this entry).
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      71ee10e2
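
      For illustration, a hedged sketch of the BPF side (the tracepoint name is made up; a bare tracepoint is one the kernel declares with DECLARE_TRACE() only, so there is no tracefs event to attach through):

          #include "vmlinux.h"
          #include <bpf/bpf_helpers.h>
          #include <bpf/bpf_tracing.h>

          /* Attach as a raw tracepoint; a bare tracepoint has no tracefs
           * event, so the ordinary SEC("tp/...") path is not an option. */
          SEC("raw_tp/sched_bare_example")
          int BPF_PROG(handle_bare, int cpu)
          {
                  return 0;
          }

          char LICENSE[] SEC("license") = "GPL";
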
    • selftests: bpf: Add a new test for bare tracepoints · 407be922
      Committed by Qais Yousef
      Reuse the module_attach infrastructure to add a new bare tracepoint and
      check that we can attach to it as a raw tracepoint.
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210119122237.2426878-3-qais.yousef@arm.com
      407be922
    • Merge branch 'bpf,x64: implement jump padding in jit' · 86e6b4e9
      Committed by Alexei Starovoitov
      Gary Lin says:
      
      ====================
      This patch series implements jump padding in the x64 jit to cover some
      corner cases that used to consume more than 20 jit passes and cause
      failure.
      
      v4:
        - Add the detailed comments about the possible padding bytes
        - Add the second test case which triggers jmp_cond padding and imm32 nop
          jmp padding.
        - Add the new test case as another subprog
      
      v3:
        - Copy the prologue instructions separately, or the size calculation
          of the first BPF instruction would include the prologue.
        - Replace WARN_ONCE() with pr_err() and EFAULT
        - Use MAX_PASSES in the for loop condition check
        - Remove the "padded" flag from x64_jit_data. For the extra pass of
          subprogs, padding is always enabled since it won't hurt the images
          that converge without padding.
      v2:
        - Simplify the sample code in the commit description and provide the
          jit code
        - Check the expected padding bytes with WARN_ONCE
        - Move the 'padded' flag to 'struct x64_jit_data'
        - Remove the EXPECTED_FAIL flag from bpf_fill_maxinsns11() in test_bpf
        - Add 2 verifier tests
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      86e6b4e9