- 13 Feb 2021, 11 commits
-
-
Submitted by Jesper Dangaard Brouer

Multiple BPF helpers that can manipulate/increase the size of the SKB use __bpf_skb_max_len() as the max length. This function limits the size against the current net_device MTU (skb->dev->mtu). When a BPF prog grows the packet size, it should not be limited to the MTU. The MTU is a transmit limitation, and software receiving this packet should be allowed to increase the size. Furthermore, the current MTU check in __bpf_skb_max_len uses the MTU from the ingress/current net_device, which in case of redirects is the wrong net_device.

This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit is elsewhere in the system. Jesper's testing[1] showed it was not possible to exceed 8KiB when expanding the SKB size via a BPF helper. The limiting factor is the define KMALLOC_MAX_CACHE_SIZE, which is 8192 for the SLUB allocator (CONFIG_SLUB) when PAGE_SIZE is 4096. This define is in effect because the code is called from softirq context; see __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed that frames above 16KiB can cause NICs to reset (but not crash). Keep this sanity limit at this level, as the memory layer can differ based on kernel config.

[1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@firesoul
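For illustration, a minimal TC program of the kind this change affects; the section name, program name, and the 3000-byte figure are assumptions for the sketch, not taken from the patch. bpf_skb_change_tail() is one of the helpers bounded by __bpf_skb_max_len(), so with the SKB_MAX_ALLOC cap the grown length may now exceed the ingress device's MTU:

  // SPDX-License-Identifier: GPL-2.0
  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>

  SEC("classifier")
  int grow_beyond_mtu(struct __sk_buff *skb)
  {
      /* On a standard 1500-MTU device this used to fail once the new length
       * crossed skb->dev->mtu; now only the SKB_MAX_ALLOC cap applies. */
      if (skb->len < 3000 && bpf_skb_change_tail(skb, 3000, 0))
          return TC_ACT_SHOT;

      return TC_ACT_OK;
  }

  char LICENSE[] SEC("license") = "GPL";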
-
Submitted by Jun'ichi Nomura

The devmap bulk queue is allocated with GFP_ATOMIC, and the allocation may fail if there is no available space in the existing percpu pool. Since commit 75ccae62 ("xdp: Move devmap bulk queue into struct net_device") moved the bulk queue allocation to the NETDEV_REGISTER callback, whose context is allowed to sleep, use GFP_KERNEL instead of GFP_ATOMIC to let the percpu allocator extend the pool when needed and avoid possible failure of netdev registration. As the required alignment is natural, we can simply use alloc_percpu().

Fixes: 75ccae62 ("xdp: Move devmap bulk queue into struct net_device")
Signed-off-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210209082451.GA44021@jeru.linux.bs1.fc.nec.co.jp
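The shape of the change, as a rough before/after sketch (the field and struct names follow commit 75ccae62, but treat this as an illustration rather than the exact diff):

  /* before: atomic allocation with explicit alignment */
  dev->xdp_bulkq = __alloc_percpu_gfp(sizeof(struct xdp_dev_bulk_queue),
                                      sizeof(void *), GFP_ATOMIC);

  /* after: NETDEV_REGISTER runs in process context, so GFP_KERNEL and the
   * type's natural alignment are enough and the percpu pool can grow */
  dev->xdp_bulkq = alloc_percpu(struct xdp_dev_bulk_queue);
  if (!dev->xdp_bulkq)
      return NOTIFY_BAD;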
-
Submitted by Yonghong Song

Commit 15d83c4d ("bpf: Allow loading of a bpf_iter program") cached btf_id in struct bpf_iter_target_info so that later on it can be checked cheaply compared to checking registered names. syzbot found a bug where an uninitialized value may end up in bpf_iter_target_info->btf_id. This is because we allocated the bpf_iter_target_info structure with kmalloc and never initialized the btf_id field afterwards. This uninitialized btf_id is typically compared to a u32 bpf program func proto btf_id, and the chance of being equal is extremely slim. This patch fixes the issue by using kzalloc, which will also prevent likely future instances of the same problem as new fields are added.

Fixes: 15d83c4d ("bpf: Allow loading of a bpf_iter program")
Reported-by: syzbot+580f4f2a272e452d55cb@syzkaller.appspotmail.com
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210212005926.2875002-1-yhs@fb.com
-
Submitted by Stanislav Fomichev

This is what I see after compiling the kernel:

  # bpf-next...bpf-next/master
  ?? tools/bpf/resolve_btfids/libbpf/

Fixes: fc6b48f6 ("tools/resolve_btfids: Build libbpf and libsubcmd in separate directories")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212010053.668700-1-sdf@google.com
-
Submitted by Alexei Starovoitov

Song Liu says:

====================
This set introduces bpf_iter for task_vma, which can be used to generate information similar to /proc/pid/maps. Patch 4/4 adds an example that mimics /proc/pid/maps.

Current /proc/<pid>/maps and /proc/<pid>/smaps provide information about the VMAs of a process. However, this information is not flexible enough to cover all use cases. For example, if a VMA covers a mix of 2MB and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customized information based on the vma (and vma->vm_mm, vma->vm_file, etc.).

Changes v6 => v7:
1. Let the BPF iter program use bpf_d_path without specifying sleepable. (Alexei)

Changes v5 => v6:
1. Add more comments for task_vma_seq_get_next() to explain the logic of the find_vma() calls. (Alexei)
2. Skip a vma found by find_vma() when both vm_start and vm_end match prev_vm_[start|end]. Previous versions only compared vm_start. IOW, if the vma [4k, 8k] is replaced by [4k, 12k] after relocking mmap_lock, v5 will skip the new vma, while v6 will process it.

Changes v4 => v5:
1. Fix a refcount leak on task_struct. (Yonghong)
2. Fix the selftest. (Yonghong)

Changes v3 => v4:
1. Avoid skipping a vma by assigning an invalid prev_vm_start in task_vma_seq_stop(). (Yonghong)
2. Move the "again" label in task_vma_seq_get_next() to save a check. (Yonghong)

Changes v2 => v3:
1. Rewrite 1/4 so that we hold mmap_lock while calling the BPF program. This enables the BPF program to access the real vma with BTF. (Alexei)
2. Fix the logic when control is returned to user space. (Yonghong)
3. Revise the commit log and cover letter. (Yonghong)

Changes v1 => v2:
1. Small fixes in task_iter.c and the selftests. (Yonghong)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Submitted by Song Liu
The test dumps information similar to /proc/pid/maps. The first line of the output is compared against the /proc file to make sure they match.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210212183107.50963-4-songliubraving@fb.com
-
Submitted by Song Liu

task_file and task_vma iter programs have access to file->f_path. Enable bpf_d_path to print the paths of these files.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210212183107.50963-3-songliubraving@fb.com
-
Submitted by Song Liu

Introduce the task_vma bpf_iter to print memory information of a process. It can be used to print customized information similar to /proc/<pid>/maps.

Current /proc/<pid>/maps and /proc/<pid>/smaps provide information about the VMAs of a process. However, this information is not flexible enough to cover all use cases. For example, if a VMA covers a mix of 2MB and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customized information based on the vma (and vma->vm_mm, vma->vm_file, etc.).

To access the vma safely in the BPF program, the task_vma iterator holds the target's mmap_lock while calling the BPF program. If the mmap_lock is contended, task_vma unlocks mmap_lock between iterations to unblock the writer(s). This lock contention avoidance mechanism is similar to the one used in show_smaps_rollup().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com
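A minimal sketch of what such an iterator program can look like, in the spirit of the selftest added later in the series (the program name and output format are illustrative; BPF_SEQ_PRINTF is assumed to come from recent libbpf headers or the selftests' bpf_iter.h, and vmlinux.h from a kernel that has the task_vma target):

  // SPDX-License-Identifier: GPL-2.0
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  SEC("iter/task_vma")
  int proc_maps(struct bpf_iter__task_vma *ctx)
  {
      struct seq_file *seq = ctx->meta->seq;
      struct vm_area_struct *vma = ctx->vma;
      struct task_struct *task = ctx->task;

      if (!vma || !task)      /* end of iteration */
          return 0;

      /* first two columns of /proc/<pid>/maps */
      BPF_SEQ_PRINTF(seq, "%08llx-%08llx\n",
                     (__u64)vma->vm_start, (__u64)vma->vm_end);
      return 0;
  }

  char LICENSE[] SEC("license") = "GPL";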
-
Submitted by Martin KaFai Lau

This patch adds a "void *owner" member. The existing bpf_tcp_ca test will ensure that bpf_cubic.o and bpf_dctcp.o can be loaded.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212021037.267278-1-kafai@fb.com
-
Submitted by Martin KaFai Lau

When libbpf initializes the kernel's struct_ops in "bpf_map__init_kern_struct_ops()", it enforces that every pointer-typed member must be a function pointer and rejects others. That turns out to be too strict. For example, when directly using "struct tcp_congestion_ops" from vmlinux.h, it has a "struct module *owner" member, which is set to NULL in a bpf_tcp_cc.o. Instead, libbpf only needs to ensure the member is a function pointer if it has been set (relocated) to a bpf-prog.

This patch moves the "btf_is_func_proto(kern_mtype)" check after the existing "if (!prog) { continue; }". The original debug message in "if (!prog) { continue; }" is also removed since it is no longer valid. Besides, there is a later debug message that tells which function pointer is set.

The "btf_is_func_proto(mtype)" check has already been guaranteed in "bpf_object__collect_st_ops_relos()", which runs before "bpf_map__init_kern_struct_ops()". Thus, this check is removed.

v2:
- Remove the outdated debug message (Andrii). Removed because there is a later debug message that tells which function pointer is set.
- Following mtype->type is no longer needed. Remove "skip_mods_and_typedefs(btf, mtype->type, &mtype_id)".
- Do the "if (!prog)" test before skip_mods_and_typedefs.

Fixes: 590a0088 ("bpf: libbpf: Add STRUCT_OPS support")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212021030.266932-1-kafai@fb.com
-
Submitted by Stanislav Fomichev

We have environments where the use of AF_INET is prohibited (cgroup/sock_create returns EPERM for AF_INET). Let's use AF_LOCAL instead of AF_INET; it works just as well with SIOCETHTOOL.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210209221826.922940-1-sdf@google.com
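A sketch of why this works: the socket is only a carrier for the SIOCETHTOOL ioctl, so an AF_LOCAL datagram socket is enough and never hits cgroup hooks that deny AF_INET. The function name, interface argument, and the choice of ETHTOOL_GCHANNELS are assumptions for illustration:

  #include <string.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <sys/socket.h>
  #include <linux/ethtool.h>
  #include <linux/sockios.h>
  #include <net/if.h>

  static int ethtool_channels(const char *ifname, struct ethtool_channels *ch)
  {
      struct ifreq ifr = {};
      int fd, ret;

      fd = socket(AF_LOCAL, SOCK_DGRAM, 0);   /* no AF_INET needed */
      if (fd < 0)
          return -1;

      ch->cmd = ETHTOOL_GCHANNELS;
      strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
      ifr.ifr_data = (void *)ch;

      ret = ioctl(fd, SIOCETHTOOL, &ifr);
      close(fd);
      return ret;
  }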
-
- 12 Feb 2021, 9 commits
-
-
Submitted by Ilya Leoshkevich

All 32-bit variants of BPF_FETCH (add, and, or, xor, xchg, cmpxchg) define a 32-bit subreg and thus have zext_dst set. Their encoding, however, uses the dst_reg field as a base register, which causes opt_subreg_zext_lo32_rnd_hi32() to zero-extend said base register instead of the one the insn really defines (r0 or src_reg). Fix by properly choosing the register being defined, similar to how check_atomic() already does that.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210210204502.83429-1-iii@linux.ibm.com
-
Submitted by Ilya Leoshkevich

Based on [1], BPF_CMPXCHG should always load the old value into R0. The phrasing in bpf.rst is somewhat ambiguous in this regard; improve it to make this aspect crystal clear.

[1] https://lore.kernel.org/bpf/CAADnVQJFcFwxEz=wnV=hkie-EDwa8s5JGbBQeFt1TGux1OihJw@mail.gmail.com/

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210210142853.82203-1-iii@linux.ibm.com
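From BPF C these semantics surface through the compiler builtin; a small sketch (the program and section names are illustrative, and a clang new enough to emit BPF atomics, e.g. with -mcpu=v3, is assumed):

  // SPDX-License-Identifier: GPL-2.0
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  __u64 flag;

  SEC("raw_tp/sys_enter")
  int claim_flag(void *ctx)
  {
      /* Emits BPF_ATOMIC | BPF_CMPXCHG: R0 always receives the value that
       * was in `flag` before the operation, whether or not the swap won. */
      __u64 old = __sync_val_compare_and_swap(&flag, 0, 1);

      return old == 0;    /* true only for the caller that performed the swap */
  }

  char LICENSE[] SEC("license") = "GPL";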
-
Submitted by Alexei Starovoitov

bpf_prog_realloc() copies the contents of struct bpf_prog. The pointers have to be cleared before freeing the old struct.

Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Fixes: 700d4796 ("bpf: Optimize program stats")
Fixes: ca06f55b ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Submitted by Florent Revest

This builds on the existing socket cookie test, which checks whether the bpf_get_socket_cookie helpers provide the same value in cgroup/connect6 and sockops programs for a socket created by the userspace part of the test. Instead of having an update_cookie sockops program tag a socket local storage with 0xFF, this uses both an update_cookie_sockops program and an update_cookie_tracing program which successively tag the socket with 0x0F and then 0xF0.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-5-revest@chromium.org
-
Submitted by Florent Revest

When migrating from bpf.h's to vmlinux.h's definition of struct bpf_sock, an interesting LLVM behavior appeared: LLVM started producing two fetches of ctx->sk in the sockops program, which means that the verifier could not keep track of the NULL check on ctx->sk. Therefore, we need to extract ctx->sk into a variable before checking and dereferencing it.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-4-revest@chromium.org
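The pattern in question, as a minimal sketch (the program name and body are illustrative, not the selftest itself): copy ctx->sk into a local once, so the verifier's NULL-check tracking covers every later use of the pointer:

  // SPDX-License-Identifier: GPL-2.0
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  SEC("sockops")
  int set_cookie(struct bpf_sock_ops *ctx)
  {
      struct bpf_sock *sk = ctx->sk;   /* single load of ctx->sk */

      if (!sk)                         /* this check now protects all uses below */
          return 1;

      /* ... dereference sk, e.g. as a socket-storage key ... */
      return 1;
  }

  char LICENSE[] SEC("license") = "GPL";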
-
Submitted by Florent Revest

Currently, the selftest for the BPF socket_cookie helpers is built and run independently from test_progs. It's easy to forget and hard to maintain. This patch moves the socket cookie test into prog_tests/ and vastly simplifies its logic by:
- rewriting the loading code with BPF skeletons
- rewriting the server/client code with network helpers
- rewriting the cgroup code with test__join_cgroup
- rewriting the error handling code with CHECKs

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-3-revest@chromium.org
-
Submitted by Florent Revest
This needs a new helper that:
- can work in a sleepable context (using sock_gen_cookie)
- takes a struct sock pointer and checks that it's not NULL

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-2-revest@chromium.org
-
Submitted by Florent Revest

Since commit 92acdc58 ("bpf, net: Rework cookie generator as per-cpu one"), socket cookies are not guaranteed to be non-decreasing. The bpf_get_socket_cookie helper descriptions currently specify that cookies are non-decreasing, but we don't want users to rely on that.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-1-revest@chromium.org
-
Submitted by Jiri Olsa

Nathan reported an issue with cleaning an empty build directory:

  $ make -s O=build distclean
  ../../scripts/Makefile.include:4: *** \
    O=/ho...build/tools/bpf/resolve_btfids does not exist. Stop.

The problem is that the tools scripts require an existing output directory, otherwise they fail. Add a check around the resolve_btfids clean target to ensure the output directory is in place.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/bpf/20210211124004.1144344-1-jolsa@kernel.org
-
- 11 Feb 2021, 17 commits
-
-
Submitted by Alexei Starovoitov
Add a basic test for map-in-map and per-cpu maps in sleepable programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-10-alexei.starovoitov@gmail.com
-
Submitted by Alexei Starovoitov

Since sleepable programs now execute under migrate_disable, per-cpu maps are safe to use. Map-in-map has been OK to use in sleepable programs since the time sleepable progs were introduced.

Note that non-preallocated maps are still not safe, since there is no rcu_read_lock yet in sleepable programs and dynamically allocated map elements rely on RCU protection. Sleepable programs have rcu_read_lock_trace instead. That limitation will be addressed in the future.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-9-alexei.starovoitov@gmail.com
-
Submitted by Alexei Starovoitov

Since the recursion_misses counter is available in bpf_prog_info, improve the selftest to make sure it's counting correctly.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210210033634.62081-8-alexei.starovoitov@gmail.com
-
Submitted by Alexei Starovoitov

Add a per-program counter for the number of times the recursion prevention mechanism was triggered, and expose it via show_fdinfo and bpf_prog_info. Teach bpftool to print it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-7-alexei.starovoitov@gmail.com
-
Submitted by Alexei Starovoitov

Add a recursive non-sleepable fentry program as a test. All attach points where sleepable progs can execute are non-recursive so far. The recursion protection mechanism for sleepable programs cannot be activated yet.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-6-alexei.starovoitov@gmail.com
-
Submitted by Alexei Starovoitov

Since both sleepable and non-sleepable programs execute under migrate_disable, add a recursion prevention mechanism to both types of programs when they're executed via the BPF trampoline.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-5-alexei.starovoitov@gmail.com
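A minimal sketch of the idea (not the exact kernel code; the helper names here are made up, though struct bpf_prog does carry a per-CPU "active" counter for this purpose): the trampoline bumps the counter on entry and skips the program if it is already running on this CPU, counting a miss instead:

  /* called by the trampoline before invoking the program */
  static int sketch_prog_enter(struct bpf_prog *prog)
  {
      if (__this_cpu_inc_return(*prog->active) != 1) {
          /* recursion on this CPU: record a miss, do not run the program */
          return 0;
      }
      return 1;    /* safe to run */
  }

  /* called by the trampoline after the program returns */
  static void sketch_prog_exit(struct bpf_prog *prog)
  {
      __this_cpu_dec(*prog->active);
  }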
-
Submitted by Alexei Starovoitov

Since sleepable programs don't migrate from the CPU, the execution stats can be computed for them as well. Reuse the same infrastructure for both sleepable and non-sleepable programs:

  run_cnt     - the number of times the program was executed.
  run_time_ns - the program execution time in nanoseconds, including the
                off-CPU time when the program was sleeping.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-4-alexei.starovoitov@gmail.com
-
Submitted by Alexei Starovoitov

In older non-RT kernels migrate_disable() was the same as preempt_disable(). Since commit 74d862b6 ("sched: Make migrate_disable/enable() independent of RT"), migrate_disable() is real and doesn't prevent sleeping. Running sleepable programs with migration disabled allows adding support for program stats and per-cpu maps later.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-3-alexei.starovoitov@gmail.com
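Conceptually, the enter/exit pair for sleepable programs then looks roughly like the sketch below (the function names are placeholders, not the kernel's): protection comes from rcu_read_lock_trace() rather than rcu_read_lock(), and migration (not preemption) is disabled, so the program may sleep but stays on one CPU:

  static void sketch_enter_sleepable(void)
  {
      rcu_read_lock_trace();
      migrate_disable();
  }

  static void sketch_exit_sleepable(void)
  {
      migrate_enable();
      rcu_read_unlock_trace();
  }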
-
Submitted by Alexei Starovoitov

Move bpf_prog_stats from prog->aux into prog to avoid one extra load in the critical path of program execution.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-2-alexei.starovoitov@gmail.com
-
Submitted by Marco Elver

For double-checked locking in bpf_common_lru_push_free(), node->type is read outside the critical section and then re-checked under the lock. However, concurrent writes to node->type result in data races. For example, the following concurrent access was observed by KCSAN:

  write to 0xffff88801521bc22 of 1 bytes by task 10038 on cpu 1:
    __bpf_lru_node_move_in  kernel/bpf/bpf_lru_list.c:91
    __local_list_flush      kernel/bpf/bpf_lru_list.c:298
    ...

  read to 0xffff88801521bc22 of 1 bytes by task 10043 on cpu 0:
    bpf_common_lru_push_free  kernel/bpf/bpf_lru_list.c:507
    bpf_lru_push_free         kernel/bpf/bpf_lru_list.c:555
    ...

Fix the data races where node->type is read outside the critical section (for double-checked locking) by marking the access with READ_ONCE() as well as ensuring the variable is only accessed once.

Fixes: 3a08c2fd ("bpf: LRU List")
Reported-by: syzbot+3536db46dfa58c573458@syzkaller.appspotmail.com
Reported-by: syzbot+516acdb03d3e27d91bcd@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210209112701.3341724-1-elver@google.com
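A userspace sketch of the same double-checked-locking pattern (the type names, enum values, and list manipulation are invented for illustration; READ_ONCE is approximated with a volatile load, as in the kernel):

  #include <pthread.h>

  #define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

  enum node_type { NODE_GLOBAL, NODE_LOCAL_FREE };

  struct lru_node {
      enum node_type type;            /* rewritten concurrently under the lock */
  };

  static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;

  static void push_free(struct lru_node *node)
  {
      enum node_type type = READ_ONCE(node->type);  /* one marked read, no re-load */

      if (type != NODE_LOCAL_FREE)
          return;                                   /* fast path, lock avoided */

      pthread_mutex_lock(&lru_lock);
      if (node->type == NODE_LOCAL_FREE) {          /* authoritative re-check */
          /* ... move the node back to the local free list ... */
      }
      pthread_mutex_unlock(&lru_lock);
  }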
-
Submitted by Jiapeng Chong

Fix the following coccicheck warnings:

  ./tools/testing/selftests/bpf/xdpxceiver.c:954:28-30: WARNING !A || A && B is equivalent to !A || B.
  ./tools/testing/selftests/bpf/xdpxceiver.c:932:28-30: WARNING !A || A && B is equivalent to !A || B.
  ./tools/testing/selftests/bpf/xdpxceiver.c:909:28-30: WARNING !A || A && B is equivalent to !A || B.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1612860398-102839-1-git-send-email-jiapeng.chong@linux.alibaba.com
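The identity itself, in a tiny self-contained form (the names are illustrative, not the actual xdpxceiver.c conditions); both functions return the same value for every input, so the simpler form is preferred:

  #include <stdbool.h>

  static bool before(bool a, bool b)
  {
      return !a || (a && b);   /* !A || (A && B) */
  }

  static bool after(bool a, bool b)
  {
      return !a || b;          /* equivalent: !A || B */
  }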
-
Submitted by Ilya Leoshkevich
Atomic tests store a DW, but then load it back as a W from the same address. This doesn't work on big-endian systems, and since the point of those tests is not testing narrow loads, fix simply by loading a DW.

Fixes: 98d666d0 ("bpf: Add tests for new BPF atomic operations")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210020713.77911-1-iii@linux.ibm.com
-
Submitted by Alexei Starovoitov

Andrei Matei says:

====================
Before this patch, variable-offset access to the stack was disallowed for regular instructions, but was allowed for "indirect" accesses (i.e. helpers). This patch removes the restriction, allowing reading and writing to the stack through stack pointers with variable offsets. This makes stack-allocated buffers more usable in programs, and brings stack pointers closer to other types of pointers.

The motivation is being able to use stack-allocated buffers for data manipulation. When the stack size limit is sufficient, allocating buffers on the stack is simpler than per-cpu arrays, or other alternatives.

V2 -> V3:
- var-offset writes mark all the stack slots in range as initialized, so that future reads are not rejected.
- rewrote the C test to not use uprobes, as per Andrii's suggestion.
- addressed other review comments from Alexei.

V1 -> V2:
- add support for var-offset stack writes, in addition to reads
- add a C test
- made variable-offset direct reads no longer destroy spilled registers in the access range
- address review nits
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Submitted by Andrei Matei

Add a higher-level test (a C BPF program) for the new functionality - variable-offset stack reads and writes.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-5-andreimatei1@gmail.com
-
Submitted by Andrei Matei
Add tests for the new functionality - reading and writing to the stack through a variable-offset pointer.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-4-andreimatei1@gmail.com
-
Submitted by Andrei Matei
The verifier errors around stack accesses have changed slightly in the previous commit (generally for the better).

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-3-andreimatei1@gmail.com
-
Submitted by Andrei Matei

Before this patch, variable-offset access to the stack was disallowed for regular instructions, but was allowed for "indirect" accesses (i.e. helpers). This patch removes the restriction, allowing reading and writing to the stack through stack pointers with variable offsets. This makes stack-allocated buffers more usable in programs, and brings stack pointers closer to other types of pointers.

The motivation is being able to use stack-allocated buffers for data manipulation. When the stack size limit is sufficient, allocating buffers on the stack is simpler than per-cpu arrays, or other alternatives.

In unprivileged programs, variable-offset reads and writes are disallowed (they were already disallowed for the indirect access case) because the speculative execution checking code doesn't support them. Additionally, when writing through a variable-offset stack pointer, if any pointers are in the accessible range, there's a possibility of later leaking pointers because the write cannot be tracked precisely.

Writes with variable offset mark the whole range as initialized, even though we don't know which stack slots are actually written. This is in order to not reject future reads to these slots. Note that this doesn't affect writes done through helpers; like before, helpers need the whole stack range to be initialized to begin with. All the stack slots in range are considered scalars after the write; variable-offset register spills are not tracked.

For reads, all the stack slots in the variable range need to be initialized (but see above about what writes do), otherwise the read is rejected. All registers spilled in stack slots that might be read are marked as having been read; however, reads through such pointers don't do register filling. The target register will always be either a scalar or a constant zero.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-2-andreimatei1@gmail.com
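A minimal, illustrative program of the kind the relaxed rule permits (the section, names, and buffer size are assumptions; a privileged loader is assumed, since unprivileged variable-offset access remains disallowed):

  // SPDX-License-Identifier: GPL-2.0
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("raw_tp/sys_enter")
  int var_off_stack(void *ctx)
  {
      char buf[16] = {};      /* whole stack range starts out initialized */

      /* offset is variable at verification time, but provably in [0, 15] */
      __u32 idx = bpf_get_prandom_u32() % sizeof(buf);

      buf[idx] = 1;           /* direct write through a variable-offset stack pointer */
      return buf[idx];        /* direct read back through the same pointer */
  }

  char LICENSE[] SEC("license") = "GPL";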
-
- 09 Feb 2021, 3 commits
-
-
Submitted by Andrii Nakryiko

Jiri Olsa says:

====================
hi,
the resolve_btfids tool is used during the kernel build, so we should clean it on the kernel's make clean.

v2 changes:
- add Song's acks on patches 1 and 4 (others changed) [Song]
- add missing / [Andrii]
- change srctree variable initialization [Andrii]
- shifted ifdef for clean target [Andrii]

thanks,
jirka
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Submitted by Jiri Olsa

The resolve_btfids tool is used during the kernel build, so we should clean it on the kernel's make clean. Invoke the resolve_btfids clean as part of the root 'make clean'.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210205124020.683286-5-jolsa@kernel.org
-
Submitted by Jiri Olsa

We want this clean target to be called from the tree's root Makefile, which defines the same srctree variable, and that would break the make setup. We actually do not use the srctree passed in from outside, so we can solve this by setting the current srctree value directly. Also change the way srctree is initialized, as suggested by Andrii.

The root Makefile also does not define the implicit RM variable, so add an RM initialization.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210205124020.683286-4-jolsa@kernel.org
-