Commits · 73b11c2ab072d5b0599d1e12cc126f55ee306daf · openeuler / Kernel

02 Aug, 2020 2 commits

bpf: Add support for forced LINK_DETACH command · 73b11c2a

Authored by Andrii Nakryiko on Jul 31, 2020

Add LINK_DETACH command to force-detach bpf_link without destroying it. It has
the same behavior as auto-detaching of bpf_link due to cgroup dying for
bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case,
bpf_link is still a valid kernel object, but is defunct and doesn't hold BPF
program attached to corresponding BPF hook. This functionality allows users
with enough access rights to manually force-detach attached bpf_link without
killing respective owner process.

This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly
re-using existing link release handling code.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-2-andriin@fb.com
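
For illustration, a minimal userspace sketch of issuing the new command through the raw bpf(2) syscall. It assumes UAPI headers recent enough to define `BPF_LINK_DETACH` and the `link_detach.link_fd` attribute field; the wrapper name `link_detach_fd` is hypothetical and not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <linux/bpf.h>		/* assumed to define BPF_LINK_DETACH */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to force-detach the bpf_link behind
 * link_fd. The link object itself stays alive until its last FD is closed.
 * Returns 0 on success or a negative errno value.
 */
static int link_detach_fd(int link_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.link_detach.link_fd = link_fd;

	if (syscall(__NR_bpf, BPF_LINK_DETACH, &attr, sizeof(attr)) < 0)
		return -errno;
	return 0;
}
```

A caller would pass an FD obtained from link creation or from a pinned link. The exact error returned for an invalid FD depends on kernel version and privileges, so callers should only rely on the sign of the result.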

bpf, selftests: Use single cgroup helpers for both test_sockmap/progs · 4939b284

Authored by John Fastabend on Jul 31, 2020

Nearly every user of cgroup helpers does the same sequence of API calls. So
push these into a single helper cgroup_setup_and_join. The cases that do
a bit of extra logic are test_progs which currently uses an env variable
to decide if it needs to setup the cgroup environment or can use an
existing environment. And then tests that are doing cgroup tests
themselves. We skip these cases for now.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159623335418.30208.15807461815525100199.stgit@john-XPS-13-9370

01 Aug, 2020 1 commit

Documentation/bpf: Use valid and new links in index.rst · ffba964e

Authored by Tiezhu Yang on Jul 31, 2020

There exists an error "404 Not Found" when I click the html link of
"Documentation/networking/filter.rst" in the BPF documentation [1],
fix it.

Additionally, use the new links about "BPF and XDP Reference Guide"
and "bpf(2)" to avoid redirects.

[1] https://www.kernel.org/doc/html/latest/bpf/

Fixes: d9b9170a ("docs: bpf: Rename README.rst to index.rst")
Fixes: cb3f0d56 ("docs: networking: convert filter.txt to ReST")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1596184142-18476-1-git-send-email-yangtiezhu@loongson.cn

31 Jul, 2020 11 commits

libbpf: Fix register in PT_REGS MIPS macros · 1acf8f90

Authored by Jerry Crunchtime on Jul 31, 2020

The o32, n32 and n64 calling conventions require the return
value to be stored in $v0 which maps to $2 register, i.e.,
the register 2.

Fixes: c1932cdb ("bpf: Add MIPS support to samples/bpf.")
Signed-off-by: Jerry Crunchtime <jerry.c.t@web.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/43707d31-0210-e8f0-9226-1af140907641@web.de
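
The register mapping described above can be sketched with a stand-in register frame. `struct mock_pt_regs` and `MOCK_PT_REGS_RC` are hypothetical names used only for this illustration; the real macros live in libbpf's tracing header and operate on the architecture's own `struct pt_regs`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the MIPS register frame: $0..$31 in order. */
struct mock_pt_regs {
	uint64_t regs[32];
};

/* $v0 is register $2, so the return value is read from index 2. */
#define MOCK_PT_REGS_RC(x) ((x)->regs[2])

/* Read the function return value out of a saved register frame. */
static uint64_t mock_return_value(const struct mock_pt_regs *ctx)
{
	return MOCK_PT_REGS_RC(ctx);
}
```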

udp, bpf: Ignore connections in reuseport group after BPF sk lookup · c64c9c28

Authored by Jakub Sitnicki on Jul 26, 2020

When BPF sk lookup invokes reuseport handling for the selected socket, it
should ignore the fact that reuseport group can contain connected UDP
sockets. With BPF sk lookup this is not relevant as we are not scoring
sockets to find the best match, which might be a connected UDP socket.

Fix it by unconditionally accepting the socket selected by reuseport.

This fixes the following two failures reported by test_progs.

  # ./test_progs -t sk_lookup
  ...
  #73/14 UDP IPv4 redir and reuseport with conns:FAIL
  ...
  #73/20 UDP IPv6 redir and reuseport with conns:FAIL
  ...

Fixes: a57066b1 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200726120228.1414348-1-jakub@cloudflare.com

libbpf: Make destructors more robust by handling ERR_PTR(err) cases · 50450fc7

Authored by Andrii Nakryiko on Jul 29, 2020

Most of libbpf "constructors" on failure return ERR_PTR(err) result encoded as
a pointer. It's a common mistake to eventually pass such malformed pointers
into xxx__destroy()/xxx__free() "destructors". So instead of fixing up
clean up code in selftests and user programs, handle such error pointers in
destructors themselves. This works beautifully for NULL pointers passed to
destructors, so might as well just work for error pointers.
Suggested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729232148.896125-1-andriin@fb.com
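
A minimal sketch of the pattern, assuming kernel-style error-pointer encoding. The `toy_obj` type and its constructor/destructor are hypothetical; only the idea of tolerating both NULL and ERR_PTR(err) in the destructor comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, kernel style. */
static inline void *ERR_PTR(long err)
{
	return (void *)err;
}

/* True for NULL and for any pointer in the top MAX_ERRNO addresses. */
static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_obj {
	int *data;
};

/* Hypothetical constructor: returns ERR_PTR(-ENOMEM) on failure. */
static struct toy_obj *toy_obj__new(int simulate_failure)
{
	struct toy_obj *obj;

	if (simulate_failure)
		return ERR_PTR(-ENOMEM);
	obj = calloc(1, sizeof(*obj));
	if (!obj)
		return ERR_PTR(-ENOMEM);
	obj->data = calloc(4, sizeof(*obj->data));
	return obj;
}

/* Destructor that ignores NULL and error pointers; returns 1 if it freed. */
static int toy_obj__destroy(struct toy_obj *obj)
{
	if (IS_ERR_OR_NULL(obj))
		return 0;
	free(obj->data);
	free(obj);
	return 1;
}
```

With this shape, cleanup code can call the destructor unconditionally, whether or not the constructor succeeded.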

selftests/bpf: Omit nodad flag when adding addresses to loopback · a6599abd

Authored by Jakub Sitnicki on Jul 30, 2020

Setting IFA_F_NODAD flag for IPv6 addresses to add to loopback is
unnecessary. Duplicate Address Detection does not happen on loopback
device.

Also, passing 'nodad' flag to 'ip address' breaks libbpf CI, which runs in
an environment with BusyBox implementation of 'ip' command, that doesn't
understand this flag.

Fixes: 0ab5539f ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Andrii Nakryiko <andrii@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200730125325.1869363-1-jakub@cloudflare.com

selftests/bpf: Don't destroy failed link · 80546ac4

Authored by Andrii Nakryiko on Jul 28, 2020

Check that link is NULL or proper pointer before invoking bpf_link__destroy().
Not doing this causes crash in test_progs, when cg_storage_multi selftest
fails.

Fixes: 3573f384 ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729045056.3363921-1-andriin@fb.com

selftests/bpf: Add xdpdrv mode for test_xdp_redirect · dfdb0d93

Authored by Hangbin Liu on Jul 29, 2020

This patch adds xdpdrv mode to test_xdp_redirect.sh since veth supports
native mode. After the update, the test result is:

  # ./test_xdp_redirect.sh
  selftests: test_xdp_redirect xdpgeneric [PASS]
  selftests: test_xdp_redirect xdpdrv [PASS]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: William Tu <u9012063@gmail.com>
Link: https://lore.kernel.org/bpf/20200729085658.403794-1-liuhangbin@gmail.com

selftests/bpf: Verify socket storage in cgroup/sock_{create, release} · 4fb5f949

Authored by Stanislav Fomichev on Jul 28, 2020

Augment udp_limit test to set and verify socket storage value.
That should be enough to exercise the changes from the previous
patch.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729003104.1280813-2-sdf@google.com

bpf: Expose socket storage to BPF_PROG_TYPE_CGROUP_SOCK · f7c6cb1d

Authored by Stanislav Fomichev on Jul 28, 2020

This lets us use socket storage from the following hooks:

* BPF_CGROUP_INET_SOCK_CREATE
* BPF_CGROUP_INET_SOCK_RELEASE
* BPF_CGROUP_INET4_POST_BIND
* BPF_CGROUP_INET6_POST_BIND

Using existing 'bpf_sk_storage_get_proto' doesn't work because
second argument is ARG_PTR_TO_SOCKET. Even though
BPF_PROG_TYPE_CGROUP_SOCK hooks operate on 'struct bpf_sock',
the verifier still considers it as a PTR_TO_CTX.
That's why I'm adding another 'bpf_sk_storage_get_cg_sock_proto'
definition strictly for BPF_PROG_TYPE_CGROUP_SOCK which accepts
ARG_PTR_TO_CTX which is really 'struct sock' for this program type.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729003104.1280813-1-sdf@google.com

selftests/bpf: Test bpf_iter buffer access with negative offset · 12e6196f

Authored by Yonghong Song on Jul 28, 2020

Commit afbf21dc ("bpf: Support readonly/readwrite buffers
in verifier") added readonly/readwrite buffer support which
is currently used by bpf_iter tracing programs. It had
a bug with incorrect parameter ordering, which was later fixed
by commit f6dfbe31 ("bpf: Fix swapped arguments in calls
to check_buffer_access").

This patch added a test case with a negative offset access
which will trigger the error path.

Without Commit f6dfbe31, running the test case in the patch,
the error message looks like:
   R1_w=rdwr_buf(id=0,off=0,imm=0) R10=fp0
  ; value_sum += *(__u32 *)(value - 4);
  2: (61) r1 = *(u32 *)(r1 -4)
  R1 invalid (null) buffer access: off=-4, size=4

With the above commit, the error message looks like:
   R1_w=rdwr_buf(id=0,off=0,imm=0) R10=fp0
  ; value_sum += *(__u32 *)(value - 4);
  2: (61) r1 = *(u32 *)(r1 -4)
  R1 invalid rdwr buffer access: off=-4, size=4
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728221801.1090406-1-yhs@fb.com

bpf: Add missing newline characters in verifier error messages · 4fc00b79

Authored by Yonghong Song on Jul 28, 2020

Newline characters are added in two verifier error messages,
refactored in Commit afbf21dc ("bpf: Support readonly/readwrite
buffers in verifier"). This way, they do not mix with
messages afterwards.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728221801.1090349-1-yhs@fb.com

bpf, arm64: Add BPF exception tables · 80083428

Authored by Jean-Philippe Brucker on Jul 28, 2020

When a tracing BPF program attempts to read memory without using the
bpf_probe_read() helper, the verifier marks the load instruction with
the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
this flag it falls back to the interpreter.

Add support for BPF_PROBE_MEM, by appending an exception table to the
BPF program. If the load instruction causes a data abort, the fixup
infrastructure finds the exception table and fixes up the fault, by
clearing the destination register and jumping over the faulting
instruction.

To keep the compact exception table entry format, inspect the pc in
fixup_exception(). A more generic solution would add a "handler" field
to the table entry, like on x86 and s390.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728152122.1292756-2-jean-philippe@linaro.org

29 Jul, 2020 2 commits

bpf: Fix build without CONFIG_NET when using BPF XDP link · 310ad797

Authored by Andrii Nakryiko on Jul 28, 2020

Entire net/core subsystem is not built without CONFIG_NET. linux/netdevice.h
just assumes that it's always there, so the easiest way to fix this is to
conditionally compile out bpf_xdp_link_attach() use in bpf/syscall.c.

Fixes: aa8d3a71 ("bpf, xdp: Add bpf_link-based XDP attachment API")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728190527.110830-1-andriin@fb.com

bpf, selftests: use ::1 for localhost in tcp_server.py · ca5cd355

Authored by John Fastabend on Jul 28, 2020

Using localhost requires the host to have a /etc/hosts file with that
specific line in it. By default my dev box did not, they used
ip6-localhost, so the test was failing. To fix remove the need for any
/etc/hosts and use ::1.

I could just add the line, but this seems easier.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159594714197.21431.10113693935099326445.stgit@john-Precision-5820-Tower
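
The reasoning carries over to any language: the literal `::1` is parsed directly, while a name such as `localhost` needs a resolver entry. A small C sketch of that difference follows (the selftest itself is a Python script; the helper name `is_ipv6_loopback_literal` is hypothetical).

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Returns 1 if text is an IPv6 literal equal to the loopback address.
 * inet_pton() does no name resolution, so no /etc/hosts entry is involved.
 */
static int is_ipv6_loopback_literal(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return 0;	/* not a valid IPv6 literal */
	return memcmp(&addr, &in6addr_loopback, sizeof(addr)) == 0;
}
```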

28 Jul, 2020 6 commits

xdp: Prevent kernel-infoleak in xsk_getsockopt() · 3c4f850e

Authored by Peilin Ye on Jul 28, 2020

xsk_getsockopt() is copying uninitialized stack memory to userspace when
'extra_stats' is 'false'. Fix it. Doing '= {};' is sufficient since currently
'struct xdp_statistics' is defined as follows:

  struct xdp_statistics {
    __u64 rx_dropped;
    __u64 rx_invalid_descs;
    __u64 tx_invalid_descs;
    __u64 rx_ring_full;
    __u64 rx_fill_ring_empty_descs;
    __u64 tx_ring_empty_descs;
  };

When being copied to the userspace, 'stats' will not contain any uninitialized
'holes' between struct fields.

Fixes: 8aa5a335 ("xsk: Add new statistics")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/bpf/20200728053604.404631-1-yepeilin.cs@gmail.com
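
A userspace sketch of why zero-initialization is enough here. `struct mock_xdp_statistics` mirrors the layout quoted above, `memcpy()` stands in for the copy to userspace, and the split between always-filled and "extra" counters is an assumption made for the illustration. The sketch spells the initializer `= {0}`, which is the ISO C form before C23; the commit message refers to `= {}`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct mock_xdp_statistics {
	uint64_t rx_dropped;
	uint64_t rx_invalid_descs;
	uint64_t tx_invalid_descs;
	uint64_t rx_ring_full;
	uint64_t rx_fill_ring_empty_descs;
	uint64_t tx_ring_empty_descs;
};

/* Six equally sized members: the struct has no padding holes. */
_Static_assert(sizeof(struct mock_xdp_statistics) == 6 * sizeof(uint64_t),
	       "unexpected padding in mock_xdp_statistics");

/* Fill *out the way a getsockopt handler might; memcpy models the copy-out. */
static void mock_fill_stats(struct mock_xdp_statistics *out, int extra_stats)
{
	struct mock_xdp_statistics stats = {0};	/* no stale stack bytes */

	stats.rx_dropped = 1;
	stats.rx_invalid_descs = 2;
	stats.tx_invalid_descs = 3;
	if (extra_stats) {	/* assumed split, for illustration only */
		stats.rx_ring_full = 4;
		stats.rx_fill_ring_empty_descs = 5;
		stats.tx_ring_empty_descs = 6;
	}
	memcpy(out, &stats, sizeof(stats));
}
```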

bpf: Fix swapped arguments in calls to check_buffer_access · f6dfbe31

Authored by Colin Ian King on Jul 27, 2020

There are a couple of arguments of the boolean flag zero_size_allowed and
the char pointer buf_info when calling to function check_buffer_access that
are swapped by mistake. Fix these by swapping them to correct the argument
ordering.

Fixes: afbf21dc ("bpf: Support readonly/readwrite buffers in verifier")
Addresses-Coverity: ("Array compared to 0")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200727175411.155179-1-colin.king@canonical.com
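
A simplified mock of the argument order in question. `mock_check_buffer_access` is a hypothetical stand-in, not the verifier's real function: it only shows the boolean flag preceding the string, and the kind of message that depends on `buf_info` being a real string rather than a swapped-in flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical, simplified check: reject negative offsets and, as a toy rule,
 * zero-sized accesses unless zero_size_allowed is set. On rejection a message
 * naming the buffer kind is written to msg.
 */
static int mock_check_buffer_access(int off, int size, bool zero_size_allowed,
				    const char *buf_info,
				    char *msg, size_t msg_len)
{
	assert(buf_info != NULL);	/* a swapped-in flag would not be a string */

	if (off < 0) {
		snprintf(msg, msg_len,
			 "invalid %s buffer access: off=%d, size=%d\n",
			 buf_info, off, size);
		return -EACCES;
	}
	if (size == 0 && !zero_size_allowed) {
		snprintf(msg, msg_len, "invalid zero-sized %s buffer access\n",
			 buf_info);
		return -EACCES;
	}
	if (msg_len)
		msg[0] = '\0';
	return 0;
}
```

Because C converts silently between a string pointer and a boolean, a call with the two arguments exchanged may still compile, which is presumably how the swap went unnoticed until Coverity flagged it.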

selftests/bpf: Add new bpf_iter context structs to fix build on old kernels · 363885d7

Authored by Andrii Nakryiko on Jul 27, 2020

Add bpf_iter__bpf_map_elem and bpf_iter__bpf_sk_storage_map to bpf_iter.h.

Fixes: 3b1c420b ("selftests/bpf: Add a test for bpf sk_storage_map iterator")
Fixes: 2a7c2fff ("selftests/bpf: Add test for bpf hash map iterators")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200727233345.1686358-1-andriin@fb.com

bpf: Fix bpf_ringbuf_output() signature to return long · e1613b57

Authored by Andrii Nakryiko on Jul 27, 2020

Due to bpf tree fix merge, bpf_ringbuf_output() signature ended up with int as
a return type, while all other helpers got converted to returning long. So fix
it in bpf-next now.

Fixes: b0659d8a ("bpf: Fix definition of bpf_ringbuf_output() helper in UAPI comments")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200727224715.652037-1-andriin@fb.com

tools, bpftool: Add LSM type to array of prog names · 9a97c9d2

Authored by Quentin Monnet on Jul 24, 2020

Assign "lsm" as a printed name for BPF_PROG_TYPE_LSM in bpftool, so that
it can use it when listing programs loaded on the system or when probing
features.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200724090618.16378-3-quentin@isovalent.com

tools, bpftool: Skip type probe if name is not found · 70cfab1d

Authored by Quentin Monnet on Jul 24, 2020

For probing program and map types, bpftool loops on type values and uses
the relevant type name in prog_type_name[] or map_type_name[]. To ensure
the name exists, we exit from the loop if we go over the size of the
array.

However, this is not enough in the case where the arrays have "holes" in
them, program or map types for which they have no name, but not at the
end of the list. This is currently the case for BPF_PROG_TYPE_LSM, not
known to bpftool and whose name is a null string. When probing for
features, bpftool attempts to strlen() that name and segfaults.

Let's fix it by skipping probes for "unknown" program and map types,
with an informational message giving the numerical value in that case.

Fixes: 93a3545d ("tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type")
Reported-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200724090618.16378-2-quentin@isovalent.com
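
The failure mode and the fix can be sketched with a toy name table. The enum, the array, and `toy_probe_prog_types` are hypothetical; they only model a designated-initializer array with a hole in the middle, where bounding the loop by array size is not sufficient.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum toy_prog_type {
	TOY_PROG_TYPE_UNSPEC,
	TOY_PROG_TYPE_XDP,
	TOY_PROG_TYPE_LSM,		/* deliberately has no name below */
	TOY_PROG_TYPE_SK_LOOKUP,
};

/* Unlisted indexes are NULL: a "hole" that is not at the end of the list. */
static const char * const toy_prog_type_name[] = {
	[TOY_PROG_TYPE_UNSPEC]		= "unspec",
	[TOY_PROG_TYPE_XDP]		= "xdp",
	[TOY_PROG_TYPE_SK_LOOKUP]	= "sk_lookup",
};

/* Probe every named type, skipping holes; returns the number probed. */
static size_t toy_probe_prog_types(void)
{
	size_t probed = 0;
	unsigned int i;

	for (i = TOY_PROG_TYPE_UNSPEC + 1; i < ARRAY_SIZE(toy_prog_type_name); i++) {
		const char *name = toy_prog_type_name[i];

		if (!name) {
			/* strlen(NULL) would crash here without this check */
			printf("program type %u is unknown, skipping probe\n", i);
			continue;
		}
		printf("probing program type %s (name length %zu)\n",
		       name, strlen(name));
		probed++;
	}
	return probed;
}
```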

26 Jul, 2020 18 commits

Merge branch 'bpf_link-XDP' · 47960ad6

Authored by Alexei Starovoitov on Jul 25, 2020

Andrii Nakryiko says:

====================
Following cgroup and netns examples, implement bpf_link support for XDP.

The semantics is described in patch #2. Program and link attachments are
mutually exclusive, in the sense that neither link can replace attached
program nor program can replace attached link. Link can't replace attached
link as well, as is the case for any other bpf_link implementation.

Patch #1 refactors existing BPF program-based attachment API and centralizes
high-level query/attach decisions in generic kernel code, while drivers are
kept simple and are instructed with low-level decisions about attaching and
detaching specific bpf_prog. This also makes QUERY command unnecessary, and
patch #8 removes support for it from all kernel drivers. If that's a bad idea,
we can drop that patch altogether.

With refactoring in patch #1, adding bpf_xdp_link is completely transparent to
drivers, they are still functioning at the level of "effective" bpf_prog, that
should be called in XDP data path.

Corresponding libbpf support for BPF XDP link is added in patch #5.

v3->v4:
- fix a compilation warning in one of drivers (Jakub);

v2->v3:
- fix build when CONFIG_BPF_SYSCALL=n (kernel test robot);

v1->v2:
- fix prog refcounting bug (David);
- split dev_change_xdp_fd() changes into 2 patches (David);
- add extack messages to all user-induced errors (David).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands · e8407fde

Authored by Andrii Nakryiko on Jul 21, 2020

Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.

This patch removes all the implementations of those commands in the kernel, along
with xdp_attachment_query().

This patch was compile-tested on allyesconfig.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com

selftests/bpf: Add BPF XDP link selftests · fe48230c

Authored by Andrii Nakryiko on Jul 21, 2020

Add selftest validating all the attachment logic around BPF XDP link. Test
also link updates and get_obj_info() APIs.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-9-andriin@fb.com

libbpf: Add support for BPF XDP link · dc8698ca

Authored by Andrii Nakryiko on Jul 21, 2020

Sync UAPI header and add support for using bpf_link-based XDP attachment.
Make xdp/ prog type set expected attach type. Kernel didn't enforce
attach_type for XDP programs before, so there are no backwards compatibility
issues there.

Also fix section_names selftest to recognize that xdp prog types now have
expected attach type.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-8-andriin@fb.com

bpf: Implement BPF XDP link-specific introspection APIs · c1931c97

Authored by Andrii Nakryiko on Jul 21, 2020

Implement XDP link-specific show_fdinfo and link_info to emit ifindex.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-7-andriin@fb.com

bpf, xdp: Implement LINK_UPDATE for BPF XDP link · 026a4c28

Authored by Andrii Nakryiko on Jul 21, 2020

Add support for LINK_UPDATE command for BPF XDP link to enable reliable
replacement of underlying BPF program.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-6-andriin@fb.com

bpf, xdp: Add bpf_link-based XDP attachment API · aa8d3a71

Authored by Andrii Nakryiko on Jul 21, 2020

Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through
BPF_LINK_CREATE command.

bpf_xdp_link is mutually exclusive with direct BPF program attachment,
previous BPF program should be detached prior to attempting to create a new
bpf_xdp_link attachment (for a given XDP mode). Once BPF link is attached, it
can't be replaced by other BPF program attachment or link attachment. It will
be detached only when the last BPF link FD is closed.

bpf_xdp_link will be auto-detached when net_device is shutdown, similarly to
how other BPF links behave (cgroup, flow_dissector). At that point bpf_link
will become defunct, but won't be destroyed until last FD is closed.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com
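
The exclusivity rules above can be modelled with a toy attachment slot for a single XDP mode. Everything here is hypothetical (`struct toy_xdp_slot`, the helper names, and the choice of `-EBUSY` as the error); it is a sketch of the described semantics, not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One XDP mode on one device: holds either a plain program or a link. */
struct toy_xdp_slot {
	const void *prog;
	const void *link;
};

/* Direct program attachment: refused while a link owns the slot. */
static int toy_attach_prog(struct toy_xdp_slot *slot, const void *prog)
{
	if (slot->link)
		return -EBUSY;	/* a program can't replace an attached link */
	slot->prog = prog;	/* program-over-program replacement allowed here */
	return 0;
}

static void toy_detach_prog(struct toy_xdp_slot *slot)
{
	slot->prog = NULL;
}

/* Link attachment: refused if anything at all is already attached. */
static int toy_attach_link(struct toy_xdp_slot *slot, const void *link)
{
	if (slot->prog || slot->link)
		return -EBUSY;	/* a link replaces neither program nor link */
	slot->link = link;
	return 0;
}

/* Models auto-detach or closing the last link FD. */
static void toy_detach_link(struct toy_xdp_slot *slot)
{
	slot->link = NULL;
}
```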

bpf, xdp: Extract common XDP program attachment logic · d4baa936

Authored by Andrii Nakryiko on Jul 21, 2020

Further refactor XDP attachment code. dev_change_xdp_fd() is split into two
parts: getting bpf_progs from FDs and attachment logic, working with
bpf_progs. This makes attachment logic a bit more straightforward and
prepares code for bpf_xdp_link inclusion, which will share the common logic.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-4-andriin@fb.com

bpf, xdp: Maintain info on attached XDP BPF programs in net_device · 7f0a8382

Authored by Andrii Nakryiko on Jul 21, 2020

Instead of delegating to drivers, maintain information about which BPF
programs are attached in which XDP modes (generic/skb, driver, or hardware)
locally in net_device. This effectively obsoletes XDP_QUERY_PROG command.

Such re-organization simplifies existing code already. But it also allows to
further add bpf_link-based XDP attachments without drivers having to know
about any of this at all, which seems like a good setup.
XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to driver to
install/uninstall active BPF program. All the higher-level concerns about
prog/link interaction will be contained within generic driver-agnostic logic.

All the XDP_QUERY_PROG calls to driver in dev_xdp_uninstall() were removed.
It's not clear to me why dev_xdp_uninstall() was passing previous prog_flags
when resetting installed programs. That seems unnecessary, plus most drivers
don't populate prog_flags anyways. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW
should be enough of an indicator of what is required of driver to correctly
reset active BPF program. dev_xdp_uninstall() is also generalized as an
iteration over all three supported modes.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com

bpf: Make bpf_link API available independently of CONFIG_BPF_SYSCALL · 6cc7d1e8

Authored by Andrii Nakryiko on Jul 21, 2020

Similarly to bpf_prog, make bpf_link and related generic API available
unconditionally to make it easier to have bpf_link support in various parts of
the kernel. Stub out init/prime/settle/cleanup and inc/put APIs.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-2-andriin@fb.com

bpf: Fix build on architectures with special bpf_user_pt_regs_t · 2b9b305f

Authored by Song Liu on Jul 24, 2020

Architectures like s390, powerpc, arm64, riscv have special definitions of
bpf_user_pt_regs_t. So we need to cast the pointer before passing it to
bpf_get_stack(). This is similar to bpf_get_stack_tp().

Fixes: 03d42fd2d83f ("bpf: Separate bpf_get_[stack|stackid] for perf events BPF")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200724200503.3629591-1-songliubraving@fb.com

bpf/local_storage: Fix build without CONFIG_CGROUP · dfcdf0e9

Authored by YiFei Zhu on Jul 24, 2020

local_storage.o has its compile guard as CONFIG_BPF_SYSCALL, which
does not imply that CONFIG_CGROUP is on. Including cgroup-internal.h
when CONFIG_CGROUP is off causes a compilation failure.

Fixes: f67cfc233706 ("bpf: Make cgroup storages shared between programs on the same cgroup")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200724211753.902969-1-zhuyifei1999@gmail.com

Merge branch 'shared-cgroup-storage' · 36f72484

Authored by Alexei Starovoitov on Jul 23, 2020

YiFei Zhu says:

====================
To access the storage in a CGROUP_STORAGE map, one uses
bpf_get_local_storage helper, which is extremely fast due to its
use of per-CPU variables. However, its whole code is built on
the assumption that one map can only be used by one program at any
time, and this prohibits any sharing of data between multiple
programs using these maps, eliminating a lot of use cases, such
as some per-cgroup configuration storage, written to by a
setsockopt program and read by a cg_sock_addr program.

Why not use other map types? The great part of CGROUP_STORAGE map
is that it is isolated by different cgroups it's attached to. When
one program uses bpf_get_local_storage, even on the same map, it
gets different storages if it were run as a result of attaching
to different cgroups. The kernel manages the storages, simplifying
BPF program or userspace. In theory, one could probably use other
maps like array or hash to do the same thing, but it would be a
major overhead / complexity. Userspace needs to know when a cgroup
is being freed in order to free up a space in the replacement map.

This patch set introduces a significant change to the semantics of
CGROUP_STORAGE map type. Instead of each storage being tied to one
single attachment, it is shared across different attachments to
the same cgroup, and persists until either the map or the cgroup
attached to is being freed.

User may use u64 as the key to the map, and the result would be
that the attach type become ignored during key comparison, and
programs of different attach types will share the same storage if
the cgroups they are attached to are the same.

How could this break existing users?
* Users that use detach & reattach / program replacement as a
  shortcut to zeroing the storage. Since we need sharing between
  programs, we cannot zero the storage. Users that expect this
  behavior should either attach a program with a new map, or
  explicitly zero the map with a syscall.
This case is dependent on undocumented implementation details,
so the impact should be very minimal.

Patch 1 introduces a test on the old expected behavior of the map
type.

Patch 2 introduces a test showing how two programs cannot share
one such map.

Patch 3 implements the change of semantics to the map.

Patch 4 amends the new test such that it yields the behavior we
expect from the change.

Patch 5 documents the map type.

Changes since RFC:
* Clarify commit message in patch 3 such that it says the lifetime
  of the storage is ended at the freeing of the cgroup_bpf, rather
  than the cgroup itself.
* Restored an -ENOMEM check in __cgroup_bpf_attach.
* Update selftests for recent change in network_helpers API.

Changes since v1:
* s/CHECK_FAIL/CHECK/
* s/bpf_prog_attach/bpf_program__attach_cgroup/
* Moved test__start_subtest to test_cg_storage_multi.
* Removed some redundant CHECK_FAIL where they are already CHECK-ed.

Changes since v2:
* Lock cgroup_mutex during map_free.
* Publish new storages only if attach is successful, by tracking
  exactly which storages are reused in an array of bools.
* Mention bpftool map dump showing a value of zero for attach_type
  in patch 3 commit message.

Changes since v3:
* Use a much simpler lookup and allocate-if-not-exist from the fact
  that cgroup_mutex is locked during attach.
* Removed an unnecessary spinlock hold.

Changes since v4:
* Changed semantics so that if the key type is struct
  bpf_cgroup_storage_key the map retains isolation between different
  attach types. Sharing between different attach types only occurs
  when key type is u64.
* Adapted tests and docs for the above change.

Changes since v5:
* Removed redundant NULL check before bpf_link__destroy.
* Free BPF object explicitly, after asserting that object failed to
  load, in the event that the object did not fail to load.
* Rename variable in bpf_cgroup_storage_key_cmp for clarity.
* Added a lot of information to Documentation, more or less copied
  from what Martin KaFai Lau wrote.
====================
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Documentation/bpf: Document CGROUP_STORAGE map type · 4e15f460

Authored by YiFei Zhu on Jul 23, 2020

The mechanics and usage are not very straightforward. Given the
changes it's better to document how it works and how to use it,
rather than having to rely on the examples and implementation to
infer what is going on.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/b412edfbb05cb1077c9e2a36a981a54ee23fa8b3.1595565795.git.zhuyifei@google.com

selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress · 3573f384

Authored by YiFei Zhu on Jul 23, 2020

This mirrors the original egress-only test. The cgroup_storage is
now extended to have two packet counters, one for egress and one
for ingress. We also extend to have two egress programs to test
that egress will always share with other egress programs in the
same cgroup. The behavior of the counters is exactly the same as
the original egress-only test.

The test is split into two, one "isolated" test that when the key
type is struct bpf_cgroup_storage_key, which contains the attach
type, programs of different attach types will see different
storages. The other, "shared" test that when the key type is u64,
programs of different attach types will see the same storage if
they are attached to the same cgroup.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/c756f5f1521227b8e6e90a453299dda722d7324d.1595565795.git.zhuyifei@google.com
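
The "isolated" versus "shared" behavior comes down to whether the attach type takes part in key comparison. A toy model follows, with hypothetical names (`toy_storage_key`, `toy_key_cmp`) and a field layout that only approximates the UAPI `struct bpf_cgroup_storage_key`.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation of the storage key: cgroup identity plus attach type. */
struct toy_storage_key {
	uint64_t cgroup_inode_id;
	uint32_t attach_type;
};

enum toy_key_mode {
	TOY_KEY_STRUCT,	/* map declared with the struct key: isolated */
	TOY_KEY_U64,	/* map declared with a u64 key: shared */
};

/* Returns 0 when both keys refer to the same storage under the given mode. */
static int toy_key_cmp(enum toy_key_mode mode,
		       const struct toy_storage_key *a,
		       const struct toy_storage_key *b)
{
	if (a->cgroup_inode_id != b->cgroup_inode_id)
		return a->cgroup_inode_id < b->cgroup_inode_id ? -1 : 1;
	if (mode == TOY_KEY_STRUCT && a->attach_type != b->attach_type)
		return a->attach_type < b->attach_type ? -1 : 1;
	return 0;	/* attach type ignored for u64 keys */
}
```

Under this model an egress and an ingress program on the same cgroup resolve to different storages with the struct key and to one storage with the u64 key.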

Merge branch 'fix-bpf_get_stack-with-PEBS' · 90065c06

Authored by Alexei Starovoitov on Jul 23, 2020

Song Liu says:

====================
Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the perf subsystem fetches the callchain
early and marks such perf_events with __PERF_SAMPLE_CALLCHAIN_EARLY.
A similar issue exists when a BPF program calls get_perf_callchain() via
helper functions. For more information about this issue, please refer to
the discussions in [1].

This set fixes this issue with helper proto bpf_get_stackid_pe and
bpf_get_stack_pe.

[1] https://lore.kernel.org/lkml/ED7B9430-6489-4260-B3C5-9CFA2E3AA87A@fb.com/

Changes v4 => v5:
1. Return -EPROTO instead of -EINVAL on PERF_EVENT_IOC_SET_BPF errors.
   (Alexei)
2. Let libbpf print a hint message when PERF_EVENT_IOC_SET_BPF returns
   -EPROTO. (Alexei)

Changes v3 => v4:
1. Fix error check logic in bpf_get_stackid_pe and bpf_get_stack_pe.
   (Alexei)
2. Do not allow attaching BPF programs with bpf_get_stack|stackid to a
   perf_event with precise_ip > 0 but without a proper callchain. (Alexei)
3. Add selftest get_stackid_cannot_attach.

Changes v2 => v3:
1. Fix handling of stackmap skip field. (Andrii)
2. Simplify the code in a few places. (Andrii)

Changes v1 => v2:
1. Simplify the design and avoid introducing new helper function. (Andrii)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

90065c06

bpf: Make cgroup storages shared between programs on the same cgroup · 7d9c3427

Committed by YiFei Zhu on Jul 23, 2020

This change comes in several parts:

First, the restriction that the CGROUP_STORAGE map can only be used
by one program is removed. This results in the removal of the field
'aux' in struct bpf_cgroup_storage_map, and removal of relevant
code associated with the field, and removal of now-noop functions
bpf_free_cgroup_storage and bpf_cgroup_storage_release.

Second, we permit a key of type u64 as the key to the map.
Providing such a key type indicates that the map should ignore
attach type when comparing map keys. However, for simplicity newly
linked storage will still have the attach type at link time in
its key struct. cgroup_storage_check_btf is adapted to accept
u64 as the type of the key.
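
The key rule described above can be sketched as a small comparison function. This is only an illustrative model, not the kernel's code: `struct sketch_storage_key` is a hypothetical stand-in modeled on struct bpf_cgroup_storage_key, and `enum sketch_key_mode` and `sketch_key_cmp` are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the map key: a cgroup id plus an attach type. */
struct sketch_storage_key {
	uint64_t cgroup_inode_id;
	uint32_t attach_type;
};

/* Hypothetical flag recording how the map declared its key type. */
enum sketch_key_mode {
	SKETCH_KEY_STRUCT, /* key declared as the full struct: isolated per attach type */
	SKETCH_KEY_U64,    /* key declared as u64: attach type is ignored */
};

/* Returns <0, 0 or >0, like memcmp. With a u64 key only the cgroup id counts. */
static int sketch_key_cmp(enum sketch_key_mode mode,
			  const struct sketch_storage_key *a,
			  const struct sketch_storage_key *b)
{
	assert(a && b);

	if (a->cgroup_inode_id != b->cgroup_inode_id)
		return a->cgroup_inode_id < b->cgroup_inode_id ? -1 : 1;

	/* Shared mode: same cgroup means same storage, whatever the attach type. */
	if (mode == SKETCH_KEY_U64)
		return 0;

	if (a->attach_type != b->attach_type)
		return a->attach_type < b->attach_type ? -1 : 1;
	return 0;
}
```

Under this model, two keys for the same cgroup with different attach types compare equal in u64 mode and unequal in struct mode, even though both keys still carry an attach type in their struct.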

Third, because the storages are now shared, the storages cannot
be unconditionally freed on program detach. There could be two
ways to solve this issue:
* A. Reference count the usage of the storages, and free when the
last program is detached.
* B. Free only when the storage can no longer be referred to,
i.e. when either the cgroup_bpf it is attached to, or
the map itself, is freed.
Option A has the side effect that, when the user detaches and
reattaches a program, whether the program gets a fresh storage
depends on whether another program using that storage is still
attached. This could trigger races if the user is multi-threaded,
and since nondeterminism in data races is evil, go with option B.

Both the map and the cgroup_bpf now track their associated
storages, and the storage unlink and free are removed from
cgroup_bpf_detach and added to cgroup_bpf_release and
cgroup_storage_map_free. The latter also now holds the cgroup_mutex
to prevent any races with the former.

Fourth, on attach, we reuse the old storage if the key already
exists in the map, via cgroup_storage_lookup. If the storage
does not exist yet, we create a new one, and publish it at the
last step in the attach process. This does not create a race
condition because for the whole attach the cgroup_mutex is held.
We keep track of the newly allocated storages in an array, and
if the attach fails only those new storages are freed.
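
The reuse-or-allocate pattern with a late publish step might look roughly like the sketch below. It is a toy single-threaded model under stated assumptions, not the kernel implementation: `toy_map`, `toy_storage`, `toy_attach` and the `fail_at` parameter (which simulates an allocation failure) are all hypothetical, keys are assumed to be distinct small integers, and the cgroup_mutex that serializes the real attach path is not modeled.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_SLOTS 4 /* hypothetical capacity of the toy map */

struct toy_storage { long counter; };

/* Published storages, indexed by a small integer key. */
struct toy_map { struct toy_storage *slot[TOY_SLOTS]; };

/*
 * Attach using the storages named by keys[0..nkeys-1] (assumed distinct).
 * fail_at is a test hook: the allocation for keys[fail_at] fails (-1 = never).
 * Returns 0 on success, -1 on failure.
 */
static int toy_attach(struct toy_map *map, const int *keys, int nkeys,
		      int fail_at)
{
	struct toy_storage *fresh[TOY_SLOTS] = { NULL };
	int i;

	assert(nkeys >= 0 && nkeys <= TOY_SLOTS);

	for (i = 0; i < nkeys; i++) {
		assert(keys[i] >= 0 && keys[i] < TOY_SLOTS);
		if (map->slot[keys[i]])
			continue; /* storage already exists: reuse it */
		fresh[i] = (i == fail_at) ? NULL : calloc(1, sizeof(*fresh[i]));
		if (!fresh[i])
			goto rollback;
	}

	/* Last step: publish only the storages allocated by this attach. */
	for (i = 0; i < nkeys; i++)
		if (fresh[i])
			map->slot[keys[i]] = fresh[i];
	return 0;

rollback:
	/* Free only what this attach allocated; reused storages are untouched. */
	for (i = 0; i < nkeys; i++)
		free(fresh[i]);
	return -1;
}
```

Because nothing is published until every allocation has succeeded, a failed attach leaves the map exactly as it was, and a storage that was already shared with another program is never freed by the failure path.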
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/d5401c6106728a00890401190db40020a1f84ff1.1595565795.git.zhuyifei@google.com

7d9c3427

selftests/bpf: Add get_stackid_cannot_attach · 346938e9

Committed by Song Liu on Jul 23, 2020

This test confirms that a BPF program that calls bpf_get_stackid() cannot
attach to a perf_event with precise_ip > 0 but without PERF_SAMPLE_CALLCHAIN,
and cannot attach if the perf_event has exclude_callchain_kernel set.
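
The rule being tested can be modeled as a small predicate. This is a simplified sketch, not the kernel's check: `struct sketch_event_attr`, `SKETCH_SAMPLE_CALLCHAIN` and `sketch_can_attach` are hypothetical names, the flag bit value is arbitrary, and applying the exclude_callchain_kernel condition only to events with precise_ip > 0 is an assumption about how the two conditions combine. The -EPROTO return value follows the v5 changelog of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for PERF_SAMPLE_CALLCHAIN. */
#define SKETCH_SAMPLE_CALLCHAIN (1u << 0)

/* Hypothetical, reduced view of the perf event attributes that matter here. */
struct sketch_event_attr {
	unsigned int precise_ip;
	uint64_t sample_type;
	bool exclude_callchain_kernel;
};

/* Returns 0 if the attach is allowed, -EPROTO if it must be rejected. */
static int sketch_can_attach(bool prog_calls_get_stack,
			     const struct sketch_event_attr *attr)
{
	assert(attr);

	/* Programs that never ask for a stack are not affected. */
	if (!prog_calls_get_stack)
		return 0;

	/* Assumption: only precise (PEBS-style) events need the extra checks. */
	if (attr->precise_ip == 0)
		return 0;

	/* A precise event must have requested a callchain up front. */
	if (!(attr->sample_type & SKETCH_SAMPLE_CALLCHAIN))
		return -EPROTO;

	/* The kernel part of the callchain must not be excluded. */
	if (attr->exclude_callchain_kernel)
		return -EPROTO;

	return 0;
}
```

With this model, a precise event without the callchain flag is rejected, the same event with the flag is accepted, and setting exclude_callchain_kernel on it causes a rejection again.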
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-6-songliubraving@fb.com

346938e9
