提交 · fcf85729d8efed1c756842ddb27e84f5f4ebc501 · openeuler / Kernel

29 4月, 2018 14 次提交

Merge branch 'fix-bpf-helpers-doc' · fcf85729

由 Alexei Starovoitov 提交于 4月 29, 2018

Andrey Ignatov says:

====================
BPF helpers documentation in UAPI refers to kernel ctx structures when it
has to refer to user visible ones. Fix it.
====================
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

fcf85729

bpf: Sync bpf.h to tools/ · 96871b9f

由 Andrey Ignatov 提交于 4月 28, 2018

The patch syncs bpf.h to tools/.
Signed-off-by: NAndrey Ignatov <rdna@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

96871b9f

bpf: Fix helpers ctx struct types in uapi doc · a3ef8e9a

由 Andrey Ignatov 提交于 4月 28, 2018

Helpers may operate on two types of ctx structures: user visible ones
(e.g. `struct bpf_sock_ops`) when used in user programs, and kernel ones
(e.g. `struct bpf_sock_ops_kern`) in kernel implementation.

UAPI documentation must refer to only user visible structures.

The patch replaces references to `_kern` structures in BPF helpers
description by corresponding user visible structures.
Signed-off-by: NAndrey Ignatov <rdna@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

a3ef8e9a

Merge branch 'bpf_get_stack' · f60ad0a0

由 Alexei Starovoitov 提交于 4月 29, 2018

Yonghong Song says:

====================
Currently, stackmap and bpf_get_stackid helper are provided
for bpf program to get the stack trace. This approach has
a limitation though. If two stack traces have the same hash,
only one will get stored in the stackmap table regardless of
whether BPF_F_REUSE_STACKID is specified or not,
so some stack traces may be missing from user perspective.

This patch implements a new helper, bpf_get_stack, will
send stack traces directly to bpf program. The bpf program
is able to see all stack traces, and then can do in-kernel
processing or send stack traces to user space through
shared map or bpf_perf_event_output.

Patches #1 and #2 implemented the core kernel support.
Patch #3 removes two never-hit branches in verifier.
Patches #4 and #5 are two verifier improves to make
bpf programming easier. Patch #6 synced the new helper
to tools headers. Patch #7 moved perf_event polling code
and ksym lookup code from samples/bpf to
tools/testing/selftests/bpf. Patch #8 added a verifier
test in tools/bpf for new verifier change.
Patches #9 and #10 added tests for raw tracepoint prog
and tracepoint prog respectively.

Changelogs:
  v8 -> v9:
    . make function perf_event_mmap (in trace_helpers.c) extern
      to decouple perf_event_mmap and perf_event_poller.
    . add jit enabled handling for kernel stack verification
      in Patch #9. Since we did not have a good way to
      verify jit enabled kernel stack, just return true if
      the kernel stack is not empty.
    . In path #9, using raw_syscalls/sys_enter instead of
      sched/sched_switch, removed calling cmd
      "task 1 dd if=/dev/zero of=/dev/null" which is left
      with dangling process after the program exited.
  v7 -> v8:
    . rebase on top of latest bpf-next
    . simplify BPF_ARSH dst_reg->smin_val/smax_value tracking
    . rewrite the description of bpf_get_stack() in uapi bpf.h
      based on new format.
  v6 -> v7:
    . do perf callchain buffer allocation inside the
      verifier. so if the prog->has_callchain_buf is set,
      it is guaranteed that the buffer has been allocated.
    . change condition "trace_nr <= skip" to "trace_nr < skip"
      so that for zero size buffer, return 0 instead of -EFAULT
  v5 -> v6:
    . after refining return register smax_value and umax_value
      for helpers bpf_get_stack and bpf_probe_read_str,
      bounds and var_off of the return register are further refined.
    . added missing commit message for tools header sync commit.
    . removed one unnecessary empty line.
  v4 -> v5:
    . relied on dst_reg->var_off to refine umin_val/umax_val
      in verifier handling BPF_ARSH value range tracking,
      suggested by Edward.
  v3 -> v4:
    . fixed a bug when meta ptr is set to NULL in check_func_arg.
    . introduced tnum_arshift and added detailed comments for
      the underlying implementation
    . avoided using VLA in tools/bpf test_progs.
  v2 -> v3:
    . used meta to track helper memory size argument
    . implemented range checking for ARSH in verifier
    . moved perf event polling and ksym related functions
      from samples/bpf to tools/bpf
    . added test to compare build id's between bpf_get_stackid
      and bpf_get_stack
  v1 -> v2:
    . fixed compilation error when CONFIG_PERF_EVENTS is not enabled
====================
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

f60ad0a0

tools/bpf: add a test for bpf_get_stack with tracepoint prog · 79b45350

由 Yonghong Song 提交于 4月 28, 2018

The test_stacktrace_map and test_stacktrace_build_id are
enhanced to call bpf_get_stack in the helper to get the
stack trace as well.  The stack traces from bpf_get_stack
and bpf_get_stackid are compared to ensure that for the
same stack as represented as the same hash, their ip addresses
or build id's must be the same.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

79b45350

tools/bpf: add a test for bpf_get_stack with raw tracepoint prog · 173965fb

由 Yonghong Song 提交于 4月 28, 2018

The test attached a raw_tracepoint program to raw_syscalls/sys_enter.
It tested to get stack for user space, kernel space and user
space with build_id request. It also tested to get user
and kernel stack into the same buffer with back-to-back
bpf_get_stack helper calls.

If jit is not enabled, the user space application will check
to ensure that the kernel function for raw_tracepoint
___bpf_prog_run is part of the stack.

If jit is enabled, we did not have a reliable way to
verify the kernel stack, so just assume the kernel stack
is good when the kernel stack size is greater than 0.
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

173965fb

tools/bpf: add a verifier test case for bpf_get_stack helper and ARSH · 2abe611c

由 Yonghong Song 提交于 4月 28, 2018

The test_verifier already has a few ARSH test cases.
This patch adds a new test case which takes advantage of newly
improved verifier behavior for bpf_get_stack and ARSH.
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

2abe611c

samples/bpf: move common-purpose trace functions to selftests · 28dbf861

由 Yonghong Song 提交于 4月 28, 2018

There is no functionality change in this patch. The common-purpose
trace functions, including perf_event polling and ksym lookup,
are moved from trace_output_user.c and bpf_load.c to
selftests/bpf/trace_helpers.c so that these function can
be reused later in selftests.
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

28dbf861

tools/bpf: add bpf_get_stack helper to tools headers · de2ff05f

由 Yonghong Song 提交于 4月 28, 2018

The tools header file bpf.h is synced with kernel uapi bpf.h.
The new helper is also added to bpf_helpers.h.
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

de2ff05f

bpf/verifier: improve register value range tracking with ARSH · 9cbe1f5a

由 Yonghong Song 提交于 4月 28, 2018

When helpers like bpf_get_stack returns an int value
and later on used for arithmetic computation, the LSH and ARSH
operations are often required to get proper sign extension into
64-bit. For example, without this patch:
    54: R0=inv(id=0,umax_value=800)
    54: (bf) r8 = r0
    55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
    55: (67) r8 <<= 32
    56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
    56: (c7) r8 s>>= 32
    57: R8=inv(id=0)
With this patch:
    54: R0=inv(id=0,umax_value=800)
    54: (bf) r8 = r0
    55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
    55: (67) r8 <<= 32
    56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
    56: (c7) r8 s>>= 32
    57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
With better range of "R8", later on when "R8" is added to other register,
e.g., a map pointer or scalar-value register, the better register
range can be derived and verifier failure may be avoided.

In our later example,
    ......
    usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
    if (usize < 0)
        return 0;
    ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
    ......
Without improving ARSH value range tracking, the register representing
"max_len - usize" will have smin_value equal to S64_MIN and will be
rejected by verifier.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

9cbe1f5a

bpf: remove never-hit branches in verifier adjust_scalar_min_max_vals · afbe1a5b

由 Yonghong Song 提交于 4月 28, 2018

In verifier function adjust_scalar_min_max_vals,
when src_known is false and the opcode is BPF_LSH/BPF_RSH,
early return will happen in the function. So remove
the branch in handling BPF_LSH/BPF_RSH when src_known is false.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

afbe1a5b

bpf/verifier: refine retval R0 state for bpf_get_stack helper · 849fa506

由 Yonghong Song 提交于 4月 28, 2018

The special property of return values for helpers bpf_get_stack
and bpf_probe_read_str are captured in verifier.
Both helpers return a negative error code or
a length, which is equal to or smaller than the buffer
size argument. This additional information in the
verifier can avoid the condition such as "retval > bufsize"
in the bpf program. For example, for the code blow,
    usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
    if (usize < 0 || usize > max_len)
        return 0;
The verifier may have the following errors:
    52: (85) call bpf_get_stack#65
     R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
     R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
     R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
     R9_w=inv800 R10=fp0,call_-1
    53: (bf) r8 = r0
    54: (bf) r1 = r8
    55: (67) r1 <<= 32
    56: (bf) r2 = r1
    57: (77) r2 >>= 32
    58: (25) if r2 > 0x31f goto pc+33
     R0=inv(id=0) R1=inv(id=0,smax_value=9223372032559808512,
                         umax_value=18446744069414584320,
                         var_off=(0x0; 0xffffffff00000000))
     R2=inv(id=0,umax_value=799,var_off=(0x0; 0x3ff))
     R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
     R8=inv(id=0) R9=inv800 R10=fp0,call_-1
    59: (1f) r9 -= r8
    60: (c7) r1 s>>= 32
    61: (bf) r2 = r7
    62: (0f) r2 += r1
    math between map_value pointer and register with unbounded
    min value is not allowed
The failure is due to llvm compiler optimization where register "r2",
which is a copy of "r1", is tested for condition while later on "r1"
is used for map_ptr operation. The verifier is not able to track such
inst sequence effectively.

Without the "usize > max_len" condition, there is no llvm optimization
and the below generated code passed verifier:
    52: (85) call bpf_get_stack#65
     R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
     R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
     R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
     R9_w=inv800 R10=fp0,call_-1
    53: (b7) r1 = 0
    54: (bf) r8 = r0
    55: (67) r8 <<= 32
    56: (c7) r8 s>>= 32
    57: (6d) if r1 s> r8 goto pc+24
     R0=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
     R1=inv0 R6=ctx(id=0,off=0,imm=0)
     R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
     R8=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R9=inv800
     R10=fp0,call_-1
    58: (bf) r2 = r7
    59: (0f) r2 += r8
    60: (1f) r9 -= r8
    61: (bf) r1 = r6
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

849fa506

bpf: add bpf_get_stack helper · c195651e

由 Yonghong Song 提交于 4月 28, 2018

Currently, stackmap and bpf_get_stackid helper are provided
for bpf program to get the stack trace. This approach has
a limitation though. If two stack traces have the same hash,
only one will get stored in the stackmap table,
so some stack traces are missing from user perspective.

This patch implements a new helper, bpf_get_stack, will
send stack traces directly to bpf program. The bpf program
is able to see all stack traces, and then can do in-kernel
processing or send stack traces to user space through
shared map or bpf_perf_event_output.
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

c195651e

bpf: change prototype for stack_map_get_build_id_offset · 5f412632

由 Yonghong Song 提交于 4月 28, 2018

This patch didn't incur functionality change. The function prototype
got changed so that the same function can be reused later.
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

5f412632

27 4月, 2018 26 次提交

bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON · 2c25fc9a

由 Leo Yan 提交于 4月 27, 2018

When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
for JIT opcode dumping; this patch is to update the doc for it.
Suggested-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NLeo Yan <leo.yan@linaro.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

2c25fc9a

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 79741a38

由 David S. Miller 提交于 4月 26, 2018

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-04-27

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add extensive BPF helper description into include/uapi/linux/bpf.h
   and a new script bpf_helpers_doc.py which allows for generating a
   man page out of it. Thus, every helper in BPF now comes with proper
   function signature, detailed description and return code explanation,
   from Quentin.

2) Migrate the BPF collect metadata tunnel tests from BPF samples over
   to the BPF selftests and further extend them with v6 vxlan, geneve
   and ipip tests, simplify the ipip tests, improve documentation and
   convert to bpf_ntoh*() / bpf_hton*() api, from William.

3) Currently, helpers that expect ARG_PTR_TO_MAP_{KEY,VALUE} can only
   access stack and packet memory. Extend this to allow such helpers
   to also use map values, which enabled use cases where value from
   a first lookup can be directly used as a key for a second lookup,
   from Paul.

4) Add a new helper bpf_skb_get_xfrm_state() for tc BPF programs in
   order to retrieve XFRM state information containing SPI, peer
   address and reqid values, from Eyal.

5) Various optimizations in nfp driver's BPF JIT in order to turn ADD
   and SUB instructions with negative immediate into the opposite
   operation with a positive immediate such that nfp can better fit
   small immediates into instructions. Savings in instruction count
   up to 4% have been observed, from Jakub.

6) Add the BPF prog's gpl_compatible flag to struct bpf_prog_info
   and add support for dumping this through bpftool, from Jiri.

7) Move the BPF sockmap samples over into BPF selftests instead since
   sockmap was rather a series of tests than sample anyway and this way
   this can be run from automated bots, from John.

8) Follow-up fix for bpf_adjust_tail() helper in order to make it work
   with generic XDP, from Nikita.

9) Some follow-up cleanups to BTF, namely, removing unused defines from
   BTF uapi header and renaming 'name' struct btf_* members into name_off
   to make it more clear they are offsets into string section, from Martin.

10) Remove test_sock_addr from TEST_GEN_PROGS in BPF selftests since
    not run directly but invoked from test_sock_addr.sh, from Yonghong.

11) Remove redundant ret assignment in sample BPF loader, from Wang.

12) Add couple of missing files to BPF selftest's gitignore, from Anders.

There are two trivial merge conflicts while pulling:

  1) Remove samples/sockmap/Makefile since all sockmap tests have been
     moved to selftests.
  2) Add both hunks from tools/testing/selftests/bpf/.gitignore to the
     file since git should ignore all of them.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79741a38

samples, bpf: remove redundant ret assignment in bpf_load_program() · c0885f61

由 Wang Sheng-Hui 提交于 4月 25, 2018

2 redundant ret assignments removed:

* 'ret = 1' before the logic 'if (data_maps)', and if any errors jump to
  label 'done'. No 'ret = 1' needed before the error jump.

* After the '/* load programs */' part, if everything goes well, then
  the BPF code will be loaded and 'ret' set to 0 by load_and_attach().
  If something goes wrong, 'ret' set to none-O, the redundant 'ret = 0'
  after the for clause will make the error skipped.

  For example, if some BPF code cannot provide supported program types
  in ELF SEC("unknown"), the for clause will not call load_and_attach()
  to load the BPF code. 1 should be returned to callees instead of 0.
Signed-off-by: NWang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

c0885f61

Merge branch 'bpf-uapi-helper-doc' · a6712d45

由 Daniel Borkmann 提交于 4月 27, 2018

Quentin Monnet says:

====================
eBPF helper functions can be called from within eBPF programs to perform
a variety of tasks that would be otherwise hard or impossible to do with
eBPF itself. There is a growing number of such helper functions in the
kernel, but documentation is scarce. The main user space header file
does contain a short commented description of most helpers, but it is
somewhat outdated and not complete. It is more a "cheat sheet" than a
real documentation accessible to new eBPF developers.

This commit attempts to improve the situation by replacing the existing
overview for the helpers with a more developed description. Furthermore,
a Python script is added to generate a manual page for eBPF helpers. The
workflow is the following, and requires the rst2man utility:

    $ ./scripts/bpf_helpers_doc.py \
            --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
    $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
    $ man /tmp/bpf-helpers.7

The objective is to keep all documentation related to the helpers in a
single place, and to be able to generate from here a manual page that
could be packaged in the man-pages repository and shipped with most
distributions.

Additionally, parsing the prototypes of the helper functions could
hopefully be reused, with a different Printer object, to generate
header files needed in some eBPF-related projects.

Regarding the description of each helper, it comprises several items:

- The function prototype.
- A description of the function and of its arguments (except for a
  couple of cases, when there are no arguments and the return value
  makes the function usage really obvious).
- A description of return values (if not void).

Additional items such as the list of compatible eBPF program and map
types for each helper, Linux kernel version that introduced the helper,
GPL-only restriction, and commit hash could be added in the future, but
it was decided on the mailing list to leave them aside for now.

For several helpers, descriptions are inspired (at times, nearly copied)
from the commit logs introducing them in the kernel--Many thanks to
their respective authors! Some sentences were also adapted from comments
from the reviews, thanks to the reviewers as well. Descriptions were
completed as much as possible, the objective being to have something easily
accessible even for people just starting with eBPF. There is probably a bit
more work to do in this direction for some helpers.

Some RST formatting is used in the descriptions (not in function
prototypes, to keep them readable, but the Python script provided in
order to generate the RST for the manual page does add formatting to
prototypes, to produce something pretty) to get "bold" and "italics" in
manual pages. Hopefully, the descriptions in bpf.h file remains
perfectly readable. Note that the few trailing white spaces are
intentional, removing them would break paragraphs for rst2man.

The descriptions should ideally be updated each time someone adds a new
helper, or updates the behaviour (new socket option supported, ...) or
the interface (new flags available, ...) of existing ones.

To ease the review process, the documentation has been split into several
patches.

v3 -> v4:
- Add a patch (#9) for newly added BPF helpers.
- Add a patch (#10) to update UAPI bpf.h version under tools/.
- Use SPDX tag in Python script.
- Several fixes on man page header and footer, and helpers documentation.
  Please refer to individual patches for details.

RFC v2 -> PATCH v3:
Several fixes on man page header and footer, and helpers documentation.
Please refer to individual patches for details.

RFC v1 -> RFC v2:
- Remove "For" (compatible program and map types), "Since" (minimal
  Linux kernel version required), "GPL only" sections and commit hashes
  for the helpers.
- Add comment on top of the description list to explain how this
  documentation is supposed to be processed.
- Update Python script accordingly (remove the same sections, and remove
  paragraphs on program types and GPL restrictions from man page
  header).
- Split series into several patches.
====================
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org

a6712d45

bpf: update bpf.h uapi header for tools · 9cde0c88

由 Quentin Monnet 提交于 4月 25, 2018

Update tools/include/uapi/linux/bpf.h file in order to reflect the
changes for BPF helper functions documentation introduced in previous
commits.
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

9cde0c88

bpf: add documentation for eBPF helpers (65-66) · 2d020dd7

由 Quentin Monnet 提交于 4月 25, 2018

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helper from Nikita:
- bpf_xdp_adjust_tail()

Helper from Eyal:
- bpf_skb_get_xfrm_state()

v4:
- New patch (helpers did not exist yet for previous versions).

Cc: Nikita V. Shirokov <tehnerd@tehnerd.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

2d020dd7

bpf: add documentation for eBPF helpers (58-64) · ab127040

由 Quentin Monnet 提交于 4月 25, 2018

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by John:

- bpf_redirect_map()
- bpf_sk_redirect_map()
- bpf_sock_map_update()
- bpf_msg_redirect_map()
- bpf_msg_apply_bytes()
- bpf_msg_cork_bytes()
- bpf_msg_pull_data()

v4:
- bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED",
  "his" to "this". Also add a paragraph on performance improvement over
  bpf_redirect() helper.

v3:
- bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag.
- bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag.
- bpf_redirect_map(): Fix note on CPU redirection, not fully implemented
  for generic XDP but supported on native XDP.
- bpf_msg_pull_data(): Clarify comment about invalidated verifier
  checks.

Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

ab127040

bpf: add documentation for eBPF helpers (51-57) · 7aa79a86

由 Quentin Monnet 提交于 4月 25, 2018

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helpers from Lawrence:
- bpf_setsockopt()
- bpf_getsockopt()
- bpf_sock_ops_cb_flags_set()

Helpers from Yonghong:
- bpf_perf_event_read_value()
- bpf_perf_prog_read_value()

Helper from Josef:
- bpf_override_return()

Helper from Andrey:
- bpf_bind()

v4:
- bpf_perf_event_read_value(): State that this helper should be
  preferred over bpf_perf_event_read().

v3:
- bpf_perf_event_read_value(): Fix time of selection for perf event type
  in description. Remove occurences of "cores" to avoid confusion with
  "CPU".
- bpf_bind(): Remove last paragraph of description, which was off topic.

Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Andrey Ignatov <rdna@fb.com>
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NYonghong Song <yhs@fb.com>
[for bpf_perf_event_read_value(), bpf_perf_prog_read_value()]
Acked-by: NAndrey Ignatov <rdna@fb.com>
[for bpf_bind()]
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

7aa79a86

bpf: add documentation for eBPF helpers (42-50) · c6b5fb86

由 Quentin Monnet 提交于 4月 25, 2018

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helper from Kaixu:
- bpf_perf_event_read()

Helpers from Martin:
- bpf_skb_under_cgroup()
- bpf_xdp_adjust_head()

Helpers from Sargun:
- bpf_probe_write_user()
- bpf_current_task_under_cgroup()

Helper from Thomas:
- bpf_skb_change_head()

Helper from Gianluca:
- bpf_probe_read_str()

Helpers from Chenbo:
- bpf_get_socket_cookie()
- bpf_get_socket_uid()

v4:
- bpf_perf_event_read(): State that bpf_perf_event_read_value() should
  be preferred over this helper.
- bpf_skb_change_head(): Clarify comment about invalidated verifier
  checks.
- bpf_xdp_adjust_head(): Clarify comment about invalidated verifier
  checks.
- bpf_probe_write_user(): Add that dst must be a valid user space
  address.
- bpf_get_socket_cookie(): Improve description by making clearer that
  the cockie belongs to the socket, and state that it remains stable for
  the life of the socket.

v3:
- bpf_perf_event_read(): Fix time of selection for perf event type in
  description. Remove occurences of "cores" to avoid confusion with
  "CPU".

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Gianluca Borello <g.borello@gmail.com>
Cc: Chenbo Feng <fengc@google.com>
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
[for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()]
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

c6b5fb86

bpf: add documentation for eBPF helpers (33-41) · fa15601a

由 Quentin Monnet 提交于 4月 25, 2018

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Daniel:

- bpf_get_hash_recalc()
- bpf_skb_change_tail()
- bpf_skb_pull_data()
- bpf_csum_update()
- bpf_set_hash_invalid()
- bpf_get_numa_node_id()
- bpf_set_hash()
- bpf_skb_adjust_room()
- bpf_xdp_adjust_meta()

v4:
- bpf_skb_change_tail(): Clarify comment about invalidated verifier
  checks.
- bpf_skb_pull_data(): Clarify the motivation for using this helper or
  bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for
  *skb*. Clarify comment about invalidated verifier checks.
- bpf_csum_update(): Fix description of checksum (entire packet, not IP
  checksum). Fix a typo: "header" instead of "helper".
- bpf_set_hash_invalid(): Mention bpf_get_hash_recalc().
- bpf_get_numa_node_id(): State that the helper is not restricted to
  programs attached to sockets.
- bpf_skb_adjust_room(): Clarify comment about invalidated verifier
  checks.
- bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier
  checks.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

fa15601a

bpf: add documentation for eBPF helpers (23-32) · 1fdd08be

由 Quentin Monnet 提交于 4月 25, 2018

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Daniel:

- bpf_get_prandom_u32()
- bpf_get_smp_processor_id()
- bpf_get_cgroup_classid()
- bpf_get_route_realm()
- bpf_skb_load_bytes()
- bpf_csum_diff()
- bpf_skb_get_tunnel_opt()
- bpf_skb_set_tunnel_opt()
- bpf_skb_change_proto()
- bpf_skb_change_type()

v4:
- bpf_get_prandom_u32(): Warn that the prng is not cryptographically
  secure.
- bpf_get_smp_processor_id(): Fix a typo (case).
- bpf_get_cgroup_classid(): Clarify description. Add notes on the helper
  being limited to cgroup v1, and to egress path.
- bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid().
  Add a note about usage with TC and advantage of clsact. Fix a typo in
  return value ("sdb" instead of "skb").
- bpf_skb_load_bytes(): Make explicit loading large data loads it to the
  eBPF stack.
- bpf_csum_diff(): Add a note on seed that can be cascaded. Link to
  bpf_l3|l4_csum_replace().
- bpf_skb_get_tunnel_opt(): Add a note about usage with "collect
  metadata" mode, and example of this with Geneve.
- bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt()
  description.
- bpf_skb_change_proto(): Mention that the main use case is NAT64.
  Clarify comment about invalidated verifier checks.

v3:
- bpf_get_prandom_u32(): Fix helper name :(. Add description, including
  a note on the internal random state.
- bpf_get_smp_processor_id(): Add description, including a note on the
  processor id remaining stable during program run.
- bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
  required to use the helper. Add a reference to related documentation.
  State that placing a task in net_cls controller disables cgroup-bpf.
- bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is
  required to use this helper.
- bpf_skb_load_bytes(): Fix comment on current use cases for the helper.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

1fdd08be

bpf: add documentation for eBPF helpers (12-22) · c456dec4

由 Quentin Monnet 提交于 4月 25, 2018

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Alexei:

- bpf_get_current_pid_tgid()
- bpf_get_current_uid_gid()
- bpf_get_current_comm()
- bpf_skb_vlan_push()
- bpf_skb_vlan_pop()
- bpf_skb_get_tunnel_key()
- bpf_skb_set_tunnel_key()
- bpf_redirect()
- bpf_perf_event_output()
- bpf_get_stackid()
- bpf_get_current_task()

v4:
- bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add
  note on bpf_redirect_map() providing better performance. Replace "Save
  for" with "Except for".
- bpf_skb_vlan_push(): Clarify comment about invalidated verifier
  checks.
- bpf_skb_vlan_pop(): Clarify comment about invalidated verifier
  checks.
- bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata"
  mode, and example tunneling protocols with which it can be used.
- bpf_skb_set_tunnel_key(): Add a reference to the description of
  bpf_skb_get_tunnel_key().
- bpf_perf_event_output(): Specify that, and for what purpose, the
  helper can be used with programs attached to TC and XDP.

v3:
- bpf_skb_get_tunnel_key(): Change and improve description and example.
- bpf_redirect(): Improve description of BPF_F_INGRESS flag.
- bpf_perf_event_output(): Fix first sentence of description. Delete
  wrong statement on context being evaluated as a struct pt_reg. Remove
  the long yet incomplete example.
- bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
  configurable.

Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

c456dec4

bpf: add documentation for eBPF helpers (01-11) · ad4a5223

由 Quentin Monnet 提交于 4月 25, 2018

Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Alexei:

- bpf_map_lookup_elem()
- bpf_map_update_elem()
- bpf_map_delete_elem()
- bpf_probe_read()
- bpf_ktime_get_ns()
- bpf_trace_printk()
- bpf_skb_store_bytes()
- bpf_l3_csum_replace()
- bpf_l4_csum_replace()
- bpf_tail_call()
- bpf_clone_redirect()

v4:
- bpf_map_lookup_elem(): Add "const" qualifier for key.
- bpf_map_update_elem(): Add "const" qualifier for key and value.
- bpf_map_lookup_elem(): Add "const" qualifier for key.
- bpf_skb_store_bytes(): Clarify comment about invalidated verifier
  checks.
- bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note
  about bpf_csum_diff().
- bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a
  note about bpf_csum_diff().
- bpf_tail_call(): Bring minor edits to description.
- bpf_clone_redirect(): Add a note about the relation with
  bpf_redirect(). Also clarify comment about invalidated verifier
  checks.

v3:
- bpf_map_lookup_elem(): Fix description of restrictions for flags
  related to the existence of the entry.
- bpf_trace_printk(): State that trace_pipe can be configured. Fix
  return value in case an unknown format specifier is met. Add a note on
  kernel log notice when the helper is used. Edit example.
- bpf_tail_call(): Improve comment on stack inheritance.
- bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.

Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

ad4a5223

bpf: add script and prepare bpf.h for new helpers documentation · 56a092c8

由 Quentin Monnet 提交于 4月 25, 2018

Remove previous "overview" of eBPF helpers from user bpf.h header.
Replace it by a comment explaining how to process the new documentation
(to come in following patches) with a Python script to produce RST, then
man page documentation.

Also add the aforementioned Python script under scripts/. It is used to
process include/uapi/linux/bpf.h and to extract helper descriptions, to
turn it into a RST document that can further be processed with rst2man
to produce a man page. The script takes one "--filename <path/to/file>"
option. If the script is launched from scripts/ in the kernel root
directory, it should be able to find the location of the header to
parse, and "--filename <path/to/file>" is then optional. If it cannot
find the file, then the option becomes mandatory. RST-formatted
documentation is printed to standard output.

Typical workflow for producing the final man page would be:

    $ ./scripts/bpf_helpers_doc.py \
            --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
    $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
    $ man /tmp/bpf-helpers.7

Note that the tool kernel-doc cannot be used to document eBPF helpers,
whose signatures are not available directly in the header files
(pre-processor directives are used to produce them at the beginning of
the compilation process).

v4:
- Also remove overviews for newly added bpf_xdp_adjust_tail() and
  bpf_skb_get_xfrm_state().
- Remove vague statement about what helpers are restricted to GPL
  programs in "LICENSE" section for man page footer.
- Replace license boilerplate with SPDX tag for Python script.

v3:
- Change license for man page.
- Remove "for safety reasons" from man page header text.
- Change "packets metadata" to "packets" in man page header text.
- Move and fix comment on helpers introducing no overhead.
- Remove "NOTES" section from man page footer.
- Add "LICENSE" section to man page footer.
- Edit description of file include/uapi/linux/bpf.h in man page footer.
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

56a092c8

Merge branch 'bpf-tunnel-metadata-selftests' · 3f13de6d

由 Daniel Borkmann 提交于 4月 27, 2018

William Tu says:

====================
The patch series provide end-to-end eBPF tunnel testsute.  A common topology
is created below for all types of tunnels:

Topology:
---------
     root namespace   |     at_ns0 namespace
                      |
      -----------     |     -----------
      | tnl dev |     |     | tnl dev |  (overlay network)
      -----------     |     -----------
      metadata-mode   |     native-mode
       with bpf       |
                      |
      ----------      |     ----------
      |  veth1  | --------- |  veth0  |  (underlay network)
      ----------    peer    ----------

Device Configuration
--------------------
 Root namespace with metadata-mode tunnel + BPF
 Device names and addresses:
       veth1 IP: 172.16.1.200, IPv6: 00::22 (underlay)
       tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)

 Namespace at_ns0 with native tunnel
 Device names and addresses:
       veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)
       tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)

End-to-end ping packet flow
---------------------------
 Most of the tests start by namespace creation, device configuration,
 then ping the underlay and overlay network.  When doing 'ping 10.1.1.100'
 from root namespace, the following operations happen:
 1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev.
 2) Tnl device's egress BPF program is triggered and set the tunnel metadata,
    with remote_ip=172.16.1.200 and others.
 3) Outer tunnel header is prepended and route the packet to veth1's egress
 4) veth0's ingress queue receive the tunneled packet at namespace at_ns0
 5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet
 6) Forward the packet to the overlay tnl dev

Test Cases
-----------------------------
 Tunnel Type |  BPF Programs
-----------------------------
 GRE:          gre_set_tunnel, gre_get_tunnel
 IP6GRE:       ip6gretap_set_tunnel, ip6gretap_get_tunnel
 ERSPAN:       erspan_set_tunnel, erspan_get_tunnel
 IP6ERSPAN:    ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
 VXLAN:        vxlan_set_tunnel, vxlan_get_tunnel
 IP6VXLAN:     ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
 GENEVE:       geneve_set_tunnel, geneve_get_tunnel
 IP6GENEVE:    ip6geneve_set_tunnel, ip6geneve_get_tunnel
 IPIP:         ipip_set_tunnel, ipip_get_tunnel
 IP6IP:        ipip6_set_tunnel, ipip6_get_tunnel,
               ip6ip6_set_tunnel, ip6ip6_get_tunnel
 XFRM:         xfrm_get_state
====================
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

3f13de6d

samples/bpf: remove the bpf tunnel testsuite. · b05cd740

由 William Tu 提交于 4月 26, 2018

Move the testsuite to
selftests/bpf/{test_tunnel_kern.c, test_tunnel.sh}
Signed-off-by: NWilliam Tu <u9012063@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

b05cd740

selftests/bpf: bpf tunnel test. · 933a741e

由 William Tu 提交于 4月 26, 2018

The patch migrates the original tests at samples/bpf/tcbpf2_kern.c
and samples/bpf/test_tunnel_bpf.sh to selftests.  There are a couple
changes from the original:
    1) add ipv6 vxlan, ipv6 geneve, ipv6 ipip tests
    2) simplify the original ipip tests (remove iperf tests)
    3) improve documentation
    4) use bpf_ntoh* and bpf_hton* api

In summary, 'test_tunnel_kern.o' contains the following bpf program:
  GRE: gre_set_tunnel, gre_get_tunnel
  IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
  ERSPAN: erspan_set_tunnel, erspan_get_tunnel
  IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
  VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
  IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
  GENEVE: geneve_set_tunnel, geneve_get_tunnel
  IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
  IPIP: ipip_set_tunnel, ipip_get_tunnel
  IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
         ip6ip6_set_tunnel, ip6ip6_get_tunnel
  XFRM: xfrm_get_state
Signed-off-by: NWilliam Tu <u9012063@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

933a741e

bpf: fix xdp_generic for bpf_adjust_tail usecase · f7613120

由 Nikita V. Shirokov 提交于 4月 25, 2018

When bpf_adjust_tail was introduced for generic xdp, it changed skb's tail
pointer, so it was pointing to the new "end of the packet". However skb's
len field wasn't properly modified, so on the wire ethernet frame had
original (or even bigger, if adjust_head was used) size. This diff is
fixing this.

Fixes: 198d83bb (" bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail")
Signed-off-by: NNikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

f7613120

tools, bpftool: Display license GPL compatible in prog show/list · 9b984a20

由 Jiri Olsa 提交于 4月 26, 2018

Display the license "gpl" string in bpftool prog command, like:

  # bpftool prog list
  5: tracepoint  name func  tag 57cd311f2e27366b  gpl
          loaded_at Apr 26/09:37  uid 0
          xlated 16B  not jited  memlock 4096B

  # bpftool --json --pretty prog show
  [{
          "id": 5,
          "type": "tracepoint",
          "name": "func",
          "tag": "57cd311f2e27366b",
          "gpl_compatible": true,
          "loaded_at": "Apr 26/09:37",
          "uid": 0,
          "bytes_xlated": 16,
          "jited": false,
          "bytes_memlock": 4096
      }
  ]
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

9b984a20

tools, bpf: Sync bpf.h uapi header · fb6ef42b

由 Jiri Olsa 提交于 4月 25, 2018

Syncing the bpf.h uapi header with tools.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

fb6ef42b

bpf: Add gpl_compatible flag to struct bpf_prog_info · b85fab0e

由 Jiri Olsa 提交于 4月 25, 2018

Adding gpl_compatible flag to struct bpf_prog_info
so it can be dumped via bpf_prog_get_info_by_fd and
displayed via bpftool progs dump.

Alexei noticed 4-byte hole in struct bpf_prog_info,
so we put the u32 flags field in there, and we can
keep adding bit fields in there without breaking
user space.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

b85fab0e

Merge branch 'udp-gso' · cb586c63

由 David S. Miller 提交于 4月 26, 2018

Willem de Bruijn says:

====================
udp gso

Segmentation offload reduces cycles/byte for large packets by
amortizing the cost of protocol stack traversal.

This patchset implements GSO for UDP. A process can concatenate and
submit multiple datagrams to the same destination in one send call
by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
or passing an analogous cmsg at send time.

The stack will send the entire large (up to network layer max size)
datagram through the protocol layer. At the GSO layer, it is broken
up in individual segments. All receive the same network layer header
and UDP src and dst port. All but the last segment have the same UDP
header, but the last may differ in length and checksum.

Initial results show a significant reduction in UDP cycles/byte.
See the main patch for more details and benchmark results.

        udp
          876 MB/s 14873 msg/s 624666 calls/s
            11,205,777,429      cycles

        udp gso
         2139 MB/s 36282 msg/s 36282 calls/s
            11,204,374,561      cycles

The patch set is broken down as follows:
- patch 1 is a prerequisite: code rearrangement, noop otherwise
- patch 2 implements the gso logic
- patch 3 adds protocol stack support for UDP_SEGMENT
- patch 4,5,7 are refinements
- patch 6 adds the cmsg interface
- patch 8..11 are tests

This idea was presented previously at netconf 2017-2
http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf

Changes v1 -> v2
  - Convert __udp_gso_segment to modify headers after skb_segment
  - Split main patch into two, one for gso logic, one for UDP_SEGMENT

Changes RFC -> v1
  - MSG_MORE:
      fixed, by allowing checksum offload with corking if gso
  - SKB_GSO_UDP_L4:
      made independent from SKB_GSO_UDP
      and removed skb_is_ufo() wrapper
  - NETIF_F_GSO_UDP_L4:
      add to netdev_features_string
      and to netdev-features.txt
      add BUILD_BUG_ON to match SKB_GSO_UDP_L4 value
  - UDP_MAX_SEGMENTS:
      introduce limit on number of segments per gso skb
      to avoid extreme cases like IP_MAX_MTU/IPV4_MIN_MTU
  - CHECKSUM_PARTIAL:
      test against missing feature after ndo_features_check
      if not supported return error, analogous to udp_send_check
  - MSG_ZEROCOPY: removed, deferred for now
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb586c63

selftests: udp gso benchmark · 3a687bef

由 Willem de Bruijn 提交于 4月 26, 2018

Send udp data between a source and sink, optionally with udp gso.
The two processes are expected to be run on separate hosts.

A script is included that runs them together over loopback in a
single namespace for functionality testing.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a687bef

selftests: udp gso with corking · 3f12817f

由 Willem de Bruijn 提交于 4月 26, 2018

Corked sockets take a different path to construct a udp datagram than
the lockless fast path. Test this alternate path.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f12817f

selftests: udp gso with connected sockets · e5b2d91c

由 Willem de Bruijn 提交于 4月 26, 2018

Connected sockets use path mtu instead of device mtu.

Test this path by inserting a route mtu that is lower than the device
mtu. Verify that the path mtu for the connection matches this lower
number, then run the same test as in the connectionless case.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5b2d91c

selftests: udp gso · a1607257

由 Willem de Bruijn 提交于 4月 26, 2018

Validate udp gso, including edge cases (such as min/max gso sizes).
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1607257

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功