提交 · 1a6ac1d59dc3b4077c643c3be70f9e650e267afe · openanolis / cloud-kernel

15 5月, 2018 22 次提交

bpf, doc: convert bpf_design_QA.rst to use RST formatting · 1a6ac1d5

由 Jesper Dangaard Brouer 提交于 5月 14, 2018

The RST formatting is done such that that when rendered or converted
to different formats, an automatic index with links are created to the
subsections.

Thus, the questions are created as sections (or subsections), in-order
to get the wanted auto-generated FAQ/QA index.

Special thanks to Quentin Monnet <quentin.monnet@netronome.com> who
have reviewed and corrected both RST formatting and GitHub rendering
issues in this file.  Those commits have been squashed.

I've manually tested that this also renders nicely if included as part
of the kernel 'make htmldocs'.  As the end-goal is for this to become
more integrated with kernel-doc project/movement.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

1a6ac1d5

bpf, doc: rename txt files to rst files · 192092fa

由 Jesper Dangaard Brouer 提交于 5月 14, 2018

This will cause them to get auto rendered, e.g. when viewing them on GitHub.
Followup patches will correct the content to be RST compliant.

Also adjust README.rst to point to the renamed files.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

192092fa

bpf, doc: add basic README.rst file · 4712c1b2

由 Jesper Dangaard Brouer 提交于 5月 14, 2018

A README.rst file in a directory have special meaning for sites like
github, which auto renders the contents. Plus search engines like
Google also index these README.rst files.

Auto rendering allow us to use links, for (re)directing eBPF users to
other places where docs live. The end-goal would be to direct users
towards https://www.kernel.org/doc/html/latest but we haven't written
the full docs yet, so we start out small and take this incrementally.

This directory itself contains some useful docs, which can be linked
to from the README.rst file (verified this works for github).
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

4712c1b2

Merge branch 'fix-samples' · 1d827879

由 Alexei Starovoitov 提交于 5月 14, 2018

Jakub Kicinski says:

====================
Following patches address build issues after recent move to libbpf.
For out-of-tree builds we would see the following error:

gcc: error: samples/bpf/../../tools/lib/bpf/libbpf.a: No such file or directory

libbpf build system is now always invoked explicitly rather than
relying on building single objects most of the time.  We need to
resolve the friction between Kbuild and tools/ build system.

Mini-library called libbpf.h in samples is renamed to bpf_insn.h,
using linux/filter.h seems not completely trivial since some samples
get upset when order on include search path in changed.  We do have
to rename libbpf.h, however, because otherwise it's hard to reliably
get to libbpf's header in out-of-tree builds.

v2:
 - fix the build error harder (patch 3);
 - add patch 5 (make clang less noisy).
====================
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

1d827879

samples: bpf: make the build less noisy · 768759ed

由 Jakub Kicinski 提交于 5月 14, 2018

Building samples with clang ignores the $(Q) setting, always
printing full command to the output.  Make it less verbose.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

768759ed

samples: bpf: move libbpf from object dependencies to libs · 0cc54db1

由 Jakub Kicinski 提交于 5月 14, 2018

Make complains that it doesn't know how to make libbpf.a:

scripts/Makefile.host:106: target 'samples/bpf/../../tools/lib/bpf/libbpf.a' doesn't match the target pattern

Now that we have it as a dependency of the sources simply add libbpf.a
to libraries not objects.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

0cc54db1

samples: bpf: fix build after move to compiling full libbpf.a · 787360f8

由 Jakub Kicinski 提交于 5月 14, 2018

There are many ways users may compile samples, some of them got
broken by commit 5f938057 ("samples: bpf: compile and link
against full libbpf").  Improve path resolution and make libbpf
building a dependency of source files to force its build.

Samples should now again build with any of:
 cd samples/bpf; make
 make samples/bpf/
 make -C samples/bpf
 cd samples/bpf; make O=builddir
 make samples/bpf/ O=builddir
 make -C samples/bpf O=builddir
 export KBUILD_OUTPUT=builddir
 make samples/bpf/
 make -C samples/bpf

Fixes: 5f938057 ("samples: bpf: compile and link against full libbpf")
Reported-by: NBjörn Töpel <bjorn.topel@gmail.com>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

787360f8

samples: bpf: rename libbpf.h to bpf_insn.h · 8d930450

由 Jakub Kicinski 提交于 5月 14, 2018

The libbpf.h file in samples is clashing with libbpf's header.
Since it only includes a subset of filter.h instruction helpers
rename it to bpf_insn.h.  Drop the unnecessary include of bpf/bpf.h.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

8d930450

samples: bpf: include bpf/bpf.h instead of local libbpf.h · 2bf3e2ef

由 Jakub Kicinski 提交于 5月 14, 2018

There are two files in the tree called libbpf.h which is becoming
problematic.  Most samples don't actually need the local libbpf.h
they simply include it to get to bpf/bpf.h.  Include bpf/bpf.h
directly instead.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

2bf3e2ef

Merge branch 'bpf-jit-cleanups' · fb40c9dd

由 Alexei Starovoitov 提交于 5月 14, 2018

Daniel Borkmann says:

====================
This series follows up mostly with with some minor cleanups on top
of 'Move ld_abs/ld_ind to native BPF' as well as implements better
32/64 bit immediate load into register and saves tail call init on
cBPF for the arm64 JIT. Last but not least we add a couple of test
cases. For details please see individual patches. Thanks!

v1 -> v2:
  - Minor fix in i64_i16_blocks() to remove 24 shift.
  - Added last two patches.
  - Added Acks from prior round.
====================
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

fb40c9dd

bpf: add ld64 imm test cases · a82d8cd3

由 Daniel Borkmann 提交于 5月 14, 2018

Add test cases where we combine semi-random imm values, mainly for testing
JITs when they have different encoding options for 64 bit immediates in
order to reduce resulting image size.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

a82d8cd3

bpf, arm64: save 4 bytes in prologue when ebpf insns came from cbpf · 56ea6a8b

由 Daniel Borkmann 提交于 5月 14, 2018

We can trivially save 4 bytes in prologue for cBPF since tail calls
can never be used from there. The register push/pop is pairwise,
here, x25 (fp) and x26 (tcc), so no point in changing that, only
reset to zero is not needed.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

56ea6a8b

bpf, arm64: optimize 32/64 immediate emission · 6d2eea6f

由 Daniel Borkmann 提交于 5月 14, 2018

Improve the JIT to emit 64 and 32 bit immediates, the current
algorithm is not optimal and we often emit more instructions
than actually needed. arm64 has movz, movn, movk variants but
for the current 64 bit immediates we only use movz with a
series of movk when needed.

For example loading ffffffffffffabab emits the following 4
instructions in the JIT today:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: ffff, shift: 16, result: 00000000ffffabab
  * movk: ffff, shift: 32, result: 0000ffffffffabab
  * movk: ffff, shift: 48, result: ffffffffffffabab

Whereas after the patch the same load only needs a single
instruction:

  * movn: 5454, shift:  0, result: ffffffffffffabab

Another example where two extra instructions can be saved:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: 1f2f, shift: 16, result: 000000001f2fabab
  * movk: ffff, shift: 32, result: 0000ffff1f2fabab
  * movk: ffff, shift: 48, result: ffffffff1f2fabab

After the patch:

  * movn: e0d0, shift: 16, result: ffffffff1f2fffff
  * movk: abab, shift:  0, result: ffffffff1f2fabab

Another example with movz, before:

  * movz: 0000, shift:  0, result: 0000000000000000
  * movk: fea0, shift: 32, result: 0000fea000000000

After:

  * movz: fea0, shift: 32, result: 0000fea000000000

Moreover, reuse emit_a64_mov_i() for 32 bit immediates that
are loaded via emit_a64_mov_i64() which is a similar optimization
as done in 6fe8b9c1 ("bpf, x64: save several bytes by using
mov over movabsq when possible"). On arm64, the latter allows to
use a single instruction with movn due to zero extension where
otherwise two would be needed. And last but not least add a
missing optimization in emit_a64_mov_i() where movn is used but
the subsequent movk not needed. With some of the Cilium programs
in use, this shrinks the needed instructions by about three
percent. Tested on Cavium ThunderX CN8890.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

6d2eea6f

bpf, arm64: save 4 bytes of unneeded stack space · 09ece3d0

由 Daniel Borkmann 提交于 5月 14, 2018

Follow-up to 816d9ef3 ("bpf, arm64: remove ld_abs/ld_ind") in
that the extra 4 byte JIT scratchpad is not needed anymore since it
was in ld_abs/ld_ind as stack buffer for bpf_load_pointer().
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

09ece3d0

bpf, arm32: save 4 bytes of unneeded stack space · 38ca9306

由 Daniel Borkmann 提交于 5月 14, 2018

The extra skb_copy_bits() buffer is not used anymore, therefore
remove the extra 4 byte stack space requirement.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

38ca9306

bpf, x64: clean up retpoline emission slightly · 36256009

由 Daniel Borkmann 提交于 5月 14, 2018

Make the RETPOLINE_{RA,ED}X_BPF_JIT() a bit more readable by
cleaning up the macro, aligning comments and spacing.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

36256009

bpf, sparc: remove unused variable · 631b1e3b

由 Daniel Borkmann 提交于 5月 14, 2018

Since fe83963b ("bpf, sparc64: remove ld_abs/ld_ind") it's not
used anymore therefore remove it.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

631b1e3b

bpf, mips: remove unused function · 0631b658

由 Daniel Borkmann 提交于 5月 14, 2018

The ool_skb_header_pointer() and size_to_len() is unused same as
tmp_offset, therefore remove all of them.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

0631b658

samples/bpf: xdp_monitor, accept short options · 53ea24c2

由 Prashant Bhole 提交于 5月 14, 2018

Updated optstring parameter for getopt_long() to accept short options.
Also updated usage() function.
Signed-off-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

53ea24c2

Merge branch 'bpf-stackmap-nmi' · 1772be37

由 Daniel Borkmann 提交于 5月 14, 2018

Song Liu says:
====================
Changes v2 -> v3:
  Improve syntax based on suggestion by Tobin C. Harding.

Changes v1 -> v2:
  1. Rename some variables to (hopefully) reduce confusion;
  2. Check irq_work status with IRQ_WORK_BUSY (instead of work->sem);
  3. In Kconfig, let BPF_SYSCALL select IRQ_WORK;
  4. Add static to DEFINE_PER_CPU();
   5. Remove pr_info() in stack_map_init().
====================
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

1772be37

bpf: add selftest for stackmap with build_id in NMI context · 13790d1c

由 Song Liu 提交于 5月 07, 2018

This new test captures stackmap with build_id with hardware event
PERF_COUNT_HW_CPU_CYCLES.

Because we only support one ips-to-build_id lookup per cpu in NMI
context, stack_amap will not be able to do the lookup in this test.
Therefore, we didn't do compare_stack_ips(), as it will alwasy fail.

urandom_read.c is extended to run configurable cycles so that it can be
caught by the perf event.
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

13790d1c

bpf: enable stackmap with build_id in nmi context · bae77c5e

由 Song Liu 提交于 5月 07, 2018

Currently, we cannot parse build_id in nmi context because of
up_read(&current->mm->mmap_sem), this makes stackmap with build_id
less useful. This patch enables parsing build_id in nmi by putting
the up_read() call in irq_work. To avoid memory allocation in nmi
context, we use per cpu variable for the irq_work. As a result, only
one irq_work per cpu is allowed. If the irq_work is in-use, we
fallback to only report ips.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

bae77c5e

11 5月, 2018 18 次提交

Merge branch 'bpf-perf-rb-libbpf' · a84880ef

由 Daniel Borkmann 提交于 5月 11, 2018

Jakub Kicinski says:

====================
This series started out as a follow up to the bpftool perf event dumping
patches.

As suggested by Daniel patch 1 makes use of PERF_SAMPLE_TIME to simplify
code and improve accuracy of timestamps.

Remaining patches are trying to move perf event loop into libbpf as
suggested by Alexei.  One user for this new function is bpftool which
links with libbpf nicely, the other, unfortunately, is in samples/bpf.
Remaining patches make samples/bpf link against full libbpf.a (not just
a handful of objects).  Once we have full power of libbpf at our disposal
we can convert some of XDP samples to use libbpf loader instead of
bpf_load.c.  My understanding is that this is the desired direction,
at least for networking code.
====================
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

a84880ef

samples: bpf: convert some XDP samples from bpf_load to libbpf · be5bca44

由 Jakub Kicinski 提交于 5月 10, 2018

Now that we can use full powers of libbpf in BPF samples, we
should perhaps make the simplest XDP programs not depend on
bpf_load helpers.  This way newcomers will be exposed to the
recommended library from the start.

Use of bpf_prog_load_xattr() will also make it trivial to later
on request offload of the programs by simply adding ifindex to
the xattr.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

be5bca44

tools: bpf: don't complain about no kernel version for networking code · 17387dd5

由 Jakub Kicinski 提交于 5月 10, 2018

BPF programs only have to specify the target kernel version for
tracing related hooks, in networking world that requirement does
not really apply. Loosen the checks in libbpf to reflect that.

bpf_object__open() users will continue to see the error for backward
compatibility (and because prog_type is not available there).

Error code for NULL file name is changed from ENOENT to EINVAL,
as it seems more appropriate, hopefully, that's an OK change.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

17387dd5

tools: bpf: improve comments in libbpf.h · 2eb57bb8

由 Jakub Kicinski 提交于 5月 10, 2018

Fix spelling mistakes, improve and clarify the language of comments
in libbpf.h.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

2eb57bb8

tools: bpf: move the event reading loop to libbpf · d0cabbb0

由 Jakub Kicinski 提交于 5月 10, 2018

There are two copies of event reading loop - in bpftool and
trace_helpers "library".  Consolidate them and move the code
to libbpf.  Return codes from trace_helpers are kept, but
renamed to include LIBBPF prefix.
Suggested-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

d0cabbb0

samples: bpf: compile and link against full libbpf · 5f938057

由 Jakub Kicinski 提交于 5月 10, 2018

samples/bpf currently cherry-picks object files from tools/lib/bpf
to link against.  Just compile the full library and link statically
against it.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

5f938057

samples: bpf: rename struct bpf_map_def to avoid conflict with libbpf · 74662ea5

由 Jakub Kicinski 提交于 5月 10, 2018

Both tools/lib/bpf/libbpf.h and samples/bpf/bpf_load.h define their
own version of struct bpf_map_def. The version in bpf_load.h has
more fields. libbpf does not support inner maps and its definition
of struct bpf_map_def lacks the related fields. Rename the definition
in bpf_load.h (samples/bpf) to avoid conflicts.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

74662ea5

tools: bpftool: use PERF_SAMPLE_TIME instead of reading the clock · e3687510

由 Jakub Kicinski 提交于 5月 10, 2018

Ask the kernel to include sample time in each even instead of
reading the clock.  This is also more accurate because our
clock reading was done when user space would dump the buffer,
not when sample was produced.
Suggested-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

e3687510

bpf: sync tools bpf.h uapi header · cb9c28ef

由 Prashant Bhole 提交于 5月 09, 2018

Sync the header from include/uapi/linux/bpf.h which was updated to add
fib lookup helper function. This fixes selftests/bpf build failure.
Signed-off-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

cb9c28ef

selftests/bpf: Fix bash reference in Makefile · 91bc07c9

由 Joe Stringer 提交于 5月 10, 2018

'|& ...' is a bash 4.0+ construct which is not guaranteed to be available
when using '$(shell ...)' in a Makefile. Fall back to the more portable
'2>&1 | ...'.

Fixes the following warning during compilation:

	/bin/sh: 1: Syntax error: "&" unexpected
Signed-off-by: NJoe Stringer <joe@wand.net.nz>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

91bc07c9

Merge branch 'bpf-fib-lookup-helper' · ff1f56d9

由 Daniel Borkmann 提交于 5月 11, 2018

David Ahern says:

====================
Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet is expected to continue up the stack
for full processing.

The response from a FIB and neighbor lookup is either the egress index
with the bpf_fib_lookup struct filled in with dmac and gateway or
0 meaning the packet should continue up the stack. In time we can
revisit this to return the FIB lookup result errno if it is one of the
special RTN_'s such as RTN_BLACKHOLE (-EINVAL) so that the XDP
programs can do an early drop if desired.

Patches 1-6 do some more refactoring to IPv6 with the end goal of
extracting a FIB lookup function that aligns with fib_lookup for IPv4,
basically returning a fib6_info without creating a dst based entry.

Patch 7 adds lookup functions to the ipv6 stub. These are needed since
bpf is built into the kernel and ipv6 may not be built or loaded.

Patch 8 adds the bpf helper and 9 adds a sample program.

v3
- remove ETH_ALEN and in6_addr from uapi header

v2
- removed pkt_access from bpf_func_proto as noticed by Daniel
- added check in that IPv6 forwarding is enabled
- added DaveM's ack on patches 1-7 and 9 based on v1 response and
  fact that no changes were made to them in v2

v1
- updated commit messages and cover letter
- added comment to sample program noting lack of verification on
  egress device supporting XDP

RFC v2
- fixed use of foward helper from cls_act as noted by Daniel
- in patch 1 rename fib6_lookup_1 as well for consistency
====================
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

ff1f56d9

samples/bpf: Add example of ipv4 and ipv6 forwarding in XDP · fe616055

由 David Ahern 提交于 5月 09, 2018

Simple example of fast-path forwarding. It has a serious flaw
in not verifying the egress device index supports XDP forwarding.
If the egress device does not packets are dropped.

Take this only as a simple example of fast-path forwarding.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

fe616055

bpf: Provide helper to do forwarding lookups in kernel FIB table · 87f5fc7e

由 David Ahern 提交于 5月 09, 2018

Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet continues up the stack.

If it is to be forwarded, the forwarding can be done directly if the
neighbor is already known. If the neighbor does not exist, the first
few packets go up the stack for neighbor resolution. Once resolved, the
xdp program provides the fast path.

On successful lookup the nexthop dmac, current device smac and egress
device index are returned.

The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6
are implemented in this patch. The API includes layer 4 parameters if
the XDP program chooses to do deep packet inspection to allow compare
against ACLs implemented as FIB rules.

Header rewrite is left to the XDP program.

The lookup takes 2 flags:
- BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
  straight to the table associated with the device (expert setting for
  those looking to maximize throughput)

- BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective.
  Default is an ingress lookup.

Initial performance numbers collected by Jesper, forwarded packets/sec:

       Full stack    XDP FIB lookup    XDP Direct lookup
IPv4   1,947,969       7,074,156          7,415,333
IPv6   1,728,000       6,165,504          7,262,720

These number are single CPU core forwarding on a Broadwell
E5-1650 v4 @ 3.60GHz.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

87f5fc7e

net/ipv6: Add fib lookup stubs for use in bpf helper · 65a2022e

由 David Ahern 提交于 5月 09, 2018

Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.

The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

65a2022e

net/ipv6: Update fib6 tracepoint to take fib6_info · d4bea421

由 David Ahern 提交于 5月 09, 2018

Similar to IPv4, IPv6 should use the FIB lookup result in the
tracepoint.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

d4bea421

net/ipv6: Add fib6_lookup · 138118ec

由 David Ahern 提交于 5月 09, 2018

Add IPv6 equivalent to fib_lookup. Does a fib lookup, including rules,
but returns a FIB entry, fib6_info, rather than a dst based rt6_info.
fib6_lookup is any where from 140% (MULTIPLE_TABLES config disabled)
to 60% faster than any of the dst based lookup methods (without custom
rules) and 25% faster with custom rules (e.g., l3mdev rule).

Since the lookup function has a completely different signature,
fib6_rule_action is split into 2 paths: the existing one is
renamed __fib6_rule_action and a new one for the fib6_info path
is added. fib6_rule_action decides which to call based on the
lookup_ptr. If it is fib6_table_lookup then the new path is taken.

Caller must hold rcu lock as no reference is taken on the returned
fib entry.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

138118ec

net/ipv6: Refactor fib6_rule_action · cc065a9e

由 David Ahern 提交于 5月 09, 2018

Move source address lookup from fib6_rule_action to a helper. It will be
used in a later patch by a second variant for fib6_rule_action.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

cc065a9e

net/ipv6: Extract table lookup from ip6_pol_route · 1d053da9

由 David Ahern 提交于 5月 09, 2018

ip6_pol_route is used for ingress and egress FIB lookups. Refactor it
moving the table lookup into a separate fib6_table_lookup that can be
invoked separately and export the new function.

ip6_pol_route now calls fib6_table_lookup and uses the result to generate
a dst based rt6_info.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

1d053da9

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功