提交 · 65a2022e89a4760f9702837e2d9d15a39a9c68a3 · openanolis / cloud-kernel

11 5月, 2018 7 次提交

net/ipv6: Add fib lookup stubs for use in bpf helper · 65a2022e

由 David Ahern 提交于 5月 09, 2018

Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.

The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

65a2022e

net/ipv6: Update fib6 tracepoint to take fib6_info · d4bea421

由 David Ahern 提交于 5月 09, 2018

Similar to IPv4, IPv6 should use the FIB lookup result in the
tracepoint.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

d4bea421

net/ipv6: Add fib6_lookup · 138118ec

由 David Ahern 提交于 5月 09, 2018

Add IPv6 equivalent to fib_lookup. Does a fib lookup, including rules,
but returns a FIB entry, fib6_info, rather than a dst based rt6_info.
fib6_lookup is any where from 140% (MULTIPLE_TABLES config disabled)
to 60% faster than any of the dst based lookup methods (without custom
rules) and 25% faster with custom rules (e.g., l3mdev rule).

Since the lookup function has a completely different signature,
fib6_rule_action is split into 2 paths: the existing one is
renamed __fib6_rule_action and a new one for the fib6_info path
is added. fib6_rule_action decides which to call based on the
lookup_ptr. If it is fib6_table_lookup then the new path is taken.

Caller must hold rcu lock as no reference is taken on the returned
fib entry.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

138118ec

net/ipv6: Refactor fib6_rule_action · cc065a9e

由 David Ahern 提交于 5月 09, 2018

Move source address lookup from fib6_rule_action to a helper. It will be
used in a later patch by a second variant for fib6_rule_action.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

cc065a9e

net/ipv6: Extract table lookup from ip6_pol_route · 1d053da9

由 David Ahern 提交于 5月 09, 2018

ip6_pol_route is used for ingress and egress FIB lookups. Refactor it
moving the table lookup into a separate fib6_table_lookup that can be
invoked separately and export the new function.

ip6_pol_route now calls fib6_table_lookup and uses the result to generate
a dst based rt6_info.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

1d053da9

net/ipv6: Rename rt6_multipath_select · 3b290a31

由 David Ahern 提交于 5月 09, 2018

Rename rt6_multipath_select to fib6_multipath_select and export it.
A later patch wants access to it similar to IPv4's fib_select_path.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

3b290a31

net/ipv6: Rename fib6_lookup to fib6_node_lookup · 6454743b

由 David Ahern 提交于 5月 09, 2018

Rename fib6_lookup to fib6_node_lookup to better reflect what it
returns. The fib6_lookup name will be used in a later patch for
an IPv6 equivalent to IPv4's fib_lookup.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

6454743b

10 5月, 2018 2 次提交

xsk: fix 64-bit division · ea7e3435

由 Björn Töpel 提交于 5月 07, 2018

i386 builds report:
  net/xdp/xdp_umem.o: In function `xdp_umem_reg':
  xdp_umem.c:(.text+0x47e): undefined reference to `__udivdi3'

This fix uses div_u64 instead of the GCC built-in.

Fixes: c0c77d8f ("xsk: add user memory registration support sockopt")
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Tested-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

ea7e3435

bpf: xdp: allow offloads to store into rx_queue_index · 0d830032

由 Jakub Kicinski 提交于 5月 08, 2018

It's fairly easy for offloaded XDP programs to select the RX queue
packets go to.  We need a way of expressing this in the software.
Allow write to the rx_queue_index field of struct xdp_md for
device-bound programs.

Skip convert_ctx_access callback entirely for offloads.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

0d830032

08 5月, 2018 6 次提交

flow_dissector: do not rely on implicit casts · d869dea6

由 Paolo Abeni 提交于 5月 07, 2018

This change fixes a couple of type mismatch reported by the sparse
tool, explicitly using the requested type for the offending arguments.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d869dea6

net: core: rework basic flow dissection helper · 72a338bc

由 Paolo Abeni 提交于 5月 04, 2018

When the core networking needs to detect the transport offset in a given
packet and parse it explicitly, a full-blown flow_keys struct is used for
storage.
This patch introduces a smaller keys store, rework the basic flow dissect
helper to use it, and apply this new helper where possible - namely in
skb_probe_transport_header(). The used flow dissector data structures
are renamed to match more closely the new role.

The above gives ~50% performance improvement in micro benchmarking around
skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due
to the smaller memset. Small, but measurable improvement is measured also
in macro benchmarking.

v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(),
  as per DaveM suggestion
Suggested-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72a338bc

net: ipv6/gre: Add GRO support · 0c1dd2a1

由 Eran Ben Elisha 提交于 5月 07, 2018

Add GRO capability for IPv6 GRE tunnel and ip6erspan tap, via gro_cells
infrastructure.

Performance testing: 55% higher badwidth.
Measuring bandwidth of 1 thread IPv4 TCP traffic over IPv6 GRE tunnel
while GRO on the physical interface is disabled.
CPU: Intel Xeon E312xx (Sandy Bridge)
NIC: Mellanox Technologies MT27700 Family [ConnectX-4]
Before (GRO not working in tunnel) : 2.47 Gbits/sec
After  (GRO working in tunnel)     : 3.85 Gbits/sec
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c1dd2a1

net: ipv6: Fix typo in ipv6_find_hdr() documentation · 6f2f8212

由 Tariq Toukan 提交于 5月 07, 2018

Fix 'an' into 'and', and use a comma instead of a period.
Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f2f8212

net/9p: correct the variable name in v9fs_get_trans_by_name() comment · 3a443bd6

由 Sun Lianwen 提交于 5月 05, 2018

The v9fs_get_trans_by_name(char *s) variable name is not "name" but "s".
Signed-off-by: NSun Lianwen <sunlw.fnst@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a443bd6

vlan: correct the file path in vlan_dev_change_flags() comment · 84a2fba2

由 Sun Lianwen 提交于 5月 05, 2018

The vlan_flags enum is defined in include/uapi/linux/if_vlan.h file.
not in include/linux/if_vlan.h file.
Signed-off-by: NSun Lianwen <sunlw.fnst@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84a2fba2

07 5月, 2018 7 次提交

netfilter: nft_dynset: fix timeout updates on 32bit · b13468dc

由 Florian Westphal 提交于 4月 27, 2018

This must now use a 64bit jiffies value, else we set
a bogus timeout on 32bit.

Fixes: 8e1102d5 ("netfilter: nf_tables: support timeouts larger than 23 days")
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

b13468dc

netfilter: ctnetlink: export nf_conntrack_max · 538c5672

由 Florent Fourcot 提交于 5月 06, 2018

IPCTNL_MSG_CT_GET_STATS netlink command allow to monitor current number
of conntrack entries. However, if one wants to compare it with the
maximum (and detect exhaustion), the only solution is currently to read
sysctl value.

This patch add nf_conntrack_max value in netlink message, and simplify
monitoring for application built on netlink API.
Signed-off-by: NFlorent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

538c5672

netfilter: extract Passive OS fingerprint infrastructure from xt_osf · bfb15f2a

由 Fernando Fernandez Mancera 提交于 5月 03, 2018

Add nf_osf_ttl() and nf_osf_match() into nf_osf.c to prepare for
nf_tables support.
Signed-off-by: NFernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

bfb15f2a

F
netfilter: nf_nat: remove unused ct arg from lookup functions · 3a2e86f6
由 Florian Westphal 提交于 4月 26, 2018
```
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
```
3a2e86f6

netfilter: ip6t_srh: extend SRH matching for previous, next and last SID · c1c7e44b

由 Ahmed Abdelsalam 提交于 4月 25, 2018

IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed
by SR encapsulated packet. Each SID is encoded as an IPv6 prefix.

When a Firewall receives an SR encapsulated packet, it should be able
to identify which node previously processed the packet (previous SID),
which node is going to process the packet next (next SID), and which
node is the last to process the packet (last SID) which represent the
final destination of the packet in case of inline SR mode.

An example use-case of using these features could be SID list that
includes two firewalls. When the second firewall receives a packet,
it can check whether the packet has been processed by the first firewall
or not. Based on that check, it decides to apply all rules, apply just
subset of the rules, or totally skip all rules and forward the packet to
the next SID.

This patch extends SRH match to support matching previous SID, next SID,
and last SID.
Signed-off-by: NAhmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c1c7e44b

netfilter: nft_numgen: enable hashing of one element · 75e72f05

由 Laura Garcia Liebana 提交于 4月 23, 2018

The modulus in the hash function was limited to > 1 as initially
there was no sense to create a hashing of just one element.

Nevertheless, there are certain cases specially for load balancing
where this case needs to be addressed.

This patch fixes the following error.

Error: Could not process rule: Numerical result out of range
add rule ip nftlb lb01 dnat to jhash ip saddr mod 1 map { 0: 192.168.0.10 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The solution comes to force the hash to 0 when the modulus is 1.
Signed-off-by: NLaura Garcia Liebana <nevola@gmail.com>

75e72f05

netfilter: nft_numgen: add map lookups for numgen statements · d734a288

由 Laura Garcia Liebana 提交于 4月 22, 2018

This patch includes a new attribute in the numgen structure to allow
the lookup of an element based on the number generator as a key.

For this purpose, different ops have been included to extend the
current numgen inc functions.

Currently, only supported for numgen incremental operations, but
it will be supported for random in a follow-up patch.
Signed-off-by: NLaura Garcia Liebana <nevola@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

d734a288

05 5月, 2018 1 次提交

net/ipv6: rename rt6_next to fib6_next · 8fb11a9a

由 David Ahern 提交于 5月 04, 2018

This slipped through the cracks in the followup set to the fib6_info flip.
Rename rt6_next to fib6_next.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fb11a9a

04 5月, 2018 17 次提交

smc: add support for splice() · 9014db20

由 Stefan Raspl 提交于 5月 03, 2018

Provide an implementation for splice() when we are using SMC. See
smc_splice_read() for further details.
Signed-off-by: NStefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9014db20

smc: allocate RMBs as compound pages · 2ef4f27a

由 Stefan Raspl 提交于 5月 03, 2018

Preparatory work for splice() support.
Signed-off-by: NStefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ef4f27a

smc: make smc_rx_wait_data() generic · b51fa1b1

由 Stefan Raspl 提交于 5月 03, 2018

Turn smc_rx_wait_data into a generic function that can be used at various
instances to wait on traffic to complete with varying criteria.
Signed-off-by: NStefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b51fa1b1

smc: simplify abort logic · c8b8ec8e

由 Stefan Raspl 提交于 5月 03, 2018

Some of the conditions to exit recv() are common in two pathes - cleaning up
code by moving the check up so we have it only once.
Signed-off-by: NStefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8b8ec8e

xfrm: use a dedicated slab cache for struct xfrm_state · 565f0fa9

由 Mathias Krause 提交于 5月 03, 2018

struct xfrm_state is rather large (768 bytes here) and therefore wastes
quite a lot of memory as it falls into the kmalloc-1024 slab cache,
leaving 256 bytes of unused memory per XFRM state object -- a net waste
of 25%.

Using a dedicated slab cache for struct xfrm_state reduces the level of
internal fragmentation to a minimum.

On my configuration SLUB chooses to create a slab cache covering 4
pages holding 21 objects, resulting in an average memory waste of ~13
bytes per object -- a net waste of only 1.6%.

In my tests this led to memory savings of roughly 2.3MB for 10k XFRM
states.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

565f0fa9

bpf: add skb_load_bytes_relative helper · 4e1ec56c

由 Daniel Borkmann 提交于 5月 04, 2018

This adds a small BPF helper similar to bpf_skb_load_bytes() that
is able to load relative to mac/net header offset from the skb's
linear data. Compared to bpf_skb_load_bytes(), it takes a fifth
argument namely start_header, which is either BPF_HDR_START_MAC
or BPF_HDR_START_NET. This allows for a more flexible alternative
compared to LD_ABS/LD_IND with negative offset. It's enabled for
tc BPF programs as well as sock filter program types where it's
mainly useful in reuseport programs to ease access to lower header
data.

Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2017-March/000698.htmlSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

4e1ec56c

bpf: implement ld_abs/ld_ind in native bpf · e0cea7ce

由 Daniel Borkmann 提交于 5月 04, 2018

The main part of this work is to finally allow removal of LD_ABS
and LD_IND from the BPF core by reimplementing them through native
eBPF instead. Both LD_ABS/LD_IND were carried over from cBPF and
keeping them around in native eBPF caused way more trouble than
actually worth it. To just list some of the security issues in
the past:

  * fdfaf64e ("x86: bpf_jit: support negative offsets")
  * 35607b02 ("sparc: bpf_jit: fix loads from negative offsets")
  * e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
  * 07aee943 ("bpf, sparc: fix usage of wrong reg for load_skb_regs after call")
  * 6d59b7db ("bpf, s390x: do not reload skb pointers in non-skb context")
  * 87338c8e ("bpf, ppc64: do not reload skb pointers in non-skb context")

For programs in native eBPF, LD_ABS/LD_IND are pretty much legacy
these days due to their limitations and more efficient/flexible
alternatives that have been developed over time such as direct
packet access. LD_ABS/LD_IND only cover 1/2/4 byte loads into a
register, the load happens in host endianness and its exception
handling can yield unexpected behavior. The latter is explained
in depth in f6b1b3bf ("bpf: fix subprog verifier bypass by
div/mod by 0 exception") with similar cases of exceptions we had.
In native eBPF more recent program types will disable LD_ABS/LD_IND
altogether through may_access_skb() in verifier, and given the
limitations in terms of exception handling, it's also disabled
in programs that use BPF to BPF calls.

In terms of cBPF, the LD_ABS/LD_IND is used in networking programs
to access packet data. It is not used in seccomp-BPF but programs
that use it for socket filtering or reuseport for demuxing with
cBPF. This is mostly relevant for applications that have not yet
migrated to native eBPF.

The main complexity and source of bugs in LD_ABS/LD_IND is coming
from their implementation in the various JITs. Most of them keep
the model around from cBPF times by implementing a fastpath written
in asm. They use typically two from the BPF program hidden CPU
registers for caching the skb's headlen (skb->len - skb->data_len)
and skb->data. Throughout the JIT phase this requires to keep track
whether LD_ABS/LD_IND are used and if so, the two registers need
to be recached each time a BPF helper would change the underlying
packet data in native eBPF case. At least in eBPF case, available
CPU registers are rare and the additional exit path out of the
asm written JIT helper makes it also inflexible since not all
parts of the JITer are in control from plain C. A LD_ABS/LD_IND
implementation in eBPF therefore allows to significantly reduce
the complexity in JITs with comparable performance results for
them, e.g.:

test_bpf             tcpdump port 22             tcpdump complex
x64      - before    15 21 10                    14 19  18
         - after      7 10 10                     7 10  15
arm64    - before    40 91 92                    40 91 151
         - after     51 64 73                    51 62 113

For cBPF we now track any usage of LD_ABS/LD_IND in bpf_convert_filter()
and cache the skb's headlen and data in the cBPF prologue. The
BPF_REG_TMP gets remapped from R8 to R2 since it's mainly just
used as a local temporary variable. This allows to shrink the
image on x86_64 also for seccomp programs slightly since mapping
to %rsi is not an ereg. In callee-saved R8 and R9 we now track
skb data and headlen, respectively. For normal prologue emission
in the JITs this does not add any extra instructions since R8, R9
are pushed to stack in any case from eBPF side. cBPF uses the
convert_bpf_ld_abs() emitter which probes the fast path inline
already and falls back to bpf_skb_load_helper_{8,16,32}() helper
relying on the cached skb data and headlen as well. R8 and R9
never need to be reloaded due to bpf_helper_changes_pkt_data()
since all skb access in cBPF is read-only. Then, for the case
of native eBPF, we use the bpf_gen_ld_abs() emitter, which calls
the bpf_skb_load_helper_{8,16,32}_no_cache() helper unconditionally,
does neither cache skb data and headlen nor has an inlined fast
path. The reason for the latter is that native eBPF does not have
any extra registers available anyway, but even if there were, it
avoids any reload of skb data and headlen in the first place.
Additionally, for the negative offsets, we provide an alternative
bpf_skb_load_bytes_relative() helper in eBPF which operates
similarly as bpf_skb_load_bytes() and allows for more flexibility.
Tested myself on x64, arm64, s390x, from Sandipan on ppc64.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

e0cea7ce

bpf: migrate ebpf ld_abs/ld_ind tests to test_verifier · 93731ef0

由 Daniel Borkmann 提交于 5月 04, 2018

Remove all eBPF tests involving LD_ABS/LD_IND from test_bpf.ko. Reason
is that the eBPF tests from test_bpf module do not go via BPF verifier
and therefore any instruction rewrites from verifier cannot take place.

Therefore, move them into test_verifier which runs out of user space,
so that verfier can rewrite LD_ABS/LD_IND internally in upcoming patches.
It will have the same effect since runtime tests are also performed from
there. This also allows to finally unexport bpf_skb_vlan_{push,pop}_proto
and keep it internal to core kernel.

Additionally, also add further cBPF LD_ABS/LD_IND test coverage into
test_bpf.ko suite.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

93731ef0

bpf: prefix cbpf internal helpers with bpf_ · b390134c

由 Daniel Borkmann 提交于 5月 04, 2018

No change in functionality, just remove the '__' prefix and replace it
with a 'bpf_' prefix instead. We later on add a couple of more helpers
for cBPF and keeping the scheme with '__' is suboptimal there.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

b390134c

xsk: statistics support · af75d9e0

由 Magnus Karlsson 提交于 5月 02, 2018

In this commit, a new getsockopt is added: XDP_STATISTICS. This is
used to obtain stats from the sockets.

v2: getsockopt now returns size of stats structure.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

af75d9e0

xsk: support for Tx · 35fcde7f

由 Magnus Karlsson 提交于 5月 02, 2018

Here, Tx support is added. The user fills the Tx queue with frames to
be sent by the kernel, and let's the kernel know using the sendmsg
syscall.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

35fcde7f

dev: packet: make packet_direct_xmit a common function · 865b03f2

由 Magnus Karlsson 提交于 5月 02, 2018

The new dev_direct_xmit will be used by AF_XDP in later commits.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

865b03f2

xsk: add Tx queue setup and mmap support · f6145903

由 Magnus Karlsson 提交于 5月 02, 2018

Another setsockopt (XDP_TX_QUEUE) is added to let the process allocate
a queue, where the user process can pass frames to be transmitted by
the kernel.

The mmapping of the queue is done using the XDP_PGOFF_TX_QUEUE offset.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

f6145903

xsk: add umem completion queue support and mmap · fe230832

由 Magnus Karlsson 提交于 5月 02, 2018

Here, we add another setsockopt for registered user memory (umem)
called XDP_UMEM_COMPLETION_QUEUE. Using this socket option, the
process can ask the kernel to allocate a queue (ring buffer) and also
mmap it (XDP_UMEM_PGOFF_COMPLETION_QUEUE) into the process.

The queue is used to explicitly pass ownership of umem frames from the
kernel to user process. This will be used by the TX path to tell user
space that a certain frame has been transmitted and user space can use
it for something else, if it wishes.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

fe230832

xsk: wire up XDP_SKB side of AF_XDP · 02671e23

由 Björn Töpel 提交于 5月 02, 2018

This commit wires up the xskmap to XDP_SKB layer.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

02671e23

xsk: wire up XDP_DRV side of AF_XDP · 1b1a251c

由 Björn Töpel 提交于 5月 02, 2018

This commit wires up the xskmap to XDP_DRV layer.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

1b1a251c

bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP · fbfc504a

由 Björn Töpel 提交于 5月 02, 2018

The xskmap is yet another BPF map, very much inspired by
dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application
adds AF_XDP sockets into the map, and by using the bpf_redirect_map
helper, an XDP program can redirect XDP frames to an AF_XDP socket.

Note that a socket that is bound to certain ifindex/queue index will
*only* accept XDP frames from that netdev/queue index. If an XDP
program tries to redirect from a netdev/queue index other than what
the socket is bound to, the frame will not be received on the socket.

A socket can reside in multiple maps.

v3: Fixed race and simplified code.
v2: Removed one indirection in map lookup.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

fbfc504a

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功