Commits · 0412bd931f5f94d1054e958415c4a945d8ee62f4 · openeuler / Kernel

17 Apr, 2016 1 commit

vxlan: synchronously and race-free destruction of vxlan sockets · 0412bd93

Committed by Hannes Frederic Sowa on Apr 08, 2016

Because the udp socket is destroyed asynchronously in a work queue, we
see nondeterministic behavior when shutting down vxlan tunnels and
creating new ones. Fix this by keeping the destruction process
synchronous with regard to the user space process so IFF_UP can be
reliably set.

udp_tunnel_sock_release destroys vs->sock->sk if reference counter
indicates so. We expect to have the same lifetime of vxlan_sock and
vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
only destruct the whole socket after we can be sure it cannot be found
by searching vxlan_net->sock_list.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Apr, 2016 1 commit

vxlan: fix incorrect type · 61618eea

Committed by Jiri Benc on Apr 11, 2016

The protocol is 16bit, not 32bit.

Fixes: e1e5314d ("vxlan: implement GPE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Apr, 2016 1 commit

vxlan: change vxlan to use UDP socket GRO · 5602c48c

Committed by Tom Herbert on Apr 05, 2016

Adapt vxlan_gro_receive, vxlan_gro_complete to take a socket argument.
Set these functions in tunnel_config.  Don't set udp_offloads any more.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Apr, 2016 3 commits

vxlan: implement GPE · e1e5314d

Committed by Jiri Benc on Apr 05, 2016

Implement VXLAN-GPE. Only COLLECT_METADATA is supported for now (it is
possible to support static configuration, too, if there is demand for it).

The GPE header parsing has to be moved before iptunnel_pull_header, as we
need to know the protocol.

v2: Removed what was called "L2 mode" in v1 of the patchset. Only "L3 mode"
    (now called "raw mode") is added by this patch. This mode does not allow
    Ethernet header to be encapsulated in VXLAN-GPE when using ip route to
    specify the encapsulation, IP header is encapsulated instead. The patch
    does support Ethernet to be encapsulated, though, using ETH_P_TEB in
    skb->protocol. This will be utilized by other COLLECT_METADATA users
    (openvswitch in particular).

    If there is ever demand for Ethernet encapsulation with VXLAN-GPE using
    ip route, it's easy to add a new flag switching the interface to
    "Ethernet mode" (called "L2 mode" in v1 of this patchset). For now,
    leave this out, it seems we don't need it.

    Disallowed more flag combinations, especially RCO with GPE.
    Added comment explaining that GBP and GPE cannot be set together.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: move fdb code to common location in vxlan_xmit · 47e5d1b0

Committed by Jiri Benc on Apr 05, 2016

Handle VXLAN_F_COLLECT_METADATA before VXLAN_F_PROXY. The latter does not
make sense with the former, as it needs populated fdb which does not happen
in metadata mode.

After this cleanup, the fdb code in vxlan_xmit is moved to a common location
and can be later skipped for VXLAN-GPE which does not necessarily carry
inner Ethernet header.

v2: changed commit description to not reference L3 mode

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: move Ethernet initialization to a separate function · 0c867c9b

Committed by Jiri Benc on Apr 05, 2016

This will allow initializing vxlan in ARPHRD_NONE mode based on the passed
rtnl attributes.

v2: renamed "l2mode" to "ether".

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Mar, 2016 1 commit

vxlan: fix too large pskb_may_pull with remote checksum · 7d34fa75

Committed by Jiri Benc on Mar 21, 2016

vxlan_remcsum is called after iptunnel_pull_header and thus the skb has
vxlan header already pulled. Don't include vxlan header again in the
calculation.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Mar, 2016 1 commit

vxlan: fix populating tclass in vxlan6_get_route · eaa93bf4

Committed by Daniel Borkmann on Mar 18, 2016

Jiri mentioned that flowi6_tos of struct flowi6 is never used/read
anywhere. In fact, the rest of the kernel uses the flowi6's flowlabel,
where the traffic class _and_ the flowlabel (aka flowinfo) are encoded.

For example, for policy routing, fib6_rule_match() uses ip6_tclass()
that is applied on the flowlabel member for matching on tclass. A similar
fix is needed for geneve, where flowi6_tos is set as well. Installing
a v6 blackhole rule that, for example, matches on tos now works with vxlan.

Fixes: 1400615d ("vxlan: allow setting ipv6 traffic class")
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
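
The encoding described above can be illustrated with a small userspace sketch. It assumes the standard IPv6 header layout (4-bit version, 8-bit traffic class, 20-bit flow label) and works in host byte order for readability. The helper names `make_flowinfo` and `flowinfo_tclass` are hypothetical and are not the kernel's API.

```c
#include <stdint.h>

/* Bit positions within the first 32 bits of the IPv6 header,
 * expressed in host byte order for readability. */
#define TCLASS_SHIFT   20u
#define TCLASS_MASK    0x0FF00000u
#define FLOWLABEL_MASK 0x000FFFFFu

/* Hypothetical helper: pack traffic class and flow label into one word. */
static uint32_t make_flowinfo(uint8_t tclass, uint32_t flowlabel)
{
    return ((uint32_t)tclass << TCLASS_SHIFT) | (flowlabel & FLOWLABEL_MASK);
}

/* Hypothetical helper: recover the traffic class from the combined word. */
static uint8_t flowinfo_tclass(uint32_t flowinfo)
{
    return (uint8_t)((flowinfo & TCLASS_MASK) >> TCLASS_SHIFT);
}
```

A lookup that matches on tclass can only see the value if it was packed into this combined word, which is why writing it to a separate, unread field has no effect.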

14 Mar, 2016 1 commit

gro: Defer clearing of flush bit in tunnel paths · c194cf93

Committed by Alexander Duyck on Mar 09, 2016

This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that
we do not clear the flush bit until after we have called the next level GRO
handler. Previously this was being cleared before parsing through the list
of frames, however this resulted in several paths where either the bit
needed to be reset but wasn't as in the case of FOU, or cases where it was
being set as in GENEVE. By just deferring the clearing of the bit until
after the next level protocol has been parsed we can avoid any unnecessary
bit twiddling and avoid bugs.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Mar, 2016 2 commits

vxlan: support setting IPv6 flow label · e7f70af1

Committed by Daniel Borkmann on Mar 09, 2016

This work adds support for setting the IPv6 flow label for vxlan per
device and through collect metadata (ip_tunnel_key) frontends. The
vxlan dst cache does not need any special considerations here, for
the cases where caches can be used, the label is static per cache.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_tunnel: add support for setting flow label via collect metadata · 13461144

Committed by Daniel Borkmann on Mar 09, 2016

This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label
from call sites. Currently, there's no such option and it's always set to
zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so
that flow-based tunnels via collect metadata frontends can make use of it.
vxlan and geneve will be converted to add flow label support separately.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Mar, 2016 2 commits

vxlan: allow setting ipv6 traffic class · 1400615d

Committed by Daniel Borkmann on Mar 04, 2016

We can already do that for IPv4, but IPv6 support was missing. Add
it for vxlan, so it can be used with collect metadata frontends.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, vxlan, geneve, gre: fix usage of dst_cache on xmit · db3c6139

Committed by Daniel Borkmann on Mar 04, 2016

The assumptions from commit 0c1d70af ("net: use dst_cache for vxlan
device"), 468dfffc ("geneve: add dst caching support") and 3c1cb4d2
("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage
when ip_tunnel_info is used are unfortunately not always valid.

While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF
however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre
with different remote dsts, tos, etc, therefore they cannot be assumed as
packet independent.

Right now vxlan, geneve, gre would cache the dst for eBPF and every packet
would reuse the same entry that was first created on the initial route
lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have
a different one.

Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test
in vxlan needs to be handled differently in this context as it is currently
inferred from ip_tunnel_info as well if present. The
ip_tunnel_dst_cache_usable() helper is added for the three tunnel cases; it
checks if we can use the dst cache.

Fixes: 0c1d70af ("net: use dst_cache for vxlan device")
Fixes: 468dfffc ("geneve: add dst caching support")
Fixes: 3c1cb4d2 ("net/ipv4: add dst cache support for gre lwtunnels")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
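
A simplified userspace model of the cache decision described above might look as follows. The struct, the flag name `TUN_FLAG_NOCACHE` and its value, and the function `dst_cache_usable` are hypothetical stand-ins rather than the kernel's definitions. The rule that a missing tunnel info means the device's static configuration (and is therefore cacheable) is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: the producer of the tunnel info (e.g. eBPF) sets it
 * when the destination may differ from packet to packet. */
#define TUN_FLAG_NOCACHE 0x1u

/* Hypothetical stand-in for the per-packet tunnel metadata. */
struct tun_info {
    uint32_t tun_flags;
};

/* Returns true when a cached route may be reused for this packet. */
static bool dst_cache_usable(const struct tun_info *info)
{
    if (info == NULL)          /* assumption: static device config */
        return true;
    return (info->tun_flags & TUN_FLAG_NOCACHE) == 0;
}
```

With such a check, a front-end that keeps stable tunnel parameters still benefits from the cache, while per-packet metadata forces a fresh route lookup.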

05 Mar, 2016 1 commit

vxlan: use reset to set header pointers · 6297b91c

Committed by Zhang Shengju on Mar 03, 2016

Since the offset is zero, it's not necessary to use the set function. The
reset function is straightforward and removes the unnecessary add operation
in the set function.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
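
The equivalence this commit relies on can be shown with a toy buffer model. The `struct buf` type and the `buf_*` functions below are hypothetical userspace stand-ins that only mimic the shape of the reset and set helpers; they are not the kernel's sk_buff API.

```c
#include <stdint.h>

/* Toy model: 'head' is the start of the buffer, 'data' the current
 * payload position, 'mac_header' an offset from 'head'. */
struct buf {
    unsigned char *head;
    unsigned char *data;
    uint16_t mac_header;
};

/* Point the header offset at the current data position. */
static void buf_reset_mac_header(struct buf *b)
{
    b->mac_header = (uint16_t)(b->data - b->head);
}

/* Same as reset, plus an extra addition of 'offset'. */
static void buf_set_mac_header(struct buf *b, int offset)
{
    buf_reset_mac_header(b);
    b->mac_header = (uint16_t)(b->mac_header + offset);
}
```

When `offset` is zero the two calls store the same value, so the reset form saves the addition and states the intent more directly.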

04 Mar, 2016 1 commit

vxlan: fix missing options_len update on RX with collect metadata · 4024fcf7

Committed by Daniel Borkmann on Mar 02, 2016

When signalling to metadata consumers that the metadata_dst entry
carries additional GBP extension data for vxlan (TUNNEL_VXLAN_OPT),
the dst's vxlan_metadata information is populated, but options_len
is left at zero. For example, in ovs, ovs_flow_key_extract() checks for
options_len before extracting the data through ip_tunnel_info_opts_get().

Geneve uses ip_tunnel_info_opts_set() helper in receive path, which
sets options_len internally, vxlan however uses ip_tunnel_info_opts(),
so when filling vxlan_metadata, we do need to update options_len.

Fixes: 4c222798 ("ip-tunnel: Use API to access tunnel metadata options.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Feb, 2016 1 commit

net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump. · 472681d5

Committed by MINOURA Makoto / 箕浦真 on Feb 25, 2016

When the send skbuff reaches the end, nlmsg_put and friends return
-EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called
within a for_each_netdev loop and the first fdb entry of a following
netdev could fit in the remaining skbuff.  This breaks the mechanism
of cb->args[0] and idx to keep track of the entries that are already
dumped, which results in missing entries in the bridge fdb show command.

Signed-off-by: Minoura Makoto <minoura@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
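
The resume mechanism described above can be modelled in userspace. This is a sketch under assumptions: the types `struct dev` and `struct msg` and the functions `dev_fdb_dump` and `fdb_dump_all` are hypothetical, `resume` plays the role of cb->args[0], and entry sizes are plain integers. The point it illustrates is that the outer loop must stop as soon as one device reports -EMSGSIZE.

```c
#include <errno.h>
#include <stddef.h>

#define MSG_MAX_ENTRIES 16

struct dev { const int *entry_sizes; size_t n; };                  /* hypothetical */
struct msg { int room; int dumped[MSG_MAX_ENTRIES]; size_t count; };

/* Dump one device's entries into 'm', skipping global indexes below 'start'.
 * '*idx' counts entries across all devices. */
static int dev_fdb_dump(const struct dev *d, struct msg *m, int start, int *idx)
{
    for (size_t i = 0; i < d->n; i++, (*idx)++) {
        if (*idx < start)
            continue;                      /* already sent in an earlier pass */
        if (d->entry_sizes[i] > m->room || m->count >= MSG_MAX_ENTRIES)
            return -EMSGSIZE;              /* *idx still names the unsent entry */
        m->room -= d->entry_sizes[i];
        m->dumped[m->count++] = *idx;
    }
    return 0;
}

/* Walk all devices; on -EMSGSIZE stop immediately so that no later,
 * smaller entry jumps ahead of the one that did not fit. */
static int fdb_dump_all(const struct dev *devs, size_t ndevs,
                        struct msg *m, int *resume)
{
    int idx = 0, err = 0;

    for (size_t i = 0; i < ndevs; i++) {
        err = dev_fdb_dump(&devs[i], m, *resume, &idx);
        if (err == -EMSGSIZE)
            break;
    }
    *resume = idx;                         /* where the next pass continues */
    return err;
}
```

If the error were dropped and the loop continued, a small entry from the next device could be emitted and counted, and the entry that did not fit would be skipped on the next pass.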

26 Feb, 2016 5 commits

vxlan: simplify metadata_dst usage in vxlan_rcv · 10a5af23

Committed by Jiri Benc on Feb 23, 2016

Now that the packet is scrubbed early, the metadata_dst can be assigned to
the skb as soon as it is allocated. This simplifies the error cleanup path,
as the dst will be freed by kfree_skb. It is also not necessary to pass it
as a parameter to functions anymore.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: consolidate rx handling to a single function · f2d1968e

Committed by Jiri Benc on Feb 23, 2016

Now that both vxlan_udp_encap_recv and vxlan_rcv are much shorter, combine
them into a single function.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: move ECN decapsulation to a separate function · 760c6805

Committed by Jiri Benc on Feb 23, 2016

It simplifies the vxlan_rcv function.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: move inner L2 header processing to a separate function · 1ab016e2

Committed by Jiri Benc on Feb 23, 2016

This code will be different for VXLAN-GPE, so move it to a separate
function. It will also make the rx path less spaghetti-like.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: consolidate GBP handling even more · 64f87d36

Committed by Jiri Benc on Feb 23, 2016

Now that the packet is scrubbed early, skb->mark can be set in the GBP
handling code.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Feb, 2016 1 commit

VXLAN: Support outer IPv4 Tx checksums by default · 6ceb31ca

Committed by Alexander Duyck on Feb 19, 2016

This change makes it so that if UDP CSUM is not specified we will default
to enabling it. The main motivation behind this is the fact that with the
use of outer checksum we can greatly improve the performance for VXLAN
tunnels on devices that don't know how to parse tunnel headers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Feb, 2016 5 commits

vxlan: do not use fdb in metadata mode · f468a729

Committed by Jiri Benc on Feb 16, 2016

In metadata mode, the vxlan interface is not supposed to use the fdb control
plane but an external one (openvswitch or static routes). With the current
code, packets may leak into the fdb handling code which usually causes them
to be dropped anyway but may have strange side effects.

Just drop the packets directly when in metadata mode if the destination data
are not correctly provided on egress.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: clear IFF_TX_SKB_SHARING · 82a0f6b4

Committed by Jiri Benc on Feb 16, 2016

ether_setup sets IFF_TX_SKB_SHARING but this is not supported by vxlan
as it modifies the skb on xmit.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

iptunnel: scrub packet in iptunnel_pull_header · 7f290c94

Committed by Jiri Benc on Feb 18, 2016

Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call
skb_scrub_packet directly instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: move vxlan device lookup before iptunnel_pull_header · c9e78efb

Committed by Jiri Benc on Feb 18, 2016

This is in preparation for iptunnel_pull_header calling skb_scrub_packet.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: tun_id is 64bit, not 32bit · 07dabf20

Committed by Jiri Benc on Feb 18, 2016

The tun_id field in struct ip_tunnel_key is __be64, not __be32. We need to
convert the vni to tun_id correctly.

Fixes: 54bfd872 ("vxlan: keep flags and vni in network byte order")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
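
Widening a 32-bit network-order VNI into a 64-bit network-order tunnel id can be sketched portably with byte arrays. This sketch assumes that the tunnel id carries the VNI in its low-order bits, so the numeric value is preserved. The helpers `vni_to_tun_id` and `be64_value` are hypothetical and written for userspace.

```c
#include <stdint.h>
#include <string.h>

/* Widen a 4-byte big-endian VNI into an 8-byte big-endian tunnel id.
 * In big-endian layout the low-order bytes come last, so the VNI
 * bytes go into positions 4..7 and the upper half is zero. */
static void vni_to_tun_id(const uint8_t vni_be[4], uint8_t tun_id_be[8])
{
    memset(tun_id_be, 0, 4);
    memcpy(tun_id_be + 4, vni_be, 4);
}

/* Decode an 8-byte big-endian buffer into a host integer (for checking). */
static uint64_t be64_value(const uint8_t b[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}
```

Treating the 64-bit field as if it were 32 bits wide would place the VNI bytes in the wrong half on some byte orders, which is the class of mistake a type-aware conversion avoids.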

18 Feb, 2016 7 commits

vxlan: treat vni in metadata based tunnels consistently · b9167b2e

Committed by Jiri Benc on Feb 16, 2016

For metadata based tunnels, VNI is ignored when doing vxlan device lookups
(because such tunnel receives all VNIs). However, this was not honored by
vxlan_xmit_one when doing encapsulation bypass. Move the check for metadata
based tunnel to the common place where it belongs.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: clean up rx error path · 288b01c8

Committed by Jiri Benc on Feb 16, 2016

When there are unrecognized flags present in the vxlan header, it doesn't
make much sense to return the packet for further UDP processing, especially
considering that for other invalid flag combinations we drop the packet
because of previous checks.

This means we return positive value only at the beginning of the function
where tun_dst is not yet allocated. This allows us to get rid of the
bad_flags and error jump labels.

When we're dropping packet, we need to free tun_dst now.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: clean up extension handling on rx · f14ecebb

Committed by Jiri Benc on Feb 16, 2016

Bring the extension handling to a single place and move the actual handling
logic out of vxlan_udp_encap_recv as much as possible.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: move GBP header parsing to a separate function · 3288af08

Committed by Jiri Benc on Feb 16, 2016

To make vxlan_udp_encap_recv shorter and more comprehensible.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: simplify vxlan_remcsum · be5cfeab

Committed by Jiri Benc on Feb 16, 2016

Some of the parameters are not needed. Simplify the caller of this function
in preparation for making vxlan rx more comprehensible.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: keep flags and vni in network byte order · 54bfd872

Committed by Jiri Benc on Feb 16, 2016

Prevent repeated conversions from and to network order in the fast path.

To achieve this, define all flag constants in big endian order and store VNI
as __be32. To prevent confusion between the actual VNI value and the VNI
field from the header (which contains additional reserved byte), strictly
distinguish between "vni" and "vni_field".

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
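
The distinction between "vni" and "vni_field" can be made concrete with a byte-level sketch. It assumes the VXLAN header layout from RFC 7348, where the last 32-bit word holds a 24-bit VNI followed by one reserved byte. The helper names `vni_from_field` and `vni_to_field` are hypothetical and operate on network-order byte arrays instead of kernel types.

```c
#include <stdint.h>

/* Header word on the wire: [vni2][vni1][vni0][reserved].
 * VNI as a 32-bit big-endian number: [0][vni2][vni1][vni0]. */

/* Extract the VNI value from the header word, dropping the reserved byte. */
static void vni_from_field(const uint8_t field[4], uint8_t vni_be[4])
{
    vni_be[0] = 0;
    vni_be[1] = field[0];
    vni_be[2] = field[1];
    vni_be[3] = field[2];
}

/* Build the header word from the VNI value, with the reserved byte zero. */
static void vni_to_field(const uint8_t vni_be[4], uint8_t field[4])
{
    field[0] = vni_be[1];
    field[1] = vni_be[2];
    field[2] = vni_be[3];
    field[3] = 0;
}
```

Both representations stay in network byte order, so moving between them is a one-byte shift of position and needs no host-order conversion.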

vxlan: introduce vxlan_hdr · d4ac05ff

Committed by Jiri Benc on Feb 16, 2016

Currently, pointer to the vxlan header is kept in a local variable. It has
to be reloaded whenever the pskb pull operations are performed which usually
happens somewhere deep in called functions.

Create a vxlan_hdr function and use it to reference the vxlan header
instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
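
The hazard this commit addresses, a cached pointer going stale when the buffer is reallocated, can be reproduced in a small userspace model. The `struct pkt` type and the `pkt_*` functions are hypothetical. The sketch assumes the VXLAN header directly follows an 8-byte UDP header and uses `realloc` as a stand-in for a pull operation that may move the packet data.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_UDP_HLEN 8

/* Toy packet: heap buffer plus the offset of the UDP header within it. */
struct pkt {
    uint8_t *head;
    size_t len;
    size_t udp_off;
};

/* Accessor: recompute the header pointer from the current buffer
 * every time instead of keeping it in a local variable. */
static uint8_t *pkt_vxlan_hdr(const struct pkt *p)
{
    return p->head + p->udp_off + MODEL_UDP_HLEN;
}

/* Stand-in for an operation that may reallocate (and move) the data. */
static int pkt_grow(struct pkt *p, size_t new_len)
{
    uint8_t *n = realloc(p->head, new_len);

    if (n == NULL)
        return -1;
    p->head = n;
    p->len = new_len;
    return 0;
}
```

A caller that always goes through `pkt_vxlan_hdr()` keeps working after `pkt_grow()`, whereas a pointer saved before the call may refer to freed memory.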

17 Feb, 2016 2 commits

net: add dst_cache to ovs vxlan lwtunnel · d71785ff

Committed by Paolo Abeni on Feb 12, 2016

In case of UDP traffic with datagram length
below MTU this gives about 2% performance increase
when tunneling over ipv4 and about 60% when tunneling
over ipv6.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use dst_cache for vxlan device · 0c1d70af

Committed by Paolo Abeni on Feb 12, 2016

In case of UDP traffic with datagram length
below MTU this gives about 3% performance increase
when tunneling over ipv4 and about 70% when
tunneling over ipv6.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Feb, 2016 2 commits

net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloads · 6fa79666

Committed by Edward Cree on Feb 11, 2016

All users now pass false, so we can remove it, and remove the code that
was conditional upon it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vxlan: enable local checksum offload · b5708501

Committed by Edward Cree on Feb 11, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Feb, 2016 1 commit

vxlan: udp_tunnel duplicate include net/udp_tunnel.h · 40d29af0

Committed by stephen hemminger on Feb 09, 2016

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Feb, 2016 1 commit

vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices · 7e059158

Committed by David Wragg on Feb 10, 2016

Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.

Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.

Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
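
The "maximum IP packet size minus any relevant overhead" rule can be worked through as simple arithmetic. This sketch assumes the overhead of a VXLAN tunnel consists of the outer IP header, the UDP header, the VXLAN header and the inner Ethernet header, using their standard sizes. The function `vxlan_max_mtu` is hypothetical, and the caller supplies the maximum IP packet size because the exact constant the kernel uses is not stated in the commit message.

```c
/* Standard header sizes in bytes. */
#define MODEL_IPV4_HLEN   20
#define MODEL_IPV6_HLEN   40
#define MODEL_UDP_HLEN     8
#define MODEL_VXLAN_HLEN   8
#define MODEL_ETH_HLEN    14

/* Largest inner MTU that still fits in one outer IP packet of at most
 * 'max_ip_packet' bytes. 'outer_is_ipv6' selects the outer header size. */
static int vxlan_max_mtu(int max_ip_packet, int outer_is_ipv6)
{
    int overhead = (outer_is_ipv6 ? MODEL_IPV6_HLEN : MODEL_IPV4_HLEN)
                 + MODEL_UDP_HLEN + MODEL_VXLAN_HLEN + MODEL_ETH_HLEN;

    return max_ip_packet - overhead;
}
```

For an assumed maximum of 65535 bytes this gives 65485 over IPv4 (50 bytes of overhead) and 65465 over IPv6 (70 bytes), far above the 1500-byte default mentioned in the commit message.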

openeuler / Kernel: synced successfully more than 1 year ago