提交 · be73b3043bf465455d4c9b88f68e03b6447bcfb0 · openeuler / Kernel

02 8月, 2017 1 次提交

vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP · be73b304

由 K. Den 提交于 8月 01, 2017

In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.

However, since b7fe10e5("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.

This patch makes remcsum_offload being reset at the same time of GRO
remcsum cleanup, so as to make it work in such case as before.

Fixes: b7fe10e5 ("gro: Fix remcsum offload to deal with frags in GRO")
Signed-off-by: NKoichiro Den <den@klaipeden.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be73b304

05 7月, 2017 1 次提交

net, vxlan: convert vxlan_sock.refcnt from atomic_t to refcount_t · 66af846f

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66af846f

03 7月, 2017 2 次提交

vxlan: fix hlist corruption · 69e76661

由 Jiri Benc 提交于 7月 02, 2017

It's not a good idea to add the same hlist_node to two different hash lists.
This leads to various hard to debug memory corruptions.

Fixes: b1be00a6 ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

69e76661

vxlan: correctly set vxlan->net when creating the device in a netns · 889ce937

由 Sabrina Dubroca 提交于 6月 30, 2017

Commit a985343b ("vxlan: refactor verification and application of
configuration") modified vxlan device creation, and replaced the
assignment of vxlan->net to src_net with dev_net(netdev) in ->setup().

But dev_net(netdev) is not the same as src_net. At the time ->setup()
is called, dev_net hasn't been set yet, so we end up creating the
socket for the vxlan device in init_net.

Fix this by bringing back the assignment of vxlan->net during device
creation.

Fixes: a985343b ("vxlan: refactor verification and application of configuration")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

889ce937

28 6月, 2017 1 次提交

vxlan: fix incorrect nlattr access in MTU check · 019b13ae

由 Matthias Schiffer 提交于 6月 27, 2017

The access to the wrong variable could lead to a NULL dereference and
possibly other invalid memory reads in vxlan newlink/changelink requests
with a IFLA_MTU attribute.

Fixes: a985343b "vxlan: refactor verification and application of configuration"
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

019b13ae

27 6月, 2017 3 次提交

net: add netlink_ext_ack argument to rtnl_link_ops.validate · a8b8a889

由 Matthias Schiffer 提交于 6月 25, 2017

Add support for extended error reporting.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8b8a889

net: add netlink_ext_ack argument to rtnl_link_ops.changelink · ad744b22

由 Matthias Schiffer 提交于 6月 25, 2017

Add support for extended error reporting.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad744b22

net: add netlink_ext_ack argument to rtnl_link_ops.newlink · 7a3f4a18

由 Matthias Schiffer 提交于 6月 25, 2017

Add support for extended error reporting.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a3f4a18

21 6月, 2017 6 次提交

vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses · 49f810f0

由 Matthias Schiffer 提交于 6月 19, 2017

As link-local addresses are only valid for a single interface, we can allow
to use the same VNI for multiple independent VXLANs, as long as the used
interfaces are distinct. This way, VXLANs can always be used as a drop-in
replacement for VLANs with greater ID space.

This also extends VNI lookup to respect the ifindex when link-local IPv6
addresses are used, so using the same VNI on multiple interfaces can
actually work.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

49f810f0

vxlan: fix snooping for link-local IPv6 addresses · 87613de9

由 Matthias Schiffer 提交于 6月 19, 2017

If VXLAN is run over link-local IPv6 addresses, it is necessary to store
the ifindex in the FDB entries. Otherwise, the used interface is undefined
and unicast communication will most likely fail.

Support for link-local IPv4 addresses should be possible as well, but as
the semantics aren't as well defined as for IPv6, and there doesn't seem to
be much interest in having the support, it's not implemented for now.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87613de9

vxlan: check valid combinations of address scopes · 0f22a3c6

由 Matthias Schiffer 提交于 6月 19, 2017

* Multicast addresses are never valid as local address
* Link-local IPv6 unicast addresses may only be used as remote when the
  local address is link-local as well
* Don't allow link-local IPv6 local/remote addresses without interface

We also store in the flags field if link-local addresses are used for the
follow-up patches that actually make VXLAN over link-local IPv6 work.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f22a3c6

vxlan: improve validation of address family configuration · ce44a4ae

由 Matthias Schiffer 提交于 6月 19, 2017

Address families of source and destination addresses must match, and
changelink operations can't change the address family.

In addition, always use the VXLAN_F_IPV6 to check if a VXLAN device uses
IPv4 or IPv6.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce44a4ae

vxlan: get rid of redundant vxlan_dev.flags · dc5321d7

由 Matthias Schiffer 提交于 6月 19, 2017

There is no good reason to keep the flags twice in vxlan_dev and
vxlan_config.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc5321d7

vxlan: refactor verification and application of configuration · a985343b

由 Matthias Schiffer 提交于 6月 19, 2017

The vxlan_dev_configure function was mixing validation and application of
the vxlan configuration; this could easily lead to bugs with the changelink
operation, as it was hard to see if the function wcould return an error
after parts of the configuration had already been applied.

This commit splits validation and application out of vxlan_dev_configure as
separate functions to make it clearer where error returns are allowed and
where the vxlan_dev or net_device may be configured. Log messages in these
functions are removed, as it is generally unexpected to find error output
for netlink requests in the kernel log. Userspace should be able to handle
errors based on the error codes returned via netlink just fine.

In addition, some validation and initialization is moved to vxlan_validate
and vxlan_setup respectively to improve grouping of similar settings.

Finally, this also fixes two actual bugs:

* if set, conf->mtu would overwrite dev->mtu in each changelink operation,
  reverting other changes of dev->mtu
* the "if (!conf->dst_port)" branch would never be run, as conf->dst_port
  was set in vxlan_setup before. This caused VXLAN-GPE to use the same
  default port as other VXLAN sockets instead of the intended IANA-assigned
  4790.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a985343b

16 6月, 2017 2 次提交

networking: make skb_push & __skb_push return void pointers · d58ff351

由 Johannes Berg 提交于 6月 16, 2017

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d58ff351

networking: convert many more places to skb_put_zero() · b080db58

由 Johannes Berg 提交于 6月 16, 2017

There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.

The following spatch found many more and also removes the
now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b080db58

12 6月, 2017 1 次提交

vxlan: dont migrate permanent fdb entries during learn · e0090a9e

由 Roopa Prabhu 提交于 6月 11, 2017

This patch fixes vxlan_snoop to not move permanent fdb entries
on learn events. This is consistent with the bridge fdb
handling of permanent entries.

Fixes: 26a41ae6 ("vxlan: only migrate dynamic FDB entries")
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0090a9e

08 6月, 2017 2 次提交

net: Fix inconsistent teardown and release of private netdev state. · cf124db5

由 David S. Miller 提交于 5月 08, 2017

Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf124db5

vxlan: use a more suitable function when assigning NULL · 57d88182

由 Mark Bloch 提交于 6月 07, 2017

When stopping the vxlan interface we detach it from the socket.
Use RCU_INIT_POINTER() and not rcu_assign_pointer() to do so.
Suggested-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

57d88182

03 6月, 2017 1 次提交

vxlan: fix use-after-free on deletion · a53cb29b

由 Mark Bloch 提交于 6月 02, 2017

Adding a vxlan interface to a socket isn't symmetrical, while adding
is done in vxlan_open() the deletion is done in vxlan_dellink().
This can cause a use-after-free error when we close the vxlan
interface before deleting it.

We add vxlan_vs_del_dev() to match vxlan_vs_add_dev() and call
it from vxlan_stop() to match the call from vxlan_open().

Fixes: 56ef9c90 ("vxlan: Move socket initialization to within rtnl scope")
Acked-by: NJiri Benc <jbenc@redhat.com>
Tested-by: NRoi Dayan <roid@mellanox.com>
Signed-off-by: NMark Bloch <markb@mellanox.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a53cb29b

02 6月, 2017 1 次提交

vxlan: eliminate cached dst leak · 35cf2845

由 Lance Richardson 提交于 5月 29, 2017

After commit 0c1d70af ("net: use dst_cache for vxlan device"),
cached dst entries could be leaked when more than one remote was
present for a given vxlan_fdb entry, causing subsequent netns
operations to block indefinitely and "unregister_netdevice: waiting
for lo to become free." messages to appear in the kernel log.

Fix by properly releasing cached dst and freeing resources in this
case.

Fixes: 0c1d70af ("net: use dst_cache for vxlan device")
Signed-off-by: NLance Richardson <lrichard@redhat.com>
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35cf2845

01 5月, 2017 2 次提交

vxlan: do not output confusing error message · baf4d786

由 Jiri Benc 提交于 4月 27, 2017

The message "Cannot bind port X, err=Y" creates only confusion. In metadata
based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and
no error message should be printed. But when IPv6 tunnel was requested, such
failure is fatal. The vxlan_socket_create does not know when the error is
harmless and when it's not.

Instead of passing such information down to vxlan_socket_create, remove the
message completely. It's not useful. We propagate the error code up to the
user space and the port number comes from the user space. There's nothing in
the message that the process creating vxlan interface does not know.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

baf4d786

vxlan: correctly handle ipv6.disable module parameter · d074bf96

由 Jiri Benc 提交于 4月 27, 2017

When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.

Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.

Fixes: b1be00a6 ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d074bf96

04 4月, 2017 1 次提交

vxlan: fix ND proxy when skb doesn't have transport header offset · f1fb08f6

由 Vincent Bernat 提交于 4月 02, 2017

When an incoming frame is tagged or when GRO is disabled, the skb
handled to vxlan_xmit() doesn't contain a valid transport header
offset. This makes ND proxying fail.

We combine two changes: replace use of skb_transport_offset() and ensure
the necessary amount of skb is linear just before using it:

 - In vxlan_xmit(), when determining if we have an ICMPv6 neighbor
   discovery packet, just check if it is an ICMPv6 packet and rely on
   neigh_reduce() to do more checks if this is the case. The use of
   pskb_may_pull() is replaced by skb_header_pointer() for just the IPv6
   header.

 - In neigh_reduce(), add pskb_may_pull() for IPv6 header and neighbor
   discovery message since this was removed from vxlan_xmit(). Replace
   skb_transport_header() with ipv6_hdr() + 1.

 - In vxlan_na_create(), replace first skb_transport_offset() with
   ipv6_hdr() + 1 and second with skb_network_offset() + sizeof(struct
   ipv6hdr). Additionally, ensure we pskb_may_pull() the whole skb as we
   need it to iterate over the options.
Signed-off-by: NVincent Bernat <vincent@bernat.im>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f1fb08f6

02 4月, 2017 1 次提交

vxlan: vxlan dev should inherit lowerdev's gso_max_size · d6acfeb1

由 Felix Manlunas 提交于 3月 29, 2017

vxlan dev currently ignores lowerdev's gso_max_size, which adversely
affects TSO performance of liquidio if it's the lowerdev.  Egress TCP
packets' skb->len often exceed liquidio's advertised gso_max_size.  This
may happen on other NIC drivers.

Fix it by assigning lowerdev's gso_max_size to that of vxlan dev.  Might as
well do likewise for gso_max_segs.

Single flow TSO throughput of liquidio as lowerdev (using iperf3):

    Before the patch:    139 Mbps
    After the patch :   8.68 Gbps
    Percent increase:  6,144 %
Signed-off-by: NFelix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: NSatanand Burla <satananda.burla@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6acfeb1

29 3月, 2017 1 次提交

vxlan: don't age NTF_EXT_LEARNED fdb entries · def499c9

由 Roopa Prabhu 提交于 3月 27, 2017

vxlan driver already implicitly supports installing
of external fdb entries with NTF_EXT_LEARNED. This
patch just makes sure these entries are not aged
by the vxlan driver. An external entity managing these
entries will age them out. This is consistent with
the use of NTF_EXT_LEARNED in the bridge driver.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

def499c9

14 3月, 2017 1 次提交

vxlan: fix ovs support · c80498e3

由 Nicolas Dichtel 提交于 3月 13, 2017

The required changes in the function vxlan_dev_create() were missing
in commit 8bcdc4f3.
The vxlan device is not registered anymore after this patch and the error
path causes an stack dump:
 WARNING: CPU: 3 PID: 1498 at net/core/dev.c:6713 rollback_registered_many+0x9d/0x3f0

Fixes: 8bcdc4f3 ("vxlan: add changelink support")
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c80498e3

13 3月, 2017 1 次提交

vxlan: use appropriate family on L3 miss · 8f48ba71

由 Vincent Bernat 提交于 3月 10, 2017

When sending a L3 miss, the family is set to AF_INET even for IPv6. This
causes userland (eg "ip monitor") to be confused. Ensure we send the
appropriate family in this case. For L2 miss, keep using AF_INET.
Signed-off-by: NVincent Bernat <vincent@bernat.im>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f48ba71

02 3月, 2017 1 次提交

vxlan: lock RCU on TX path · 56de859e

由 Jakub Kicinski 提交于 2月 24, 2017

There is no guarantees that callers of the TX path will hold
the RCU lock.  Grab it explicitly.

Fixes: c6fcc4fc ("vxlan: avoid using stale vxlan socket.")
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56de859e

25 2月, 2017 2 次提交

vxlan: don't allow overwrite of config src addr · 1158632b

由 Brian Russell 提交于 2月 24, 2017

When using IPv6 transport and a default dst, a pointer to the configured
source address is passed into the route lookup. If no source address is
configured, then the value is overwritten.

IPv6 route lookup ignores egress ifindex match if the source address is set,
so if egress ifindex match is desired, the source address must be passed
as any. The overwrite breaks this for subsequent lookups.

Avoid this by copying the configured address to an existing stack variable
and pass a pointer to that instead.

Fixes: 272d96a5 ("net: vxlan: lwt: Use source ip address during route lookup.")
Signed-off-by: NBrian Russell <brussell@brocade.com>
Acked-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1158632b

vxlan: correctly validate VXLAN ID against VXLAN_N_VID · 4e37d691

由 Matthias Schiffer 提交于 2月 23, 2017

The incorrect check caused an off-by-one error: the maximum VID 0xffffff
was unusable.

Fixes: d342894c ("vxlan: virtual extensible lan")
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e37d691

22 2月, 2017 2 次提交

vxlan: remove unused variable saddr in neigh_reduce · 8dcd81a9

由 Roopa Prabhu 提交于 2月 20, 2017

silences the below warning:
    drivers/net/vxlan.c: In function ‘neigh_reduce’:
    drivers/net/vxlan.c:1599:25: warning: variable ‘saddr’ set but not used
    [-Wunused-but-set-variable]
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8dcd81a9

vxlan: add changelink support · 8bcdc4f3

由 Roopa Prabhu 提交于 2月 20, 2017

This patch adds changelink rtnl op support for vxlan netdevs.
code changes involve:
    - refactor vxlan_newlink into vxlan_nl2conf to be
    used by vxlan_newlink and vxlan_changelink
    - vxlan_nl2conf and vxlan_dev_configure take a
    changelink argument to isolate changelink checks
    and updates.
    - Allow changing only a few attributes:
        - return -EOPNOTSUPP for attributes that cannot
        be changed for now. Incremental patches can
        make the non-supported one available in the future
        if needed.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8bcdc4f3

18 2月, 2017 1 次提交

vxlan: fix oops in dev_fill_metadata_dst · 22f0708a

由 Paolo Abeni 提交于 2月 17, 2017

Since the commit 0c1d70af ("net: use dst_cache for vxlan device")
vxlan_fill_metadata_dst() calls vxlan_get_route() passing a NULL
dst_cache pointer, so the latter should explicitly check for
valid dst_cache ptr. Unfortunately the commit d71785ff ("net: add
dst_cache to ovs vxlan lwtunnel") removed said check.

As a result is possible to trigger a null pointer access calling
vxlan_fill_metadata_dst(), e.g. with:

ovs-vsctl add-br ovs-br0
ovs-vsctl add-port ovs-br0 vxlan0 -- set interface vxlan0 \
	type=vxlan options:remote_ip=192.168.1.1 \
	options:key=1234 options:dst_port=4789 ofport_request=10
ip address add dev ovs-br0 172.16.1.2/24
ovs-vsctl set Bridge ovs-br0 ipfix=@i -- --id=@i create IPFIX \
	targets=\"172.16.1.1:1234\" sampling=1
iperf -c 172.16.1.1 -u -l 1000 -b 10M -t 1 -p 1234

This commit addresses the issue passing to vxlan_get_route() the
dst_cache already available into the lwt info processed by
vxlan_fill_metadata_dst().

Fixes: d71785ff ("net: add dst_cache to ovs vxlan lwtunnel")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22f0708a

12 2月, 2017 1 次提交

vxlan: remove vni zero check and drop for COLLECT_METADATA · 98eb253c

由 Roopa Prabhu 提交于 2月 10, 2017

This patch drops the vni zero check for COLLECT_METADATA mode.
It is not really needed, vni zero is a valid vni.

Fixes: 3ad7a4b1 ("vxlan: support fdb and learning in COLLECT_METADATA mode"
Reported-by: NJoe Stringer <joe@ovn.org>
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

98eb253c

04 2月, 2017 1 次提交

vxlan: support fdb and learning in COLLECT_METADATA mode · 3ad7a4b1

由 Roopa Prabhu 提交于 1月 31, 2017

Vxlan COLLECT_METADATA mode today solves the per-vni netdev
scalability problem in l3 networks. It expects all forwarding
information to be present in dst_metadata. This patch series
enhances collect metadata mode to include the case where only
vni is present in dst_metadata, and the vxlan driver can then use
the rest of the forwarding information datbase to make forwarding
decisions. There is no change to default COLLECT_METADATA
behaviour. These changes only apply to COLLECT_METADATA when
used with the bridging use-case with a special dst_metadata
tunnel info flag (eg: where vxlan device is part of a bridge).
For all this to work, the vxlan driver will need to now support a
single fdb table hashed by mac + vni. This series essentially makes
this happen.

use-case and workflow:
vxlan collect metadata device participates in bridging vlan
to vn-segments. Bridge driver above the vxlan device,
sends the vni corresponding to the vlan in the dst_metadata.
vxlan driver will lookup forwarding database with (mac + vni)
for the required remote destination information to forward the
packet.

Changes introduced by this patch:
    - allow learning and forwarding database state in vxlan netdev in
      COLLECT_METADATA mode. Current behaviour is not changed
      by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
      to support the new bridge friendly mode.
    - A single fdb table hashed by (mac, vni) to allow fdb entries with
      multiple vnis in the same fdb table
    - rx path already has the vni
    - tx path expects a vni in the packet with dst_metadata
    - prior to this series, fdb remote_dsts carried remote vni and
      the vxlan device carrying the fdb table represented the
      source vni. With the vxlan device now representing multiple vnis,
      this patch adds a src vni attribute to the fdb entry. The remote
      vni already uses NDA_VNI attribute. This patch introduces
      NDA_SRC_VNI netlink attribute to represent the src vni in a multi
      vni fdb table.

iproute2 example (patched and pruned iproute2 output to just show
relevant fdb entries):
example shows same host mac learnt on two vni's.

before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self

after this patch with collect metadata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ad7a4b1

25 1月, 2017 2 次提交

vxlan: do not age static remote mac entries · efb5f68f

由 Balakrishnan Raman 提交于 1月 23, 2017

Mac aging is applicable only for dynamically learnt remote mac
entries. Check for user configured static remote mac entries
and skip aging.
Signed-off-by: NBalakrishnan Raman <ramanb@cumulusnetworks.com>
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efb5f68f

vxlan: don't flush static fdb entries on admin down · 8b3f9337

由 Roopa Prabhu 提交于 1月 23, 2017

This patch skips flushing static fdb entries in
ndo_stop, but flushes all fdb entries during vxlan
device delete. This is consistent with the bridge
driver fdb
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b3f9337

21 1月, 2017 1 次提交

vxlan: preserve type of dst_port parm for encap_bypass_if_local() · 1f6cc07e

由 Lance Richardson 提交于 1月 18, 2017

Eliminate sparse warning by maintaining type of dst_port
as __be16.
Signed-off-by: NLance Richardson <lrichard@redhat.com>
Acked-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f6cc07e

18 1月, 2017 1 次提交

vxlan: fix byte order of vxlan-gpe port number · d5ff72d9

由 Lance Richardson 提交于 1月 16, 2017

vxlan->cfg.dst_port is in network byte order, so an htons()
is needed here. Also reduced comment length to stay closer
to 80 column width (still slightly over, however).

Fixes: e1e5314d ("vxlan: implement GPE")
Signed-off-by: NLance Richardson <lrichard@redhat.com>
Acked-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5ff72d9

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功