1. 04 Aug 2020, 2 commits
  2. 18 Jul 2020, 1 commit
    • net: openvswitch: reorder masks array based on usage · eac87c41
      Eelco Chaudron authored
      This patch reorders the masks array every 4 seconds based on each
      mask's usage count. This greatly reduces the number of masks hit per
      packet and hence improves overall performance, especially in the
      OVS/OVN case for OpenShift.
      
      Here are some results from the OVS/OVN OpenShift test, which uses
      8 pods, each with 512 uperf connections; each connection sends a
      64-byte request and receives a 1024-byte response (TCP). All uperf
      clients run on one worker node while all uperf servers run on the
      other worker node.
      
      Kernel without this patch     :  7.71 Gbps
      Kernel with this patch applied: 14.52 Gbps
      
      We also ran tests to verify that the rebalancing activity does not
      lower the flow insertion rate; it does not. A simplified sketch of
      the reordering idea follows this entry.
      Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
      Tested-by: Andrew Theurer <atheurer@redhat.com>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eac87c41
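
      A minimal userspace sketch of the reordering idea (not the kernel
      implementation; the struct and function names here are hypothetical,
      and the real patch keeps per-mask usage counters updated from the
      datapath and rebalances from a periodic work item):

      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct mask_entry {
          int id;             /* hypothetical mask identifier */
          uint64_t usage;     /* packets that matched through this mask */
      };

      static int cmp_usage_desc(const void *a, const void *b)
      {
          const struct mask_entry *ma = a, *mb = b;

          if (ma->usage == mb->usage)
              return 0;
          return ma->usage < mb->usage ? 1 : -1;
      }

      /* Called periodically (every 4 seconds in the commit above): sort the
       * masks so the most frequently hit ones are probed first on lookup. */
      static void rebalance_masks(struct mask_entry *masks, size_t n)
      {
          qsort(masks, n, sizeof(masks[0]), cmp_usage_desc);
      }

      int main(void)
      {
          struct mask_entry masks[] = {
              { .id = 0, .usage = 12 },
              { .id = 1, .usage = 90321 },   /* hot mask moves to the front */
              { .id = 2, .usage = 477 },
          };

          rebalance_masks(masks, 3);
          for (size_t i = 0; i < 3; i++)
              printf("slot %zu: mask %d (%llu hits)\n", i, masks[i].id,
                     (unsigned long long)masks[i].usage);
          return 0;
      }
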
  3. 03 Apr 2020, 1 commit
  4. 19 Feb 2020, 1 commit
  5. 04 Nov 2019, 8 commits
  6. 20 Jul 2019, 1 commit
    • net: openvswitch: rename flow_stats to sw_flow_stats · aef833c5
      Pablo Neira Ayuso authored
      There is a flow_stats structure defined in include/net/flow_offload.h,
      and a follow-up patch adds #include <net/flow_offload.h> to
      net/sch_generic.h.
      
      This breaks compilation, since the OVS codebase includes net/sock.h,
      which pulls in linux/filter.h, which in turn includes net/sch_generic.h.
      
      In file included from ./include/net/sch_generic.h:18:0,
                       from ./include/linux/filter.h:25,
                       from ./include/net/sock.h:59,
                       from ./include/linux/tcp.h:19,
                       from net/openvswitch/datapath.c:24
      
      The definition in the networking core takes precedence, so rename the
      OVS flow_stats to sw_flow_stats, since this structure is contained in
      sw_flow. A minimal illustration of the name clash follows this entry.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aef833c5
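
      A minimal single-file illustration of the clash, assuming hypothetical
      field layouts: if both structures had kept the name flow_stats and
      ended up in one translation unit through the include chain quoted
      above, the compiler would reject the second definition as a
      redefinition, which is what the rename avoids.

      #include <stdint.h>
      #include <stdio.h>

      /* Stand-in for the core definition in include/net/flow_offload.h. */
      struct flow_stats {
          uint64_t pkts;
          uint64_t bytes;
      };

      /* OVS-private statistics, renamed so it no longer collides. */
      struct sw_flow_stats {
          uint64_t packet_count;
          uint64_t byte_count;
      };

      int main(void)
      {
          struct flow_stats core = { .pkts = 1, .bytes = 64 };
          struct sw_flow_stats ovs = { .packet_count = 1, .byte_count = 64 };

          printf("core: %llu pkts, ovs: %llu pkts\n",
                 (unsigned long long)core.pkts,
                 (unsigned long long)ovs.packet_count);
          return 0;
      }
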
  7. 05 Jun 2019, 1 commit
  8. 13 Mar 2019, 1 commit
  9. 20 Jul 2017, 1 commit
    • openvswitch: Optimize operations for OvS flow_stats. · c4b2bf6b
      Tonghao Zhang authored
      When calling flow_free() to free a flow, cpumask_next() is invoked
      many times (once per CPU in cpu_possible_mask, e.g. 128 by default).
      This eats CPU time when flow_free() is called frequently, for example
      when all packets are sent to userspace via upcall and OvS sends them
      back via netlink to ovs_packet_cmd_execute(), which calls flow_free().
      
      The test topology is shown below. VM01 sends TCP packets to VM02,
      and OvS forwards the packets. During the test, we use perf to report
      system performance.
      
      VM01 --- OvS-VM --- VM02
      
      Without this patch, perf-top shows the following; flow_free()
      accounts for 3.02% of CPU usage.
      
      	4.23%  [kernel]            [k] _raw_spin_unlock_irqrestore
      	3.62%  [kernel]            [k] __do_softirq
      	3.16%  [kernel]            [k] __memcpy
      	3.02%  [kernel]            [k] flow_free
      	2.42%  libc-2.17.so        [.] __memcpy_ssse3_back
      	2.18%  [kernel]            [k] copy_user_generic_unrolled
      	2.17%  [kernel]            [k] find_next_bit
      
      With this patch applied, perf-top shows the following; flow_free()
      no longer appears on the list.
      
      	4.11%  [kernel]            [k] _raw_spin_unlock_irqrestore
      	3.79%  [kernel]            [k] __do_softirq
      	3.46%  [kernel]            [k] __memcpy
      	2.73%  libc-2.17.so        [.] __memcpy_ssse3_back
      	2.25%  [kernel]            [k] copy_user_generic_unrolled
      	1.89%  libc-2.17.so        [.] _int_malloc
      	1.53%  ovs-vswitchd        [.] xlate_actions
      
      With this patch, the TCP throughput between VMs (without the Megaflow
      Cache + Microflow Cache) rises from 1.18 Gbps to 1.30 Gbps (roughly a
      10% performance improvement).
      
      This patch adds a cpumask, cpu_used_mask, which records the CPUs that
      have used the flow. Only the flow_stats on those CPUs are checked, so
      it is unnecessary to walk all possible CPUs when getting, clearing,
      and updating the flow_stats. Adding cpu_used_mask to struct sw_flow
      does not increase the number of cachelines. A simplified sketch of
      the idea follows this entry.
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4b2bf6b
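
      A userspace sketch of the idea (the kernel code uses struct cpumask
      and cpumask_next(); the names and the fixed-size bitmap here are
      simplifications):

      #include <stdint.h>
      #include <stdio.h>

      #define MAX_CPUS 64              /* hypothetical; stands in for nr_cpu_ids */

      struct flow_stats {
          uint64_t packets;
          uint64_t bytes;
      };

      struct flow {
          uint64_t cpu_used_mask;              /* bit i set => CPU i used the flow */
          struct flow_stats stats[MAX_CPUS];   /* per-CPU statistics */
      };

      static void flow_stats_update(struct flow *f, int cpu, uint64_t len)
      {
          f->cpu_used_mask |= 1ULL << cpu;     /* remember this CPU used the flow */
          f->stats[cpu].packets++;
          f->stats[cpu].bytes += len;
      }

      static struct flow_stats flow_stats_get(const struct flow *f)
      {
          struct flow_stats total = { 0, 0 };

          /* Walk only the CPUs marked in cpu_used_mask, not every possible CPU. */
          for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
              if (!(f->cpu_used_mask & (1ULL << cpu)))
                  continue;
              total.packets += f->stats[cpu].packets;
              total.bytes += f->stats[cpu].bytes;
          }
          return total;
      }

      int main(void)
      {
          struct flow f = { 0 };
          struct flow_stats t;

          flow_stats_update(&f, 3, 1500);
          flow_stats_update(&f, 3, 60);

          t = flow_stats_get(&f);
          printf("%llu packets, %llu bytes\n",
                 (unsigned long long)t.packets, (unsigned long long)t.bytes);
          return 0;
      }
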
  10. 19 Sep 2016, 2 commits
  11. 07 Oct 2015, 1 commit
  12. 05 Oct 2015, 1 commit
  13. 23 Sep 2015, 1 commit
    • openvswitch: Zero flows on allocation. · ae5f2fb1
      Jesse Gross authored
      When support for megaflows was introduced, OVS needed to start
      installing flows with a mask applied to them. Since masking is an
      expensive operation, OVS also had an optimization that would only
      take the parts of the flow keys that were covered by a non-zero
      mask. The values stored in the remaining pieces should not matter
      because they are masked out.
      
      While this works fine for the purposes of matching (which must always
      look at the mask), serialization to netlink can be problematic. Since
      the flow and the mask are serialized separately, the uninitialized
      portions of the flow can be encoded with whatever values happen to be
      present.
      
      In terms of functionality, this has little effect since these fields
      will be masked out by definition. However, it leaks kernel memory to
      userspace, which is a potential security vulnerability. It is also
      possible that other code paths could look at the masked key and get
      uninitialized data, although this does not currently appear to be an
      issue in practice.
      
      This removes the mask optimization for flows that are being installed.
      This was always intended to be the case, as the mask optimization was
      really targeting per-packet flow operations. A simplified illustration
      follows this entry.
      
      Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae5f2fb1
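
      A userspace sketch of the problem and the fix, with a much smaller key
      than the real sw_flow_key: the masked copy only writes the byte range
      covered by the mask, so without zero-initialization the rest of the
      installed key would hold whatever the allocator left behind and could
      later be serialized to userspace.

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      #define KEY_LEN 32

      struct flow_key {
          unsigned char data[KEY_LEN];
      };

      /* Copy only the [start, end) range selected by the mask, as the
       * per-packet fast path does. */
      static void masked_copy(struct flow_key *dst, const struct flow_key *src,
                              const struct flow_key *mask, size_t start, size_t end)
      {
          for (size_t i = start; i < end; i++)
              dst->data[i] = src->data[i] & mask->data[i];
      }

      int main(void)
      {
          /* The fix: zero the whole key at allocation time (calloc here,
           * kzalloc-style zeroing in the kernel), so bytes outside the
           * masked range are deterministically 0. */
          struct flow_key *installed = calloc(1, sizeof(*installed));
          struct flow_key pkt, mask;

          if (!installed)
              return 1;
          memset(&pkt, 0xab, sizeof(pkt));
          memset(&mask, 0, sizeof(mask));
          memset(mask.data, 0xff, 8);       /* mask covers only the first 8 bytes */

          masked_copy(installed, &pkt, &mask, 0, 8);
          printf("byte 0: %#x, byte 16: %#x\n",
                 installed->data[0], installed->data[16]);

          free(installed);
          return 0;
      }
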
  14. 21 Aug 2015, 1 commit
  15. 22 Jul 2015, 2 commits
  16. 08 Feb 2015, 1 commit
    • openvswitch: Initialize unmasked key and uid len · ca539345
      Pravin B Shelar authored
      Flow alloc needs to initialize the unmasked-key pointer; otherwise the
      kernel can crash while trying to free a random unmasked-key pointer.
      A simplified illustration follows this entry.
      
      general protection fault: 0000 [#1] SMP
      3.19.0-rc6-net-next+ #457
      Hardware name: Supermicro X7DWU/X7DWU, BIOS  1.1 04/30/2008
      RIP: 0010:[<ffffffff8111df0e>] [<ffffffff8111df0e>] kfree+0xac/0x196
      Call Trace:
       [<ffffffffa060bd87>] flow_free+0x21/0x59 [openvswitch]
       [<ffffffffa060bde0>] ovs_flow_free+0x21/0x23 [openvswitch]
       [<ffffffffa0605b4a>] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch]
       [<ffffffffa0605995>] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch]
       [<ffffffff811fe6fb>] ? nla_parse+0x4f/0xec
       [<ffffffff8139a2fc>] genl_family_rcv_msg+0x26d/0x2c9
       [<ffffffff8107620f>] ? __lock_acquire+0x90e/0x9aa
       [<ffffffff8139a3be>] genl_rcv_msg+0x66/0x89
       [<ffffffff8139a358>] ? genl_family_rcv_msg+0x2c9/0x2c9
       [<ffffffff81399591>] netlink_rcv_skb+0x3e/0x95
       [<ffffffff81399898>] ? genl_rcv+0x18/0x37
       [<ffffffff813998a7>] genl_rcv+0x27/0x37
       [<ffffffff81399033>] netlink_unicast+0x103/0x191
       [<ffffffff81399382>] netlink_sendmsg+0x2c1/0x310
       [<ffffffff811007ad>] ? might_fault+0x50/0xa0
       [<ffffffff8135c773>] do_sock_sendmsg+0x5f/0x7a
       [<ffffffff8135c799>] sock_sendmsg+0xb/0xd
       [<ffffffff8135cacf>] ___sys_sendmsg+0x1a3/0x218
       [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86
       [<ffffffff8115a9d0>] ? fsnotify+0x32c/0x348
       [<ffffffff8115a720>] ? fsnotify+0x7c/0x348
       [<ffffffff8113e5f5>] ? __fget+0xaa/0xbf
       [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86
       [<ffffffff8135cccd>] __sys_sendmsg+0x3d/0x5e
       [<ffffffff8135cd02>] SyS_sendmsg+0x14/0x16
       [<ffffffff81411852>] system_call_fastpath+0x12/0x17
      
      Fixes: 74ed7ab9 ("openvswitch: Add support for unique flow IDs.")
      CC: Joe Stringer <joestringer@nicira.com>
      Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca539345
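
      A simplified userspace illustration (the field and function names
      follow the OVS ones, but this is not the kernel code): if the
      allocation path leaves the unmasked-key pointer uninitialized, the
      free path ends up handing a random pointer to the allocator, which is
      exactly the general protection fault shown above.

      #include <stdlib.h>

      struct sw_flow_id {
          void *unmasked_key;       /* must start out NULL */
      };

      struct sw_flow {
          struct sw_flow_id id;
          /* ... other fields ... */
      };

      static struct sw_flow *flow_alloc(void)
      {
          struct sw_flow *flow = malloc(sizeof(*flow));

          if (!flow)
              return NULL;
          /* The fix: initialize the pointer so a flow freed before a key is
           * attached does not free garbage. */
          flow->id.unmasked_key = NULL;
          return flow;
      }

      static void flow_free(struct sw_flow *flow)
      {
          /* free(NULL) is a no-op; freeing an uninitialized pointer is not. */
          free(flow->id.unmasked_key);
          free(flow);
      }

      int main(void)
      {
          struct sw_flow *flow = flow_alloc();

          if (flow)
              flow_free(flow);
          return 0;
      }
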
  17. 27 Jan 2015, 3 commits
  18. 11 Dec 2014, 1 commit
    • net: replace remaining users of arch_fast_hash with jhash · 87545899
      Daniel Borkmann authored
      This patch effectively reverts commit 500f8087 ("net: ovs: use CRC32
      accelerated flow hash if available"), and other remaining arch_fast_hash()
      users such as from nfsd via commit 6282cd56 ("NFSD: Don't hand out
      delegations for 30 seconds after recalling them.") where it has been used
      as a hash function for bloom filtering.
      
      While we think that these users are actually not much of a concern, it
      has been requested to remove the arch_fast_hash() library bits that
      arose from [1] entirely, as per the recent discussion in [2]. The main
      argument is that using it as a hash may introduce bias due to its
      linearity (see the avalanche criterion), which makes it less clear
      (though we tried to document this) when the security/performance
      trade-off is actually acceptable for a general-purpose library function.
      
      Let's therefore avoid any further confusion on this matter and remove
      it, to prevent future accidental misuse. For the time being, this makes
      hashing of flow keys a bit more expensive in the OVS case, but future
      work could re-evaluate a different hashing discipline. A toy sketch of
      the resulting hashing approach follows this entry.
      
        [1] https://patchwork.ozlabs.org/patch/299369/
        [2] https://patchwork.ozlabs.org/patch/418756/
      
      Cc: Neil Brown <neilb@suse.de>
      Cc: Francesco Fusco <fusco@ntop.org>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      87545899
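
      A toy userspace sketch of the call-site change. jenkins_oaat() below
      is a small stand-in for the kernel's jhash(); the point is only that
      the flow-key bytes are fed through a general-purpose hash with good
      avalanche behaviour instead of a CRC32-based one.

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      /* Jenkins one-at-a-time hash, used here as a stand-in for jhash(). */
      static uint32_t jenkins_oaat(const void *key, size_t len, uint32_t seed)
      {
          const unsigned char *p = key;
          uint32_t h = seed;

          for (size_t i = 0; i < len; i++) {
              h += p[i];
              h += h << 10;
              h ^= h >> 6;
          }
          h += h << 3;
          h ^= h >> 11;
          h += h << 15;
          return h;
      }

      struct flow_key {
          unsigned char data[64];
      };

      /* Hash only the range of the key covered by the mask, as OVS does. */
      static uint32_t flow_hash(const struct flow_key *key, size_t key_start,
                                size_t key_end, uint32_t basis)
      {
          return jenkins_oaat(key->data + key_start, key_end - key_start, basis);
      }

      int main(void)
      {
          struct flow_key key;

          memset(key.data, 0x42, sizeof(key.data));
          printf("hash = %#x\n", flow_hash(&key, 0, 40, 0));
          return 0;
      }
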
  19. 10 Nov 2014, 1 commit
  20. 06 Nov 2014, 1 commit
  21. 01 Jul 2014, 1 commit
  22. 23 May 2014, 2 commits
  23. 17 May 2014, 3 commits
    • openvswitch: Per NUMA node flow stats. · 63e7959c
      Jarno Rajahalme authored
      Keep kernel flow stats for each NUMA node rather than each (logical)
      CPU.  This avoids using the per-CPU allocator, removes most of the
      kernel-side OVS locking overhead that otherwise sits at the top of
      perf reports, and allows OVS to scale better with a higher number of
      threads.
      
      With 9 handlers and 4 revalidators, the netperf TCP_CRR flow setup
      rate doubles on a server with two hyper-threaded physical CPUs (16
      logical cores each) compared to the current OVS master.  Tested with
      a non-trivial flow table containing a TCP port match rule that forces
      all new connections with unique port numbers to OVS userspace.  The
      IP addresses are still wildcarded, so the kernel flows are not
      considered exact-match 5-tuple flows.  Flows of this type can be
      expected to appear in large numbers as a result of the more effective
      wildcarding made possible by improvements in the OVS userspace flow
      classifier.
      
      Perf results for this test (master):
      
      Events: 305K cycles
      +   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
      +   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
      +   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
      +   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      And after this patch:
      
      Events: 356K cycles
      +   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
      +   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
      +   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
      +   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
      +   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      There is a small increase in kernel spinlock overhead due to the same
      spinlock being shared between multiple cores of the same physical CPU,
      but that is barely visible in the netperf TCP_CRR test performance
      (maybe a ~1% performance drop, hard to tell exactly due to variance
      in the test results) when testing kernel-module throughput (no
      userspace activity, a handful of kernel flows).
      
      On flow setup, a single stats instance is allocated (for the NUMA node
      0).  As CPUs from multiple NUMA nodes start updating stats, new
      NUMA-node specific stats instances are allocated.  This allocation on
      the packet processing code path is made to never block or look for
      emergency memory pools, minimizing the allocation latency.  If the
      allocation fails, the existing preallocated stats instance is used.
      Also, if only CPUs from one NUMA node are updating the preallocated
      stats instance, no additional stats instances are allocated.  This
      eliminates the need to preallocate stats instances that will not be
      used, and also relieves the stats reader from the burden of reading
      stats that are never updated. A simplified sketch of this lazy
      allocation follows this entry.
      Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      63e7959c
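
      A userspace sketch of the lazy per-NUMA-node allocation described
      above (the kernel version allocates with flags that never block and
      protects each instance with a spinlock; the node count and names here
      are made up):

      #include <stdio.h>
      #include <stdlib.h>

      #define MAX_NODES 4               /* hypothetical NUMA node count */

      struct flow_stats {
          unsigned long packets;
          unsigned long bytes;
      };

      struct sw_flow {
          struct flow_stats *stats[MAX_NODES];   /* stats[0] is preallocated */
      };

      static void stats_update(struct sw_flow *flow, int node, unsigned long len)
      {
          struct flow_stats *s = flow->stats[node];

          if (!s) {
              /* First update from this node: try a non-blocking allocation. */
              s = calloc(1, sizeof(*s));
              if (s)
                  flow->stats[node] = s;
              else
                  s = flow->stats[0];   /* fall back to the preallocated instance */
          }
          s->packets++;
          s->bytes += len;
      }

      int main(void)
      {
          struct sw_flow flow = { { NULL } };

          flow.stats[0] = calloc(1, sizeof(struct flow_stats));
          if (!flow.stats[0])
              return 1;

          stats_update(&flow, 0, 60);
          stats_update(&flow, 2, 1500);   /* node 2 gets its instance lazily */

          for (int n = 0; n < MAX_NODES; n++) {
              if (flow.stats[n])
                  printf("node %d: %lu packets\n", n, flow.stats[n]->packets);
              free(flow.stats[n]);
          }
          return 0;
      }
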
    • openvswitch: Remove 5-tuple optimization. · 23dabf88
      Jarno Rajahalme authored
      The 5-tuple optimization becomes unnecessary with a later per-NUMA
      node stats patch.  Remove it first to make the changes easier to
      grasp.
      Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      23dabf88
    • openvswitch: use const in some local vars and casts · 7085130b
      Daniele Di Proietto authored
      In a few functions, const formal parameters are assigned or cast to
      non-const pointers. These changes suppress the warnings produced when
      compiling with -Wcast-qual. A tiny illustration follows this entry.
      Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      7085130b
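
      A tiny illustration of the class of warning being silenced, assuming a
      build with gcc -Wcast-qual: the cast in bad() discards the const
      qualifier and is flagged, while good() keeps the local pointer const.

      #include <stdio.h>

      static void bad(const int *p)
      {
          int *q = (int *)p;    /* warning: cast discards 'const' qualifier */
          printf("%d\n", *q);
      }

      static void good(const int *p)
      {
          const int *q = p;     /* no qualifier is dropped */
          printf("%d\n", *q);
      }

      int main(void)
      {
          const int x = 7;

          bad(&x);
          good(&x);
          return 0;
      }
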
  24. 05 Feb 2014, 2 commits