1. 04 Aug 2020, 2 commits
  2. 18 Jul 2020, 1 commit
    • net: openvswitch: reorder masks array based on usage · eac87c41
      Eelco Chaudron authored
      This patch reorders the masks array every 4 seconds based on each
      mask's usage count. This greatly reduces the number of masks hit per
      packet and hence improves overall performance, especially in the
      OVS/OVN case for OpenShift.
      
      Here are some results from the OVS/OVN OpenShift test, which uses
      8 pods, each with 512 uperf connections; each connection sends a
      64-byte request and receives a 1024-byte response (TCP). All uperf
      clients run on one worker node while all uperf servers run on the
      other worker node.
      
      Kernel without this patch     :  7.71 Gbps
      Kernel with this patch applied: 14.52 Gbps
      
      We also ran tests to verify that the rebalancing activity does not
      lower the flow insertion rate; it does not. A simplified sketch of
      the reordering idea follows this entry.
      Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
      Tested-by: Andrew Theurer <atheurer@redhat.com>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eac87c41
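
      A minimal userspace sketch of the reordering idea (not the kernel
      implementation; the struct and function names here are hypothetical,
      and the real patch keeps per-mask usage counters updated from the
      datapath and rebalances from a periodic work item):

      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct mask_entry {
          int id;             /* hypothetical mask identifier */
          uint64_t usage;     /* packets that matched through this mask */
      };

      static int cmp_usage_desc(const void *a, const void *b)
      {
          const struct mask_entry *ma = a, *mb = b;

          if (ma->usage == mb->usage)
              return 0;
          return ma->usage < mb->usage ? 1 : -1;
      }

      /* Called periodically (every 4 seconds in the commit above): sort the
       * masks so the most frequently hit ones are probed first on lookup. */
      static void rebalance_masks(struct mask_entry *masks, size_t n)
      {
          qsort(masks, n, sizeof(masks[0]), cmp_usage_desc);
      }

      int main(void)
      {
          struct mask_entry masks[] = {
              { .id = 0, .usage = 12 },
              { .id = 1, .usage = 90321 },   /* hot mask moves to the front */
              { .id = 2, .usage = 477 },
          };

          rebalance_masks(masks, 3);
          for (size_t i = 0; i < 3; i++)
              printf("slot %zu: mask %d (%llu hits)\n", i, masks[i].id,
                     (unsigned long long)masks[i].usage);
          return 0;
      }
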
  3. 03 Apr 2020, 1 commit
  4. 19 Feb 2020, 1 commit
  5. 04 Nov 2019, 8 commits
  6. 20 Jul 2019, 1 commit
    • net: openvswitch: rename flow_stats to sw_flow_stats · aef833c5
      Pablo Neira Ayuso authored
      There is a flow_stats structure defined in include/net/flow_offload.h,
      and a follow-up patch adds #include <net/flow_offload.h> to
      net/sch_generic.h.
      
      This breaks compilation, since the OVS codebase includes net/sock.h,
      which pulls in linux/filter.h, which in turn includes net/sch_generic.h.
      
      In file included from ./include/net/sch_generic.h:18:0,
                       from ./include/linux/filter.h:25,
                       from ./include/net/sock.h:59,
                       from ./include/linux/tcp.h:19,
                       from net/openvswitch/datapath.c:24
      
      The definition in the networking core takes precedence, so rename the
      OVS flow_stats to sw_flow_stats, since this structure is contained in
      sw_flow. A minimal illustration of the name clash follows this entry.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aef833c5
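
      A minimal single-file illustration of the clash, assuming hypothetical
      field layouts: if both structures had kept the name flow_stats and
      ended up in one translation unit through the include chain quoted
      above, the compiler would reject the second definition as a
      redefinition, which is what the rename avoids.

      #include <stdint.h>
      #include <stdio.h>

      /* Stand-in for the core definition in include/net/flow_offload.h. */
      struct flow_stats {
          uint64_t pkts;
          uint64_t bytes;
      };

      /* OVS-private statistics, renamed so it no longer collides. */
      struct sw_flow_stats {
          uint64_t packet_count;
          uint64_t byte_count;
      };

      int main(void)
      {
          struct flow_stats core = { .pkts = 1, .bytes = 64 };
          struct sw_flow_stats ovs = { .packet_count = 1, .byte_count = 64 };

          printf("core: %llu pkts, ovs: %llu pkts\n",
                 (unsigned long long)core.pkts,
                 (unsigned long long)ovs.packet_count);
          return 0;
      }
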
  7. 05 Jun 2019, 1 commit
  8. 13 Mar 2019, 1 commit
  9. 20 Jul 2017, 1 commit
    • openvswitch: Optimize operations for OvS flow_stats. · c4b2bf6b
      Tonghao Zhang authored
      When calling flow_free() to free a flow, cpumask_next() is invoked
      many times (once per CPU in cpu_possible_mask, e.g. 128 by default).
      This eats CPU time when flow_free() is called frequently, for example
      when all packets are sent to userspace via upcall and OvS sends them
      back via netlink to ovs_packet_cmd_execute(), which calls flow_free().
      
      The test topology is shown below. VM01 sends TCP packets to VM02,
      and OvS forwards the packets. During the test, we use perf to report
      system performance.
      
      VM01 --- OvS-VM --- VM02
      
      Without this patch, perf-top shows the following; flow_free()
      accounts for 3.02% of CPU usage.
      
      	4.23%  [kernel]            [k] _raw_spin_unlock_irqrestore
      	3.62%  [kernel]            [k] __do_softirq
      	3.16%  [kernel]            [k] __memcpy
      	3.02%  [kernel]            [k] flow_free
      	2.42%  libc-2.17.so        [.] __memcpy_ssse3_back
      	2.18%  [kernel]            [k] copy_user_generic_unrolled
      	2.17%  [kernel]            [k] find_next_bit
      
      With this patch applied, perf-top shows the following; flow_free()
      no longer appears on the list.
      
      	4.11%  [kernel]            [k] _raw_spin_unlock_irqrestore
      	3.79%  [kernel]            [k] __do_softirq
      	3.46%  [kernel]            [k] __memcpy
      	2.73%  libc-2.17.so        [.] __memcpy_ssse3_back
      	2.25%  [kernel]            [k] copy_user_generic_unrolled
      	1.89%  libc-2.17.so        [.] _int_malloc
      	1.53%  ovs-vswitchd        [.] xlate_actions
      
      With this patch, the TCP throughput between VMs (without the Megaflow
      Cache + Microflow Cache) rises from 1.18 Gbps to 1.30 Gbps (roughly a
      10% performance improvement).
      
      This patch adds a cpumask, cpu_used_mask, which records the CPUs that
      have used the flow. Only the flow_stats on those CPUs are checked, so
      it is unnecessary to walk all possible CPUs when getting, clearing,
      and updating the flow_stats. Adding cpu_used_mask to struct sw_flow
      does not increase the number of cachelines. A simplified sketch of
      the idea follows this entry.
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4b2bf6b
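
      A userspace sketch of the idea (the kernel code uses struct cpumask
      and cpumask_next(); the names and the fixed-size bitmap here are
      simplifications):

      #include <stdint.h>
      #include <stdio.h>

      #define MAX_CPUS 64              /* hypothetical; stands in for nr_cpu_ids */

      struct flow_stats {
          uint64_t packets;
          uint64_t bytes;
      };

      struct flow {
          uint64_t cpu_used_mask;              /* bit i set => CPU i used the flow */
          struct flow_stats stats[MAX_CPUS];   /* per-CPU statistics */
      };

      static void flow_stats_update(struct flow *f, int cpu, uint64_t len)
      {
          f->cpu_used_mask |= 1ULL << cpu;     /* remember this CPU used the flow */
          f->stats[cpu].packets++;
          f->stats[cpu].bytes += len;
      }

      static struct flow_stats flow_stats_get(const struct flow *f)
      {
          struct flow_stats total = { 0, 0 };

          /* Walk only the CPUs marked in cpu_used_mask, not every possible CPU. */
          for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
              if (!(f->cpu_used_mask & (1ULL << cpu)))
                  continue;
              total.packets += f->stats[cpu].packets;
              total.bytes += f->stats[cpu].bytes;
          }
          return total;
      }

      int main(void)
      {
          struct flow f = { 0 };
          struct flow_stats t;

          flow_stats_update(&f, 3, 1500);
          flow_stats_update(&f, 3, 60);

          t = flow_stats_get(&f);
          printf("%llu packets, %llu bytes\n",
                 (unsigned long long)t.packets, (unsigned long long)t.bytes);
          return 0;
      }
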
  10. 19 Sep 2016, 2 commits
  11. 07 Oct 2015, 1 commit
  12. 05 Oct 2015, 1 commit
  13. 23 Sep 2015, 1 commit
    • openvswitch: Zero flows on allocation. · ae5f2fb1
      Jesse Gross authored
      When support for megaflows was introduced, OVS needed to start
      installing flows with a mask applied to them. Since masking is an
      expensive operation, OVS also had an optimization that would only
      take the parts of the flow keys that were covered by a non-zero
      mask. The values stored in the remaining pieces should not matter
      because they are masked out.
      
      While this works fine for the purposes of matching (which must always
      look at the mask), serialization to netlink can be problematic. Since
      the flow and the mask are serialized separately, the uninitialized
      portions of the flow can be encoded with whatever values happen to be
      present.
      
      In terms of functionality, this has little effect since these fields
      will be masked out by definition. However, it leaks kernel memory to
      userspace, which is a potential security vulnerability. It is also
      possible that other code paths could look at the masked key and get
      uninitialized data, although this does not currently appear to be an
      issue in practice.
      
      This removes the mask optimization for flows that are being installed.
      This was always intended to be the case, as the mask optimization was
      really targeting per-packet flow operations. A simplified illustration
      follows this entry.
      
      Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae5f2fb1
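
      A userspace sketch of the problem and the fix, with a much smaller key
      than the real sw_flow_key: the masked copy only writes the byte range
      covered by the mask, so without zero-initialization the rest of the
      installed key would hold whatever the allocator left behind and could
      later be serialized to userspace.

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      #define KEY_LEN 32

      struct flow_key {
          unsigned char data[KEY_LEN];
      };

      /* Copy only the [start, end) range selected by the mask, as the
       * per-packet fast path does. */
      static void masked_copy(struct flow_key *dst, const struct flow_key *src,
                              const struct flow_key *mask, size_t start, size_t end)
      {
          for (size_t i = start; i < end; i++)
              dst->data[i] = src->data[i] & mask->data[i];
      }

      int main(void)
      {
          /* The fix: zero the whole key at allocation time (calloc here,
           * kzalloc-style zeroing in the kernel), so bytes outside the
           * masked range are deterministically 0. */
          struct flow_key *installed = calloc(1, sizeof(*installed));
          struct flow_key pkt, mask;

          if (!installed)
              return 1;
          memset(&pkt, 0xab, sizeof(pkt));
          memset(&mask, 0, sizeof(mask));
          memset(mask.data, 0xff, 8);       /* mask covers only the first 8 bytes */

          masked_copy(installed, &pkt, &mask, 0, 8);
          printf("byte 0: %#x, byte 16: %#x\n",
                 installed->data[0], installed->data[16]);

          free(installed);
          return 0;
      }
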
  14. 21 Aug 2015, 1 commit
  15. 22 Jul 2015, 2 commits
  16. 08 Feb 2015, 1 commit
    • openvswitch: Initialize unmasked key and uid len · ca539345
      Pravin B Shelar authored
      Flow alloc needs to initialize the unmasked-key pointer; otherwise the
      kernel can crash while trying to free a random unmasked-key pointer.
      A simplified illustration follows this entry.
      
      general protection fault: 0000 [#1] SMP
      3.19.0-rc6-net-next+ #457
      Hardware name: Supermicro X7DWU/X7DWU, BIOS  1.1 04/30/2008
      RIP: 0010:[<ffffffff8111df0e>] [<ffffffff8111df0e>] kfree+0xac/0x196
      Call Trace:
       [<ffffffffa060bd87>] flow_free+0x21/0x59 [openvswitch]
       [<ffffffffa060bde0>] ovs_flow_free+0x21/0x23 [openvswitch]
       [<ffffffffa0605b4a>] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch]
       [<ffffffffa0605995>] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch]
       [<ffffffff811fe6fb>] ? nla_parse+0x4f/0xec
       [<ffffffff8139a2fc>] genl_family_rcv_msg+0x26d/0x2c9
       [<ffffffff8107620f>] ? __lock_acquire+0x90e/0x9aa
       [<ffffffff8139a3be>] genl_rcv_msg+0x66/0x89
       [<ffffffff8139a358>] ? genl_family_rcv_msg+0x2c9/0x2c9
       [<ffffffff81399591>] netlink_rcv_skb+0x3e/0x95
       [<ffffffff81399898>] ? genl_rcv+0x18/0x37
       [<ffffffff813998a7>] genl_rcv+0x27/0x37
       [<ffffffff81399033>] netlink_unicast+0x103/0x191
       [<ffffffff81399382>] netlink_sendmsg+0x2c1/0x310
       [<ffffffff811007ad>] ? might_fault+0x50/0xa0
       [<ffffffff8135c773>] do_sock_sendmsg+0x5f/0x7a
       [<ffffffff8135c799>] sock_sendmsg+0xb/0xd
       [<ffffffff8135cacf>] ___sys_sendmsg+0x1a3/0x218
       [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86
       [<ffffffff8115a9d0>] ? fsnotify+0x32c/0x348
       [<ffffffff8115a720>] ? fsnotify+0x7c/0x348
       [<ffffffff8113e5f5>] ? __fget+0xaa/0xbf
       [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86
       [<ffffffff8135cccd>] __sys_sendmsg+0x3d/0x5e
       [<ffffffff8135cd02>] SyS_sendmsg+0x14/0x16
       [<ffffffff81411852>] system_call_fastpath+0x12/0x17
      
      Fixes: 74ed7ab9 ("openvswitch: Add support for unique flow IDs.")
      CC: Joe Stringer <joestringer@nicira.com>
      Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca539345
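
      A simplified userspace illustration (the field and function names
      follow the OVS ones, but this is not the kernel code): if the
      allocation path leaves the unmasked-key pointer uninitialized, the
      free path ends up handing a random pointer to the allocator, which is
      exactly the general protection fault shown above.

      #include <stdlib.h>

      struct sw_flow_id {
          void *unmasked_key;       /* must start out NULL */
      };

      struct sw_flow {
          struct sw_flow_id id;
          /* ... other fields ... */
      };

      static struct sw_flow *flow_alloc(void)
      {
          struct sw_flow *flow = malloc(sizeof(*flow));

          if (!flow)
              return NULL;
          /* The fix: initialize the pointer so a flow freed before a key is
           * attached does not free garbage. */
          flow->id.unmasked_key = NULL;
          return flow;
      }

      static void flow_free(struct sw_flow *flow)
      {
          /* free(NULL) is a no-op; freeing an uninitialized pointer is not. */
          free(flow->id.unmasked_key);
          free(flow);
      }

      int main(void)
      {
          struct sw_flow *flow = flow_alloc();

          if (flow)
              flow_free(flow);
          return 0;
      }
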
  17. 27 Jan 2015, 3 commits
  18. 11 Dec 2014, 1 commit
    • net: replace remaining users of arch_fast_hash with jhash · 87545899
      Daniel Borkmann authored
      This patch effectively reverts commit 500f8087 ("net: ovs: use CRC32
      accelerated flow hash if available"), and other remaining arch_fast_hash()
      users such as from nfsd via commit 6282cd56 ("NFSD: Don't hand out
      delegations for 30 seconds after recalling them.") where it has been used
      as a hash function for bloom filtering.
      
      While we think that these users are actually not much of a concern, it
      has been requested to remove the arch_fast_hash() library bits that
      arose from [1] entirely, as per the recent discussion in [2]. The main
      argument is that using it as a hash may introduce bias due to its
      linearity (see the avalanche criterion), which makes it less clear
      (though we tried to document this) when the security/performance
      trade-off is actually acceptable for a general-purpose library function.
      
      Let's therefore avoid any further confusion on this matter and remove
      it, to prevent future accidental misuse. For the time being, this makes
      hashing of flow keys a bit more expensive in the OVS case, but future
      work could re-evaluate a different hashing discipline. A toy sketch of
      the resulting hashing approach follows this entry.
      
        [1] https://patchwork.ozlabs.org/patch/299369/
        [2] https://patchwork.ozlabs.org/patch/418756/
      
      Cc: Neil Brown <neilb@suse.de>
      Cc: Francesco Fusco <fusco@ntop.org>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      87545899
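
      A toy userspace sketch of the call-site change. jenkins_oaat() below
      is a small stand-in for the kernel's jhash(); the point is only that
      the flow-key bytes are fed through a general-purpose hash with good
      avalanche behaviour instead of a CRC32-based one.

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      /* Jenkins one-at-a-time hash, used here as a stand-in for jhash(). */
      static uint32_t jenkins_oaat(const void *key, size_t len, uint32_t seed)
      {
          const unsigned char *p = key;
          uint32_t h = seed;

          for (size_t i = 0; i < len; i++) {
              h += p[i];
              h += h << 10;
              h ^= h >> 6;
          }
          h += h << 3;
          h ^= h >> 11;
          h += h << 15;
          return h;
      }

      struct flow_key {
          unsigned char data[64];
      };

      /* Hash only the range of the key covered by the mask, as OVS does. */
      static uint32_t flow_hash(const struct flow_key *key, size_t key_start,
                                size_t key_end, uint32_t basis)
      {
          return jenkins_oaat(key->data + key_start, key_end - key_start, basis);
      }

      int main(void)
      {
          struct flow_key key;

          memset(key.data, 0x42, sizeof(key.data));
          printf("hash = %#x\n", flow_hash(&key, 0, 40, 0));
          return 0;
      }
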
  19. 10 Nov 2014, 1 commit
  20. 06 Nov 2014, 1 commit
  21. 01 Jul 2014, 1 commit
  22. 23 May 2014, 2 commits
  23. 17 May 2014, 3 commits
    • openvswitch: Per NUMA node flow stats. · 63e7959c
      Jarno Rajahalme authored
      Keep kernel flow stats for each NUMA node rather than each (logical)
      CPU.  This avoids using the per-CPU allocator, removes most of the
      kernel-side OVS locking overhead that otherwise sits at the top of
      perf reports, and allows OVS to scale better with a higher number of
      threads.
      
      With 9 handlers and 4 revalidators, the netperf TCP_CRR flow setup
      rate doubles on a server with two hyper-threaded physical CPUs (16
      logical cores each) compared to the current OVS master.  Tested with
      a non-trivial flow table containing a TCP port match rule that forces
      all new connections with unique port numbers to OVS userspace.  The
      IP addresses are still wildcarded, so the kernel flows are not
      considered exact-match 5-tuple flows.  Flows of this type can be
      expected to appear in large numbers as a result of the more effective
      wildcarding made possible by improvements in the OVS userspace flow
      classifier.
      
      Perf results for this test (master):
      
      Events: 305K cycles
      +   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
      +   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
      +   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
      +   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      And after this patch:
      
      Events: 356K cycles
      +   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
      +   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
      +   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
      +   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
      +   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      There is a small increase in kernel spinlock overhead due to the same
      spinlock being shared between multiple cores of the same physical CPU,
      but that is barely visible in the netperf TCP_CRR test performance
      (maybe a ~1% performance drop, hard to tell exactly due to variance
      in the test results) when testing kernel-module throughput (no
      userspace activity, a handful of kernel flows).
      
      On flow setup, a single stats instance is allocated (for the NUMA node
      0).  As CPUs from multiple NUMA nodes start updating stats, new
      NUMA-node specific stats instances are allocated.  This allocation on
      the packet processing code path is made to never block or look for
      emergency memory pools, minimizing the allocation latency.  If the
      allocation fails, the existing preallocated stats instance is used.
      Also, if only CPUs from one NUMA node are updating the preallocated
      stats instance, no additional stats instances are allocated.  This
      eliminates the need to preallocate stats instances that will not be
      used, and also relieves the stats reader from the burden of reading
      stats that are never updated. A simplified sketch of this lazy
      allocation follows this entry.
      Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      63e7959c
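
      A userspace sketch of the lazy per-NUMA-node allocation described
      above (the kernel version allocates with flags that never block and
      protects each instance with a spinlock; the node count and names here
      are made up):

      #include <stdio.h>
      #include <stdlib.h>

      #define MAX_NODES 4               /* hypothetical NUMA node count */

      struct flow_stats {
          unsigned long packets;
          unsigned long bytes;
      };

      struct sw_flow {
          struct flow_stats *stats[MAX_NODES];   /* stats[0] is preallocated */
      };

      static void stats_update(struct sw_flow *flow, int node, unsigned long len)
      {
          struct flow_stats *s = flow->stats[node];

          if (!s) {
              /* First update from this node: try a non-blocking allocation. */
              s = calloc(1, sizeof(*s));
              if (s)
                  flow->stats[node] = s;
              else
                  s = flow->stats[0];   /* fall back to the preallocated instance */
          }
          s->packets++;
          s->bytes += len;
      }

      int main(void)
      {
          struct sw_flow flow = { { NULL } };

          flow.stats[0] = calloc(1, sizeof(struct flow_stats));
          if (!flow.stats[0])
              return 1;

          stats_update(&flow, 0, 60);
          stats_update(&flow, 2, 1500);   /* node 2 gets its instance lazily */

          for (int n = 0; n < MAX_NODES; n++) {
              if (flow.stats[n])
                  printf("node %d: %lu packets\n", n, flow.stats[n]->packets);
              free(flow.stats[n]);
          }
          return 0;
      }
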
    • openvswitch: Remove 5-tuple optimization. · 23dabf88
      Jarno Rajahalme authored
      The 5-tuple optimization becomes unnecessary with a later per-NUMA
      node stats patch.  Remove it first to make the changes easier to
      grasp.
      Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      23dabf88
    • openvswitch: use const in some local vars and casts · 7085130b
      Daniele Di Proietto authored
      In a few functions, const formal parameters are assigned or cast to
      non-const pointers. These changes suppress the warnings produced when
      compiling with -Wcast-qual. A tiny illustration follows this entry.
      Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      7085130b
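
      A tiny illustration of the class of warning being silenced, assuming a
      build with gcc -Wcast-qual: the cast in bad() discards the const
      qualifier and is flagged, while good() keeps the local pointer const.

      #include <stdio.h>

      static void bad(const int *p)
      {
          int *q = (int *)p;    /* warning: cast discards 'const' qualifier */
          printf("%d\n", *q);
      }

      static void good(const int *p)
      {
          const int *q = p;     /* no qualifier is dropped */
          printf("%d\n", *q);
      }

      int main(void)
      {
          const int x = 7;

          bad(&x);
          good(&x);
          return 0;
      }
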
  24. 05 Feb 2014, 2 commits