1. 03 1月, 2015 1 次提交
  2. 10 11月, 2014 2 次提交
  3. 06 11月, 2014 1 次提交
  4. 18 10月, 2014 2 次提交
  5. 06 10月, 2014 3 次提交
  6. 16 9月, 2014 3 次提交
  7. 23 8月, 2014 1 次提交
  8. 30 6月, 2014 1 次提交
  9. 23 5月, 2014 3 次提交
  10. 17 5月, 2014 4 次提交
    • J
      openvswitch: Use TCP flags in the flow key for stats. · 88d73f6c
      Jarno Rajahalme 提交于
      We already extract the TCP flags for the key, might as well use that
      for stats.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      88d73f6c
    • J
      openvswitch: Per NUMA node flow stats. · 63e7959c
      Jarno Rajahalme 提交于
      Keep kernel flow stats for each NUMA node rather than each (logical)
      CPU.  This avoids using the per-CPU allocator and removes most of the
      kernel-side OVS locking overhead otherwise on the top of perf reports
      and allows OVS to scale better with higher number of threads.
      
      With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
      rate doubles on a server with two hyper-threaded physical CPUs (16
      logical cores each) compared to the current OVS master.  Tested with
      non-trivial flow table with a TCP port match rule forcing all new
      connections with unique port numbers to OVS userspace.  The IP
      addresses are still wildcarded, so the kernel flows are not considered
      as exact match 5-tuple flows.  This type of flows can be expected to
      appear in large numbers as the result of more effective wildcarding
      made possible by improvements in OVS userspace flow classifier.
      
      Perf results for this test (master):
      
      Events: 305K cycles
      +   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
      +   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
      +   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
      +   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      And after this patch:
      
      Events: 356K cycles
      +   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
      +   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
      +   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
      +   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
      +   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      There is a small increase in kernel spinlock overhead due to the same
      spinlock being shared between multiple cores of the same physical CPU,
      but that is barely visible in the netperf TCP_CRR test performance
      (maybe ~1% performance drop, hard to tell exactly due to variance in
      the test results), when testing for kernel module throughput (with no
      userspace activity, handful of kernel flows).
      
      On flow setup, a single stats instance is allocated (for the NUMA node
      0).  As CPUs from multiple NUMA nodes start updating stats, new
      NUMA-node specific stats instances are allocated.  This allocation on
      the packet processing code path is made to never block or look for
      emergency memory pools, minimizing the allocation latency.  If the
      allocation fails, the existing preallocated stats instance is used.
      Also, if only CPUs from one NUMA-node are updating the preallocated
      stats instance, no additional stats instances are allocated.  This
      eliminates the need to pre-allocate stats instances that will not be
      used, also relieving the stats reader from the burden of reading stats
      that are never used.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      63e7959c
    • J
      openvswitch: Remove 5-tuple optimization. · 23dabf88
      Jarno Rajahalme 提交于
      The 5-tuple optimization becomes unnecessary with a later per-NUMA
      node stats patch.  Remove it first to make the changes easier to
      grasp.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      23dabf88
    • J
      openvswitch: Use ether_addr_copy · 8c63ff09
      Joe Perches 提交于
      It's slightly smaller/faster for some architectures.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      8c63ff09
  11. 29 3月, 2014 1 次提交
    • F
      openvswitch: fix a possible deadlock and lockdep warning · 4f647e0a
      Flavio Leitner 提交于
      There are two problematic situations.
      
      A deadlock can happen when is_percpu is false because it can get
      interrupted while holding the spinlock. Then it executes
      ovs_flow_stats_update() in softirq context which tries to get
      the same lock.
      
      The second sitation is that when is_percpu is true, the code
      correctly disables BH but only for the local CPU, so the
      following can happen when locking the remote CPU without
      disabling BH:
      
             CPU#0                            CPU#1
        ovs_flow_stats_get()
         stats_read()
       +->spin_lock remote CPU#1        ovs_flow_stats_get()
       |  <interrupted>                  stats_read()
       |  ...                       +-->  spin_lock remote CPU#0
       |                            |     <interrupted>
       |  ovs_flow_stats_update()   |     ...
       |   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
       +---------------------------------- spin_lock local CPU#1
      
      This patch disables BH for both cases fixing the deadlocks.
      Acked-by: NJesse Gross <jesse@nicira.com>
      
      =================================
      [ INFO: inconsistent lock state ]
      3.14.0-rc8-00007-g632b06aa #1 Tainted: G          I
      ---------------------------------
      inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
      (&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
      {SOFTIRQ-ON-W} state was registered at:
      [<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
      [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
      [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
      [<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
      [<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
      [<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
      [<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
      [<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
      [<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
      [<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
      [<ffffffff816c0798>] genl_rcv+0x28/0x40
      [<ffffffff816bf830>] netlink_unicast+0x100/0x1e0
      [<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
      [<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
      [<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
      [<ffffffff8166a911>] __sys_sendmsg+0x51/0x90
      [<ffffffff8166a962>] SyS_sendmsg+0x12/0x20
      [<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
      irq event stamp: 1740726
      hardirqs last  enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
      hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
      softirqs last  enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50
      softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&cpu_stats->lock)->rlock);
        <Interrupt>
          lock(&(&cpu_stats->lock)->rlock);
      
       *** DEADLOCK ***
      
      5 locks held by swapper/0/0:
       #0:  (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320
       #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0
       #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840
       #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
       #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]
      
      stack backtrace:
      CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I  3.14.0-rc8-00007-g632b06aa #1
      Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
       0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
       ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
       ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
      Call Trace:
       <IRQ>  [<ffffffff817cfe3c>] dump_stack+0x4d/0x66
       [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
       [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
       [<ffffffff810f8963>] mark_lock+0x223/0x2b0
       [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
       [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80
       [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
       [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
       [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
       [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
       [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
       [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
       [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
       [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
       [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
       [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
       [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
       [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
       [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
       [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
       [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
       [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20
       [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840
       [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
       [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220
       [<ffffffff8176145f>] ip6_output+0x4f/0x1f0
       [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
       [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
       [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
       [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
       [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
       [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
       [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
       [<ffffffff810a71e9>] call_timer_fn+0x99/0x320
       [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320
       [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
       [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
       [<ffffffff8109d47d>] __do_softirq+0x12d/0x480
      Signed-off-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f647e0a
  12. 21 3月, 2014 1 次提交
  13. 16 2月, 2014 1 次提交
  14. 07 1月, 2014 1 次提交
    • P
      openvswitch: Per cpu flow stats. · e298e505
      Pravin B Shelar 提交于
      With mega flow implementation ovs flow can be shared between
      multiple CPUs which makes stats updates highly contended
      operation. This patch uses per-CPU stats in cases where a flow
      is likely to be shared (if there is a wildcard in the 5-tuple
      and therefore likely to be spread by RSS). In other situations,
      it uses the current strategy, saving memory and allocation time.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      e298e505
  15. 02 11月, 2013 2 次提交
    • J
      openvswitch: TCP flags matching support. · 5eb26b15
      Jarno Rajahalme 提交于
          tcp_flags=flags/mask
              Bitwise  match on TCP flags.  The flags and mask are 16-bit num‐
              bers written in decimal or in hexadecimal prefixed by 0x.   Each
              1-bit  in  mask requires that the corresponding bit in port must
              match.  Each 0-bit in mask causes the corresponding  bit  to  be
              ignored.
      
              TCP  protocol  currently  defines  9 flag bits, and additional 3
              bits are reserved (must be transmitted as zero), see  RFCs  793,
              3168, and 3540.  The flag bits are, numbering from the least
              significant bit:
      
              0: FIN No more data from sender.
      
              1: SYN Synchronize sequence numbers.
      
              2: RST Reset the connection.
      
              3: PSH Push function.
      
              4: ACK Acknowledgement field significant.
      
              5: URG Urgent pointer field significant.
      
              6: ECE ECN Echo.
      
              7: CWR Congestion Windows Reduced.
      
              8: NS  Nonce Sum.
      
              9-11:  Reserved.
      
              12-15: Not matchable, must be zero.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      5eb26b15
    • J
      openvswitch: Widen TCP flags handling. · df23e9f6
      Jarno Rajahalme 提交于
      Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
      The kernel interface remains at 8 bits, which makes no functional
      difference now, as none of the higher bits is currently of interest
      to the userspace.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      df23e9f6
  16. 04 10月, 2013 1 次提交
  17. 12 9月, 2013 1 次提交
  18. 06 9月, 2013 1 次提交
  19. 28 8月, 2013 1 次提交
    • A
      openvswitch: optimize flow compare and mask functions · 5828cd9a
      Andy Zhou 提交于
      Make sure the sw_flow_key structure and valid mask boundaries are always
      machine word aligned. Optimize the flow compare and mask operations
      using machine word size operations. This patch improves throughput on
      average by 15% when CPU is the bottleneck of forwarding packets.
      
      This patch is inspired by ideas and code from a patch submitted by Peter
      Klausler titled "replace memcmp() with specialized comparator".
      However, The original patch only optimizes for architectures
      support unaligned machine word access. This patch optimizes for all
      architectures.
      Signed-off-by: NAndy Zhou <azhou@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      5828cd9a
  20. 27 8月, 2013 2 次提交
  21. 24 8月, 2013 2 次提交
  22. 15 8月, 2013 1 次提交
  23. 20 6月, 2013 3 次提交
  24. 15 6月, 2013 1 次提交