1. 23 5月, 2014 2 次提交
    • J
      openvswitch: Fix ovs_flow_stats_get/clear RCU dereference. · 86ec8dba
      Jarno Rajahalme 提交于
      For ovs_flow_stats_get() using ovsl_dereference() was wrong, since
      flow dumps call this with RCU read lock.
      
      ovs_flow_stats_clear() is always called with ovs_mutex, so can use
      ovsl_dereference().
      
      Also, make the ovs_flow_stats_get() 'flow' argument const to make
      later patches cleaner.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      86ec8dba
    • J
      openvswitch: Compact sw_flow_key. · 1139e241
      Jarno Rajahalme 提交于
      Minimize padding in sw_flow_key and move 'tp' top the main struct.
      These changes simplify code when accessing the transport port numbers
      and the tcp flags, and makes the sw_flow_key 8 bytes smaller on 64-bit
      systems (128->120 bytes).  These changes also make the keys for IPv4
      packets to fit in one cache line.
      
      There is a valid concern for safety of packing the struct
      ovs_key_ipv4_tunnel, as it would be possible to take the address of
      the tun_id member as a __be64 * which could result in unaligned access
      in some systems. However:
      
      - sw_flow_key itself is 64-bit aligned, so the tun_id within is
        always
        64-bit aligned.
      - We never make arrays of ovs_key_ipv4_tunnel (which would force
        every
        second tun_key to be misaligned).
      - We never take the address of the tun_id in to a __be64 *.
      - Whereever we use struct ovs_key_ipv4_tunnel outside the
        sw_flow_key,
        it is in stack (on tunnel input functions), where compiler has full
        control of the alignment.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      1139e241
  2. 17 5月, 2014 2 次提交
    • J
      openvswitch: Per NUMA node flow stats. · 63e7959c
      Jarno Rajahalme 提交于
      Keep kernel flow stats for each NUMA node rather than each (logical)
      CPU.  This avoids using the per-CPU allocator and removes most of the
      kernel-side OVS locking overhead otherwise on the top of perf reports
      and allows OVS to scale better with higher number of threads.
      
      With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
      rate doubles on a server with two hyper-threaded physical CPUs (16
      logical cores each) compared to the current OVS master.  Tested with
      non-trivial flow table with a TCP port match rule forcing all new
      connections with unique port numbers to OVS userspace.  The IP
      addresses are still wildcarded, so the kernel flows are not considered
      as exact match 5-tuple flows.  This type of flows can be expected to
      appear in large numbers as the result of more effective wildcarding
      made possible by improvements in OVS userspace flow classifier.
      
      Perf results for this test (master):
      
      Events: 305K cycles
      +   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
      +   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
      +   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
      +   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      And after this patch:
      
      Events: 356K cycles
      +   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
      +   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
      +   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
      +   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
      +   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      There is a small increase in kernel spinlock overhead due to the same
      spinlock being shared between multiple cores of the same physical CPU,
      but that is barely visible in the netperf TCP_CRR test performance
      (maybe ~1% performance drop, hard to tell exactly due to variance in
      the test results), when testing for kernel module throughput (with no
      userspace activity, handful of kernel flows).
      
      On flow setup, a single stats instance is allocated (for the NUMA node
      0).  As CPUs from multiple NUMA nodes start updating stats, new
      NUMA-node specific stats instances are allocated.  This allocation on
      the packet processing code path is made to never block or look for
      emergency memory pools, minimizing the allocation latency.  If the
      allocation fails, the existing preallocated stats instance is used.
      Also, if only CPUs from one NUMA-node are updating the preallocated
      stats instance, no additional stats instances are allocated.  This
      eliminates the need to pre-allocate stats instances that will not be
      used, also relieving the stats reader from the burden of reading stats
      that are never used.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      63e7959c
    • J
      openvswitch: Remove 5-tuple optimization. · 23dabf88
      Jarno Rajahalme 提交于
      The 5-tuple optimization becomes unnecessary with a later per-NUMA
      node stats patch.  Remove it first to make the changes easier to
      grasp.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      23dabf88
  3. 07 1月, 2014 2 次提交
  4. 02 11月, 2013 2 次提交
    • J
      openvswitch: TCP flags matching support. · 5eb26b15
      Jarno Rajahalme 提交于
          tcp_flags=flags/mask
              Bitwise  match on TCP flags.  The flags and mask are 16-bit num‐
              bers written in decimal or in hexadecimal prefixed by 0x.   Each
              1-bit  in  mask requires that the corresponding bit in port must
              match.  Each 0-bit in mask causes the corresponding  bit  to  be
              ignored.
      
              TCP  protocol  currently  defines  9 flag bits, and additional 3
              bits are reserved (must be transmitted as zero), see  RFCs  793,
              3168, and 3540.  The flag bits are, numbering from the least
              significant bit:
      
              0: FIN No more data from sender.
      
              1: SYN Synchronize sequence numbers.
      
              2: RST Reset the connection.
      
              3: PSH Push function.
      
              4: ACK Acknowledgement field significant.
      
              5: URG Urgent pointer field significant.
      
              6: ECE ECN Echo.
      
              7: CWR Congestion Windows Reduced.
      
              8: NS  Nonce Sum.
      
              9-11:  Reserved.
      
              12-15: Not matchable, must be zero.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      5eb26b15
    • J
      openvswitch: Widen TCP flags handling. · df23e9f6
      Jarno Rajahalme 提交于
      Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
      The kernel interface remains at 8 bits, which makes no functional
      difference now, as none of the higher bits is currently of interest
      to the userspace.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      df23e9f6
  5. 04 10月, 2013 1 次提交
  6. 06 9月, 2013 1 次提交
  7. 28 8月, 2013 1 次提交
    • A
      openvswitch: optimize flow compare and mask functions · 5828cd9a
      Andy Zhou 提交于
      Make sure the sw_flow_key structure and valid mask boundaries are always
      machine word aligned. Optimize the flow compare and mask operations
      using machine word size operations. This patch improves throughput on
      average by 15% when CPU is the bottleneck of forwarding packets.
      
      This patch is inspired by ideas and code from a patch submitted by Peter
      Klausler titled "replace memcmp() with specialized comparator".
      However, The original patch only optimizes for architectures
      support unaligned machine word access. This patch optimizes for all
      architectures.
      Signed-off-by: NAndy Zhou <azhou@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      5828cd9a
  8. 27 8月, 2013 2 次提交
  9. 24 8月, 2013 1 次提交
    • A
      openvswitch: Mega flow implementation · 03f0d916
      Andy Zhou 提交于
      Add wildcarded flow support in kernel datapath.
      
      Wildcarded flow can improve OVS flow set up performance by avoid sending
      matching new flows to the user space program. The exact performance boost
      will largely dependent on wildcarded flow hit rate.
      
      In case all new flows hits wildcard flows, the flow set up rate is
      within 5% of that of linux bridge module.
      
      Pravin has made significant contributions to this patch. Including API
      clean ups and bug fixes.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NAndy Zhou <azhou@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      03f0d916
  10. 20 6月, 2013 5 次提交
  11. 15 6月, 2013 1 次提交
  12. 30 3月, 2013 1 次提交
  13. 27 11月, 2012 1 次提交
  14. 04 9月, 2012 2 次提交
  15. 04 5月, 2012 1 次提交
  16. 04 12月, 2011 1 次提交
    • J
      net: Add Open vSwitch kernel components. · ccb1352e
      Jesse Gross 提交于
      Open vSwitch is a multilayer Ethernet switch targeted at virtualized
      environments.  In addition to supporting a variety of features
      expected in a traditional hardware switch, it enables fine-grained
      programmatic extension and flow-based control of the network.
      This control is useful in a wide variety of applications but is
      particularly important in multi-server virtualization deployments,
      which are often characterized by highly dynamic endpoints and the need
      to maintain logical abstractions for multiple tenants.
      
      The Open vSwitch datapath provides an in-kernel fast path for packet
      forwarding.  It is complemented by a userspace daemon, ovs-vswitchd,
      which is able to accept configuration from a variety of sources and
      translate it into packet processing rules.
      
      See http://openvswitch.org for more information and userspace
      utilities.
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      ccb1352e