1. 13 11月, 2016 1 次提交
    • J
      openvswitch: add mac_proto field to the flow key · 329f45bc
      Jiri Benc 提交于
      Use a hole in the structure. We support only Ethernet so far and will add
      a support for L2-less packets shortly. We could use a bool to indicate
      whether the Ethernet header is present or not but the approach with the
      mac_proto field is more generic and occupies the same number of bytes in the
      struct, while allowing later extensibility. It also makes the code in the
      next patches more self explaining.
      
      It would be nice to use ARPHRD_ constants but those are u16 which would be
      waste. Thus define our own constants.
      
      Another upside of this is that we can overload this new field to also denote
      whether the flow key is valid. This has the advantage that on
      refragmentation, we don't have to reparse the packet but can rely on the
      stored eth.type. This is especially important for the next patches in this
      series - instead of adding another branch for L2-less packets before calling
      ovs_fragment, we can just remove all those branches completely.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      329f45bc
  2. 13 10月, 2016 1 次提交
  3. 03 10月, 2016 1 次提交
  4. 21 9月, 2016 1 次提交
  5. 19 9月, 2016 2 次提交
  6. 09 9月, 2016 1 次提交
  7. 07 10月, 2015 1 次提交
  8. 01 9月, 2015 1 次提交
  9. 30 8月, 2015 3 次提交
  10. 28 8月, 2015 2 次提交
    • J
      openvswitch: Allow matching on conntrack label · c2ac6673
      Joe Stringer 提交于
      Allow matching and setting the ct_label field. As with ct_mark, this is
      populated by executing the CT action. The label field may be modified by
      specifying a label and mask nested under the CT action. It is stored as
      metadata attached to the connection. Label modification occurs after
      lookup, and will only persist when the conntrack entry is committed by
      providing the COMMIT flag to the CT action. Labels are currently fixed
      to 128 bits in size.
      Signed-off-by: NJoe Stringer <joestringer@nicira.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2ac6673
    • J
      openvswitch: Add conntrack action · 7f8a436e
      Joe Stringer 提交于
      Expose the kernel connection tracker via OVS. Userspace components can
      make use of the CT action to populate the connection state (ct_state)
      field for a flow. This state can be subsequently matched.
      
      Exposed connection states are OVS_CS_F_*:
      - NEW (0x01) - Beginning of a new connection.
      - ESTABLISHED (0x02) - Part of an existing connection.
      - RELATED (0x04) - Related to an established connection.
      - INVALID (0x20) - Could not track the connection for this packet.
      - REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
      - TRACKED (0x80) - This packet has been sent through conntrack.
      
      When the CT action is executed by itself, it will send the packet
      through the connection tracker and populate the ct_state field with one
      or more of the connection state flags above. The CT action will always
      set the TRACKED bit.
      
      When the COMMIT flag is passed to the conntrack action, this specifies
      that information about the connection should be stored. This allows
      subsequent packets for the same (or related) connections to be
      correlated with this connection. Sending subsequent packets for the
      connection through conntrack allows the connection tracker to consider
      the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.
      
      The CT action may optionally take a zone to track the flow within. This
      allows connections with the same 5-tuple to be kept logically separate
      from connections in other zones. If the zone is specified, then the
      "ct_zone" match field will be subsequently populated with the zone id.
      
      IP fragments are handled by transparently assembling them as part of the
      CT action. The maximum received unit (MRU) size is tracked so that
      refragmentation can occur during output.
      
      IP frag handling contributed by Andy Zhou.
      
      Based on original design by Justin Pettit.
      Signed-off-by: NJoe Stringer <joestringer@nicira.com>
      Signed-off-by: NJustin Pettit <jpettit@nicira.com>
      Signed-off-by: NAndy Zhou <azhou@nicira.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f8a436e
  11. 22 7月, 2015 1 次提交
  12. 06 5月, 2015 1 次提交
  13. 15 4月, 2015 1 次提交
    • D
      mm: remove GFP_THISNODE · 4167e9b2
      David Rientjes 提交于
      NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.
      
      GFP_THISNODE is a secret combination of gfp bits that have different
      behavior than expected.  It is a combination of __GFP_THISNODE,
      __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
      allocator slowpath to fail without trying reclaim even though it may be
      used in combination with __GFP_WAIT.
      
      An example of the problem this creates: commit e97ca8e5 ("mm: fix
      GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
      that really just wanted __GFP_THISNODE.  The problem doesn't end there,
      however, because even it was a no-op for alloc_misplaced_dst_page(),
      which also sets __GFP_NORETRY and __GFP_NOWARN, and
      migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
      is set in GFP_TRANSHUGE.  Converting GFP_THISNODE to __GFP_THISNODE is a
      no-op in these cases since the page allocator special-cases
      __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.
      
      It's time to just remove GFP_THISNODE entirely.  We leave __GFP_THISNODE
      to restrict an allocation to a local node, but remove GFP_THISNODE and
      its obscurity.  Instead, we require that a caller clear __GFP_WAIT if it
      wants to avoid reclaim.
      
      This allows the aforementioned functions to actually reclaim as they
      should.  It also enables any future callers that want to do
      __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim.  The
      rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.
      
      Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
      it is unchanged.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Jarno Rajahalme <jrajahalme@nicira.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4167e9b2
  14. 12 2月, 2015 1 次提交
  15. 15 1月, 2015 1 次提交
  16. 14 1月, 2015 1 次提交
  17. 03 1月, 2015 1 次提交
  18. 10 11月, 2014 2 次提交
  19. 06 11月, 2014 1 次提交
  20. 18 10月, 2014 2 次提交
  21. 06 10月, 2014 3 次提交
  22. 16 9月, 2014 3 次提交
  23. 23 8月, 2014 1 次提交
  24. 30 6月, 2014 1 次提交
  25. 23 5月, 2014 3 次提交
  26. 17 5月, 2014 3 次提交
    • J
      openvswitch: Use TCP flags in the flow key for stats. · 88d73f6c
      Jarno Rajahalme 提交于
      We already extract the TCP flags for the key, might as well use that
      for stats.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      88d73f6c
    • J
      openvswitch: Per NUMA node flow stats. · 63e7959c
      Jarno Rajahalme 提交于
      Keep kernel flow stats for each NUMA node rather than each (logical)
      CPU.  This avoids using the per-CPU allocator and removes most of the
      kernel-side OVS locking overhead otherwise on the top of perf reports
      and allows OVS to scale better with higher number of threads.
      
      With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
      rate doubles on a server with two hyper-threaded physical CPUs (16
      logical cores each) compared to the current OVS master.  Tested with
      non-trivial flow table with a TCP port match rule forcing all new
      connections with unique port numbers to OVS userspace.  The IP
      addresses are still wildcarded, so the kernel flows are not considered
      as exact match 5-tuple flows.  This type of flows can be expected to
      appear in large numbers as the result of more effective wildcarding
      made possible by improvements in OVS userspace flow classifier.
      
      Perf results for this test (master):
      
      Events: 305K cycles
      +   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
      +   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
      +   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
      +   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      And after this patch:
      
      Events: 356K cycles
      +   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
      +   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
      +   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
      +   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
      +   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      There is a small increase in kernel spinlock overhead due to the same
      spinlock being shared between multiple cores of the same physical CPU,
      but that is barely visible in the netperf TCP_CRR test performance
      (maybe ~1% performance drop, hard to tell exactly due to variance in
      the test results), when testing for kernel module throughput (with no
      userspace activity, handful of kernel flows).
      
      On flow setup, a single stats instance is allocated (for the NUMA node
      0).  As CPUs from multiple NUMA nodes start updating stats, new
      NUMA-node specific stats instances are allocated.  This allocation on
      the packet processing code path is made to never block or look for
      emergency memory pools, minimizing the allocation latency.  If the
      allocation fails, the existing preallocated stats instance is used.
      Also, if only CPUs from one NUMA-node are updating the preallocated
      stats instance, no additional stats instances are allocated.  This
      eliminates the need to pre-allocate stats instances that will not be
      used, also relieving the stats reader from the burden of reading stats
      that are never used.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      63e7959c
    • J
      openvswitch: Remove 5-tuple optimization. · 23dabf88
      Jarno Rajahalme 提交于
      The 5-tuple optimization becomes unnecessary with a later per-NUMA
      node stats patch.  Remove it first to make the changes easier to
      grasp.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      23dabf88