Commits · e0bb8c44ed5cfcc56b571758ed966ee48779024c · openeuler / raspberrypi-kernel

23 May, 2014 2 commits

openvswitch: Fix ovs_flow_stats_get/clear RCU dereference. · 86ec8dba

Authored by Jarno Rajahalme on May 05, 2014

Using ovsl_dereference() in ovs_flow_stats_get() was wrong, since
flow dumps call it with the RCU read lock held.

ovs_flow_stats_clear() is always called with ovs_mutex held, so it can
use ovsl_dereference().

Also, make the ovs_flow_stats_get() 'flow' argument const to make
later patches cleaner.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

86ec8dba

openvswitch: Compact sw_flow_key. · 1139e241

Authored by Jarno Rajahalme on May 05, 2014

Minimize padding in sw_flow_key and move 'tp' to the main struct.
These changes simplify code when accessing the transport port numbers
and the tcp flags, and make the sw_flow_key 8 bytes smaller on 64-bit
systems (128->120 bytes).  These changes also make the keys for IPv4
packets fit in one cache line.

There is a valid concern for safety of packing the struct
ovs_key_ipv4_tunnel, as it would be possible to take the address of
the tun_id member as a __be64 * which could result in unaligned access
on some systems. However:

- sw_flow_key itself is 64-bit aligned, so the tun_id within is
  always 64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force
  every second tun_key to be misaligned).
- We never take the address of the tun_id into a __be64 *.
- Wherever we use struct ovs_key_ipv4_tunnel outside the
  sw_flow_key, it is on the stack (in tunnel input functions), where
  the compiler has full control of the alignment.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

1139e241
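
The alignment argument in the commit above (1139e241) can be checked with a small userspace sketch. The struct names and field layouts below are hypothetical stand-ins, not the kernel's definitions, and the `packed`/`aligned` attributes assume GCC or Clang. The sketch only illustrates two of the listed points: a packed struct placed first in an 8-byte-aligned container keeps its 64-bit member aligned, while elements of an array of the packed struct would not all start on an 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed tunnel key: 8 + 4 + 4 + 2 + 1 + 1 = 20 bytes. */
struct demo_tun_key {
    uint64_t tun_id;
    uint32_t ipv4_src;
    uint32_t ipv4_dst;
    uint16_t tun_flags;
    uint8_t  ipv4_tos;
    uint8_t  ipv4_ttl;
} __attribute__((packed, aligned(4)));

/* Hypothetical container, 8-byte aligned like the flow key in the text. */
struct demo_flow_key {
    struct demo_tun_key tun_key;   /* first member, so tun_id is at offset 0 */
    uint32_t skb_mark;
    uint16_t in_port;
    uint16_t tp_src;
} __attribute__((aligned(8)));

/* Checks the layout claims; aborts via assert() if one does not hold. */
static void demo_check_layout(void)
{
    /* Inside the container, tun_id lands on a 64-bit boundary. */
    assert(offsetof(struct demo_flow_key, tun_key.tun_id) % 8 == 0);
    /* The packed size is not a multiple of 8, so the second element of
     * an array of tunnel keys would start 4 bytes off such a boundary. */
    assert(sizeof(struct demo_tun_key) % 8 != 0);
}
```

Calling `demo_check_layout()` once at startup is enough to exercise both checks.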

17 May, 2014 2 commits

openvswitch: Per NUMA node flow stats. · 63e7959c

Authored by Jarno Rajahalme on Mar 27, 2014

Keep kernel flow stats for each NUMA node rather than each (logical)
CPU.  This avoids using the per-CPU allocator, removes most of the
kernel-side OVS locking overhead that otherwise sits at the top of perf
reports, and allows OVS to scale better with a higher number of threads.

With 9 handlers and 4 revalidators, the netperf TCP_CRR test flow setup
rate doubles on a server with two hyper-threaded physical CPUs (16
logical cores each) compared to the current OVS master.  Tested with a
non-trivial flow table with a TCP port match rule forcing all new
connections with unique port numbers to OVS userspace.  The IP
addresses are still wildcarded, so the kernel flows are not considered
as exact match 5-tuple flows.  This type of flow can be expected to
appear in large numbers as the result of more effective wildcarding
made possible by improvements in the OVS userspace flow classifier.

Perf results for this test (master):

Events: 305K cycles
+   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
+   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
+   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
+   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
+   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
...

And after this patch:

Events: 356K cycles
+   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
+   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
+   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
+   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
+   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
...

There is a small increase in kernel spinlock overhead due to the same
spinlock being shared between multiple cores of the same physical CPU,
but that is barely visible in the netperf TCP_CRR test performance
(maybe ~1% performance drop, hard to tell exactly due to variance in
the test results), when testing for kernel module throughput (with no
userspace activity, handful of kernel flows).

On flow setup, a single stats instance is allocated (for the NUMA node
0).  As CPUs from multiple NUMA nodes start updating stats, new
NUMA-node specific stats instances are allocated.  This allocation on
the packet processing code path is made to never block or look for
emergency memory pools, minimizing the allocation latency.  If the
allocation fails, the existing preallocated stats instance is used.
Also, if only CPUs from one NUMA-node are updating the preallocated
stats instance, no additional stats instances are allocated.  This
eliminates the need to pre-allocate stats instances that will not be
used, also relieving the stats reader from the burden of reading stats
that are never used.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

63e7959c
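
The allocation policy described in the commit above (63e7959c) can be sketched in userspace C. Everything here is a hypothetical illustration: the names, the fixed node count, and the injectable allocator are assumptions, and the sketch leaves out the locking and the non-blocking allocation flags that the kernel code would need. It shows only the shape of the idea: one instance preallocated for node 0, further instances created on first use, and a fallback to the preallocated instance when allocation fails.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_NODES 4              /* hypothetical NUMA node count */

struct demo_stats {
    uint64_t packets;
    uint64_t bytes;
};

struct demo_flow {
    /* stats[0] is preallocated at flow setup; the rest appear on demand. */
    struct demo_stats *stats[DEMO_MAX_NODES];
};

/* Flow setup: allocate only the node-0 instance. Returns 0 on success. */
static int demo_flow_init(struct demo_flow *flow)
{
    for (int i = 0; i < DEMO_MAX_NODES; i++)
        flow->stats[i] = NULL;
    flow->stats[0] = calloc(1, sizeof(*flow->stats[0]));
    return flow->stats[0] ? 0 : -1;
}

/*
 * Packet path: account one packet seen on 'node'. The allocator is passed
 * in so that a failure can be simulated; on failure the update falls back
 * to the preallocated node-0 instance.
 */
static void demo_stats_update(struct demo_flow *flow, int node, uint64_t len,
                              void *(*alloc)(size_t, size_t))
{
    assert(node >= 0 && node < DEMO_MAX_NODES);
    if (!flow->stats[node])
        flow->stats[node] = alloc(1, sizeof(struct demo_stats));

    struct demo_stats *s = flow->stats[node] ? flow->stats[node]
                                             : flow->stats[0];
    s->packets++;
    s->bytes += len;
}

/* Reader: sum only the instances that were actually allocated. */
static struct demo_stats demo_stats_get(const struct demo_flow *flow)
{
    struct demo_stats total = {0, 0};
    for (int i = 0; i < DEMO_MAX_NODES; i++) {
        if (flow->stats[i]) {
            total.packets += flow->stats[i]->packets;
            total.bytes += flow->stats[i]->bytes;
        }
    }
    return total;
}

/* Allocator that always fails, for exercising the fallback path. */
static void *demo_failing_alloc(size_t n, size_t size)
{
    (void)n;
    (void)size;
    return NULL;
}
```

Passing `calloc` as the allocator gives the normal path; passing `demo_failing_alloc` shows that a failed allocation still counts the packet against the node-0 instance.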

openvswitch: Remove 5-tuple optimization. · 23dabf88

Authored by Jarno Rajahalme on Mar 27, 2014

The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch.  Remove it first to make the changes easier to
grasp.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

23dabf88

07 Jan, 2014 2 commits

openvswitch: Per cpu flow stats. · e298e505

Authored by Pravin B Shelar on Oct 29, 2013

With the mega flow implementation, an OVS flow can be shared between
multiple CPUs, which makes stats updates a highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

e298e505

openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit). · 8f49ce11

Authored by Ben Pfaff on Nov 25, 2013

We won't normally have a ton of flow masks but using a size_t to store
values no bigger than sizeof(struct sw_flow_key) seems excessive.

This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit
systems.  On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but
sw_flow_mask only by 8 bytes due to padding.

Compile tested only.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

8f49ce11
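
The size arithmetic in the commit above (8f49ce11) can be reproduced with two hypothetical range structs. The struct names and the choice of `unsigned short` for the narrow bounds are assumptions made for illustration; the commit message only says that `size_t` was replaced with something smaller. On a typical 64-bit ABI the difference is 16 - 4 = 12 bytes, and on a typical 32-bit ABI it is 8 - 4 = 4 bytes, which matches the figures quoted for sw_flow_key_range.

```c
#include <assert.h>
#include <stddef.h>

/* Range stored with size_t bounds (the wide layout). */
struct demo_range_wide {
    size_t start;
    size_t end;
};

/* Range stored with 16-bit bounds; ample for offsets into a small struct. */
struct demo_range_narrow {
    unsigned short start;
    unsigned short end;
};

/* Bytes saved per range by narrowing the two bounds. */
static size_t demo_range_savings(void)
{
    return sizeof(struct demo_range_wide) - sizeof(struct demo_range_narrow);
}

/* Narrowing is only safe if every offset fits in the smaller type. */
static struct demo_range_narrow demo_make_range(size_t start, size_t end)
{
    assert(start <= end && end <= (unsigned short)-1);
    struct demo_range_narrow r = { (unsigned short)start, (unsigned short)end };
    return r;
}
```

A struct that embeds the narrow range next to wider members may save less than the range itself does, because padding absorbs part of the gain.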

02 Nov, 2013 2 commits

openvswitch: TCP flags matching support. · 5eb26b15

Authored by Jarno Rajahalme on Oct 23, 2013

    tcp_flags=flags/mask
        Bitwise match on TCP flags.  The flags and mask are 16-bit
        numbers written in decimal or in hexadecimal prefixed by 0x.
        Each 1-bit in mask requires that the corresponding bit in
        flags must match.  Each 0-bit in mask causes the corresponding
        bit to be ignored.

        TCP protocol currently defines 9 flag bits, and additional 3
        bits are reserved (must be transmitted as zero), see RFCs 793,
        3168, and 3540.  The flag bits are, numbering from the least
        significant bit:

        0: FIN No more data from sender.

        1: SYN Synchronize sequence numbers.

        2: RST Reset the connection.

        3: PSH Push function.

        4: ACK Acknowledgement field significant.

        5: URG Urgent pointer field significant.

        6: ECE ECN Echo.

        7: CWR Congestion Window Reduced.

        8: NS  Nonce Sum.

        9-11:  Reserved.

        12-15: Not matchable, must be zero.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

5eb26b15
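
The flags/mask semantics described in the commit above (5eb26b15) reduce to one bitwise expression. The following is a minimal sketch with hypothetical names; it is not the datapath's matching code, which applies masks to the whole flow key rather than to a single field.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits, numbered from the least significant bit as in the list above. */
#define DEMO_TCP_FIN 0x001
#define DEMO_TCP_SYN 0x002
#define DEMO_TCP_RST 0x004
#define DEMO_TCP_PSH 0x008
#define DEMO_TCP_ACK 0x010
#define DEMO_TCP_URG 0x020
#define DEMO_TCP_ECE 0x040
#define DEMO_TCP_CWR 0x080
#define DEMO_TCP_NS  0x100

#define DEMO_TCP_MATCHABLE 0x0fff   /* bits 0-11; bits 12-15 must be zero */

/*
 * Returns 1 if every bit selected by 'mask' has the same value in
 * 'pkt_flags' and 'flags'; bits with a 0 in 'mask' are ignored.
 */
static int demo_tcp_flags_match(uint16_t pkt_flags, uint16_t flags,
                                uint16_t mask)
{
    assert((mask & ~DEMO_TCP_MATCHABLE) == 0);
    return ((pkt_flags ^ flags) & mask) == 0;
}
```

For example, `flags = DEMO_TCP_SYN` with `mask = DEMO_TCP_SYN | DEMO_TCP_ACK` selects packets that have SYN set and ACK clear, whatever the other bits are.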

openvswitch: Widen TCP flags handling. · df23e9f6

Authored by Jarno Rajahalme on Oct 23, 2013

Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

df23e9f6
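
To illustrate what widening the flags means at the wire level for the commit above (df23e9f6): bytes 12 and 13 of the TCP header hold a 4-bit data offset followed by 12 flag and reserved bits. The helpers below are a hypothetical sketch that reads those bytes from a raw header buffer; the function names are invented and this is not how the kernel accesses the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Wide view: the low 12 bits of the big-endian 16-bit word at offset 12,
 * i.e. the 3 reserved bits, NS, and the 8 classic flag bits.
 */
static uint16_t demo_tcp_flags12(const uint8_t *tcp_hdr, size_t len)
{
    assert(len >= 14);
    uint16_t word = (uint16_t)((tcp_hdr[12] << 8) | tcp_hdr[13]);
    return word & 0x0fff;
}

/* Narrow view: only byte 13, which cannot carry NS or the reserved bits. */
static uint8_t demo_tcp_flags8(const uint8_t *tcp_hdr, size_t len)
{
    assert(len >= 14);
    return tcp_hdr[13];
}
```

With byte 12 set to 0x51 (data offset 5, NS set) and byte 13 set to 0x12 (SYN and ACK), the wide view yields 0x112 while the narrow view yields 0x12, losing the NS bit.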

04 Oct, 2013 1 commit

openvswitch: Restructure datapath.c and flow.c · e6445719

Authored by Pravin B Shelar on Oct 03, 2013

Over time, datapath.c and flow.c have become pretty large files.
The following patch restructures their functionality into three
different components:

flow.c: contains flow extract.
flow_netlink.c: netlink flow api.
flow_table.c: flow table api.

This patch restructures code without changing logic.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

e6445719

06 Sep, 2013 1 commit

openvswitch: Fix alignment of struct sw_flow_key. · 0d40f75b

Authored by Jesse Gross on Sep 05, 2013

sw_flow_key alignment was declared as "__aligned(__alignof__(long))".
However, this breaks on the m68k architecture, where long is 32 bits in
size but 16-bit aligned by default. This patch aligns the key to the
size of a long to ensure that we can always do comparisons in full
long-sized chunks. It also adds an additional build check to catch any
reduction in alignment.

CC: Andy Zhou <azhou@nicira.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d40f75b
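
The difference between the two alignment declarations in the commit above (0d40f75b) can be shown with GCC/Clang attributes in userspace. The struct names below are hypothetical, and the exact attribute spelling and build check used by the patch are not given in the commit message, so `aligned(sizeof(long))` and `_Static_assert` stand in for them here. On most desktop ABIs both structs end up with the same alignment; they differ only on ABIs such as the m68k case described, where the alignment of long is smaller than its size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Aligned to the ABI alignment of long, which may be less than its size. */
struct demo_key_alignof {
    unsigned char bytes[24];
} __attribute__((aligned(__alignof__(long))));

/* Aligned to the size of long, independent of the ABI's default. */
struct demo_key_sizeof {
    unsigned char bytes[24];
} __attribute__((aligned(sizeof(long))));

/* Build-time check in the spirit of the one the commit describes. */
_Static_assert(__alignof__(struct demo_key_sizeof) % __alignof__(long) == 0,
               "key alignment must not drop below that of long");

/* Returns 1 if 'p' starts on a sizeof(long) boundary. */
static int demo_is_long_aligned(const void *p)
{
    return (uintptr_t)p % sizeof(long) == 0;
}

/* Runtime counterpart: an instance of the size-aligned key is long aligned. */
static void demo_check_key_alignment(void)
{
    struct demo_key_sizeof key = {{0}};
    assert(demo_is_long_aligned(&key));
}
```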

28 Aug, 2013 1 commit

openvswitch: optimize flow compare and mask functions · 5828cd9a

Authored by Andy Zhou on Aug 27, 2013

Make sure the sw_flow_key structure and valid mask boundaries are always
machine word aligned. Optimize the flow compare and mask operations
using machine word size operations. This patch improves throughput on
average by 15% when CPU is the bottleneck of forwarding packets.

This patch is inspired by ideas and code from a patch submitted by Peter
Klausler titled "replace memcmp() with specialized comparator".
However, the original patch only optimizes for architectures that
support unaligned machine word access. This patch optimizes for all
architectures.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

5828cd9a
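
A word-at-a-time compare and mask, as described in the commit above (5828cd9a), might look like the following userspace sketch. The key type, its size, and the function names are hypothetical. A union is used so that the word accesses are well defined in standard C; kernel code can rely on different aliasing rules. The range bounds must be multiples of the word size, which is the property the commit says it guarantees.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_KEY_BYTES 32   /* hypothetical key size, a multiple of the word size */

/* The same storage viewed as bytes or as machine words. */
union demo_key {
    unsigned char bytes[DEMO_KEY_BYTES];
    unsigned long words[DEMO_KEY_BYTES / sizeof(unsigned long)];
};

/* Validates that [start, end) is a word-aligned range inside the key. */
static void demo_check_range(size_t start, size_t end)
{
    assert(start % sizeof(unsigned long) == 0);
    assert(end % sizeof(unsigned long) == 0);
    assert(start <= end && end <= DEMO_KEY_BYTES);
}

/* Compare bytes [start, end) one word at a time; returns 1 if equal. */
static int demo_key_equal(const union demo_key *a, const union demo_key *b,
                          size_t start, size_t end)
{
    demo_check_range(start, end);
    unsigned long diffs = 0;
    for (size_t i = start / sizeof(unsigned long);
         i < end / sizeof(unsigned long); i++)
        diffs |= a->words[i] ^ b->words[i];
    return diffs == 0;
}

/* dst = src & mask over bytes [start, end), one word at a time. */
static void demo_key_mask(union demo_key *dst, const union demo_key *src,
                          const union demo_key *mask, size_t start, size_t end)
{
    demo_check_range(start, end);
    for (size_t i = start / sizeof(unsigned long);
         i < end / sizeof(unsigned long); i++)
        dst->words[i] = src->words[i] & mask->words[i];
}
```

Accumulating the XOR of each word pair avoids a branch per word; the single test at the end decides equality for the whole range.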

27 Aug, 2013 2 commits

openvswitch: Rename key_len to key_end · 02237373

Authored by Andy Zhou on Aug 22, 2013

Key_end is a better name describing the ending boundary than key_len.
Rename those variables to make it less confusing.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

02237373

openvswitch: Add SCTP support · a175a723

Authored by Joe Stringer on Aug 22, 2013

This patch adds support for rewriting SCTP src,dst ports similar to the
functionality already available for TCP/UDP.

Rewriting SCTP ports is expensive due to double-recalculation of the
SCTP checksums; this is performed to ensure that packets traversing OVS
with invalid checksums will continue to the destination with any
checksum corruption intact.

Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

a175a723
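
One way to get the behaviour described in the commit above (a175a723), where a corrupt checksum stays corrupt after a port rewrite, is to compute the correct checksum before and after the change and XOR both into the stored value. The commit message does not state that the patch uses exactly this formula, so treat the sketch as an assumption. The packet layout is simplified (ports at bytes 0-3, checksum at bytes 8-11), the checksum is stored little-endian purely for the demo, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CSUM_OFF 8   /* checksum field offset in the SCTP common header */

/* Bitwise CRC32c (Castagnoli polynomial, reflected form). */
static uint32_t demo_crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xffffffffu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t demo_load32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void demo_store32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Checksum the packet as if its checksum field were zero. */
static uint32_t demo_compute_csum(uint8_t *pkt, size_t len)
{
    assert(len >= DEMO_CSUM_OFF + 4);
    uint32_t saved = demo_load32(pkt + DEMO_CSUM_OFF);
    demo_store32(pkt + DEMO_CSUM_OFF, 0);
    uint32_t crc = demo_crc32c(pkt, len);
    demo_store32(pkt + DEMO_CSUM_OFF, saved);
    return crc;
}

/* Rewrite the source port, carrying any existing checksum error through. */
static void demo_set_sctp_sport(uint8_t *pkt, size_t len, uint16_t new_sport)
{
    uint32_t old_csum = demo_load32(pkt + DEMO_CSUM_OFF);
    uint32_t old_correct = demo_compute_csum(pkt, len);  /* first pass */

    pkt[0] = (uint8_t)(new_sport >> 8);
    pkt[1] = (uint8_t)(new_sport & 0xff);

    uint32_t new_correct = demo_compute_csum(pkt, len);  /* second pass */
    demo_store32(pkt + DEMO_CSUM_OFF, old_csum ^ old_correct ^ new_correct);
}
```

If the stored checksum was valid, `old_csum ^ old_correct` is zero and the new value is simply the new correct checksum; otherwise the same error bits reappear in the rewritten packet.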

24 Aug, 2013 1 commit

openvswitch: Mega flow implementation · 03f0d916

Authored by Andy Zhou on Aug 07, 2013

Add wildcarded flow support in the kernel datapath.

Wildcarded flows can improve OVS flow setup performance by avoiding
sending matching new flows to the userspace program. The exact
performance boost will largely depend on the wildcarded flow hit rate.

In case all new flows hit wildcarded flows, the flow setup rate is
within 5% of that of the Linux bridge module.

Pravin has made significant contributions to this patch, including API
cleanups and bug fixes.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

03f0d916

20 Jun, 2013 5 commits

openvswitch: Add gre tunnel support. · aa310701

Authored by Pravin B Shelar on Jun 17, 2013

Add gre vport implementation.  Most of the gre protocol processing
is pushed to the gre module. It makes use of the gre demultiplexer,
so it can co-exist with Linux device-based gre tunnels.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa310701

openvswitch: Optimize flow key match for non tunnel flows. · a3e82996

Authored by Pravin B Shelar on Jun 17, 2013

The following patch adds a start offset for sw_flow_key, so that we can
skip the tunneling information in the key for non-tunnel flows.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3e82996

openvswitch: Expand action buffer size. · ffe3f432

Authored by Pravin B Shelar on Jun 17, 2013

MAX_ACTIONS_BUFSIZE limits the action list size. The set tunnel action
needs extra space on the action list, so for now increase the maximum
action list limit.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffe3f432

openvswitch: Add tunneling interface. · 7d5437c7

Authored by Pravin B Shelar on Jun 17, 2013

Add an ovs tunnel interface so that userspace can use the set tunnel
action.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d5437c7

openvswitch: Copy individual actions. · 74f84a57

Authored by Pravin B Shelar on Jun 17, 2013

Rather than validating actions and then copying all actions
in one block, the following patch does the same operation in a single
pass. It validates and copies actions one by one. This is required for
the ovs tunneling patch.

This patch does not change any functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74f84a57

15 Jun, 2013 1 commit

openvswitch: Simplify interface ovs_flow_metadata_from_nlattrs() · 93d8fd15

Authored by Pravin B Shelar on Jun 13, 2013

This is not a functional change, just code cleanup.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

93d8fd15

30 Mar, 2013 1 commit

openvswitch: Refine Netlink message size calculation and kill FLOW_BUFSIZE · c3ff8cfe

Authored by Thomas Graf on Mar 29, 2013

Kills the FLOW_BUFSIZE constant, which needs to be calculated manually,
and replaces it with key_attr_size() based on nla_total_size().
Calculates the size of datapath messages instead of relying on
NLMSG_DEFAULT_SIZE and moves the existing message size calculations
into their own functions for clarity.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>

c3ff8cfe
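
The sizing approach in the commit above (c3ff8cfe) rests on a simple rule: each Netlink attribute occupies its 4-byte header plus payload, rounded up to a 4-byte boundary. The sketch below reimplements that rule in userspace under `demo_` names rather than calling the kernel helper, and the attribute list in `demo_key_attr_size()` is a made-up example, not the actual contents of key_attr_size().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NLA_ALIGNTO 4
#define DEMO_NLA_ALIGN(len) \
    (((len) + DEMO_NLA_ALIGNTO - 1) & ~(size_t)(DEMO_NLA_ALIGNTO - 1))
#define DEMO_NLA_HDRLEN DEMO_NLA_ALIGN(4)   /* 16-bit length + 16-bit type */

/* Space taken by one attribute with 'payload' bytes of data, padding included. */
static size_t demo_nla_total_size(size_t payload)
{
    /* The attribute length field is 16 bits wide. */
    assert(payload <= 0xffff - DEMO_NLA_HDRLEN);
    return DEMO_NLA_ALIGN(DEMO_NLA_HDRLEN + payload);
}

/* Hypothetical key: the worst-case size is the sum over its attributes. */
static size_t demo_key_attr_size(void)
{
    return demo_nla_total_size(4)     /* e.g. a 32-bit priority */
         + demo_nla_total_size(4)     /* e.g. a 32-bit input port */
         + demo_nla_total_size(12)    /* e.g. two 6-byte MAC addresses */
         + demo_nla_total_size(2);    /* e.g. a 16-bit ethertype */
}
```

Summing per-attribute sizes this way keeps the buffer size in step with the attribute list, which is the maintenance problem a hand-computed constant has.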

27 Nov, 2012 1 commit

openvswitch: add skb mark matching and set action · 39c7caeb

Authored by Ansis Atteka on Nov 26, 2012

This patch adds support for skb mark matching and set action.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

39c7caeb

04 Sep, 2012 2 commits

openvswitch: Increase maximum number of datapath ports. · 15eac2a7

Authored by Pravin B Shelar on Aug 23, 2012

Use a hash table to store the ports of a datapath. Allow 64K ports per
switch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

15eac2a7

openvswitch: Fix FLOW_BUFSIZE definition. · c303aa94

Authored by Jesse Gross on Sep 03, 2012

The vlan encapsulation fields in the maximum flow definition were
never updated when the representation changed before upstreaming.
In theory this could cause a kernel panic when a maximum length
flow is used. In practice this has never happened (to my knowledge)
because skb allocations are padded out to a cache line so you would
need the right combination of flow and packet being sent to userspace.

Signed-off-by: Jesse Gross <jesse@nicira.com>

c303aa94

04 May, 2012 1 commit

openvswitch: Replace Nicira Networks. · caf2ee14

Authored by Raju Subramanian on May 03, 2012

Replaced all instances of Nicira Networks(, Inc) with Nicira, Inc.

Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

caf2ee14

04 Dec, 2011 1 commit

net: Add Open vSwitch kernel components. · ccb1352e

Authored by Jesse Gross on Oct 25, 2011

Open vSwitch is a multilayer Ethernet switch targeted at virtualized
environments.  In addition to supporting a variety of features
expected in a traditional hardware switch, it enables fine-grained
programmatic extension and flow-based control of the network.
This control is useful in a wide variety of applications but is
particularly important in multi-server virtualization deployments,
which are often characterized by highly dynamic endpoints and the need
to maintain logical abstractions for multiple tenants.

The Open vSwitch datapath provides an in-kernel fast path for packet
forwarding.  It is complemented by a userspace daemon, ovs-vswitchd,
which is able to accept configuration from a variety of sources and
translate it into packet processing rules.

See http://openvswitch.org for more information and userspace
utilities.

Signed-off-by: Jesse Gross <jesse@nicira.com>

ccb1352e