提交 · 329f45bc4f191c663dc156c510816411a4310578 · OpenHarmony / kernel_linux

13 11月, 2016 1 次提交

openvswitch: add mac_proto field to the flow key · 329f45bc

由 Jiri Benc 提交于 11月 10, 2016

Use a hole in the structure. We support only Ethernet so far and will add
a support for L2-less packets shortly. We could use a bool to indicate
whether the Ethernet header is present or not but the approach with the
mac_proto field is more generic and occupies the same number of bytes in the
struct, while allowing later extensibility. It also makes the code in the
next patches more self explaining.

It would be nice to use ARPHRD_ constants but those are u16 which would be
waste. Thus define our own constants.

Another upside of this is that we can overload this new field to also denote
whether the flow key is valid. This has the advantage that on
refragmentation, we don't have to reparse the packet but can rely on the
stored eth.type. This is especially important for the next patches in this
series - instead of adding another branch for L2-less packets before calling
ovs_fragment, we can just remove all those branches completely.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

329f45bc

13 10月, 2016 1 次提交

openvswitch: vlan: remove wrong likely statement · 20ecf1e4

由 Jiri Benc 提交于 10月 10, 2016

This code is called whenever flow key is being extracted from the packet.
The packet may be as likely vlan tagged as not.

Fixes: 018c1dda ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Acked-by: NEric Garver <e@erig.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20ecf1e4

03 10月, 2016 1 次提交

openvswitch: mpls: set network header correctly on key extract · f7d49bce

由 Jiri Benc 提交于 9月 30, 2016

After the 48d2ab60 ("net: mpls: Fixups for GSO"), MPLS handling in
openvswitch was changed to have network header pointing to the start of the
MPLS headers and inner_network_header pointing after the MPLS headers.

However, key_extract was missed by the mentioned commit, causing incorrect
headers to be set when a MPLS packet just enters the bridge or after it is
recirculated.

Fixes: 48d2ab60 ("net: mpls: Fixups for GSO")
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7d49bce

21 9月, 2016 1 次提交

openvswitch: avoid resetting flow key while installing new flow. · 2279994d

由 pravin shelar 提交于 9月 19, 2016

since commit commit db74a333 ("openvswitch: use percpu
flow stats") flow alloc resets flow-key. So there is no need
to reset the flow-key again if OVS is using newly allocated
flow-key.
Signed-off-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2279994d

19 9月, 2016 2 次提交

openvswitch: use percpu flow stats · db74a333

由 Thadeu Lima de Souza Cascardo 提交于 9月 15, 2016

Instead of using flow stats per NUMA node, use it per CPU. When using
megaflows, the stats lock can be a bottleneck in scalability.

On a E5-2690 12-core system, usual throughput went from ~4Mpps to
~15Mpps when forwarding between two 40GbE ports with a single flow
configured on the datapath.

This has been tested on a system with possible CPUs 0-7,16-23. After
module removal, there were no corruption on the slab cache.
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@redhat.com>
Cc: pravin shelar <pshelar@ovn.org>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db74a333

openvswitch: fix flow stats accounting when node 0 is not possible · 40773966

由 Thadeu Lima de Souza Cascardo 提交于 9月 15, 2016

On a system with only node 1 as possible, all statistics is going to be
accounted on node 0 as it will have a single writer.

However, when getting and clearing the statistics, node 0 is not going
to be considered, as it's not a possible node.

Tested that statistics are not zero on a system with only node 1
possible. Also compile-tested with CONFIG_NUMA off.
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

40773966

09 9月, 2016 1 次提交

openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes · 018c1dda

由 Eric Garver 提交于 9月 07, 2016

Add support for 802.1ad including the ability to push and pop double
tagged vlans. Add support for 802.1ad to netlink parsing and flow
conversion. Uses double nested encap attributes to represent double
tagged vlan. Inner TPID encoded along with ctci in nested attributes.

This is based on Thomas F Herbert's original v20 patch. I made some
small clean ups and bug fixes.
Signed-off-by: NThomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: NEric Garver <e@erig.me>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

018c1dda

07 10月, 2015 1 次提交

openvswitch: add tunnel protocol to sw_flow_key · 00a93bab

由 Jiri Benc 提交于 10月 05, 2015

Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now
also acts as an indicator whether the flow contains tunnel data (this was
previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses
in an union with IPv4 ones this won't work anymore).

The new field was added to a hole in sw_flow_key.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00a93bab

01 9月, 2015 1 次提交

ip-tunnel: Use API to access tunnel metadata options. · 4c222798

由 Pravin B Shelar 提交于 8月 30, 2015

Currently tun-info options pointer is used in few cases to
pass options around. But tunnel options can be accessed using
ip_tunnel_info_opts() API without using the pointer. Following
patch removes the redundant pointer and consistently make use
of API.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Reviewed-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c222798

30 8月, 2015 3 次提交

openvswitch: Remove vport-net · a581b96d

由 Pravin B Shelar 提交于 8月 29, 2015

This structure is not used anymore.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a581b96d

openvswitch: retain parsed IPv6 header fields in flow on error skipping extension headers · c30da497

由 Simon Horman 提交于 8月 29, 2015

When an error occurs skipping IPv6 extension headers retain the already
parsed IP protocol and IPv6 addresses in the flow. Also assume that the
packet is not a fragment in the absence of information to the contrary;
that is always use the frag_off value set by ipv6_skip_exthdr().

This allows matching on the IP protocol and IPv6 addresses of packets
with malformed extension headers.
Signed-off-by: NSimon Horman <simon.horman@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c30da497

ip_tunnels: record IP version in tunnel info · 7f9562a1

由 Jiri Benc 提交于 8月 28, 2015

There's currently nothing preventing directing packets with IPv6
encapsulation data to IPv4 tunnels (and vice versa). If this happens,
IPv6 addresses are incorrectly interpreted as IPv4 ones.

Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this
in ip_tunnel_info. Reject packets at appropriate places if they are supposed
to be encapsulated into an incompatible protocol.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f9562a1

28 8月, 2015 2 次提交

openvswitch: Allow matching on conntrack label · c2ac6673

由 Joe Stringer 提交于 8月 26, 2015

Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.
Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2ac6673

openvswitch: Add conntrack action · 7f8a436e

由 Joe Stringer 提交于 8月 26, 2015

Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.

When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.

IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.

IP frag handling contributed by Andy Zhou.

Based on original design by Justin Pettit.
Signed-off-by: NJoe Stringer <joestringer@nicira.com>
Signed-off-by: NJustin Pettit <jpettit@nicira.com>
Signed-off-by: NAndy Zhou <azhou@nicira.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f8a436e

22 7月, 2015 1 次提交

ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic · 1d8fff90

由 Thomas Graf 提交于 7月 21, 2015

Rename the tunnel metadata data structures currently internal to
OVS and make them generic for use by all IP tunnels.

Both structures are kernel internal and will stay that way. Their
members are exposed to user space through individual Netlink
attributes by OVS. It will therefore be possible to extend/modify
these structures without affecting user ABI.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d8fff90

06 5月, 2015 1 次提交

openvswitch: Use eth_proto_is_802_3 · 6713fc9b

由 Alexander Duyck 提交于 5月 04, 2015

Replace "ntohs(proto) >= ETH_P_802_3_MIN" w/ eth_proto_is_802_3(proto).
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6713fc9b

15 4月, 2015 1 次提交

mm: remove GFP_THISNODE · 4167e9b2

由 David Rientjes 提交于 4月 14, 2015

NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.

GFP_THISNODE is a secret combination of gfp bits that have different
behavior than expected.  It is a combination of __GFP_THISNODE,
__GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
allocator slowpath to fail without trying reclaim even though it may be
used in combination with __GFP_WAIT.

An example of the problem this creates: commit e97ca8e5 ("mm: fix
GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
that really just wanted __GFP_THISNODE.  The problem doesn't end there,
however, because even it was a no-op for alloc_misplaced_dst_page(),
which also sets __GFP_NORETRY and __GFP_NOWARN, and
migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
is set in GFP_TRANSHUGE.  Converting GFP_THISNODE to __GFP_THISNODE is a
no-op in these cases since the page allocator special-cases
__GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.

It's time to just remove GFP_THISNODE entirely.  We leave __GFP_THISNODE
to restrict an allocation to a local node, but remove GFP_THISNODE and
its obscurity.  Instead, we require that a caller clear __GFP_WAIT if it
wants to avoid reclaim.

This allows the aforementioned functions to actually reclaim as they
should.  It also enables any future callers that want to do
__GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim.  The
rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.

Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
it is unchanged.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: NPekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4167e9b2

12 2月, 2015 1 次提交

openvswitch: Reset key metadata for packet execution. · b35725a2

由 Pravin B Shelar 提交于 2月 10, 2015

Userspace packet execute command pass down flow key for given
packet. But userspace can skip some parameter with zero value.
Therefore kernel needs to initialize key metadata to zero.

Fixes: 07148121 ("openvswitch: Eliminate memset() from flow_extract.")
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b35725a2

15 1月, 2015 1 次提交

openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS() · d91641d9

由 Thomas Graf 提交于 1月 15, 2015

Also factors out Geneve validation code into a new separate function
validate_and_copy_geneve_opts().

A subsequent patch will introduce VXLAN options. Rename the existing
GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
tunnel metadata options.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d91641d9

14 1月, 2015 1 次提交

net: rename vlan_tx_* helpers since "tx" is misleading there · df8a39de

由 Jiri Pirko 提交于 1月 13, 2015

The same macros are used for rx as well. So rename it.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df8a39de

03 1月, 2015 1 次提交

openvswitch: Consistently include VLAN header in flow and port stats. · 24cc59d1

由 Ben Pfaff 提交于 12月 31, 2014

Until now, when VLAN acceleration was in use, the bytes of the VLAN header
were not included in port or flow byte counters.  They were however
included when VLAN acceleration was not used.  This commit corrects the
inconsistency, by always including the VLAN header in byte counters.

Previous discussion at
http://openvswitch.org/pipermail/dev/2014-December/049521.htmlReported-by: NMotonori Shindo <mshindo@vmware.com>
Signed-off-by: NBen Pfaff <blp@nicira.com>
Reviewed-by: NFlavio Leitner <fbl@sysclose.org>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24cc59d1

10 11月, 2014 2 次提交

openvswitch: Add support for OVS_FLOW_ATTR_PROBE. · 05da5898

由 Jarno Rajahalme 提交于 11月 06, 2014

This new flag is useful for suppressing error logging while probing
for datapath features using flow commands.  For backwards
compatibility reasons the commands are executed normally, but error
logging is suppressed.
Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>

05da5898

openvswitch: Constify various function arguments · 12eb18f7

由 Thomas Graf 提交于 11月 06, 2014

Help produce better optimized code.
Signed-off-by: NThomas Graf <tgraf@noironetworks.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>

12eb18f7

06 11月, 2014 1 次提交

openvswitch: Add basic MPLS support to kernel · 25cd9ba0

由 Simon Horman 提交于 10月 06, 2014

Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>

25cd9ba0

18 10月, 2014 2 次提交

openvswitch: Set flow-key members. · 25ef1328

由 Pravin B Shelar 提交于 10月 17, 2014

This patch adds missing memset which are required to initialize
flow key member. For example for IP flow we need to initialize
ip.frag for all cases.

Found by inspection.

This bug is introduced by commit 07148121
("openvswitch: Eliminate memset() from flow_extract").
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25ef1328

openvswitch: fix a use after free · 389f4894

由 Li RongQing 提交于 10月 17, 2014

pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp
setting after arphdr_ok to avoid the use the freed memory

Fixes: 07148121 ("openvswitch: Eliminate memset() from flow_extract.")
Cc: Jesse Gross <jesse@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Acked-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

389f4894

06 10月, 2014 3 次提交

openvswitch: Add support for Geneve tunneling. · f5796684

由 Jesse Gross 提交于 10月 03, 2014

The Openvswitch implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.
Signed-off-by: NJesse Gross <jesse@nicira.com>
Singed-off-by: NAnsis Atteka <aatteka@nicira.com>
Signed-off-by: NAndy Zhou <azhou@nicira.com>
Acked-by: NThomas Graf <tgraf@noironetworks.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5796684

openvswitch: Wrap struct ovs_key_ipv4_tunnel in a new structure. · f0b128c1

由 Jesse Gross 提交于 10月 03, 2014

Currently, the flow information that is matched for tunnels and
the tunnel data passed around with packets is the same. However,
as additional information is added this is not necessarily desirable,
as in the case of pointers.

This adds a new structure for tunnel metadata which currently contains
only the existing struct. This change is purely internal to the kernel
since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version
of OVS_KEY_ATTR_TUNNEL that is translated at flow setup.
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NAndy Zhou <azhou@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f0b128c1

openvswitch: Eliminate memset() from flow_extract. · 07148121

由 Jesse Gross 提交于 10月 03, 2014

As new protocols are added, the size of the flow key tends to
increase although few protocols care about all of the fields. In
order to optimize this for hashing and matching, OVS uses a variable
length portion of the key. However, when fields are extracted from
the packet we must still zero out the entire key.

This is no longer necessary now that OVS implements masking. Any
fields (or holes in the structure) which are not part of a given
protocol will be by definition not part of the mask and zeroed out
during lookup. Furthermore, since masking already uses variable
length keys this zeroing operation automatically benefits as well.

In principle, the only thing that needs to be done at this point
is remove the memset() at the beginning of flow. However, some
fields assume that they are initialized to zero, which now must be
done explicitly. In addition, in the event of an error we must also
zero out corresponding fields to signal that there is no valid data
present. These increase the total amount of code but very little of
it is executed in non-error situations.

Removing the memset() reduces the profile of ovs_flow_extract()
from 0.64% to 0.56% when tested with large packets on a 10G link.
Suggested-by: NPravin Shelar <pshelar@nicira.com>
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NAndy Zhou <azhou@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07148121

16 9月, 2014 3 次提交

openvswitch: Add recirc and hash action. · 971427f3

由 Andy Zhou 提交于 9月 15, 2014

Recirc action allows a packet to reenter openvswitch processing.
currently openvswitch lookup flow for packet received and execute
set of actions on that packet, with help of recirc action we can
process/modify the packet and recirculate it back in openvswitch
for another pass.

OVS hash action calculates 5-tupple hash and set hash in flow-key
hash. This can be used along with recirculation for distributing
packets among different ports for bond devices.
For example:
OVS bonding can use following actions:
Match on: bond flow; Action: hash, recirc(id)
Match on: recirc-id == id and hash lower bits == a;
          Action: output port_bond_a
Signed-off-by: NAndy Zhou <azhou@nicira.com>
Acked-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>

971427f3

openvswitch: Use tun_key only for egress tunnel path. · 8c8b1b83

由 Pravin B Shelar 提交于 9月 15, 2014

Currently tun_key is used for passing tunnel information
on ingress and egress path, this cause confusion.  Following
patch removes its use on ingress path make it egress only parameter.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NAndy Zhou <azhou@nicira.com>

8c8b1b83

openvswitch: refactor ovs flow extract API. · 83c8df26

由 Pravin B Shelar 提交于 9月 15, 2014

OVS flow extract is called on packet receive or packet
execute code path.  Following patch defines separate API
for extracting flow-key in packet execute code path.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NAndy Zhou <azhou@nicira.com>

83c8df26

23 8月, 2014 1 次提交

net/openvswitch/flow.c: Replace rcu_dereference() with rcu_access_pointer() · 8c6b00c8

由 Andreea-Cristina Bernat 提交于 8月 17, 2014

The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.

The following Coccinelle semantic patch was used:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)
Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c6b00c8

30 6月, 2014 1 次提交

openvswitch: Fix tracking of flags seen in TCP flows. · ad552007

由 Ben Pfaff 提交于 5月 06, 2014

Flow statistics need to take into account the TCP flags from the packet
currently being processed (in 'key'), not the TCP flags matched by the
flow found in the kernel flow table (in 'flow').

This bug made the Open vSwitch userspace fin_timeout action have no effect
in many cases.
This bug is introduced by commit 88d73f6c (openvswitch: Use
TCP flags in the flow key for stats.)
Reported-by: NLen Gao <leng@vmware.com>
Signed-off-by: NBen Pfaff <blp@nicira.com>
Acked-by: NJarno Rajahalme <jrajahalme@nicira.com>
Acked-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>

ad552007

23 5月, 2014 3 次提交

openvswitch: Fix ovs_flow_stats_get/clear RCU dereference. · 86ec8dba

由 Jarno Rajahalme 提交于 5月 05, 2014

For ovs_flow_stats_get() using ovsl_dereference() was wrong, since
flow dumps call this with RCU read lock.

ovs_flow_stats_clear() is always called with ovs_mutex, so can use
ovsl_dereference().

Also, make the ovs_flow_stats_get() 'flow' argument const to make
later patches cleaner.
Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>

86ec8dba

openvswitch: Clarify locking. · bb6f9a70

由 Jarno Rajahalme 提交于 5月 05, 2014

Remove unnecessary locking from functions that are always called with
appropriate locking.
Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: NThomas Graf <tgraf@redhat.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>

bb6f9a70

openvswitch: Compact sw_flow_key. · 1139e241

由 Jarno Rajahalme 提交于 5月 05, 2014

Minimize padding in sw_flow_key and move 'tp' top the main struct.
These changes simplify code when accessing the transport port numbers
and the tcp flags, and makes the sw_flow_key 8 bytes smaller on 64-bit
systems (128->120 bytes).  These changes also make the keys for IPv4
packets to fit in one cache line.

There is a valid concern for safety of packing the struct
ovs_key_ipv4_tunnel, as it would be possible to take the address of
the tun_id member as a __be64 * which could result in unaligned access
in some systems. However:

- sw_flow_key itself is 64-bit aligned, so the tun_id within is
  always
  64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force
  every
  second tun_key to be misaligned).
- We never take the address of the tun_id in to a __be64 *.
- Whereever we use struct ovs_key_ipv4_tunnel outside the
  sw_flow_key,
  it is in stack (on tunnel input functions), where compiler has full
  control of the alignment.
Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>

1139e241

17 5月, 2014 3 次提交

openvswitch: Use TCP flags in the flow key for stats. · 88d73f6c

由 Jarno Rajahalme 提交于 3月 27, 2014

We already extract the TCP flags for the key, might as well use that
for stats.
Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NJesse Gross <jesse@nicira.com>

88d73f6c

openvswitch: Per NUMA node flow stats. · 63e7959c

由 Jarno Rajahalme 提交于 3月 27, 2014

Keep kernel flow stats for each NUMA node rather than each (logical)
CPU.  This avoids using the per-CPU allocator and removes most of the
kernel-side OVS locking overhead otherwise on the top of perf reports
and allows OVS to scale better with higher number of threads.

With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
rate doubles on a server with two hyper-threaded physical CPUs (16
logical cores each) compared to the current OVS master.  Tested with
non-trivial flow table with a TCP port match rule forcing all new
connections with unique port numbers to OVS userspace.  The IP
addresses are still wildcarded, so the kernel flows are not considered
as exact match 5-tuple flows.  This type of flows can be expected to
appear in large numbers as the result of more effective wildcarding
made possible by improvements in OVS userspace flow classifier.

Perf results for this test (master):

Events: 305K cycles
+   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
+   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
+   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
+   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
+   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
...

And after this patch:

Events: 356K cycles
+   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
+   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
+   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
+   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
+   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
...

There is a small increase in kernel spinlock overhead due to the same
spinlock being shared between multiple cores of the same physical CPU,
but that is barely visible in the netperf TCP_CRR test performance
(maybe ~1% performance drop, hard to tell exactly due to variance in
the test results), when testing for kernel module throughput (with no
userspace activity, handful of kernel flows).

On flow setup, a single stats instance is allocated (for the NUMA node
0).  As CPUs from multiple NUMA nodes start updating stats, new
NUMA-node specific stats instances are allocated.  This allocation on
the packet processing code path is made to never block or look for
emergency memory pools, minimizing the allocation latency.  If the
allocation fails, the existing preallocated stats instance is used.
Also, if only CPUs from one NUMA-node are updating the preallocated
stats instance, no additional stats instances are allocated.  This
eliminates the need to pre-allocate stats instances that will not be
used, also relieving the stats reader from the burden of reading stats
that are never used.
Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NJesse Gross <jesse@nicira.com>

63e7959c

openvswitch: Remove 5-tuple optimization. · 23dabf88

由 Jarno Rajahalme 提交于 3月 27, 2014

The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch.  Remove it first to make the changes easier to
grasp.
Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: NJesse Gross <jesse@nicira.com>

23dabf88

OpenHarmony / kernel_linux 上一次同步 4 年多

OpenHarmony / kernel_linux
上一次同步 4 年多