Commits · 9c5d03d362519f36cd551aec596388f895c93d2d · openeuler / Kernel

29 Aug, 2022 1 commit

genetlink: start to validate reserved header bytes · 9c5d03d3

Committed by Jakub Kicinski on Aug 24, 2022

We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.

One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.

To make sure that new families do the right thing by default,
put the onus of opting out of validation on existing families.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>
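
A minimal userspace sketch of the check described in this commit, in C. The struct mirrors the three fields of the generic netlink header (cmd, version, reserved). `demo_family` and its `skip_reserved_check` flag are hypothetical names standing in for the opt-out mechanism, which the commit message does not spell out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same field layout as the generic netlink header. */
struct demo_genlmsghdr {
	uint8_t  cmd;
	uint8_t  version;
	uint16_t reserved;
};

/* Hypothetical per-family setting: existing families opt out. */
struct demo_family {
	bool skip_reserved_check;
};

/* Return 0 if the header is acceptable, -EINVAL otherwise. */
static int demo_validate_header(const struct demo_family *family,
				const struct demo_genlmsghdr *hdr)
{
	if (family->skip_reserved_check)
		return 0;
	return hdr->reserved == 0 ? 0 : -EINVAL;
}
```

With validation on by default, a new family rejects any request whose reserved bytes are non-zero, which keeps those bytes free for later use such as a wider cmd field.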

27 Aug, 2022 2 commits

openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests · 347541e2

Committed by Andrey Zhadchenko on Aug 25, 2022

CRIU needs OVS_DP_ATTR_PER_CPU_PIDS to checkpoint/restore the newest
openvswitch versions.
Add pids to the generic datapath reply. Limit the number of exported
pids to nr_cpu_ids.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

openvswitch: allow specifying ifindex of new interfaces · 54c4ef34

Committed by Andrey Zhadchenko on Aug 25, 2022

CRIU preserves the ifindexes of net devices after restoration. However,
the current Open vSwitch API does not allow specifying an ifindex, so we
cannot correctly restore the OVS configuration.

Add a new OVS_DP_ATTR_IFINDEX for OVS_DP_CMD_NEW and use it as the desired
ifindex.
Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify the new
netdev ifindex.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

22 Aug, 2022 2 commits

openvswitch: Fix overreporting of drops in dropwatch · c21ab2af

Committed by Mike Pattrick on Aug 17, 2022

Currently queue_userspace_packet will call kfree_skb for all frames,
whether or not an error occurred. This can result in a single dropped
frame being reported as multiple drops in dropwatch. This function's
caller may also call kfree_skb in case of an error. This patch will
consume the skbs instead and allow callers to use kfree_skb.

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109957
Signed-off-by: David S. Miller <davem@davemloft.net>
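
The difference between freeing a frame as a drop and freeing it normally can be illustrated with a small userspace sketch in C. It assumes a simplified model in which the callee works on a clone of the frame. `demo_skb`, `demo_kfree_skb`, `demo_consume_skb` and the `clone` parameter are hypothetical stand-ins, not the kernel functions themselves. Only `demo_kfree_skb` bumps the counter that a drop monitor would observe.

```c
#include <errno.h>

struct demo_skb {
	int freed;
};

/* Number of drop events a monitor such as dropwatch would observe. */
static unsigned int demo_drop_events;

/* Free a frame and report it as dropped. */
static void demo_kfree_skb(struct demo_skb *skb)
{
	skb->freed = 1;
	demo_drop_events++;
}

/* Free a frame without reporting a drop. */
static void demo_consume_skb(struct demo_skb *skb)
{
	skb->freed = 1;
}

/* Callee: always consumes its working copy and only returns the status. */
static int demo_queue_to_userspace(struct demo_skb *clone, int err)
{
	demo_consume_skb(clone);
	return err;
}

/* Caller: the single place that decides whether the frame was dropped. */
static int demo_upcall(struct demo_skb *skb, struct demo_skb *clone, int err)
{
	int ret = demo_queue_to_userspace(clone, err);

	if (ret)
		demo_kfree_skb(skb);
	else
		demo_consume_skb(skb);
	return ret;
}
```

In this model a failed upcall produces exactly one drop event, and a successful one produces none.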

openvswitch: Fix double reporting of drops in dropwatch · 1100248a

Committed by Mike Pattrick on Aug 17, 2022

Frames sent to userspace can be reported as dropped in
ovs_dp_process_packet. However, if they are dropped in the netlink code,
netlink_attachskb will report the same frame as dropped as well.

This patch checks for error codes which indicate that the frame has
already been freed.

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109946
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Feb, 2022 1 commit

net/sched: Enable tc skb ext allocation on chain miss only when needed · 35d39fec

Committed by Paul Blakey on Feb 03, 2022

Currently tc skb extension is used to send miss info from
tc to ovs datapath module, and driver to tc. For the tc to ovs
miss it is currently always allocated even if it will not
be used by ovs datapath (as it depends on a requested feature).

Export the static key which is used by openvswitch module to
guard this code path as well, so it will be skipped if ovs
datapath doesn't need it. Enable this code path once
ovs datapath needs it.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Jul, 2021 2 commits

openvswitch: fix sparse warning incorrect type · 076999e4

Committed by Mark Gray on Jul 23, 2021

Fix incorrect type in argument 1 (different address spaces):

../net/openvswitch/datapath.c:169:17: warning: incorrect type in argument 1 (different address spaces)
../net/openvswitch/datapath.c:169:17:    expected void const *
../net/openvswitch/datapath.c:169:17:    got struct dp_nlsk_pids [noderef] __rcu *upcall_portids

Found at: https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159

Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: fix alignment issues · 784dcfa5

Committed by Mark Gray on Jul 23, 2021

Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Jul, 2021 1 commit

openvswitch: Introduce per-cpu upcall dispatch · b83d23a2

Committed by Mark Gray on Jul 15, 2021

The Open vSwitch kernel module uses the upcall mechanism to send
packets from kernel space to user space when it misses in the kernel
space flow table. The upcall sends packets via a Netlink socket.
Currently, a Netlink socket is created for every vport. In this way,
there is a 1:1 mapping between a vport and a Netlink socket.
When a packet is received by a vport, if it needs to be sent to
user space, it is sent via the corresponding Netlink socket.

This mechanism, with various iterations of the corresponding user
space code, has seen some limitations and issues:

* On systems with a large number of vports, there is a correspondingly
large number of Netlink sockets which can limit scaling.
(https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
* Packet reordering on upcalls.
(https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
* A thundering herd issue.
(https://bugzilla.redhat.com/show_bug.cgi?id=1834444)

This patch introduces an alternative, feature-negotiated, upcall
mode using a per-cpu dispatch rather than a per-vport dispatch.

In this mode, the Netlink socket to be used for the upcall is
selected based on the CPU of the thread that is executing the upcall.
In this way, it resolves the issues above as:

a) The number of Netlink sockets scales with the number of CPUs
rather than the number of vports.
b) Ordering per-flow is maintained as packets are distributed to
CPUs based on mechanisms such as RSS and flows are distributed
to a single user space thread.
c) Packets from a flow can only wake up one user space thread.

The corresponding user space code can be found at:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html

Bugzilla: https://bugzilla.redhat.com/1844576
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
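
The per-CPU socket selection can be sketched as follows. This is an illustration under stated assumptions, not the datapath code: `demo_upcall_pids` is a hypothetical table of Netlink port IDs, and wrapping with a modulo when there are more CPUs than registered sockets is an assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical table of Netlink port IDs, one per handler socket. */
struct demo_upcall_pids {
	uint32_t n_pids;
	const uint32_t *pids;
};

/* Pick the socket for the CPU that is executing the upcall. */
static uint32_t demo_upcall_portid(const struct demo_upcall_pids *table,
				   uint32_t cpu_id)
{
	if (table->n_pids == 0)
		return 0;	/* no listener registered */
	return table->pids[cpu_id % table->n_pids];
}
```

Because the index depends only on the CPU, all packets steered to one CPU (for example by RSS) reach the same socket, which is what preserves per-flow ordering and wakes a single handler thread.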

23 Jun, 2021 1 commit

openvswitch: add trace points · c4ab7b56

Committed by Aaron Conole on Jun 22, 2021

This makes openvswitch module use the event tracing framework
to log the upcall interface and action execution pipeline.  When
using openvswitch as the packet forwarding engine, some types of
debugging are made possible simply by using the ovs-vswitchd's
ofproto/trace command.  However, such a command has some
limitations:

  1. When trying to trace packets that go through the CT action,
     the state of the packet can't be determined, and would
     potentially be wrong.

  2. Deducing problem packets can sometimes be difficult as well,
     even if many of the flows are known.

  3. It's possible to use the openvswitch module even without
     ovs-vswitchd (although this is not a common use).

Introduce the event tracing points here to make it possible to
work through these problems in kernel space.  The style is
copied from the mac80211 driver-trace / trace code for
consistency. This creates some checkpatch splats, but the
official 'guide' for adding tracepoints, as well as the existing
examples, all add the same splats, so it seems acceptable.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Nov, 2020 1 commit

net: openvswitch: silence suspicious RCU usage warning · fea07a48

Committed by Eelco Chaudron on Nov 03, 2020

Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
by replacing rcu_dereference() with rcu_dereference_ovsl().

In addition, when creating a new datapath, make sure it's configured under
the ovs_lock.

Fixes: 9bf24f59 ("net: openvswitch: make masks cache size configurable")
Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuild
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

03 Oct, 2020 1 commit

genetlink: move to smaller ops wherever possible · 66a9b928

Committed by Jakub Kicinski on Oct 02, 2020

The bulk of the genetlink users can use smaller ops; move them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Sep, 2020 2 commits

net: openvswitch: fixes crash if nf_conncount_init() fails · e0afe914

Committed by Eelco Chaudron on Aug 31, 2020

If nf_conncount_init() fails, the dispatched work is currently not
canceled, causing problems when the timer fires. This change fixes this
by not scheduling the work until all initialization is successful.

Fixes: a65878d6 ("net: openvswitch: fixes potential deadlock in dp cleanup code")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: openvswitch: improve the coding style · cf3266ad

Committed by Tonghao Zhang on Sep 01, 2020

No change to the logic; this only improves the coding style.

Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Aug, 2020 1 commit

net: openvswitch: introduce common code for flushing flows · 1f3a090b

Committed by Tonghao Zhang on Aug 12, 2020

To avoid some issues, for example RCU usage warning and double free,
we should flush the flows under ovs_lock. This patch refactors
table_instance_destroy and introduces table_instance_flow_flush
which can be invoked by __dp_destroy or ovs_flow_tbl_flush.

Fixes: 50b0e61b ("net: openvswitch: fix possible memleak on destroy flow-table")
Reported-by: Johan Knöös <jknoos@google.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Aug, 2020 2 commits

net: openvswitch: make masks cache size configurable · 9bf24f59

Committed by Eelco Chaudron on Jul 31, 2020

This patch makes the masks cache size configurable; a size of 0
disables the cache.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: openvswitch: add masks cache hit counter · 9d2f627b

Committed by Eelco Chaudron on Jul 31, 2020

Add a counter that counts the number of masks cache hits, and
export it through the megaflow netlink statistics.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Jul, 2020 1 commit

net: openvswitch: fixes potential deadlock in dp cleanup code · a65878d6

Committed by Eelco Chaudron on Jul 24, 2020

The previous patch introduced a deadlock; this patch fixes it by making
sure the work is canceled without holding the global ovs lock. This is
done by moving the reorder processing one layer up to the netns level.

Fixes: eac87c41 ("net: openvswitch: reorder masks array based on usage")
Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com
Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com
Reviewed-by: Paolo <pabeni@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Jul, 2020 1 commit

net: openvswitch: reorder masks array based on usage · eac87c41

Committed by Eelco Chaudron on Jul 15, 2020

This patch reorders the masks array every 4 seconds based on their
usage count. This greatly reduces the number of masks tried per packet
and hence improves overall performance, especially in the OVS/OVN case
for OpenShift.

Here are some results from the OVS/OVN OpenShift test, which use
8 pods, each pod having 512 uperf connections, each connection
sends a 64-byte request and gets a 1024-byte response (TCP).
All uperf clients are on 1 worker node while all uperf servers are
on the other worker node.

Kernel without this patch     :  7.71 Gbps
Kernel with this patch applied: 14.52 Gbps

We also ran some tests to verify that the rebalance activity does not
lower the flow insertion rate, which it does not.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Tested-by: Andrew Theurer <atheurer@redhat.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
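
A rough sketch of a usage-based reorder pass, written as plain userspace C. `demo_mask` is a hypothetical stand-in for a mask entry, and resetting the counters after each pass is an assumption of this sketch rather than something the commit message states. A periodic worker (every 4 seconds, per the description) would call `demo_rebalance_masks()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_mask {
	int      id;	/* stands in for the real mask contents */
	uint64_t hits;	/* packets matched since the last rebalance */
};

/* qsort comparator: higher hit count first. */
static int demo_cmp_hits_desc(const void *a, const void *b)
{
	const struct demo_mask *ma = a;
	const struct demo_mask *mb = b;

	if (ma->hits == mb->hits)
		return 0;
	return ma->hits > mb->hits ? -1 : 1;
}

/* Sort by usage, then reset counters so the next interval starts fresh. */
static void demo_rebalance_masks(struct demo_mask *masks, size_t n)
{
	qsort(masks, n, sizeof(*masks), demo_cmp_hits_desc);
	for (size_t i = 0; i < n; i++)
		masks[i].hits = 0;
}
```

Since a lookup walks the array from the front, moving the most-used masks first lowers the average number of masks tried per packet.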

21 Apr, 2020 1 commit

net: openvswitch: ovs_ct_exit to be done under ovs_lock · 27de77ce

Committed by Tonghao Zhang on Apr 17, 2020

syzbot wrote:
| =============================
| WARNING: suspicious RCU usage
| 5.7.0-rc1+ #45 Not tainted
| -----------------------------
| net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
|
| other info that might help us debug this:
| rcu_scheduler_active = 2, debug_locks = 1
| ...
|
| stack backtrace:
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| ...
| ovs_ct_exit
| ovs_exit_net
| ops_exit_list.isra.7
| cleanup_net
| process_one_work
| worker_thread

To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
lockdep_ovsl_is_held as optional lockdep expression.

Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Yi-Hung Wei <yihung.wei@gmail.com>
Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Mar, 2020 1 commit

net: Fix typo of SKB_SGO_CB_OFFSET · a08e7fd9

Committed by Cambda Zhu on Mar 26, 2020

The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
offset of the GSO in skb cb. This patch fixes the typo.

Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Mar, 2020 1 commit

openvswitch: add missing attribute validation for hash · b5ab1f1b

Committed by Jakub Kicinski on Mar 02, 2020

Add missing attribute validation for OVS_PACKET_ATTR_HASH
to the netlink policy.

Fixes: bd1903b7 ("net: openvswitch: add hash info to upcall")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Feb, 2020 1 commit

datapath.c: Use built-in RCU list checking · 53742e69

Committed by Madhuparna Bhowmik on Feb 19, 2020

hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 Jan, 2020 1 commit

net: openvswitch: use skb_list_walk_safe helper for gso segments · 2cec4448

Committed by Jason A. Donenfeld on Jan 13, 2020

This is a straightforward conversion case for the new function, keeping
the flow of the existing code as intact as possible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Dec, 2019 1 commit

treewide: Use sizeof_field() macro · c593642c

Committed by Pankaj Bharadiya on Dec 09, 2019

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using the following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
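
For readers unfamiliar with the helper, here is a small standalone illustration in C. The macro body below is how such a helper is commonly written; treat it as an assumption rather than a quotation of the kernel header. `struct demo_cb` is a hypothetical type made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of a struct member, without needing an instance of the struct. */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical control-block layout, for illustration only. */
struct demo_cb {
	uint8_t  flags;
	uint32_t mark;
	uint8_t  pad[16];
};

/* Typical use: a compile-time check on the size of one member. */
_Static_assert(sizeof_field(struct demo_cb, pad) == 16,
	       "pad must be 16 bytes");
```

The null pointer is never dereferenced, because `sizeof` only inspects the type of its operand.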

02 Dec, 2019 2 commits

openvswitch: remove another BUG_ON() · 8a574f86

Committed by Paolo Abeni on Dec 01, 2019

If we can't build the flow del notification, we can simply delete
the flow, no need to crash the kernel. Still keep a WARN_ON to
preserve debuggability.

Note: the BUG_ON() predates the Fixes tag, but this change
can be applied only after the mentioned commit.

v1 -> v2:
 - do not leak an skb on error

Fixes: aed06778 ("openvswitch: Minimize ovs_flow_cmd_del critical section.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() · 8ffeb03f

Committed by Paolo Abeni on Dec 01, 2019

All the callers of ovs_flow_cmd_build_info() already deal with
error return codes correctly, so we can handle the error condition
in a more graceful way. Still dump a warning to preserve
debuggability.

v1 -> v2:
 - clarify the commit message
 - clean the skb and report the error (DaveM)

Fixes: ccb1352e ("net: Add Open vSwitch kernel components.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Nov, 2019 1 commit

openvswitch: fix flow command message size · 4e81c0b3

Committed by Paolo Abeni on Nov 26, 2019

When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant
flow has no UFID, we can exceed the computed size, as
ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY
attribute.
Take the above into account when computing the flow command message
size.

Fixes: 74ed7ab9 ("openvswitch: Add support for unique flow IDs.")
Reported-by: Qi Jun Ding <qding@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Nov, 2019 1 commit

net: openvswitch: don't call pad_packet if not necessary · 61ca533c

Committed by Tonghao Zhang on Nov 14, 2019

nla_put_u16/nla_put_u32 make sure that
*attrlen is aligned. The call tree is:

nla_put_u16/nla_put_u32
  -> nla_put		attrlen = sizeof(u16) or sizeof(u32)
  -> __nla_put		attrlen
  -> __nla_reserve	attrlen
  -> skb_put(skb, nla_total_size(attrlen))

nla_total_size returns the total length of attribute
including padding.

Cc: Joe Stringer <joe@ovn.org>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
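
The padding arithmetic behind this commit can be checked with a small sketch. The `DEMO_` macros follow the usual netlink attribute layout (a 4-byte attribute header and 4-byte alignment); the names are prefixed to make clear they are local to this example and are not the kernel definitions.

```c
#include <stddef.h>

#define DEMO_NLA_ALIGNTO	((size_t)4)
#define DEMO_NLA_ALIGN(len)	(((len) + DEMO_NLA_ALIGNTO - 1) & ~(DEMO_NLA_ALIGNTO - 1))
#define DEMO_NLA_HDRLEN		DEMO_NLA_ALIGN((size_t)4)	/* u16 len + u16 type */

/* Header plus payload, without trailing padding. */
static size_t demo_nla_attr_size(size_t payload)
{
	return DEMO_NLA_HDRLEN + payload;
}

/* Header plus payload, rounded up to the attribute alignment. */
static size_t demo_nla_total_size(size_t payload)
{
	return DEMO_NLA_ALIGN(demo_nla_attr_size(payload));
}
```

Under these assumptions a u16 attribute occupies 8 bytes (4 header, 2 payload, 2 padding) and a u32 attribute also occupies 8 bytes, so the message is already aligned after either call and no extra padding step is needed.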

15 Nov, 2019 1 commit

net: openvswitch: add hash info to upcall · bd1903b7

Committed by Tonghao Zhang on Nov 13, 2019

When using the kernel datapath, the upcall does not
include the related skb hash info. That introduces
some problems, because the skb hash is important
in the kernel stack. For example, the VXLAN module uses
it to select the UDP src port. The tx queue selection
may also use the hash in the stack.

The hash is computed in different ways. It is random
for a TCP socket, and it may be computed in hardware
or in the software stack. Recalculating the hash is not easy.

Hash of TCP socket is computed:
tcp_v4_connect
    -> sk_set_txhash (is random)

__tcp_transmit_skb
    -> skb_set_hash_from_sk

There will be one upcall, without the skb hash
information, to ovs-vswitchd for the first packet of a TCP
session. The remaining packets will be processed in the Open vSwitch
module with the hash kept. If this TCP session is forwarded to the
VXLAN module, then the UDP src port of the first TCP packet
differs from that of the remaining packets.

TCP packets may come to Open vSwitch from the host or from Docker
containers. To fix this, we store the hash info in the upcall and
restore the hash when packets are sent back.

+---------------+          +-------------------------+
|   Docker/VMs  |          |     ovs-vswitchd        |
+----+----------+          +-+--------------------+--+
     |                       ^                    |
     |                       |                    |
     |                       |  upcall            v restore packet hash (not recalculate)
     |                     +-+--------------------+--+
     |  tap netdev         |                         |   vxlan module
     +--------------->     +-->  Open vSwitch ko     +-->
       or internal type    |                         |
                           +-------------------------+

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Nov, 2019 3 commits

net: openvswitch: simplify the ovs_dp_cmd_new · eec62ead

Committed by Tonghao Zhang on Nov 01, 2019

Use the specified functions to initialize resources.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: openvswitch: don't unlock mutex when changing the user_features fails · 4c76bf69

Committed by Tonghao Zhang on Nov 01, 2019

Unlocking a mutex that is not locked is not allowed.
Another kernel thread may be in the critical section while
we unlock it because setting user_features failed.

Fixes: 95a7233c ("net: openvswitch: Set OvS recirc_id from tc chain index")
Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: openvswitch: add flow-mask cache for performance · 04b7d136

Committed by Tonghao Zhang on Nov 01, 2019

The idea of this optimization comes from a patch that
was committed in the openvswitch community in 2014. The author
is Pravin B Shelar. In order to get high performance, I
implemented it again. Later patches will use it.

Pravin B Shelar says:
| On every packet OVS needs to lookup flow-table with every
| mask until it finds a match. The packet flow-key is first
| masked with mask in the list and then the masked key is
| looked up in flow-table. Therefore number of masks can
| affect packet processing performance.

Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
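
The masked lookup described in the quote, plus a small cache in front of it, can be sketched in userspace C as below. Everything here is a simplified assumption for illustration: keys are 32-bit integers, the flow table is a linear array, and the cache maps a slot derived from the skb hash to the index of the mask that matched last time. None of the `demo_` names come from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_SIZE 8	/* power of two, chosen for the sketch */

struct demo_flow {
	uint32_t masked_key;
	int      mask_idx;
	int      flow_id;
};

struct demo_table {
	const uint32_t *masks;
	size_t n_masks;
	const struct demo_flow *flows;
	size_t n_flows;
	int cache[DEMO_CACHE_SIZE];	/* hash slot -> mask index, -1 if empty */
	unsigned int mask_tries;	/* total number of masks applied */
};

/* Mask the key with one mask and search the table for the result. */
static int demo_lookup_with_mask(struct demo_table *t, uint32_t key, int m)
{
	uint32_t masked = key & t->masks[m];

	t->mask_tries++;
	for (size_t i = 0; i < t->n_flows; i++)
		if (t->flows[i].mask_idx == m &&
		    t->flows[i].masked_key == masked)
			return t->flows[i].flow_id;
	return -1;
}

/* Try the cached mask first, then fall back to walking every mask. */
static int demo_lookup(struct demo_table *t, uint32_t key, uint32_t skb_hash)
{
	int *slot = &t->cache[skb_hash & (DEMO_CACHE_SIZE - 1)];
	int id;

	if (*slot >= 0) {
		id = demo_lookup_with_mask(t, key, *slot);
		if (id >= 0)
			return id;	/* cache hit: a single mask tried */
	}
	for (size_t m = 0; m < t->n_masks; m++) {
		id = demo_lookup_with_mask(t, key, (int)m);
		if (id >= 0) {
			*slot = (int)m;	/* remember the winning mask */
			return id;
		}
	}
	return -1;
}
```

The caller initializes every `cache` entry to -1. After the first packet of a flow has paid for the full walk, later packets with the same hash try only the remembered mask.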

26 Oct, 2019 1 commit

netns: fix GFP flags in rtnl_net_notifyid() · d4e4fdf9

Committed by Guillaume Nault on Oct 23, 2019

In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
but there are a few paths calling rtnl_net_notifyid() from atomic
context or from RCU critical sections. The latter also precludes the use
of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
call is wrong too, as it uses GFP_KERNEL unconditionally.

Therefore, we need to pass the GFP flags as parameter and propagate it
through function calls until the proper flags can be determined.

In most cases, GFP_KERNEL is fine. The exceptions are:
  * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
    indirectly call rtnl_net_notifyid() from RCU critical section,

  * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
    parameter.

Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
by nlmsg_new(). The function is allowed to sleep, so better make the
flags consistent with the ones used in the following
ovs_vport_cmd_fill_info() call.

Found by code inspection.

Fixes: 9a963454 ("netns: notify netns id events")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Sep, 2019 1 commit

openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC · ea8564c8

Committed by Li RongQing on Sep 24, 2019

The userspace openvswitch patch "(dpif-linux: Implement the API
functions to allow multiple handler threads read upcall)"
changed its type from U32 to UNSPEC, but left the kernel
unchanged.

After kernel commit 6e237d09 "(netlink: Relax attr validation
for fixed length types)", this bug is exposed by the warning
below:

	[   57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.

Fixes: 5cd667b0 ("openvswitch: Allow each vport to have an array of 'port_id's")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Sep, 2019 1 commit

net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c

Committed by Paul Blakey on Sep 04, 2019

Offloaded OvS datapath rules are translated one to one to tc rules,
for example the following simplified OvS rule:

recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)

Will be translated to the following tc rule:

$ tc filter add dev dev1 ingress \
	    prio 1 chain 0 proto ip \
		flower tcp ct_state -trk \
		action ct pipe \
		action goto chain 2

Received packets will first travel through tc, and if they aren't stolen
by it, like in the above rule, they will continue to the OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the processing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.

To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Aug, 2019 1 commit

openvswitch: Print error when ovs_execute_actions() fails · aa733660

Committed by Yifeng Sun on Aug 04, 2019

Currently in function ovs_dp_process_packet(), return values of
ovs_execute_actions() are silently discarded. This patch prints out
a debug message when an error happens so as to provide helpful hints
for debugging.

Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Jul, 2019 1 commit

ovs: datapath: hide clang frame-overflow warnings · 26063790

Committed by Arnd Bergmann on Jul 22, 2019

Some functions in the datapath code are factored out so that each
one has a stack frame smaller than 1024 bytes with gcc. However,
when compiling with clang, the functions are inlined more aggressively
and combined again so we get

net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]

Marking both get_flow_actions() and ovs_nla_init_match_and_action()
as 'noinline_for_stack' gives us the same behavior that we see with
gcc, and no warning. Note that this does not mean we actually use
less stack, as the functions call each other, and we still get
three copies of the large 'struct sw_flow_key' type on the stack.

The comment tells us that this was previously considered safe,
presumably since the netlink parsing functions are called with
a known backchain that does not also use a lot of stack space.

Fixes: 9cc9a5cb ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Jul, 2019 1 commit

net: openvswitch: do not update max_headroom if new headroom is equal to old headroom · 6b660c41

Committed by Taehee Yoo on Jul 06, 2019

When a vport is deleted, the maximum headroom size may change.
If the vport which has the largest headroom is deleted,
a new max_headroom is set.
However, if the new headroom size is equal to the old headroom size,
the update routine is unnecessary.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
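
The "skip the update when nothing changed" idea can be sketched in a few lines of C. `demo_dp` and its `updates` counter are hypothetical and exist only so the effect is observable; the real update routine and its cost are not described in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_dp {
	unsigned int max_headroom;
	unsigned int updates;	/* how often the update routine ran */
};

/*
 * Recompute the maximum over the remaining vports and run the update
 * only when the value actually changes. Returns true if it ran.
 */
static bool demo_update_headroom(struct demo_dp *dp,
				 const unsigned int *headrooms, size_t n)
{
	unsigned int new_max = 0;

	for (size_t i = 0; i < n; i++)
		if (headrooms[i] > new_max)
			new_max = headrooms[i];

	if (new_max == dp->max_headroom)
		return false;	/* nothing to do */

	dp->max_headroom = new_max;
	dp->updates++;
	return true;
}
```

Deleting a vport whose headroom is matched by another remaining vport leaves the maximum unchanged, so the update is skipped.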

06 Jun, 2019 1 commit

net: openvswitch: drop unneeded likely() call around IS_ERR() · b90f5aa4

Committed by Enrico Weigelt on Jun 05, 2019

IS_ERR() already calls unlikely(), so this extra likely() call
around the !IS_ERR() is not needed.

Signed-off-by: Enrico Weigelt <info@metux.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
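
A userspace sketch of why the outer hint is redundant. The `demo_` helpers imitate the error-pointer convention (negative errno values encoded at the very top of the address range) and assume a GCC or Clang compiler for `__builtin_expect`; they are written for this example and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO		4095
#define demo_unlikely(x)	__builtin_expect(!!(x), 0)

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* The branch hint already lives inside the helper. */
static inline bool demo_is_err(const void *ptr)
{
	return demo_unlikely((uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO);
}
```

A caller can therefore write `if (!demo_is_err(p))` directly; wrapping that test in another likely-style hint adds nothing.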
