提交 · 0023f0c43232d66073f80349747f32ff20b198df · openeuler / Kernel

21 8月, 2023 1 次提交

net: openvswitch: fix flow memory leak in ovs_flow_cmd_new · 0023f0c4

由 Fedor Pchelkin 提交于 2月 02, 2023

stable inclusion
from stable-v5.10.168
commit 70154489f531587996f3e9d7cceeee65cff0001d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7URR4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=70154489f531587996f3e9d7cceeee65cff0001d

----------------------------------------------------

[ Upstream commit 0c598aed ]

Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is
not freed when an allocation of a key fails.

BUG: memory leak
unreferenced object 0xffff888116668000 (size 632):
  comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline]
    [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77
    [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957
    [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739
    [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
    [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800
    [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515
    [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
    [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
    [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339
    [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934
    [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline]
    [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671
    [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356
    [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410
    [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
    [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
    [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

To fix this the patch rearranges the goto labels to reflect the order of
object allocations and adds appropriate goto statements on the error
paths.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 68bb1010 ("openvswitch: Fix flow lookup to use unmasked key")
Signed-off-by: NFedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: NEelco Chaudron <echaudro@redhat.com>
Reviewed-by: NSimon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ruSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: Nzhaoxiaoqiang11 <zhaoxiaoqiang11@jd.com>

0023f0c4

14 8月, 2023 1 次提交

openvswitch: Fix flow lookup to use unmasked key · 1d732b60

由 Eelco Chaudron 提交于 12月 15, 2022

stable inclusion
from stable-v5.10.163
commit 6736b61ecf230dd656464de0f514bdeadb384f20
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7PJ9N

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6736b61ecf230dd656464de0f514bdeadb384f20

----------------------------------------------------

[ Upstream commit 68bb1010 ]

The commit mentioned below causes the ovs_flow_tbl_lookup() function
to be called with the masked key. However, it's supposed to be called
with the unmasked key. This due to the fact that the datapath supports
installing wider flows, and OVS relies on this behavior. For example
if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
128.0.0.0) is allowed to be added.

However, if we try to add a wildcard rule, the installation fails:

$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
  ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
  ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
ovs-vswitchd: updating flow table (File exists)

The reason is that the key used to determine if the flow is already
present in the system uses the original key ANDed with the mask.
This results in the IP address not being part of the (miniflow) key,
i.e., being substituted with an all-zero value. When doing the actual
lookup, this results in the key wrongfully matching the first flow,
and therefore the flow does not get installed.

This change reverses the commit below, but rather than having the key
on the stack, it's allocated.

Fixes: 190aa3e7 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: Nzhaoxiaoqiang11 <zhaoxiaoqiang11@jd.com>

1d732b60

23 5月, 2023 1 次提交

openvswitch: switch from WARN to pr_warn · 7d5a643f

由 Aaron Conole 提交于 10月 25, 2022

stable inclusion
from stable-v5.10.153
commit 79631daa5a5166c6aa4e25f447e8c08ee030637a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I64YCA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=79631daa5a5166c6aa4e25f447e8c08ee030637a

--------------------------------

[ Upstream commit fd954cc1 ]

As noted by Paolo Abeni, pr_warn doesn't generate any splat and can still
preserve the warning to the user that feature downgrade occurred.  We
likely cannot introduce other kinds of checks / enforcement here because
syzbot can generate different genl versions to the datapath.

Reported-by: syzbot+31cde0bef4bbf8ba2d86@syzkaller.appspotmail.com
Fixes: 44da5ae5 ("openvswitch: Drop user features if old user space attempted to create datapath")
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: NAaron Conole <aconole@redhat.com>
Acked-by: NIlya Maximets <i.maximets@ovn.org>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NLipeng Sang <sanglipeng1@jd.com>

7d5a643f

19 4月, 2023 2 次提交

openvswitch: Fix overreporting of drops in dropwatch · 8d0630a9

由 Mike Pattrick 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit 129ca0db956e1e2483f55d2217756f9258f52737
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=129ca0db956e1e2483f55d2217756f9258f52737

--------------------------------

[ Upstream commit c21ab2af ]

Currently queue_userspace_packet will call kfree_skb for all frames,
whether or not an error occurred. This can result in a single dropped
frame being reported as multiple drops in dropwatch. This functions
caller may also call kfree_skb in case of an error. This patch will
consume the skbs instead and allow caller's to use kfree_skb.
Signed-off-by: NMike Pattrick <mkp@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109957Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

8d0630a9

openvswitch: Fix double reporting of drops in dropwatch · c0a2b5c3

由 Mike Pattrick 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit 4398e8a7fd6abf9cdfc69ee2a75434bf47c2a210
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4398e8a7fd6abf9cdfc69ee2a75434bf47c2a210

--------------------------------

[ Upstream commit 1100248a ]

Frames sent to userspace can be reported as dropped in
ovs_dp_process_packet, however, if they are dropped in the netlink code
then netlink_attachskb will report the same frame as dropped.

This patch checks for error codes which indicate that the frame has
already been freed.
Signed-off-by: NMike Pattrick <mkp@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109946Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

c0a2b5c3

04 11月, 2020 1 次提交

net: openvswitch: silence suspicious RCU usage warning · fea07a48

由 Eelco Chaudron 提交于 11月 03, 2020

Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
by replacing rcu_dereference() with rcu_dereference_ovsl().

In addition, when creating a new datapath, make sure it's configured under
the ovs_lock.

Fixes: 9bf24f59 ("net: openvswitch: make masks cache size configurable")
Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuildSigned-off-by: NJakub Kicinski <kuba@kernel.org>

fea07a48

03 10月, 2020 1 次提交

genetlink: move to smaller ops wherever possible · 66a9b928

由 Jakub Kicinski 提交于 10月 02, 2020

Bulk of the genetlink users can use smaller ops, move them.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66a9b928

02 9月, 2020 2 次提交

net: openvswitch: fixes crash if nf_conncount_init() fails · e0afe914

由 Eelco Chaudron 提交于 8月 31, 2020

If nf_conncount_init fails currently the dispatched work is not canceled,
causing problems when the timer fires. This change fixes this by not
scheduling the work until all initialization is successful.

Fixes: a65878d6 ("net: openvswitch: fixes potential deadlock in dp cleanup code")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Reviewed-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0afe914

net: openvswitch: improve the coding style · cf3266ad

由 Tonghao Zhang 提交于 9月 01, 2020

Not change the logic, just improve the coding style.

Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf3266ad

14 8月, 2020 1 次提交

net: openvswitch: introduce common code for flushing flows · 1f3a090b

由 Tonghao Zhang 提交于 8月 12, 2020

To avoid some issues, for example RCU usage warning and double free,
we should flush the flows under ovs_lock. This patch refactors
table_instance_destroy and introduces table_instance_flow_flush
which can be invoked by __dp_destroy or ovs_flow_tbl_flush.

Fixes: 50b0e61b ("net: openvswitch: fix possible memleak on destroy flow-table")
Reported-by: NJohan Knöös <jknoos@google.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.htmlSigned-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f3a090b

04 8月, 2020 2 次提交

net: openvswitch: make masks cache size configurable · 9bf24f59

由 Eelco Chaudron 提交于 7月 31, 2020

This patch makes the masks cache size configurable, or with
a size of 0, disable it.
Reviewed-by: NPaolo Abeni <pabeni@redhat.com>
Reviewed-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bf24f59

net: openvswitch: add masks cache hit counter · 9d2f627b

由 Eelco Chaudron 提交于 7月 31, 2020

Add a counter that counts the number of masks cache hits, and
export it through the megaflow netlink statistics.
Reviewed-by: NPaolo Abeni <pabeni@redhat.com>
Reviewed-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d2f627b

25 7月, 2020 1 次提交

net: openvswitch: fixes potential deadlock in dp cleanup code · a65878d6

由 Eelco Chaudron 提交于 7月 24, 2020

The previous patch introduced a deadlock, this patch fixes it by making
sure the work is canceled without holding the global ovs lock. This is
done by moving the reorder processing one layer up to the netns level.

Fixes: eac87c41 ("net: openvswitch: reorder masks array based on usage")
Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com
Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com
Reviewed-by: NPaolo <pabeni@redhat.com>
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a65878d6

18 7月, 2020 1 次提交

net: openvswitch: reorder masks array based on usage · eac87c41

由 Eelco Chaudron 提交于 7月 15, 2020

This patch reorders the masks array every 4 seconds based on their
usage count. This greatly reduces the masks per packet hit, and
hence the overall performance. Especially in the OVS/OVN case for
OpenShift.

Here are some results from the OVS/OVN OpenShift test, which use
8 pods, each pod having 512 uperf connections, each connection
sends a 64-byte request and gets a 1024-byte response (TCP).
All uperf clients are on 1 worker node while all uperf servers are
on the other worker node.

Kernel without this patch     :  7.71 Gbps
Kernel with this patch applied: 14.52 Gbps

We also run some tests to verify the rebalance activity does not
lower the flow insertion rate, which does not.
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Tested-by: NAndrew Theurer <atheurer@redhat.com>
Reviewed-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eac87c41

21 4月, 2020 1 次提交

net: openvswitch: ovs_ct_exit to be done under ovs_lock · 27de77ce

由 Tonghao Zhang 提交于 4月 17, 2020

syzbot wrote:
| =============================
| WARNING: suspicious RCU usage
| 5.7.0-rc1+ #45 Not tainted
| -----------------------------
| net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
|
| other info that might help us debug this:
| rcu_scheduler_active = 2, debug_locks = 1
| ...
|
| stack backtrace:
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| ...
| ovs_ct_exit
| ovs_exit_net
| ops_exit_list.isra.7
| cleanup_net
| process_one_work
| worker_thread

To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
lockdep_ovsl_is_held as optional lockdep expression.

Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Yi-Hung Wei <yihung.wei@gmail.com>
Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

27de77ce

30 3月, 2020 1 次提交

net: Fix typo of SKB_SGO_CB_OFFSET · a08e7fd9

由 Cambda Zhu 提交于 3月 26, 2020

The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
offset of the GSO in skb cb. This patch fixes the typo.

Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
Signed-off-by: NCambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a08e7fd9

04 3月, 2020 1 次提交

openvswitch: add missing attribute validation for hash · b5ab1f1b

由 Jakub Kicinski 提交于 3月 02, 2020

Add missing attribute validation for OVS_PACKET_ATTR_HASH
to the netlink policy.

Fixes: bd1903b7 ("net: openvswitch: add hash info to upcall")
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NGreg Rose <gvrose8192@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5ab1f1b

19 2月, 2020 1 次提交

datapath.c: Use built-in RCU list checking · 53742e69

由 Madhuparna Bhowmik 提交于 2月 19, 2020

hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.
Signed-off-by: NMadhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

53742e69

15 1月, 2020 1 次提交

net: openvswitch: use skb_list_walk_safe helper for gso segments · 2cec4448

由 Jason A. Donenfeld 提交于 1月 13, 2020

This is a straight-forward conversion case for the new function, keeping
the flow of the existing code as intact as possible.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2cec4448

10 12月, 2019 1 次提交

treewide: Use sizeof_field() macro · c593642c

由 Pankaj Bharadiya 提交于 12月 09, 2019

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
Signed-off-by: NPankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.comCo-developed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net

c593642c

02 12月, 2019 2 次提交

openvswitch: remove another BUG_ON() · 8a574f86

由 Paolo Abeni 提交于 12月 01, 2019

If we can't build the flow del notification, we can simply delete
the flow, no need to crash the kernel. Still keep a WARN_ON to
preserve debuggability.

Note: the BUG_ON() predates the Fixes tag, but this change
can be applied only after the mentioned commit.

v1 -> v2:
 - do not leak an skb on error

Fixes: aed06778 ("openvswitch: Minimize ovs_flow_cmd_del critical section.")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a574f86

openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() · 8ffeb03f

由 Paolo Abeni 提交于 12月 01, 2019

All the callers of ovs_flow_cmd_build_info() already deal with
error return code correctly, so we can handle the error condition
in a more gracefull way. Still dump a warning to preserve
debuggability.

v1 -> v2:
 - clarify the commit message
 - clean the skb and report the error (DaveM)

Fixes: ccb1352e ("net: Add Open vSwitch kernel components.")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ffeb03f

27 11月, 2019 1 次提交

openvswitch: fix flow command message size · 4e81c0b3

由 Paolo Abeni 提交于 11月 26, 2019

When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant
flow has no UFID, we can exceed the computed size, as
ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY
attribute.
Take the above in account when computing the flow command message
size.

Fixes: 74ed7ab9 ("openvswitch: Add support for unique flow IDs.")
Reported-by: NQi Jun Ding <qding@redhat.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e81c0b3

16 11月, 2019 1 次提交

net: openvswitch: don't call pad_packet if not necessary · 61ca533c

由 Tonghao Zhang 提交于 11月 14, 2019

The nla_put_u16/nla_put_u32 makes sure that
*attrlen is align. The call tree is that:

nla_put_u16/nla_put_u32
  -> nla_put		attrlen = sizeof(u16) or sizeof(u32)
  -> __nla_put		attrlen
  -> __nla_reserve	attrlen
  -> skb_put(skb, nla_total_size(attrlen))

nla_total_size returns the total length of attribute
including padding.

Cc: Joe Stringer <joe@ovn.org>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61ca533c

15 11月, 2019 1 次提交

net: openvswitch: add hash info to upcall · bd1903b7

由 Tonghao Zhang 提交于 11月 13, 2019

When using the kernel datapath, the upcall don't
include skb hash info relatived. That will introduce
some problem, because the hash of skb is important
in kernel stack. For example, VXLAN module uses
it to select UDP src port. The tx queue selection
may also use the hash in stack.

Hash is computed in different ways. Hash is random
for a TCP socket, and hash may be computed in hardware,
or software stack. Recalculation hash is not easy.

Hash of TCP socket is computed:
tcp_v4_connect
    -> sk_set_txhash (is random)

__tcp_transmit_skb
    -> skb_set_hash_from_sk

There will be one upcall, without information of skb
hash, to ovs-vswitchd, for the first packet of a TCP
session. The rest packets will be processed in Open vSwitch
modules, hash kept. If this tcp session is forward to
VXLAN module, then the UDP src port of first tcp packet
is different from rest packets.

TCP packets may come from the host or dockers, to Open vSwitch.
To fix it, we store the hash info to upcall, and restore hash
when packets sent back.

+---------------+          +-------------------------+
|   Docker/VMs  |          |     ovs-vswitchd        |
+----+----------+          +-+--------------------+--+
     |                       ^                    |
     |                       |                    |
     |                       |  upcall            v restore packet hash (not recalculate)
     |                     +-+--------------------+--+
     |  tap netdev         |                         |   vxlan module
     +--------------->     +-->  Open vSwitch ko     +-->
       or internal type    |                         |
                           +-------------------------+

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.htmlSigned-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd1903b7

04 11月, 2019 3 次提交

net: openvswitch: simplify the ovs_dp_cmd_new · eec62ead

由 Tonghao Zhang 提交于 11月 01, 2019

use the specified functions to init resource.
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: NGreg Rose <gvrose8192@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eec62ead

net: openvswitch: don't unlock mutex when changing the user_features fails · 4c76bf69

由 Tonghao Zhang 提交于 11月 01, 2019

Unlocking of a not locked mutex is not allowed.
Other kernel thread may be in critical section while
we unlock it because of setting user_feature fail.

Fixes: 95a7233c ("net: openvswitch: Set OvS recirc_id from tc chain index")
Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: NGreg Rose <gvrose8192@gmail.com>
Acked-by: NWilliam Tu <u9012063@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c76bf69

net: openvswitch: add flow-mask cache for performance · 04b7d136

由 Tonghao Zhang 提交于 11月 01, 2019

The idea of this optimization comes from a patch which
is committed in 2014, openvswitch community. The author
is Pravin B Shelar. In order to get high performance, I
implement it again. Later patches will use it.

Pravin B Shelar, says:
| On every packet OVS needs to lookup flow-table with every
| mask until it finds a match. The packet flow-key is first
| masked with mask in the list and then the masked key is
| looked up in flow-table. Therefore number of masks can
| affect packet processing performance.

Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: NGreg Rose <gvrose8192@gmail.com>
Acked-by: NWilliam Tu <u9012063@gmail.com>
Signed-off-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04b7d136

26 10月, 2019 1 次提交

netns: fix GFP flags in rtnl_net_notifyid() · d4e4fdf9

由 Guillaume Nault 提交于 10月 23, 2019

In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
but there are a few paths calling rtnl_net_notifyid() from atomic
context or from RCU critical sections. The later also precludes the use
of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
call is wrong too, as it uses GFP_KERNEL unconditionally.

Therefore, we need to pass the GFP flags as parameter and propagate it
through function calls until the proper flags can be determined.

In most cases, GFP_KERNEL is fine. The exceptions are:
  * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
    indirectly call rtnl_net_notifyid() from RCU critical section,

  * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
    parameter.

Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
by nlmsg_new(). The function is allowed to sleep, so better make the
flags consistent with the ones used in the following
ovs_vport_cmd_fill_info() call.

Found by code inspection.

Fixes: 9a963454 ("netns: notify netns id events")
Signed-off-by: NGuillaume Nault <gnault@redhat.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4e4fdf9

26 9月, 2019 1 次提交

openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC · ea8564c8

由 Li RongQing 提交于 9月 24, 2019

userspace openvswitch patch "(dpif-linux: Implement the API
functions to allow multiple handler threads read upcall)"
changes its type from U32 to UNSPEC, but leave the kernel
unchanged

and after kernel 6e237d09 "(netlink: Relax attr validation
for fixed length types)", this bug is exposed by the below
warning

	[   57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.

Fixes: 5cd667b0 ("openvswitch: Allow each vport to have an array of 'port_id's")
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea8564c8

06 9月, 2019 1 次提交

net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c

由 Paul Blakey 提交于 9月 04, 2019

Offloaded OvS datapath rules are translated one to one to tc rules,
for example the following simplified OvS rule:

recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)

Will be translated to the following tc rule:

$ tc filter add dev dev1 ingress \
	    prio 1 chain 0 proto ip \
		flower tcp ct_state -trk \
		action ct pipe \
		action goto chain 2

Received packets will first travel though tc, and if they aren't stolen
by it, like in the above rule, they will continue to OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the proccessing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.

To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.
Signed-off-by: NPaul Blakey <paulb@mellanox.com>
Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95a7233c

07 8月, 2019 1 次提交

openvswitch: Print error when ovs_execute_actions() fails · aa733660

由 Yifeng Sun 提交于 8月 04, 2019

Currently in function ovs_dp_process_packet(), return values of
ovs_execute_actions() are silently discarded. This patch prints out
an debug message when error happens so as to provide helpful hints
for debugging.
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa733660

25 7月, 2019 1 次提交

ovs: datapath: hide clang frame-overflow warnings · 26063790

由 Arnd Bergmann 提交于 7月 22, 2019

Some functions in the datapath code are factored out so that each
one has a stack frame smaller than 1024 bytes with gcc. However,
when compiling with clang, the functions are inlined more aggressively
and combined again so we get

net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]

Marking both get_flow_actions() and ovs_nla_init_match_and_action()
as 'noinline_for_stack' gives us the same behavior that we see with
gcc, and no warning. Note that this does not mean we actually use
less stack, as the functions call each other, and we still get
three copies of the large 'struct sw_flow_key' type on the stack.

The comment tells us that this was previously considered safe,
presumably since the netlink parsing functions are called with
a known backchain that does not also use a lot of stack space.

Fixes: 9cc9a5cb ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26063790

13 7月, 2019 1 次提交

net: openvswitch: do not update max_headroom if new headroom is equal to old headroom · 6b660c41

由 Taehee Yoo 提交于 7月 06, 2019

When a vport is deleted, the maximum headroom size would be changed.
If the vport which has the largest headroom is deleted,
the new max_headroom would be set.
But, if the new headroom size is equal to the old headroom size,
updating routine is unnecessary.
Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
Tested-by: NGreg Rose <gvrose8192@gmail.com>
Reviewed-by: NGreg Rose <gvrose8192@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b660c41

06 6月, 2019 1 次提交

net: openvswitch: drop unneeded likely() call around IS_ERR() · b90f5aa4

由 Enrico Weigelt 提交于 6月 05, 2019

IS_ERR() already calls unlikely(), so this extra likely() call
around the !IS_ERR() is not needed.
Signed-off-by: NEnrico Weigelt <info@metux.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b90f5aa4

05 6月, 2019 1 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 269 · c9422999

由 Thomas Gleixner 提交于 5月 29, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of version 2 of the gnu general public license as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not write to the free
  software foundation inc 51 franklin street fifth floor boston ma
  02110 1301 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 21 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAlexios Zavras <alexios.zavras@intel.com>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Reviewed-by: NRichard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141334.228102212@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c9422999

04 5月, 2019 1 次提交

net: openvswitch: return an error instead of doing BUG_ON() · a734d1f4

由 Eelco Chaudron 提交于 5月 02, 2019

For all other error cases in queue_userspace_packet() the error is
returned, so it makes sense to do the same for these two error cases.
Reported-by: NDavide Caratti <dcaratti@redhat.com>
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Acked-by: NFlavio Leitner <fbl@sysclose.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a734d1f4

28 4月, 2019 3 次提交

genetlink: optionally validate strictly/dumps · ef6243ac

由 Johannes Berg 提交于 4月 26, 2019

Add options to strictly validate messages and dump messages,
sometimes perhaps validating dump messages non-strictly may
be required, so add an option for that as well.

Since none of this can really be applied to existing commands,
set the options everwhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ...,
     {
            .cmd = X,
    +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
            ...
     },
    ...
    };

For new commands one should just not copy the .validate 'opt-out'
flags and thus get strict validation.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef6243ac

netlink: make validation more configurable for future strictness · 8cb08174

由 Johannes Berg 提交于 4月 26, 2019

We currently have two levels of strict validation:

 1) liberal (default)
     - undefined (type >= max) & NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted
     - garbage at end of message accepted
 2) strict (opt-in)
     - NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted

Split out parsing strictness into four different options:
 * TRAILING     - check that there's no trailing data after parsing
                  attributes (in message or nested)
 * MAXTYPE      - reject attrs > max known type
 * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
 * STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
 * nla_parse           -> nla_parse_deprecated
 * nla_parse_strict    -> nla_parse_deprecated_strict
 * nlmsg_parse         -> nlmsg_parse_deprecated
 * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
 * nla_parse_nested    -> nla_parse_nested_deprecated
 * nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8cb08174

netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de

由 Michal Kubecek 提交于 4月 26, 2019

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae0be8de

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功