net, sched: fix soft lockup in tc_classify (628185cf) · 提交 · openanolis / cloud-kernel

提交 628185cf 编写于 12月 21, 2016 作者: D Daniel Borkmann 提交者： David S. Miller 12月 26, 2016

net, sched: fix soft lockup in tc_classify

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: NShahar Klein <shahark@mellanox.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Tested-by: NShahar Klein <shahark@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

上级 eb9def61

隐藏空白更改

内联并排

浏览文件 @ 628185cf

...	@@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff skb, struct nlmsghdr n)	...	@@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff skb, struct nlmsghdr n)
	unsigned long cl;		unsigned long cl;
	unsigned long fh;		unsigned long fh;
	int err;		int err;
	int tp_created = 0;		int tp_created;

	if ((n->nlmsg_type != RTM_GETTFILTER) &&		if ((n->nlmsg_type != RTM_GETTFILTER) &&
	!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))		!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
	return -EPERM;		return -EPERM;

	replay:		replay:
			tp_created = 0;

	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);		err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);
	if (err < 0)		if (err < 0)
	return err;		return err;
...		...

想要评论请注册或

openanolis / cloud-kernel 大约 2 年 前同步成功

net, sched: fix soft lockup in tc_classify

openanolis / cloud-kernel
大约 2 年前同步成功