Commit 680aea08, authored by Amit Cohen, committed by Jakub Kicinski

net: ipv4: Emit notification when fib hardware flags are changed

After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel,
but not necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

It is also possible for a route already installed in hardware to change
its action and therefore its flags. For example, a host route that is
trapping packets can be "promoted" to perform decapsulation following
the installation of an IPinIP/VXLAN tunnel.

Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed. The aim is to provide an indication to user-space
(e.g., routing daemons) about the state of the route in hardware.

Introduce a sysctl that controls this behavior.

Keep the default value at 0 (i.e., do not emit notifications) for several
reasons:
- Multiple RTM_NEWROUTE notifications per route might confuse existing
  routing daemons.
- Convergence reasons in routing daemons.
- The extra notifications will negatively impact the insertion rate.
- Not all users are interested in these notifications.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
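
To illustrate how a routing daemon might consume the notifications described in the commit message, here is a minimal user-space sketch. It assumes the daemon has already read a message from a NETLINK_ROUTE socket subscribed to RTNLGRP_IPV4_ROUTE; the helper names route_hw_state() and ipv4_newroute_hw_state() are hypothetical and not part of this commit. The flag values in the fallback defines are the ones from linux/rtnetlink.h.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Fallbacks for older uapi headers; values match linux/rtnetlink.h. */
#ifndef RTM_F_OFFLOAD
#define RTM_F_OFFLOAD 0x4000
#endif
#ifndef RTM_F_TRAP
#define RTM_F_TRAP 0x8000
#endif

/*
 * Hypothetical helper: map rtm_flags to a short hardware-state label.
 * If both flags are set, "offload" is reported (an arbitrary choice here).
 */
static const char *route_hw_state(unsigned int rtm_flags)
{
	if (rtm_flags & RTM_F_OFFLOAD)
		return "offload";
	if (rtm_flags & RTM_F_TRAP)
		return "trap";
	return "not in hardware";
}

/*
 * Hypothetical helper: inspect one netlink message of at most len bytes.
 * Returns NULL for anything that is not a well-formed IPv4 RTM_NEWROUTE.
 */
static const char *ipv4_newroute_hw_state(const struct nlmsghdr *nlh, size_t len)
{
	const struct rtmsg *rtm;

	if (len < sizeof(*nlh) || nlh->nlmsg_len > len ||
	    nlh->nlmsg_len < (unsigned int)NLMSG_LENGTH(sizeof(*rtm)))
		return NULL;
	if (nlh->nlmsg_type != RTM_NEWROUTE)
		return NULL;

	rtm = (const struct rtmsg *)NLMSG_DATA(nlh);
	if (rtm->rtm_family != AF_INET)
		return NULL;
	return route_hw_state(rtm->rtm_flags);
}
```

A daemon that wants to delay advertising a route until it is in hardware could, under these assumptions, wait for a message whose label is "offload" before announcing the prefix.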
Parent 1e7bdec6
@@ -178,6 +178,26 @@ min_adv_mss - INTEGER
 	The advertised MSS depends on the first hop route MTU, but will
 	never be lower than this setting.

+fib_notify_on_flag_change - INTEGER
+	Whether to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/
+	RTM_F_TRAP flags are changed.
+
+	After installing a route to the kernel, user space receives an
+	acknowledgment, which means the route was installed in the kernel,
+	but not necessarily in hardware.
+	It is also possible for a route already installed in hardware to change
+	its action and therefore its flags. For example, a host route that is
+	trapping packets can be "promoted" to perform decapsulation following
+	the installation of an IPinIP/VXLAN tunnel.
+	The notifications will indicate to user-space the state of the route.
+
+	Default: 0 (Do not emit notifications.)
+
+	Possible values:
+
+	- 0 - Do not emit notifications.
+	- 1 - Emit notifications.
+
 IP Fragmentation:

 ipfrag_high_thresh - LONG INTEGER
......
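
As a rough illustration of how the documented sysctl could be toggled from a program, the sketch below writes 0 or 1 to a sysctl file. The path in FIB_NOTIFY_SYSCTL follows from the procname registered in ipv4_net_table; the helper set_fib_notify() is hypothetical, and writing to the real file normally requires root privileges.

```c
#include <errno.h>
#include <stdio.h>

/* Path derived from the "fib_notify_on_flag_change" procname under net.ipv4. */
#define FIB_NOTIFY_SYSCTL "/proc/sys/net/ipv4/fib_notify_on_flag_change"

/*
 * Hypothetical helper: write 0 or 1 to the given sysctl file.
 * Returns 0 on success or a negative errno value.
 */
static int set_fib_notify(const char *path, int enable)
{
	FILE *f;
	int err = 0;

	/* Mirror the documented range locally before touching the file. */
	if (enable != 0 && enable != 1)
		return -EINVAL;

	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%d\n", enable) < 0)
		err = -EIO;
	if (fclose(f) != 0 && !err)
		err = -errno;
	return err;
}
```

A caller would pass FIB_NOTIFY_SYSCTL as the path, for example set_fib_notify(FIB_NOTIFY_SYSCTL, 1) to opt in to the notifications.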
@@ -188,6 +188,8 @@ struct netns_ipv4 {
 	int sysctl_udp_wmem_min;
 	int sysctl_udp_rmem_min;

+	int sysctl_fib_notify_on_flag_change;
+
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	int sysctl_udp_l3mdev_accept;
 #endif
......
@@ -1871,6 +1871,8 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.sysctl_igmp_llm_reports = 1;
 	net->ipv4.sysctl_igmp_qrv = 2;

+	net->ipv4.sysctl_fib_notify_on_flag_change = 0;
+
 	return 0;
 }
......
@@ -1038,6 +1038,8 @@ fib_find_matching_alias(struct net *net, const struct fib_rt_info *fri)
 void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri)
 {
 	struct fib_alias *fa_match;
+	struct sk_buff *skb;
+	int err;

 	rcu_read_lock();
@@ -1045,9 +1047,34 @@ void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri)
 	if (!fa_match)
 		goto out;

+	if (fa_match->offload == fri->offload && fa_match->trap == fri->trap)
+		goto out;
+
 	fa_match->offload = fri->offload;
 	fa_match->trap = fri->trap;

+	if (!net->ipv4.sysctl_fib_notify_on_flag_change)
+		goto out;
+
+	skb = nlmsg_new(fib_nlmsg_size(fa_match->fa_info), GFP_ATOMIC);
+	if (!skb) {
+		err = -ENOBUFS;
+		goto errout;
+	}
+
+	err = fib_dump_info(skb, 0, 0, RTM_NEWROUTE, fri, 0);
+	if (err < 0) {
+		/* -EMSGSIZE implies BUG in fib_nlmsg_size() */
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(skb);
+		goto errout;
+	}
+
+	rtnl_notify(skb, net, 0, RTNLGRP_IPV4_ROUTE, NULL, GFP_ATOMIC);
+	goto out;
+
+errout:
+	rtnl_set_sk_err(net, RTNLGRP_IPV4_ROUTE, err);
 out:
 	rcu_read_unlock();
 }
......
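
The control flow that this hunk adds to fib_alias_hw_flags_set() can be summarized with a small user-space model. This is only a sketch of the decision logic: struct hw_flags, enum flag_update_result, and model_hw_flags_set() are simplified, hypothetical stand-ins, and the allocation and error paths of the real function are left out.

```c
#include <stdbool.h>

/* Simplified stand-in for the offload/trap bits; not the kernel definition. */
struct hw_flags {
	bool offload;
	bool trap;
};

enum flag_update_result {
	FLAGS_UNCHANGED,	/* early exit: nothing stored, no message */
	FLAGS_UPDATED_SILENT,	/* flags stored, sysctl is 0, no message */
	FLAGS_UPDATED_NOTIFY,	/* flags stored, RTM_NEWROUTE would be emitted */
};

/* Hypothetical model of the three outcomes in the patched function. */
static enum flag_update_result
model_hw_flags_set(struct hw_flags *cur, struct hw_flags new_flags,
		   int notify_sysctl)
{
	/* Same flags as before: skip the update and the notification. */
	if (cur->offload == new_flags.offload && cur->trap == new_flags.trap)
		return FLAGS_UNCHANGED;

	/* The flags are always stored, regardless of the sysctl. */
	*cur = new_flags;

	if (!notify_sysctl)
		return FLAGS_UPDATED_SILENT;
	return FLAGS_UPDATED_NOTIFY;
}
```

The early comparison matters for the notification path: without it, a driver that re-reports identical flags would trigger a redundant RTM_NEWROUTE message each time the sysctl is enabled.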
@@ -1354,6 +1354,15 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ONE
 	},
+	{
+		.procname	= "fib_notify_on_flag_change",
+		.data		= &init_net.ipv4.sysctl_fib_notify_on_flag_change,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 	{ }
 };
......
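
The table entry above bounds the sysctl to the range given by extra1 (SYSCTL_ZERO) and extra2 (SYSCTL_ONE). The following user-space sketch models what that bound means for a write, on the assumption that an out-of-range value is rejected with -EINVAL and the stored value is left unchanged. model_bounded_write() is a hypothetical illustration, not the kernel's proc_dointvec_minmax implementation.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical model of a bounded integer sysctl write: parse the input,
 * reject values outside [min, max], and only then store the result.
 */
static int model_bounded_write(int *value, const char *input, int min, int max)
{
	char *end;
	long parsed;

	errno = 0;
	parsed = strtol(input, &end, 10);
	if (errno || end == input)
		return -EINVAL;		/* not a number */
	if (parsed < min || parsed > max)
		return -EINVAL;		/* out of range: *value is untouched */

	*value = (int)parsed;
	return 0;
}
```

With min 0 and max 1, writing "1" succeeds, while "2" or "-1" fails and leaves the previous setting in place.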