提交 · 12d94a804946af291e24b80fc53ec86264765781 · openeuler / raspberrypi-kernel

16 8月, 2017 1 次提交

ipv6: fix NULL dereference in ip6_route_dev_notify() · 12d94a80

由 Eric Dumazet 提交于 8月 15, 2017

Based on a syzkaller report [1], I found that a per cpu allocation
failure in snmp6_alloc_dev() would then lead to NULL dereference in
ip6_route_dev_notify().

It seems this is a very old bug, thus no Fixes tag in this submission.

Let's add in6_dev_put_clear() helper, as we will probably use
it elsewhere (once available/present in net-next)

[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff88019f456680 task.stack: ffff8801c6e58000
RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline]
RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178
RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202
RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000
RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001
RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37
R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8
FS:  00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0
DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
 refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
 in6_dev_put include/net/addrconf.h:335 [inline]
 ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732
 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678
 call_netdevice_notifiers net/core/dev.c:1694 [inline]
 rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107
 rollback_registered+0x1be/0x3c0 net/core/dev.c:7149
 register_netdevice+0xbcd/0xee0 net/core/dev.c:7587
 register_netdev+0x1a/0x30 net/core/dev.c:7669
 loopback_net_init+0x76/0x160 drivers/net/loopback.c:214
 ops_init+0x10a/0x570 net/core/net_namespace.c:118
 setup_net+0x313/0x710 net/core/net_namespace.c:294
 copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418
 create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
 SYSC_unshare kernel/fork.c:2347 [inline]
 SyS_unshare+0x653/0xfa0 kernel/fork.c:2297
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512c9
RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d
R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd
Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3
f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6
0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP:
ffff8801c6e5f1b0
RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP:
ffff8801c6e5f1b0
RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP:
ffff8801c6e5f1b0
---[ end trace e441d046c6410d31 ]---
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12d94a80

15 8月, 2017 1 次提交

ipv6: release rt6->rt6i_idev properly during ifdown · e5645f51

由 Wei Wang 提交于 8月 14, 2017

When a dst is created by addrconf_dst_alloc() for a host route or an
anycast route, dst->dev points to loopback dev while rt6->rt6i_idev
points to a real device.
When the real device goes down, the current cleanup code only checks for
dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the
refcount leak on the real device in the above situation.
This patch makes sure to always release the refcount taken on
rt6->rt6i_idev during dst_dev_put().

Fixes: 587fea74 ("ipv6: mark DST_NOGC and remove the operation of
dst_free()")
Reported-by: NJohn Stultz <john.stultz@linaro.org>
Tested-by: NJohn Stultz <john.stultz@linaro.org>
Tested-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NWei Wang <weiwan@google.com>
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5645f51

04 8月, 2017 1 次提交

ipv6: set rt6i_protocol properly in the route when it is installed · b91d5329

由 Xin Long 提交于 8月 03, 2017

After commit c2ed1880 ("net: ipv6: check route protocol when
deleting routes"), ipv6 route checks rt protocol when trying to
remove a rt entry.

It introduced a side effect causing 'ip -6 route flush cache' not
to work well. When flushing caches with iproute, all route caches
get dumped from kernel then removed one by one by sending DELROUTE
requests to kernel for each cache.

The thing is iproute sends the request with the cache whose proto
is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps
it. But in kernel the rt_cache protocol is still 0, which causes
the cache not to be matched and removed.

So the real reason is rt6i_protocol in the route is not set when
it is allocated. As David Ahern's suggestion, this patch is to
set rt6i_protocol properly in the route when it is installed and
remove the codes setting rtm_protocol according to rt6i_flags in
rt6_fill_node.

This is also an improvement to keep rt6i_protocol consistent with
rtm_protocol.

Fixes: c2ed1880 ("net: ipv6: check route protocol when deleting routes")
Reported-by: NJianlin Shi <jishi@redhat.com>
Suggested-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b91d5329

06 7月, 2017 1 次提交

net: ipv6: Compare lwstate in detecting duplicate nexthops · f06b7549

由 David Ahern 提交于 7月 05, 2017

Lennert reported a failure to add different mpls encaps in a multipath
route:

  $ ip -6 route add 1234::/16 \
        nexthop encap mpls 10 via fe80::1 dev ens3 \
        nexthop encap mpls 20 via fe80::1 dev ens3
  RTNETLINK answers: File exists

The problem is that the duplicate nexthop detection does not compare
lwtunnel configuration. Add it.

Fixes: 19e42e45 ("ipv6: support for fib route lwtunnel encap attributes")
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Reported-by: NJoão Taveira Araújo <joao.taveira@gmail.com>
Reported-by: NLennert Buytenhek <buytenh@wantstofly.org>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Tested-by: NLennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f06b7549

22 6月, 2017 1 次提交

ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER · 76da0704

由 WANG Cong 提交于 6月 20, 2017

In commit 242d3a49 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired,
unfortunately, as reported by jeffy, netdev_wait_allrefs()
could rebroadcast NETDEV_UNREGISTER event until all refs are
gone.

We have to add an additional check to avoid this corner case.
For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
for dev_change_net_namespace(), dev->reg_state is
NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.

Fixes: 242d3a49 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
Reported-by: Njeffy <jeffy.chen@rock-chips.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76da0704

18 6月, 2017 7 次提交

net: remove DST_NOCACHE flag · a4c2fd7f

由 Wei Wang 提交于 6月 17, 2017

DST_NOCACHE flag check has been removed from dst_release() and
dst_hold_safe() in a previous patch because all the dst are now ref
counted properly and can be released based on refcnt only.
Looking at the rest of the DST_NOCACHE use, all of them can now be
removed or replaced with other checks.
So this patch gets rid of all the DST_NOCACHE usage and remove this flag
completely.
Signed-off-by: NWei Wang <weiwan@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4c2fd7f

net: remove DST_NOGC flag · b2a9c0ed

由 Wei Wang 提交于 6月 17, 2017

Now that all the components have been changed to release dst based on
refcnt only and not depend on dst gc anymore, we can remove the
temporary flag DST_NOGC.

Note that we also need to remove the DST_NOCACHE check in dst_release()
and dst_hold_safe() because now all the dst are released based on refcnt
and behaves as DST_NOCACHE.
Signed-off-by: NWei Wang <weiwan@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2a9c0ed

ipv6: get rid of icmp6 dst garbage collector · db916649

由 Wei Wang 提交于 6月 17, 2017

icmp6 dst route is currently ref counted during creation and will be
freed by user during its call of dst_release(). So no need of a garbage
collector for it.
Remove all icmp6 dst garbage collector related code.
Signed-off-by: NWei Wang <weiwan@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db916649

ipv6: mark DST_NOGC and remove the operation of dst_free() · 587fea74

由 Wei Wang 提交于 6月 17, 2017

With the previous preparation patches, we are ready to get rid of the
dst gc operation in ipv6 code and release dst based on refcnt only.
So this patch adds DST_NOGC flag for all IPv6 dst and remove the calls
to dst_free() and its related functions.
At this point, all dst created in ipv6 code do not use the dst gc
anymore and will be destroyed at the point when refcnt drops to 0.

Also, as icmp6 dst route is refcounted during creation and will be freed
by user during its call of dst_release(), there is no need to add this
dst to the icmp6 gc list as well.
Instead, we need to add it into uncached list so that when a
NETDEV_DOWN/NETDEV_UNREGISRER event comes, we can properly go through
these icmp6 dst as well and release the net device properly.
Signed-off-by: NWei Wang <weiwan@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

587fea74

ipv6: call dst_hold_safe() properly · ad65a2f0

由 Wei Wang 提交于 6月 17, 2017

Similar as ipv4, ipv6 path also needs to call dst_hold_safe() when
necessary to avoid double free issue on the dst.
Signed-off-by: NWei Wang <weiwan@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad65a2f0

ipv6: take dst->__refcnt for insertion into fib6 tree · 1cfb71ee

由 Wei Wang 提交于 6月 17, 2017

In IPv6 routing code, struct rt6_info is created for each static route
and RTF_CACHE route and inserted into fib6 tree. In both cases, dst
ref count is not taken.
As explained in the previous patch, this leads to the need of the dst
garbage collector.

This patch holds ref count of dst before inserting the route into fib6
tree and properly releases the dst when deleting it from the fib6 tree
as a preparation in order to fully get rid of dst gc later.

Also, correct fib6_age() logic to check dst->__refcnt to be 1 to indicate
no user is referencing the dst.

And remove dst_hold() in vrf_rt6_create() as ip6_dst_alloc() already puts
dst->__refcnt to 1.
Signed-off-by: NWei Wang <weiwan@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1cfb71ee

net: use loopback dev when generating blackhole route · 1dbe3252

由 Wei Wang 提交于 6月 17, 2017

Existing ipv4/6_blackhole_route() code generates a blackhole route
with dst->dev pointing to the passed in dst->dev.
It is not necessary to hold reference to the passed in dst->dev
because the packets going through this route are dropped anyway.
A loopback interface is good enough so that we don't need to worry about
releasing this dst->dev when this dev is going down.
Signed-off-by: NWei Wang <weiwan@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1dbe3252

08 6月, 2017 1 次提交

net: ipv6: Release route when device is unregistering · 8397ed36

由 David Ahern 提交于 6月 07, 2017

Roopa reported attempts to delete a bond device that is referenced in a
multipath route is hanging:

$ ifdown bond2    # ifupdown2 command that deletes virtual devices
unregister_netdevice: waiting for bond2 to become free. Usage count = 2

Steps to reproduce:
    echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown
    ip link add dev bond12 type bond
    ip link add dev bond13 type bond
    ip addr add 2001:db8:2::0/64 dev bond12
    ip addr add 2001:db8:3::0/64 dev bond13
    ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2
    ip link del dev bond12
    ip link del dev bond13

The root cause is the recent change to keep routes on a linkdown. Update
the check to detect when the device is unregistering and release the
route for that case.

Fixes: a1a22c12 ("net: ipv6: Keep nexthop of multipath route on admin down")
Reported-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8397ed36

30 5月, 2017 2 次提交

net: add extack arg to lwtunnel build state · 9ae28727

由 David Ahern 提交于 5月 27, 2017

Pass extack arg down to lwtunnel_build_state and the build_state callbacks.
Add messages for failures in lwtunnel_build_state, and add the extarg to
nla_parse where possible in the build_state callbacks.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ae28727

net: lwtunnel: Add extack to encap attr validation · c255bd68

由 David Ahern 提交于 5月 27, 2017

Pass extack down to lwtunnel_valid_encap_type and
lwtunnel_valid_encap_type_attr. Add messages for unknown
or unsupported encap types.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c255bd68

27 5月, 2017 1 次提交

net: ipv6: RTM_GETROUTE: return matched fib result when requested · 18c3a61c

由 Roopa Prabhu 提交于 5月 25, 2017

This patch adds support to return matched fib result when RTM_F_FIB_MATCH
flag is specified in RTM_GETROUTE request. This is useful for user-space
applications/controllers wanting to query a matching route.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

18c3a61c

23 5月, 2017 2 次提交

net: ipv6: Add extack messages for route add failures · d5d531cb

由 David Ahern 提交于 5月 21, 2017

Add messages for non-obvious errors (e.g, no need to add text for malloc
failures or ENODEV failures). This mostly covers the annoying EINVAL errors
Some message strings violate the 80-columns but searchable strings need to
trump that rule.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5d531cb

net: ipv6: Plumb extack through route add functions · 333c4301

由 David Ahern 提交于 5月 21, 2017

Plumb extack argument down to route add functions.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

333c4301

09 5月, 2017 1 次提交

ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf · 242d3a49

由 WANG Cong 提交于 5月 08, 2017

For each netns (except init_net), we initialize its null entry
in 3 places:

1) The template itself, as we use kmemdup()
2) Code around dst_init_metrics() in ip6_route_net_init()
3) ip6_route_dev_notify(), which is supposed to initialize it after
   loopback registers

Unfortunately the last one still happens in a wrong order because
we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
net->loopback_dev's idev, thus we have to do that after we add
idev to loopback. However, this notifier has priority == 0 same as
ipv6_dev_notf, and ipv6_dev_notf is registered after
ip6_route_dev_notifier so it is called actually after
ip6_route_dev_notifier. This is similar to commit 2f460933
("ipv6: initialize route null entry in addrconf_init()") which
fixes init_net.

Fix it by picking a smaller priority for ip6_route_dev_notifier.
Also, we have to release the refcnt accordingly when unregistering
loopback_dev because device exit functions are called before subsys
exit functions.
Acked-by: NDavid Ahern <dsahern@gmail.com>
Tested-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

242d3a49

05 5月, 2017 1 次提交

ipv6: initialize route null entry in addrconf_init() · 2f460933

由 WANG Cong 提交于 5月 03, 2017

Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev
since it is always NULL.

This is clearly wrong, we have code to initialize it to loopback_dev,
unfortunately the order is still not correct.

loopback_dev is registered very early during boot, we lose a chance
to re-initialize it in notifier. addrconf_init() is called after
ip6_route_init(), which means we have no chance to correct it.

Fix it by moving this initialization explicitly after
ipv6_add_dev(init_net.loopback_dev) in addrconf_init().
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Tested-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f460933

22 4月, 2017 1 次提交

net: ipv6: RTF_PCPU should not be settable from userspace · 557c44be

由 David Ahern 提交于 4月 19, 2017

Andrey reported a fault in the IPv6 route code:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880069809600 task.stack: ffff880062dc8000
RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
RSP: 0018:ffff880062dced30 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
FS:  00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
Call Trace:
 ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
 ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
...

Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
set. Flags passed to the kernel are blindly copied to the allocated
rt6_info by ip6_route_info_create making a newly inserted route appear
as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
and expects rt->dst.from to be set - which it is not since it is not
really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
generates the fault.

Fix by checking for the flag and failing with EINVAL.

Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Tested-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

557c44be

18 4月, 2017 1 次提交

net: rtnetlink: plumb extended ack to doit function · c21ef3e3

由 David Ahern 提交于 4月 16, 2017

Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
for doit functions that call it directly.

This is the first step to using extended error reporting in rtnetlink.
>From here individual subsystems can be updated to set netlink_ext_ack as
needed.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c21ef3e3

14 4月, 2017 1 次提交

netlink: pass extended ACK struct to parsing functions · fceb6435

由 Johannes Berg 提交于 4月 12, 2017

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fceb6435

17 3月, 2017 1 次提交

net: ipv6: set route type for anycast routes · 4ee39733

由 David Ahern 提交于 3月 15, 2017

Anycast routes have the RTF_ANYCAST flag set, but when dumping routes
for userspace the route type is not set to RTN_ANYCAST. Make it so.

Fixes: 58c4fb86 ("[IPV6]: Flag RTF_ANYCAST for anycast routes")
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ee39733

10 3月, 2017 1 次提交

net: ipv6: Remove redundant RTA_OIF in multipath routes · 5be083ce

由 David Ahern 提交于 3月 06, 2017

Dinesh reported that RTA_MULTIPATH nexthops are 8-bytes larger with IPv6
than IPv4. The recent refactoring for multipath support in netlink
messages does discriminate between non-multipath which needs the OIF
and multipath which adds a rtnexthop struct for each hop making the
RTA_OIF attribute redundant. Resolve by adding a flag to the info
function to skip the oif for multipath.

Fixes: beb1afac ("net: ipv6: Add support to dump multipath routes
       via RTA_MULTIPATH attribute")
Reported-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5be083ce

03 3月, 2017 2 次提交

ipv6: ignore null_entry in inet6_rtm_getroute() too · 9d6acb3b

由 WANG Cong 提交于 3月 01, 2017

Like commit 1f17e2f2 ("net: ipv6: ignore null_entry on route dumps"),
we need to ignore null entry in inet6_rtm_getroute() too.

Return -ENETUNREACH here to sync with IPv4 behavior, as suggested by David.

Fixes: a1a22c12 ("net: ipv6: Keep nexthop of multipath route on admin down")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d6acb3b

ipv6: check for ip6_null_entry in __ip6_del_rt_siblings() · e3330039

由 WANG Cong 提交于 2月 27, 2017

Andrey reported a NULL pointer deref bug in ipv6_route_ioctl()
-> ip6_route_del() -> __ip6_del_rt_siblings() code path. This is
because ip6_null_entry is returned in this path since ip6_null_entry
is kinda default for a ipv6 route table root node. Quote from
David Ahern:

 ip6_null_entry is the root of all ipv6 fib tables making it integrated
 into the table ...

We should ignore any attempt of trying to delete it, like we do in
__ip6_del_rt() path and several others.
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Fixes: 0ae81335 ("net: ipv6: Allow shorthand delete of all nexthops in multipath route")
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3330039

02 3月, 2017 1 次提交

net: route: add missing nla_policy entry for RTA_MARK attribute · 3b45a410

由 Liping Zhang 提交于 2月 27, 2017

This will add stricter validating for RTA_MARK attribute.
Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b45a410

08 2月, 2017 2 次提交

net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP · 0dec879f

由 Julian Anastasov 提交于 2月 06, 2017

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

The datagram protocols can use MSG_CONFIRM to confirm the
neighbour. When used with MSG_PROBE we do not reach the
code where neighbour is confirmed, so we have to do the
same slow lookup by using the dst_confirm_neigh() helper.
When MSG_PROBE is not used, ip_append_data/ip6_append_data
will set the skb flag dst_pending_confirm.
Reported-by: NYueHaibing <yuehaibing@huawei.com>
Fixes: 5110effe ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bed ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0dec879f

net: add confirm_neigh method to dst_ops · 63fca65d

由 Julian Anastasov 提交于 2月 06, 2017

Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.

For XFRM prefer the last tunnel address, if present. With help
from Steffen Klassert.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63fca65d

05 2月, 2017 5 次提交

net: ipv6: Use compressed IPv6 addresses showing route replace error · 7d4d5065

由 David Ahern 提交于 2月 02, 2017

ip6_print_replace_route_err logs an error if a route replace fails with
IPv6 addresses in the full format. e.g,:

IPv6: IPV6: multipath route replace failed (check consistency of installed routes): 2001:0db8:0200:0000:0000:0000:0000:0000 nexthop 2001:0db8:0001:0000:0000:0000:0000:0016 ifi 0

Change the message to dump the addresses in the compressed format.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d4d5065

net: ipv6: Change notifications for multipath delete to RTA_MULTIPATH · 16a16cd3

由 David Ahern 提交于 2月 02, 2017

If an entire multipath route is deleted using prefix and len (without any
nexthops), send a single RTM_DELROUTE notification with the full route
using RTA_MULTIPATH. This is done by generating the skb before the route
delete when all of the sibling routes are still present but sending it
after the route has been removed from the FIB. The skip_notify flag
is used to tell the lower fib code not to send notifications for the
individual nexthop routes.

If a route is deleted using RTA_MULTIPATH for any nexthops or a single
nexthop entry is deleted, then the nexthops are deleted one at a time with
notifications sent as each hop is deleted. This is necessary given that
IPv6 allows individual hops within a route to be deleted.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16a16cd3

net: ipv6: Change notifications for multipath add to RTA_MULTIPATH · 3b1137fe

由 David Ahern 提交于 2月 02, 2017

Change ip6_route_multipath_add to send one notifciation with the full
route encoded with RTA_MULTIPATH instead of a series of individual routes.
This is done by adding a skip_notify flag to the nl_info struct. The
flag is used to skip sending of the notification in the fib code that
actually inserts the route. Once the full route has been added, a
notification is generated with all nexthops.

ip6_route_multipath_add handles 3 use cases: new routes, route replace,
and route append. The multipath notification generated needs to be
consistent with the order of the nexthops and it should be consistent
with the order in a FIB dump which means the route with the first nexthop
needs to be used as the route reference. For the first 2 cases (new and
replace), a reference to the route used to send the notification is
obtained by saving the first route added. For the append case, the last
route added is used to loop back to its first sibling route which is
the first nexthop in the multipath route.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b1137fe

net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute · beb1afac

由 David Ahern 提交于 2月 02, 2017

IPv6 returns multipath routes as a series of individual routes making
their display and handling by userspace different and more complicated
than IPv4, putting the burden on the user to see that a route is part of
a multipath route and internally creating a multipath route if desired
(e.g., libnl does this as of commit 29b71371e764). This patch addresses
this difference, allowing multipath routes to be returned using the
RTA_MULTIPATH attribute.

The end result is that IPv6 multipath routes can be treated and displayed
in a format similar to IPv4:

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:200::/120 metric 1024
	    nexthop via 2001:db8:1::2  dev eth1 weight 1
	    nexthop via 2001:db8:2::2  dev eth2 weight 1
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

beb1afac

net: ipv6: Allow shorthand delete of all nexthops in multipath route · 0ae81335

由 David Ahern 提交于 2月 02, 2017

IPv4 allows multipath routes to be deleted using just the prefix and
length. For example:
    $ ip ro ls vrf red
    unreachable default metric 8192
    1.1.1.0/24
        nexthop via 10.100.1.254  dev eth1 weight 1
        nexthop via 10.11.200.2  dev eth11.200 weight 1
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

    $ ip ro del 1.1.1.0/24 vrf red

    $ ip ro ls vrf red
    unreachable default metric 8192
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

The same notation does not work with IPv6 because of how multipath routes
are implemented for IPv6. For IPv6 only the first nexthop of a multipath
route is deleted if the request contains only a prefix and length. This
leads to unnecessary complexity in userspace dealing with IPv6 multipath
routes.

This patch allows all nexthops to be deleted without specifying each one
in the delete request. Internally, this is done by walking the sibling
list of the route matching the specifications given (prefix, length,
metric, protocol, etc).

    $  ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:200::/120 via 2001:db8:1::2 dev eth1 metric 1024  pref medium
    2001:db8:200::/120 via 2001:db8:2::2 dev eth2 metric 1024  pref medium
    ...

    $ ip -6 ro del vrf red 2001:db8:200::/120

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    ...

Because IPv6 allows individual nexthops to be deleted without deleting
the entire route, the ip6_route_multipath_del and non-multipath code
path (ip6_route_del) have to be discriminated so that all nexthops are
only deleted for the latter case. This is done by making the existing
fc_type in fib6_config a u16 and then adding a new u16 field with
fc_delete_all_nh as the first bit.
Suggested-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ae81335

04 2月, 2017 1 次提交

net: ipv6: Set protocol to kernel for local routes · 94b5e0f9

由 David Ahern 提交于 2月 02, 2017

IPv6 stack does not set the protocol for local routes, so those routes show
up with proto "none":
    $ ip -6 ro ls table local
    local ::1 dev lo proto none metric 0  pref medium
    local 2100:3:: dev lo proto none metric 0  pref medium
    local 2100:3::4 dev lo proto none metric 0  pref medium
    local fe80:: dev lo proto none metric 0  pref medium
    ...

Set rt6i_protocol to RTPROT_KERNEL for consistency with IPv4. Now routes
show up with proto "kernel":
    $ ip -6 ro ls table local
    local ::1 dev lo proto kernel metric 0  pref medium
    local 2100:3:: dev lo proto kernel metric 0  pref medium
    local 2100:3::4 dev lo proto kernel metric 0  pref medium
    local fe80:: dev lo proto kernel metric 0  pref medium
    ...
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94b5e0f9

31 1月, 2017 1 次提交

lwtunnel: remove device arg to lwtunnel_build_state · 30357d7d

由 David Ahern 提交于 1月 30, 2017

Nothing about lwt state requires a device reference, so remove the
input argument.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30357d7d

27 1月, 2017 2 次提交

net: ipv6: ignore null_entry on route dumps · 1f17e2f2

由 David Ahern 提交于 1月 26, 2017

lkp-robot reported a BUG:
[   10.151226] BUG: unable to handle kernel NULL pointer dereference at 00000198
[   10.152525] IP: rt6_fill_node+0x164/0x4b8
[   10.153307] *pdpt = 0000000012ee5001 *pde = 0000000000000000
[   10.153309]
[   10.154492] Oops: 0000 [#1]
[   10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70e-dirty #10
[   10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   10.158254] task: d0deb000 task.stack: d0e0c000
[   10.159059] EIP: rt6_fill_node+0x164/0x4b8
[   10.159780] EFLAGS: 00010296 CPU: 0
[   10.160404] EAX: 00000000 EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
[   10.161469] ESI: 00000000 EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
[   10.162534]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[   10.163482] CR0: 80050033 CR2: 00000198 CR3: 10d94660 CR4: 000006b0
[   10.164535] Call Trace:
[   10.164993]  ? paravirt_sched_clock+0x9/0xd
[   10.165727]  ? sched_clock+0x9/0xc
[   10.166329]  ? sched_clock_cpu+0x19/0xe9
[   10.166991]  ? lock_release+0x13e/0x36c
[   10.167652]  rt6_dump_route+0x4c/0x56
[   10.168276]  fib6_dump_node+0x1d/0x3d
[   10.168913]  fib6_walk_continue+0xab/0x167
[   10.169611]  fib6_walk+0x2a/0x40
[   10.170182]  inet6_dump_fib+0xfb/0x1e0
[   10.170855]  netlink_dump+0xcd/0x21f

This happens when the loopback device is set down and a ipv6 fib route
dump is requested.

ip6_null_entry is the root of all ipv6 fib tables making it integrated
into the table and hence passed to the ipv6 route dump code. The
null_entry route uses the loopback device for dst.dev but may not have
rt6i_idev set because of the order in which initializations are done --
ip6_route_net_init is run before addrconf_init has initialized the
loopback device. Fixing the initialization order is a much bigger problem
with no obvious solution thus far.

The BUG is triggered when the loopback is set down and the netif_running
check added by a1a22c12 fails. The fill_node descends to checking
rt->rt6i_idev for ignore_routes_with_linkdown and since rt6i_idev is
NULL it faults.

The null_entry route should not be processed in a dump request. Catch
and ignore. This check is done in rt6_dump_route as it is the highest
place in the callchain with knowledge of both the route and the network
namespace.

Fixes: a1a22c12("net: ipv6: Keep nexthop of multipath route on admin down")
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f17e2f2

net: ipv6: remove skb_reserve in getroute · 3b7b2b0a

由 David Ahern 提交于 1月 26, 2017

Remove skb_reserve and skb_reset_mac_header from inet6_rtm_getroute. The
allocated skb is not passed through the routing engine (like it is for
IPv4) and has not since the beginning of git time.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b7b2b0a

20 1月, 2017 1 次提交

net: ipv6: Keep nexthop of multipath route on admin down · a1a22c12

由 David Ahern 提交于 1月 18, 2017

IPv6 deletes route entries associated with multipath routes on an
admin down where IPv4 does not. For example:
    $ ip ro ls vrf red
    unreachable default metric 8192
    1.1.1.0/24 metric 64
            nexthop via 10.100.1.254  dev eth1 weight 1
            nexthop via 10.100.2.254  dev eth2 weight 1
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.4
    10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2:: dev red proto none metric 0  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024  pref medium
    2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
    ...

Set link down:
    $ ip li set eth1 down

IPv4 retains the multihop route but flags eth1 route as dead:

    $ ip ro ls vrf red
    unreachable default metric 8192
    1.1.1.0/24
            nexthop via 10.100.1.16  dev eth1 weight 1 dead linkdown
            nexthop via 10.100.2.16  dev eth2 weight 1
    10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4

and IPv6 deletes the route as part of flushing all routes for the device:

    $ ip -6 ro ls vrf red
    2001:db8:2:: dev red proto none metric 0  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
    ...

Worse, on admin up of the device the multipath route has to be deleted
to get this leg of the route re-added.

This patch keeps routes that are part of a multipath route if
ignore_routes_with_linkdown is set with the dead and linkdown flags
enabling consistency between IPv4 and IPv6:

    $ ip -6 ro ls vrf red
    2001:db8:2:: dev red proto none metric 0  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024 dead linkdown  pref medium
    2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
    ...
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1a22c12