1. 04 6月, 2019 1 次提交
    • D
      ipv6: Fix redirect with VRF · 44217666
      David Ahern 提交于
      [ Upstream commit 31680ac265802397937d75461a2809a067b9fb93 ]
      
      IPv6 redirect is broken for VRF. __ip6_route_redirect walks the FIB
      entries looking for an exact match on ifindex. With VRF the flowi6_oif
      is updated by l3mdev_update_flow to the l3mdev index and the
      FLOWI_FLAG_SKIP_NH_OIF set in the flags to tell the lookup to skip the
      device match. For redirects the device match is requires so use that
      flag to know when the oif needs to be reset to the skb device index.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      44217666
  2. 26 5月, 2019 2 次提交
    • E
      ipv6: prevent possible fib6 leaks · 6bc3240a
      Eric Dumazet 提交于
      [ Upstream commit 61fb0d01680771f72cc9d39783fb2c122aaad51e ]
      
      At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
      for finding all percpu routes and set their ->from pointer
      to NULL, so that fib6_ref can reach its expected value (1).
      
      The problem right now is that other cpus can still catch the
      route being deleted, since there is no rcu grace period
      between the route deletion and call to fib6_drop_pcpu_from()
      
      This can leak the fib6 and associated resources, since no
      notifier will take care of removing the last reference(s).
      
      I decided to add another boolean (fib6_destroying) instead
      of reusing/renaming exception_bucket_flushed to ease stable backports,
      and properly document the memory barriers used to implement this fix.
      
      This patch has been co-developped with Wei Wang.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Martin Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bc3240a
    • W
      ipv6: fix src addr routing with the exception table · 81a61a95
      Wei Wang 提交于
      [ Upstream commit 510e2ceda031eed97a7a0f9aad65d271a58b460d ]
      
      When inserting route cache into the exception table, the key is
      generated with both src_addr and dest_addr with src addr routing.
      However, current logic always assumes the src_addr used to generate the
      key is a /128 host address. This is not true in the following scenarios:
      1. When the route is a gateway route or does not have next hop.
         (rt6_is_gw_or_nonexthop() == false)
      2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
      This means, when looking for a route cache in the exception table, we
      have to do the lookup twice: first time with the passed in /128 host
      address, second time with the src_addr stored in fib6_info.
      
      This solves the pmtu discovery issue reported by Mikael Magnusson where
      a route cache with a lower mtu info is created for a gateway route with
      src addr. However, the lookup code is not able to find this route cache.
      
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Reported-by: NMikael Magnusson <mikael.kernel@lists.m7n.se>
      Bisected-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Cc: Martin Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81a61a95
  3. 05 5月, 2019 2 次提交
    • E
      ipv6: fix races in ip6_dst_destroy() · 1a9e0134
      Eric Dumazet 提交于
      [ Upstream commit 0e2338749192ce0e52e7174c5352f627632f478a ]
      
      We had many syzbot reports that seem to be caused by use-after-free
      of struct fib6_info.
      
      ip6_dst_destroy(), fib6_drop_pcpu_from() and rt6_remove_exception()
      are writers vs rt->from, and use non consistent synchronization among
      themselves.
      
      Switching to xchg() will solve the issues with no possible
      lockdep issues.
      
      BUG: KASAN: user-memory-access in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:294 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:292 [inline]
      BUG: KASAN: user-memory-access in fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
      BUG: KASAN: user-memory-access in fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
      Write of size 4 at addr 0000000000ffffb4 by task syz-executor.1/7649
      
      CPU: 0 PID: 7649 Comm: syz-executor.1 Not tainted 5.1.0-rc6+ #183
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       kasan_report.cold+0x5/0x40 mm/kasan/report.c:321
       check_memory_region_inline mm/kasan/generic.c:185 [inline]
       check_memory_region+0x123/0x190 mm/kasan/generic.c:191
       kasan_check_write+0x14/0x20 mm/kasan/common.c:108
       atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
       fib6_info_release include/net/ip6_fib.h:294 [inline]
       fib6_info_release include/net/ip6_fib.h:292 [inline]
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
       fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
       fib6_del_route net/ipv6/ip6_fib.c:1813 [inline]
       fib6_del+0xac2/0x10a0 net/ipv6/ip6_fib.c:1844
       fib6_clean_node+0x3a8/0x590 net/ipv6/ip6_fib.c:2006
       fib6_walk_continue+0x495/0x900 net/ipv6/ip6_fib.c:1928
       fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:1976
       fib6_clean_tree+0xe0/0x120 net/ipv6/ip6_fib.c:2055
       __fib6_clean_all+0x118/0x2a0 net/ipv6/ip6_fib.c:2071
       fib6_clean_all+0x2b/0x40 net/ipv6/ip6_fib.c:2082
       rt6_sync_down_dev+0x134/0x150 net/ipv6/route.c:4057
       rt6_disable_ip+0x27/0x5f0 net/ipv6/route.c:4062
       addrconf_ifdown+0xa2/0x1220 net/ipv6/addrconf.c:3705
       addrconf_notify+0x19a/0x2260 net/ipv6/addrconf.c:3630
       notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1753
       call_netdevice_notifiers_extack net/core/dev.c:1765 [inline]
       call_netdevice_notifiers net/core/dev.c:1779 [inline]
       dev_close_many+0x33f/0x6f0 net/core/dev.c:1522
       rollback_registered_many+0x43b/0xfd0 net/core/dev.c:8177
       rollback_registered+0x109/0x1d0 net/core/dev.c:8242
       unregister_netdevice_queue net/core/dev.c:9289 [inline]
       unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9282
       unregister_netdevice include/linux/netdevice.h:2658 [inline]
       __tun_detach+0xd5b/0x1000 drivers/net/tun.c:727
       tun_detach drivers/net/tun.c:744 [inline]
       tun_chr_close+0xe0/0x180 drivers/net/tun.c:3443
       __fput+0x2e5/0x8d0 fs/file_table.c:278
       ____fput+0x16/0x20 fs/file_table.c:309
       task_work_run+0x14a/0x1c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x90a/0x2fa0 kernel/exit.c:876
       do_group_exit+0x135/0x370 kernel/exit.c:980
       __do_sys_exit_group kernel/exit.c:991 [inline]
       __se_sys_exit_group kernel/exit.c:989 [inline]
       __x64_sys_exit_group+0x44/0x50 kernel/exit.c:989
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458da9
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffeafc2a6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 000000000000001c RCX: 0000000000458da9
      RDX: 0000000000412a80 RSI: 0000000000a54ef0 RDI: 0000000000000043
      RBP: 00000000004be552 R08: 000000000000000c R09: 000000000004c0d1
      R10: 0000000002341940 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00007ffeafc2a7f0 R14: 000000000004c065 R15: 00007ffeafc2a800
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a9e0134
    • M
      ipv6: A few fixes on dereferencing rt->from · 7ea4f000
      Martin KaFai Lau 提交于
      [ Upstream commit 886b7a50100a50f1cbd08a6f8ec5884dfbe082dc ]
      
      It is a followup after the fix in
      commit 9c69a1320515 ("route: Avoid crash from dereferencing NULL rt->from")
      
      rt6_do_redirect():
      1. NULL checking is needed on rt->from because a parallel
         fib6_info delete could happen that sets rt->from to NULL.
         (e.g. rt6_remove_exception() and fib6_drop_pcpu_from()).
      
      2. fib6_info_hold() is not enough.  Same reason as (1).
         Meaning, holding dst->__refcnt cannot ensure
         rt->from is not NULL or rt->from->fib6_ref is not 0.
      
         Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc()
         is already doing, this patch chooses to extend the rcu section
         to keep "from" dereference-able after checking for NULL.
      
      inet6_rtm_getroute():
      1. NULL checking is also needed on rt->from for a similar reason.
         Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED.
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ea4f000
  4. 27 4月, 2019 1 次提交
    • J
      route: Avoid crash from dereferencing NULL rt->from · 5f72cb2a
      Jonathan Lemon 提交于
      [ Upstream commit 9c69a13205151c0d801de9f9d83a818e6e8f60ec ]
      
      When __ip6_rt_update_pmtu() is called, rt->from is RCU dereferenced, but is
      never checked for null - rt6_flush_exceptions() may have removed the entry.
      
      [ 1913.989004] RIP: 0010:ip6_rt_cache_alloc+0x13/0x170
      [ 1914.209410] Call Trace:
      [ 1914.214798]  <IRQ>
      [ 1914.219226]  __ip6_rt_update_pmtu+0xb0/0x190
      [ 1914.228649]  ip6_tnl_xmit+0x2c2/0x970 [ip6_tunnel]
      [ 1914.239223]  ? ip6_tnl_parse_tlv_enc_lim+0x32/0x1a0 [ip6_tunnel]
      [ 1914.252489]  ? __gre6_xmit+0x148/0x530 [ip6_gre]
      [ 1914.262678]  ip6gre_tunnel_xmit+0x17e/0x3c7 [ip6_gre]
      [ 1914.273831]  dev_hard_start_xmit+0x8d/0x1f0
      [ 1914.283061]  sch_direct_xmit+0xfa/0x230
      [ 1914.291521]  __qdisc_run+0x154/0x4b0
      [ 1914.299407]  net_tx_action+0x10e/0x1f0
      [ 1914.307678]  __do_softirq+0xca/0x297
      [ 1914.315567]  irq_exit+0x96/0xa0
      [ 1914.322494]  smp_apic_timer_interrupt+0x68/0x130
      [ 1914.332683]  apic_timer_interrupt+0xf/0x20
      [ 1914.341721]  </IRQ>
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f72cb2a
  5. 03 4月, 2019 1 次提交
  6. 19 3月, 2019 4 次提交
    • P
      ipv6: route: enforce RCU protection in ip6_route_check_nh_onlink() · 2e4b2aeb
      Paolo Abeni 提交于
      [ Upstream commit bf1dc8bad1d42287164d216d8efb51c5cd381b18 ]
      
      We need a RCU critical section around rt6_info->from deference, and
      proper annotation.
      
      Fixes: 4ed591c8ab44 ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e4b2aeb
    • P
      ipv6: route: enforce RCU protection in rt6_update_exception_stamp_rt() · 96dd4ef3
      Paolo Abeni 提交于
      [ Upstream commit 193f3685d0546b0cea20c99894aadb70098e47bf ]
      
      We must access rt6_info->from under RCU read lock: move the
      dereference under such lock, with proper annotation.
      
      v1 -> v2:
       - avoid using multiple, racy, fetch operations for rt->from
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96dd4ef3
    • P
      ipv6: route: purge exception on removal · b9d0cb75
      Paolo Abeni 提交于
      [ Upstream commit f5b51fe804ec2a6edce0f8f6b11ea57283f5857b ]
      
      When a netdevice is unregistered, we flush the relevant exception
      via rt6_sync_down_dev() -> fib6_ifdown() -> fib6_del() -> fib6_del_route().
      
      Finally, we end-up calling rt6_remove_exception(), where we release
      the relevant dst, while we keep the references to the related fib6_info and
      dev. Such references should be released later when the dst will be
      destroyed.
      
      There are a number of caches that can keep the exception around for an
      unlimited amount of time - namely dst_cache, possibly even socket cache.
      As a result device registration may hang, as demonstrated by this script:
      
      ip netns add cl
      ip netns add rt
      ip netns add srv
      ip netns exec rt sysctl -w net.ipv6.conf.all.forwarding=1
      
      ip link add name cl_veth type veth peer name cl_rt_veth
      ip link set dev cl_veth netns cl
      ip -n cl link set dev cl_veth up
      ip -n cl addr add dev cl_veth 2001::2/64
      ip -n cl route add default via 2001::1
      
      ip -n cl link add tunv6 type ip6tnl mode ip6ip6 local 2001::2 remote 2002::1 hoplimit 64 dev cl_veth
      ip -n cl link set tunv6 up
      ip -n cl addr add 2013::2/64 dev tunv6
      
      ip link set dev cl_rt_veth netns rt
      ip -n rt link set dev cl_rt_veth up
      ip -n rt addr add dev cl_rt_veth 2001::1/64
      
      ip link add name rt_srv_veth type veth peer name srv_veth
      ip link set dev srv_veth netns srv
      ip -n srv link set dev srv_veth up
      ip -n srv addr add dev srv_veth 2002::1/64
      ip -n srv route add default via 2002::2
      
      ip -n srv link add tunv6 type ip6tnl mode ip6ip6 local 2002::1 remote 2001::2 hoplimit 64 dev srv_veth
      ip -n srv link set tunv6 up
      ip -n srv addr add 2013::1/64 dev tunv6
      
      ip link set dev rt_srv_veth netns rt
      ip -n rt link set dev rt_srv_veth up
      ip -n rt addr add dev rt_srv_veth 2002::2/64
      
      ip netns exec srv netserver & sleep 0.1
      ip netns exec cl ping6 -c 4 2013::1
      ip netns exec cl netperf -H 2013::1 -t TCP_STREAM -l 3 & sleep 1
      ip -n rt link set dev rt_srv_veth mtu 1400
      wait %2
      
      ip -n cl link del cl_veth
      
      This commit addresses the issue purging all the references held by the
      exception at time, as we currently do for e.g. ipv6 pcpu dst entries.
      
      v1 -> v2:
       - re-order the code to avoid accessing dst and net after dst_dev_put()
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9d0cb75
    • K
      net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 · fe38cbc9
      Kalash Nainwal 提交于
      [ Upstream commit 97f0082a0592212fc15d4680f5a4d80f79a1687c ]
      
      Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to
      keep legacy software happy. This is similar to what was done for
      ipv4 in commit 709772e6 ("net: Fix routing tables with
      id > 255 for legacy software").
      Signed-off-by: NKalash Nainwal <kalash@arista.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe38cbc9
  7. 10 3月, 2019 2 次提交
  8. 10 1月, 2019 1 次提交
  9. 23 11月, 2018 2 次提交
  10. 04 11月, 2018 1 次提交
    • D
      net/ipv6: Allow onlink routes to have a device mismatch if it is the default route · 6a4aa53a
      David Ahern 提交于
      [ Upstream commit 4ed591c8ab44e711e56b8e021ffaf4f407c045f5 ]
      
      The intent of ip6_route_check_nh_onlink is to make sure the gateway
      given for an onlink route is not actually on a connected route for
      a different interface (e.g., 2001:db8:1::/64 is on dev eth1 and then
      an onlink route has a via 2001:db8:1::1 dev eth2). If the gateway
      lookup hits the default route then it most likely will be a different
      interface than the onlink route which is ok.
      
      Update ip6_route_check_nh_onlink to disregard the device mismatch
      if the gateway lookup hits the default route. Turns out the existing
      onlink tests are passing because there is no default route or it is
      an unreachable default, so update the onlink tests to have a default
      route other than unreachable.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a4aa53a
  11. 16 10月, 2018 1 次提交
  12. 27 9月, 2018 1 次提交
  13. 19 9月, 2018 2 次提交
    • W
      ipv6: fix memory leak on dst->_metrics · ce7ea4af
      Wei Wang 提交于
      When dst->_metrics and f6i->fib6_metrics share the same memory, both
      take reference count on the dst_metrics structure. However, when dst is
      destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic()
      which does not take care of READONLY metrics and does not release refcnt.
      This causes memory leak.
      Similar to ipv4 logic, the fix is to properly release refcnt and free
      the memory space pointed by dst->_metrics if refcnt becomes 0.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce7ea4af
    • W
      Revert "ipv6: fix double refcount of fib6_metrics" · 86758605
      Wei Wang 提交于
      This reverts commit e70a3aad.
      
      This change causes use-after-free on dst->_metrics.
      The crash trace looks like this:
      [   97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140
      [   97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801
      
      [   97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11
      [   97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018
      [   97.777957] Call Trace:
      [   97.777971]  [<ffffffff895709db>] dump_stack+0x4d/0x72
      [   97.777985]  [<ffffffff881651df>] print_address_description+0x6f/0x260
      [   97.777997]  [<ffffffff88165747>] kasan_report+0x257/0x370
      [   97.778001]  [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140
      [   97.778004]  [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20
      [   97.778008]  [<ffffffff894488e6>] ip6_mtu+0x116/0x140
      [   97.778013]  [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280
      [   97.778016]  [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0
      [   97.778022]  [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0
      [   97.778037]  [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0
      [   97.778040]  [<ffffffff881643b1>] ? save_stack+0xb1/0xd0
      [   97.778046]  [<ffffffff89264c82>] tcp_send_mss+0x22/0x220
      [   97.778059]  [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0
      [   97.778062]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
      [   97.778066]  [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60
      [   97.778070]  [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280
      [   97.778075]  [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430
      [   97.778078]  [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0
      [   97.778082]  [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140
      [   97.778085]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
      [   97.778088]  [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50
      [   97.778092]  [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50
      [   97.778098]  [<ffffffff89352d43>] inet_sendmsg+0x103/0x480
      [   97.778102]  [<ffffffff89352c40>] ? inet_gso_segment+0x15b0/0x15b0
      [   97.778105]  [<ffffffff890294da>] sock_sendmsg+0xba/0xf0
      [   97.778108]  [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0
      [   97.778113]  [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0
      [   97.778116]  [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0
      [   97.778119]  [<ffffffff881646d1>] ? memset+0x31/0x40
      [   97.778123]  [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380
      [   97.778127]  [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250
      [   97.778130]  [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180
      [   97.778133]  [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200
      [   97.778137]  [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0
      [   97.778141]  [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190
      [   97.778144]  [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190
      [   97.778147]  [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20
      [   97.778152]  [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0
      [   97.778155]  [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190
      [   97.778158]  [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20
      [   97.778162]  [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430
      [   97.778166]  [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0
      [   97.778171]  [<ffffffff8960131f>] ? page_fault+0x2f/0x50
      [   97.778174]  [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [   97.778177] RIP: 0033:0x7f83fa36000d
      [   97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      [   97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d
      [   97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036
      [   97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168
      [   97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40
      [   97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000
      
      [   97.779684] Allocated by task 5919:
      [   97.783185]  save_stack+0x46/0xd0
      [   97.783187]  kasan_kmalloc+0xad/0xe0
      [   97.783189]  kmem_cache_alloc_trace+0xdf/0x580
      [   97.783190]  ip6_convert_metrics.isra.79+0x7e/0x190
      [   97.783192]  ip6_route_info_create+0x60a/0x2480
      [   97.783193]  ip6_route_add+0x1d/0x80
      [   97.783195]  inet6_rtm_newroute+0xdd/0xf0
      [   97.783198]  rtnetlink_rcv_msg+0x641/0xb10
      [   97.783200]  netlink_rcv_skb+0x27b/0x3e0
      [   97.783202]  rtnetlink_rcv+0x15/0x20
      [   97.783203]  netlink_unicast+0x4be/0x720
      [   97.783204]  netlink_sendmsg+0x7bc/0xbf0
      [   97.783205]  sock_sendmsg+0xba/0xf0
      [   97.783207]  ___sys_sendmsg+0x6ca/0x8e0
      [   97.783208]  __sys_sendmsg+0xe6/0x190
      [   97.783209]  SyS_sendmsg+0x13/0x20
      [   97.783211]  do_syscall_64+0x2ac/0x430
      [   97.783213]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      [   97.784709] Freed by task 0:
      [   97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist
      [   97.794497]  save_stack+0x46/0xd0
      [   97.794499]  kasan_slab_free+0x71/0xc0
      [   97.794500]  kfree+0x7c/0xf0
      [   97.794501]  fib6_info_destroy_rcu+0x24f/0x310
      [   97.794504]  rcu_process_callbacks+0x38b/0x1730
      [   97.794506]  __do_softirq+0x1c8/0x5d0
      Reported-by: NJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86758605
  14. 18 9月, 2018 1 次提交
    • P
      net/ipv6: do not copy dst flags on rt init · 30bfd930
      Peter Oskolkov 提交于
      DST_NOCOUNT in dst_entry::flags tracks whether the entry counts
      toward route cache size (net->ipv6.sysctl.ip6_rt_max_size).
      
      If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented
      in dist_init() and decremented in dst_destroy().
      
      This flag is tied to allocation/deallocation of dst_entry and
      should not be copied from another dst/route. Otherwise it can happen
      that dst_ops::pcpuc_entries counter grows until no new routes can
      be allocated because the counter reached ip6_rt_max_size due to
      DST_NOCOUNT not set and thus no counter decrements on gc-ed routes.
      
      Fixes: 3b6761d1 ("net/ipv6: Move dst flags to booleans in fib entries")
      Cc: David Ahern <dsahern@gmail.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NPeter Oskolkov <posk@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30bfd930
  15. 13 9月, 2018 1 次提交
    • X
      ipv6: use rt6_info members when dst is set in rt6_fill_node · 22d0bd82
      Xin Long 提交于
      In inet6_rtm_getroute, since Commit 93531c67 ("net/ipv6: separate
      handling of FIB entries from dst based routes"), it has used rt->from
      to dump route info instead of rt.
      
      However for some route like cache, some of its information like flags
      or gateway is not the same as that of the 'from' one. It caused 'ip
      route get' to dump the wrong route information.
      
      In Jianlin's testing, the output information even lost the expiration
      time for a pmtu route cache due to the wrong fib6_flags.
      
      So change to use rt6_info members for dst addr, src addr, flags and
      gateway when it tries to dump a route entry without fibmatch set.
      
      v1->v2:
        - not use rt6i_prefsrc.
        - also fix the gw dump issue.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22d0bd82
  16. 02 9月, 2018 1 次提交
    • A
      ipv6: don't get lwtstate twice in ip6_rt_copy_init() · 93bbadd6
      Alexey Kodanev 提交于
      Commit 80f1a0f4 ("net/ipv6: Put lwtstate when destroying fib6_info")
      partially fixed the kmemleak [1], lwtstate can be copied from fib6_info,
      with ip6_rt_copy_init(), and it should be done only once there.
      
      rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
      ip6_rt_copy_init(), so there is no need to get it again at the end.
      
      With this patch, lwtstate also isn't copied from RTF_REJECT routes.
      
      [1]:
      unreferenced object 0xffff880b6aaa14e0 (size 64):
        comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
        hex dump (first 32 bytes):
          01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
          [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
          [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
          [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
          [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
          [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
          [<000000004c893c76>] netlink_unicast+0x417/0x5a0
          [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
          [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
          [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
          [<000000001e7426c8>] __sys_sendmsg+0xde/0x170
          [<00000000fe411443>] do_syscall_64+0x9f/0x4a0
          [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000006d21f353>] 0xffffffffffffffff
      
      Fixes: 6edb3c96 ("net/ipv6: Defer initialization of dst to data path")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93bbadd6
  17. 23 8月, 2018 1 次提交
  18. 06 8月, 2018 1 次提交
  19. 24 7月, 2018 1 次提交
    • W
      ipv6: use fib6_info_hold_safe() when necessary · e873e4b9
      Wei Wang 提交于
      In the code path where only rcu read lock is held, e.g. in the route
      lookup code path, it is not safe to directly call fib6_info_hold()
      because the fib6_info may already have been deleted but still exists
      in the rcu grace period. Holding reference to it could cause double
      free and crash the kernel.
      
      This patch adds a new function fib6_info_hold_safe() and replace
      fib6_info_hold() in all necessary places.
      
      Syzbot reported 3 crash traces because of this. One of them is:
      8021q: adding VLAN 0 to HW filter on device team0
      IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
      dst_release: dst:(____ptrval____) refcnt:-1
      dst_release: dst:(____ptrval____) refcnt:-2
      WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 dst_hold include/net/dst.h:239 [inline]
      WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204
      dst_release: dst:(____ptrval____) refcnt:-1
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 4845 Comm: syz-executor493 Not tainted 4.18.0-rc3+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       panic+0x238/0x4e7 kernel/panic.c:184
      dst_release: dst:(____ptrval____) refcnt:-2
      dst_release: dst:(____ptrval____) refcnt:-3
       __warn.cold.8+0x163/0x1ba kernel/panic.c:536
      dst_release: dst:(____ptrval____) refcnt:-4
       report_bug+0x252/0x2d0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
      dst_release: dst:(____ptrval____) refcnt:-5
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
      RIP: 0010:dst_hold include/net/dst.h:239 [inline]
      RIP: 0010:ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204
      Code: c1 ed 03 89 9d 18 ff ff ff 48 b8 00 00 00 00 00 fc ff df 41 c6 44 05 00 f8 e9 2d 01 00 00 4c 8b a5 c8 fe ff ff e8 1a f6 e6 fa <0f> 0b e9 6a fc ff ff e8 0e f6 e6 fa 48 8b 85 d0 fe ff ff 48 8d 78
      RSP: 0018:ffff8801a8fcf178 EFLAGS: 00010293
      RAX: ffff8801a8eba5c0 RBX: 0000000000000000 RCX: ffffffff869511e6
      RDX: 0000000000000000 RSI: ffffffff869515b6 RDI: 0000000000000005
      RBP: ffff8801a8fcf2c8 R08: ffff8801a8eba5c0 R09: ffffed0035ac8338
      R10: ffffed0035ac8338 R11: ffff8801ad6419c3 R12: ffff8801a8fcf720
      R13: ffff8801a8fcf6a0 R14: ffff8801ad6419c0 R15: ffff8801ad641980
       ip6_make_skb+0x2c8/0x600 net/ipv6/ip6_output.c:1768
       udpv6_sendmsg+0x2c90/0x35f0 net/ipv6/udp.c:1376
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:641 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:651
       ___sys_sendmsg+0x51d/0x930 net/socket.c:2125
       __sys_sendmmsg+0x240/0x6f0 net/socket.c:2220
       __do_sys_sendmmsg net/socket.c:2249 [inline]
       __se_sys_sendmmsg net/socket.c:2246 [inline]
       __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2246
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x446ba9
      Code: e8 cc bb 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fb39a469da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000006dcc54 RCX: 0000000000446ba9
      RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003
      RBP: 00000000006dcc50 R08: 00007fb39a46a700 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 45c828efc7a64843
      R13: e6eeb815b9d8a477 R14: 5068caf6f713c6fc R15: 0000000000000001
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: syzbot+902e2a1bcd4f7808cef5@syzkaller.appspotmail.com
      Reported-by: syzbot+8ae62d67f647abeeceb9@syzkaller.appspotmail.com
      Reported-by: syzbot+3f08feb14086930677d0@syzkaller.appspotmail.com
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e873e4b9
  20. 17 7月, 2018 1 次提交
    • D
      net/ipv6: Do not allow device only routes via the multipath API · b5d2d75e
      David Ahern 提交于
      Eric reported that reverting the patch that fixed and simplified IPv6
      multipath routes means reverting back to invalid userspace notifications.
      eg.,
      $ ip -6 route add 2001:db8:1::/64 nexthop dev eth0 nexthop dev eth1
      
      only generates a single notification:
      2001:db8:1::/64 dev eth0 metric 1024 pref medium
      
      While working on a fix for this problem I found another case that is just
      broken completely - a multipath route with a gateway followed by device
      followed by gateway:
          $ ip -6 ro add 2001:db8:103::/64
                nexthop via 2001:db8:1::64
                nexthop dev dummy2
                nexthop via 2001:db8:3::64
      
      In this case the device only route is dropped completely - no notification
      to userpsace but no addition to the FIB either:
      
      $ ip -6 ro ls
      2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
      2001:db8:2::/64 dev dummy2 proto kernel metric 256 pref medium
      2001:db8:3::/64 dev dummy3 proto kernel metric 256 pref medium
      2001:db8:103::/64 metric 1024
      	nexthop via 2001:db8:1::64 dev dummy1 weight 1
      	nexthop via 2001:db8:3::64 dev dummy3 weight 1 pref medium
      fe80::/64 dev dummy1 proto kernel metric 256 pref medium
      fe80::/64 dev dummy2 proto kernel metric 256 pref medium
      fe80::/64 dev dummy3 proto kernel metric 256 pref medium
      
      Really, IPv6 multipath is just FUBAR'ed beyond repair when it comes to
      device only routes, so do not allow it all.
      
      This change will break any scripts relying on the mpath api for insert,
      but I don't see any other way to handle the permutations. Besides, since
      the routes are added to the FIB as standalone (non-multipath) routes the
      kernel is not doing what the user requested, so it might as well tell the
      user that.
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5d2d75e
  21. 04 7月, 2018 1 次提交
    • D
      net/ipv6: Revert attempt to simplify route replace and append · 33bd5ac5
      David Ahern 提交于
      NetworkManager likes to manage linklocal prefix routes and does so with
      the NLM_F_APPEND flag, breaking attempts to simplify the IPv6 route
      code and by extension enable multipath routes with device only nexthops.
      
      Revert f34436a4 and these followup patches:
      6eba08c3 ("ipv6: Only emit append events for appended routes").
      ce45bded ("mlxsw: spectrum_router: Align with new route replace logic")
      53b562df ("mlxsw: spectrum_router: Allow appending to dev-only routes")
      
      Update the fib_tests cases to reflect the old behavior.
      
      Fixes: f34436a4 ("net/ipv6: Simplify route replace and appending into multipath route")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      33bd5ac5
  22. 12 6月, 2018 1 次提交
  23. 05 6月, 2018 4 次提交
    • D
      net/ipv6: prevent use after free in ip6_route_mpath_notify · f7225172
      David Ahern 提交于
      syzbot reported a use-after-free:
      
      BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
      Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555
      
      CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
       ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
       ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Allocated by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:104
       __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
       ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
       ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
       ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Freed by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:140
       dst_release_immediate+0x71/0x9e net/core/dst.c:205
       fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
       ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      The problem is that rt_last can point to a deleted route if the insert
      fails.
      
      One reproducer is to insert a route and then add a multipath route that
      has a duplicate nexthop.e.g,:
          $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
          $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2
      
      Fix by not setting rt_last until the it is verified the insert succeeded.
      
      Fixes: 3b1137fe ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
      Cc: Eric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7225172
    • M
      ipv6: omit traffic class when calculating flow hash · fa1be7e0
      Michal Kubecek 提交于
      Some of the code paths calculating flow hash for IPv6 use flowlabel member
      of struct flowi6 which, despite its name, encodes both flow label and
      traffic class. If traffic class changes within a TCP connection (as e.g.
      ssh does), ECMP route can switch between path. It's also inconsistent with
      other code paths where ip6_flowlabel() (returning only flow label) is used
      to feed the key.
      
      Use only flow label everywhere, including one place where hash key is set
      using ip6_flowinfo().
      
      Fixes: 51ebd318 ("ipv6: add support of equal cost multipath (ECMP)")
      Fixes: f70ea018 ("net: Add functions to get skb->hash based on flow structures")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa1be7e0
    • D
      Revert "ipv6: omit traffic class when calculating flow hash" · a925ab48
      David S. Miller 提交于
      This reverts commit 87ae68c8.
      
      Applied the wrong version of this fix, correct version
      coming up.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a925ab48
    • M
      ipv6: omit traffic class when calculating flow hash · 87ae68c8
      Michal Kubecek 提交于
      Some of the code paths calculating flow hash for IPv6 use flowlabel member
      of struct flowi6 which, despite its name, encodes both flow label and
      traffic class. If traffic class changes within a TCP connection (as e.g.
      ssh does), ECMP route can switch between path. It's also incosistent with
      other code paths where ip6_flowlabel() (returning only flow label) is used
      to feed the key.
      
      Use only flow label everywhere, including one place where hash key is set
      using ip6_flowinfo().
      
      Fixes: 51ebd318 ("ipv6: add support of equal cost multipath (ECMP)")
      Fixes: f70ea018 ("net: Add functions to get skb->hash based on flow structures")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Tested-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87ae68c8
  24. 25 5月, 2018 1 次提交
  25. 24 5月, 2018 1 次提交
  26. 23 5月, 2018 1 次提交
    • D
      net/ipv6: Simplify route replace and appending into multipath route · f34436a4
      David Ahern 提交于
      Bring consistency to ipv6 route replace and append semantics.
      
      Remove rt6_qualify_for_ecmp which is just guess work. It fails in 2 cases:
      1. can not replace a route with a reject route. Existing code appends
         a new route instead of replacing the existing one.
      
      2. can not have a multipath route where a leg uses a dev only nexthop
      
      Existing use cases affected by this change:
      1. adding a route with existing prefix and metric using NLM_F_CREATE
         without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls
         'prepend'). Existing code auto-determines that the new nexthop can
         be appended to an existing route to create a multipath route. This
         change breaks that by requiring the APPEND flag for the new route
         to be added to an existing one. Instead the prepend just adds another
         route entry.
      
      2. route replace. Existing code replaces first matching multipath route
         if new route is multipath capable and fallback to first matching
         non-ECMP route (reject or dev only route) in case one isn't available.
         New behavior replaces first matching route. (Thanks to Ido for spotting
         this one)
      
      Note: Newer iproute2 is needed to display multipath routes with a dev-only
            nexthop. This is due to a bug in iproute2 and parsing nexthops.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f34436a4
  27. 22 5月, 2018 1 次提交
  28. 16 5月, 2018 2 次提交