1. 22 3月, 2019 2 次提交
    • D
      ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create · c7a1ce39
      David Ahern 提交于
      Change addrconf_f6i_alloc to generate a fib6_config and call
      ip6_route_info_create. addrconf_f6i_alloc is the last caller to
      fib6_info_alloc besides ip6_route_info_create, and there is no
      reason for it to do its own initialization on a fib6_info.
      
      Host routes need to be created even if the device is down, so add a
      new flag, fc_ignore_dev_down, to fib6_config and update fib6_nh_init
      to not error out if device is not up.
      
      Notes on the conversion:
      - ip_fib_metrics_init is the same as fib6_config has fc_mx set to NULL
        and fc_mx_len set to 0
      - dst_nocount is handled by the RTF_ADDRCONF flag
      - dst_host is handled by fc_dst_len = 128
      
      nh_gw does not get set after the conversion to ip6_route_info_create
      but it should not be set in addrconf_f6i_alloc since this is a host
      route not a gateway route.
      
      Everything else is a straight forward map between fib6_info and
      fib6_config.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7a1ce39
    • D
      ipv6: Move setting default metric for routes · 67f69513
      David Ahern 提交于
      ip6_route_info_create is a low level function for ensuring fc_metric is
      set. Move the check and default setting to the 2 locations that do not
      already set fc_metric before calling ip6_route_info_create. This is
      required for the next patch which moves addrconf allocations to
      ip6_route_info_create and want the metric for host routes to be 0.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67f69513
  2. 02 3月, 2019 1 次提交
  3. 27 2月, 2019 1 次提交
    • D
      ipv6: Return error for RTA_VIA attribute · e3818541
      David Ahern 提交于
      IPv6 currently does not support nexthops outside of the AF_INET6 family.
      Specifically, it does not handle RTA_VIA attribute. If it is passed
      in a route add request, the actual route added only uses the device
      which is clearly not what the user intended:
      
        $ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
        $ ip ro ls
        ...
        2001:db8:2::/64 dev eth0 metric 1024 pref medium
      
      Catch this and fail the route add:
        $ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
        Error: IPv6 does not support RTA_VIA attribute.
      
      Fixes: 03c05665 ("mpls: Netlink commands to add, remove, and dump routes")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3818541
  4. 23 2月, 2019 2 次提交
    • K
      net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 · 97f0082a
      Kalash Nainwal 提交于
      Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to
      keep legacy software happy. This is similar to what was done for
      ipv4 in commit 709772e6 ("net: Fix routing tables with
      id > 255 for legacy software").
      Signed-off-by: NKalash Nainwal <kalash@arista.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97f0082a
    • P
      ipv6: route: purge exception on removal · f5b51fe8
      Paolo Abeni 提交于
      When a netdevice is unregistered, we flush the relevant exception
      via rt6_sync_down_dev() -> fib6_ifdown() -> fib6_del() -> fib6_del_route().
      
      Finally, we end-up calling rt6_remove_exception(), where we release
      the relevant dst, while we keep the references to the related fib6_info and
      dev. Such references should be released later when the dst will be
      destroyed.
      
      There are a number of caches that can keep the exception around for an
      unlimited amount of time - namely dst_cache, possibly even socket cache.
      As a result device registration may hang, as demonstrated by this script:
      
      ip netns add cl
      ip netns add rt
      ip netns add srv
      ip netns exec rt sysctl -w net.ipv6.conf.all.forwarding=1
      
      ip link add name cl_veth type veth peer name cl_rt_veth
      ip link set dev cl_veth netns cl
      ip -n cl link set dev cl_veth up
      ip -n cl addr add dev cl_veth 2001::2/64
      ip -n cl route add default via 2001::1
      
      ip -n cl link add tunv6 type ip6tnl mode ip6ip6 local 2001::2 remote 2002::1 hoplimit 64 dev cl_veth
      ip -n cl link set tunv6 up
      ip -n cl addr add 2013::2/64 dev tunv6
      
      ip link set dev cl_rt_veth netns rt
      ip -n rt link set dev cl_rt_veth up
      ip -n rt addr add dev cl_rt_veth 2001::1/64
      
      ip link add name rt_srv_veth type veth peer name srv_veth
      ip link set dev srv_veth netns srv
      ip -n srv link set dev srv_veth up
      ip -n srv addr add dev srv_veth 2002::1/64
      ip -n srv route add default via 2002::2
      
      ip -n srv link add tunv6 type ip6tnl mode ip6ip6 local 2002::1 remote 2001::2 hoplimit 64 dev srv_veth
      ip -n srv link set tunv6 up
      ip -n srv addr add 2013::1/64 dev tunv6
      
      ip link set dev rt_srv_veth netns rt
      ip -n rt link set dev rt_srv_veth up
      ip -n rt addr add dev rt_srv_veth 2002::2/64
      
      ip netns exec srv netserver & sleep 0.1
      ip netns exec cl ping6 -c 4 2013::1
      ip netns exec cl netperf -H 2013::1 -t TCP_STREAM -l 3 & sleep 1
      ip -n rt link set dev rt_srv_veth mtu 1400
      wait %2
      
      ip -n cl link del cl_veth
      
      This commit addresses the issue purging all the references held by the
      exception at time, as we currently do for e.g. ipv6 pcpu dst entries.
      
      v1 -> v2:
       - re-order the code to avoid accessing dst and net after dst_dev_put()
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5b51fe8
  5. 22 2月, 2019 2 次提交
  6. 16 2月, 2019 1 次提交
  7. 20 1月, 2019 1 次提交
  8. 17 1月, 2019 1 次提交
  9. 03 1月, 2019 1 次提交
  10. 28 12月, 2018 1 次提交
  11. 19 11月, 2018 1 次提交
  12. 17 11月, 2018 1 次提交
    • X
      ipv6: fix a dst leak when removing its exception · 761f6026
      Xin Long 提交于
      These is no need to hold dst before calling rt6_remove_exception_rt().
      The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(),
      which has been removed in Commit 93531c67 ("net/ipv6: separate
      handling of FIB entries from dst based routes"). Otherwise, it will
      cause a dst leak.
      
      This patch is to simply remove the dst_hold_safe() call before calling
      rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt().
      It's safe, because the removal of the exception that holds its dst's
      refcnt is protected by rt6_exception_lock.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Fixes: 23fb93a4 ("net/ipv6: Cleanup exception and cache route handling")
      Reported-by: NLi Shuang <shuali@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      761f6026
  13. 07 11月, 2018 1 次提交
  14. 25 10月, 2018 1 次提交
    • D
      net/ipv6: Allow onlink routes to have a device mismatch if it is the default route · 4ed591c8
      David Ahern 提交于
      The intent of ip6_route_check_nh_onlink is to make sure the gateway
      given for an onlink route is not actually on a connected route for
      a different interface (e.g., 2001:db8:1::/64 is on dev eth1 and then
      an onlink route has a via 2001:db8:1::1 dev eth2). If the gateway
      lookup hits the default route then it most likely will be a different
      interface than the onlink route which is ok.
      
      Update ip6_route_check_nh_onlink to disregard the device mismatch
      if the gateway lookup hits the default route. Turns out the existing
      onlink tests are passing because there is no default route or it is
      an unreachable default, so update the onlink tests to have a default
      route other than unreachable.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ed591c8
  15. 16 10月, 2018 2 次提交
  16. 13 10月, 2018 1 次提交
    • D
      net/ipv6: Add knob to skip DELROUTE message on device down · 7c6bb7d2
      David Ahern 提交于
      Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE
      notifications when a device is taken down (admin down) or deleted. IPv4
      does not generate a message for routes evicted by the down or delete;
      IPv6 does. A NOS at scale really needs to avoid these messages and have
      IPv4 and IPv6 behave similarly, relying on userspace to handle link
      notifications and evict the routes.
      
      At this point existing user behavior needs to be preserved. Since
      notifications are a global action (not per app) the only way to preserve
      existing behavior and allow the messages to be skipped is to add a new
      sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to
      disable the notificatioons.
      
      IPv6 route code already supports the option to skip the message (it is
      used for multipath routes for example). Besides the new sysctl we need
      to pass the skip_notify setting through the generic fib6_clean and
      fib6_walk functions to fib6_clean_node and to set skip_notify on calls
      to __ip_del_rt for the addrconf_ifdown path.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c6bb7d2
  17. 11 10月, 2018 1 次提交
  18. 09 10月, 2018 1 次提交
  19. 06 10月, 2018 1 次提交
  20. 05 10月, 2018 3 次提交
  21. 03 10月, 2018 7 次提交
  22. 27 9月, 2018 1 次提交
  23. 20 9月, 2018 1 次提交
  24. 19 9月, 2018 2 次提交
    • W
      ipv6: fix memory leak on dst->_metrics · ce7ea4af
      Wei Wang 提交于
      When dst->_metrics and f6i->fib6_metrics share the same memory, both
      take reference count on the dst_metrics structure. However, when dst is
      destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic()
      which does not take care of READONLY metrics and does not release refcnt.
      This causes memory leak.
      Similar to ipv4 logic, the fix is to properly release refcnt and free
      the memory space pointed by dst->_metrics if refcnt becomes 0.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce7ea4af
    • W
      Revert "ipv6: fix double refcount of fib6_metrics" · 86758605
      Wei Wang 提交于
      This reverts commit e70a3aad.
      
      This change causes use-after-free on dst->_metrics.
      The crash trace looks like this:
      [   97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140
      [   97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801
      
      [   97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11
      [   97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018
      [   97.777957] Call Trace:
      [   97.777971]  [<ffffffff895709db>] dump_stack+0x4d/0x72
      [   97.777985]  [<ffffffff881651df>] print_address_description+0x6f/0x260
      [   97.777997]  [<ffffffff88165747>] kasan_report+0x257/0x370
      [   97.778001]  [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140
      [   97.778004]  [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20
      [   97.778008]  [<ffffffff894488e6>] ip6_mtu+0x116/0x140
      [   97.778013]  [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280
      [   97.778016]  [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0
      [   97.778022]  [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0
      [   97.778037]  [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0
      [   97.778040]  [<ffffffff881643b1>] ? save_stack+0xb1/0xd0
      [   97.778046]  [<ffffffff89264c82>] tcp_send_mss+0x22/0x220
      [   97.778059]  [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0
      [   97.778062]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
      [   97.778066]  [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60
      [   97.778070]  [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280
      [   97.778075]  [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430
      [   97.778078]  [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0
      [   97.778082]  [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140
      [   97.778085]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
      [   97.778088]  [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50
      [   97.778092]  [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50
      [   97.778098]  [<ffffffff89352d43>] inet_sendmsg+0x103/0x480
      [   97.778102]  [<ffffffff89352c40>] ? inet_gso_segment+0x15b0/0x15b0
      [   97.778105]  [<ffffffff890294da>] sock_sendmsg+0xba/0xf0
      [   97.778108]  [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0
      [   97.778113]  [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0
      [   97.778116]  [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0
      [   97.778119]  [<ffffffff881646d1>] ? memset+0x31/0x40
      [   97.778123]  [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380
      [   97.778127]  [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250
      [   97.778130]  [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180
      [   97.778133]  [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200
      [   97.778137]  [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0
      [   97.778141]  [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190
      [   97.778144]  [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190
      [   97.778147]  [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20
      [   97.778152]  [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0
      [   97.778155]  [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190
      [   97.778158]  [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20
      [   97.778162]  [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430
      [   97.778166]  [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0
      [   97.778171]  [<ffffffff8960131f>] ? page_fault+0x2f/0x50
      [   97.778174]  [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [   97.778177] RIP: 0033:0x7f83fa36000d
      [   97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      [   97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d
      [   97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036
      [   97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168
      [   97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40
      [   97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000
      
      [   97.779684] Allocated by task 5919:
      [   97.783185]  save_stack+0x46/0xd0
      [   97.783187]  kasan_kmalloc+0xad/0xe0
      [   97.783189]  kmem_cache_alloc_trace+0xdf/0x580
      [   97.783190]  ip6_convert_metrics.isra.79+0x7e/0x190
      [   97.783192]  ip6_route_info_create+0x60a/0x2480
      [   97.783193]  ip6_route_add+0x1d/0x80
      [   97.783195]  inet6_rtm_newroute+0xdd/0xf0
      [   97.783198]  rtnetlink_rcv_msg+0x641/0xb10
      [   97.783200]  netlink_rcv_skb+0x27b/0x3e0
      [   97.783202]  rtnetlink_rcv+0x15/0x20
      [   97.783203]  netlink_unicast+0x4be/0x720
      [   97.783204]  netlink_sendmsg+0x7bc/0xbf0
      [   97.783205]  sock_sendmsg+0xba/0xf0
      [   97.783207]  ___sys_sendmsg+0x6ca/0x8e0
      [   97.783208]  __sys_sendmsg+0xe6/0x190
      [   97.783209]  SyS_sendmsg+0x13/0x20
      [   97.783211]  do_syscall_64+0x2ac/0x430
      [   97.783213]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      [   97.784709] Freed by task 0:
      [   97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist
      [   97.794497]  save_stack+0x46/0xd0
      [   97.794499]  kasan_slab_free+0x71/0xc0
      [   97.794500]  kfree+0x7c/0xf0
      [   97.794501]  fib6_info_destroy_rcu+0x24f/0x310
      [   97.794504]  rcu_process_callbacks+0x38b/0x1730
      [   97.794506]  __do_softirq+0x1c8/0x5d0
      Reported-by: NJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86758605
  25. 18 9月, 2018 1 次提交
    • P
      net/ipv6: do not copy dst flags on rt init · 30bfd930
      Peter Oskolkov 提交于
      DST_NOCOUNT in dst_entry::flags tracks whether the entry counts
      toward route cache size (net->ipv6.sysctl.ip6_rt_max_size).
      
      If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented
      in dist_init() and decremented in dst_destroy().
      
      This flag is tied to allocation/deallocation of dst_entry and
      should not be copied from another dst/route. Otherwise it can happen
      that dst_ops::pcpuc_entries counter grows until no new routes can
      be allocated because the counter reached ip6_rt_max_size due to
      DST_NOCOUNT not set and thus no counter decrements on gc-ed routes.
      
      Fixes: 3b6761d1 ("net/ipv6: Move dst flags to booleans in fib entries")
      Cc: David Ahern <dsahern@gmail.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NPeter Oskolkov <posk@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30bfd930
  26. 13 9月, 2018 1 次提交
    • X
      ipv6: use rt6_info members when dst is set in rt6_fill_node · 22d0bd82
      Xin Long 提交于
      In inet6_rtm_getroute, since Commit 93531c67 ("net/ipv6: separate
      handling of FIB entries from dst based routes"), it has used rt->from
      to dump route info instead of rt.
      
      However for some route like cache, some of its information like flags
      or gateway is not the same as that of the 'from' one. It caused 'ip
      route get' to dump the wrong route information.
      
      In Jianlin's testing, the output information even lost the expiration
      time for a pmtu route cache due to the wrong fib6_flags.
      
      So change to use rt6_info members for dst addr, src addr, flags and
      gateway when it tries to dump a route entry without fibmatch set.
      
      v1->v2:
        - not use rt6i_prefsrc.
        - also fix the gw dump issue.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22d0bd82
  27. 11 9月, 2018 1 次提交