1. 25 6月, 2019 2 次提交
    • S
      Revert "net/ipv6: Bail early if user only wants cloned entries" · ef11209d
      Stefano Brivio 提交于
      This reverts commit 08e814c9: as we
      are preparing to fix listing and dumping of IPv6 cached routes, we
      need to allow RTM_F_CLONED as a flag to match routes against while
      dumping them.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef11209d
    • S
      fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED · 564c91f7
      Stefano Brivio 提交于
      The following patches add back the ability to dump IPv4 and IPv6 exception
      routes, and we need to allow selection of regular routes or exceptions.
      
      Use RTM_F_CLONED as filter to decide whether to dump routes or exceptions:
      iproute2 passes it in dump requests (except for IPv6 cache flush requests,
      this will be fixed in iproute2) and this used to work as long as
      exceptions were stored directly in the FIB, for both IPv4 and IPv6.
      
      Caveat: if strict checking is not requested (that is, if the dump request
      doesn't go through ip_valid_fib_dump_req()), we can't filter on protocol,
      tables or route types.
      
      In this case, filtering on RTM_F_CLONED would be inconsistent: we would
      fix 'ip route list cache' by returning exception routes and at the same
      time introduce another bug in case another selector is present, e.g. on
      'ip route list cache table main' we would return all exception routes,
      without filtering on tables.
      
      Keep this consistent by applying no filters at all, and dumping both
      routes and exceptions, if strict checking is not requested. iproute2
      currently filters results anyway, and no unwanted results will be
      presented to the user. The kernel will just dump more data than needed.
      
      v7: No changes
      
      v6: Rebase onto net-next, no changes
      
      v5: New patch: add dump_routes and dump_exceptions flags in filter and
          simply clear the unwanted one if strict checking is enabled, don't
          ignore NLM_F_MATCH and don't set filter_set if NLM_F_MATCH is set.
          Skip filtering altogether if no strict checking is requested:
          selecting routes or exceptions only would be inconsistent with the
          fact we can't filter on tables.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      564c91f7
  2. 24 6月, 2019 1 次提交
  3. 19 6月, 2019 2 次提交
  4. 11 6月, 2019 1 次提交
  5. 05 6月, 2019 1 次提交
    • D
      ipv6: Plumb support for nexthop object in a fib6_info · f88d8ea6
      David Ahern 提交于
      Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
      fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info
      referencing a nexthop object can not have 'sibling' entries (the old way
      of doing multipath routes), the nh_list is a union with fib6_siblings.
      
      Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
      using a nexthop instance. Update __remove_nexthop_fib to walk f6_list
      and delete fib entries using the nexthop.
      
      Add a few nexthop helpers for use when a nexthop is added to fib6_info:
      - nexthop_fib6_nh - return first fib6_nh in a nexthop object
      - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
        if the fib6_info references a nexthop object
      - nexthop_path_fib6_result - similar to ipv4, select a path within a
        multipath nexthop object. If the nexthop is a blackhole, set
        fib6_result type to RTN_BLACKHOLE, and set the REJECT flag
      
      Update the fib6_info references to check for nh and take a different path
      as needed:
      - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
        be coalesced with other fib entries into a multipath route
      - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
        a nexthop
      - addrconf (host routes), RA's and info entries (anything configured via
        ndisc) does not use nexthop objects
      - fib6_info_destroy_rcu - put reference to nexthop object
      - fib6_purge_rt - drop fib6_info from f6i_list
      - fib6_select_path - update to use the new nexthop_path_fib6_result when
        fib entry uses a nexthop object
      - rt6_device_match - update to catch use of nexthop object as a blackhole
        and set fib6_type and flags.
      - ip6_route_info_create - don't add space for fib6_nh if fib entry is
        going to reference a nexthop object, take a reference to nexthop object,
        disallow use of source routing
      - rt6_nlmsg_size - add space for RTA_NH_ID
      - add rt6_fill_node_nexthop to add nexthop data on a dump
      
      As with ipv4, most of the changes push existing code into the else branch
      of whether the fib entry uses a nexthop object.
      
      Update the nexthop code to walk f6i_list on a nexthop deleted to remove
      fib entries referencing it.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f88d8ea6
  6. 31 5月, 2019 1 次提交
  7. 25 5月, 2019 4 次提交
    • D
      ipv6: Make fib6_nh optional at the end of fib6_info · 1cf844c7
      David Ahern 提交于
      Move fib6_nh to the end of fib6_info and make it an array of
      size 0. Pass a flag to fib6_info_alloc indicating if the
      allocation needs to add space for a fib6_nh.
      
      The current code path always has a fib6_nh allocated with a
      fib6_info; with nexthop objects they will be separate.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1cf844c7
    • D
      ipv6: Move exception bucket to fib6_nh · cc5c073a
      David Ahern 提交于
      Similar to the pcpu routes exceptions are really per nexthop, so move
      rt6i_exception_bucket from fib6_info to fib6_nh.
      
      To avoid additional increases to the size of fib6_nh for a 1-bit flag,
      use the lowest bit in the allocated memory pointer for the flushed flag.
      Add helpers for retrieving the bucket pointer to mask off the flag.
      
      The cleanup of the exception bucket is moved to fib6_nh_release.
      
      fib6_nh_flush_exceptions can now be called from 2 contexts:
      1. deleting a fib entry
      2. deleting a fib6_nh
      
      For 1., fib6_nh_flush_exceptions is called for a specific fib6_info that
      is getting deleted. All exceptions in the cache using the entry are
      deleted. For 2, the fib6_nh itself is getting destroyed so
      fib6_nh_flush_exceptions is called for a NULL fib6_info which means
      flush all entries.
      
      The pmtu.sh selftest exercises the affected code paths - from creating
      exceptions to cleaning them up on device delete. All tests pass without
      any rcu locking or memleak warnings.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc5c073a
    • D
      ipv6: Refactor fib6_drop_pcpu_from · 7d88d8b5
      David Ahern 提交于
      Move the existing pcpu walk in fib6_drop_pcpu_from to a new
      helper, __fib6_drop_pcpu_from, that can be invoked per fib6_nh with a
      reference to the from entries that need to be evicted. If the passed
      in 'from' is non-NULL then only entries associated with that fib6_info
      are removed (e.g., case where fib entry is deleted); if the 'from' is
      NULL are entries are flushed (e.g., fib6_nh is deleted).
      
      For fib6_info entries with builtin fib6_nh (ie., current code) there
      is no change in behavior.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d88d8b5
    • D
      ipv6: Move pcpu cached routes to fib6_nh · f40b6ae2
      David Ahern 提交于
      rt6_info are specific instances of a fib entry and are tied to a
      device and gateway - ie., a nexthop. Before nexthop objects, IPv6 fib
      entries have separate fib6_info for each nexthop in a multipath route,
      so the location of the pcpu cache in the fib6_info struct worked.
      However, with nexthop objects a fib6_info can point to a set of nexthops
      (yet another alignment of ipv6 with ipv4). Accordingly, the pcpu
      cache needs to be moved to the fib6_nh struct so the cached entries
      are local to the nexthop specification used to create the rt6_info.
      
      Initialization and free of the pcpu entries moved to fib6_nh_init and
      fib6_nh_release.
      
      Change in location only, from fib6_info down to fib6_nh; no other
      functional change intended.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f40b6ae2
  8. 23 5月, 2019 2 次提交
  9. 17 5月, 2019 1 次提交
    • E
      ipv6: prevent possible fib6 leaks · 61fb0d01
      Eric Dumazet 提交于
      At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
      for finding all percpu routes and set their ->from pointer
      to NULL, so that fib6_ref can reach its expected value (1).
      
      The problem right now is that other cpus can still catch the
      route being deleted, since there is no rcu grace period
      between the route deletion and call to fib6_drop_pcpu_from()
      
      This can leak the fib6 and associated resources, since no
      notifier will take care of removing the last reference(s).
      
      I decided to add another boolean (fib6_destroying) instead
      of reusing/renaming exception_bucket_flushed to ease stable backports,
      and properly document the memory barriers used to implement this fix.
      
      This patch has been co-developped with Wei Wang.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Martin Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61fb0d01
  10. 01 5月, 2019 1 次提交
    • E
      ipv6: fix races in ip6_dst_destroy() · 0e233874
      Eric Dumazet 提交于
      We had many syzbot reports that seem to be caused by use-after-free
      of struct fib6_info.
      
      ip6_dst_destroy(), fib6_drop_pcpu_from() and rt6_remove_exception()
      are writers vs rt->from, and use non consistent synchronization among
      themselves.
      
      Switching to xchg() will solve the issues with no possible
      lockdep issues.
      
      BUG: KASAN: user-memory-access in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:294 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:292 [inline]
      BUG: KASAN: user-memory-access in fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
      BUG: KASAN: user-memory-access in fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
      Write of size 4 at addr 0000000000ffffb4 by task syz-executor.1/7649
      
      CPU: 0 PID: 7649 Comm: syz-executor.1 Not tainted 5.1.0-rc6+ #183
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       kasan_report.cold+0x5/0x40 mm/kasan/report.c:321
       check_memory_region_inline mm/kasan/generic.c:185 [inline]
       check_memory_region+0x123/0x190 mm/kasan/generic.c:191
       kasan_check_write+0x14/0x20 mm/kasan/common.c:108
       atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
       fib6_info_release include/net/ip6_fib.h:294 [inline]
       fib6_info_release include/net/ip6_fib.h:292 [inline]
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
       fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
       fib6_del_route net/ipv6/ip6_fib.c:1813 [inline]
       fib6_del+0xac2/0x10a0 net/ipv6/ip6_fib.c:1844
       fib6_clean_node+0x3a8/0x590 net/ipv6/ip6_fib.c:2006
       fib6_walk_continue+0x495/0x900 net/ipv6/ip6_fib.c:1928
       fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:1976
       fib6_clean_tree+0xe0/0x120 net/ipv6/ip6_fib.c:2055
       __fib6_clean_all+0x118/0x2a0 net/ipv6/ip6_fib.c:2071
       fib6_clean_all+0x2b/0x40 net/ipv6/ip6_fib.c:2082
       rt6_sync_down_dev+0x134/0x150 net/ipv6/route.c:4057
       rt6_disable_ip+0x27/0x5f0 net/ipv6/route.c:4062
       addrconf_ifdown+0xa2/0x1220 net/ipv6/addrconf.c:3705
       addrconf_notify+0x19a/0x2260 net/ipv6/addrconf.c:3630
       notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1753
       call_netdevice_notifiers_extack net/core/dev.c:1765 [inline]
       call_netdevice_notifiers net/core/dev.c:1779 [inline]
       dev_close_many+0x33f/0x6f0 net/core/dev.c:1522
       rollback_registered_many+0x43b/0xfd0 net/core/dev.c:8177
       rollback_registered+0x109/0x1d0 net/core/dev.c:8242
       unregister_netdevice_queue net/core/dev.c:9289 [inline]
       unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9282
       unregister_netdevice include/linux/netdevice.h:2658 [inline]
       __tun_detach+0xd5b/0x1000 drivers/net/tun.c:727
       tun_detach drivers/net/tun.c:744 [inline]
       tun_chr_close+0xe0/0x180 drivers/net/tun.c:3443
       __fput+0x2e5/0x8d0 fs/file_table.c:278
       ____fput+0x16/0x20 fs/file_table.c:309
       task_work_run+0x14a/0x1c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x90a/0x2fa0 kernel/exit.c:876
       do_group_exit+0x135/0x370 kernel/exit.c:980
       __do_sys_exit_group kernel/exit.c:991 [inline]
       __se_sys_exit_group kernel/exit.c:989 [inline]
       __x64_sys_exit_group+0x44/0x50 kernel/exit.c:989
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458da9
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffeafc2a6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 000000000000001c RCX: 0000000000458da9
      RDX: 0000000000412a80 RSI: 0000000000a54ef0 RDI: 0000000000000043
      RBP: 00000000004be552 R08: 000000000000000c R09: 000000000004c0d1
      R10: 0000000002341940 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00007ffeafc2a7f0 R14: 000000000004c065 R15: 00007ffeafc2a800
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e233874
  11. 24 4月, 2019 3 次提交
  12. 18 4月, 2019 1 次提交
  13. 09 4月, 2019 1 次提交
  14. 30 3月, 2019 3 次提交
  15. 03 1月, 2019 1 次提交
  16. 06 11月, 2018 1 次提交
  17. 25 10月, 2018 1 次提交
  18. 16 10月, 2018 4 次提交
  19. 13 10月, 2018 1 次提交
    • D
      net/ipv6: Add knob to skip DELROUTE message on device down · 7c6bb7d2
      David Ahern 提交于
      Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE
      notifications when a device is taken down (admin down) or deleted. IPv4
      does not generate a message for routes evicted by the down or delete;
      IPv6 does. A NOS at scale really needs to avoid these messages and have
      IPv4 and IPv6 behave similarly, relying on userspace to handle link
      notifications and evict the routes.
      
      At this point existing user behavior needs to be preserved. Since
      notifications are a global action (not per app) the only way to preserve
      existing behavior and allow the messages to be skipped is to add a new
      sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to
      disable the notificatioons.
      
      IPv6 route code already supports the option to skip the message (it is
      used for multipath routes for example). Besides the new sysctl we need
      to pass the skip_notify setting through the generic fib6_clean and
      fib6_walk functions to fib6_clean_node and to set skip_notify on calls
      to __ip_del_rt for the addrconf_ifdown path.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c6bb7d2
  20. 11 10月, 2018 1 次提交
  21. 09 10月, 2018 1 次提交
    • D
      rtnetlink: Update fib dumps for strict data checking · e8ba330a
      David Ahern 提交于
      Add helper to check netlink message for route dumps. If the strict flag
      is set the dump request is expected to have an rtmsg struct as the header.
      All elements of the struct are expected to be 0 with the exception of
      rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
      can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
      set.
      
      Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
      and ip6mr_rtm_dumproute to call this helper if strict data checking is
      enabled.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8ba330a
  22. 05 10月, 2018 2 次提交
  23. 03 9月, 2018 1 次提交
    • D
      net/ipv6: Only update MTU metric if it set · 15a81b41
      David Ahern 提交于
      Jan reported a regression after an update to 4.18.5. In this case ipv6
      default route is setup by systemd-networkd based on data from an RA. The
      RA contains an MTU of 1492 which is used when the route is first inserted
      but then systemd-networkd pushes down updates to the default route
      without the mtu set.
      
      Prior to the change to fib6_info, metrics such as MTU were held in the
      dst_entry and rt6i_pmtu in rt6_info contained an update to the mtu if
      any. ip6_mtu would look at rt6i_pmtu first and use it if set. If not,
      the value from the metrics is used if it is set and finally falling
      back to the idev value.
      
      After the fib6_info change metrics are contained in the fib6_info struct
      and there is no equivalent to rt6i_pmtu. To maintain consistency with
      the old behavior the new code should only reset the MTU in the metrics
      if the route update has it set.
      
      Fixes: d4ead6b3 ("net/ipv6: move metrics from dst to rt6_info")
      Reported-by: NJan Janssen <medhefgo@web.de>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15a81b41
  24. 21 8月, 2018 1 次提交
  25. 02 8月, 2018 1 次提交
  26. 31 7月, 2018 1 次提交