1. 14 Jun, 2019 3 commits
  2. 13 Jun, 2019 19 commits
  3. 12 Jun, 2019 17 commits
  4. 11 Jun, 2019 1 commit
    • Merge branch 'net-Enable-nexthop-objects-with-IPv4-and-IPv6-routes' · 48debfd7
      Committed by David S. Miller
      David Ahern says:
      
      ====================
      net: Enable nexthop objects with IPv4 and IPv6 routes
      
      This is the final set of the initial nexthop object work. When I
      started this idea almost 2 years ago, it took 18 seconds to inject
      700k+ IPv4 routes with 1 hop and about 28 seconds for 4-paths. Some
      of that time was due to inefficiencies in 'ip', but most of it was
      kernel side with excessive synchronize_rcu calls in ipv4, and redundant
      processing validating a nexthop spec (device, gateway, encap). Worse,
      the time increased dramatically as the number of legs in the routes
      increased; for example, taking over 72 seconds for 16-path routes.
      
      After this set, with increased dirty memory limits (fib_sync_mem sysctl),
      an improved ip, and nexthop objects, a full internet fib (743,799 routes
      based on a pull in January 2019) can be pushed to the kernel in 4.3
      seconds. Even better, the time to insert is "almost" constant with
      increasing number of paths. The 'almost constant' time is due to
      expanding the nexthop definitions when generating notifications. A
      follow on patch will be sent adding a sysctl that allows an admin to
      avoid the nexthop expansion and truly get constant route insert time
      regardless of the number of paths in a route! (Useful once all programs
      used for a deployment that care about routes understand nexthop objects).
      
      To be clear, 'ip' is used for benchmarking for no other reason than
      'ip -batch' is trivial to use for the tests. FRR, for example, better
      manages nexthops and route changes and the way those are pushed to the
      kernel and thus will have less userspace processing time than 'ip -batch'.
      
      Patches 1-10 iterate over the fib6_nh entries of a nexthop and invoke a
      processing function per fib6_nh. Prior to nexthop objects, a fib6_info referenced
      a single fib6_nh. Multipath routes were added as separate fib6_info for
      each leg of the route and linked as siblings:
      
          f6i -> sibling -> sibling ... -> sibling
           |                                   |
           +--------- multipath route ---------+
      
      With nexthop objects a single fib6_info references an external
      nexthop which may have a series of fib6_nh:
      
           f6i ---> nexthop ---> fib6_nh
                                 ...
                                 fib6_nh
      
      making IPv6 routes similar to IPv4. The side effect is that a single
      fib6_info now indirectly references a series of fib6_nh so the code
      needs to walk each entry and call the local, per-fib6_nh processing
      function.
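
The walk described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: every `sketch_` name below is hypothetical, the structures are stripped-down stand-ins rather than the kernel's definitions, and the stop-on-non-zero convention is assumed for the example.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_fib6_nh {
    int ifindex;                  /* egress device index of this leg */
};

struct sketch_nexthop {
    size_t num_nh;                /* number of legs behind this nexthop */
    struct sketch_fib6_nh *nhs;   /* array of legs */
};

struct sketch_fib6_info {
    struct sketch_nexthop *nh;    /* one route entry, one external nexthop */
};

/* Per-fib6_nh processing function; a non-zero return stops the walk. */
typedef int (*sketch_nh_cb)(struct sketch_fib6_nh *nh, void *arg);

/* Visit every leg reachable from a single route entry. */
int sketch_for_each_fib6_nh(struct sketch_fib6_info *f6i,
                            sketch_nh_cb cb, void *arg)
{
    size_t i;

    for (i = 0; i < f6i->nh->num_nh; i++) {
        int rc = cb(&f6i->nh->nhs[i], arg);

        if (rc)
            return rc;            /* callback asked to stop early */
    }
    return 0;                     /* every leg visited, no match */
}

/* Example callback state: look for a leg that uses a given device. */
struct sketch_match {
    int ifindex;                  /* device index to look for */
    size_t visited;               /* how many legs were inspected */
};

int sketch_match_dev(struct sketch_fib6_nh *nh, void *arg)
{
    struct sketch_match *m = arg;

    m->visited++;
    return nh->ifindex == m->ifindex;   /* 1 on match, 0 otherwise */
}
```

Calling `sketch_for_each_fib6_nh(&f6i, sketch_match_dev, &m)` on a route whose nexthop has three legs returns 1 as soon as a leg matches and 0 if none does, which keeps the per-leg logic local to the callback.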
      
      Patches 11 and 13 wire up use of nexthops with fib entries for IPv4
      and IPv6. With these commits you can actually use nexthops with routes.
      
      Patch 12 is an optimization for IPv4 when using nexthops in the most
      predominant use case (no metrics).
      
      Patch 14 handles replace of a nexthop config.
      
      Patches 15-18 update the pmtu and redirect tests to use both old and
      new routing.
      
      Patches 19 and 20 add new tests for the nexthop infrastructure. The first
      covers a single nexthop used by multiple prefixes to communicate with
      remote hosts. This is on top of the functional tests already committed.
      The second verifies multipath selection.
      
      v4
      - changed return to 'goto out' in patch 9 since the rcu_read_lock is
        held (noticed by Wei)
      
      v3
      - removed found arg in patch 7 and changed rt6_nh_remove_exception_rt
        to return 1 when a match is found for an exception
      
      v2
      - changed ++i to i++ in patches 1 and 14 as noticed by DaveM
      - improved commit message for patch 14 (nexthop replace)
      - removed the skip_fib argument to remove_nexthop; vestige of an
        older design
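
The v4 note about using 'goto out' while the rcu_read_lock is held reflects a common single-exit pattern. The sketch below is a userspace illustration only: the `sketch_` lock helpers are hypothetical counters standing in for a read-side lock, not the kernel RCU API, and the lookup function is invented for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for a read-side lock: a simple nesting counter. */
int sketch_lock_depth;

void sketch_read_lock(void)   { sketch_lock_depth++; }
void sketch_read_unlock(void) { sketch_lock_depth--; }

/* Return 1 if key is present in table, 0 otherwise. */
int sketch_lookup(const int *table, size_t n, int key)
{
    int found = 0;
    size_t i;

    sketch_read_lock();

    if (!table)
        goto out;             /* a bare return here would leave the lock held */

    for (i = 0; i < n; i++) {
        if (table[i] == key) {
            found = 1;
            goto out;         /* early exit still passes through the unlock */
        }
    }

out:
    sketch_read_unlock();
    return found;
}
```

Every path through `sketch_lookup` leaves `sketch_lock_depth` at the value it had on entry, which is the property a direct `return` inside the locked region would break.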
      ====================
      Reviewed-By: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>