1. 19 Apr, 2018 32 commits
  2. 18 Apr, 2018 8 commits
    • D
      Merge branch 'ipv6-Separate-data-structures-for-FIB-and-data-path' · 0565de29
      David S. Miller committed
      David Ahern says:
      
      ====================
      net/ipv6: Separate data structures for FIB and data path
      
      IPv6 uses the same data struct for both control plane (FIB entries) and
      data path (dst entries). This struct has elements needed for both paths
      adding memory overhead and complexity (taking a dst hold in most places
      but an additional reference on rt6i_ref in a few). Furthermore, because
      of the dst_alloc tie, all FIB entries are allocated with GFP_ATOMIC.
      
      This patch set separates FIB entries from dst entries, better aligning
      IPv6 code with IPv4, simplifying the reference counting and allowing
      FIB entries added by userspace (not autoconf) to use GFP_KERNEL. It is
      the first step to a number of performance and scalability changes.
      
      The end result of this patch set:
        - FIB entries (fib6_info):
              /* size: 208, cachelines: 4, members: 25 */
              /* sum members: 207, holes: 1, sum holes: 1 */
      
        - dst entries (rt6_info)
             /* size: 240, cachelines: 4, members: 11 */
      
      Versus the single rt6_info struct today for both paths:
            /* size: 320, cachelines: 5, members: 28 */
      
      This amounts to a 35% reduction in memory use for FIB entries and a
      25% reduction for dst entries.
      
      With respect to locking, FIB entries use RCU and a single atomic
      counter with fib6_info_hold and fib6_info_release helpers to manage
      the reference counting. dst entries use only the traditional dst
      refcounts with dst_hold and dst_release.
      
      FIB entries for host routes are referenced by inet6_ifaddr and
      ifacaddr6. In both cases, additional holds are taken -- similar to
      what is done for devices.
      
      This set is the first of many changes to improve the scalability of the
      IPv6 code. Follow on changes include:
      - consolidating duplicate fib6_info references like IPv4 does with
        duplicate fib_info
      
      - moving fib6_info into a slab cache to avoid allocation roundups to
        power of 2 (the 208 size becomes a 256 actual allocation)
      
      - Allow FIB lookups without generating a dst (e.g., most rt6_lookup
        users just want to verify the egress device). Means moving dst
        allocation to the other side of fib6_rule_lookup which again aligns
        with IPv4 behavior
      
      - using separate standalone nexthop objects which have performance
        benefits beyond fib_info consolidation
      
      At this point I am not seeing any refcount leaks or underflows, no
      oops or bug_ons, or warnings from kasan, so I think it is ready for
      others to beat up on it finding errors in code paths I have missed.
      
      v2 changes
      - rebased to top of tree
      - improved commit message on patch 7
      
      v1 changes
      - rebased to top of tree
      - fix memory leak of metrics as noted by Ido
      - MTU fixes based on pmtu tests (thanks Stefano Brivio for writing)
      
      RFC v2 changes
      - improved commit messages
      - move common metrics code from dst.c to net/ipv4/metrics.c (comment
        from DaveM)
      - address comments from Wei Wang and Martin KaFai Lau (let me know if
        I missed something)
      - fixes detected by kernel test robots
        + added fib6_metric_set to change metric on a FIB entry which could
          be pointing to read-only dst_default_metrics
        + 0day testing found a problem with an intermediate patch; added
          dst_hold_safe on rt->from. Code is removed 3 patches later
      - allow cacheinfo to handle NULL dst; means only expires is pushed to
        userspace
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
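The size figures in the cover letter can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `pct_saved` and `roundup_pow2` are hypothetical, the sizes are copied from the pahole output quoted above, and the power-of-2 rounding is a simplification of how a general-purpose allocator picks a size class, as the cover letter describes it.

```c
#include <assert.h>
#include <stddef.h>

/* Struct sizes in bytes, copied from the pahole figures quoted above. */
enum {
    OLD_RT6_INFO_SIZE  = 320, /* single struct used for both paths */
    NEW_FIB6_INFO_SIZE = 208, /* FIB entry after the split */
    NEW_RT6_INFO_SIZE  = 240, /* dst entry after the split */
};

/* Hypothetical helper: percentage saved going from old_size to new_size,
 * truncated to an integer. */
static unsigned int pct_saved(unsigned int old_size, unsigned int new_size)
{
    assert(old_size != 0 && new_size <= old_size);
    return (old_size - new_size) * 100u / old_size;
}

/* Hypothetical helper: round n up to the next power of two, a simplified
 * model of an allocator that only hands out power-of-2 sized objects. */
static size_t roundup_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}
```

With these helpers, `pct_saved(320, 208)` is 35 and `pct_saved(320, 240)` is 25, matching the percentages in the cover letter, and `roundup_pow2(208)` is 256, the actual allocation size mentioned in the slab cache follow-up item.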
    • D
      net/ipv6: Remove unused code and variables for rt6_info · 77634cc6
      David Ahern committed
      Drop unneeded elements from rt6_info struct and rearrange layout to
      something more relevant for the data path.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net/ipv6: Flip FIB entries to fib6_info · 8d1c802b
      David Ahern committed
      Convert all code paths referencing a FIB entry from
      rt6_info to fib6_info.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net/ipv6: separate handling of FIB entries from dst based routes · 93531c67
      David Ahern committed
      Last step before flipping the data type for FIB entries:
      - use fib6_info_alloc to create FIB entries in ip6_route_info_create
        and addrconf_dst_alloc
      - use fib6_info_release in place of dst_release, ip6_rt_put and
        rt6_release
      - remove the dst_hold before calling __ip6_ins_rt or ip6_del_rt
      - when purging routes, drop per-cpu routes
      - replace inc and dec of rt6i_ref with fib6_info_hold and fib6_info_release
      - use rt->from since it points to the FIB entry
      - drop references to exception bucket, fib6_metrics and per-cpu from
        dst entries (those are relevant for fib entries only)
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net/ipv6: introduce fib6_info struct and helpers · a64efe14
      David Ahern committed
      Add fib6_info struct and alloc, destroy, hold and release helpers.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
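The alloc, hold and release pattern named in this commit can be sketched in userspace C. This is not the kernel implementation: `struct demo_fib_entry` and the `demo_entry_*` functions are hypothetical names, C11 atomics stand in for the kernel's atomic counter, and the entry is freed immediately on the last release, whereas the cover letter says FIB entries use RCU, so the real code presumably defers the free.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a FIB entry; only the refcount is modeled. */
struct demo_fib_entry {
    atomic_int ref;      /* single atomic reference counter */
    unsigned int metric; /* placeholder payload */
};

/* Allocate an entry holding one reference owned by the caller. */
static struct demo_fib_entry *demo_entry_alloc(void)
{
    struct demo_fib_entry *e = calloc(1, sizeof(*e));

    if (!e)
        return NULL;
    atomic_init(&e->ref, 1);
    return e;
}

/* Take an additional reference on an entry the caller already holds. */
static void demo_entry_hold(struct demo_fib_entry *e)
{
    atomic_fetch_add(&e->ref, 1);
}

/* Drop one reference. Returns true if this was the last reference and the
 * entry was freed (immediately here; no RCU grace period is modeled). */
static bool demo_entry_release(struct demo_fib_entry *e)
{
    if (e && atomic_fetch_sub(&e->ref, 1) == 1) {
        free(e);
        return true;
    }
    return false;
}
```

A holder such as an interface address would call `demo_entry_hold` when it stores a pointer to the entry and `demo_entry_release` when it drops it; only the final release frees the memory.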
    • D
      net/ipv6: Cleanup exception and cache route handling · 23fb93a4
      David Ahern committed
      IPv6 FIB will only contain FIB entries with exception routes added to
      the FIB entry. Once this transformation is complete, FIB lookups will
      return a fib6_info with the lookup functions still returning a dst
      based rt6_info. The current code uses rt6_info for both paths and
      overloads the rt6_info variable usually called 'rt'.
      
      This patch introduces a new 'f6i' variable name for the result of the FIB
      lookup and keeps 'rt' as the dst based return variable. 'f6i' becomes a
      fib6_info in a later patch which is why it is introduced as f6i now;
      avoids the additional churn in the later patch.
      
      In addition, remove RTF_CACHE and dst checks from fib6 add and delete
      since they can not happen now and will never happen after the data
      type flip.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net/ipv6: Add gfp_flags to route add functions · acb54e3c
      David Ahern committed
      Most FIB entries can be added using memory allocated with GFP_KERNEL.
      Add gfp_flags to ip6_route_add and addrconf_dst_alloc. Code paths that
      can be reached from the packet path (e.g., ndisc and autoconfig) or
      atomic notifiers use GFP_ATOMIC; paths from user context (adding
      addresses and routes) use GFP_KERNEL.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
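The idea of threading an allocation flag through the route add path can be illustrated with a small userspace model. Everything here is an assumption made for illustration: `enum demo_gfp`, `struct demo_route`, `demo_gfp_for_context` and `demo_route_add` are hypothetical, and the flag is only recorded, since userspace `calloc` has no notion of sleeping versus atomic allocation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the two allocation modes named in the commit. */
enum demo_gfp {
    DEMO_GFP_ATOMIC, /* caller must not sleep (packet path, atomic notifier) */
    DEMO_GFP_KERNEL, /* caller may sleep (user context) */
};

/* Hypothetical route object that remembers how it was allocated. */
struct demo_route {
    unsigned int prefix_len;
    enum demo_gfp allocated_with;
};

/* The caller knows its own context, so it picks the flag. */
static enum demo_gfp demo_gfp_for_context(bool may_sleep)
{
    return may_sleep ? DEMO_GFP_KERNEL : DEMO_GFP_ATOMIC;
}

/* The add function takes the flag as a parameter instead of hard-coding
 * the atomic mode for every caller. */
static struct demo_route *demo_route_add(unsigned int prefix_len,
                                         enum demo_gfp gfp)
{
    struct demo_route *rt = calloc(1, sizeof(*rt));

    if (!rt)
        return NULL;
    rt->prefix_len = prefix_len;
    rt->allocated_with = gfp; /* recorded only; calloc ignores it */
    return rt;
}
```

In this model a route added on behalf of a userspace request would pass `demo_gfp_for_context(true)`, while one created while processing a received packet would pass `demo_gfp_for_context(false)`.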
    • D
      net/ipv6: Create a neigh_lookup for FIB entries · f8a1b43b
      David Ahern committed
      The router discovery code has a FIB entry and wants to validate the
      gateway has a neighbor entry. Refactor the existing dst_neigh_lookup
      for IPv6 and create a new function that takes the gateway and device
      and returns a neighbor entry. Use the new function in
      ndisc_router_discovery to validate the gateway.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>