1. 18 4月, 2018 1 次提交
  2. 05 3月, 2018 1 次提交
  3. 01 3月, 2018 1 次提交
  4. 11 1月, 2018 2 次提交
  5. 08 1月, 2018 2 次提交
    • I
      ipv6: Export sernum update function · 4a8e56ee
      Ido Schimmel 提交于
      We are going to allow dead routes to stay in the FIB tree (e.g., when
      they are part of a multipath route, directly connected route with no
      carrier) and revive them when their nexthop device gains carrier or when
      it is put administratively up.
      
      This is equivalent to the addition of the route to the FIB tree and we
      should therefore take care of updating the sernum of all the parent
      nodes of the node where the route is stored. Otherwise, we risk sockets
      caching and using sub-optimal dst entries.
      
      Export the function that performs the above, so that it could be invoked
      from fib6_ifup() later on.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a8e56ee
    • I
      ipv6: Add explicit flush indication to routes · a2c554d3
      Ido Schimmel 提交于
      When routes that are a part of a multipath route are evaluated by
      fib6_ifdown() in response to NETDEV_DOWN and NETDEV_UNREGISTER events
      the state of their sibling routes is not considered.
      
      This will change in subsequent patches in order to align IPv6 with
      IPv4's behavior. For example, when the last sibling in a multipath route
      becomes dead, the entire multipath route needs to be removed.
      
      To prevent the tree walker from re-evaluating all the sibling routes
      each time, we can simply evaluate them once - when the first sibling is
      traversed.
      
      If we determine the entire multipath route needs to be removed, then the
      'should_flush' bit is set in all the siblings, which will cause the
      walker to flush them when it traverses them.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2c554d3
  6. 30 11月, 2017 2 次提交
  7. 08 10月, 2017 8 次提交
    • W
      ipv6: take care of rt6_stats · 81eb8447
      Wei Wang 提交于
      Currently, most of the rt6_stats are not hooked up correctly. As the
      last part of this patch series, hook up all existing rt6_stats and add
      one new stat fib_rt_uncache to indicate the number of routes in the
      uncached list.
      For details of the stats, please refer to the comments added in
      include/net/ip6_fib.h.
      
      Note: fib_rt_alloc and fib_rt_uncache are not guaranteed to be modified
      under a lock. So atomic_t is used for them.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81eb8447
    • W
      ipv6: replace rwlock with rcu and spinlock in fib6_table · 66f5d6ce
      Wei Wang 提交于
      With all the preparation work before, we are now ready to replace rwlock
      with rcu and spinlock in fib6_table.
      That means now all fib6_node in fib6_table are protected by rcu. And
      when freeing fib6_node, call_rcu() is used to wait for the rcu grace
      period before releasing the memory.
      When accessing fib6_node, corresponding rcu APIs need to be used.
      And all previous sessions protected by the write lock will now be
      protected by the spin lock per table.
      All previous sessions protected by read lock will now be protected by
      rcu_read_lock().
      
      A couple of things to note here:
      1. As part of the work of replacing rwlock with rcu, the linked list of
      fn->leaf now has to be rcu protected as well. So both fn->leaf and
      rt->dst.rt6_next are now __rcu tagged and corresponding rcu APIs are
      used when manipulating them.
      
      2. For fn->rr_ptr, first of all, it also needs to be rcu protected now
      and is tagged with __rcu and rcu APIs are used in corresponding places.
      Secondly, fn->rr_ptr is changed in rt6_select() which is a reader
      thread. This makes the issue a bit complicated. We think a valid
      solution for it is to let rt6_select() grab the tb6_lock if it decides
      to change it. As it is not in the normal operation and only happens when
      there is no valid neighbor cache for the route, we think the performance
      impact should be low.
      
      3. fib6_walk_continue() has to be called with tb6_lock held even in the
      route dumping related functions, e.g. inet6_dump_fib(),
      fib6_tables_dump() and ipv6_route_seq_ops. It is because
      fib6_walk_continue() makes modifications to the walker structure, and so
      are fib6_repair_tree() and fib6_del_route(). In order to do proper
      syncing between them, we need to let fib6_walk_continue() hold the lock.
      We may be able to do further improvement on the way we do the tree walk
      to get rid of the need for holding the spin lock. But not for now.
      
      4. When fib6_del_route() removes a route from the tree, we no longer
      mark rt->dst.rt6_next to NULL to make simultaneous reader be able to
      further traverse the list with rcu. However, rt->dst.rt6_next is only
      valid within this same rcu period. No one should access it later.
      
      5. All the operation of atomic_inc(rt->rt6i_ref) is changed to be
      performed before we publish this route (either by linking it to fn->leaf
      or insert it in the list pointed by fn->leaf) just to be safe because as
      soon as we publish the route, some read thread will be able to access it.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66f5d6ce
    • W
      ipv6: update fn_sernum after route is inserted to tree · bbd63f06
      Wei Wang 提交于
      fib6_add() logic currently calls fib6_add_1() to figure out what node
      should be used for the newly added route and then call
      fib6_add_rt2node() to insert the route to the node.
      And during the call of fib6_add_1(), fn_sernum is updated for all nodes
      that share the same prefix as the new route.
      This does not have issue in the current code because reader thread will
      not be able to access the tree while writer thread is inserting new
      route to it. However, it is not the case once we transition to use RCU.
      Reader thread could potentially see the new fn_sernum before the new
      route is inserted. As a result, reader thread's route lookup will return
      a stale route with the new fn_sernum.
      
      In order to solve this issue, we remove all the update of fn_sernum in
      fib6_add_1(), and instead, introduce a new function that updates fn_sernum
      for all related nodes and call this functions once the route is
      successfully inserted to the tree.
      Also, smp_wmb() is used after a route is successfully inserted into the
      fib tree and right before the updated of fn->sernum. And smp_rmb() is
      used right after fn->sernum is accessed in rt6_get_cookie_safe(). This
      is to guarantee that when the reader thread sees the new fn->sernum, the
      new route is already inserted in the tree in memory.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbd63f06
    • W
      ipv6: hook up exception table to store dst cache · 2b760fcf
      Wei Wang 提交于
      This commit makes use of the exception hash table implementation to
      store dst caches created by pmtu discovery and ip redirect into the hash
      table under the rt_info and no longer inserts these routes into fib6
      tree.
      This makes the fib6 tree only contain static configured routes and could
      now be protected by rcu instead of a rw lock.
      With this change, in the route lookup related functions, after finding
      the rt6_info with the longest prefix, we also need to search for the
      exception table before doing backtracking.
      In the route delete function, if the route being deleted is not a dst
      cache, deletion of this route also need to flush the whole hash table
      under it. If it is a dst cache, then only delete the cached dst in the
      hash table.
      
      Note: for fib6_walk_continue() function, w->root now is always pointing
      to a root node considering that fib6_prune_clones() is removed from the
      code. So we add a WARN_ON() msg to make sure w->root always points to a
      root node and also removed the update of w->root in fib6_repair_tree().
      This is a prerequisite for later patch because we don't need to make
      w->root as rcu protected when replacing rwlock with RCU.
      Also, we remove all prune related variables as it is no longer used.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b760fcf
    • W
      ipv6: prepare fib6_locate() for exception table · 38fbeeee
      Wei Wang 提交于
      fib6_locate() is used to find the fib6_node according to the passed in
      prefix address key. It currently tries to find the fib6_node with the
      exact match of the passed in key. However, when we move cached routes
      into the exception table, fib6_locate() will fail to find the fib6_node
      for it as the cached routes will be stored in the exception table under
      the fib6_node with the longest prefix match of the cache's dst addr key.
      This commit adds a new parameter to let the caller specify if it needs
      exact match or longest prefix match.
      Right now, all callers still does exact match when calling
      fib6_locate(). It will be changed in later commit where exception table
      is hooked up to store cached routes.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38fbeeee
    • W
      ipv6: prepare fib6_age() for exception table · c757faa8
      Wei Wang 提交于
      If all dst cache entries are stored in the exception table under the
      main route, we have to go through them during fib6_age() when doing
      garbage collecting.
      Introduce a new function rt6_age_exception() which goes through all dst
      entries in the exception table and remove those entries that are expired.
      This function is called in fib6_age() so that all dst caches are also
      garbage collected.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c757faa8
    • W
      ipv6: introduce a hash table to store dst cache · 35732d01
      Wei Wang 提交于
      Add a hash table into struct rt6_info in order to store dst caches
      created by pmtu discovery and ip redirect in ipv6 routing code.
      APIs to add dst cache, delete dst cache, find dst cache and update
      dst cache in the hash table are implemented and will be used in later
      commits.
      This is a preparation work to move all cache routes into the exception
      table instead of getting inserted into the fib6 tree.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35732d01
    • W
      ipv6: introduce a new function fib6_update_sernum() · 180ca444
      Wei Wang 提交于
      This function takes a route as input and tries to update the sernum in
      the fib6_node this route is associated with. It will be used in later
      commit when adding a cached route into the exception table under that
      route.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      180ca444
  8. 29 8月, 2017 1 次提交
  9. 23 8月, 2017 1 次提交
  10. 16 8月, 2017 1 次提交
    • I
      ipv6: fib: Provide offload indication using nexthop flags · fe400799
      Ido Schimmel 提交于
      IPv6 routes currently lack nexthop flags as in IPv4. This has several
      implications.
      
      In the forwarding path, it requires us to check the carrier state of the
      nexthop device and potentially ignore a linkdown route, instead of
      checking for RTNH_F_LINKDOWN.
      
      It also requires capable drivers to use the user facing IPv6-specific
      route flags to provide offload indication, instead of using the nexthop
      flags as in IPv4.
      
      Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
      provide offload indication instead of the RTF_OFFLOAD flag, which is
      removed while it's still not part of any official kernel release.
      
      In the near future we would like to use the field for the
      RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
      not be ready in time for the current cycle.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe400799
  11. 04 8月, 2017 6 次提交
  12. 18 6月, 2017 1 次提交
  13. 23 5月, 2017 1 次提交
  14. 05 2月, 2017 1 次提交
    • D
      net: ipv6: Allow shorthand delete of all nexthops in multipath route · 0ae81335
      David Ahern 提交于
      IPv4 allows multipath routes to be deleted using just the prefix and
      length. For example:
          $ ip ro ls vrf red
          unreachable default metric 8192
          1.1.1.0/24
              nexthop via 10.100.1.254  dev eth1 weight 1
              nexthop via 10.11.200.2  dev eth11.200 weight 1
          10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
          10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
      
          $ ip ro del 1.1.1.0/24 vrf red
      
          $ ip ro ls vrf red
          unreachable default metric 8192
          10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
          10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
      
      The same notation does not work with IPv6 because of how multipath routes
      are implemented for IPv6. For IPv6 only the first nexthop of a multipath
      route is deleted if the request contains only a prefix and length. This
      leads to unnecessary complexity in userspace dealing with IPv6 multipath
      routes.
      
      This patch allows all nexthops to be deleted without specifying each one
      in the delete request. Internally, this is done by walking the sibling
      list of the route matching the specifications given (prefix, length,
      metric, protocol, etc).
      
          $  ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:200::/120 via 2001:db8:1::2 dev eth1 metric 1024  pref medium
          2001:db8:200::/120 via 2001:db8:2::2 dev eth2 metric 1024  pref medium
          ...
      
          $ ip -6 ro del vrf red 2001:db8:200::/120
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          ...
      
      Because IPv6 allows individual nexthops to be deleted without deleting
      the entire route, the ip6_route_multipath_del and non-multipath code
      path (ip6_route_del) have to be discriminated so that all nexthops are
      only deleted for the latter case. This is done by making the existing
      fc_type in fib6_config a u16 and then adding a new u16 field with
      fc_delete_all_nh as the first bit.
      Suggested-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ae81335
  15. 28 10月, 2016 1 次提交
    • D
      net: ipv6: Fix processing of RAs in presence of VRF · 830218c1
      David Ahern 提交于
      rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB
      table from the device index, but the corresponding rt6_get_route_info
      and rt6_get_dflt_router functions were not leading to the failure to
      process RA's:
      
          ICMPv6: RA: ndisc_router_discovery failed to add default route
      
      Fix the 'get' functions by using the table id associated with the
      device when applicable.
      
      Also, now that default routes can be added to tables other than the
      default table, rt6_purge_dflt_routers needs to be updated as well to
      look at all tables. To handle that efficiently, add a flag to the table
      denoting if it is has a default route via RA.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      830218c1
  16. 16 11月, 2015 1 次提交
  17. 18 9月, 2015 1 次提交
  18. 21 8月, 2015 1 次提交
  19. 22 7月, 2015 1 次提交
  20. 26 5月, 2015 4 次提交
  21. 02 5月, 2015 2 次提交
    • M
      ipv6: Remove DST_METRICS_FORCE_OVERWRITE and _rt6i_peer · afc4eef8
      Martin KaFai Lau 提交于
      _rt6i_peer is no longer needed after the last patch,
      'ipv6: Stop rt6_info from using inet_peer's metrics'.
      
      DST_METRICS_FORCE_OVERWRITE is added by
      commit e5fd387a ("ipv6: do not overwrite inetpeer metrics prematurely").
      Since inetpeer is no longer used for metrics, this bit is also not needed.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afc4eef8
    • M
      ipv6: Stop rt6_info from using inet_peer's metrics · 4b32b5ad
      Martin KaFai Lau 提交于
      inet_peer is indexed by the dst address alone.  However, the fib6 tree
      could have multiple routing entries (rt6_info) for the same dst. For
      example,
      1. A /128 dst via multiple gateways.
      2. A RTF_CACHE route cloned from a /128 route.
      
      In the above cases, all of them will share the same metrics and
      step on each other.
      
      This patch will steer away from inet_peer's metrics and use
      dst_cow_metrics_generic() for everything.
      
      Change Highlights:
      1. Remove rt6_cow_metrics() which currently acquires metrics from
         inet_peer for DST_HOST route (i.e. /128 route).
      2. Add rt6i_pmtu to take care of the pmtu update to avoid creating a
         full size metrics just to override the RTAX_MTU.
      3. After (2), the RTF_CACHE route can also share the metrics with its
         dst.from route, by:
         dst_init_metrics(&cache_rt->dst, dst_metrics_ptr(cache_rt->dst.from), true);
      4. Stop creating RTF_CACHE route by cloning another RTF_CACHE route.  Instead,
         directly clone from rt->dst.
      
         [ Currently, cloning from another RTF_CACHE is only possible during
           rt6_do_redirect().  Also, the old clone is removed from the tree
           immediately after the new clone is added. ]
      
         In case of cloning from an older redirect RTF_CACHE, it should work as
         before.
      
         In case of cloning from an older pmtu RTF_CACHE, this patch will forget
         the pmtu and re-learn it (if there is any) from the redirected route.
      
      The _rt6i_peer and DST_METRICS_FORCE_OVERWRITE will be removed
      in the next cleanup patch.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b32b5ad