1. 08 10月, 2017 13 次提交
    • W
      ipv6: take care of rt6_stats · 81eb8447
      Wei Wang 提交于
      Currently, most of the rt6_stats are not hooked up correctly. As the
      last part of this patch series, hook up all existing rt6_stats and add
      one new stat fib_rt_uncache to indicate the number of routes in the
      uncached list.
      For details of the stats, please refer to the comments added in
      include/net/ip6_fib.h.
      
      Note: fib_rt_alloc and fib_rt_uncache are not guaranteed to be modified
      under a lock. So atomic_t is used for them.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81eb8447
    • W
      ipv6: replace rwlock with rcu and spinlock in fib6_table · 66f5d6ce
      Wei Wang 提交于
      With all the preparation work before, we are now ready to replace rwlock
      with rcu and spinlock in fib6_table.
      That means now all fib6_node in fib6_table are protected by rcu. And
      when freeing fib6_node, call_rcu() is used to wait for the rcu grace
      period before releasing the memory.
      When accessing fib6_node, corresponding rcu APIs need to be used.
      And all previous sessions protected by the write lock will now be
      protected by the spin lock per table.
      All previous sessions protected by read lock will now be protected by
      rcu_read_lock().
      
      A couple of things to note here:
      1. As part of the work of replacing rwlock with rcu, the linked list of
      fn->leaf now has to be rcu protected as well. So both fn->leaf and
      rt->dst.rt6_next are now __rcu tagged and corresponding rcu APIs are
      used when manipulating them.
      
      2. For fn->rr_ptr, first of all, it also needs to be rcu protected now
      and is tagged with __rcu and rcu APIs are used in corresponding places.
      Secondly, fn->rr_ptr is changed in rt6_select() which is a reader
      thread. This makes the issue a bit complicated. We think a valid
      solution for it is to let rt6_select() grab the tb6_lock if it decides
      to change it. As it is not in the normal operation and only happens when
      there is no valid neighbor cache for the route, we think the performance
      impact should be low.
      
      3. fib6_walk_continue() has to be called with tb6_lock held even in the
      route dumping related functions, e.g. inet6_dump_fib(),
      fib6_tables_dump() and ipv6_route_seq_ops. It is because
      fib6_walk_continue() makes modifications to the walker structure, and so
      are fib6_repair_tree() and fib6_del_route(). In order to do proper
      syncing between them, we need to let fib6_walk_continue() hold the lock.
      We may be able to do further improvement on the way we do the tree walk
      to get rid of the need for holding the spin lock. But not for now.
      
      4. When fib6_del_route() removes a route from the tree, we no longer
      mark rt->dst.rt6_next to NULL to make simultaneous reader be able to
      further traverse the list with rcu. However, rt->dst.rt6_next is only
      valid within this same rcu period. No one should access it later.
      
      5. All the operation of atomic_inc(rt->rt6i_ref) is changed to be
      performed before we publish this route (either by linking it to fn->leaf
      or insert it in the list pointed by fn->leaf) just to be safe because as
      soon as we publish the route, some read thread will be able to access it.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66f5d6ce
    • W
      ipv6: add key length check into rt6_select() · 17ecf590
      Wei Wang 提交于
      After rwlock is replaced with rcu and spinlock, fib6_lookup() could
      potentially return an intermediate node if other thread is doing
      fib6_del() on a route which is the only route on the node so that
      fib6_repair_tree() will be called on this node and potentially assigns
      fn->leaf to the its child's fn->leaf.
      
      In order to detect this situation in rt6_select(), we have to check if
      fn->fn_bit is consistent with the key length stored in the route. And
      depending on if the fn is in the subtree or not, the key is either
      rt->rt6i_dst or rt->rt6i_src.
      If any inconsistency is found, that means the node no longer holds valid
      routes in it. So net->ipv6.ip6_null_entry is returned.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17ecf590
    • W
      ipv6: check fn->leaf before it is used · 8d1040e8
      Wei Wang 提交于
      If rwlock is replaced with rcu and spinlock, it is possible that the
      reader thread will see fn->leaf as NULL in the following scenarios:
      1. fib6_add() is in progress and we have already inserted a new node but
      not yet inserted the route.
      2. fib6_del_route() is in progress and we have already set fn->leaf to
      NULL but not yet freed the node because of rcu grace period.
      
      This patch makes sure all the reader threads check fn->leaf first before
      using it. And together with later patch to grab rcu_read_lock() and
      rcu_dereference() fn->leaf, it makes sure reader threads are safe when
      accessing fn->leaf.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d1040e8
    • W
      ipv6: replace dst_hold() with dst_hold_safe() in routing code · d3843fe5
      Wei Wang 提交于
      With rwlock, it is safe to call dst_hold() in the read thread because
      read thread is guaranteed to be separated from write thread.
      However, after we replace rwlock with rcu, it is no longer safe to use
      dst_hold(). A dst might already have been deleted but is waiting for the
      rcu grace period to pass before freeing the memory when a read thread is
      trying to do dst_hold(). This could potentially cause double free issue.
      
      So this commit replaces all dst_hold() with dst_hold_safe() in all read
      thread to avoid this double free issue.
      And in order to make the code more compact, a new function ip6_hold_safe()
      is introduced. It calls dst_hold_safe() first, and if that fails, it will
      either fall back to hold and return net->ipv6.ip6_null_entry or set rt to
      NULL according to the caller's need.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3843fe5
    • W
      ipv6: grab rt->rt6i_ref before allocating pcpu rt · a94b9367
      Wei Wang 提交于
      After rwlock is replaced with rcu and spinlock, ip6_pol_route() will be
      called with only rcu held. That means rt6 route deletion could happen
      simultaneously with rt6_make_pcpu_rt(). This could potentially cause
      memory leak if rt6_release() is called right before rt6_make_pcpu_rt()
      on the same route.
      
      This patch grabs rt->rt6i_ref safely before calling rt6_make_pcpu_rt()
      to make sure rt6_release() will not get triggered while
      rt6_make_pcpu_rt() is in progress. And rt6_release() is called after
      rt6_make_pcpu_rt() is finished.
      
      Note: As we are incrementing rt->rt6i_ref in ip6_pol_route(), there is a
      very slim chance that fib6_purge_rt() will be triggered unnecessarily
      when deleting a route if ip6_pol_route() running on another thread picks
      this route as well and tries to make pcpu cache for it.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a94b9367
    • W
      ipv6: hook up exception table to store dst cache · 2b760fcf
      Wei Wang 提交于
      This commit makes use of the exception hash table implementation to
      store dst caches created by pmtu discovery and ip redirect into the hash
      table under the rt_info and no longer inserts these routes into fib6
      tree.
      This makes the fib6 tree only contain static configured routes and could
      now be protected by rcu instead of a rw lock.
      With this change, in the route lookup related functions, after finding
      the rt6_info with the longest prefix, we also need to search for the
      exception table before doing backtracking.
      In the route delete function, if the route being deleted is not a dst
      cache, deletion of this route also need to flush the whole hash table
      under it. If it is a dst cache, then only delete the cached dst in the
      hash table.
      
      Note: for fib6_walk_continue() function, w->root now is always pointing
      to a root node considering that fib6_prune_clones() is removed from the
      code. So we add a WARN_ON() msg to make sure w->root always points to a
      root node and also removed the update of w->root in fib6_repair_tree().
      This is a prerequisite for later patch because we don't need to make
      w->root as rcu protected when replacing rwlock with RCU.
      Also, we remove all prune related variables as it is no longer used.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b760fcf
    • W
      ipv6: prepare fib6_locate() for exception table · 38fbeeee
      Wei Wang 提交于
      fib6_locate() is used to find the fib6_node according to the passed in
      prefix address key. It currently tries to find the fib6_node with the
      exact match of the passed in key. However, when we move cached routes
      into the exception table, fib6_locate() will fail to find the fib6_node
      for it as the cached routes will be stored in the exception table under
      the fib6_node with the longest prefix match of the cache's dst addr key.
      This commit adds a new parameter to let the caller specify if it needs
      exact match or longest prefix match.
      Right now, all callers still does exact match when calling
      fib6_locate(). It will be changed in later commit where exception table
      is hooked up to store cached routes.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38fbeeee
    • W
      ipv6: prepare fib6_age() for exception table · c757faa8
      Wei Wang 提交于
      If all dst cache entries are stored in the exception table under the
      main route, we have to go through them during fib6_age() when doing
      garbage collecting.
      Introduce a new function rt6_age_exception() which goes through all dst
      entries in the exception table and remove those entries that are expired.
      This function is called in fib6_age() so that all dst caches are also
      garbage collected.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c757faa8
    • W
      ipv6: prepare rt6_clean_tohost() for exception table · b16cb459
      Wei Wang 提交于
      If we move all cached dst into the exception table under the main route,
      current rt6_clean_tohost() will no longer be able to access them.
      This commit makes fib6_clean_tohost() to also go through all cached
      routes in exception table and removes cached gateway routes to the
      passed in gateway.
      This is a preparation in order to move all cached routes into the
      exception table.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b16cb459
    • W
      ipv6: prepare rt6_mtu_change() for exception table · f5bbe7ee
      Wei Wang 提交于
      If we move all cached dst into the exception table under the main route,
      current rt6_mtu_change() will no longer be able to access them.
      This commit makes rt6_mtu_change_route() function to also go through all
      cached routes in the exception table under the main route and do proper
      updates on the mtu.
      This is a preparation in order to move all cached routes into the
      exception table.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5bbe7ee
    • W
      ipv6: prepare fib6_remove_prefsrc() for exception table · 60006a48
      Wei Wang 提交于
      After we move cached dst entries into the exception table under its
      parent route, current fib6_remove_prefsrc() no longer can access them.
      This commit makes fib6_remove_prefsrc() also go through all routes
      in the exception table to remove the pref src.
      This is a preparation patch in order to move all cached dst into the
      exception table.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60006a48
    • W
      ipv6: introduce a hash table to store dst cache · 35732d01
      Wei Wang 提交于
      Add a hash table into struct rt6_info in order to store dst caches
      created by pmtu discovery and ip redirect in ipv6 routing code.
      APIs to add dst cache, delete dst cache, find dst cache and update
      dst cache in the hash table are implemented and will be used in later
      commits.
      This is a preparation work to move all cache routes into the exception
      table instead of getting inserted into the fib6 tree.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35732d01
  2. 29 8月, 2017 2 次提交
    • X
      ipv6: set dst.obsolete when a cached route has expired · 1e2ea8ad
      Xin Long 提交于
      Now it doesn't check for the cached route expiration in ipv6's
      dst_ops->check(), because it trusts dst_gc that would clean the
      cached route up when it's expired.
      
      The problem is in dst_gc, it would clean the cached route only
      when it's refcount is 1. If some other module (like xfrm) keeps
      holding it and the module only release it when dst_ops->check()
      fails.
      
      But without checking for the cached route expiration, .check()
      may always return true. Meanwhile, without releasing the cached
      route, dst_gc couldn't del it. It will cause this cached route
      never to expire.
      
      This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc
      when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK
      in .check.
      
      Note that this is even needed when ipv6 dst_gc timer is removed
      one day. It would set dst.obsolete in .redirect and .update_pmtu
      instead, and check for cached route expiration when getting it,
      just like what ipv4 route does.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e2ea8ad
    • W
      ipv6: fix sparse warning on rt6i_node · 4e587ea7
      Wei Wang 提交于
      Commit c5cff856 adds rcu grace period before freeing fib6_node. This
      generates a new sparse warning on rt->rt6i_node related code:
        net/ipv6/route.c:1394:30: error: incompatible types in comparison
        expression (different address spaces)
        ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
        expression (different address spaces)
      
      This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
      rcu API is used for it.
      After this fix, sparse no longer generates the above warning.
      
      Fixes: c5cff856 ("ipv6: add rcu grace period before freeing fib6_node")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e587ea7
  3. 26 8月, 2017 1 次提交
  4. 25 8月, 2017 3 次提交
  5. 23 8月, 2017 1 次提交
  6. 22 8月, 2017 1 次提交
    • D
      net: ipv6: put host and anycast routes on device with address · 4832c30d
      David Ahern 提交于
      One nagging difference between ipv4 and ipv6 is host routes for ipv6
      addresses are installed using the loopback device or VRF / L3 Master
      device. e.g.,
      
          2001:db8:1::/120 dev veth0 proto kernel metric 256 pref medium
          local 2001:db8:1::1 dev lo table local proto kernel metric 0 pref medium
      
      Using the loopback device is convenient -- necessary for local tx, but
      has some nasty side effects, most notably setting the 'lo' device down
      causes all host routes for all local IPv6 address to be removed from the
      FIB and completely breaks IPv6 networking across all interfaces.
      
      This patch puts FIB entries for IPv6 routes against the device. This
      simplifies the routes in the FIB, for example by making dst->dev and
      rt6i_idev->dev the same (a future patch can look at removing the device
      reference taken for rt6i_idev for FIB entries).
      
      When copies are made on FIB lookups, the cloned route has dst->dev
      set to loopback (or the L3 master device). This is needed for the
      local Tx of packets to local addresses.
      
      With fib entries allocated against the real network device, the addrconf
      code that reinserts host routes on admin up of 'lo' is no longer needed.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4832c30d
  7. 19 8月, 2017 1 次提交
    • A
      ipv6: fix false-postive maybe-uninitialized warning · 401481e0
      Arnd Bergmann 提交于
      Adding a lock around one of the assignments prevents gcc from
      tracking the state of the local 'fibmatch' variable, so it can no
      longer prove that 'dst' is always initialized, leading to a bogus
      warning:
      
      net/ipv6/route.c: In function 'inet6_rtm_getroute':
      net/ipv6/route.c:3659:2: error: 'dst' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      This moves the other assignment into the same lock to shut up the
      warning.
      
      Fixes: 121622db ("ipv6: route: make rtm_getroute not assume rtnl is locked")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      401481e0
  8. 16 8月, 2017 4 次提交
    • F
      e3a22b7f
    • F
      ipv6: route: make rtm_getroute not assume rtnl is locked · 121622db
      Florian Westphal 提交于
      __dev_get_by_index assumes RTNL is held, use _rcu version instead.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      121622db
    • E
      ipv6: fix NULL dereference in ip6_route_dev_notify() · 12d94a80
      Eric Dumazet 提交于
      Based on a syzkaller report [1], I found that a per cpu allocation
      failure in snmp6_alloc_dev() would then lead to NULL dereference in
      ip6_route_dev_notify().
      
      It seems this is a very old bug, thus no Fixes tag in this submission.
      
      Let's add in6_dev_put_clear() helper, as we will probably use
      it elsewhere (once available/present in net-next)
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff88019f456680 task.stack: ffff8801c6e58000
      RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline]
      RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
      RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178
      RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202
      RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000
      RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001
      RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37
      R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8
      FS:  00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0
      DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
       in6_dev_put include/net/addrconf.h:335 [inline]
       ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678
       call_netdevice_notifiers net/core/dev.c:1694 [inline]
       rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107
       rollback_registered+0x1be/0x3c0 net/core/dev.c:7149
       register_netdevice+0xbcd/0xee0 net/core/dev.c:7587
       register_netdev+0x1a/0x30 net/core/dev.c:7669
       loopback_net_init+0x76/0x160 drivers/net/loopback.c:214
       ops_init+0x10a/0x570 net/core/net_namespace.c:118
       setup_net+0x313/0x710 net/core/net_namespace.c:294
       copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418
       create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
       SYSC_unshare kernel/fork.c:2347 [inline]
       SyS_unshare+0x653/0xfa0 kernel/fork.c:2297
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512c9
      RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110
      RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d
      R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd
      Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3
      f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6
      0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
      RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP:
      ffff8801c6e5f1b0
      ---[ end trace e441d046c6410d31 ]---
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12d94a80
    • I
      ipv6: fib: Provide offload indication using nexthop flags · fe400799
      Ido Schimmel 提交于
      IPv6 routes currently lack nexthop flags as in IPv4. This has several
      implications.
      
      In the forwarding path, it requires us to check the carrier state of the
      nexthop device and potentially ignore a linkdown route, instead of
      checking for RTNH_F_LINKDOWN.
      
      It also requires capable drivers to use the user facing IPv6-specific
      route flags to provide offload indication, instead of using the nexthop
      flags as in IPv4.
      
      Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
      provide offload indication instead of the RTF_OFFLOAD flag, which is
      removed while it's still not part of any official kernel release.
      
      In the near future we would like to use the field for the
      RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
      not be ready in time for the current cycle.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe400799
  9. 15 8月, 2017 1 次提交
  10. 10 8月, 2017 1 次提交
  11. 09 8月, 2017 1 次提交
    • V
      net: ipv6: avoid overhead when no custom FIB rules are installed · feca7d8c
      Vincent Bernat 提交于
      If the user hasn't installed any custom rules, don't go through the
      whole FIB rules layer. This is pretty similar to f4530fa5 (ipv4:
      Avoid overhead when no custom FIB rules are installed).
      
      Using a micro-benchmark module [1], timing ip6_route_output() with
      get_cycles(), with 40,000 routes in the main routing table, before this
      patch:
      
          min=606 max=12911 count=627 average=1959 95th=4903 90th=3747 50th=1602 mad=821
          table=254 avgdepth=21.8 maxdepth=39
          value │                         ┊                            count
            600 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒                                         199
            880 │▒▒▒░░░░░░░░░░░░░░░░                                      43
           1160 │▒▒▒░░░░░░░░░░░░░░░░░░░░                                  48
           1440 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░                               43
           1720 │▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░                          59
           2000 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                      50
           2280 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                    26
           2560 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                  31
           2840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░               28
           3120 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              17
           3400 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             17
           3680 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             8
           3960 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           11
           4240 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░            6
           4520 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           6
           4800 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           9
      
      After:
      
          min=544 max=11687 count=627 average=1776 95th=4546 90th=3585 50th=1227 mad=565
          table=254 avgdepth=21.8 maxdepth=39
          value │                         ┊                            count
            540 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒                                        201
            800 │▒▒▒▒▒░░░░░░░░░░░░░░░░                                    63
           1060 │▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░                               68
           1320 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░                            39
           1580 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                         32
           1840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                       32
           2100 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                    34
           2360 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                 33
           2620 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░               26
           2880 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              22
           3140 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              9
           3400 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             8
           3660 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             9
           3920 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░            8
           4180 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           8
           4440 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           8
      
      At the frequency of the host during the bench (~ 3.7 GHz), this is
      about a 100 ns difference on the median value.
      
      A next step would be to collapse local and main tables, as in
      0ddcf43d (ipv4: FIB Local/MAIN table collapse).
      
      [1]: https://github.com/vincentbernat/network-lab/blob/master/lab-routes-ipv6/kbench_mod.cSigned-off-by: NVincent Bernat <vincent@bernat.im>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feca7d8c
  12. 04 8月, 2017 2 次提交
    • I
      ipv6: fib: Add offload indication to routes · 61e4d01e
      Ido Schimmel 提交于
      Allow user space applications to see which routes are offloaded and
      which aren't by setting the RTNH_F_OFFLOAD flag when dumping them.
      
      To be consistent with IPv4, offload indication is provided on a
      per-nexthop basis.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61e4d01e
    • X
      ipv6: set rt6i_protocol properly in the route when it is installed · b91d5329
      Xin Long 提交于
      After commit c2ed1880 ("net: ipv6: check route protocol when
      deleting routes"), ipv6 route checks rt protocol when trying to
      remove a rt entry.
      
      It introduced a side effect causing 'ip -6 route flush cache' not
      to work well. When flushing caches with iproute, all route caches
      get dumped from kernel then removed one by one by sending DELROUTE
      requests to kernel for each cache.
      
      The thing is iproute sends the request with the cache whose proto
      is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps
      it. But in kernel the rt_cache protocol is still 0, which causes
      the cache not to be matched and removed.
      
      So the real reason is rt6i_protocol in the route is not set when
      it is allocated. As David Ahern's suggestion, this patch is to
      set rt6i_protocol properly in the route when it is installed and
      remove the codes setting rtm_protocol according to rt6i_flags in
      rt6_fill_node.
      
      This is also an improvement to keep rt6i_protocol consistent with
      rtm_protocol.
      
      Fixes: c2ed1880 ("net: ipv6: check route protocol when deleting routes")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Suggested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b91d5329
  13. 06 7月, 2017 1 次提交
  14. 22 6月, 2017 1 次提交
  15. 18 6月, 2017 7 次提交