1. 22 12月, 2017 1 次提交
    • I
      ipv6: Honor specified parameters in fibmatch lookup · 58acfd71
      Ido Schimmel 提交于
      Currently, parameters such as oif and source address are not taken into
      account during fibmatch lookup. Example (IPv4 for reference) before
      patch:
      
      $ ip -4 route show
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1
      
      $ ip -6 route show
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
      fe80::/64 dev dummy0 proto kernel metric 256 pref medium
      fe80::/64 dev dummy1 proto kernel metric 256 pref medium
      
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy0
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy1
      RTNETLINK answers: No route to host
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      
      After:
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      RTNETLINK answers: Network is unreachable
      
      The problem stems from the fact that the necessary route lookup flags
      are not set based on these parameters.
      
      Instead of duplicating the same logic for fibmatch, we can simply
      resolve the original route from its copy and dump it instead.
      
      Fixes: 18c3a61c ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58acfd71
  2. 17 12月, 2017 1 次提交
    • B
      ipv6: icmp6: Allow icmp messages to be looped back · 588753f1
      Brendan McGrath 提交于
      One example of when an ICMPv6 packet is required to be looped back is
      when a host acts as both a Multicast Listener and a Multicast Router.
      
      A Multicast Router will listen on address ff02::16 for MLDv2 messages.
      
      Currently, MLDv2 messages originating from a Multicast Listener running
      on the same host as the Multicast Router are not being delivered to the
      Multicast Router. This is due to dst.input being assigned the default
      value of dst_discard.
      
      This results in the packet being looped back but discarded before being
      delivered to the Multicast Router.
      
      This patch sets dst.input to ip6_input to ensure a looped back packet
      is delivered to the Multicast Router.
      Signed-off-by: NBrendan McGrath <redmcg@redmandi.dyndns.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      588753f1
  3. 24 11月, 2017 2 次提交
    • D
      net: ipv6: Fixup device for anycast routes during copy · 98d11291
      David Ahern 提交于
      Florian reported a breakage with anycast routes due to commit
      4832c30d ("net: ipv6: put host and anycast routes on device with
      address"). Prior to this commit anycast routes were added against the
      loopback device causing repetitive route entries with no insight into
      why they existed. e.g.:
        $ ip -6 ro ls  table local type anycast
        anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
        anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
        anycast fe80:: dev lo proto kernel metric 0 pref medium
        anycast fe80:: dev lo proto kernel metric 0 pref medium
      
      The point of commit 4832c30d is to add the routes using the device
      with the address which is causing the route to be added. e.g.,:
        $ ip -6 ro ls  table local type anycast
        anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
        anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
        anycast fe80:: dev eth2 proto kernel metric 0 pref medium
        anycast fe80:: dev eth1 proto kernel metric 0 pref medium
      
      For traffic to work as it did before, the dst device needs to be switched
      to the loopback when the copy is created similar to local routes.
      
      Fixes: 4832c30d ("net: ipv6: put host and anycast routes on device with address")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      98d11291
    • I
      ipv6: Do not consider linkdown nexthops during multipath · bbfcd776
      Ido Schimmel 提交于
      When the 'ignore_routes_with_linkdown' sysctl is set, we should not
      consider linkdown nexthops during route lookup.
      
      While the code correctly verifies that the initially selected route
      ('match') has a carrier, it does not perform the same check in the
      subsequent multipath selection, resulting in a potential packet loss.
      
      In case the chosen route does not have a carrier and the sysctl is set,
      choose the initially selected route.
      
      Fixes: 35103d11 ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NAndy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbfcd776
  4. 15 11月, 2017 1 次提交
  5. 29 10月, 2017 1 次提交
    • W
      ipv6: prevent user from adding cached routes · 2ea2352e
      Wei Wang 提交于
      Cached routes should only be created by the system when receiving pmtu
      discovery or ip redirect msg. Users should not be allowed to create
      cached routes.
      
      Furthermore, after the patch series to move cached routes into exception
      table, user added cached routes will trigger the following warning in
      fib6_add():
      
      WARNING: CPU: 0 PID: 2985 at net/ipv6/ip6_fib.c:1137
      fib6_add+0x20d9/0x2c10 net/ipv6/ip6_fib.c:1137
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 0 PID: 2985 Comm: syzkaller320388 Not tainted 4.14.0-rc3+ #74
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:181
       __warn+0x1c4/0x1d9 kernel/panic.c:542
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
       do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
       invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
      RIP: 0010:fib6_add+0x20d9/0x2c10 net/ipv6/ip6_fib.c:1137
      RSP: 0018:ffff8801cf09f6a0 EFLAGS: 00010297
      RAX: ffff8801ce45e340 RBX: 1ffff10039e13eec RCX: ffff8801d749c814
      RDX: 0000000000000000 RSI: ffff8801d749c700 RDI: ffff8801d749c780
      RBP: ffff8801cf09fa08 R08: 0000000000000000 R09: ffff8801cf09f360
      R10: ffff8801cf09f2d8 R11: 1ffff10039c8befb R12: 0000000000000001
      R13: dffffc0000000000 R14: ffff8801d749c700 R15: ffffffff860655c0
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
       ip6_route_add+0x148/0x1a0 net/ipv6/route.c:2782
       ipv6_route_ioctl+0x4d5/0x690 net/ipv6/route.c:3291
       inet6_ioctl+0xef/0x1e0 net/ipv6/af_inet6.c:521
       sock_do_ioctl+0x65/0xb0 net/socket.c:961
       sock_ioctl+0x2c2/0x440 net/socket.c:1058
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      So we fix this by failing the attemp to add cached routes from userspace
      with returning EINVAL error.
      
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ea2352e
  6. 24 10月, 2017 1 次提交
    • W
      ipv6: add ip6_null_entry check in rt6_select() · 87b1af8d
      Wei Wang 提交于
      In rt6_select(), fn->leaf could be pointing to net->ipv6.ip6_null_entry.
      In this case, we should directly return instead of trying to carry on
      with the rest of the process.
      If not, we could crash at:
        spin_lock_bh(&leaf->rt6i_table->rt6_lock);
      because net->ipv6.ip6_null_entry does not have rt6i_table set.
      
      Syzkaller recently reported following issue on net-next:
      Use struct sctp_sack_info instead
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      sctp: [Deprecated]: syz-executor4 (pid 26496) Use of struct sctp_assoc_value in delayed_ack socket option.
      Use struct sctp_sack_info instead
      CPU: 1 PID: 26523 Comm: syz-executor6 Not tainted 4.14.0-rc4+ #85
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d147e3c0 task.stack: ffff8801a4328000
      RIP: 0010:debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
      RIP: 0010:do_raw_spin_lock+0x23/0x1e0 kernel/locking/spinlock_debug.c:112
      RSP: 0018:ffff8801a432ed70 EFLAGS: 00010207
      RAX: dffffc0000000000 RBX: 0000000000000018 RCX: 0000000000000000
      RDX: 0000000000000003 RSI: 0000000000000000 RDI: 000000000000001c
      RBP: ffff8801a432ed90 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: ffffffff8482b279 R12: ffff8801ce2ff3a0
      sctp: [Deprecated]: syz-executor1 (pid 26546) Use of int in maxseg socket option.
      Use struct sctp_assoc_value instead
      R13: dffffc0000000000 R14: ffff8801d971e000 R15: ffff8801ce2ff0d8
      FS:  00007f56e82f5700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001ddbc22000 CR3: 00000001a4a04000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
       _raw_spin_lock_bh+0x39/0x40 kernel/locking/spinlock.c:175
       spin_lock_bh include/linux/spinlock.h:321 [inline]
       rt6_select net/ipv6/route.c:786 [inline]
       ip6_pol_route+0x1be3/0x3bd0 net/ipv6/route.c:1650
      sctp: [Deprecated]: syz-executor1 (pid 26576) Use of int in maxseg socket option.
      Use struct sctp_assoc_value instead
      TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies.  Check SNMP counters.
       ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1843
       fib6_rule_lookup+0x9e/0x2a0 net/ipv6/ip6_fib.c:309
       ip6_route_output_flags+0x1f1/0x2b0 net/ipv6/route.c:1871
       ip6_route_output include/net/ip6_route.h:80 [inline]
       ip6_dst_lookup_tail+0x4ea/0x970 net/ipv6/ip6_output.c:953
       ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1076
       sctp_v6_get_dst+0x675/0x1c30 net/sctp/ipv6.c:274
       sctp_transport_route+0xa8/0x430 net/sctp/transport.c:287
       sctp_assoc_add_peer+0x4fe/0x1100 net/sctp/associola.c:656
       __sctp_connect+0x251/0xc80 net/sctp/socket.c:1187
       sctp_connect+0xb4/0xf0 net/sctp/socket.c:4209
       inet_dgram_connect+0x16b/0x1f0 net/ipv4/af_inet.c:541
       SYSC_connect+0x20a/0x480 net/socket.c:1642
       SyS_connect+0x24/0x30 net/socket.c:1623
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87b1af8d
  7. 21 10月, 2017 3 次提交
  8. 11 10月, 2017 2 次提交
  9. 10 10月, 2017 2 次提交
  10. 09 10月, 2017 1 次提交
    • E
      ipv6: fix a BUG in rt6_get_pcpu_route() · 951f788a
      Eric Dumazet 提交于
      Ido reported following splat and provided a patch.
      
      [  122.221814] BUG: using smp_processor_id() in preemptible [00000000] code: sshd/2672
      [  122.221845] caller is debug_smp_processor_id+0x17/0x20
      [  122.221866] CPU: 0 PID: 2672 Comm: sshd Not tainted 4.14.0-rc3-idosch-next-custom #639
      [  122.221880] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  122.221893] Call Trace:
      [  122.221919]  dump_stack+0xb1/0x10c
      [  122.221946]  ? _atomic_dec_and_lock+0x124/0x124
      [  122.221974]  ? ___ratelimit+0xfe/0x240
      [  122.222020]  check_preemption_disabled+0x173/0x1b0
      [  122.222060]  debug_smp_processor_id+0x17/0x20
      [  122.222083]  ip6_pol_route+0x1482/0x24a0
      ...
      
      I believe we can simplify this code path a bit, since we no longer
      hold a read_lock and need to release it to avoid a dead lock.
      
      By disabling BH, we make sure we'll prevent code re-entry and
      rt6_get_pcpu_route()/rt6_make_pcpu_route() run on the same cpu.
      
      Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
      Reported-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      951f788a
  11. 08 10月, 2017 13 次提交
    • W
      ipv6: take care of rt6_stats · 81eb8447
      Wei Wang 提交于
      Currently, most of the rt6_stats are not hooked up correctly. As the
      last part of this patch series, hook up all existing rt6_stats and add
      one new stat fib_rt_uncache to indicate the number of routes in the
      uncached list.
      For details of the stats, please refer to the comments added in
      include/net/ip6_fib.h.
      
      Note: fib_rt_alloc and fib_rt_uncache are not guaranteed to be modified
      under a lock. So atomic_t is used for them.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81eb8447
    • W
      ipv6: replace rwlock with rcu and spinlock in fib6_table · 66f5d6ce
      Wei Wang 提交于
      With all the preparation work before, we are now ready to replace rwlock
      with rcu and spinlock in fib6_table.
      That means now all fib6_node in fib6_table are protected by rcu. And
      when freeing fib6_node, call_rcu() is used to wait for the rcu grace
      period before releasing the memory.
      When accessing fib6_node, corresponding rcu APIs need to be used.
      And all previous sessions protected by the write lock will now be
      protected by the spin lock per table.
      All previous sessions protected by read lock will now be protected by
      rcu_read_lock().
      
      A couple of things to note here:
      1. As part of the work of replacing rwlock with rcu, the linked list of
      fn->leaf now has to be rcu protected as well. So both fn->leaf and
      rt->dst.rt6_next are now __rcu tagged and corresponding rcu APIs are
      used when manipulating them.
      
      2. For fn->rr_ptr, first of all, it also needs to be rcu protected now
      and is tagged with __rcu and rcu APIs are used in corresponding places.
      Secondly, fn->rr_ptr is changed in rt6_select() which is a reader
      thread. This makes the issue a bit complicated. We think a valid
      solution for it is to let rt6_select() grab the tb6_lock if it decides
      to change it. As it is not in the normal operation and only happens when
      there is no valid neighbor cache for the route, we think the performance
      impact should be low.
      
      3. fib6_walk_continue() has to be called with tb6_lock held even in the
      route dumping related functions, e.g. inet6_dump_fib(),
      fib6_tables_dump() and ipv6_route_seq_ops. It is because
      fib6_walk_continue() makes modifications to the walker structure, and so
      are fib6_repair_tree() and fib6_del_route(). In order to do proper
      syncing between them, we need to let fib6_walk_continue() hold the lock.
      We may be able to do further improvement on the way we do the tree walk
      to get rid of the need for holding the spin lock. But not for now.
      
      4. When fib6_del_route() removes a route from the tree, we no longer
      mark rt->dst.rt6_next to NULL to make simultaneous reader be able to
      further traverse the list with rcu. However, rt->dst.rt6_next is only
      valid within this same rcu period. No one should access it later.
      
      5. All the operation of atomic_inc(rt->rt6i_ref) is changed to be
      performed before we publish this route (either by linking it to fn->leaf
      or insert it in the list pointed by fn->leaf) just to be safe because as
      soon as we publish the route, some read thread will be able to access it.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66f5d6ce
    • W
      ipv6: add key length check into rt6_select() · 17ecf590
      Wei Wang 提交于
      After rwlock is replaced with rcu and spinlock, fib6_lookup() could
      potentially return an intermediate node if other thread is doing
      fib6_del() on a route which is the only route on the node so that
      fib6_repair_tree() will be called on this node and potentially assigns
      fn->leaf to the its child's fn->leaf.
      
      In order to detect this situation in rt6_select(), we have to check if
      fn->fn_bit is consistent with the key length stored in the route. And
      depending on if the fn is in the subtree or not, the key is either
      rt->rt6i_dst or rt->rt6i_src.
      If any inconsistency is found, that means the node no longer holds valid
      routes in it. So net->ipv6.ip6_null_entry is returned.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17ecf590
    • W
      ipv6: check fn->leaf before it is used · 8d1040e8
      Wei Wang 提交于
      If rwlock is replaced with rcu and spinlock, it is possible that the
      reader thread will see fn->leaf as NULL in the following scenarios:
      1. fib6_add() is in progress and we have already inserted a new node but
      not yet inserted the route.
      2. fib6_del_route() is in progress and we have already set fn->leaf to
      NULL but not yet freed the node because of rcu grace period.
      
      This patch makes sure all the reader threads check fn->leaf first before
      using it. And together with later patch to grab rcu_read_lock() and
      rcu_dereference() fn->leaf, it makes sure reader threads are safe when
      accessing fn->leaf.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d1040e8
    • W
      ipv6: replace dst_hold() with dst_hold_safe() in routing code · d3843fe5
      Wei Wang 提交于
      With rwlock, it is safe to call dst_hold() in the read thread because
      read thread is guaranteed to be separated from write thread.
      However, after we replace rwlock with rcu, it is no longer safe to use
      dst_hold(). A dst might already have been deleted but is waiting for the
      rcu grace period to pass before freeing the memory when a read thread is
      trying to do dst_hold(). This could potentially cause double free issue.
      
      So this commit replaces all dst_hold() with dst_hold_safe() in all read
      thread to avoid this double free issue.
      And in order to make the code more compact, a new function ip6_hold_safe()
      is introduced. It calls dst_hold_safe() first, and if that fails, it will
      either fall back to hold and return net->ipv6.ip6_null_entry or set rt to
      NULL according to the caller's need.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3843fe5
    • W
      ipv6: grab rt->rt6i_ref before allocating pcpu rt · a94b9367
      Wei Wang 提交于
      After rwlock is replaced with rcu and spinlock, ip6_pol_route() will be
      called with only rcu held. That means rt6 route deletion could happen
      simultaneously with rt6_make_pcpu_rt(). This could potentially cause
      memory leak if rt6_release() is called right before rt6_make_pcpu_rt()
      on the same route.
      
      This patch grabs rt->rt6i_ref safely before calling rt6_make_pcpu_rt()
      to make sure rt6_release() will not get triggered while
      rt6_make_pcpu_rt() is in progress. And rt6_release() is called after
      rt6_make_pcpu_rt() is finished.
      
      Note: As we are incrementing rt->rt6i_ref in ip6_pol_route(), there is a
      very slim chance that fib6_purge_rt() will be triggered unnecessarily
      when deleting a route if ip6_pol_route() running on another thread picks
      this route as well and tries to make pcpu cache for it.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a94b9367
    • W
      ipv6: hook up exception table to store dst cache · 2b760fcf
      Wei Wang 提交于
      This commit makes use of the exception hash table implementation to
      store dst caches created by pmtu discovery and ip redirect into the hash
      table under the rt_info and no longer inserts these routes into fib6
      tree.
      This makes the fib6 tree only contain static configured routes and could
      now be protected by rcu instead of a rw lock.
      With this change, in the route lookup related functions, after finding
      the rt6_info with the longest prefix, we also need to search for the
      exception table before doing backtracking.
      In the route delete function, if the route being deleted is not a dst
      cache, deletion of this route also need to flush the whole hash table
      under it. If it is a dst cache, then only delete the cached dst in the
      hash table.
      
      Note: for fib6_walk_continue() function, w->root now is always pointing
      to a root node considering that fib6_prune_clones() is removed from the
      code. So we add a WARN_ON() msg to make sure w->root always points to a
      root node and also removed the update of w->root in fib6_repair_tree().
      This is a prerequisite for later patch because we don't need to make
      w->root as rcu protected when replacing rwlock with RCU.
      Also, we remove all prune related variables as it is no longer used.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b760fcf
    • W
      ipv6: prepare fib6_locate() for exception table · 38fbeeee
      Wei Wang 提交于
      fib6_locate() is used to find the fib6_node according to the passed in
      prefix address key. It currently tries to find the fib6_node with the
      exact match of the passed in key. However, when we move cached routes
      into the exception table, fib6_locate() will fail to find the fib6_node
      for it as the cached routes will be stored in the exception table under
      the fib6_node with the longest prefix match of the cache's dst addr key.
      This commit adds a new parameter to let the caller specify if it needs
      exact match or longest prefix match.
      Right now, all callers still does exact match when calling
      fib6_locate(). It will be changed in later commit where exception table
      is hooked up to store cached routes.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38fbeeee
    • W
      ipv6: prepare fib6_age() for exception table · c757faa8
      Wei Wang 提交于
      If all dst cache entries are stored in the exception table under the
      main route, we have to go through them during fib6_age() when doing
      garbage collecting.
      Introduce a new function rt6_age_exception() which goes through all dst
      entries in the exception table and remove those entries that are expired.
      This function is called in fib6_age() so that all dst caches are also
      garbage collected.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c757faa8
    • W
      ipv6: prepare rt6_clean_tohost() for exception table · b16cb459
      Wei Wang 提交于
      If we move all cached dst into the exception table under the main route,
      current rt6_clean_tohost() will no longer be able to access them.
      This commit makes fib6_clean_tohost() to also go through all cached
      routes in exception table and removes cached gateway routes to the
      passed in gateway.
      This is a preparation in order to move all cached routes into the
      exception table.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b16cb459
    • W
      ipv6: prepare rt6_mtu_change() for exception table · f5bbe7ee
      Wei Wang 提交于
      If we move all cached dst into the exception table under the main route,
      current rt6_mtu_change() will no longer be able to access them.
      This commit makes rt6_mtu_change_route() function to also go through all
      cached routes in the exception table under the main route and do proper
      updates on the mtu.
      This is a preparation in order to move all cached routes into the
      exception table.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5bbe7ee
    • W
      ipv6: prepare fib6_remove_prefsrc() for exception table · 60006a48
      Wei Wang 提交于
      After we move cached dst entries into the exception table under its
      parent route, current fib6_remove_prefsrc() no longer can access them.
      This commit makes fib6_remove_prefsrc() also go through all routes
      in the exception table to remove the pref src.
      This is a preparation patch in order to move all cached dst into the
      exception table.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60006a48
    • W
      ipv6: introduce a hash table to store dst cache · 35732d01
      Wei Wang 提交于
      Add a hash table into struct rt6_info in order to store dst caches
      created by pmtu discovery and ip redirect in ipv6 routing code.
      APIs to add dst cache, delete dst cache, find dst cache and update
      dst cache in the hash table are implemented and will be used in later
      commits.
      This is a preparation work to move all cache routes into the exception
      table instead of getting inserted into the fib6 tree.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35732d01
  12. 29 8月, 2017 2 次提交
    • X
      ipv6: set dst.obsolete when a cached route has expired · 1e2ea8ad
      Xin Long 提交于
      Now it doesn't check for the cached route expiration in ipv6's
      dst_ops->check(), because it trusts dst_gc that would clean the
      cached route up when it's expired.
      
      The problem is in dst_gc, it would clean the cached route only
      when it's refcount is 1. If some other module (like xfrm) keeps
      holding it and the module only release it when dst_ops->check()
      fails.
      
      But without checking for the cached route expiration, .check()
      may always return true. Meanwhile, without releasing the cached
      route, dst_gc couldn't del it. It will cause this cached route
      never to expire.
      
      This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc
      when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK
      in .check.
      
      Note that this is even needed when ipv6 dst_gc timer is removed
      one day. It would set dst.obsolete in .redirect and .update_pmtu
      instead, and check for cached route expiration when getting it,
      just like what ipv4 route does.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e2ea8ad
    • W
      ipv6: fix sparse warning on rt6i_node · 4e587ea7
      Wei Wang 提交于
      Commit c5cff856 adds rcu grace period before freeing fib6_node. This
      generates a new sparse warning on rt->rt6i_node related code:
        net/ipv6/route.c:1394:30: error: incompatible types in comparison
        expression (different address spaces)
        ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
        expression (different address spaces)
      
      This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
      rcu API is used for it.
      After this fix, sparse no longer generates the above warning.
      
      Fixes: c5cff856 ("ipv6: add rcu grace period before freeing fib6_node")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e587ea7
  13. 26 8月, 2017 1 次提交
  14. 25 8月, 2017 3 次提交
  15. 23 8月, 2017 1 次提交
  16. 22 8月, 2017 1 次提交
    • D
      net: ipv6: put host and anycast routes on device with address · 4832c30d
      David Ahern 提交于
      One nagging difference between ipv4 and ipv6 is host routes for ipv6
      addresses are installed using the loopback device or VRF / L3 Master
      device. e.g.,
      
          2001:db8:1::/120 dev veth0 proto kernel metric 256 pref medium
          local 2001:db8:1::1 dev lo table local proto kernel metric 0 pref medium
      
      Using the loopback device is convenient -- necessary for local tx, but
      has some nasty side effects, most notably setting the 'lo' device down
      causes all host routes for all local IPv6 address to be removed from the
      FIB and completely breaks IPv6 networking across all interfaces.
      
      This patch puts FIB entries for IPv6 routes against the device. This
      simplifies the routes in the FIB, for example by making dst->dev and
      rt6i_idev->dev the same (a future patch can look at removing the device
      reference taken for rt6i_idev for FIB entries).
      
      When copies are made on FIB lookups, the cloned route has dst->dev
      set to loopback (or the L3 master device). This is needed for the
      local Tx of packets to local addresses.
      
      With fib entries allocated against the real network device, the addrconf
      code that reinserts host routes on admin up of 'lo' is no longer needed.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4832c30d
  17. 19 8月, 2017 1 次提交
    • A
      ipv6: fix false-postive maybe-uninitialized warning · 401481e0
      Arnd Bergmann 提交于
      Adding a lock around one of the assignments prevents gcc from
      tracking the state of the local 'fibmatch' variable, so it can no
      longer prove that 'dst' is always initialized, leading to a bogus
      warning:
      
      net/ipv6/route.c: In function 'inet6_rtm_getroute':
      net/ipv6/route.c:3659:2: error: 'dst' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      This moves the other assignment into the same lock to shut up the
      warning.
      
      Fixes: 121622db ("ipv6: route: make rtm_getroute not assume rtnl is locked")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      401481e0
  18. 16 8月, 2017 3 次提交
    • F
      e3a22b7f
    • F
      ipv6: route: make rtm_getroute not assume rtnl is locked · 121622db
      Florian Westphal 提交于
      __dev_get_by_index assumes RTNL is held, use _rcu version instead.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      121622db
    • E
      ipv6: fix NULL dereference in ip6_route_dev_notify() · 12d94a80
      Eric Dumazet 提交于
      Based on a syzkaller report [1], I found that a per cpu allocation
      failure in snmp6_alloc_dev() would then lead to NULL dereference in
      ip6_route_dev_notify().
      
      It seems this is a very old bug, thus no Fixes tag in this submission.
      
      Let's add in6_dev_put_clear() helper, as we will probably use
      it elsewhere (once available/present in net-next)
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff88019f456680 task.stack: ffff8801c6e58000
      RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline]
      RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
      RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178
      RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202
      RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000
      RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001
      RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37
      R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8
      FS:  00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0
      DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
       in6_dev_put include/net/addrconf.h:335 [inline]
       ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678
       call_netdevice_notifiers net/core/dev.c:1694 [inline]
       rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107
       rollback_registered+0x1be/0x3c0 net/core/dev.c:7149
       register_netdevice+0xbcd/0xee0 net/core/dev.c:7587
       register_netdev+0x1a/0x30 net/core/dev.c:7669
       loopback_net_init+0x76/0x160 drivers/net/loopback.c:214
       ops_init+0x10a/0x570 net/core/net_namespace.c:118
       setup_net+0x313/0x710 net/core/net_namespace.c:294
       copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418
       create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
       SYSC_unshare kernel/fork.c:2347 [inline]
       SyS_unshare+0x653/0xfa0 kernel/fork.c:2297
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512c9
      RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110
      RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d
      R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd
      Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3
      f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6
      0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
      RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP:
      ffff8801c6e5f1b0
      ---[ end trace e441d046c6410d31 ]---
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12d94a80