1. 16 10月, 2018 2 次提交
    • E
      ipv6: mcast: fix a use-after-free in inet6_mc_check · dc012f36
      Eric Dumazet 提交于
      syzbot found a use-after-free in inet6_mc_check [1]
      
      The problem here is that inet6_mc_check() uses rcu
      and read_lock(&iml->sflock)
      
      So the fact that ip6_mc_leave_src() is called under RTNL
      and the socket lock does not help us, we need to acquire
      iml->sflock in write mode.
      
      In the future, we should convert all this stuff to RCU.
      
      [1]
      BUG: KASAN: use-after-free in ipv6_addr_equal include/net/ipv6.h:521 [inline]
      BUG: KASAN: use-after-free in inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
      Read of size 8 at addr ffff8801ce7f2510 by task syz-executor0/22432
      
      CPU: 1 PID: 22432 Comm: syz-executor0 Not tainted 4.19.0-rc7+ #280
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       ipv6_addr_equal include/net/ipv6.h:521 [inline]
       inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
       __raw_v6_lookup+0x320/0x3f0 net/ipv6/raw.c:98
       ipv6_raw_deliver net/ipv6/raw.c:183 [inline]
       raw6_local_deliver+0x3d3/0xcb0 net/ipv6/raw.c:240
       ip6_input_finish+0x467/0x1aa0 net/ipv6/ip6_input.c:345
       NF_HOOK include/linux/netfilter.h:289 [inline]
       ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:426
       ip6_mc_input+0x48a/0xd20 net/ipv6/ip6_input.c:503
       dst_input include/net/dst.h:450 [inline]
       ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
       NF_HOOK include/linux/netfilter.h:289 [inline]
       ipv6_rcv+0x120/0x640 net/ipv6/ip6_input.c:271
       __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
       netif_receive_skb_internal+0x12c/0x620 net/core/dev.c:5126
       napi_frags_finish net/core/dev.c:5664 [inline]
       napi_gro_frags+0x75a/0xc90 net/core/dev.c:5737
       tun_get_user+0x3189/0x4250 drivers/net/tun.c:1923
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1968
       call_write_iter include/linux/fs.h:1808 [inline]
       do_iter_readv_writev+0x8b0/0xa80 fs/read_write.c:680
       do_iter_write+0x185/0x5f0 fs/read_write.c:959
       vfs_writev+0x1f1/0x360 fs/read_write.c:1004
       do_writev+0x11a/0x310 fs/read_write.c:1039
       __do_sys_writev fs/read_write.c:1112 [inline]
       __se_sys_writev fs/read_write.c:1109 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457421
      Code: 75 14 b8 14 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 b5 fb ff c3 48 83 ec 08 e8 1a 2d 00 00 48 89 04 24 b8 14 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 63 2d 00 00 48 89 d0 48 83 c4 08 48 3d 01
      RSP: 002b:00007f2d30ecaba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 000000000000003e RCX: 0000000000457421
      RDX: 0000000000000001 RSI: 00007f2d30ecabf0 RDI: 00000000000000f0
      RBP: 0000000020000500 R08: 00000000000000f0 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000293 R12: 00007f2d30ecb6d4
      R13: 00000000004c4890 R14: 00000000004d7b90 R15: 00000000ffffffff
      
      Allocated by task 22437:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
       __do_kmalloc mm/slab.c:3718 [inline]
       __kmalloc+0x14e/0x760 mm/slab.c:3727
       kmalloc include/linux/slab.h:518 [inline]
       sock_kmalloc+0x15a/0x1f0 net/core/sock.c:1983
       ip6_mc_source+0x14dd/0x1960 net/ipv6/mcast.c:427
       do_ipv6_setsockopt.isra.9+0x3afb/0x45d0 net/ipv6/ipv6_sockglue.c:743
       ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:933
       rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1069
       sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3038
       __sys_setsockopt+0x1ba/0x3c0 net/socket.c:1902
       __do_sys_setsockopt net/socket.c:1913 [inline]
       __se_sys_setsockopt net/socket.c:1910 [inline]
       __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1910
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 22430:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kfree+0xcf/0x230 mm/slab.c:3813
       __sock_kfree_s net/core/sock.c:2004 [inline]
       sock_kfree_s+0x29/0x60 net/core/sock.c:2010
       ip6_mc_leave_src+0x11a/0x1d0 net/ipv6/mcast.c:2448
       __ipv6_sock_mc_close+0x20b/0x4e0 net/ipv6/mcast.c:310
       ipv6_sock_mc_close+0x158/0x1d0 net/ipv6/mcast.c:328
       inet6_release+0x40/0x70 net/ipv6/af_inet6.c:452
       __sock_release+0xd7/0x250 net/socket.c:579
       sock_close+0x19/0x20 net/socket.c:1141
       __fput+0x385/0xa30 fs/file_table.c:278
       ____fput+0x15/0x20 fs/file_table.c:309
       task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:193 [inline]
       exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
       prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
       do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801ce7f2500
       which belongs to the cache kmalloc-192 of size 192
      The buggy address is located 16 bytes inside of
       192-byte region [ffff8801ce7f2500, ffff8801ce7f25c0)
      The buggy address belongs to the page:
      page:ffffea000739fc80 count:1 mapcount:0 mapping:ffff8801da800040 index:0x0
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006f6e548 ffffea000737b948 ffff8801da800040
      raw: 0000000000000000 ffff8801ce7f2000 0000000100000010 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801ce7f2400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801ce7f2480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      >ffff8801ce7f2500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
       ffff8801ce7f2580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8801ce7f2600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc012f36
    • S
      ipv6: rate-limit probes for neighbourless routes · f547fac6
      Sabrina Dubroca 提交于
      When commit 27097255 ("[IPV6]: ROUTE: Add Router Reachability
      Probing (RFC4191).") introduced router probing, the rt6_probe() function
      required that a neighbour entry existed. This neighbour entry is used to
      record the timestamp of the last probe via the ->updated field.
      
      Later, commit 2152caea ("ipv6: Do not depend on rt->n in rt6_probe().")
      removed the requirement for a neighbour entry. Neighbourless routes skip
      the interval check and are not rate-limited.
      
      This patch adds rate-limiting for neighbourless routes, by recording the
      timestamp of the last probe in the fib6_info itself.
      
      Fixes: 2152caea ("ipv6: Do not depend on rt->n in rt6_probe().")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f547fac6
  2. 11 10月, 2018 1 次提交
  3. 06 10月, 2018 1 次提交
    • W
      ipv6: take rcu lock in rawv6_send_hdrinc() · a688caa3
      Wei Wang 提交于
      In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
      directly assign the dst to skb and set passed in dst to NULL to avoid
      double free.
      However, in error case, we free skb and then do stats update with the
      dst pointer passed in. This causes use-after-free on the dst.
      Fix it by taking rcu read lock right before dst could get released to
      make sure dst does not get freed until the stats update is done.
      Note: we don't have this issue in ipv4 cause dst is not used for stats
      update in v4.
      
      Syzkaller reported following crash:
      BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
      BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
      Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088
      
      CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
       rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457099
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
      RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
      RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000
      
      Allocated by task 32088:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:105
       ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
       ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
       ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
       ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
       fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
       ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
       ip6_route_output include/net/ip6_route.h:88 [inline]
       ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
       ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
       rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5356:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x83/0x290 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:141
       dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
       __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
       rcu_do_batch kernel/rcu/tree.c:2576 [inline]
       invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
       __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
       rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
       __do_softirq+0x30b/0xad8 kernel/softirq.c:292
      
      Fixes: 1789a640 ("raw: avoid two atomics in xmit")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a688caa3
  4. 27 9月, 2018 1 次提交
  5. 22 9月, 2018 1 次提交
    • J
      net/ipv6: Display all addresses in output of /proc/net/if_inet6 · 86f9bd1f
      Jeff Barnhill 提交于
      The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
      handle starting/stopping the iteration.  The problem is that at some point
      during the iteration, an overflow is detected and the process is
      subsequently stopped.  The item being shown via seq_printf() when the
      overflow occurs is not actually shown, though.  When start() is
      subsequently called to resume iterating, it returns the next item, and
      thus the item that was being processed when the overflow occurred never
      gets printed.
      
      Alter the meaning of the private data member "offset".  Currently, when it
      is not 0 (which only happens at the very beginning), "offset" represents
      the next hlist item to be printed.  After this change, "offset" always
      represents the current item.
      
      This is also consistent with the private data member "bucket", which
      represents the current bucket, and also the use of "pos" as defined in
      seq_file.txt:
          The pos passed to start() will always be either zero, or the most
          recent pos used in the previous session.
      Signed-off-by: NJeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86f9bd1f
  6. 20 9月, 2018 1 次提交
    • P
      ip6_tunnel: be careful when accessing the inner header · 76c0ddd8
      Paolo Abeni 提交于
      the ip6 tunnel xmit ndo assumes that the processed skb always
      contains an ip[v6] header, but syzbot has found a way to send
      frames that fall short of this assumption, leading to the following splat:
      
      BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307
      [inline]
      BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0
      net/ipv6/ip6_tunnel.c:1390
      CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x185/0x1d0 lib/dump_stack.c:53
        kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
        __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
        ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
        ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
        __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
        netdev_start_xmit include/linux/netdevice.h:4075 [inline]
        xmit_one net/core/dev.c:3026 [inline]
        dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
        __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
        dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
        packet_snd net/packet/af_packet.c:2944 [inline]
        packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
        __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
        SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
        SyS_sendmmsg+0x63/0x90 net/socket.c:2162
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x441819
      RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
      RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003
      RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510
      R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
        kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
        kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
        slab_post_alloc_hook mm/slab.h:445 [inline]
        slab_alloc_node mm/slub.c:2737 [inline]
        __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:984 [inline]
        alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
        sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
        packet_alloc_skb net/packet/af_packet.c:2803 [inline]
        packet_snd net/packet/af_packet.c:2894 [inline]
        packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
        __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
        SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
        SyS_sendmmsg+0x63/0x90 net/socket.c:2162
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      This change addresses the issue adding the needed check before
      accessing the inner header.
      
      The ipv4 side of the issue is apparently there since the ipv4 over ipv6
      initial support, and the ipv6 side predates git history.
      
      Fixes: c4d3efaf ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com
      Tested-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76c0ddd8
  7. 19 9月, 2018 2 次提交
    • W
      ipv6: fix memory leak on dst->_metrics · ce7ea4af
      Wei Wang 提交于
      When dst->_metrics and f6i->fib6_metrics share the same memory, both
      take reference count on the dst_metrics structure. However, when dst is
      destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic()
      which does not take care of READONLY metrics and does not release refcnt.
      This causes memory leak.
      Similar to ipv4 logic, the fix is to properly release refcnt and free
      the memory space pointed by dst->_metrics if refcnt becomes 0.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce7ea4af
    • W
      Revert "ipv6: fix double refcount of fib6_metrics" · 86758605
      Wei Wang 提交于
      This reverts commit e70a3aad.
      
      This change causes use-after-free on dst->_metrics.
      The crash trace looks like this:
      [   97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140
      [   97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801
      
      [   97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11
      [   97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018
      [   97.777957] Call Trace:
      [   97.777971]  [<ffffffff895709db>] dump_stack+0x4d/0x72
      [   97.777985]  [<ffffffff881651df>] print_address_description+0x6f/0x260
      [   97.777997]  [<ffffffff88165747>] kasan_report+0x257/0x370
      [   97.778001]  [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140
      [   97.778004]  [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20
      [   97.778008]  [<ffffffff894488e6>] ip6_mtu+0x116/0x140
      [   97.778013]  [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280
      [   97.778016]  [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0
      [   97.778022]  [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0
      [   97.778037]  [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0
      [   97.778040]  [<ffffffff881643b1>] ? save_stack+0xb1/0xd0
      [   97.778046]  [<ffffffff89264c82>] tcp_send_mss+0x22/0x220
      [   97.778059]  [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0
      [   97.778062]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
      [   97.778066]  [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60
      [   97.778070]  [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280
      [   97.778075]  [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430
      [   97.778078]  [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0
      [   97.778082]  [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140
      [   97.778085]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
      [   97.778088]  [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50
      [   97.778092]  [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50
      [   97.778098]  [<ffffffff89352d43>] inet_sendmsg+0x103/0x480
      [   97.778102]  [<ffffffff89352c40>] ? inet_gso_segment+0x15b0/0x15b0
      [   97.778105]  [<ffffffff890294da>] sock_sendmsg+0xba/0xf0
      [   97.778108]  [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0
      [   97.778113]  [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0
      [   97.778116]  [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0
      [   97.778119]  [<ffffffff881646d1>] ? memset+0x31/0x40
      [   97.778123]  [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380
      [   97.778127]  [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250
      [   97.778130]  [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180
      [   97.778133]  [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200
      [   97.778137]  [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0
      [   97.778141]  [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190
      [   97.778144]  [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190
      [   97.778147]  [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20
      [   97.778152]  [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0
      [   97.778155]  [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190
      [   97.778158]  [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20
      [   97.778162]  [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430
      [   97.778166]  [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0
      [   97.778171]  [<ffffffff8960131f>] ? page_fault+0x2f/0x50
      [   97.778174]  [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [   97.778177] RIP: 0033:0x7f83fa36000d
      [   97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      [   97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d
      [   97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036
      [   97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168
      [   97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40
      [   97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000
      
      [   97.779684] Allocated by task 5919:
      [   97.783185]  save_stack+0x46/0xd0
      [   97.783187]  kasan_kmalloc+0xad/0xe0
      [   97.783189]  kmem_cache_alloc_trace+0xdf/0x580
      [   97.783190]  ip6_convert_metrics.isra.79+0x7e/0x190
      [   97.783192]  ip6_route_info_create+0x60a/0x2480
      [   97.783193]  ip6_route_add+0x1d/0x80
      [   97.783195]  inet6_rtm_newroute+0xdd/0xf0
      [   97.783198]  rtnetlink_rcv_msg+0x641/0xb10
      [   97.783200]  netlink_rcv_skb+0x27b/0x3e0
      [   97.783202]  rtnetlink_rcv+0x15/0x20
      [   97.783203]  netlink_unicast+0x4be/0x720
      [   97.783204]  netlink_sendmsg+0x7bc/0xbf0
      [   97.783205]  sock_sendmsg+0xba/0xf0
      [   97.783207]  ___sys_sendmsg+0x6ca/0x8e0
      [   97.783208]  __sys_sendmsg+0xe6/0x190
      [   97.783209]  SyS_sendmsg+0x13/0x20
      [   97.783211]  do_syscall_64+0x2ac/0x430
      [   97.783213]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      [   97.784709] Freed by task 0:
      [   97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist
      [   97.794497]  save_stack+0x46/0xd0
      [   97.794499]  kasan_slab_free+0x71/0xc0
      [   97.794500]  kfree+0x7c/0xf0
      [   97.794501]  fib6_info_destroy_rcu+0x24f/0x310
      [   97.794504]  rcu_process_callbacks+0x38b/0x1730
      [   97.794506]  __do_softirq+0x1c8/0x5d0
      Reported-by: NJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86758605
  8. 18 9月, 2018 1 次提交
    • P
      net/ipv6: do not copy dst flags on rt init · 30bfd930
      Peter Oskolkov 提交于
      DST_NOCOUNT in dst_entry::flags tracks whether the entry counts
      toward route cache size (net->ipv6.sysctl.ip6_rt_max_size).
      
      If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented
      in dist_init() and decremented in dst_destroy().
      
      This flag is tied to allocation/deallocation of dst_entry and
      should not be copied from another dst/route. Otherwise it can happen
      that dst_ops::pcpuc_entries counter grows until no new routes can
      be allocated because the counter reached ip6_rt_max_size due to
      DST_NOCOUNT not set and thus no counter decrements on gc-ed routes.
      
      Fixes: 3b6761d1 ("net/ipv6: Move dst flags to booleans in fib entries")
      Cc: David Ahern <dsahern@gmail.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NPeter Oskolkov <posk@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30bfd930
  9. 17 9月, 2018 2 次提交
  10. 14 9月, 2018 1 次提交
  11. 13 9月, 2018 1 次提交
    • X
      ipv6: use rt6_info members when dst is set in rt6_fill_node · 22d0bd82
      Xin Long 提交于
      In inet6_rtm_getroute, since Commit 93531c67 ("net/ipv6: separate
      handling of FIB entries from dst based routes"), it has used rt->from
      to dump route info instead of rt.
      
      However for some route like cache, some of its information like flags
      or gateway is not the same as that of the 'from' one. It caused 'ip
      route get' to dump the wrong route information.
      
      In Jianlin's testing, the output information even lost the expiration
      time for a pmtu route cache due to the wrong fib6_flags.
      
      So change to use rt6_info members for dst addr, src addr, flags and
      gateway when it tries to dump a route entry without fibmatch set.
      
      v1->v2:
        - not use rt6i_prefsrc.
        - also fix the gw dump issue.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22d0bd82
  12. 10 9月, 2018 1 次提交
    • T
      ip: frags: fix crash in ip_do_fragment() · 5d407b07
      Taehee Yoo 提交于
      A kernel crash occurrs when defragmented packet is fragmented
      in ip_do_fragment().
      In defragment routine, skb_orphan() is called and
      skb->ip_defrag_offset is set. but skb->sk and
      skb->ip_defrag_offset are same union member. so that
      frag->sk is not NULL.
      Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
      defragmented packet is fragmented.
      
      test commands:
         %iptables -t nat -I POSTROUTING -j MASQUERADE
         %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
      
      splat looks like:
      [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
      [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
      [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
      [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
      [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
      [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
      [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
      [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
      [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
      [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
      [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
      [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
      [  261.198158] Call Trace:
      [  261.199018]  ? dst_output+0x180/0x180
      [  261.205011]  ? save_trace+0x300/0x300
      [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
      [  261.213034]  ? sched_clock_local+0xd4/0x140
      [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
      [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
      [  261.227014]  ? find_held_lock+0x39/0x1c0
      [  261.233008]  ip_finish_output+0x51d/0xb50
      [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
      [  261.250152]  ? rcu_is_watching+0x77/0x120
      [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
      [  261.261033]  ? nf_hook_slow+0xb1/0x160
      [  261.265007]  ip_output+0x1c7/0x710
      [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
      [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.282996]  ? nf_hook_slow+0xb1/0x160
      [  261.287007]  raw_sendmsg+0x21f9/0x4420
      [  261.291008]  ? dst_output+0x180/0x180
      [  261.297003]  ? sched_clock_cpu+0x126/0x170
      [  261.301003]  ? find_held_lock+0x39/0x1c0
      [  261.306155]  ? stop_critical_timings+0x420/0x420
      [  261.311004]  ? check_flags.part.36+0x450/0x450
      [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.326142]  ? cyc2ns_read_end+0x10/0x10
      [  261.330139]  ? raw_bind+0x280/0x280
      [  261.334138]  ? sched_clock_cpu+0x126/0x170
      [  261.338995]  ? check_flags.part.36+0x450/0x450
      [  261.342991]  ? __lock_acquire+0x4500/0x4500
      [  261.348994]  ? inet_sendmsg+0x11c/0x500
      [  261.352989]  ? dst_output+0x180/0x180
      [  261.357012]  inet_sendmsg+0x11c/0x500
      [ ... ]
      
      v2:
       - clear skb->sk at reassembly routine.(Eric Dumarzet)
      
      Fixes: fa0f5273 ("ip: use rb trees for IP frag queue.")
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d407b07
  13. 04 9月, 2018 2 次提交
  14. 03 9月, 2018 2 次提交
    • T
      xfrm6: call kfree_skb when skb is toobig · 215ab0f0
      Thadeu Lima de Souza Cascardo 提交于
      After commit d6990976 ("vti6: fix PMTU caching
      and reporting on xmit"), some too big skbs might be potentially passed down to
      __xfrm6_output, causing it to fail to transmit but not free the skb, causing a
      leak of skb, and consequentially a leak of dst references.
      
      After running pmtu.sh, that shows as failure to unregister devices in a namespace:
      
      [  311.397671] unregister_netdevice: waiting for veth_b to become free. Usage count = 1
      
      The fix is to call kfree_skb in case of transmit failures.
      
      Fixes: dd767856 ("xfrm6: Don't call icmpv6_send on local error")
      Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      215ab0f0
    • D
      net/ipv6: Only update MTU metric if it set · 15a81b41
      David Ahern 提交于
      Jan reported a regression after an update to 4.18.5. In this case ipv6
      default route is setup by systemd-networkd based on data from an RA. The
      RA contains an MTU of 1492 which is used when the route is first inserted
      but then systemd-networkd pushes down updates to the default route
      without the mtu set.
      
      Prior to the change to fib6_info, metrics such as MTU were held in the
      dst_entry and rt6i_pmtu in rt6_info contained an update to the mtu if
      any. ip6_mtu would look at rt6i_pmtu first and use it if set. If not,
      the value from the metrics is used if it is set and finally falling
      back to the idev value.
      
      After the fib6_info change metrics are contained in the fib6_info struct
      and there is no equivalent to rt6i_pmtu. To maintain consistency with
      the old behavior the new code should only reset the MTU in the metrics
      if the route update has it set.
      
      Fixes: d4ead6b3 ("net/ipv6: move metrics from dst to rt6_info")
      Reported-by: NJan Janssen <medhefgo@web.de>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15a81b41
  15. 02 9月, 2018 1 次提交
    • A
      ipv6: don't get lwtstate twice in ip6_rt_copy_init() · 93bbadd6
      Alexey Kodanev 提交于
      Commit 80f1a0f4 ("net/ipv6: Put lwtstate when destroying fib6_info")
      partially fixed the kmemleak [1], lwtstate can be copied from fib6_info,
      with ip6_rt_copy_init(), and it should be done only once there.
      
      rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
      ip6_rt_copy_init(), so there is no need to get it again at the end.
      
      With this patch, lwtstate also isn't copied from RTF_REJECT routes.
      
      [1]:
      unreferenced object 0xffff880b6aaa14e0 (size 64):
        comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
        hex dump (first 32 bytes):
          01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
          [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
          [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
          [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
          [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
          [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
          [<000000004c893c76>] netlink_unicast+0x417/0x5a0
          [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
          [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
          [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
          [<000000001e7426c8>] __sys_sendmsg+0xde/0x170
          [<00000000fe411443>] do_syscall_64+0x9f/0x4a0
          [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000006d21f353>] 0xffffffffffffffff
      
      Fixes: 6edb3c96 ("net/ipv6: Defer initialization of dst to data path")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93bbadd6
  16. 30 8月, 2018 3 次提交
    • S
      ipv6: fix cleanup ordering for pingv6 registration · a03dc36b
      Sabrina Dubroca 提交于
      Commit 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
      contains an error in the cleanup path of inet6_init(): when
      proto_register(&pingv6_prot, 1) fails, we try to unregister
      &pingv6_prot. When rawv6_init() fails, we skip unregistering
      &pingv6_prot.
      
      Example of panic (triggered by faking a failure of
       proto_register(&pingv6_prot, 1)):
      
          general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
          [...]
          RIP: 0010:__list_del_entry_valid+0x79/0x160
          [...]
          Call Trace:
           proto_unregister+0xbb/0x550
           ? trace_preempt_on+0x6f0/0x6f0
           ? sock_no_shutdown+0x10/0x10
           inet6_init+0x153/0x1b8
      
      Fixes: 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a03dc36b
    • S
      ipv6: fix cleanup ordering for ip6_mr failure · afe49de4
      Sabrina Dubroca 提交于
      Commit 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
      moved the cleanup label for ipmr_fail, but should have changed the
      contents of the cleanup labels as well. Now we can end up cleaning up
      icmpv6 even though it hasn't been initialized (jump to icmp_fail or
      ipmr_fail).
      
      Simply undo things in the reverse order of their initialization.
      
      Example of panic (triggered by faking a failure of icmpv6_init):
      
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
          [...]
          RIP: 0010:__list_del_entry_valid+0x79/0x160
          [...]
          Call Trace:
           ? lock_release+0x8a0/0x8a0
           unregister_pernet_operations+0xd4/0x560
           ? ops_free_list+0x480/0x480
           ? down_write+0x91/0x130
           ? unregister_pernet_subsys+0x15/0x30
           ? down_read+0x1b0/0x1b0
           ? up_read+0x110/0x110
           ? kmem_cache_create_usercopy+0x1b4/0x240
           unregister_pernet_subsys+0x1d/0x30
           icmpv6_cleanup+0x1d/0x30
           inet6_init+0x1b5/0x23f
      
      Fixes: 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afe49de4
    • A
      vti6: remove !skb->ignore_df check from vti6_xmit() · 9f289546
      Alexey Kodanev 提交于
      Before the commit d6990976 ("vti6: fix PMTU caching and reporting
      on xmit") '!skb->ignore_df' check was always true because the function
      skb_scrub_packet() was called before it, resetting ignore_df to zero.
      
      In the commit, skb_scrub_packet() was moved below, and now this check
      can be false for the packet, e.g. when sending it in the two fragments,
      this prevents successful PMTU updates in such case. The next attempts
      to send the packet lead to the same tx error. Moreover, vti6 initial
      MTU value relies on PMTU adjustments.
      
      This issue can be reproduced with the following LTP test script:
          udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000
      
      Fixes: ccd740cb ("vti6: Add pmtu handling to vti6_xmit.")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f289546
  17. 28 8月, 2018 1 次提交
  18. 23 8月, 2018 2 次提交
  19. 21 8月, 2018 1 次提交
  20. 20 8月, 2018 2 次提交
    • H
      ip6_vti: fix a null pointer deference when destroy vti6 tunnel · 9c86336c
      Haishuang Yan 提交于
      If load ip6_vti module and create a network namespace when set
      fb_tunnels_only_for_init_net to 1, then exit the namespace will
      cause following crash:
      
      [ 6601.677036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [ 6601.679057] PGD 8000000425eca067 P4D 8000000425eca067 PUD 424292067 PMD 0
      [ 6601.680483] Oops: 0000 [#1] SMP PTI
      [ 6601.681223] CPU: 7 PID: 93 Comm: kworker/u16:1 Kdump: loaded Tainted: G            E     4.18.0+ #3
      [ 6601.683153] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014
      [ 6601.684919] Workqueue: netns cleanup_net
      [ 6601.685742] RIP: 0010:vti6_exit_batch_net+0x87/0xd0 [ip6_vti]
      [ 6601.686932] Code: 7b 08 48 89 e6 e8 b9 ea d3 dd 48 8b 1b 48 85 db 75 ec 48 83 c5 08 48 81 fd 00 01 00 00 75 d5 49 8b 84 24 08 01 00 00 48 89 e6 <48> 8b 78 08 e8 90 ea d3 dd 49 8b 45 28 49 39 c6 4c 8d 68 d8 75 a1
      [ 6601.690735] RSP: 0018:ffffa897c2737de0 EFLAGS: 00010246
      [ 6601.691846] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
      [ 6601.693324] RDX: 0000000000000015 RSI: ffffa897c2737de0 RDI: ffffffff9f2ea9e0
      [ 6601.694824] RBP: 0000000000000100 R08: 0000000000000000 R09: 0000000000000000
      [ 6601.696314] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8dc323c07e00
      [ 6601.697812] R13: ffff8dc324a63100 R14: ffffa897c2737e30 R15: ffffa897c2737e30
      [ 6601.699345] FS:  0000000000000000(0000) GS:ffff8dc33fdc0000(0000) knlGS:0000000000000000
      [ 6601.701068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6601.702282] CR2: 0000000000000008 CR3: 0000000424966002 CR4: 00000000001606e0
      [ 6601.703791] Call Trace:
      [ 6601.704329]  cleanup_net+0x1b4/0x2c0
      [ 6601.705268]  process_one_work+0x16c/0x370
      [ 6601.706145]  worker_thread+0x49/0x3e0
      [ 6601.706942]  kthread+0xf8/0x130
      [ 6601.707626]  ? rescuer_thread+0x340/0x340
      [ 6601.708476]  ? kthread_bind+0x10/0x10
      [ 6601.709266]  ret_from_fork+0x35/0x40
      
      Reproduce:
      modprobe ip6_vti
      echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
      unshare -n
      exit
      
      This because ip6n->tnls_wc[0] point to fallback device in default, but
      in non-default namespace, ip6n->tnls_wc[0] will be NULL, so add the NULL
      check comparatively.
      
      Fixes: e2948e5a ("ip6_vti: fix creating fallback tunnel device for vti6")
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c86336c
    • H
      ip6_vti: fix creating fallback tunnel device for vti6 · e2948e5a
      Haishuang Yan 提交于
      When set fb_tunnels_only_for_init_net to 1, don't create fallback tunnel
      device for vti6 when a new namespace is created.
      
      Tested:
      [root@builder2 ~]# modprobe ip6_tunnel
      [root@builder2 ~]# modprobe ip6_vti
      [root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
      [root@builder2 ~]# unshare -n
      [root@builder2 ~]# ip link
      1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
      default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2948e5a
  21. 19 8月, 2018 1 次提交
  22. 17 8月, 2018 1 次提交
  23. 13 8月, 2018 1 次提交
    • V
      ipv6: Add icmp_echo_ignore_all support for ICMPv6 · e6f86b0f
      Virgile Jarry 提交于
      Preventing the kernel from responding to ICMP Echo Requests messages
      can be useful in several ways. The sysctl parameter
      'icmp_echo_ignore_all' can be used to prevent the kernel from
      responding to IPv4 ICMP echo requests. For IPv6 pings, such
      a sysctl kernel parameter did not exist.
      
      Add the ability to prevent the kernel from responding to IPv6
      ICMP echo requests through the use of the following sysctl
      parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all.
      Update the documentation to reflect this change.
      Signed-off-by: NVirgile Jarry <virgile@acceis.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6f86b0f
  24. 11 8月, 2018 1 次提交
    • M
      bpf: Enable BPF_PROG_TYPE_SK_REUSEPORT bpf prog in reuseport selection · 8217ca65
      Martin KaFai Lau 提交于
      This patch allows a BPF_PROG_TYPE_SK_REUSEPORT bpf prog to select a
      SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY introduced in
      the earlier patch.  "bpf_run_sk_reuseport()" will return -ECONNREFUSED
      when the BPF_PROG_TYPE_SK_REUSEPORT prog returns SK_DROP.
      The callers, in inet[6]_hashtable.c and ipv[46]/udp.c, are modified to
      handle this case and return NULL immediately instead of continuing the
      sk search from its hashtable.
      
      It re-uses the existing SO_ATTACH_REUSEPORT_EBPF setsockopt to attach
      BPF_PROG_TYPE_SK_REUSEPORT.  The "sk_reuseport_attach_bpf()" will check
      if the attaching bpf prog is in the new SK_REUSEPORT or the existing
      SOCKET_FILTER type and then check different things accordingly.
      
      One level of "__reuseport_attach_prog()" call is removed.  The
      "sk_unhashed() && ..." and "sk->sk_reuseport_cb" tests are pushed
      back to "reuseport_attach_prog()" in sock_reuseport.c.  sock_reuseport.c
      seems to have more knowledge on those test requirements than filter.c.
      In "reuseport_attach_prog()", after new_prog is attached to reuse->prog,
      the old_prog (if any) is also directly freed instead of returning the
      old_prog to the caller and asking the caller to free.
      
      The sysctl_optmem_max check is moved back to the
      "sk_reuseport_attach_filter()" and "sk_reuseport_attach_bpf()".
      As of other bpf prog types, the new BPF_PROG_TYPE_SK_REUSEPORT is only
      bounded by the usual "bpf_prog_charge_memlock()" during load time
      instead of bounded by both bpf_prog_charge_memlock and sysctl_optmem_max.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      8217ca65
  25. 10 8月, 2018 1 次提交
    • M
      net: ipv6_gre: Fix GRO to work on IPv6 over GRE tap · eb95f52f
      Maria Pasechnik 提交于
      IPv6 GRO over GRE tap is not working while GRO is not set
      over the native interface.
      
      gro_list_prepare function updates the same_flow variable
      of existing sessions to 1 if their mac headers match the one
      of the incoming packet.
      same_flow is used to filter out non-matching sessions and keep
      potential ones for aggregation.
      
      The number of bytes to compare should be the number of bytes
      in the mac headers. In gro_list_prepare this number is set to
      be skb->dev->hard_header_len. For GRE interfaces this hard_header_len
      should be as it is set in the initialization process (when GRE is
      created), it should not be overridden. But currently it is being overridden
      by the value that is actually supposed to represent the needed_headroom.
      Therefore, the number of bytes compared in order to decide whether the
      the mac headers are the same is greater than the length of the headers.
      
      As it's documented in netdevice.h, hard_header_len is the maximum
      hardware header length, and needed_headroom is the extra headroom
      the hardware may need.
      hard_header_len is basically all the bytes received by the physical
      till layer 3 header of the packet received by the interface.
      For example, if the interface is a GRE tap then the needed_headroom
      should be the total length of the following headers:
      IP header of the physical, GRE header, mac header of GRE.
      It is often used to calculate the MTU of the created interface.
      
      This patch removes the override of the hard_header_len, and
      assigns the calculated value to needed_headroom.
      This way, the comparison in gro_list_prepare is really of
      the mac headers, and if the packets have the same mac headers
      the same_flow will be set to 1.
      
      Performance testing: 45% higher bandwidth.
      Measuring bandwidth of single-stream IPv4 TCP traffic over IPv6
      GRE tap while GRO is not set on the native.
      NIC: ConnectX-4LX
      Before (GRO not working) : 7.2 Gbits/sec
      After (GRO working): 10.5 Gbits/sec
      Signed-off-by: NMaria Pasechnik <mariap@mellanox.com>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb95f52f
  26. 08 8月, 2018 1 次提交
  27. 06 8月, 2018 5 次提交
    • X
      ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit · 82a40777
      Xin Long 提交于
      According to RFC791, 68 bytes is the minimum size of IPv4 datagram every
      device must be able to forward without further fragmentation while 576
      bytes is the minimum size of IPv4 datagram every device has to be able
      to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right
      value for the ipv4 min mtu check in ip6_tnl_xmit.
      
      While at it, change to use max() instead of if statement.
      
      Fixes: c9fefa08 ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit")
      Reported-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82a40777
    • C
      ipv6: fix double refcount of fib6_metrics · e70a3aad
      Cong Wang 提交于
      All the callers of ip6_rt_copy_init()/rt6_set_from() hold refcnt
      of the "from" fib6_info, so there is no need to hold fib6_metrics
      refcnt again, because fib6_metrics refcnt is only released when
      fib6_info is gone, that is, they have the same life time, so the
      whole fib6_metrics refcnt can be removed actually.
      
      This fixes a kmemleak warning reported by Sabrina.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: NSabrina Dubroca <sd@queasysnail.net>
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e70a3aad
    • F
      ipv6: defrag: drop non-last frags smaller than min mtu · 0ed4229b
      Florian Westphal 提交于
      don't bother with pathological cases, they only waste cycles.
      IPv6 requires a minimum MTU of 1280 so we should never see fragments
      smaller than this (except last frag).
      
      v3: don't use awkward "-offset + len"
      v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
          There were concerns that there could be even smaller frags
          generated by intermediate nodes, e.g. on radio networks.
      
      Cc: Peter Oskolkov <posk@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ed4229b
    • P
      ip: use rb trees for IP frag queue. · fa0f5273
      Peter Oskolkov 提交于
      Similar to TCP OOO RX queue, it makes sense to use rb trees to store
      IP fragments, so that OOO fragments are inserted faster.
      
      Tested:
      
      - a follow-up patch contains a rather comprehensive ip defrag
        self-test (functional)
      - ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`:
          netstat --statistics
          Ip:
              282078937 total packets received
              0 forwarded
              0 incoming packets discarded
              946760 incoming packets delivered
              18743456 requests sent out
              101 fragments dropped after timeout
              282077129 reassemblies required
              944952 packets reassembled ok
              262734239 packet reassembles failed
         (The numbers/stats above are somewhat better re:
          reassemblies vs a kernel without this patchset. More
          comprehensive performance testing TBD).
      Reported-by: NJann Horn <jannh@google.com>
      Reported-by: NJuha-Matti Tilli <juha-matti.tilli@iki.fi>
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPeter Oskolkov <posk@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa0f5273
    • G
      ipv6: icmp: Updating pmtu for link local route · 5f379ef5
      Georg Kohmann 提交于
      When a ICMPV6_PKT_TOOBIG is received from a link local address the pmtu will
      be updated on a route with an arbitrary interface index. Subsequent packets
      sent back to the same link local address may therefore end up not
      considering the updated pmtu.
      
      Current behavior breaks TAHI v6LC4.1.4 Reduce PMTU On-link. Referring to RFC
      1981: Section 3: "Note that Path MTU Discovery must be performed even in
      cases where a node "thinks" a destination is attached to the same link as
      itself. In a situation such as when a neighboring router acts as proxy [ND]
      for some destination, the destination can to appear to be directly
      connected but is in fact more than one hop away."
      
      Using the interface index from the incoming ICMPV6_PKT_TOOBIG when updating
      the pmtu.
      Signed-off-by: NGeorg Kohmann <geokohma@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f379ef5