1. 16 May, 2019 1 commit
    • net: kernel hookers service for toa module · 3c74cfbb
      George Zhang committed
      LVS fullnat will replace network traffic's source ip with its local ip,
      and thus the backend servers cannot obtain the real client ip.
      
      To solve this, LVS has introduced the tcp option address (TOA) to store
      the essential ip address information in the last tcp ack packet of the
      3-way handshake, and the backend servers need to retrieve it from the
      packet header.
      
      In this patch, we have introduced the sk_toa_data member in the sock
      structure to hold the TOA information. There used to be an in-tree
      module for managing TOA, but it is now maintained as a
      standalone module.
      
      In this case, the toa module should register its hook function(s) using
      the provided interfaces in the hookers module.
      
      TOA in sock structure:
      
      	__be32 sk_toa_data[16];
      
      The hookers module only provides the sk_toa_data placeholder, and the
      toa module can use this variable through the layout it needs.
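
To illustrate what "using the placeholder through its own layout" could look like, here is a minimal user-space sketch. The `struct toa_example` record and its fields are invented for this example and are not the toa module's real layout; `struct fake_sock` and `uint32_t` stand in for the kernel's `struct sock` and `__be32`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOA_WORDS 16

/* Stand-in for the sock member: 16 opaque 32-bit words (64 bytes). */
struct fake_sock {
    uint32_t sk_toa_data[TOA_WORDS];
};

/* Hypothetical layout chosen by the module; NOT the real TOA format. */
struct toa_example {
    uint8_t  opcode;
    uint8_t  opsize;
    uint16_t port;
    uint32_t ip;
};

/* The module's record has to fit inside the placeholder. */
_Static_assert(sizeof(struct toa_example) <= TOA_WORDS * sizeof(uint32_t),
               "layout must fit in sk_toa_data");

/* Store the module's record into the opaque placeholder. */
static void toa_store(struct fake_sock *sk, const struct toa_example *t)
{
    assert(sk != NULL && t != NULL);
    memcpy(sk->sk_toa_data, t, sizeof(*t));
}

/* Read the record back using the same layout. */
static void toa_load(const struct fake_sock *sk, struct toa_example *t)
{
    assert(sk != NULL && t != NULL);
    memcpy(t, sk->sk_toa_data, sizeof(*t));
}
```

Copying with `memcpy` rather than casting the array keeps the sketch free of aliasing and alignment concerns; the owner of the placeholder never needs to know the record's fields.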
      
      Hook interfaces:
      
      The hookers module replaces the kernel's syn_recv_sock and getname
      handler with a stub that chains the toa module's hook function(s) to the
      original handling function. The hookers module allows hook functions to
      be installed and uninstalled in any order.
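
The stub-and-chain idea described above can be sketched in plain C. Everything here is hypothetical: the handler and hook signatures, the fixed-size table, and the choice to run hooks after the original handler are assumptions for illustration, not the hookers module's actual interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOOKS 4

/* Hypothetical handler type: takes a value, returns a result. */
typedef int (*handler_fn)(int arg);
/* A hook sees the original handler's result and may adjust it. */
typedef int (*hook_fn)(int arg, int result);

static handler_fn original_handler;  /* saved original handler */
static hook_fn hooks[MAX_HOOKS];     /* NULL marks a free slot */

/* Install a hook in the first free slot; 0 on success, -1 if full. */
static int hook_install(hook_fn fn)
{
    assert(fn != NULL);
    for (size_t i = 0; i < MAX_HOOKS; i++) {
        if (hooks[i] == NULL) {
            hooks[i] = fn;
            return 0;
        }
    }
    return -1;
}

/* Remove a hook wherever it sits; 0 if found, -1 otherwise. */
static int hook_uninstall(hook_fn fn)
{
    for (size_t i = 0; i < MAX_HOOKS; i++) {
        if (hooks[i] == fn) {
            hooks[i] = NULL;
            return 0;
        }
    }
    return -1;
}

/* The stub that replaces the handler: run the original, then each hook. */
static int stub(int arg)
{
    int result = original_handler(arg);
    for (size_t i = 0; i < MAX_HOOKS; i++) {
        if (hooks[i] != NULL)
            result = hooks[i](arg, result);
    }
    return result;
}

/* Example handler and hooks, used only to exercise the chain. */
static int base_handler(int arg) { return arg; }
static int add_ten(int arg, int result) { (void)arg; return result + 10; }
static int twice(int arg, int result) { (void)arg; return result * 2; }
```

Because each hook occupies its own slot, removing one never disturbs the others, which is what allows installation and removal in any order.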
      
      toa module:
      
      The external toa module will be provided in a separate RPM package.
      
      [xuyu@linux.alibaba.com: amend commit log]
      Signed-off-by: George Zhang <georgezhang@linux.alibaba.com>
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
  2. 05 May, 2019 4 commits
    • ipv6: invert flowlabel sharing check in process and user mode · f78ec0cd
      Willem de Bruijn committed
      [ Upstream commit 95c169251bf734aa555a1e8043e4d88ec97a04ec ]
      
      A request for a flowlabel in process or user exclusive mode must
      fail if the caller pid or uid does not match. Invert the test.
      
      Previously, the test was unsafe wrt PID recycling, but indeed tested
      for inequality: fl1->owner != fl->owner
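
A small sketch of the corrected direction of the test, assuming simplified types: the `share_mode` enum, `struct label`, and `may_share()` are made up for illustration and only mirror the rule stated above (a mismatch of pid or uid must cause failure).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sharing modes, loosely modelled on the text above. */
enum share_mode { SHARE_ANY, SHARE_PROCESS, SHARE_USER };

struct label {
    enum share_mode share;
    int owner_pid;
    unsigned owner_uid;
};

/* Returns true when request `req` may share the existing label `cur`. */
static bool may_share(const struct label *cur, const struct label *req)
{
    assert(cur != NULL && req != NULL);
    if (cur->share != req->share)
        return false;  /* assumption: modes must agree */
    if (cur->share == SHARE_PROCESS && cur->owner_pid != req->owner_pid)
        return false;  /* fail on mismatch, not on match */
    if (cur->share == SHARE_USER && cur->owner_uid != req->owner_uid)
        return false;
    return true;
}
```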
      
      Fixes: 4f82f457 ("net ip6 flowlabel: Make owner a union of struct pid* and kuid_t")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6/flowlabel: wait rcu grace period before put_pid() · 39eddbb7
      Eric Dumazet committed
      [ Upstream commit 6c0afef5fb0c27758f4d52b2210c61b6bd8b4470 ]
      
      syzbot was able to catch a use-after-free read in pid_nr_ns() [1]
      
      ip6fl_seq_show() seems to use RCU protection, dereferencing fl->owner.pid
      but fl_free() releases fl->owner.pid before rcu grace period is started.
      
      [1]
      
      BUG: KASAN: use-after-free in pid_nr_ns+0x128/0x140 kernel/pid.c:407
      Read of size 4 at addr ffff888094012a04 by task syz-executor.0/18087
      
      CPU: 0 PID: 18087 Comm: syz-executor.0 Not tainted 5.1.0-rc6+ #89
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
       kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
       pid_nr_ns+0x128/0x140 kernel/pid.c:407
       ip6fl_seq_show+0x2f8/0x4f0 net/ipv6/ip6_flowlabel.c:794
       seq_read+0xad3/0x1130 fs/seq_file.c:268
       proc_reg_read+0x1fe/0x2c0 fs/proc/inode.c:227
       do_loop_readv_writev fs/read_write.c:701 [inline]
       do_loop_readv_writev fs/read_write.c:688 [inline]
       do_iter_read+0x4a9/0x660 fs/read_write.c:922
       vfs_readv+0xf0/0x160 fs/read_write.c:984
       kernel_readv fs/splice.c:358 [inline]
       default_file_splice_read+0x475/0x890 fs/splice.c:413
       do_splice_to+0x12a/0x190 fs/splice.c:876
       splice_direct_to_actor+0x2d2/0x970 fs/splice.c:953
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1062
       do_sendfile+0x597/0xd00 fs/read_write.c:1443
       __do_sys_sendfile64 fs/read_write.c:1498 [inline]
       __se_sys_sendfile64 fs/read_write.c:1490 [inline]
       __x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1490
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458da9
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f300d24bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000458da9
      RDX: 00000000200000c0 RSI: 0000000000000008 RDI: 0000000000000007
      RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 000000000000005a R11: 0000000000000246 R12: 00007f300d24c6d4
      R13: 00000000004c5fa3 R14: 00000000004da748 R15: 00000000ffffffff
      
      Allocated by task 17543:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_kmalloc mm/kasan/common.c:497 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
       slab_post_alloc_hook mm/slab.h:437 [inline]
       slab_alloc mm/slab.c:3393 [inline]
       kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
       alloc_pid+0x55/0x8f0 kernel/pid.c:168
       copy_process.part.0+0x3b08/0x7980 kernel/fork.c:1932
       copy_process kernel/fork.c:1709 [inline]
       _do_fork+0x257/0xfd0 kernel/fork.c:2226
       __do_sys_clone kernel/fork.c:2333 [inline]
       __se_sys_clone kernel/fork.c:2327 [inline]
       __x64_sys_clone+0xbf/0x150 kernel/fork.c:2327
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 7789:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
       __cache_free mm/slab.c:3499 [inline]
       kmem_cache_free+0x86/0x260 mm/slab.c:3765
       put_pid.part.0+0x111/0x150 kernel/pid.c:111
       put_pid+0x20/0x30 kernel/pid.c:105
       fl_free+0xbe/0xe0 net/ipv6/ip6_flowlabel.c:102
       ip6_fl_gc+0x295/0x3e0 net/ipv6/ip6_flowlabel.c:152
       call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
       expire_timers kernel/time/timer.c:1362 [inline]
       __run_timers kernel/time/timer.c:1681 [inline]
       __run_timers kernel/time/timer.c:1649 [inline]
       run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
       __do_softirq+0x266/0x95a kernel/softirq.c:293
      
      The buggy address belongs to the object at ffff888094012a00
       which belongs to the cache pid_2 of size 88
      The buggy address is located 4 bytes inside of
       88-byte region [ffff888094012a00, ffff888094012a58)
      The buggy address belongs to the page:
      page:ffffea0002500480 count:1 mapcount:0 mapping:ffff88809a483080 index:0xffff888094012980
      flags: 0x1fffc0000000200(slab)
      raw: 01fffc0000000200 ffffea00018a3508 ffffea0002524a88 ffff88809a483080
      raw: ffff888094012980 ffff888094012000 000000010000001b 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888094012900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
       ffff888094012980: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
      >ffff888094012a00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
                         ^
       ffff888094012a80: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
       ffff888094012b00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
      
      Fixes: 4f82f457 ("net ip6 flowlabel: Make owner a union of struct pid * and kuid_t")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: fix races in ip6_dst_destroy() · 1a9e0134
      Eric Dumazet committed
      [ Upstream commit 0e2338749192ce0e52e7174c5352f627632f478a ]
      
      We had many syzbot reports that seem to be caused by use-after-free
      of struct fib6_info.
      
      ip6_dst_destroy(), fib6_drop_pcpu_from() and rt6_remove_exception()
      are writers vs rt->from, and use inconsistent synchronization among
      themselves.
      
      Switching to xchg() will solve the issues with no possible
      lockdep issues.
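
The exchange-based approach can be illustrated in user space with C11 atomics. This is only an analogy under stated assumptions: `struct info`, `struct route`, and `route_drop_from()` are invented names, and `atomic_exchange()` plays the role that `xchg()` plays in the kernel. Each writer swaps the pointer with NULL, so only the writer that gets the old non-NULL value drops the reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy refcounted object standing in for the shared "from" entry. */
struct info {
    atomic_int refcnt;
};

/* Toy route holding one reference through an atomic pointer. */
struct route {
    _Atomic(struct info *) from;
};

static atomic_int releases;  /* how many references were dropped */

static void info_release(struct info *f)
{
    if (f != NULL) {
        atomic_fetch_sub(&f->refcnt, 1);
        atomic_fetch_add(&releases, 1);
    }
}

/* Any writer may call this; only the one that observes the old
 * non-NULL pointer releases it, so a double release cannot happen. */
static void route_drop_from(struct route *rt)
{
    assert(rt != NULL);
    struct info *from = atomic_exchange(&rt->from, (struct info *)NULL);
    info_release(from);
}
```

Since the swap is a single atomic step, no lock is needed to decide which writer owns the release, which is why this pattern cannot introduce lock-ordering problems.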
      
      BUG: KASAN: user-memory-access in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:294 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:292 [inline]
      BUG: KASAN: user-memory-access in fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
      BUG: KASAN: user-memory-access in fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
      Write of size 4 at addr 0000000000ffffb4 by task syz-executor.1/7649
      
      CPU: 0 PID: 7649 Comm: syz-executor.1 Not tainted 5.1.0-rc6+ #183
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       kasan_report.cold+0x5/0x40 mm/kasan/report.c:321
       check_memory_region_inline mm/kasan/generic.c:185 [inline]
       check_memory_region+0x123/0x190 mm/kasan/generic.c:191
       kasan_check_write+0x14/0x20 mm/kasan/common.c:108
       atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
       fib6_info_release include/net/ip6_fib.h:294 [inline]
       fib6_info_release include/net/ip6_fib.h:292 [inline]
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
       fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
       fib6_del_route net/ipv6/ip6_fib.c:1813 [inline]
       fib6_del+0xac2/0x10a0 net/ipv6/ip6_fib.c:1844
       fib6_clean_node+0x3a8/0x590 net/ipv6/ip6_fib.c:2006
       fib6_walk_continue+0x495/0x900 net/ipv6/ip6_fib.c:1928
       fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:1976
       fib6_clean_tree+0xe0/0x120 net/ipv6/ip6_fib.c:2055
       __fib6_clean_all+0x118/0x2a0 net/ipv6/ip6_fib.c:2071
       fib6_clean_all+0x2b/0x40 net/ipv6/ip6_fib.c:2082
       rt6_sync_down_dev+0x134/0x150 net/ipv6/route.c:4057
       rt6_disable_ip+0x27/0x5f0 net/ipv6/route.c:4062
       addrconf_ifdown+0xa2/0x1220 net/ipv6/addrconf.c:3705
       addrconf_notify+0x19a/0x2260 net/ipv6/addrconf.c:3630
       notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1753
       call_netdevice_notifiers_extack net/core/dev.c:1765 [inline]
       call_netdevice_notifiers net/core/dev.c:1779 [inline]
       dev_close_many+0x33f/0x6f0 net/core/dev.c:1522
       rollback_registered_many+0x43b/0xfd0 net/core/dev.c:8177
       rollback_registered+0x109/0x1d0 net/core/dev.c:8242
       unregister_netdevice_queue net/core/dev.c:9289 [inline]
       unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9282
       unregister_netdevice include/linux/netdevice.h:2658 [inline]
       __tun_detach+0xd5b/0x1000 drivers/net/tun.c:727
       tun_detach drivers/net/tun.c:744 [inline]
       tun_chr_close+0xe0/0x180 drivers/net/tun.c:3443
       __fput+0x2e5/0x8d0 fs/file_table.c:278
       ____fput+0x16/0x20 fs/file_table.c:309
       task_work_run+0x14a/0x1c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x90a/0x2fa0 kernel/exit.c:876
       do_group_exit+0x135/0x370 kernel/exit.c:980
       __do_sys_exit_group kernel/exit.c:991 [inline]
       __se_sys_exit_group kernel/exit.c:989 [inline]
       __x64_sys_exit_group+0x44/0x50 kernel/exit.c:989
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458da9
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffeafc2a6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 000000000000001c RCX: 0000000000458da9
      RDX: 0000000000412a80 RSI: 0000000000a54ef0 RDI: 0000000000000043
      RBP: 00000000004be552 R08: 000000000000000c R09: 000000000004c0d1
      R10: 0000000002341940 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00007ffeafc2a7f0 R14: 000000000004c065 R15: 00007ffeafc2a800
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: A few fixes on dereferencing rt->from · 7ea4f000
      Martin KaFai Lau committed
      [ Upstream commit 886b7a50100a50f1cbd08a6f8ec5884dfbe082dc ]
      
      It is a followup after the fix in
      commit 9c69a1320515 ("route: Avoid crash from dereferencing NULL rt->from")
      
      rt6_do_redirect():
      1. NULL checking is needed on rt->from because a parallel
         fib6_info delete could happen that sets rt->from to NULL.
         (e.g. rt6_remove_exception() and fib6_drop_pcpu_from()).
      
      2. fib6_info_hold() is not enough.  Same reason as (1).
         Meaning, holding dst->__refcnt cannot ensure
         rt->from is not NULL or rt->from->fib6_ref is not 0.
      
         Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc()
         is already doing, this patch chooses to extend the rcu section
         to keep "from" dereference-able after checking for NULL.
      
      inet6_rtm_getroute():
      1. NULL checking is also needed on rt->from for a similar reason.
         Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED.
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 04 May, 2019 1 commit
  4. 27 April, 2019 3 commits
  5. 20 April, 2019 2 commits
    • net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version · 8c5e9ea1
      Lorenzo Bianconi committed
      [ Upstream commit efcc9bcaf77c07df01371a7c34e50424c291f3ac ]
      
      Fix a possible NULL pointer dereference in ip6erspan_set_version by
      checking the nlattr data pointer.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 7549 Comm: syz-executor432 Not tainted 5.0.0-rc6-next-20190218
      #37
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6erspan_newlink+0x66/0x7b0 net/ipv6/ip6_gre.c:2210
        __rtnl_newlink+0x107b/0x16c0 net/core/rtnetlink.c:3176
        rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3234
        rtnetlink_rcv_msg+0x465/0xb00 net/core/rtnetlink.c:5192
        netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2485
        rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5210
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1925
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg+0xdd/0x130 net/socket.c:631
        ___sys_sendmsg+0x806/0x930 net/socket.c:2136
        __sys_sendmsg+0x105/0x1d0 net/socket.c:2174
        __do_sys_sendmsg net/socket.c:2183 [inline]
        __se_sys_sendmsg net/socket.c:2181 [inline]
        __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2181
        do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440159
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffa69156e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440159
      RDX: 0000000000000000 RSI: 0000000020001340 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000001 R09: 00000000004002c8
      R10: 0000000000000011 R11: 0000000000000246 R12: 00000000004019e0
      R13: 0000000000401a70 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      ---[ end trace 09f8a7d13b4faaa1 ]---
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 4974d5f678ab ("net: ip6_gre: initialize erspan_ver just for erspan tunnels")
      Reported-and-tested-by: syzbot+30191cf1057abd3064af@syzkaller.appspotmail.com
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: destroy xfrm_state synchronously on net exit path · bbbe4746
      Cong Wang committed
      [ Upstream commit f75a2804da391571563c4b6b29e7797787332673 ]
      
      xfrm_state_put() moves struct xfrm_state to the GC list
      and schedules the GC work to clean it up. On net exit call
      path, xfrm_state_flush() is called to clean up and
      xfrm_flush_gc() is called to wait for the GC work to complete
      before exit.
      
      However, this doesn't work because one of the ->destructor(),
      ipcomp_destroy(), schedules the same GC work again inside
      the GC work. It is hard to wait for such a nested async
      callback. This is also why syzbot still reports the following
      warning:
      
       WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
       ...
        ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
        cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
        process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
        worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
        kthread+0x357/0x430 kernel/kthread.c:246
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      In fact, it is perfectly fine to bypass GC and destroy xfrm_state
      synchronously on net exit call path, because it is in process context
      and doesn't need a work struct to do any blocking work.
      
      This patch introduces xfrm_state_put_sync() which simply bypasses
      GC, and lets its callers decide whether to use this synchronous
      version. On net exit path, xfrm_state_fini() and
      xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
      blocking, it can use xfrm_state_put_sync() directly too.
      
      Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
      reflect this change.
      
      Fixes: b48c05ab ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
      Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  6. 17 April, 2019 4 commits
    • net: ip6_gre: fix possible use-after-free in ip6erspan_rcv · a2ef7723
      Lorenzo Bianconi committed
      [ Upstream commit 2a3cabae4536edbcb21d344e7aa8be7a584d2afb ]
      
      erspan_v6 tunnels run __iptunnel_pull_header on received skbs to remove
      the erspan header. This can cause a use-after-free when accessing the
      pkt_md pointer in ip6erspan_rcv, since the packet will be 'uncloned' by
      running pskb_expand_head if it is a cloned gso skb (e.g. if the packet
      has been sent through a veth device). Fix it by resetting the pkt_md
      pointer after __iptunnel_pull_header.
      
      Fixes: 1d7e2ed2 ("net: erspan: refactor existing erspan code")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipv6: sit: reset ip header pointer in ipip6_rcv · 42f1fa0f
      Lorenzo Bianconi committed
      [ Upstream commit bb9bd814ebf04f579be466ba61fc922625508807 ]
      
      ipip6 tunnels run iptunnel_pull_header on received skbs. This can
      cause the following use-after-free when accessing the iph pointer, since
      the packet will be 'uncloned' by running pskb_expand_head if it is a
      cloned gso skb (e.g. if the packet has been sent through a veth device):
      
      [  706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
      [  706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
      [  706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
      [  706.771839] Call trace:
      [  706.801159]  dump_backtrace+0x0/0x2f8
      [  706.845079]  show_stack+0x24/0x30
      [  706.884833]  dump_stack+0xe0/0x11c
      [  706.925629]  print_address_description+0x68/0x260
      [  706.982070]  kasan_report+0x178/0x340
      [  707.025995]  __asan_report_load1_noabort+0x30/0x40
      [  707.083481]  ipip6_rcv+0x1678/0x16e0 [sit]
      [  707.132623]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  707.185940]  ip_local_deliver_finish+0x3b8/0x988
      [  707.241338]  ip_local_deliver+0x144/0x470
      [  707.289436]  ip_rcv_finish+0x43c/0x14b0
      [  707.335447]  ip_rcv+0x628/0x1138
      [  707.374151]  __netif_receive_skb_core+0x1670/0x2600
      [  707.432680]  __netif_receive_skb+0x28/0x190
      [  707.482859]  process_backlog+0x1d0/0x610
      [  707.529913]  net_rx_action+0x37c/0xf68
      [  707.574882]  __do_softirq+0x288/0x1018
      [  707.619852]  run_ksoftirqd+0x70/0xa8
      [  707.662734]  smpboot_thread_fn+0x3a4/0x9e8
      [  707.711875]  kthread+0x2c8/0x350
      [  707.750583]  ret_from_fork+0x10/0x18
      
      [  707.811302] Allocated by task 16982:
      [  707.854182]  kasan_kmalloc.part.1+0x40/0x108
      [  707.905405]  kasan_kmalloc+0xb4/0xc8
      [  707.948291]  kasan_slab_alloc+0x14/0x20
      [  707.994309]  __kmalloc_node_track_caller+0x158/0x5e0
      [  708.053902]  __kmalloc_reserve.isra.8+0x54/0xe0
      [  708.108280]  __alloc_skb+0xd8/0x400
      [  708.150139]  sk_stream_alloc_skb+0xa4/0x638
      [  708.200346]  tcp_sendmsg_locked+0x818/0x2b90
      [  708.251581]  tcp_sendmsg+0x40/0x60
      [  708.292376]  inet_sendmsg+0xf0/0x520
      [  708.335259]  sock_sendmsg+0xac/0xf8
      [  708.377096]  sock_write_iter+0x1c0/0x2c0
      [  708.424154]  new_sync_write+0x358/0x4a8
      [  708.470162]  __vfs_write+0xc4/0xf8
      [  708.510950]  vfs_write+0x12c/0x3d0
      [  708.551739]  ksys_write+0xcc/0x178
      [  708.592533]  __arm64_sys_write+0x70/0xa0
      [  708.639593]  el0_svc_handler+0x13c/0x298
      [  708.686646]  el0_svc+0x8/0xc
      
      [  708.739019] Freed by task 17:
      [  708.774597]  __kasan_slab_free+0x114/0x228
      [  708.823736]  kasan_slab_free+0x10/0x18
      [  708.868703]  kfree+0x100/0x3d8
      [  708.905320]  skb_free_head+0x7c/0x98
      [  708.948204]  skb_release_data+0x320/0x490
      [  708.996301]  pskb_expand_head+0x60c/0x970
      [  709.044399]  __iptunnel_pull_header+0x3b8/0x5d0
      [  709.098770]  ipip6_rcv+0x41c/0x16e0 [sit]
      [  709.146873]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  709.200195]  ip_local_deliver_finish+0x3b8/0x988
      [  709.255596]  ip_local_deliver+0x144/0x470
      [  709.303692]  ip_rcv_finish+0x43c/0x14b0
      [  709.349705]  ip_rcv+0x628/0x1138
      [  709.388413]  __netif_receive_skb_core+0x1670/0x2600
      [  709.446943]  __netif_receive_skb+0x28/0x190
      [  709.497120]  process_backlog+0x1d0/0x610
      [  709.544169]  net_rx_action+0x37c/0xf68
      [  709.589131]  __do_softirq+0x288/0x1018
      
      [  709.651938] The buggy address belongs to the object at ffffe01b6bd85580
                      which belongs to the cache kmalloc-1024 of size 1024
      [  709.804356] The buggy address is located 117 bytes inside of
                      1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
      [  709.946340] The buggy address belongs to the page:
      [  710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
      [  710.099914] flags: 0xfffff8000000100(slab)
      [  710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
      [  710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
      [  710.334966] page dumped because: kasan: bad access detected
      
      Fix it by resetting the iph pointer after iptunnel_pull_header.
      
      Fixes: a09a4c8d ("tunnels: Remove encapsulation offloads on decap")
      Tested-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipv6: Fix dangling pointer when ipv6 fragment · ea06796f
      Junwei Hu committed
      [ Upstream commit ef0efcd3bd3fd0589732b67fb586ffd3c8705806 ]
      
      At the beginning of ip6_fragment func, the prevhdr pointer is
      obtained in the ip6_find_1stfragopt func.
      However, all the pointers pointing into skb header may change
      when calling skb_checksum_help func with
      skb->ip_summed = CHECKSUM_PARTIAL condition.
      The prevhdr pointer will be dangling if it is not reloaded after
      calling __skb_linearize func in skb_checksum_help func.
      
      Here, I add a variable, nexthdr_offset, to evaluate the offset,
      which does not change even after calling __skb_linearize func.
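
The general technique, remembering an offset instead of a pointer across an operation that may move the buffer, can be shown with a user-space analogy. `struct buf`, `buf_relocate()`, and `set_byte_after_relocate()` are hypothetical names; the relocation here stands in for whatever may reallocate the skb header in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct buf {
    uint8_t *data;
    size_t len;
};

/* Move the contents into a fresh, larger allocation. Any pointer into
 * the old data is dangling afterwards. Returns 0 on success, -1 on
 * allocation failure (in which case the buffer is left unchanged). */
static int buf_relocate(struct buf *b, size_t new_len)
{
    assert(b != NULL && new_len >= b->len);
    uint8_t *n = malloc(new_len);
    if (n == NULL)
        return -1;
    memcpy(n, b->data, b->len);
    memset(n + b->len, 0, new_len - b->len);
    free(b->data);
    b->data = n;
    b->len = new_len;
    return 0;
}

/* Write one byte through `field` after a relocation, by saving the
 * field's offset first and rebuilding the pointer from the new base. */
static int set_byte_after_relocate(struct buf *b, uint8_t *field, uint8_t v)
{
    size_t off = (size_t)(field - b->data);  /* stays valid across the move */
    if (buf_relocate(b, b->len * 2) != 0)
        return -1;
    field = b->data + off;                   /* reload from the new base */
    *field = v;
    return 0;
}
```

The offset is a plain integer, so it is unaffected by where the data ends up; only the final pointer is recomputed.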
      
      Fixes: 405c92f7 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment")
      Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
      Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
      Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com
      Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type · 8e4b4da3
      Sheena Mira-ato committed
      [ Upstream commit b2e54b09a3d29c4db883b920274ca8dca4d9f04d ]
      
      The device type for ip6 tunnels is set to
      ARPHRD_TUNNEL6. However, the ip4ip6_err function
      is expecting the device type of the tunnel to be
      ARPHRD_TUNNEL.  Since the device types do not
      match, the function exits and the ICMP error
      packet is not sent to the originating host. Note
      that the device type for IPv4 tunnels is set to
      ARPHRD_TUNNEL.
      
      Fix is to expect a tunnel device type of
      ARPHRD_TUNNEL6 instead.  Now the tunnel device
      type matches and the ICMP error packet is sent
      to the originating host.
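
A toy model of the mismatch, assuming nothing about the real function beyond what the text says: `struct fake_dev` and both predicates are invented, and the two constants are redefined locally under `EX_` names with the values that `ARPHRD_TUNNEL` and `ARPHRD_TUNNEL6` have in Linux's `if_arp.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the device type values, to keep the sketch standalone. */
enum {
    EX_ARPHRD_TUNNEL  = 768,  /* IPv4 tunnel device type */
    EX_ARPHRD_TUNNEL6 = 769   /* ip6 tunnel device type */
};

struct fake_dev {
    unsigned short type;
};

/* Before the fix: the error path expects the IPv4 tunnel type,
 * so an ip6 tunnel device never matches. */
static bool err_path_matches_old(const struct fake_dev *d)
{
    assert(d != NULL);
    return d->type == EX_ARPHRD_TUNNEL;
}

/* After the fix: the error path expects the ip6 tunnel type. */
static bool err_path_matches_new(const struct fake_dev *d)
{
    assert(d != NULL);
    return d->type == EX_ARPHRD_TUNNEL6;
}
```
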
      Signed-off-by: Sheena Mira-ato <sheena.mira-ato@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  7. 03 April, 2019 3 commits
  8. 24 March, 2019 1 commit
    • esp: Skip TX bytes accounting when sending from a request socket · b92eaed3
      Martin Willi committed
      [ Upstream commit 09db51241118aeb06e1c8cd393b45879ce099b36 ]
      
      On ESP output, sk_wmem_alloc is incremented for the added padding if a
      socket is associated with the skb. When replying with TCP SYNACKs over
      IPsec, the associated sk is only a cast request socket. Increasing
      sk_wmem_alloc on a request socket results in a write at an arbitrary
      struct offset. In the best case, this produces the following WARNING:
      
      WARNING: CPU: 1 PID: 0 at lib/refcount.c:102 esp_output_head+0x2e4/0x308 [esp4]
      refcount_t: addition on 0; use-after-free.
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc3 #2
      Hardware name: Marvell Armada 380/385 (Device Tree)
      [...]
      [<bf0ff354>] (esp_output_head [esp4]) from [<bf1006a4>] (esp_output+0xb8/0x180 [esp4])
      [<bf1006a4>] (esp_output [esp4]) from [<c05dee64>] (xfrm_output_resume+0x558/0x664)
      [<c05dee64>] (xfrm_output_resume) from [<c05d07b0>] (xfrm4_output+0x44/0xc4)
      [<c05d07b0>] (xfrm4_output) from [<c05956bc>] (tcp_v4_send_synack+0xa8/0xe8)
      [<c05956bc>] (tcp_v4_send_synack) from [<c0586ad8>] (tcp_conn_request+0x7f4/0x948)
      [<c0586ad8>] (tcp_conn_request) from [<c058c404>] (tcp_rcv_state_process+0x2a0/0xe64)
      [<c058c404>] (tcp_rcv_state_process) from [<c05958ac>] (tcp_v4_do_rcv+0xf0/0x1f4)
      [<c05958ac>] (tcp_v4_do_rcv) from [<c0598a4c>] (tcp_v4_rcv+0xdb8/0xe20)
      [<c0598a4c>] (tcp_v4_rcv) from [<c056eb74>] (ip_protocol_deliver_rcu+0x2c/0x2dc)
      [<c056eb74>] (ip_protocol_deliver_rcu) from [<c056ee6c>] (ip_local_deliver_finish+0x48/0x54)
      [<c056ee6c>] (ip_local_deliver_finish) from [<c056eecc>] (ip_local_deliver+0x54/0xec)
      [<c056eecc>] (ip_local_deliver) from [<c056efac>] (ip_rcv+0x48/0xb8)
      [<c056efac>] (ip_rcv) from [<c0519c2c>] (__netif_receive_skb_one_core+0x50/0x6c)
      [...]
      
      The issue triggers only when not using TCP syncookies, as for syncookies
      no socket is associated.
      
      Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
      Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b92eaed3
  9. 19 Mar, 2019 5 commits
    • P
      ipv6: route: enforce RCU protection in ip6_route_check_nh_onlink() · 2e4b2aeb
      Paolo Abeni committed
      [ Upstream commit bf1dc8bad1d42287164d216d8efb51c5cd381b18 ]
      
      We need an RCU critical section around the rt6_info->from dereference,
      and proper annotation.
      
      Fixes: 4ed591c8ab44 ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e4b2aeb
    • P
      ipv6: route: enforce RCU protection in rt6_update_exception_stamp_rt() · 96dd4ef3
      Paolo Abeni committed
      [ Upstream commit 193f3685d0546b0cea20c99894aadb70098e47bf ]
      
      We must access rt6_info->from under RCU read lock: move the
      dereference under such lock, with proper annotation.
      
      v1 -> v2:
       - avoid using multiple, racy, fetch operations for rt->from
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      96dd4ef3
    • P
      ipv6: route: purge exception on removal · b9d0cb75
      Paolo Abeni committed
      [ Upstream commit f5b51fe804ec2a6edce0f8f6b11ea57283f5857b ]
      
      When a netdevice is unregistered, we flush the relevant exception
      via rt6_sync_down_dev() -> fib6_ifdown() -> fib6_del() -> fib6_del_route().
      
      Finally, we end up calling rt6_remove_exception(), where we release
      the relevant dst, while we keep the references to the related fib6_info and
      dev. Such references should be released later, when the dst is
      destroyed.
      
      There are a number of caches that can keep the exception around for an
      unlimited amount of time - namely dst_cache, possibly even socket cache.
      As a result device unregistration may hang, as demonstrated by this script:
      
      ip netns add cl
      ip netns add rt
      ip netns add srv
      ip netns exec rt sysctl -w net.ipv6.conf.all.forwarding=1
      
      ip link add name cl_veth type veth peer name cl_rt_veth
      ip link set dev cl_veth netns cl
      ip -n cl link set dev cl_veth up
      ip -n cl addr add dev cl_veth 2001::2/64
      ip -n cl route add default via 2001::1
      
      ip -n cl link add tunv6 type ip6tnl mode ip6ip6 local 2001::2 remote 2002::1 hoplimit 64 dev cl_veth
      ip -n cl link set tunv6 up
      ip -n cl addr add 2013::2/64 dev tunv6
      
      ip link set dev cl_rt_veth netns rt
      ip -n rt link set dev cl_rt_veth up
      ip -n rt addr add dev cl_rt_veth 2001::1/64
      
      ip link add name rt_srv_veth type veth peer name srv_veth
      ip link set dev srv_veth netns srv
      ip -n srv link set dev srv_veth up
      ip -n srv addr add dev srv_veth 2002::1/64
      ip -n srv route add default via 2002::2
      
      ip -n srv link add tunv6 type ip6tnl mode ip6ip6 local 2002::1 remote 2001::2 hoplimit 64 dev srv_veth
      ip -n srv link set tunv6 up
      ip -n srv addr add 2013::1/64 dev tunv6
      
      ip link set dev rt_srv_veth netns rt
      ip -n rt link set dev rt_srv_veth up
      ip -n rt addr add dev rt_srv_veth 2002::2/64
      
      ip netns exec srv netserver & sleep 0.1
      ip netns exec cl ping6 -c 4 2013::1
      ip netns exec cl netperf -H 2013::1 -t TCP_STREAM -l 3 & sleep 1
      ip -n rt link set dev rt_srv_veth mtu 1400
      wait %2
      
      ip -n cl link del cl_veth
      
      This commit addresses the issue by purging all the references held by the
      exception at removal time, as we currently do for e.g. ipv6 pcpu dst entries.
      
      v1 -> v2:
       - re-order the code to avoid accessing dst and net after dst_dev_put()
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9d0cb75
    • K
      net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 · fe38cbc9
      Kalash Nainwal committed
      [ Upstream commit 97f0082a0592212fc15d4680f5a4d80f79a1687c ]
      
      Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to
      keep legacy software happy. This is similar to what was done for
      ipv4 in commit 709772e6 ("net: Fix routing tables with
      id > 255 for legacy software").

      Signed-off-by: Kalash Nainwal <kalash@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe38cbc9
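
The compatibility rule described in this commit can be sketched in plain C as follows. This is only an illustration, not the upstream diff: the helper name rtm_table_for() is hypothetical, and RT_TABLE_COMPAT is defined locally with the value 252 that the kernel's uapi rtnetlink header gives it. The sketch assumes the full table id is still reported separately (via the RTA_TABLE attribute), since the rtm_table header field is only 8 bits wide.

```c
#include <stdint.h>

/* Same value as RT_TABLE_COMPAT in the kernel's uapi rtnetlink header. */
#define RT_TABLE_COMPAT 252

/*
 * Hypothetical helper: choose the value for the 8-bit rtm_table field.
 * Table ids that fit in 8 bits are reported unchanged; larger ids are
 * reported as RT_TABLE_COMPAT so that legacy readers see a valid value.
 */
uint8_t rtm_table_for(uint32_t table_id)
{
	return table_id < 256 ? (uint8_t)table_id : RT_TABLE_COMPAT;
}
```

A reader that only looks at rtm_table would then see 252 for any table above 255, while newer software can read the real id from the separate attribute.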
    • M
      net: sit: fix UBSAN Undefined behaviour in check_6rd · 7cfb97ba
      Miaohe Lin committed
      [ Upstream commit a843dc4ebaecd15fca1f4d35a97210f72ea1473b ]
      
      In the function check_6rd, tunnel->ip6rd.relay_prefixlen may be equal
      to 32, so UBSAN complains about the resulting shift.
      
      UBSAN: Undefined behaviour in net/ipv6/sit.c:781:47
      shift exponent 32 is too large for 32-bit type 'unsigned int'
      CPU: 6 PID: 20036 Comm: syz-executor.0 Not tainted 4.19.27 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
      __dump_stack lib/dump_stack.c:77 [inline]
      dump_stack+0xca/0x13e lib/dump_stack.c:113
      ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
      __ubsan_handle_shift_out_of_bounds+0x293/0x2e8 lib/ubsan.c:425
      check_6rd.constprop.9+0x433/0x4e0 net/ipv6/sit.c:781
      try_6rd net/ipv6/sit.c:806 [inline]
      ipip6_tunnel_xmit net/ipv6/sit.c:866 [inline]
      sit_tunnel_xmit+0x141c/0x2720 net/ipv6/sit.c:1033
      __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
      netdev_start_xmit include/linux/netdevice.h:4309 [inline]
      xmit_one net/core/dev.c:3243 [inline]
      dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
      __dev_queue_xmit+0x1656/0x2500 net/core/dev.c:3829
      neigh_output include/net/neighbour.h:501 [inline]
      ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
      ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
      NF_HOOK_COND include/linux/netfilter.h:278 [inline]
      ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
      dst_output include/net/dst.h:444 [inline]
      ip6_local_out+0x99/0x170 net/ipv6/output_core.c:176
      ip6_send_skb+0x9d/0x2f0 net/ipv6/ip6_output.c:1697
      ip6_push_pending_frames+0xc0/0x100 net/ipv6/ip6_output.c:1717
      rawv6_push_pending_frames net/ipv6/raw.c:616 [inline]
      rawv6_sendmsg+0x2435/0x3530 net/ipv6/raw.c:946
      inet_sendmsg+0xf8/0x5c0 net/ipv4/af_inet.c:798
      sock_sendmsg_nosec net/socket.c:621 [inline]
      sock_sendmsg+0xc8/0x110 net/socket.c:631
      ___sys_sendmsg+0x6cf/0x890 net/socket.c:2114
      __sys_sendmsg+0xf0/0x1b0 net/socket.c:2152
      do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
      entry_SYSCALL_64_after_hwframe+0x49/0xbe

      Signed-off-by: linmiaohe <linmiaohe@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7cfb97ba
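
The report above comes down to a C rule: shifting a 32-bit value by 32 or more bits is undefined behaviour. One common way to avoid it is to special-case the full-width shift, as in the minimal sketch below. The function names shr32_safe() and embed_v4_suffix() are hypothetical, values are in host byte order for simplicity, and the code illustrates the technique rather than reproducing the actual change to net/ipv6/sit.c.

```c
#include <stdint.h>

/*
 * Shift right by n bits, where n may be as large as the type width.
 * A plain `v >> 32` on a 32-bit type is undefined, so a shift count of
 * 32 or more is handled explicitly and yields 0.
 */
uint32_t shr32_safe(uint32_t v, unsigned int n)
{
	return n < 32 ? v >> n : 0;
}

/*
 * Hypothetical use: combine a relay prefix with the remaining address
 * bits taken from another word. With relay_prefixlen == 32 the prefix
 * already covers the whole address and no extra bits are added.
 */
uint32_t embed_v4_suffix(uint32_t relay_prefix, uint32_t v6_bits,
			 unsigned int relay_prefixlen)
{
	return relay_prefix | shr32_safe(v6_bits, relay_prefixlen);
}
```

With a 16-bit relay prefix, the top 16 bits of v6_bits fill the low half of the result; with a 32-bit prefix the result is the prefix itself.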
  10. 10 Mar, 2019 4 commits
    • D
      ipv6: Return error for RTA_VIA attribute · a68d31cc
      David Ahern committed
      [ Upstream commit e3818541b49fb88650ba339d33cc53e4095da5b3 ]
      
      IPv6 currently does not support nexthops outside of the AF_INET6 family.
      Specifically, it does not handle RTA_VIA attribute. If it is passed
      in a route add request, the actual route added only uses the device
      which is clearly not what the user intended:
      
        $ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
        $ ip ro ls
        ...
        2001:db8:2::/64 dev eth0 metric 1024 pref medium
      
      Catch this and fail the route add:
        $ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
        Error: IPv6 does not support RTA_VIA attribute.
      
      Fixes: 03c05665 ("mpls: Netlink commands to add, remove, and dump routes")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a68d31cc
    • M
      net: sit: fix memory leak in sit_init_net() · d0bedaac
      Mao Wenan committed
      [ Upstream commit 07f12b26e21ab359261bf75cfcb424fdc7daeb6d ]
      
      If register_netdev() fails to register sitn->fb_tunnel_dev, the code
      jumps to err_reg_dev and forgets to free the netdev (sitn->fb_tunnel_dev).
      
      BUG: memory leak
      unreferenced object 0xffff888378daad00 (size 512):
        comm "syz-executor.1", pid 4006, jiffies 4295121142 (age 16.115s)
        hex dump (first 32 bytes):
          00 e6 ed c0 83 88 ff ff 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
          [<00000000d6dcb63e>] kvmalloc include/linux/mm.h:577 [inline]
          [<00000000d6dcb63e>] kvzalloc include/linux/mm.h:585 [inline]
          [<00000000d6dcb63e>] netif_alloc_netdev_queues net/core/dev.c:8380 [inline]
          [<00000000d6dcb63e>] alloc_netdev_mqs+0x600/0xcc0 net/core/dev.c:8970
          [<00000000867e172f>] sit_init_net+0x295/0xa40 net/ipv6/sit.c:1848
          [<00000000871019fa>] ops_init+0xad/0x3e0 net/core/net_namespace.c:129
          [<00000000319507f6>] setup_net+0x2ba/0x690 net/core/net_namespace.c:314
          [<0000000087db4f96>] copy_net_ns+0x1dc/0x330 net/core/net_namespace.c:437
          [<0000000057efc651>] create_new_namespaces+0x382/0x730 kernel/nsproxy.c:107
          [<00000000676f83de>] copy_namespaces+0x2ed/0x3d0 kernel/nsproxy.c:165
          [<0000000030b74bac>] copy_process.part.27+0x231e/0x6db0 kernel/fork.c:1919
          [<00000000fff78746>] copy_process kernel/fork.c:1713 [inline]
          [<00000000fff78746>] _do_fork+0x1bc/0xe90 kernel/fork.c:2224
          [<000000001c2e0d1c>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
          [<00000000ec48bd44>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<0000000039acff8a>] 0xffffffffffffffff

      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0bedaac
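
The leak described in this commit is an instance of a general error-path pattern: an object is allocated, a later step fails, and the cleanup label does not free the object. The user-space sketch below illustrates that pattern only. All names in it (fake_dev, alloc_dev, free_dev, register_dev, init_net_sketch, live_devs) are hypothetical stand-ins and not kernel APIs.

```c
#include <errno.h>
#include <stdlib.h>

struct fake_dev { int registered; };

/* Number of devices allocated and not yet freed; lets a caller spot leaks. */
int live_devs;

struct fake_dev *alloc_dev(void)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));

	if (dev)
		live_devs++;
	return dev;
}

void free_dev(struct fake_dev *dev)
{
	free(dev);
	live_devs--;
}

/* Stand-in for a registration step that can fail on request. */
int register_dev(struct fake_dev *dev, int fail)
{
	if (fail)
		return -EBUSY;
	dev->registered = 1;
	return 0;
}

int init_net_sketch(struct fake_dev **out, int fail_register)
{
	struct fake_dev *dev = alloc_dev();
	int err;

	if (!dev)
		return -ENOMEM;

	err = register_dev(dev, fail_register);
	if (err)
		goto err_reg_dev;

	*out = dev;
	return 0;

err_reg_dev:
	free_dev(dev);	/* without this call the allocation would leak */
	return err;
}
```

After a failed init_net_sketch() call, live_devs is back to its previous value, which is the property the leak report shows was violated.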
    • H
      ipv4: Add ICMPv6 support when parse route ipproto · 99ed9458
      Hangbin Liu committed
      [ Upstream commit 5e1a99eae84999a2536f50a0beaf5d5262337f40 ]
      
      For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
      But for ip -6 route, currently we only support tcp, udp and icmp.
      
      Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.
      
      v2: As David Ahern and Sabrina Dubroca suggested, add an argument to
      rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family.

      Reported-by: Jianlin Shi <jishi@redhat.com>
      Fixes: eacb9384 ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      99ed9458
    • I
      ip6mr: Do not call __IP6_INC_STATS() from preemptible context · b5ff77dd
      Ido Schimmel committed
      [ Upstream commit 87c11f1ddbbad38ad8bad47af133a8208985fbdf ]
      
      Similar to commit 44f49dd8 ("ipmr: fix possible race resulting from
      improper usage of IP_INC_STATS_BH() in preemptible context."), we cannot
      assume preemption is disabled when incrementing the counter and
      accessing a per-CPU variable.
      
      Preemption can be enabled when we add a route in process context that
      corresponds to packets stored in the unresolved queue, which are then
      forwarded using this route [1].
      
      Fix this by using IP6_INC_STATS() which takes care of disabling
      preemption on architectures where it is needed.
      
      [1]
      [  157.451447] BUG: using __this_cpu_add() in preemptible [00000000] code: smcrouted/2314
      [  157.460409] caller is ip6mr_forward2+0x73e/0x10e0
      [  157.460434] CPU: 3 PID: 2314 Comm: smcrouted Not tainted 5.0.0-rc7-custom-03635-g22f2712113f1 #1336
      [  157.460449] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  157.460461] Call Trace:
      [  157.460486]  dump_stack+0xf9/0x1be
      [  157.460553]  check_preemption_disabled+0x1d6/0x200
      [  157.460576]  ip6mr_forward2+0x73e/0x10e0
      [  157.460705]  ip6_mr_forward+0x9a0/0x1510
      [  157.460771]  ip6mr_mfc_add+0x16b3/0x1e00
      [  157.461155]  ip6_mroute_setsockopt+0x3cb/0x13c0
      [  157.461384]  do_ipv6_setsockopt.isra.8+0x348/0x4060
      [  157.462013]  ipv6_setsockopt+0x90/0x110
      [  157.462036]  rawv6_setsockopt+0x4a/0x120
      [  157.462058]  __sys_setsockopt+0x16b/0x340
      [  157.462198]  __x64_sys_setsockopt+0xbf/0x160
      [  157.462220]  do_syscall_64+0x14d/0x610
      [  157.462349]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 0912ea38 ("[IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Amit Cohen <amitc@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5ff77dd
  11. 27 Feb, 2019 4 commits
  12. 23 Feb, 2019 2 commits
    • L
      net: ip6_gre: initialize erspan_ver just for erspan tunnels · 4523bc86
      Lorenzo Bianconi committed
      [ Upstream commit 4974d5f678abb34401558559d47e2ea3d1c15cba ]
      
      After commit c706863bc890 ("net: ip6_gre: always reports o_key to
      userspace"), ip6gre and ip6gretap tunnels started reporting the TUNNEL_KEY
      output flag even if it is not configured.
      ip6gre_fill_info checks the erspan_ver value to add TUNNEL_KEY for
      erspan tunnels; however, in commit 84581bda ("erspan: set
      erspan_ver to 1 by default when adding an erspan dev")
      erspan_ver is initialized to 1 even for ip6gre or ip6gretap.
      Fix the issue by moving the erspan_ver initialization into a dedicated
      routine.
      
      Fixes: c706863bc890 ("net: ip6_gre: always reports o_key to userspace")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4523bc86
    • Z
      net: fix IPv6 prefix route residue · 4c1b91b8
      Zhiqiang Liu committed
      [ Upstream commit e75913c93f7cd5f338ab373c34c93a655bd309cb ]
      
      Follow these steps:
       # ip addr add 2001:123::1/32 dev eth0
       # ip addr add 2001:123:456::2/64 dev eth0
       # ip addr del 2001:123::1/32 dev eth0
       # ip addr del 2001:123:456::2/64 dev eth0
      and then the prefix route of 2001:123::1/32 will still exist.
      
      This is because ipv6_prefix_equal, as called from
      check_cleanup_prefix_route, does not check whether the two IPv6
      addresses have the same prefix length. If the prefix of one address
      starts with the shorter prefix of another address, ipv6_prefix_equal
      returns true even though their prefix lengths are different.
      
      Here I add a check of whether the two addresses have the same prefix
      length when deciding whether their prefixes are equal.
      
      Fixes: 5b84efec ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE")
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4c1b91b8
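
The reasoning in this commit can be illustrated with a small stand-alone C sketch. Comparing only the leading bits of two addresses reports 2001:123::1/32 and 2001:123:456::2/64 as equal over the first 32 bits, so a correct "same prefix route" test also has to compare the prefix lengths. The function names prefix_bits_equal() and same_prefix_route() are hypothetical and the code is not taken from the kernel; addresses are plain 16-byte arrays in network byte order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compare the first `bits` bits of two 16-byte IPv6 addresses. */
bool prefix_bits_equal(const uint8_t a[16], const uint8_t b[16],
		       unsigned int bits)
{
	unsigned int full = bits / 8;	/* whole bytes covered by the prefix */
	unsigned int rem = bits % 8;	/* leftover bits in the next byte */

	if (bits > 128)
		return false;
	if (memcmp(a, b, full) != 0)
		return false;
	if (rem) {
		uint8_t mask = (uint8_t)(0xff << (8 - rem));

		return (a[full] & mask) == (b[full] & mask);
	}
	return true;
}

/*
 * Hypothetical check: two addresses share a prefix route only when the
 * prefix lengths match and the prefix bits match.
 */
bool same_prefix_route(const uint8_t a[16], unsigned int a_len,
		       const uint8_t b[16], unsigned int b_len)
{
	return a_len == b_len && prefix_bits_equal(a, b, a_len);
}
```

With the two addresses from the steps above, the bit comparison over 32 bits succeeds, while same_prefix_route() rejects the pair because 32 and 64 differ.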
  13. 13 Feb, 2019 1 commit
  14. 07 Feb, 2019 5 commits
    • N
      ip6mr: Fix notifiers call on mroute_clean_tables() · 505e5f3d
      Nir Dotan committed
      [ Upstream commit 146820cc240f4389cf33481c058d9493aef95e25 ]
      
      When the MC route socket is closed, mroute_clean_tables() is called to
      clean up existing routes. The notifier call was mistakenly placed in the
      cleanup of the unresolved MC route entries cache.
      In a case where the MC socket closes before an unresolved route expires,
      the notifier call leads to a crash, caused by the driver trying to
      increment an uninitialized refcount_t object [1] and then, when handling
      is done, to decrement it [2]. This was detected by a test recently added in
      commit 6d4efada3b82 ("selftests: forwarding: Add multicast routing test").
      
      Fix that by placing the notifier call in the traversal of the resolved
      entries instead of the traversal of the unresolved entries.
      
      [1]
      
      [  245.748967] refcount_t: increment on 0; use-after-free.
      [  245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
      ...
      [  245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      [  245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
      ...
      [  245.907487] Call Trace:
      [  245.910231]  mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
      [  245.917913]  notifier_call_chain+0x45/0x7
      [  245.922484]  atomic_notifier_call_chain+0x15/0x20
      [  245.927729]  call_fib_notifiers+0x15/0x30
      [  245.932205]  mroute_clean_tables+0x372/0x3f
      [  245.936971]  ip6mr_sk_done+0xb1/0xc0
      [  245.940960]  ip6_mroute_setsockopt+0x1da/0x5f0
      ...
      
      [2]
      
      [  246.128487] refcount_t: underflow; use-after-free.
      [  246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
      [  246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      ...
      [  246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
      [  246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
      ...
      [  246.298889] Call Trace:
      [  246.301617]  refcount_dec_and_test_checked+0x11/0x20
      [  246.307170]  mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
      [  246.315531]  process_one_work+0x1fa/0x3f0
      [  246.320005]  worker_thread+0x2f/0x3e0
      [  246.324083]  kthread+0x118/0x130
      [  246.327683]  ? wq_update_unbound_numa+0x1b0/0x1b0
      [  246.332926]  ? kthread_park+0x80/0x80
      [  246.337013]  ret_from_fork+0x1f/0x30
      
      Fixes: 088aa3ee ("ip6mr: Support fib notifications")
      Signed-off-by: Nir Dotan <nird@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      505e5f3d
    • L
      net: ip6_gre: always reports o_key to userspace · 9f7d849b
      Lorenzo Bianconi committed
      [ Upstream commit c706863bc8902d0c2d1a5a27ac8e1ead5d06b79d ]
      
      Like Erspan_v4, the Erspan_v6 protocol relies on o_key to configure the
      session id header field. However, the TUNNEL_KEY bit is cleared in
      ip6erspan_tunnel_xmit since the ERSPAN protocol does not set the key field
      of the external GRE header, and so the configured o_key is not reported
      to userspace. The issue can be triggered with the following reproducer:
      
      $ ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \
          key 1 seq erspan_ver 1
      $ ip link set ip6erspan1 up
      $ ip -d link sh ip6erspan1
      
      ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT
          link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
          ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq
      
      Fix the issue by adding the TUNNEL_KEY bit to the o_flags parameter in
      ip6gre_fill_info.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f7d849b
    • L
      net: ip_gre: use erspan key field for tunnel lookup · 0a198e0b
      Lorenzo Bianconi committed
      [ Upstream commit cb73ee40b1b381eaf3749e6dbeed567bb38e5258 ]
      
      Use the ERSPAN key header field as the tunnel key in the gre_parse_header
      routine, since the ERSPAN protocol sets the key field of the external GRE
      header to 0, resulting in a tunnel lookup failure in ip6gre_err.
      In addition, remove the key field parsing and the pskb_may_pull check in
      erspan_rcv and ip6erspan_rcv.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a198e0b
    • Y
      ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation · 2f704348
      Yohei Kanemaru committed
      [ Upstream commit ef489749aae508e6f17886775c075f12ff919fb1 ]
      
      skb->cb may contain data from previous layers (in an observed case
      IPv4 with L3 Master Device). In the observed scenario, the data in
      IPCB(skb)->frags was misinterpreted as IP6CB(skb)->frag_max_size,
      which eventually caused an unexpected IPv6 fragmentation in
      ip6_fragment() through ip6_finish_output().
      
      This patch clears IP6CB(skb), which potentially contains garbage data,
      on the SRH ip4ip6 encapsulation.
      
      Fixes: 32d99d0b ("ipv6: sr: add support for ip4ip6 encapsulation")
      Signed-off-by: Yohei Kanemaru <yohei.kanemaru@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f704348
    • D
      ipv6: Consider sk_bound_dev_if when binding a socket to an address · 7e9a6476
      David Ahern committed
      [ Upstream commit c5ee066333ebc322a24a00a743ed941a0c68617e ]
      
      IPv6 does not consider if the socket is bound to a device when binding
      to an address. The result is that a socket can be bound to eth0 and then
      bound to the address of eth1. If the device is a VRF, the result is that
      a socket can only be bound to an address in the default VRF.
      
      Resolve by considering the device if sk_bound_dev_if is set.
      
      This problem has existed since the beginning of git history.

      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e9a6476