1. 27 4月, 2019 7 次提交
  2. 20 4月, 2019 9 次提交
    • D
      rxrpc: Fix client call connect/disconnect race · 11582064
      David Howells 提交于
      [ Upstream commit 930c9f9125c85b5134b3e711bc252ecc094708e3 ]
      
      rxrpc_disconnect_client_call() reads the call's connection ID protocol
      value (call->cid) as part of that function's variable declarations.  This
      is bad because it's not inside the locked section and so may race with
      someone granting use of the channel to the call.
      
      This manifests as an assertion failure (see below) where the call in the
      presumed channel (0 because call->cid wasn't set when we read it) doesn't
      match the call attached to the channel we were actually granted (if 1, 2 or
      3).
      
      Fix this by moving the read and dependent calculations inside of the
      channel_lock section.  Also, only set the channel number and pointer
      variables if cid is not zero (ie. unset).
      
      This problem can be induced by injecting an occasional error in
      rxrpc_wait_for_channel() before the call to schedule().
      
      Make two further changes also:
      
       (1) Add a trace for wait failure in rxrpc_connect_call().
      
       (2) Drop channel_lock before BUG'ing in the case of the assertion failure.
      
      The failure causes a trace akin to the following:
      
      rxrpc: Assertion failed - 18446612685268945920(0xffff8880beab8c00) == 18446612685268621312(0xffff8880bea69800) is false
      ------------[ cut here ]------------
      kernel BUG at net/rxrpc/conn_client.c:824!
      ...
      RIP: 0010:rxrpc_disconnect_client_call+0x2bf/0x99d
      ...
      Call Trace:
       rxrpc_connect_call+0x902/0x9b3
       ? wake_up_q+0x54/0x54
       rxrpc_new_client_call+0x3a0/0x751
       ? rxrpc_kernel_begin_call+0x141/0x1bc
       ? afs_alloc_call+0x1b5/0x1b5
       rxrpc_kernel_begin_call+0x141/0x1bc
       afs_make_call+0x20c/0x525
       ? afs_alloc_call+0x1b5/0x1b5
       ? __lock_is_held+0x40/0x71
       ? lockdep_init_map+0xaf/0x193
       ? lockdep_init_map+0xaf/0x193
       ? __lock_is_held+0x40/0x71
       ? yfs_fs_fetch_data+0x33b/0x34a
       yfs_fs_fetch_data+0x33b/0x34a
       afs_fetch_data+0xdc/0x3b7
       afs_read_dir+0x52d/0x97f
       afs_dir_iterate+0xa0/0x661
       ? iterate_dir+0x63/0x141
       iterate_dir+0xa2/0x141
       ksys_getdents64+0x9f/0x11b
       ? filldir+0x111/0x111
       ? do_syscall_64+0x3e/0x1a0
       __x64_sys_getdents64+0x16/0x19
       do_syscall_64+0x7d/0x1a0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 45025bce ("rxrpc: Improve management and caching of client connection objects")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      11582064
    • Y
      appletalk: Fix use-after-free in atalk_proc_exit · 6c42507f
      YueHaibing 提交于
      [ Upstream commit 6377f787aeb945cae7abbb6474798de129e1f3ac ]
      
      KASAN report this:
      
      BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
      Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806
      
      CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xfa/0x1ce lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
       remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667
       atalk_proc_exit+0x18/0x820 [appletalk]
       atalk_exit+0xf/0x5a [appletalk]
       __do_sys_delete_module kernel/module.c:1018 [inline]
       __se_sys_delete_module kernel/module.c:961 [inline]
       __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0
      RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc
      R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff
      
      Allocated by task 2806:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
       slab_post_alloc_hook mm/slab.h:444 [inline]
       slab_alloc_node mm/slub.c:2739 [inline]
       slab_alloc mm/slub.c:2747 [inline]
       kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752
       kmem_cache_zalloc include/linux/slab.h:730 [inline]
       __proc_create+0x30f/0xa20 fs/proc/generic.c:408
       proc_mkdir_data+0x47/0x190 fs/proc/generic.c:469
       0xffffffffc10c01bb
       0xffffffffc10c0166
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 2806:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
       slab_free_hook mm/slub.c:1409 [inline]
       slab_free_freelist_hook mm/slub.c:1436 [inline]
       slab_free mm/slub.c:2986 [inline]
       kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002
       pde_put+0x6e/0x80 fs/proc/generic.c:647
       remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684
       0xffffffffc10c031c
       0xffffffffc10c0166
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881f41fe500
       which belongs to the cache proc_dir_entry of size 256
      The buggy address is located 176 bytes inside of
       256-byte region [ffff8881f41fe500, ffff8881f41fe600)
      The buggy address belongs to the page:
      page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00
      raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
       ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
       ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      It should check the return value of atalk_proc_init fails,
      otherwise atalk_exit will trgger use-after-free in pde_subdir_find
      while unload the module.This patch fix error cleanup path of atalk_init
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6c42507f
    • L
      net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version · 8c5e9ea1
      Lorenzo Bianconi 提交于
      [ Upstream commit efcc9bcaf77c07df01371a7c34e50424c291f3ac ]
      
      Fix a possible NULL pointer dereference in ip6erspan_set_version checking
      nlattr data pointer
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 7549 Comm: syz-executor432 Not tainted 5.0.0-rc6-next-20190218
      #37
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6erspan_newlink+0x66/0x7b0 net/ipv6/ip6_gre.c:2210
        __rtnl_newlink+0x107b/0x16c0 net/core/rtnetlink.c:3176
        rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3234
        rtnetlink_rcv_msg+0x465/0xb00 net/core/rtnetlink.c:5192
        netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2485
        rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5210
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1925
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg+0xdd/0x130 net/socket.c:631
        ___sys_sendmsg+0x806/0x930 net/socket.c:2136
        __sys_sendmsg+0x105/0x1d0 net/socket.c:2174
        __do_sys_sendmsg net/socket.c:2183 [inline]
        __se_sys_sendmsg net/socket.c:2181 [inline]
        __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2181
        do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440159
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffa69156e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440159
      RDX: 0000000000000000 RSI: 0000000020001340 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000001 R09: 00000000004002c8
      R10: 0000000000000011 R11: 0000000000000246 R12: 00000000004019e0
      R13: 0000000000401a70 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      ---[ end trace 09f8a7d13b4faaa1 ]---
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 4974d5f678ab ("net: ip6_gre: initialize erspan_ver just for erspan tunnels")
      Reported-and-tested-by: syzbot+30191cf1057abd3064af@syzkaller.appspotmail.com
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Reviewed-by: NGreg Rose <gvrose8192@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8c5e9ea1
    • C
      xfrm: destroy xfrm_state synchronously on net exit path · bbbe4746
      Cong Wang 提交于
      [ Upstream commit f75a2804da391571563c4b6b29e7797787332673 ]
      
      xfrm_state_put() moves struct xfrm_state to the GC list
      and schedules the GC work to clean it up. On net exit call
      path, xfrm_state_flush() is called to clean up and
      xfrm_flush_gc() is called to wait for the GC work to complete
      before exit.
      
      However, this doesn't work because one of the ->destructor(),
      ipcomp_destroy(), schedules the same GC work again inside
      the GC work. It is hard to wait for such a nested async
      callback. This is also why syzbot still reports the following
      warning:
      
       WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
       ...
        ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
        cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
        process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
        worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
        kthread+0x357/0x430 kernel/kthread.c:246
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      In fact, it is perfectly fine to bypass GC and destroy xfrm_state
      synchronously on net exit call path, because it is in process context
      and doesn't need a work struct to do any blocking work.
      
      This patch introduces xfrm_state_put_sync() which simply bypasses
      GC, and lets its callers to decide whether to use this synchronous
      version. On net exit path, xfrm_state_fini() and
      xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
      blocking, it can use xfrm_state_put_sync() directly too.
      
      Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
      reflect this change.
      
      Fixes: b48c05ab ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
      Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      bbbe4746
    • S
      net/rds: fix warn in rds_message_alloc_sgs · 5be4bb31
      shamir rabinovitch 提交于
      [ Upstream commit ea010070d0a7497253d5a6f919f6dd107450b31a ]
      
      redundant copy_from_user in rds_sendmsg system call expose rds
      to issue where rds_rdma_extra_size walk the rds iovec and and
      calculate the number pf pages (sgs) it need to add to the tail of
      rds message and later rds_cmsg_rdma_args copy the rds iovec again
      and re calculate the same number and get different result causing
      WARN_ON in rds_message_alloc_sgs.
      
      fix this by doing the copy_from_user only once per rds_sendmsg
      system call.
      
      When issue occur the below dump is seen:
      
      WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316 rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 19789 Comm: syz-executor827 Not tainted 4.19.0-next-20181030+ #101
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x244/0x39d lib/dump_stack.c:113
       panic+0x2ad/0x55c kernel/panic.c:188
       __warn.cold.8+0x20/0x45 kernel/panic.c:540
       report_bug+0x254/0x2d0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
       do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
      RIP: 0010:rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
      Code: c0 74 04 3c 03 7e 6c 44 01 ab 78 01 00 00 e8 2b 9e 35 fa 4c 89 e0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 14 9e 35 fa <0f> 0b 31 ff 44 89 ee e8 18 9f 35 fa 45 85 ed 75 1b e8 fe 9d 35 fa
      RSP: 0018:ffff8801c51b7460 EFLAGS: 00010293
      RAX: ffff8801bc412080 RBX: ffff8801d7bf4040 RCX: ffffffff8749c9e6
      RDX: 0000000000000000 RSI: ffffffff8749ca5c RDI: 0000000000000004
      RBP: ffff8801c51b7490 R08: ffff8801bc412080 R09: ffffed003b5c5b67
      R10: ffffed003b5c5b67 R11: ffff8801dae2db3b R12: 0000000000000000
      R13: 000000000007165c R14: 000000000007165c R15: 0000000000000005
       rds_cmsg_rdma_args+0x82d/0x1510 net/rds/rdma.c:623
       rds_cmsg_send net/rds/send.c:971 [inline]
       rds_sendmsg+0x19a2/0x3180 net/rds/send.c:1273
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
       __sys_sendmsg+0x11d/0x280 net/socket.c:2155
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x44a859
      Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 6b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f1d4710ada8 EFLAGS: 00000297 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 000000000044a859
      RDX: 0000000000000000 RSI: 0000000020001600 RDI: 0000000000000003
      RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000297 R12: 00000000006dcc2c
      R13: 646e732f7665642f R14: 00007f1d4710b9c0 R15: 00000000006dcd2c
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: Nshamir rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      5be4bb31
    • T
      netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine · 263ed7e6
      Taehee Yoo 提交于
      [ Upstream commit b7f1a16d29b2e28d3dcbb070511bd703e306281b ]
      
      When device is unregistered, flowtable flush routine is called
      by notifier_call(nf_tables_flowtable_event). and exit callback of
      nftables pernet_operation(nf_tables_exit_net) also has flowtable flush
      routine. but when network namespace is destroyed, both notifier_call
      and pernet_operation are called. hence flowtable flush routine in
      pernet_operation is unnecessary.
      
      test commands:
         %ip netns add vm1
         %ip netns exec vm1 nft add table ip filter
         %ip netns exec vm1 nft add flowtable ip filter w \
      	{ hook ingress priority 0\; devices = { lo }\; }
         %ip netns del vm1
      
      splat looks like:
      [  265.187019] WARNING: CPU: 0 PID: 87 at net/netfilter/core.c:309 nf_hook_entry_head+0xc7/0xf0
      [  265.187112] Modules linked in: nf_flow_table_ipv4 nf_flow_table nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip_tables x_tables
      [  265.187390] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc3+ #5
      [  265.187453] Workqueue: netns cleanup_net
      [  265.187514] RIP: 0010:nf_hook_entry_head+0xc7/0xf0
      [  265.187546] Code: 8d 81 68 03 00 00 5b c3 89 d0 83 fa 04 48 8d 84 c7 e8 11 00 00 76 81 0f 0b 31 c0 e9 78 ff ff ff 0f 0b 48 83 c4 08 31 c0 5b c3 <0f> 0b 31 c0 e9 65 ff ff ff 0f 0b 31 c0 e9 5c ff ff ff 48 89 0c 24
      [  265.187573] RSP: 0018:ffff88011546f098 EFLAGS: 00010246
      [  265.187624] RAX: ffffffff8d90e135 RBX: 1ffff10022a8de1c RCX: 0000000000000000
      [  265.187645] RDX: 0000000000000000 RSI: 0000000000000005 RDI: ffff880116298040
      [  265.187645] RBP: ffff88010ea4c1a8 R08: 0000000000000000 R09: 0000000000000000
      [  265.187645] R10: ffff88011546f1d8 R11: ffffed0022c532c1 R12: ffff88010ea4c1d0
      [  265.187645] R13: 0000000000000005 R14: dffffc0000000000 R15: ffff88010ea4c1c4
      [  265.187645] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
      [  265.187645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  265.187645] CR2: 00007fdfb8d00000 CR3: 0000000057a16000 CR4: 00000000001006f0
      [  265.187645] Call Trace:
      [  265.187645]  __nf_unregister_net_hook+0xca/0x5d0
      [  265.187645]  ? nf_hook_entries_free.part.3+0x80/0x80
      [  265.187645]  ? save_trace+0x300/0x300
      [  265.187645]  nf_unregister_net_hooks+0x2e/0x40
      [  265.187645]  nf_tables_exit_net+0x479/0x1340 [nf_tables]
      [  265.187645]  ? find_held_lock+0x39/0x1c0
      [  265.187645]  ? nf_tables_abort+0x30/0x30 [nf_tables]
      [  265.187645]  ? inet_frag_destroy_rcu+0xd0/0xd0
      [  265.187645]  ? trace_hardirqs_on+0x93/0x210
      [  265.187645]  ? __bpf_trace_preemptirq_template+0x10/0x10
      [  265.187645]  ? inet_frag_destroy_rcu+0xd0/0xd0
      [  265.187645]  ? inet_frag_destroy_rcu+0xd0/0xd0
      [  265.187645]  ? __mutex_unlock_slowpath+0x17f/0x740
      [  265.187645]  ? wait_for_completion+0x710/0x710
      [  265.187645]  ? bucket_table_free+0xb2/0x1f0
      [  265.187645]  ? nested_table_free+0x130/0x130
      [  265.187645]  ? __lock_is_held+0xb4/0x140
      [  265.187645]  ops_exit_list.isra.10+0x94/0x140
      [  265.187645]  cleanup_net+0x45b/0x900
      [ ... ]
      
      This WARNING means that hook unregisteration is failed because
      all flowtables hooks are already unregistered by notifier_call.
      
      Network namespace exit routine guarantees that all devices will be
      unregistered first. then, other exit callbacks of pernet_operations
      are called. so that removing flowtable flush routine in exit callback of
      pernet_operation(nf_tables_exit_net) doesn't make flowtable leak.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      263ed7e6
    • M
      Bluetooth: Fix debugfs NULL pointer dereference · 09b6c080
      Matias Karhumaa 提交于
      [ Upstream commit 30d65e0804d58a03d1a8ea4e12c6fc07ed08218b ]
      
      Fix crash caused by NULL pointer dereference when debugfs functions
      le_max_key_read, le_max_key_size_write, le_min_key_size_read or
      le_min_key_size_write and Bluetooth adapter was powered off.
      
      Fix is to move max_key_size and min_key_size from smp_dev to hci_dev.
      At the same time they were renamed to le_max_key_size and
      le_min_key_size.
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000002e8
      PGD 0 P4D 0
      Oops: 0000 [#24] SMP PTI
      CPU: 2 PID: 6255 Comm: cat Tainted: G      D    OE     4.18.9-200.fc28.x86_64 #1
      Hardware name: LENOVO 4286CTO/4286CTO, BIOS 8DET76WW (1.46 ) 06/21/2018
      RIP: 0010:le_max_key_size_read+0x45/0xb0 [bluetooth]
      Code: 00 00 00 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 48 8b 87 c8 00 00 00 48 8d 7c 24 04 48 8b 80 48 0a 00 00 <48> 8b 80 e8 02 00 00 0f b6 48 52 e8 fb b6 b3 ed be 04 00 00 00 48
      RSP: 0018:ffffab23c3ff3df0 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00007f0b4ca2e000 RCX: ffffab23c3ff3f08
      RDX: ffffffffc0ddb033 RSI: 0000000000000004 RDI: ffffab23c3ff3df4
      RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffab23c3ff3ed8 R11: 0000000000000000 R12: ffffab23c3ff3f08
      R13: 00007f0b4ca2e000 R14: 0000000000020000 R15: ffffab23c3ff3f08
      FS:  00007f0b4ca0f540(0000) GS:ffff91bd5e280000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000002e8 CR3: 00000000629fa006 CR4: 00000000000606e0
      Call Trace:
       full_proxy_read+0x53/0x80
       __vfs_read+0x36/0x180
       vfs_read+0x8a/0x140
       ksys_read+0x4f/0xb0
       do_syscall_64+0x5b/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: NMatias Karhumaa <matias.karhumaa@gmail.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      09b6c080
    • P
      netfilter: xt_cgroup: shrink size of v2 path · 1f2b1c6a
      Pablo Neira Ayuso 提交于
      [ Upstream commit 0d704967f4a49cc2212350b3e4a8231f8b4283ed ]
      
      cgroup v2 path field is PATH_MAX which is too large, this is placing too
      much pressure on memory allocation for people with many rules doing
      cgroup v1 classid matching, side effects of this are bug reports like:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200639
      
      This patch registers a new revision that shrinks the cgroup path to 512
      bytes, which is the same approach we follow in similar extensions that
      have a path field.
      
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1f2b1c6a
    • G
      9p: do not trust pdu content for stat item size · db77c789
      Gertjan Halkes 提交于
      [ Upstream commit 2803cf4379ed252894f046cb8812a48db35294e3 ]
      
      v9fs_dir_readdir() could deadloop if a struct was sent with a size set
      to -2
      
      Link: http://lkml.kernel.org/r/1536134432-11997-1-git-send-email-asmadeus@codewreck.org
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88021Signed-off-by: NGertjan Halkes <gertjan@google.com>
      Signed-off-by: NDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      db77c789
  3. 17 4月, 2019 20 次提交
    • F
      netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, too · 40177a79
      Florian Westphal 提交于
      commit 89259088c1b7fecb43e8e245dc931909132a4e03 upstream
      
      syzbot was able to trigger the WARN in cttimeout_default_get() by
      passing UDPLITE as l4protocol.  Alias UDPLITE to UDP, both use
      same timeout values.
      
      Furthermore, also fetch GRE timeouts.  GRE is a bit more complicated,
      as it still can be a module and its netns_proto_gre struct layout isn't
      visible outside of the gre module. Can't move timeouts around, it
      appears conntrack sysctl unregister assumes net_generic() returns
      nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead.
      
      A followup nf-next patch could make gre tracker be built-in as well
      if needed, its not that large.
      
      Last, make the WARN() mention the missing protocol value in case
      anything else is missing.
      
      Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
      Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NZubin Mithra <zsm@chromium.org>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      40177a79
    • P
      netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr · c2d27b33
      Pablo Neira Ayuso 提交于
      commit 8866df9264a34e675b4ee8a151db819b87cce2d3 upstream
      
      Otherwise, we hit a NULL pointer deference since handlers always assume
      default timeout policy is passed.
      
        netlink: 24 bytes leftover after parsing attributes in process `syz-executor2'.
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9575 Comm: syz-executor1 Not tainted 4.19.0+ #312
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:icmp_timeout_obj_to_nlattr+0x77/0x170 net/netfilter/nf_conntrack_proto_icmp.c:297
      
      Fixes: c779e849 ("netfilter: conntrack: remove get_timeout() indirection")
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NZubin Mithra <zsm@chromium.org>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      c2d27b33
    • A
      net: core: netif_receive_skb_list: unlist skb before passing to pt->func · 55a7f7b2
      Alexander Lobakin 提交于
      [ Upstream commit 9a5a90d167b0e5fe3d47af16b68fd09ce64085cd ]
      
      __netif_receive_skb_list_ptype() leaves skb->next poisoned before passing
      it to pt_prev->func handler, what may produce (in certain cases, e.g. DSA
      setup) crashes like:
      
      [ 88.606777] CPU 0 Unable to handle kernel paging request at virtual address 0000000e, epc == 80687078, ra == 8052cc7c
      [ 88.618666] Oops[#1]:
      [ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-dlink-00206-g4192a172-dirty #1473
      [ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850
      [ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000
      [ 88.642526] $ 8 : 00000000 49600000 00000001 00000001
      [ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41
      [ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd
      [ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800
      [ 88.665793] $24 : 00000000 8011e658
      [ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c
      [ 88.677427] Hi : 00000003
      [ 88.680622] Lo : 7b5b4220
      [ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0
      [ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188
      [ 88.696734] Status: 10000404	IEp
      [ 88.700422] Cause : 50000008 (ExcCode 02)
      [ 88.704874] BadVA : 0000000e
      [ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi))
      [ 88.713005] Modules linked in:
      [ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
      [ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0 85cff080 80790000 8052cc7c
      [ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850 87cabf00 807d0000
      [ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a 00000000 8052d96c
      [ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558 87cabf00 85f58c50
      [ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001 85f58c00 864d7800
      [ 88.771770] ...
      [ 88.774483] Call Trace:
      [ 88.777199] [<80687078>] vlan_dev_hard_start_xmit+0x1c/0x1a0
      [ 88.783504] [<8052cc7c>] dev_hard_start_xmit+0xac/0x188
      [ 88.789326] [<8052d96c>] __dev_queue_xmit+0x6e8/0x7d4
      [ 88.794955] [<805a8640>] ip_finish_output2+0x238/0x4d0
      [ 88.800677] [<805ab6a0>] ip_output+0xc8/0x140
      [ 88.805526] [<805a68f4>] ip_forward+0x364/0x560
      [ 88.810567] [<805a4ff8>] ip_rcv+0x48/0xe4
      [ 88.815030] [<80528d44>] __netif_receive_skb_one_core+0x44/0x58
      [ 88.821635] [<8067f220>] dsa_switch_rcv+0x108/0x1ac
      [ 88.827067] [<80528f80>] __netif_receive_skb_list_core+0x228/0x26c
      [ 88.833951] [<8052ed84>] netif_receive_skb_list+0x1d4/0x394
      [ 88.840160] [<80355a88>] lunar_rx_poll+0x38c/0x828
      [ 88.845496] [<8052fa78>] net_rx_action+0x14c/0x3cc
      [ 88.850835] [<806ad300>] __do_softirq+0x178/0x338
      [ 88.856077] [<8012a2d4>] irq_exit+0xbc/0x100
      [ 88.860846] [<802f8b70>] plat_irq_dispatch+0xc0/0x144
      [ 88.866477] [<80105974>] handle_int+0x14c/0x158
      [ 88.871516] [<806acfb0>] r4k_wait+0x30/0x40
      [ 88.876462] Code: afb10014 8c8200a0 00803025 <9443000c> 94a20468 00000000 10620042 00a08025 9605046a
      [ 88.887332]
      [ 88.888982] ---[ end trace eb863d007da11cf1 ]---
      [ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt
      [ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fix this by pulling skb off the sublist and zeroing skb->next pointer
      before calling ptype callback.
      
      Fixes: 88eb1944 ("net: core: propagate SKB lists through packet_type lookup")
      Reviewed-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NAlexander Lobakin <alobakin@dlink.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      55a7f7b2
    • L
      net: ip6_gre: fix possible use-after-free in ip6erspan_rcv · a2ef7723
      Lorenzo Bianconi 提交于
      [ Upstream commit 2a3cabae4536edbcb21d344e7aa8be7a584d2afb ]
      
      erspan_v6 tunnels run __iptunnel_pull_header on received skbs to remove
      erspan header. This can determine a possible use-after-free accessing
      pkt_md pointer in ip6erspan_rcv since the packet will be 'uncloned'
      running pskb_expand_head if it is a cloned gso skb (e.g if the packet has
      been sent though a veth device). Fix it resetting pkt_md pointer after
      __iptunnel_pull_header
      
      Fixes: 1d7e2ed2 ("net: erspan: refactor existing erspan code")
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a2ef7723
    • L
      net: ip_gre: fix possible use-after-free in erspan_rcv · 5c6f2f4c
      Lorenzo Bianconi 提交于
      [ Upstream commit 492b67e28ee5f2a2522fb72e3d3bcb990e461514 ]
      
      erspan tunnels run __iptunnel_pull_header on received skbs to remove
      gre and erspan headers. This can determine a possible use-after-free
      accessing pkt_md pointer in erspan_rcv since the packet will be 'uncloned'
      running pskb_expand_head if it is a cloned gso skb (e.g if the packet has
      been sent though a veth device). Fix it resetting pkt_md pointer after
      __iptunnel_pull_header
      
      Fixes: 1d7e2ed2 ("net: erspan: refactor existing erspan code")
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      5c6f2f4c
    • S
      vrf: check accept_source_route on the original netdevice · 0516ef27
      Stephen Suryaputra 提交于
      [ Upstream commit 8c83f2df9c6578ea4c5b940d8238ad8a41b87e9e ]
      
      Configuration check to accept source route IP options should be made on
      the incoming netdevice when the skb->dev is an l3mdev master. The route
      lookup for the source route next hop also needs the incoming netdev.
      
      v2->v3:
      - Simplify by passing the original netdevice down the stack (per David
        Ahern).
      Signed-off-by: NStephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0516ef27
    • D
      tcp: fix a potential NULL pointer dereference in tcp_sk_exit · 7243e352
      Dust Li 提交于
      [ Upstream commit b506bc975f60f06e13e74adb35e708a23dc4e87c ]
      
       When tcp_sk_init() failed in inet_ctl_sock_create(),
       'net->ipv4.tcp_congestion_control' will be left
       uninitialized, but tcp_sk_exit() hasn't check for
       that.
      
       This patch add checking on 'net->ipv4.tcp_congestion_control'
       in tcp_sk_exit() to prevent NULL-ptr dereference.
      
      Fixes: 6670e152 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control")
      Signed-off-by: NDust Li <dust.li@linux.alibaba.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7243e352
    • K
      tcp: Ensure DCTCP reacts to losses · 0e0afb06
      Koen De Schepper 提交于
      [ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]
      
      RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
      loss episodes in the same way as conventional TCP".
      
      Currently, Linux DCTCP performs no cwnd reduction when losses
      are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets
      alpha to its maximal value if a RTO happens. This behavior
      is sub-optimal for at least two reasons: i) it ignores losses
      triggering fast retransmissions; and ii) it causes unnecessary large
      cwnd reduction in the future if the loss was isolated as it resets
      the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
      denoting a total congestion). The second reason has an especially
      noticeable effect when using DCTCP in high BDP environments, where
      alpha normally stays at low values.
      
      This patch replace the clamping of alpha by setting ssthresh to
      half of cwnd for both fast retransmissions and RTOs, at most once
      per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
      has been removed.
      
      The table below shows experimental results where we measured the
      drop probability of a PIE AQM (not applying ECN marks) at a
      bottleneck in the presence of a single TCP flow with either the
      alpha-clamping option enabled or the cwnd halving proposed by this
      patch. Results using reno or cubic are given for comparison.
      
                                |  Link   |   RTT    |    Drop
                       TCP CC   |  speed  | base+AQM | probability
              ==================|=========|==========|============
                          CUBIC |  40Mbps |  7+20ms  |    0.21%
                           RENO |         |          |    0.19%
              DCTCP-CLAMP-ALPHA |         |          |   25.80%
               DCTCP-HALVE-CWND |         |          |    0.22%
              ------------------|---------|----------|------------
                          CUBIC | 100Mbps |  7+20ms  |    0.03%
                           RENO |         |          |    0.02%
              DCTCP-CLAMP-ALPHA |         |          |   23.30%
               DCTCP-HALVE-CWND |         |          |    0.04%
              ------------------|---------|----------|------------
                          CUBIC | 800Mbps |   1+1ms  |    0.04%
                           RENO |         |          |    0.05%
              DCTCP-CLAMP-ALPHA |         |          |   18.70%
               DCTCP-HALVE-CWND |         |          |    0.06%
      
      We see that, without halving its cwnd for all source of losses,
      DCTCP drives the AQM to large drop probabilities in order to keep
      the queue length under control (i.e., it repeatedly faces RTOs).
      Instead, if DCTCP reacts to all source of losses, it can then be
      controlled by the AQM using similar drop levels than cubic or reno.
      Signed-off-by: NKoen De Schepper <koen.de_schepper@nokia-bell-labs.com>
      Signed-off-by: NOlivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
      Cc: Bob Briscoe <research@bobbriscoe.net>
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <borkmann@iogearbox.net>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0e0afb06
    • X
      sctp: initialize _pad of sockaddr_in before copying to user memory · 87349583
      Xin Long 提交于
      [ Upstream commit 09279e615c81ce55e04835970601ae286e3facbe ]
      
      Syzbot report a kernel-infoleak:
      
        BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
        Call Trace:
          _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
          copy_to_user include/linux/uaccess.h:174 [inline]
          sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline]
          sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562
          ...
        Uninit was stored to memory at:
          sctp_transport_init net/sctp/transport.c:61 [inline]
          sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115
          sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637
          sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline]
          sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361
          ...
        Bytes 8-15 of 16 are uninitialized
      
      It was caused by that th _pad field (the 8-15 bytes) of a v4 addr (saved in
      struct sockaddr_in) wasn't initialized, but directly copied to user memory
      in sctp_getsockopt_peer_addrs().
      
      So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of
      sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as
      sctp_v6_addr_to_user() does.
      
      Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Tested-by: NAlexander Potapenko <glider@google.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      87349583
    • A
      openvswitch: fix flow actions reallocation · ec0e32da
      Andrea Righi 提交于
      [ Upstream commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb ]
      
      The flow action buffer can be resized if it's not big enough to contain
      all the requested flow actions. However, this resize doesn't take into
      account the new requested size, the buffer is only increased by a factor
      of 2x. This might be not enough to contain the new data, causing a
      buffer overflow, for example:
      
      [   42.044472] =============================================================================
      [   42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
      [   42.046415] -----------------------------------------------------------------------------
      
      [   42.047715] Disabling lock debugging due to kernel taint
      [   42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
      [   42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101
      [   42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb
      
      [   42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc                          ........
      [   42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00  kkkkkkkk....l...
      [   42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6  l...........x...
      [   42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00   ...............
      [   42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [   42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [   42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [   42.059061] Redzone 8bf2c4a5: 00 00 00 00                                      ....
      [   42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
      
      Fix by making sure the new buffer is properly resized to contain all the
      requested data.
      
      BugLink: https://bugs.launchpad.net/bugs/1813244Signed-off-by: NAndrea Righi <andrea.righi@canonical.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ec0e32da
    • N
      net/sched: fix ->get helper of the matchall cls · eeedfa94
      Nicolas Dichtel 提交于
      [ Upstream commit 0db6f8befc32c68bb13d7ffbb2e563c79e913e13 ]
      
      It returned always NULL, thus it was never possible to get the filter.
      
      Example:
      $ ip link add foo type dummy
      $ ip link add bar type dummy
      $ tc qdisc add dev foo clsact
      $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \
      	matchall action mirred ingress mirror dev bar
      
      Before the patch:
      $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
      Error: Specified filter handle not found.
      We have an error talking to the kernel
      
      After:
      $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
      filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2
        not_in_hw
              action order 1: mirred (Ingress Mirror to device bar) pipe
              index 1 ref 1 bind 1
      
      CC: Yotam Gigi <yotamg@mellanox.com>
      CC: Jiri Pirko <jiri@mellanox.com>
      Fixes: fd62d9f5 ("net/sched: matchall: Fix configuration race")
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      eeedfa94
    • D
      net/sched: act_sample: fix divide by zero in the traffic path · 15c0770e
      Davide Caratti 提交于
      [ Upstream commit fae2708174ae95d98d19f194e03d6e8f688ae195 ]
      
      the control path of 'sample' action does not validate the value of 'rate'
      provided by the user, but then it uses it as divisor in the traffic path.
      Validate it in tcf_sample_init(), and return -EINVAL with a proper extack
      message in case that value is zero, to fix a splat with the script below:
      
       # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
       # tc -s a s action sample
       total acts 1
      
               action order 0: sample rate 1/0 group 1 pipe
                index 2 ref 1 bind 1 installed 19 sec used 19 sec
               Action statistics:
               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
               backlog 0b 0p requeues 0
       # ping 192.0.2.1 -I test0 -c1 -q
      
       divide error: 0000 [#1] SMP PTI
       CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
       Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
       RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
       RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
       RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
       RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
       R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
       R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
       FS:  00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
       Call Trace:
        tcf_action_exec+0x7c/0x1c0
        tcf_classify+0x57/0x160
        __dev_queue_xmit+0x3dc/0xd10
        ip_finish_output2+0x257/0x6d0
        ip_output+0x75/0x280
        ip_send_skb+0x15/0x40
        raw_sendmsg+0xae3/0x1410
        sock_sendmsg+0x36/0x40
        __sys_sendto+0x10e/0x140
        __x64_sys_sendto+0x24/0x30
        do_syscall_64+0x60/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [...]
        Kernel panic - not syncing: Fatal exception in interrupt
      
      Add a TDC selftest to document that 'rate' is now being validated.
      Reported-by: NMatteo Croce <mcroce@redhat.com>
      Fixes: 5c5670fa ("net/sched: Introduce sample tc action")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NYotam Gigi <yotam.gi@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      15c0770e
    • M
      net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock(). · 78b4bf26
      Mao Wenan 提交于
      [ Upstream commit cb66ddd156203daefb8d71158036b27b0e2caf63 ]
      
      When it is to cleanup net namespace, rds_tcp_exit_net() will call
      rds_tcp_kill_sock(), if t_sock is NULL, it will not call
      rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
      connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in
      net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect()
      and reference 'net' which has already been freed.
      
      In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before
      sock->ops->connect, but if connect() is failed, it will call
      rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always
      failed, rds_connect_worker() will try to reconnect all the time, so
      rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the
      connections.
      
      Therefore, the condition !tc->t_sock is not needed if it is going to do
      cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always
      NULL, and there is on other path to cancel cp_conn_w and free
      connection. So this patch is to fix this.
      
      rds_tcp_kill_sock():
      ...
      if (net != c_net || !tc->t_sock)
      ...
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      
      ==================================================================
      BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
      net/ipv4/af_inet.c:340
      Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
      
      CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11
      Hardware name: linux,dummy-virt (DT)
      Workqueue: krdsd rds_connect_worker
      Call trace:
       dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53
       show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x120/0x188 lib/dump_stack.c:113
       print_address_description+0x68/0x278 mm/kasan/report.c:253
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x21c/0x348 mm/kasan/report.c:409
       __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429
       inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340
       __sock_create+0x4f8/0x770 net/socket.c:1276
       sock_create_kern+0x50/0x68 net/socket.c:1322
       rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114
       rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175
       process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
       worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
       kthread+0x2f0/0x378 kernel/kthread.c:255
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
      
      Allocated by task 687:
       save_stack mm/kasan/kasan.c:448 [inline]
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490
       slab_post_alloc_hook mm/slab.h:444 [inline]
       slab_alloc_node mm/slub.c:2705 [inline]
       slab_alloc mm/slub.c:2713 [inline]
       kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718
       kmem_cache_zalloc include/linux/slab.h:697 [inline]
       net_alloc net/core/net_namespace.c:384 [inline]
       copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424
       create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206
       ksys_unshare+0x340/0x628 kernel/fork.c:2577
       __do_sys_unshare kernel/fork.c:2645 [inline]
       __se_sys_unshare kernel/fork.c:2643 [inline]
       __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
       el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83
       el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129
       el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
      
      Freed by task 264:
       save_stack mm/kasan/kasan.c:448 [inline]
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521
       kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528
       slab_free_hook mm/slub.c:1370 [inline]
       slab_free_freelist_hook mm/slub.c:1397 [inline]
       slab_free mm/slub.c:2952 [inline]
       kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968
       net_free net/core/net_namespace.c:400 [inline]
       net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407
       net_drop_ns net/core/net_namespace.c:406 [inline]
       cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569
       process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
       worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
       kthread+0x2f0/0x378 kernel/kthread.c:255
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
      
      The buggy address belongs to the object at ffff8003496a3f80
       which belongs to the cache net_namespace of size 7872
      The buggy address is located 1796 bytes inside of
       7872-byte region [ffff8003496a3f80, ffff8003496a5e40)
      The buggy address belongs to the page:
      page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000
      index:0x0 compound_mapcount: 0
      flags: 0xffffe0000008100(slab|head)
      raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000
      raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
       ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Fixes: 467fa153("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      78b4bf26
    • E
      netns: provide pure entropy for net_hash_mix() · a1c2f322
      Eric Dumazet 提交于
      [ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]
      
      net_hash_mix() currently uses kernel address of a struct net,
      and is used in many places that could be used to reveal this
      address to a patient attacker, thus defeating KASLR, for
      the typical case (initial net namespace, &init_net is
      not dynamically allocated)
      
      I believe the original implementation tried to avoid spending
      too many cycles in this function, but security comes first.
      
      Also provide entropy regardless of CONFIG_NET_NS.
      
      Fixes: 0b441916 ("netns: introduce the net_hash_mix "salt" for hashes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAmit Klein <aksecurity@gmail.com>
      Reported-by: NBenny Pinkas <benny@pinkas.net>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a1c2f322
    • S
      net-gro: Fix GRO flush when receiving a GSO packet. · b87ec813
      Steffen Klassert 提交于
      [ Upstream commit 0ab03f353d3613ea49d1f924faf98559003670a8 ]
      
      Currently we may merge incorrectly a received GSO packet
      or a packet with frag_list into a packet sitting in the
      gro_hash list. skb_segment() may crash case because
      the assumptions on the skb layout are not met.
      The correct behaviour would be to flush the packet in the
      gro_hash list and send the received GSO packet directly
      afterwards. Commit d61d072e ("net-gro: avoid reorders")
      sets NAPI_GRO_CB(skb)->flush in this case, but this is not
      checked before merging. This patch makes sure to check this
      flag and to not merge in that case.
      
      Fixes: d61d072e ("net-gro: avoid reorders")
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b87ec813
    • L
      net: ethtool: not call vzalloc for zero sized memory request · 80c20581
      Li RongQing 提交于
      [ Upstream commit 3d8830266ffc28c16032b859e38a0252e014b631 ]
      
      NULL or ZERO_SIZE_PTR will be returned for zero sized memory
      request, and derefencing them will lead to a segfault
      
      so it is unnecessory to call vzalloc for zero sized memory
      request and not call functions which maybe derefence the
      NULL allocated memory
      
      this also fixes a possible memory leak if phy_ethtool_get_stats
      returns error, memory should be freed before exit
      Signed-off-by: NLi RongQing <lirongqing@baidu.com>
      Reviewed-by: NWang Li <wangli39@baidu.com>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      80c20581
    • J
      kcm: switch order of device registration to fix a crash · b7b05831
      Jiri Slaby 提交于
      [ Upstream commit 3c446e6f96997f2a95bf0037ef463802162d2323 ]
      
      When kcm is loaded while many processes try to create a KCM socket, a
      crash occurs:
       BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
       IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
       PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
       Oops: 0002 [#1] SMP KASAN PTI
       CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
       RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
       RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
       ...
       CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
       Call Trace:
        kcm_create+0x600/0xbf0 [kcm]
        __sock_create+0x324/0x750 net/socket.c:1272
       ...
      
      This is due to race between sock_create and unfinished
      register_pernet_device. kcm_create tries to do "net_generic(net,
      kcm_net_id)". but kcm_net_id is not initialized yet.
      
      So switch the order of the two to close the race.
      
      This can be reproduced with mutiple processes doing socket(PF_KCM, ...)
      and one process doing module removal.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b7b05831
    • L
      ipv6: sit: reset ip header pointer in ipip6_rcv · 42f1fa0f
      Lorenzo Bianconi 提交于
      [ Upstream commit bb9bd814ebf04f579be466ba61fc922625508807 ]
      
      ipip6 tunnels run iptunnel_pull_header on received skbs. This can
      determine the following use-after-free accessing iph pointer since
      the packet will be 'uncloned' running pskb_expand_head if it is a
      cloned gso skb (e.g if the packet has been sent though a veth device)
      
      [  706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
      [  706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
      [  706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
      [  706.771839] Call trace:
      [  706.801159]  dump_backtrace+0x0/0x2f8
      [  706.845079]  show_stack+0x24/0x30
      [  706.884833]  dump_stack+0xe0/0x11c
      [  706.925629]  print_address_description+0x68/0x260
      [  706.982070]  kasan_report+0x178/0x340
      [  707.025995]  __asan_report_load1_noabort+0x30/0x40
      [  707.083481]  ipip6_rcv+0x1678/0x16e0 [sit]
      [  707.132623]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  707.185940]  ip_local_deliver_finish+0x3b8/0x988
      [  707.241338]  ip_local_deliver+0x144/0x470
      [  707.289436]  ip_rcv_finish+0x43c/0x14b0
      [  707.335447]  ip_rcv+0x628/0x1138
      [  707.374151]  __netif_receive_skb_core+0x1670/0x2600
      [  707.432680]  __netif_receive_skb+0x28/0x190
      [  707.482859]  process_backlog+0x1d0/0x610
      [  707.529913]  net_rx_action+0x37c/0xf68
      [  707.574882]  __do_softirq+0x288/0x1018
      [  707.619852]  run_ksoftirqd+0x70/0xa8
      [  707.662734]  smpboot_thread_fn+0x3a4/0x9e8
      [  707.711875]  kthread+0x2c8/0x350
      [  707.750583]  ret_from_fork+0x10/0x18
      
      [  707.811302] Allocated by task 16982:
      [  707.854182]  kasan_kmalloc.part.1+0x40/0x108
      [  707.905405]  kasan_kmalloc+0xb4/0xc8
      [  707.948291]  kasan_slab_alloc+0x14/0x20
      [  707.994309]  __kmalloc_node_track_caller+0x158/0x5e0
      [  708.053902]  __kmalloc_reserve.isra.8+0x54/0xe0
      [  708.108280]  __alloc_skb+0xd8/0x400
      [  708.150139]  sk_stream_alloc_skb+0xa4/0x638
      [  708.200346]  tcp_sendmsg_locked+0x818/0x2b90
      [  708.251581]  tcp_sendmsg+0x40/0x60
      [  708.292376]  inet_sendmsg+0xf0/0x520
      [  708.335259]  sock_sendmsg+0xac/0xf8
      [  708.377096]  sock_write_iter+0x1c0/0x2c0
      [  708.424154]  new_sync_write+0x358/0x4a8
      [  708.470162]  __vfs_write+0xc4/0xf8
      [  708.510950]  vfs_write+0x12c/0x3d0
      [  708.551739]  ksys_write+0xcc/0x178
      [  708.592533]  __arm64_sys_write+0x70/0xa0
      [  708.639593]  el0_svc_handler+0x13c/0x298
      [  708.686646]  el0_svc+0x8/0xc
      
      [  708.739019] Freed by task 17:
      [  708.774597]  __kasan_slab_free+0x114/0x228
      [  708.823736]  kasan_slab_free+0x10/0x18
      [  708.868703]  kfree+0x100/0x3d8
      [  708.905320]  skb_free_head+0x7c/0x98
      [  708.948204]  skb_release_data+0x320/0x490
      [  708.996301]  pskb_expand_head+0x60c/0x970
      [  709.044399]  __iptunnel_pull_header+0x3b8/0x5d0
      [  709.098770]  ipip6_rcv+0x41c/0x16e0 [sit]
      [  709.146873]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  709.200195]  ip_local_deliver_finish+0x3b8/0x988
      [  709.255596]  ip_local_deliver+0x144/0x470
      [  709.303692]  ip_rcv_finish+0x43c/0x14b0
      [  709.349705]  ip_rcv+0x628/0x1138
      [  709.388413]  __netif_receive_skb_core+0x1670/0x2600
      [  709.446943]  __netif_receive_skb+0x28/0x190
      [  709.497120]  process_backlog+0x1d0/0x610
      [  709.544169]  net_rx_action+0x37c/0xf68
      [  709.589131]  __do_softirq+0x288/0x1018
      
      [  709.651938] The buggy address belongs to the object at ffffe01b6bd85580
                      which belongs to the cache kmalloc-1024 of size 1024
      [  709.804356] The buggy address is located 117 bytes inside of
                      1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
      [  709.946340] The buggy address belongs to the page:
      [  710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
      [  710.099914] flags: 0xfffff8000000100(slab)
      [  710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
      [  710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
      [  710.334966] page dumped because: kasan: bad access detected
      
      Fix it resetting iph pointer after iptunnel_pull_header
      
      Fixes: a09a4c8d ("tunnels: Remove encapsulation offloads on decap")
      Tested-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      42f1fa0f
    • J
      ipv6: Fix dangling pointer when ipv6 fragment · ea06796f
      Junwei Hu 提交于
      [ Upstream commit ef0efcd3bd3fd0589732b67fb586ffd3c8705806 ]
      
      At the beginning of ip6_fragment func, the prevhdr pointer is
      obtained in the ip6_find_1stfragopt func.
      However, all the pointers pointing into skb header may change
      when calling skb_checksum_help func with
      skb->ip_summed = CHECKSUM_PARTIAL condition.
      The prevhdr pointe will be dangling if it is not reloaded after
      calling __skb_linearize func in skb_checksum_help func.
      
      Here, I add a variable, nexthdr_offset, to evaluate the offset,
      which does not changes even after calling __skb_linearize func.
      
      Fixes: 405c92f7 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment")
      Signed-off-by: NJunwei Hu <hujunwei4@huawei.com>
      Reported-by: NWenhao Zhang <zhangwenhao8@huawei.com>
      Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com
      Reviewed-by: NZhiqiang Liu <liuzhiqiang26@huawei.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ea06796f
    • S
      ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type · 8e4b4da3
      Sheena Mira-ato 提交于
      [ Upstream commit b2e54b09a3d29c4db883b920274ca8dca4d9f04d ]
      
      The device type for ip6 tunnels is set to
      ARPHRD_TUNNEL6. However, the ip4ip6_err function
      is expecting the device type of the tunnel to be
      ARPHRD_TUNNEL.  Since the device types do not
      match, the function exits and the ICMP error
      packet is not sent to the originating host. Note
      that the device type for IPv4 tunnels is set to
      ARPHRD_TUNNEL.
      
      Fix is to expect a tunnel device type of
      ARPHRD_TUNNEL6 instead.  Now the tunnel device
      type matches and the ICMP error packet is sent
      to the originating host.
      Signed-off-by: NSheena Mira-ato <sheena.mira-ato@alliedtelesis.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8e4b4da3
  4. 06 4月, 2019 4 次提交
    • F
      netfilter: physdev: relax br_netfilter dependency · 2e6bcc32
      Florian Westphal 提交于
      [ Upstream commit 8e2f311a68494a6677c1724bdcb10bada21af37c ]
      
      Following command:
        iptables -D FORWARD -m physdev ...
      causes connectivity loss in some setups.
      
      Reason is that iptables userspace will probe kernel for the module revision
      of the physdev patch, and physdev has an artificial dependency on
      br_netfilter (xt_physdev use makes no sense unless a br_netfilter module
      is loaded).
      
      This causes the "phydev" module to be loaded, which in turn enables the
      "call-iptables" infrastructure.
      
      bridged packets might then get dropped by the iptables ruleset.
      
      The better fix would be to change the "call-iptables" defaults to 0 and
      enforce explicit setting to 1, but that breaks backwards compatibility.
      
      This does the next best thing: add a request_module call to checkentry.
      This was a stray '-D ... -m physdev' won't activate br_netfilter
      anymore.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2e6bcc32
    • C
      netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm · 3a1ce979
      Chieh-Min Wang 提交于
      [ Upstream commit 13f5251fd17088170c18844534682d9cab5ff5aa ]
      
      For bridge(br_flood) or broadcast/multicast packets, they could clone
      skb with unconfirmed conntrack which break the rule that unconfirmed
      skb->_nfct is never shared.  With nfqueue running on my system, the race
      can be easily reproduced with following warning calltrace:
      
      [13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: P        W       4.4.60 #7744
      [13257.707568] Hardware name: Qualcomm (Flattened Device Tree)
      [13257.714700] [<c021f6dc>] (unwind_backtrace) from [<c021bce8>] (show_stack+0x10/0x14)
      [13257.720253] [<c021bce8>] (show_stack) from [<c0449e10>] (dump_stack+0x94/0xa8)
      [13257.728240] [<c0449e10>] (dump_stack) from [<c022a7e0>] (warn_slowpath_common+0x94/0xb0)
      [13257.735268] [<c022a7e0>] (warn_slowpath_common) from [<c022a898>] (warn_slowpath_null+0x1c/0x24)
      [13257.743519] [<c022a898>] (warn_slowpath_null) from [<c06ee450>] (__nf_conntrack_confirm+0xa8/0x618)
      [13257.752284] [<c06ee450>] (__nf_conntrack_confirm) from [<c0772670>] (ipv4_confirm+0xb8/0xfc)
      [13257.761049] [<c0772670>] (ipv4_confirm) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
      [13257.769725] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
      [13257.777108] [<c06e7af0>] (nf_hook_slow) from [<c07f20b4>] (br_nf_post_routing+0x274/0x31c)
      [13257.784486] [<c07f20b4>] (br_nf_post_routing) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
      [13257.792556] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
      [13257.800458] [<c06e7af0>] (nf_hook_slow) from [<c07e5580>] (br_forward_finish+0x94/0xa4)
      [13257.808010] [<c07e5580>] (br_forward_finish) from [<c07f22ac>] (br_nf_forward_finish+0x150/0x1ac)
      [13257.815736] [<c07f22ac>] (br_nf_forward_finish) from [<c06e8df0>] (nf_reinject+0x108/0x170)
      [13257.824762] [<c06e8df0>] (nf_reinject) from [<c06ea854>] (nfqnl_recv_verdict+0x3d8/0x420)
      [13257.832924] [<c06ea854>] (nfqnl_recv_verdict) from [<c06e940c>] (nfnetlink_rcv_msg+0x158/0x248)
      [13257.841256] [<c06e940c>] (nfnetlink_rcv_msg) from [<c06e5564>] (netlink_rcv_skb+0x54/0xb0)
      [13257.849762] [<c06e5564>] (netlink_rcv_skb) from [<c06e4ec8>] (netlink_unicast+0x148/0x23c)
      [13257.858093] [<c06e4ec8>] (netlink_unicast) from [<c06e5364>] (netlink_sendmsg+0x2ec/0x368)
      [13257.866348] [<c06e5364>] (netlink_sendmsg) from [<c069fb8c>] (sock_sendmsg+0x34/0x44)
      [13257.874590] [<c069fb8c>] (sock_sendmsg) from [<c06a03dc>] (___sys_sendmsg+0x1ec/0x200)
      [13257.882489] [<c06a03dc>] (___sys_sendmsg) from [<c06a11c8>] (__sys_sendmsg+0x3c/0x64)
      [13257.890300] [<c06a11c8>] (__sys_sendmsg) from [<c0209b40>] (ret_fast_syscall+0x0/0x34)
      
      The original code just triggered the warning but do nothing. It will
      caused the shared conntrack moves to the dying list and the packet be
      droppped (nf_ct_resolve_clash returns NF_DROP for dying conntrack).
      
      - Reproduce steps:
      
      +----------------------------+
      |          br0(bridge)       |
      |                            |
      +-+---------+---------+------+
        | eth0|   | eth1|   | eth2|
        |     |   |     |   |     |
        +--+--+   +--+--+   +---+-+
           |         |          |
           |         |          |
        +--+-+     +-+--+    +--+-+
        | PC1|     | PC2|    | PC3|
        +----+     +----+    +----+
      
      iptables -A FORWARD -m mark --mark 0x1000000/0x1000000 -j NFQUEUE --queue-num 100 --queue-bypass
      
      ps: Our nfq userspace program will set mark on packets whose connection
      has already been processed.
      
      PC1 sends broadcast packets simulated by hping3:
      
      hping3 --rand-source --udp 192.168.1.255 -i u100
      
      - Broadcast racing flow chart is as follow:
      
      br_handle_frame
        BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish)
        // skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage
        br_handle_frame_finish
          // check if this packet is broadcast
          br_flood_forward
            br_flood
              list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port
                maybe_deliver
                  deliver_clone
                    skb = skb_clone(skb)
                    __br_forward
                      BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...)
                      // queue in our nfq and received by our userspace program
                      // goto __nf_conntrack_confirm with process context on CPU 1
          br_pass_frame_up
            BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...)
            // goto __nf_conntrack_confirm with softirq context on CPU 0
      
      Because conntrack confirm can happen at both INPUT and POSTROUTING
      stage.  So with NFQUEUE running, skb->_nfct with the same unconfirmed
      conntrack could race on different core.
      
      This patch fixes a repeating kernel splat, now it is only displayed
      once.
      Signed-off-by: NChieh-Min Wang <chiehminw@synology.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3a1ce979
    • F
      netfilter: conntrack: tcp: only close if RST matches exact sequence · ca66f667
      Florian Westphal 提交于
      [ Upstream commit be0502a3f2e94211a8809a09ecbc3a017189b8fb ]
      
      TCP resets cause instant transition from established to closed state
      provided the reset is in-window.  Endpoints that implement RFC 5961
      require resets to match the next expected sequence number.
      RST segments that are in-window (but that do not match RCV.NXT) are
      ignored, and a "challenge ACK" is sent back.
      
      Main problem for conntrack is that its a middlebox, i.e.  whereas an end
      host might have ACK'd SEQ (and would thus accept an RST with this
      sequence number), conntrack might not have seen this ACK (yet).
      
      Therefore we can't simply flag RSTs with non-exact match as invalid.
      
      This updates RST processing as follows:
      
      1. If the connection is in a state other than ESTABLISHED, nothing is
         changed, RST is subject to normal in-window check.
      
      2. If the RSTs sequence number either matches exactly RCV.NXT,
         connection state moves to CLOSE.
      
      3. The same applies if the RST sequence number aligns with a previous
         packet in the same direction.
      
      In all other cases, the connection remains in ESTABLISHED state.
      If the normal-in-window check passes, the timeout will be lowered
      to that of CLOSE.
      
      If the peer sends a challenge ack, connection timeout will be reset.
      
      If the challenge ACK triggers another RST (RST was valid after all),
      this 2nd RST will match expected sequence and conntrack state changes to
      CLOSE.
      
      If no challenge ACK is received, the connection will time out after
      CLOSE seconds (10 seconds by default), just like without this patch.
      
      Packetdrill test case:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      
      // Receive a segment.
      0.210 < P. 1:1001(1000) ack 1 win 46
      0.210 > . 1:1(0) ack 1001
      
      // Application writes 1000 bytes.
      0.250 write(4, ..., 1000) = 1000
      0.250 > P. 1:1001(1000) ack 1001
      
      // First reset, old sequence. Conntrack (correctly) considers this
      // invalid due to failed window validation (regardless of this patch).
      0.260 < R  2:2(0) ack 1001 win 260
      
      // 2nd reset, but too far ahead sequence.  Same: correctly handled
      // as invalid.
      0.270 < R 99990001:99990001(0) ack 1001 win 260
      
      // in-window, but not exact sequence.
      // Current Linux kernels might reply with a challenge ack, and do not
      // remove connection.
      // Without this patch, conntrack state moves to CLOSE.
      // With patch, timeout is lowered like CLOSE, but connection stays
      // in ESTABLISHED state.
      0.280 < R 1010:1010(0) ack 1001 win 260
      
      // Expect challenge ACK
      0.281 > . 1001:1001(0) ack 1001 win 501
      
      // With or without this patch, RST will cause connection
      // to move to CLOSE (sequence number matches)
      // 0.282 < R 1001:1001(0) ack 1001 win 260
      
      // ACK
      0.300 < . 1001:1001(0) ack 1001 win 257
      
      // more data could be exchanged here, connection
      // is still established
      
      // Client closes the connection.
      0.610 < F. 1001:1001(0) ack 1001 win 260
      0.650 > . 1001:1001(0) ack 1002
      
      // Close the connection without reading outstanding data
      0.700 close(4) = 0
      
      // so one more reset.  Will be deemed acceptable with patch as well:
      // connection is already closing.
      0.701 > R. 1001:1001(0) ack 1002 win 501
      // End packetdrill test case.
      
      With patch, this generates following conntrack events:
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      
      Without patch, first RST moves connection to close, whereas socket state
      does not change until FIN is received.
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ca66f667
    • L
      netfilter: nf_tables: check the result of dereferencing base_chain->stats · 709aaa09
      Li RongQing 提交于
      [ Upstream commit a9f5e78c403d2d62ade4f4c85040efc85f4049b8 ]
      
      Check the result of dereferencing base_chain->stats, instead of result
      of this_cpu_ptr with NULL.
      
      base_chain->stats maybe be changed to NULL when a chain is updated and a
      new NULL counter can be attached.
      
      And we do not need to check returning of this_cpu_ptr since
      base_chain->stats is from percpu allocator if it is non-NULL,
      this_cpu_ptr returns a valid value.
      
      And fix two sparse error by replacing rcu_access_pointer and
      rcu_dereference with READ_ONCE under rcu_read_lock.
      
      Thanks for Eric's help to finish this patch.
      
      Fixes: 00924094 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NZhang Yu <zhangyu31@baidu.com>
      Signed-off-by: NLi RongQing <lirongqing@baidu.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      709aaa09