1. 03 10月, 2018 1 次提交
  2. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  3. 26 5月, 2018 1 次提交
  4. 08 4月, 2018 1 次提交
    • E
      dccp: initialize ireq->ir_mark · b855ff82
      Eric Dumazet 提交于
      syzbot reported an uninit-value read of skb->mark in iptable_mangle_hook()
      
      Thanks to the nice report, I tracked the problem to dccp not caring
      of ireq->ir_mark for passive sessions.
      
      BUG: KMSAN: uninit-value in ipt_mangle_out net/ipv4/netfilter/iptable_mangle.c:66 [inline]
      BUG: KMSAN: uninit-value in iptable_mangle_hook+0x5e5/0x720 net/ipv4/netfilter/iptable_mangle.c:84
      CPU: 0 PID: 5300 Comm: syz-executor3 Not tainted 4.16.0+ #81
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       ipt_mangle_out net/ipv4/netfilter/iptable_mangle.c:66 [inline]
       iptable_mangle_hook+0x5e5/0x720 net/ipv4/netfilter/iptable_mangle.c:84
       nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
       nf_hook_slow+0x158/0x3d0 net/netfilter/core.c:483
       nf_hook include/linux/netfilter.h:243 [inline]
       __ip_local_out net/ipv4/ip_output.c:113 [inline]
       ip_local_out net/ipv4/ip_output.c:122 [inline]
       ip_queue_xmit+0x1d21/0x21c0 net/ipv4/ip_output.c:504
       dccp_transmit_skb+0x15eb/0x1900 net/dccp/output.c:142
       dccp_xmit_packet+0x814/0x9e0 net/dccp/output.c:281
       dccp_write_xmit+0x20f/0x480 net/dccp/output.c:363
       dccp_sendmsg+0x12ca/0x12d0 net/dccp/proto.c:818
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
       __sys_sendmsg net/socket.c:2080 [inline]
       SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
       SyS_sendmsg+0x54/0x80 net/socket.c:2087
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x455259
      RSP: 002b:00007f1a4473dc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f1a4473e6d4 RCX: 0000000000455259
      RDX: 0000000000000000 RSI: 0000000020b76fc8 RDI: 0000000000000015
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000004f0 R14: 00000000006fa720 R15: 0000000000000000
      
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
       kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
       __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521
       ip_queue_xmit+0x1e35/0x21c0 net/ipv4/ip_output.c:502
       dccp_transmit_skb+0x15eb/0x1900 net/dccp/output.c:142
       dccp_xmit_packet+0x814/0x9e0 net/dccp/output.c:281
       dccp_write_xmit+0x20f/0x480 net/dccp/output.c:363
       dccp_sendmsg+0x12ca/0x12d0 net/dccp/proto.c:818
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
       __sys_sendmsg net/socket.c:2080 [inline]
       SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
       SyS_sendmsg+0x54/0x80 net/socket.c:2087
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
       kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
       __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521
       inet_csk_clone_lock+0x503/0x580 net/ipv4/inet_connection_sock.c:797
       dccp_create_openreq_child+0x7f/0x890 net/dccp/minisocks.c:92
       dccp_v4_request_recv_sock+0x22c/0xe90 net/dccp/ipv4.c:408
       dccp_v6_request_recv_sock+0x290/0x2000 net/dccp/ipv6.c:414
       dccp_check_req+0x7b9/0x8f0 net/dccp/minisocks.c:197
       dccp_v4_rcv+0x12e4/0x2630 net/dccp/ipv4.c:840
       ip_local_deliver_finish+0x6ed/0xd40 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:288 [inline]
       ip_local_deliver+0x43c/0x4e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:449 [inline]
       ip_rcv_finish+0x1253/0x16d0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:288 [inline]
       ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562
       __netif_receive_skb net/core/dev.c:4627 [inline]
       process_backlog+0x62d/0xe20 net/core/dev.c:5307
       napi_poll net/core/dev.c:5705 [inline]
       net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771
       __do_softirq+0x56d/0x93d kernel/softirq.c:285
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756
       reqsk_alloc include/net/request_sock.h:88 [inline]
       inet_reqsk_alloc+0xc4/0x7f0 net/ipv4/tcp_input.c:6145
       dccp_v4_conn_request+0x5cc/0x1770 net/dccp/ipv4.c:600
       dccp_v6_conn_request+0x299/0x1880 net/dccp/ipv6.c:317
       dccp_rcv_state_process+0x2ea/0x2410 net/dccp/input.c:612
       dccp_v4_do_rcv+0x229/0x340 net/dccp/ipv4.c:682
       dccp_v6_do_rcv+0x16d/0x1220 net/dccp/ipv6.c:578
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __sk_receive_skb+0x60e/0xf20 net/core/sock.c:513
       dccp_v4_rcv+0x24d4/0x2630 net/dccp/ipv4.c:874
       ip_local_deliver_finish+0x6ed/0xd40 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:288 [inline]
       ip_local_deliver+0x43c/0x4e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:449 [inline]
       ip_rcv_finish+0x1253/0x16d0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:288 [inline]
       ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562
       __netif_receive_skb net/core/dev.c:4627 [inline]
       process_backlog+0x62d/0xe20 net/core/dev.c:5307
       napi_poll net/core/dev.c:5705 [inline]
       net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771
       __do_softirq+0x56d/0x93d kernel/softirq.c:285
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b855ff82
  5. 28 3月, 2018 1 次提交
  6. 05 3月, 2018 1 次提交
  7. 26 10月, 2017 1 次提交
    • E
      tcp/dccp: fix other lockdep splats accessing ireq_opt · 06f877d6
      Eric Dumazet 提交于
      In my first attempt to fix the lockdep splat, I forgot we could
      enter inet_csk_route_req() with a freshly allocated request socket,
      for which refcount has not yet been elevated, due to complex
      SLAB_TYPESAFE_BY_RCU rules.
      
      We either are in rcu_read_lock() section _or_ we own a refcount on the
      request.
      
      Correct RCU verb to use here is rcu_dereference_check(), although it is
      not possible to prove we actually own a reference on a shared
      refcount :/
      
      In v2, I added ireq_opt_deref() helper and use in three places, to fix other
      possible splats.
      
      [   49.844590]  lockdep_rcu_suspicious+0xea/0xf3
      [   49.846487]  inet_csk_route_req+0x53/0x14d
      [   49.848334]  tcp_v4_route_req+0xe/0x10
      [   49.850174]  tcp_conn_request+0x31c/0x6a0
      [   49.851992]  ? __lock_acquire+0x614/0x822
      [   49.854015]  tcp_v4_conn_request+0x5a/0x79
      [   49.855957]  ? tcp_v4_conn_request+0x5a/0x79
      [   49.858052]  tcp_rcv_state_process+0x98/0xdcc
      [   49.859990]  ? sk_filter_trim_cap+0x2f6/0x307
      [   49.862085]  tcp_v4_do_rcv+0xfc/0x145
      [   49.864055]  ? tcp_v4_do_rcv+0xfc/0x145
      [   49.866173]  tcp_v4_rcv+0x5ab/0xaf9
      [   49.868029]  ip_local_deliver_finish+0x1af/0x2e7
      [   49.870064]  ip_local_deliver+0x1b2/0x1c5
      [   49.871775]  ? inet_del_offload+0x45/0x45
      [   49.873916]  ip_rcv_finish+0x3f7/0x471
      [   49.875476]  ip_rcv+0x3f1/0x42f
      [   49.876991]  ? ip_local_deliver_finish+0x2e7/0x2e7
      [   49.878791]  __netif_receive_skb_core+0x6d3/0x950
      [   49.880701]  ? process_backlog+0x7e/0x216
      [   49.882589]  __netif_receive_skb+0x1d/0x5e
      [   49.884122]  process_backlog+0x10c/0x216
      [   49.885812]  net_rx_action+0x147/0x3df
      
      Fixes: a6ca7abe ("tcp/dccp: fix lockdep splat in inet_csk_route_req()")
      Fixes: c92e8c02 ("tcp/dccp: fix ireq->opt races")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nkernel test robot <fengguang.wu@intel.com>
      Reported-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06f877d6
  8. 21 10月, 2017 1 次提交
    • E
      tcp/dccp: fix ireq->opt races · c92e8c02
      Eric Dumazet 提交于
      syzkaller found another bug in DCCP/TCP stacks [1]
      
      For the reasons explained in commit ce105008 ("tcp/dccp: fix
      ireq->pktopts race"), we need to make sure we do not access
      ireq->opt unless we own the request sock.
      
      Note the opt field is renamed to ireq_opt to ease grep games.
      
      [1]
      BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
      Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295
      
      CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x25b/0x340 mm/kasan/report.c:409
       __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
       ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
       tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
       tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
       tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
       __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
       tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
       tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
       tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
       tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x40c341
      RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
      RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
      R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000
      
      Allocated by task 3295:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       __do_kmalloc mm/slab.c:3725 [inline]
       __kmalloc+0x162/0x760 mm/slab.c:3734
       kmalloc include/linux/slab.h:498 [inline]
       tcp_v4_save_options include/net/tcp.h:1962 [inline]
       tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
       tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
       tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
       tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
       tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
       tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 3306:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xca/0x250 mm/slab.c:3820
       inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
       __sk_destruct+0xfd/0x910 net/core/sock.c:1560
       sk_destruct+0x47/0x80 net/core/sock.c:1595
       __sk_free+0x57/0x230 net/core/sock.c:1603
       sk_free+0x2a/0x40 net/core/sock.c:1614
       sock_put include/net/sock.h:1652 [inline]
       inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
       tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
       tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c92e8c02
  9. 08 8月, 2017 1 次提交
    • D
      net: ipv4: add second dif to inet socket lookups · 3fa6f616
      David Ahern 提交于
      Add a second device index, sdif, to inet socket lookups. sdif is the
      index for ingress devices enslaved to an l3mdev. It allows the lookups
      to consider the enslaved device as well as the L3 domain when searching
      for a socket.
      
      TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
      ingress index is obtained from IPCB using inet_sdif and after the cb move
      in  tcp_v4_rcv the tcp_v4_sdif helper is used.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fa6f616
  10. 27 7月, 2017 1 次提交
  11. 21 6月, 2017 1 次提交
    • X
      dccp: call inet_add_protocol after register_pernet_subsys in dccp_v4_init · d5494acb
      Xin Long 提交于
      Now dccp_ipv4 works as a kernel module. During loading this module, if
      one dccp packet is being recieved after inet_add_protocol but before
      register_pernet_subsys in which v4_ctl_sk is initialized, a null pointer
      dereference may be triggered because of init_net.dccp.v4_ctl_sk is 0x0.
      
      Jianlin found this issue when the following call trace occurred:
      
      [  171.950177] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
      [  171.951007] IP: [<ffffffffc0558364>] dccp_v4_ctl_send_reset+0xc4/0x220 [dccp_ipv4]
      [...]
      [  171.984629] Call Trace:
      [  171.984859]  <IRQ>
      [  171.985061]
      [  171.985213]  [<ffffffffc0559a53>] dccp_v4_rcv+0x383/0x3f9 [dccp_ipv4]
      [  171.985711]  [<ffffffff815ca054>] ip_local_deliver_finish+0xb4/0x1f0
      [  171.986309]  [<ffffffff815ca339>] ip_local_deliver+0x59/0xd0
      [  171.986852]  [<ffffffff810cd7a4>] ? update_curr+0x104/0x190
      [  171.986956]  [<ffffffff815c9cda>] ip_rcv_finish+0x8a/0x350
      [  171.986956]  [<ffffffff815ca666>] ip_rcv+0x2b6/0x410
      [  171.986956]  [<ffffffff810c83b4>] ? task_cputime+0x44/0x80
      [  171.986956]  [<ffffffff81586f22>] __netif_receive_skb_core+0x572/0x7c0
      [  171.986956]  [<ffffffff810d2c51>] ? trigger_load_balance+0x61/0x1e0
      [  171.986956]  [<ffffffff81587188>] __netif_receive_skb+0x18/0x60
      [  171.986956]  [<ffffffff8158841e>] process_backlog+0xae/0x180
      [  171.986956]  [<ffffffff8158799d>] net_rx_action+0x16d/0x380
      [  171.986956]  [<ffffffff81090b7f>] __do_softirq+0xef/0x280
      [  171.986956]  [<ffffffff816b6a1c>] call_softirq+0x1c/0x30
      
      This patch is to move inet_add_protocol after register_pernet_subsys in
      dccp_v4_init, so that v4_ctl_sk is initialized before any incoming dccp
      packets are processed.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5494acb
  12. 19 4月, 2017 1 次提交
    • P
      mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU · 5f0d5a3a
      Paul E. McKenney 提交于
      A group of Linux kernel hackers reported chasing a bug that resulted
      from their assumption that SLAB_DESTROY_BY_RCU provided an existence
      guarantee, that is, that no block from such a slab would be reallocated
      during an RCU read-side critical section.  Of course, that is not the
      case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
      slab of blocks.
      
      However, there is a phrase for this, namely "type safety".  This commit
      therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
      to avoid future instances of this sort of confusion.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      [ paulmck: Add comments mentioning the old name, as requested by Eric
        Dumazet, in order to help people familiar with the old name find
        the new one. ]
      Acked-by: NDavid Rientjes <rientjes@google.com>
      5f0d5a3a
  13. 14 3月, 2017 1 次提交
    • J
      dccp/tcp: fix routing redirect race · 45caeaa5
      Jon Maxwell 提交于
      As Eric Dumazet pointed out this also needs to be fixed in IPv6.
      v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
      
      We have seen a few incidents lately where a dst_enty has been freed
      with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
      dst_entry. If the conditions/timings are right a crash then ensues when the
      freed dst_entry is referenced later on. A Common crashing back trace is:
      
       #8 [] page_fault at ffffffff8163e648
          [exception RIP: __tcp_ack_snd_check+74]
      .
      .
       #9 [] tcp_rcv_established at ffffffff81580b64
      #10 [] tcp_v4_do_rcv at ffffffff8158b54a
      #11 [] tcp_v4_rcv at ffffffff8158cd02
      #12 [] ip_local_deliver_finish at ffffffff815668f4
      #13 [] ip_local_deliver at ffffffff81566bd9
      #14 [] ip_rcv_finish at ffffffff8156656d
      #15 [] ip_rcv at ffffffff81566f06
      #16 [] __netif_receive_skb_core at ffffffff8152b3a2
      #17 [] __netif_receive_skb at ffffffff8152b608
      #18 [] netif_receive_skb at ffffffff8152b690
      #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
      #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
      #21 [] net_rx_action at ffffffff8152bac2
      #22 [] __do_softirq at ffffffff81084b4f
      #23 [] call_softirq at ffffffff8164845c
      #24 [] do_softirq at ffffffff81016fc5
      #25 [] irq_exit at ffffffff81084ee5
      #26 [] do_IRQ at ffffffff81648ff8
      
      Of course it may happen with other NIC drivers as well.
      
      It's found the freed dst_entry here:
      
       224 static bool tcp_in_quickack_mode(struct sock *sk)
       225 {
       226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);
       227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);
       228 
       229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
       230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
       231 }
      
      But there are other backtraces attributed to the same freed dst_entry in
      netfilter code as well.
      
      All the vmcores showed 2 significant clues:
      
      - Remote hosts behind the default gateway had always been redirected to a
      different gateway. A rtable/dst_entry will be added for that host. Making
      more dst_entrys with lower reference counts. Making this more probable.
      
      - All vmcores showed a postitive LockDroppedIcmps value, e.g:
      
      LockDroppedIcmps                  267
      
      A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
      regardless of whether user space has the socket locked. This can result in a
      race condition where the same dst_entry cached in sk->sk_dst_entry can be
      decremented twice for the same socket via:
      
      do_redirect()->__sk_dst_check()-> dst_release().
      
      Which leads to the dst_entry being prematurely freed with another socket
      pointing to it via sk->sk_dst_cache and a subsequent crash.
      
      To fix this skip do_redirect() if usespace has the socket locked. Instead let
      the redirect take place later when user space does not have the socket
      locked.
      
      The dccp/IPv6 code is very similar in this respect, so fixing it there too.
      
      As Eric Garver pointed out the following commit now invalidates routes. Which
      can set the dst->obsolete flag so that ipv4_dst_check() returns null and
      triggers the dst_release().
      
      Fixes: ceb33206 ("ipv4: Kill routes during PMTU/redirect updates.")
      Cc: Eric Garver <egarver@redhat.com>
      Cc: Hannes Sowa <hsowa@redhat.com>
      Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45caeaa5
  14. 23 2月, 2017 1 次提交
    • A
      net/dccp: fix use after free in tw_timer_handler() · ec7cb62d
      Andrey Ryabinin 提交于
      DCCP doesn't purge timewait sockets on network namespace shutdown.
      So, after net namespace destroyed we could still have an active timer
      which will trigger use after free in tw_timer_handler():
      
          BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10
          Read of size 8 by task swapper/1/0
          Call Trace:
           __asan_load8+0x54/0x90
           tw_timer_handler+0x4a/0xa0
           call_timer_fn+0x127/0x480
           expire_timers+0x1db/0x2e0
           run_timer_softirq+0x12f/0x2a0
           __do_softirq+0x105/0x5b4
           irq_exit+0xdd/0xf0
           smp_apic_timer_interrupt+0x57/0x70
           apic_timer_interrupt+0x90/0xa0
      
          Object at ffff88010e0d1bc0, in cache net_namespace size: 6848
          Allocated:
           save_stack_trace+0x1b/0x20
           kasan_kmalloc+0xee/0x180
           kasan_slab_alloc+0x12/0x20
           kmem_cache_alloc+0x134/0x310
           copy_net_ns+0x8d/0x280
           create_new_namespaces+0x23f/0x340
           unshare_nsproxy_namespaces+0x75/0xf0
           SyS_unshare+0x299/0x4f0
           entry_SYSCALL_64_fastpath+0x18/0xad
          Freed:
           save_stack_trace+0x1b/0x20
           kasan_slab_free+0xae/0x180
           kmem_cache_free+0xb4/0x350
           net_drop_ns+0x3f/0x50
           cleanup_net+0x3df/0x450
           process_one_work+0x419/0xbb0
           worker_thread+0x92/0x850
           kthread+0x192/0x1e0
           ret_from_fork+0x2e/0x40
      
      Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
      timewait sockets on net namespace destruction and prevent above issue.
      
      Fixes: f2bf415c ("mib: add net to NET_ADD_STATS_BH")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec7cb62d
  15. 19 1月, 2017 1 次提交
  16. 30 11月, 2016 1 次提交
  17. 04 11月, 2016 2 次提交
    • E
      dccp: fix out of bound access in dccp_v4_err() · 6706a97f
      Eric Dumazet 提交于
      dccp_v4_err() does not use pskb_may_pull() and might access garbage.
      
      We only need 4 bytes at the beginning of the DCCP header, like TCP,
      so the 8 bytes pulled in icmp_socket_deliver() are more than enough.
      
      This patch might allow to process more ICMP messages, as some routers
      are still limiting the size of reflected bytes to 28 (RFC 792), instead
      of extended lengths (RFC 1812 4.3.2.3)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6706a97f
    • E
      dccp: do not release listeners too soon · c3f24cfb
      Eric Dumazet 提交于
      Andrey Konovalov reported following error while fuzzing with syzkaller :
      
      IPv4: Attempt to release alive inet socket ffff880068e98940
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff88006b9e0000 task.stack: ffff880068770000
      RIP: 0010:[<ffffffff819ead5f>]  [<ffffffff819ead5f>]
      selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
      RSP: 0018:ffff8800687771c8  EFLAGS: 00010202
      RAX: ffff88006b9e0000 RBX: 1ffff1000d0eee3f RCX: 1ffff1000d1d312a
      RDX: 1ffff1000d1d31a6 RSI: dffffc0000000000 RDI: 0000000000000010
      RBP: ffff880068777360 R08: 0000000000000000 R09: 0000000000000002
      R10: dffffc0000000000 R11: 0000000000000006 R12: ffff880068e98940
      R13: 0000000000000002 R14: ffff880068777338 R15: 0000000000000000
      FS:  00007f00ff760700(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020008000 CR3: 000000006a308000 CR4: 00000000000006e0
      Stack:
       ffff8800687771e0 ffffffff812508a5 ffff8800686f3168 0000000000000007
       ffff88006ac8cdfc ffff8800665ea500 0000000041b58ab3 ffffffff847b5480
       ffffffff819eac60 ffff88006b9e0860 ffff88006b9e0868 ffff88006b9e07f0
      Call Trace:
       [<ffffffff819c8dd5>] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
       [<ffffffff82c2a9e7>] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
       [<ffffffff82b81e60>] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
       [<ffffffff838bbf12>] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
       [<ffffffff83069d22>] ip_local_deliver_finish+0x332/0xad0
      net/ipv4/ip_input.c:216
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306abd2>] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
       [<     inline     >] dst_input ./include/net/dst.h:507
       [<ffffffff83068500>] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306b82f>] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
       [<ffffffff82bd9fb7>] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
       [<ffffffff82bdb19a>] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
       [<ffffffff82bdb493>] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
       [<ffffffff82bdb6b8>] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
       [<ffffffff8241fc75>] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
       [<ffffffff82421b5a>] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
       [<     inline     >] new_sync_write fs/read_write.c:499
       [<ffffffff8151bd44>] __vfs_write+0x334/0x570 fs/read_write.c:512
       [<ffffffff8151f85b>] vfs_write+0x17b/0x500 fs/read_write.c:560
       [<     inline     >] SYSC_write fs/read_write.c:607
       [<ffffffff81523184>] SyS_write+0xd4/0x1a0 fs/read_write.c:599
       [<ffffffff83fc02c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      It turns out DCCP calls __sk_receive_skb(), and this broke when
      lookups no longer took a reference on listeners.
      
      Fix this issue by adding a @refcounted parameter to __sk_receive_skb(),
      so that sock_put() is used only when needed.
      
      Fixes: 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c3f24cfb
  18. 30 10月, 2016 1 次提交
  19. 14 7月, 2016 1 次提交
    • W
      dccp: limit sk_filter trim to payload · 4f0c40d9
      Willem de Bruijn 提交于
      Dccp verifies packet integrity, including length, at initial rcv in
      dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.
      
      A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
      skb_copy_datagram_msg interprets this as a negative value, so
      (correctly) fails with EFAULT. The negative length is reported in
      ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.
      
      Introduce an sk_receive_skb variant that caps how small a filter
      program can trim packets, and call this in dccp with the header
      length. Excessively trimmed packets are now processed normally and
      queued for reception as 0B payloads.
      
      Fixes: 7c657876 ("[DCCP]: Initial implementation")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f0c40d9
  20. 10 7月, 2016 1 次提交
  21. 03 5月, 2016 1 次提交
  22. 28 4月, 2016 4 次提交
  23. 08 4月, 2016 1 次提交
  24. 05 4月, 2016 1 次提交
    • E
      tcp/dccp: do not touch listener sk_refcnt under synflood · 3b24d854
      Eric Dumazet 提交于
      When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple
      cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes.
      
      By letting listeners use SOCK_RCU_FREE infrastructure,
      we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt
      
      Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
      only listeners are impacted by this change.
      
      Peak performance under SYNFLOOD is increased by ~33% :
      
      On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps
      
      Most consuming functions are now skb_set_owner_w() and sock_wfree()
      contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b24d854
  25. 18 3月, 2016 1 次提交
  26. 19 2月, 2016 1 次提交
    • E
      tcp/dccp: fix another race at listener dismantle · 7716682c
      Eric Dumazet 提交于
      Ilya reported following lockdep splat:
      
      kernel: =========================
      kernel: [ BUG: held lock freed! ]
      kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
      kernel: -------------------------
      kernel: swapper/5/0 is freeing memory
      ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
      kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
      [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
      kernel: 4 locks held by swapper/5/0:
      kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
      netif_receive_skb_internal+0x4b/0x1f0
      kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
      ip_local_deliver_finish+0x3f/0x380
      kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
      sk_clone_lock+0x19b/0x440
      kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
      [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
      
      To properly fix this issue, inet_csk_reqsk_queue_add() needs
      to return to its callers if the child as been queued
      into accept queue.
      
      We also need to make sure listener is still there before
      calling sk->sk_data_ready(), by holding a reference on it,
      since the reference carried by the child can disappear as
      soon as the child is put on accept queue.
      Reported-by: NIlya Dryomov <idryomov@gmail.com>
      Fixes: ebb516af ("tcp/dccp: fix race at listener dismantle phase")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7716682c
  27. 11 2月, 2016 1 次提交
  28. 23 10月, 2015 1 次提交
    • E
      tcp/dccp: fix hashdance race for passive sessions · 5e0724d0
      Eric Dumazet 提交于
      Multiple cpus can process duplicates of incoming ACK messages
      matching a SYN_RECV request socket. This is a rare event under
      normal operations, but definitely can happen.
      
      Only one must win the race, otherwise corruption would occur.
      
      To fix this without adding new atomic ops, we use logic in
      inet_ehash_nolisten() to detect the request was present in the same
      ehash bucket where we try to insert the new child.
      
      If request socket was not found, we have to undo the child creation.
      
      This actually removes a spin_lock()/spin_unlock() pair in
      reqsk_queue_unlink() for the fast path.
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e0724d0
  29. 16 10月, 2015 2 次提交
  30. 14 10月, 2015 1 次提交
    • E
      tcp/dccp: fix behavior of stale SYN_RECV request sockets · 4bdc3d66
      Eric Dumazet 提交于
      When a TCP/DCCP listener is closed, its pending SYN_RECV request sockets
      become stale, meaning 3WHS can not complete.
      
      But current behavior is wrong :
      incoming packets finding such stale sockets are dropped.
      
      We need instead to cleanup the request socket and perform another
      lookup :
      - Incoming ACK will give a RST answer,
      - SYN rtx might find another listener if available.
      - We expedite cleanup of request sockets and old listener socket.
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bdc3d66
  31. 05 10月, 2015 1 次提交
  32. 03 10月, 2015 1 次提交
    • E
      tcp/dccp: install syn_recv requests into ehash table · 079096f1
      Eric Dumazet 提交于
      In this patch, we insert request sockets into TCP/DCCP
      regular ehash table (where ESTABLISHED and TIMEWAIT sockets
      are) instead of using the per listener hash table.
      
      ACK packets find SYN_RECV pseudo sockets without having
      to find and lock the listener.
      
      In nominal conditions, this halves pressure on listener lock.
      
      Note that this will allow for SO_REUSEPORT refinements,
      so that we can select a listener using cpu/numa affinities instead
      of the prior 'consistent hash', since only SYN packets will
      apply this selection logic.
      
      We will shrink listen_sock in the following patch to ease
      code review.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ying Cai <ycai@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      079096f1
  33. 30 9月, 2015 2 次提交
  34. 26 9月, 2015 1 次提交