1. 26 Jan, 2020 1 commit
  2. 24 Jan, 2020 2 commits
    • mptcp: Add MPTCP socket stubs · f870fa0b
      Mat Martineau committed
      Implements the infrastructure for MPTCP sockets.
      
      MPTCP sockets open one in-kernel TCP socket per subflow. These subflow
      sockets are only managed by the MPTCP socket that owns them and are not
      visible from userspace. This commit allows a userspace program to open
      an MPTCP socket with:
      
        sock = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
      
      The resulting socket is simply a wrapper around a single regular TCP
      socket, without any of the MPTCP protocol implemented over the wire.
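A minimal user-space sketch of the call above, assuming a Linux system. The helper name `open_stream_socket` and the fall-back-to-TCP policy are illustrative assumptions, not part of the commit; the numeric value 262 for `IPPROTO_MPTCP` is only defined here for libc headers that predate it.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Older libc headers may lack IPPROTO_MPTCP; Linux uses the value 262. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/*
 * Hypothetical helper: try an MPTCP socket first and fall back to plain
 * TCP if the kernel refuses it. Returns a socket fd, or -1 with errno set.
 * *is_mptcp reports which protocol was actually used.
 */
int open_stream_socket(int family, int *is_mptcp)
{
	int fd;

	assert(is_mptcp != NULL);

	fd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);
	if (fd >= 0) {
		*is_mptcp = 1;
		return fd;
	}

	/* MPTCP unavailable (old kernel, disabled, ...): use regular TCP. */
	*is_mptcp = 0;
	return socket(family, SOCK_STREAM, IPPROTO_TCP);
}
```

A caller would use the returned descriptor exactly like a TCP socket, which matches the commit's description of the MPTCP socket as a wrapper around a single TCP socket at this stage.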
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com>
      Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
      Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: do not leave dangling pointers in tp->highest_sack · 2bec445f
      Eric Dumazet committed
      Latest commit 85369750 ("tcp: Fix highest_sack and highest_sack_seq")
      apparently allowed syzbot to trigger various crashes in TCP stack [1]
      
      I believe this commit only made things easier for syzbot to find
      its way into triggering use-after-frees. But really the bugs
      could lead to bad TCP behavior or even plain crashes even for
      non malicious peers.
      
      I have audited all calls to tcp_rtx_queue_unlink() and
      tcp_rtx_queue_unlink_and_free() and made sure tp->highest_sack would be updated
      if we are removing from rtx queue the skb that tp->highest_sack points to.
      
      These updates were missing in three locations :
      
      1) tcp_clean_rtx_queue() [This one seems quite serious,
                                I have no idea why this was not caught earlier]
      
      2) tcp_rtx_queue_purge() [Probably not a big deal for normal operations]
      
      3) tcp_send_synack()     [Probably not a big deal for normal operations]
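The invariant being restored can be illustrated with a simplified user-space model. This is a sketch, not the kernel code: the real retransmit queue is not a plain linked list, and all names below (`queue_sketch`, `highest`, and so on) are hypothetical. The point is only that a cached pointer into a queue must be moved before the node it points to is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node_sketch {
	struct node_sketch *prev, *next;
	unsigned int seq;
};

struct queue_sketch {
	struct node_sketch *head, *tail;
	struct node_sketch *highest;	/* cached pointer into the queue */
};

/* Append a new node; returns NULL on allocation failure. */
struct node_sketch *queue_append(struct queue_sketch *q, unsigned int seq)
{
	struct node_sketch *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->seq = seq;
	n->prev = q->tail;
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	return n;
}

/* Unlink and free a node without leaving q->highest dangling. */
void queue_unlink_and_free(struct queue_sketch *q, struct node_sketch *n)
{
	assert(q != NULL && n != NULL);

	/* Move the cached pointer off the node before it is freed. */
	if (q->highest == n)
		q->highest = n->next;	/* may become NULL */

	if (n->prev)
		n->prev->next = n->next;
	else
		q->head = n->next;
	if (n->next)
		n->next->prev = n->prev;
	else
		q->tail = n->prev;
	free(n);
}
```

Without the `q->highest == n` check, a later read through `q->highest` would touch freed memory, which is the class of bug the KASAN report below shows.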
      
      [1]
      BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1864 [inline]
      BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1856 [inline]
      BUG: KASAN: use-after-free in tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891
      Read of size 4 at addr ffff8880a488d068 by task ksoftirqd/1/16
      
      CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.5.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
       __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
       kasan_report+0x12/0x20 mm/kasan/common.c:639
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:134
       tcp_highest_sack_seq include/net/tcp.h:1864 [inline]
       tcp_highest_sack_seq include/net/tcp.h:1856 [inline]
       tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891
       tcp_try_undo_partial net/ipv4/tcp_input.c:2730 [inline]
       tcp_fastretrans_alert+0xf74/0x23f0 net/ipv4/tcp_input.c:2847
       tcp_ack+0x2577/0x5bf0 net/ipv4/tcp_input.c:3710
       tcp_rcv_established+0x6dd/0x1e90 net/ipv4/tcp_input.c:5706
       tcp_v4_do_rcv+0x619/0x8d0 net/ipv4/tcp_ipv4.c:1619
       tcp_v4_rcv+0x307f/0x3b40 net/ipv4/tcp_ipv4.c:2001
       ip_protocol_deliver_rcu+0x5a/0x880 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x23b/0x380 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x1db/0x2f0 net/ipv4/ip_input.c:428
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:538
       __netif_receive_skb_one_core+0x113/0x1a0 net/core/dev.c:5148
       __netif_receive_skb+0x2c/0x1d0 net/core/dev.c:5262
       process_backlog+0x206/0x750 net/core/dev.c:6093
       napi_poll net/core/dev.c:6530 [inline]
       net_rx_action+0x508/0x1120 net/core/dev.c:6598
       __do_softirq+0x262/0x98c kernel/softirq.c:292
       run_ksoftirqd kernel/softirq.c:603 [inline]
       run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595
       smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165
       kthread+0x361/0x430 kernel/kthread.c:255
       ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      
      Allocated by task 10091:
       save_stack+0x23/0x90 mm/kasan/common.c:72
       set_track mm/kasan/common.c:80 [inline]
       __kasan_kmalloc mm/kasan/common.c:513 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521
       slab_post_alloc_hook mm/slab.h:584 [inline]
       slab_alloc_node mm/slab.c:3263 [inline]
       kmem_cache_alloc_node+0x138/0x740 mm/slab.c:3575
       __alloc_skb+0xd5/0x5e0 net/core/skbuff.c:198
       alloc_skb_fclone include/linux/skbuff.h:1099 [inline]
       sk_stream_alloc_skb net/ipv4/tcp.c:875 [inline]
       sk_stream_alloc_skb+0x113/0xc90 net/ipv4/tcp.c:852
       tcp_sendmsg_locked+0xcf9/0x3470 net/ipv4/tcp.c:1282
       tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1432
       inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:672
       __sys_sendto+0x262/0x380 net/socket.c:1998
       __do_sys_sendto net/socket.c:2010 [inline]
       __se_sys_sendto net/socket.c:2006 [inline]
       __x64_sys_sendto+0xe1/0x1a0 net/socket.c:2006
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 10095:
       save_stack+0x23/0x90 mm/kasan/common.c:72
       set_track mm/kasan/common.c:80 [inline]
       kasan_set_free_info mm/kasan/common.c:335 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
       __cache_free mm/slab.c:3426 [inline]
       kmem_cache_free+0x86/0x320 mm/slab.c:3694
       kfree_skbmem+0x178/0x1c0 net/core/skbuff.c:645
       __kfree_skb+0x1e/0x30 net/core/skbuff.c:681
       sk_eat_skb include/net/sock.h:2453 [inline]
       tcp_recvmsg+0x1252/0x2930 net/ipv4/tcp.c:2166
       inet_recvmsg+0x136/0x610 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:886 [inline]
       sock_recvmsg net/socket.c:904 [inline]
       sock_recvmsg+0xce/0x110 net/socket.c:900
       __sys_recvfrom+0x1ff/0x350 net/socket.c:2055
       __do_sys_recvfrom net/socket.c:2073 [inline]
       __se_sys_recvfrom net/socket.c:2069 [inline]
       __x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:2069
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8880a488d040
       which belongs to the cache skbuff_fclone_cache of size 456
      The buggy address is located 40 bytes inside of
       456-byte region [ffff8880a488d040, ffff8880a488d208)
      The buggy address belongs to the page:
      page:ffffea0002922340 refcount:1 mapcount:0 mapping:ffff88821b057000 index:0x0
      raw: 00fffe0000000200 ffffea00022a5788 ffffea0002624a48 ffff88821b057000
      raw: 0000000000000000 ffff8880a488d040 0000000100000006 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880a488cf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8880a488cf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff8880a488d000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                                ^
       ffff8880a488d080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880a488d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 85369750 ("tcp: Fix highest_sack and highest_sack_seq")
      Fixes: 50895b9d ("tcp: highest_sack fix")
      Fixes: 737ff314 ("tcp: use sequence distance to detect reordering")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cambda Zhu <cambda@linux.alibaba.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 21 Jan, 2020 1 commit
  4. 10 Jan, 2020 1 commit
  5. 16 Dec, 2019 1 commit
  6. 14 Dec, 2019 1 commit
  7. 11 Dec, 2019 1 commit
  8. 10 Dec, 2019 1 commit
  9. 15 Nov, 2019 1 commit
    • y2038: socket: use __kernel_old_timespec instead of timespec · df1b4ba9
      Arnd Bergmann committed
      The 'timespec' type definition and helpers like ktime_to_timespec()
      or timespec64_to_timespec() should no longer be used in the kernel so
      we can remove them and avoid introducing y2038 issues in new code.
      
      Change the socket code that needs to pass a timespec to user space for
      backward compatibility to use __kernel_old_timespec instead.  This type
      has the same layout but with a clearer defined name.
      
      Slightly reformat tcp_recv_timestamp() for consistency after the removal
      of timespec64_to_timespec().
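As a rough illustration of the idea, the sketch below converts a 64-bit timestamp into a legacy layout whose seconds field is a plain `long`. The struct and function names are stand-ins invented for this example; they only mimic the shape of the kernel types and are not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit-safe kernel timestamp. */
struct ts64_sketch {
	int64_t tv_sec;
	long tv_nsec;
};

/* Stand-in for the legacy user-space layout: both fields are 'long'. */
struct old_ts_sketch {
	long tv_sec;
	long tv_nsec;
};

/*
 * Convert for a legacy interface. Where 'long' is 32 bits wide the
 * seconds value is truncated, which is why the explicit "old" name
 * helps flag such call sites.
 */
struct old_ts_sketch to_old_ts_sketch(struct ts64_sketch in)
{
	struct old_ts_sketch out;

	assert(in.tv_nsec >= 0 && in.tv_nsec < 1000000000L);
	out.tv_sec = (long)in.tv_sec;
	out.tv_nsec = in.tv_nsec;
	return out;
}
```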
      Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  10. 07 Nov, 2019 4 commits
    • tcp: fix data-race in tcp_recvmsg() · a5a7daa5
      Eric Dumazet committed
      Reading tp->recvmsg_inq after socket lock is released
      raises a KCSAN warning [1]
      
      Replace has_tss & has_cmsg by cmsg_flags and make
      sure to not read tp->recvmsg_inq a second time.
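The pattern described here, deciding everything while the lock is held and consulting only a local snapshot afterwards, might look roughly like the following model. It is a sketch under stated assumptions: the bit values, the `sock_sketch` struct, and the boolean `locked` field that stands in for socket-lock ownership are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for the local snapshot. */
#define SKETCH_CMSG_INQ 1
#define SKETCH_CMSG_TS  2

/* Toy stand-in for the socket; 'locked' models owning the socket lock. */
struct sock_sketch {
	bool locked;
	bool recvmsg_inq;	/* may change once the lock is dropped */
};

/* Returns the set of control messages to emit, decided while locked. */
int recv_sketch(struct sock_sketch *sk, bool saw_timestamp)
{
	int cmsg_flags;

	sk->locked = true;	/* lock_sock() in the kernel */
	cmsg_flags = sk->recvmsg_inq ? SKETCH_CMSG_INQ : 0;
	if (saw_timestamp)
		cmsg_flags |= SKETCH_CMSG_TS;
	sk->locked = false;	/* release_sock() in the kernel */

	/* After unlock: use only the local copy, never sk->recvmsg_inq. */
	assert(!sk->locked);
	return cmsg_flags;
}
```

Folding both conditions into one local integer means there is a single value to test after the lock is released, so no shared field needs to be read a second time.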
      
      [1]
      BUG: KCSAN: data-race in tcp_chrono_stop / tcp_recvmsg
      
      write to 0xffff888126adef24 of 2 bytes by interrupt on cpu 0:
       tcp_chrono_set net/ipv4/tcp_output.c:2309 [inline]
       tcp_chrono_stop+0x14c/0x280 net/ipv4/tcp_output.c:2338
       tcp_clean_rtx_queue net/ipv4/tcp_input.c:3165 [inline]
       tcp_ack+0x274f/0x3170 net/ipv4/tcp_input.c:3688
       tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
       tcp_v4_rcv+0x19dc/0x1bb0 net/ipv4/tcp_ipv4.c:1942
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5214
       napi_skb_finish net/core/dev.c:5677 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5710
      
      read to 0xffff888126adef25 of 1 bytes by task 7275 on cpu 1:
       tcp_recvmsg+0x77b/0x1a30 net/ipv4/tcp.c:2187
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1889 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 7275 Comm: sshd Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: b75eba76 ("tcp: send in-queue bytes in cmsg upon read")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: silence data-races on sk_backlog.tail · 9ed498c6
      Eric Dumazet committed
      sk->sk_backlog.tail might be read without holding the socket spinlock,
      we need to add proper READ_ONCE()/WRITE_ONCE() to silence the warnings.
      
      KCSAN reported :
      
      BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
      
      write to 0xffff8881265109f8 of 8 bytes by interrupt on cpu 1:
       __sk_add_backlog include/net/sock.h:907 [inline]
       sk_add_backlog include/net/sock.h:938 [inline]
       tcp_add_backlog+0x476/0xce0 net/ipv4/tcp_ipv4.c:1759
       tcp_v4_rcv+0x1a70/0x1bd0 net/ipv4/tcp_ipv4.c:1947
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:4929
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5043
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5133
       napi_skb_finish net/core/dev.c:5596 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5629
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6311 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6379
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       invoke_softirq kernel/softirq.c:373 [inline]
       irq_exit+0xbb/0xe0 kernel/softirq.c:413
       exiting_irq arch/x86/include/asm/apic.h:536 [inline]
       do_IRQ+0xa6/0x180 arch/x86/kernel/irq.c:263
       ret_from_intr+0x0/0x19
       native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
       arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
       default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
       cpuidle_idle_call kernel/sched/idle.c:154 [inline]
       do_idle+0x1af/0x280 kernel/sched/idle.c:263
       cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
       start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264
       secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
      
      read to 0xffff8881265109f8 of 8 bytes by task 8057 on cpu 0:
       tcp_recvmsg+0x46e/0x1b40 net/ipv4/tcp.c:2050
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1889 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 8057 Comm: syz-fuzzer Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: annotate lockless accesses to sk->sk_max_ack_backlog · 099ecf59
      Eric Dumazet committed
      sk->sk_max_ack_backlog can be read without any lock being held
      at least in TCP/DCCP cases.
      
      We need to use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing
      and/or potential KCSAN warnings.
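For readers unfamiliar with these macros, here is a user-space approximation built on volatile casts. It assumes GCC or Clang (`__typeof__` is an extension), the `_SKETCH` macros are simplified stand-ins rather than the kernel definitions, and the `listener_sketch` struct is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE(): one volatile access. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct listener_sketch {
	unsigned int ack_backlog;	/* connections waiting for accept() */
	unsigned int max_ack_backlog;	/* limit set via listen() */
};

/* Writer side: a single store the compiler may not split or repeat. */
void listener_set_backlog(struct listener_sketch *l, unsigned int backlog)
{
	assert(l != NULL);
	WRITE_ONCE_SKETCH(l->max_ack_backlog, backlog);
}

/* Lockless reader: each field is loaded exactly once. */
int listener_acceptq_is_full(struct listener_sketch *l)
{
	assert(l != NULL);
	return READ_ONCE_SKETCH(l->ack_backlog) >
	       READ_ONCE_SKETCH(l->max_ack_backlog);
}
```

The volatile access prevents the compiler from tearing, fusing, or re-issuing the load or store; it does not add ordering between different variables.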
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: annotate lockless accesses to sk->sk_ack_backlog · 288efe86
      Eric Dumazet committed
      sk->sk_ack_backlog can be read without any lock being held.
      We need to use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing
      and/or potential KCSAN warnings.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 29 Oct, 2019 2 commits
  12. 26 Oct, 2019 1 commit
    • tcp: add TCP_INFO status for failed client TFO · 48027478
      Jason Baron committed
      The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
      or not data-in-SYN was ack'd on both the client and server side. We'd like
      to gather more information on the client-side in the failure case in order
      to indicate the reason for the failure. This can be useful for not only
      debugging TFO, but also for creating TFO socket policies. For example, if
      a middle box removes the TFO option or drops a data-in-SYN, we
      can detect this case and turn off TFO for these connections, saving the
      extra retransmits.
      
      The newly added tcpi_fastopen_client_fail status is 2 bits and has the
      following 4 states:
      
      1) TFO_STATUS_UNSPEC
      
      Catch-all state which includes when TFO is disabled via black hole
      detection, which is indicated via LINUX_MIB_TCPFASTOPENBLACKHOLE.
      
      2) TFO_COOKIE_UNAVAILABLE
      
      If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie
      is available in the cache.
      
      3) TFO_DATA_NOT_ACKED
      
      Data was sent with SYN, we received a SYN/ACK but it did not cover the data
      portion. Cookie is not accepted by server because the cookie may be invalid
      or the server may be overloaded.
      
      4) TFO_SYN_RETRANSMITTED
      
      Data was sent with SYN, we received a SYN/ACK which did not cover the data
      after at least 1 additional SYN was sent (without data). It may be the case
      that a middle-box is dropping data-in-SYN packets. Thus, it would be more
      efficient to not use TFO on this connection to avoid extra retransmits
      during connection establishment.
      
      These new fields do not cover all the cases where TFO may fail, but other
      failures, such as SYN/ACK + data being dropped, will result in the
      connection not becoming established. And a connection blackhole after
      session establishment shows up as a stalled connection.
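A client could turn the four states above into a simple policy along these lines. This is a sketch: the enum below is a local copy with a `SK_` prefix (its numeric values are assumed to follow the order listed above), and `tfo_should_disable()` is a hypothetical policy, not something the commit defines. In a real program the value would come from the `tcpi_fastopen_client_fail` field of `struct tcp_info`.

```c
#include <assert.h>
#include <string.h>

/* Local copy of the four states; values assumed to follow the listed order. */
enum tfo_client_fail_sketch {
	SK_TFO_STATUS_UNSPEC = 0,
	SK_TFO_COOKIE_UNAVAILABLE = 1,
	SK_TFO_DATA_NOT_ACKED = 2,
	SK_TFO_SYN_RETRANSMITTED = 3,
};

/* Map the 2-bit status to a short human-readable reason. */
const char *tfo_fail_reason(unsigned int status)
{
	switch (status & 0x3) {	/* the field is only 2 bits wide */
	case SK_TFO_COOKIE_UNAVAILABLE:
		return "no cookie in cache";
	case SK_TFO_DATA_NOT_ACKED:
		return "SYN data not acked";
	case SK_TFO_SYN_RETRANSMITTED:
		return "SYN retransmitted";
	default:
		return "unspecified";
	}
}

/* Hypothetical policy: stop using TFO when a middlebox is suspected. */
int tfo_should_disable(unsigned int status)
{
	return (status & 0x3) == SK_TFO_SYN_RETRANSMITTED;
}
```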
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Christoph Paasch <cpaasch@apple.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 16 Oct, 2019 1 commit
  14. 14 Oct, 2019 10 commits
    • tcp: improve recv_skip_hint for tcp_zerocopy_receive · c208bdb9
      Soheil Hassas Yeganeh committed
      tcp_zerocopy_receive() rounds zc->length down to a multiple of
      PAGE_SIZE. This results in two issues:
      - tcp_zerocopy_receive sets recv_skip_hint to the length of the
        receive queue if the zc->length input is smaller than the
        PAGE_SIZE, even though the data in receive queue could be
        zerocopied.
      - tcp_zerocopy_receive would set recv_skip_hint of 0, in cases
        where we have a little bit of data after the perfectly-sized
        packets.
      
      To fix these issues, do not store the rounded down value in
      zc->length. Round down the length passed to zap_page_range(),
      and return min(inq, zc->length) when the zap_range is 0.
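The arithmetic of the fix can be modelled in isolation. The sketch below covers only the initial length handling as described in this message, not the page-mapping loop that follows in the kernel; the struct, the function name, and the fixed 4096-byte page size are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the example */

struct zc_sketch {
	uint32_t length;		/* in: bytes requested; out: capped at inq */
	uint32_t recv_skip_hint;	/* out: bytes to fetch with a regular read */
};

/*
 * Keep zc->length unrounded and round down only the length that would
 * be handed to zap_page_range(). Returns that page-aligned length.
 */
uint32_t zc_prepare_sketch(struct zc_sketch *zc, uint32_t inq)
{
	uint32_t zap_len;

	assert(zc != NULL);
	if (zc->length > inq)
		zc->length = inq;	/* min(inq, zc->length) */

	zap_len = zc->length & ~(SKETCH_PAGE_SIZE - 1);
	zc->recv_skip_hint = zap_len ? 0 : zc->length;
	return zap_len;
}
```

With a 100-byte request and plenty of queued data, nothing is page-aligned, so the hint is 100 rather than the full receive-queue length that the first bullet above complains about.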
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate sk->sk_wmem_queued lockless reads · ab4e846a
      Eric Dumazet committed
      For the sake of tcp_poll(), there are a few places where we fetch
      sk->sk_wmem_queued while this field can change from IRQ or another CPU.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      sk_wmem_queued_add() helper is added so that we can in
      the future convert to ADD_ONCE() or equivalent if/when
      available.
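The value of routing every update through one helper can be sketched as follows. The macros are simplified user-space stand-ins (GCC/Clang `__typeof__` assumed), and the struct and function names are hypothetical; only the idea of a single annotated write path comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros: one volatile access each. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct wq_sock_sketch {
	int wmem_queued;	/* bytes queued for transmission */
};

/* Single write path, so the store annotation lives in one place. */
void wmem_queued_add_sketch(struct wq_sock_sketch *sk, int val)
{
	assert(sk != NULL);
	WRITE_ONCE_SKETCH(sk->wmem_queued, sk->wmem_queued + val);
}

/* Lockless reader, e.g. a poll-style "how much room is left" check. */
int stream_wspace_sketch(struct wq_sock_sketch *sk, int sndbuf)
{
	assert(sk != NULL);
	return sndbuf - READ_ONCE_SKETCH(sk->wmem_queued);
}
```

If a better primitive for annotated read-modify-write becomes available later, only the helper body has to change, not every call site.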
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate sk->sk_sndbuf lockless reads · e292f05e
      Eric Dumazet committed
      For the sake of tcp_poll(), there are a few places where we fetch
      sk->sk_sndbuf while this field can change from IRQ or another CPU.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      Note that other transports probably need similar fixes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate sk->sk_rcvbuf lockless reads · ebb3b78d
      Eric Dumazet committed
      For the sake of tcp_poll(), there are a few places where we fetch
      sk->sk_rcvbuf while this field can change from IRQ or another CPU.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      Note that other transports probably need similar fixes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->urg_seq lockless reads · d9b55bf7
      Eric Dumazet committed
      There are two places where we fetch tp->urg_seq while
      this field can change from IRQ or another CPU.
      
      We need to add READ_ONCE() annotations, and also make
      sure the write side uses the corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->snd_nxt lockless reads · e0d694d6
      Eric Dumazet committed
      There are a few places where we fetch tp->snd_nxt while
      this field can change from IRQ or another CPU.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->write_seq lockless reads · 0f317464
      Eric Dumazet committed
      There are a few places where we fetch tp->write_seq while
      this field can change from IRQ or another CPU.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->copied_seq lockless reads · 7db48e98
      Eric Dumazet committed
      There are a few places where we fetch tp->copied_seq while
      this field can change from IRQ or another CPU.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      
      Note that tcp_inq_hint() was already using READ_ONCE(tp->copied_seq)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->rcv_nxt lockless reads · dba7d9b8
      Eric Dumazet committed
      There are a few places where we fetch tp->rcv_nxt while
      this field can change from IRQ or another CPU.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      
      Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt)
      
      syzbot reported :
      
      BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv
      
      write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
       tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
       tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
       tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
       tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
      
      read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
       tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
       tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
       ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
       ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
       ep_send_events fs/eventpoll.c:1793 [inline]
       ep_poll+0xe3/0x900 fs/eventpoll.c:1930
       do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
       __do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
       __se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
       __x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
       do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add rcu protection around tp->fastopen_rsk · d983ea6f
      Eric Dumazet committed
      Both tcp_v4_err() and tcp_v6_err() do the following operations
      while they do not own the socket lock :
      
      	fastopen = tp->fastopen_rsk;
       	snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
      
      The problem is that without an appropriate barrier, the compiler
      might reload tp->fastopen_rsk and trigger a NULL deref.
      
      Request sockets are protected by RCU, so we can simply add
      the missing annotations and barriers to solve the issue.
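The reload hazard can be shown with a small user-space model. This sketch uses a single volatile load to stand in for the kernel's RCU dereference; it does not model RCU lifetime protection, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct req_sketch {
	unsigned int snt_isn;
};

struct tp_sketch {
	struct req_sketch *fastopen_rsk;	/* may be cleared concurrently */
	unsigned int snd_una;
};

/*
 * Load the shared pointer exactly once into a local, then test and
 * dereference only the local. The compiler cannot re-read the field
 * between the NULL check and the dereference.
 */
unsigned int err_snd_una_sketch(struct tp_sketch *tp)
{
	struct req_sketch *fastopen;

	assert(tp != NULL);
	fastopen = *(struct req_sketch * volatile *)&tp->fastopen_rsk;
	return fastopen ? fastopen->snt_isn : tp->snd_una;
}
```

In the kernel the object behind the pointer must also stay valid while it is used, which is what the RCU protection mentioned above provides and what this model leaves out.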
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 10 Oct, 2019 2 commits
  16. 04 Oct, 2019 1 commit
    • tcp: fix slab-out-of-bounds in tcp_zerocopy_receive() · 3afb0961
      Eric Dumazet committed
      Apparently a refactoring patch introduced a bug that was caught
      by syzbot [1]
      
      Original code was correct, do not try to be smarter than the
      compiler :/
      
      [1]
      BUG: KASAN: slab-out-of-bounds in tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline]
      BUG: KASAN: slab-out-of-bounds in do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654
      Read of size 4 at addr ffff8880943cf188 by task syz-executor.2/17508
      
      CPU: 0 PID: 17508 Comm: syz-executor.2 Not tainted 5.3.0-rc7+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
       __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
       kasan_report+0x12/0x17 mm/kasan/common.c:618
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
       tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline]
       do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654
       tcp_getsockopt+0xbf/0xe0 net/ipv4/tcp.c:3680
       sock_common_getsockopt+0x94/0xd0 net/core/sock.c:3098
       __sys_getsockopt+0x16d/0x310 net/socket.c:2129
       __do_sys_getsockopt net/socket.c:2144 [inline]
       __se_sys_getsockopt net/socket.c:2141 [inline]
       __x64_sys_getsockopt+0xbe/0x150 net/socket.c:2141
       do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
      
      Fixes: d8e18a51 ("net: Use skb accessors in network core")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 16 Sep, 2019 2 commits
  18. 28 Aug, 2019 1 commit
  19. 10 Aug, 2019 1 commit
  20. 09 Aug, 2019 1 commit
    • net/tls: prevent skb_orphan() from leaking TLS plain text with offload · 41477662
      Jakub Kicinski committed
      sk_validate_xmit_skb() and drivers depend on the sk member of
      struct sk_buff to identify segments requiring encryption.
      Any operation which removes or does not preserve the original TLS
      socket such as skb_orphan() or skb_clone() will cause clear text
      leaks.
      
      Make the TCP socket underlying an offloaded TLS connection
      mark all skbs as decrypted, if TLS TX is in offload mode.
      Then in sk_validate_xmit_skb() catch skbs which have no socket
      (or a socket with no validation) and decrypted flag set.
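The check described in this paragraph reduces to a small predicate. The model below is a sketch with hypothetical types and names; it captures only the stated rule (a buffer marked decrypted that no longer has a validating socket must not be transmitted) and none of the surrounding kernel logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy socket: records whether a transmit-validation hook is installed. */
struct tls_sock_sketch {
	bool has_validate_xmit;
};

/* Toy packet buffer with the two properties the check looks at. */
struct skb_sketch {
	struct tls_sock_sketch *sk;	/* NULL once the buffer is orphaned */
	bool decrypted;			/* payload is TLS plain text */
};

/* True when sending the buffer as-is would leak clear text. */
bool xmit_must_drop_sketch(const struct skb_sketch *skb)
{
	bool validated;

	assert(skb != NULL);
	validated = skb->sk != NULL && skb->sk->has_validate_xmit;
	return skb->decrypted && !validated;
}
```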
      
      Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
      sk->sk_validate_xmit_skb are slightly interchangeable right now,
      they all imply TLS offload. The new checks are guarded by
      CONFIG_TLS_DEVICE because that's the option guarding the
      sk_buff->decrypted member.
      
      A second, smaller issue with orphaning is that it breaks
      the guarantee that packets will be delivered to device
      queues in-order. All TLS offload drivers depend on that
      scheduling property. This means skb_orphan_partial()'s
      trick of preserving partial socket references will cause
      issues in the drivers. We need a full orphan, and as a
      result netem delay/throttling will cause all TLS offload
      skbs to be dropped.
      
      Reusing the sk_buff->decrypted flag also protects from
      leaking clear text when incoming, decrypted skb is redirected
      (e.g. by TC).
      
      See commit 0608c69c ("bpf: sk_msg, sock{map|hash} redirect
      through ULP") for justification why the internal flag is safe.
      The only location which could leak the flag in is tcp_bpf_sendmsg(),
      which is taken care of by clearing the previously unused bit.
      
      v2:
       - remove superfluous decrypted mark copy (Willem);
       - remove the stale doc entry (Boris);
       - rely entirely on EOR marking to prevent coalescing (Boris);
       - use an internal sendpages flag instead of marking the socket
         (Boris).
      v3 (Willem):
       - reorganize the can_skb_orphan_partial() condition;
       - fix the flag leak-in through tcp_bpf_sendmsg.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 31 Jul, 2019 1 commit
  22. 23 Jul, 2019 1 commit
  23. 19 Jul, 2019 1 commit
  24. 09 Jul, 2019 1 commit