1. 14 Dec 2019, 1 commit
  2. 15 Nov 2019, 1 commit
    • y2038: socket: use __kernel_old_timespec instead of timespec · df1b4ba9
      Committed by Arnd Bergmann
      The 'timespec' type definition and helpers like ktime_to_timespec()
      or timespec64_to_timespec() should no longer be used in the kernel,
      so we can remove them and avoid introducing y2038 issues in new code.
      
      Change the socket code that needs to pass a timespec to user space for
      backward compatibility to use __kernel_old_timespec instead.  This type
      has the same layout but a more clearly defined name.
      
      Slightly reformat tcp_recv_timestamp() for consistency after the removal
      of timespec64_to_timespec().
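      
      For reference, a minimal sketch of the replacement type as defined in
      include/uapi/linux/time_types.h (the same two-field layout as the old
      timespec, under a clearer name):
      
        struct __kernel_old_timespec {
                __kernel_old_time_t     tv_sec;         /* seconds */
                long                    tv_nsec;        /* nanoseconds */
        };
      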
      Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  3. 07 Nov 2019, 4 commits
    • tcp: fix data-race in tcp_recvmsg() · a5a7daa5
      Committed by Eric Dumazet
      Reading tp->recvmsg_inq after the socket lock is released
      raises a KCSAN warning [1].
      
      Replace has_tss and has_cmsg with a single cmsg_flags field, and
      make sure not to read tp->recvmsg_inq a second time.
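      
      A minimal sketch of the idea (bit values are illustrative, not
      necessarily what the patch picks): the flag is latched exactly once,
      while the socket lock is still held, into a local bitmask that
      replaces the two booleans:
      
        /* under lock_sock(sk): read tp->recvmsg_inq exactly once */
        int cmsg_flags = tp->recvmsg_inq ? 1 : 0;   /* bit 0: report inq */
      
        /* ... receive path runs, may set bit 1 for timestamps ... */
        release_sock(sk);
        if (cmsg_flags & 1)     /* no second, lockless read of the field */
                put_cmsg(msg, SOL_TCP, TCP_CM_INQ, sizeof(inq), &inq);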
      
      [1]
      BUG: KCSAN: data-race in tcp_chrono_stop / tcp_recvmsg
      
      write to 0xffff888126adef24 of 2 bytes by interrupt on cpu 0:
       tcp_chrono_set net/ipv4/tcp_output.c:2309 [inline]
       tcp_chrono_stop+0x14c/0x280 net/ipv4/tcp_output.c:2338
       tcp_clean_rtx_queue net/ipv4/tcp_input.c:3165 [inline]
       tcp_ack+0x274f/0x3170 net/ipv4/tcp_input.c:3688
       tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
       tcp_v4_rcv+0x19dc/0x1bb0 net/ipv4/tcp_ipv4.c:1942
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5214
       napi_skb_finish net/core/dev.c:5677 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5710
      
      read to 0xffff888126adef25 of 1 bytes by task 7275 on cpu 1:
       tcp_recvmsg+0x77b/0x1a30 net/ipv4/tcp.c:2187
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1889 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 7275 Comm: sshd Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: b75eba76 ("tcp: send in-queue bytes in cmsg upon read")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: silence data-races on sk_backlog.tail · 9ed498c6
      Committed by Eric Dumazet
      sk->sk_backlog.tail might be read without holding the socket spinlock,
      so we need to add proper READ_ONCE()/WRITE_ONCE() annotations to
      silence the warnings.
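      
      The shape of the fix, roughly (a sketch, not the full patch):
      
        /* writer side, appending under the socket spinlock */
        WRITE_ONCE(sk->sk_backlog.tail, skb);
      
        /* lockless reader, e.g. tcp_recvmsg() checking for pending work */
        if (copied >= target && !READ_ONCE(sk->sk_backlog.tail))
                break;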
      
      KCSAN reported:
      
      BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
      
      write to 0xffff8881265109f8 of 8 bytes by interrupt on cpu 1:
       __sk_add_backlog include/net/sock.h:907 [inline]
       sk_add_backlog include/net/sock.h:938 [inline]
       tcp_add_backlog+0x476/0xce0 net/ipv4/tcp_ipv4.c:1759
       tcp_v4_rcv+0x1a70/0x1bd0 net/ipv4/tcp_ipv4.c:1947
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:4929
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5043
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5133
       napi_skb_finish net/core/dev.c:5596 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5629
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6311 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6379
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       invoke_softirq kernel/softirq.c:373 [inline]
       irq_exit+0xbb/0xe0 kernel/softirq.c:413
       exiting_irq arch/x86/include/asm/apic.h:536 [inline]
       do_IRQ+0xa6/0x180 arch/x86/kernel/irq.c:263
       ret_from_intr+0x0/0x19
       native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
       arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
       default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
       cpuidle_idle_call kernel/sched/idle.c:154 [inline]
       do_idle+0x1af/0x280 kernel/sched/idle.c:263
       cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
       start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264
       secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
      
      read to 0xffff8881265109f8 of 8 bytes by task 8057 on cpu 0:
       tcp_recvmsg+0x46e/0x1b40 net/ipv4/tcp.c:2050
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1889 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 8057 Comm: syz-fuzzer Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: annotate lockless accesses to sk->sk_max_ack_backlog · 099ecf59
      Committed by Eric Dumazet
      sk->sk_max_ack_backlog can be read without any lock being held,
      at least in the TCP/DCCP cases.
      
      We need to use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing
      and/or potential KCSAN warnings.
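      
      The resulting pattern, simplified from the include/net/sock.h
      helpers (a sketch):
      
        /* writer, e.g. when listen() sets the limit */
        WRITE_ONCE(sk->sk_max_ack_backlog, backlog);
      
        /* lockless reader, deciding whether to drop a new request */
        static inline bool sk_acceptq_is_full(const struct sock *sk)
        {
                return READ_ONCE(sk->sk_ack_backlog) >
                       READ_ONCE(sk->sk_max_ack_backlog);
        }
      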
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: annotate lockless accesses to sk->sk_ack_backlog · 288efe86
      Committed by Eric Dumazet
      sk->sk_ack_backlog can be read without any lock being held.
      We need to use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing
      and/or potential KCSAN warnings.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 29 Oct 2019, 2 commits
  5. 26 Oct 2019, 1 commit
    • tcp: add TCP_INFO status for failed client TFO · 48027478
      Committed by Jason Baron
      The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
      or not data-in-SYN was ack'd on both the client and server side. We'd like
      to gather more information on the client side in the failure case in order
      to indicate the reason for the failure. This can be useful not only for
      debugging TFO, but also for creating TFO socket policies. For example, if
      a middlebox removes the TFO option or drops a data-in-SYN, we can detect
      this case and turn off TFO for these connections, saving the extra
      retransmits.
      
      The newly added tcpi_fastopen_client_fail status is 2 bits wide and has
      the following four states:
      
      1) TFO_STATUS_UNSPEC
      
      Catch-all state, which includes the case where TFO is disabled via
      blackhole detection (indicated via LINUX_MIB_TCPFASTOPENBLACKHOLE).
      
      2) TFO_COOKIE_UNAVAILABLE
      
      If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie
      is available in the cache.
      
      3) TFO_DATA_NOT_ACKED
      
      Data was sent with the SYN; we received a SYN/ACK, but it did not cover
      the data portion. The server did not accept the cookie, perhaps because
      the cookie was invalid or the server was overloaded.
      
      4) TFO_SYN_RETRANSMITTED
      
      Data was sent with the SYN; we received a SYN/ACK which did not cover the
      data, after at least one additional SYN was sent (without data). It may be
      the case that a middlebox is dropping data-in-SYN packets. Thus, it would
      be more efficient not to use TFO on this connection, to avoid extra
      retransmits during connection establishment.
      
      This new field does not cover all the cases where TFO may fail; other
      failures, such as SYN/ACK + data being dropped, will result in the
      connection not becoming established at all, and a connection blackhole
      after session establishment shows up as a stalled connection.
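      
      On kernels carrying this patch, the new state can be read from user
      space through TCP_INFO; a minimal sketch (print_tfo_fail is an
      illustrative helper, not part of the patch; needs uapi headers that
      already define the field):
      
        #include <stdio.h>
        #include <sys/socket.h>
        #include <netinet/in.h>         /* IPPROTO_TCP */
        #include <linux/tcp.h>          /* struct tcp_info, TCP_INFO */
      
        /* print the TFO failure state of a connected TCP socket 'fd' */
        static void print_tfo_fail(int fd)
        {
                struct tcp_info info;
                socklen_t len = sizeof(info);
      
                if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0)
                        printf("tcpi_fastopen_client_fail = %u\n",
                               (unsigned int)info.tcpi_fastopen_client_fail);
        }
      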
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Christoph Paasch <cpaasch@apple.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 16 Oct 2019, 1 commit
  7. 14 Oct 2019, 10 commits
    • tcp: improve recv_skip_hint for tcp_zerocopy_receive · c208bdb9
      Committed by Soheil Hassas Yeganeh
      tcp_zerocopy_receive() rounds zc->length down to a multiple of
      PAGE_SIZE. This results in two issues:
      - tcp_zerocopy_receive sets recv_skip_hint to the length of the
        receive queue if the zc->length input is smaller than PAGE_SIZE,
        even though the data in the receive queue could be zero-copied.
      - tcp_zerocopy_receive would set recv_skip_hint to 0 in cases
        where we have a little bit of data left after the perfectly-sized
        packets.
      
      To fix these issues, do not store the rounded-down value in
      zc->length. Round down only the length passed to zap_page_range(),
      and return min(inq, zc->length) when the zap range is 0.
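      
      A sketch of the intended logic (variable names approximate):
      
        /* round down only for the mmap zap; keep the caller's length */
        unsigned long zap_len = zc->length & ~(PAGE_SIZE - 1);
      
        if (zap_len)
                zap_page_range(vma, address, zap_len);
        else    /* request smaller than one page: report what is readable */
                zc->recv_skip_hint = min_t(u32, inq, zc->length);
      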
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate sk->sk_wmem_queued lockless reads · ab4e846a
      Committed by Eric Dumazet
      For the sake of tcp_poll(), there are a few places where we fetch
      sk->sk_wmem_queued while this field can change from IRQ context or
      another cpu.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      An sk_wmem_queued_add() helper is added so that we can convert to
      ADD_ONCE() or an equivalent in the future, if/when available.
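      
      The helper simply pairs the update with a WRITE_ONCE() store;
      roughly:
      
        static inline void sk_wmem_queued_add(struct sock *sk, int val)
        {
                WRITE_ONCE(sk->sk_wmem_queued, sk->sk_wmem_queued + val);
        }
      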
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate sk->sk_sndbuf lockless reads · e292f05e
      Committed by Eric Dumazet
      For the sake of tcp_poll(), there are a few places where we fetch
      sk->sk_sndbuf while this field can change from IRQ context or
      another cpu.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      Note that other transports probably need similar fixes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate sk->sk_rcvbuf lockless reads · ebb3b78d
      Committed by Eric Dumazet
      For the sake of tcp_poll(), there are a few places where we fetch
      sk->sk_rcvbuf while this field can change from IRQ context or
      another cpu.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      Note that other transports probably need similar fixes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->urg_seq lockless reads · d9b55bf7
      Committed by Eric Dumazet
      There are two places where we fetch tp->urg_seq while
      this field can change from IRQ context or another cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure the write side uses a corresponding WRITE_ONCE() to
      avoid store-tearing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->snd_nxt lockless reads · e0d694d6
      Committed by Eric Dumazet
      There are a few places where we fetch tp->snd_nxt while
      this field can change from IRQ context or another cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->write_seq lockless reads · 0f317464
      Committed by Eric Dumazet
      There are a few places where we fetch tp->write_seq while
      this field can change from IRQ context or another cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->copied_seq lockless reads · 7db48e98
      Committed by Eric Dumazet
      There are a few places where we fetch tp->copied_seq while
      this field can change from IRQ context or another cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      
      Note that tcp_inq_hint() was already using READ_ONCE(tp->copied_seq).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->rcv_nxt lockless reads · dba7d9b8
      Committed by Eric Dumazet
      There are a few places where we fetch tp->rcv_nxt while
      this field can change from IRQ context or another cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      
      Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt).
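      
      Together with the tp->copied_seq patch above, the lockless reader in
      tcp_poll() ends up looking roughly like this (simplified):
      
        /* tcp_stream_is_readable(), reached from tcp_poll() without the
         * socket lock: both sequence counters are fetched exactly once */
        return READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq) >= target;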
      
      syzbot reported:
      
      BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv
      
      write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
       tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
       tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
       tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
       tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
      
      read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
       tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
       tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
       ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
       ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
       ep_send_events fs/eventpoll.c:1793 [inline]
       ep_poll+0xe3/0x900 fs/eventpoll.c:1930
       do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
       __do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
       __se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
       __x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
       do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add rcu protection around tp->fastopen_rsk · d983ea6f
      Committed by Eric Dumazet
      Both tcp_v4_err() and tcp_v6_err() do the following operations
      while they do not own the socket lock:
      
      	fastopen = tp->fastopen_rsk;
       	snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
      
      The problem is that without an appropriate barrier, the compiler
      might reload tp->fastopen_rsk and trigger a NULL dereference.
      
      Request sockets are protected by RCU, so we can simply add
      the missing annotations and barriers to solve the issue.
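      
      With the annotations in place, the lockless readers fetch the
      pointer exactly once through the RCU accessor; a sketch:
      
        /* the pointer is loaded once; the compiler cannot reload it */
        struct request_sock *fastopen = rcu_dereference(tp->fastopen_rsk);
        u32 snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;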
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 10 Oct 2019, 2 commits
  9. 04 Oct 2019, 1 commit
    • tcp: fix slab-out-of-bounds in tcp_zerocopy_receive() · 3afb0961
      Committed by Eric Dumazet
      Apparently a refactoring patch introduced a bug that was caught
      by syzbot [1].
      
      The original code was correct; do not try to be smarter than the
      compiler. :/
      
      [1]
      BUG: KASAN: slab-out-of-bounds in tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline]
      BUG: KASAN: slab-out-of-bounds in do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654
      Read of size 4 at addr ffff8880943cf188 by task syz-executor.2/17508
      
      CPU: 0 PID: 17508 Comm: syz-executor.2 Not tainted 5.3.0-rc7+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
       __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
       kasan_report+0x12/0x17 mm/kasan/common.c:618
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
       tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline]
       do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654
       tcp_getsockopt+0xbf/0xe0 net/ipv4/tcp.c:3680
       sock_common_getsockopt+0x94/0xd0 net/core/sock.c:3098
       __sys_getsockopt+0x16d/0x310 net/socket.c:2129
       __do_sys_getsockopt net/socket.c:2144 [inline]
       __se_sys_getsockopt net/socket.c:2141 [inline]
       __x64_sys_getsockopt+0xbe/0x150 net/socket.c:2141
       do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
      
      Fixes: d8e18a51 ("net: Use skb accessors in network core")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 16 Sep 2019, 2 commits
  11. 28 Aug 2019, 1 commit
  12. 10 Aug 2019, 1 commit
  13. 09 Aug 2019, 1 commit
    • net/tls: prevent skb_orphan() from leaking TLS plain text with offload · 41477662
      Committed by Jakub Kicinski
      sk_validate_xmit_skb() and drivers depend on the sk member of
      struct sk_buff to identify segments requiring encryption.
      Any operation which removes or does not preserve the original TLS
      socket, such as skb_orphan() or skb_clone(), will cause clear-text
      leaks.
      
      Make the TCP socket underlying an offloaded TLS connection
      mark all skbs as decrypted if TLS TX is in offload mode.
      Then, in sk_validate_xmit_skb(), catch skbs which have no socket
      (or a socket with no validation) and the decrypted flag set.
      
      Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
      sk->sk_validate_xmit_skb are slightly interchangeable right now;
      they all imply TLS offload. The new checks are guarded by
      CONFIG_TLS_DEVICE because that is the option guarding the
      sk_buff->decrypted member.
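      
      The new check looks roughly like this (a simplified sketch of the
      sk_validate_xmit_skb() path, per the description above):
      
        #ifdef CONFIG_TLS_DEVICE
                /* a plain-text skb that lost its TLS socket (or whose
                 * socket cannot validate it) must never hit the wire */
                if (unlikely(skb->decrypted &&
                             (!skb->sk || !skb->sk->sk_validate_xmit_skb))) {
                        kfree_skb(skb);
                        return NULL;    /* caller treats this as a drop */
                }
        #endif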
      
      A second, smaller issue with orphaning is that it breaks
      the guarantee that packets will be delivered to device
      queues in-order. All TLS offload drivers depend on that
      scheduling property. This means skb_orphan_partial()'s
      trick of preserving partial socket references will cause
      issues in the drivers. We need a full orphan, and as a
      result netem delay/throttling will cause all TLS offload
      skbs to be dropped.
      
      Reusing the sk_buff->decrypted flag also protects from
      leaking clear text when an incoming, decrypted skb is redirected
      (e.g. by TC).
      
      See commit 0608c69c ("bpf: sk_msg, sock{map|hash} redirect
      through ULP") for justification why the internal flag is safe.
      The only location through which the flag could leak in is
      tcp_bpf_sendmsg(), which is taken care of by clearing the previously
      unused bit.
      
      v2:
       - remove superfluous decrypted mark copy (Willem);
       - remove the stale doc entry (Boris);
       - rely entirely on EOR marking to prevent coalescing (Boris);
       - use an internal sendpages flag instead of marking the socket
         (Boris).
      v3 (Willem):
       - reorganize the can_skb_orphan_partial() condition;
       - fix the flag leak-in through tcp_bpf_sendmsg.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 31 Jul 2019, 1 commit
  15. 23 Jul 2019, 1 commit
  16. 19 Jul 2019, 1 commit
  17. 09 Jul 2019, 1 commit
  18. 23 Jun 2019, 1 commit
    • net: fastopen: robustness and endianness fixes for SipHash · 438ac880
      Committed by Ard Biesheuvel
      Some changes to the TCP fastopen code to make it more robust
      against future changes in the choice of key/cookie size, etc.
      
      - Instead of keeping the SipHash key in an untyped u8[] buffer
        and casting it to the right type upon use, use the correct
        type directly. This ensures that the key will appear at the
        correct alignment if we ever change the way these data
        structures are allocated. (Currently, they are only allocated
        via kmalloc so they always appear at the correct alignment)
      
      - Use DIV_ROUND_UP when sizing the u64[] array to hold the
        cookie, so it is always of sufficient size, even if
        TCP_FASTOPEN_COOKIE_MAX is no longer a multiple of 8.
      
      - Drop the 'len' parameter from the tcp_fastopen_reset_cipher()
        function, which is no longer used.
      
      - Add endian swabbing when setting the keys and calculating the hash,
        to ensure that cookie values are the same for a given key and
        source/destination address pair regardless of the endianness of
        the server.
      
      Note that none of these are functional changes with respect to the
      current state of the code, with the exception of the swabbing, which
      only affects big-endian systems.
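      
      For instance, the DIV_ROUND_UP sizing from the second point would
      look roughly like this (struct name illustrative):
      
        #include <linux/kernel.h>       /* DIV_ROUND_UP */
      
        struct tfo_cookie_words {
                /* always large enough, even if TCP_FASTOPEN_COOKIE_MAX
                 * stops being a multiple of sizeof(u64) */
                u64 val[DIV_ROUND_UP(TCP_FASTOPEN_COOKIE_MAX, sizeof(u64))];
        };
      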
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 17 Jun 2019, 1 commit
  20. 16 Jun 2019, 1 commit
    • tcp: limit payload size of sacked skbs · 3b4929f6
      Committed by Eric Dumazet
      Jonathan Looney reported that TCP can trigger the following crash
      in tcp_shifted_skb():
      
      	BUG_ON(tcp_skb_pcount(skb) < pcount);
      
      This can happen if the remote peer has advertised the smallest
      MSS that Linux TCP accepts: 48.
      
      An skb can hold 17 fragments, and each fragment can hold 32KB
      on x86, or 64KB on PowerPC.
      
      This means that the 16-bit width of TCP_SKB_CB(skb)->tcp_gso_segs
      can overflow.
      
      Note that tcp_sendmsg() builds skbs with less than 64KB
      of payload, so this problem needs SACK to be enabled.
      SACK blocks allow TCP to coalesce multiple skbs in the retransmit
      queue, thus filling the 17 fragments to maximal capacity.
      
      CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
      
      Fixes: 832d11c5 ("tcp: Try to restore large SKBs while SACK processing")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jonathan Looney <jtl@netflix.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Bruce Curtis <brucec@netflix.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 13 Jun 2019, 1 commit
    • tcp: add optional per socket transmit delay · a842fe14
      Committed by Eric Dumazet
      Adding delays to TCP flows is crucial for studying the behavior
      of TCP stacks, including congestion control modules.
      
      Linux offers the netem module, but it has impractical constraints:
      - Need root access to change the qdisc
      - Hard to set up on egress if combined with a non-trivial qdisc like FQ
      - A single delay for all flows
      
      EDT (Earliest Departure Time) adoption in the TCP stack allows us
      to enable a per-socket delay at a very small cost.
      
      Networking tools can now establish thousands of flows, each of them
      with a different delay, simulating real world conditions.
      
      This requires the FQ packet scheduler or an EDT-enabled NIC.
      
      This patch adds the TCP_TX_DELAY socket option, to set a delay in
      usec units.
      
        unsigned int tx_delay = 10000; /* 10 msec */
      
        setsockopt(fd, SOL_TCP, TCP_TX_DELAY, &tx_delay, sizeof(tx_delay));
      
      Note that the FQ packet scheduler limits might need some tweaking:
      
      man tc-fq
      
      PARAMETERS
         limit
             Hard  limit  on  the  real  queue  size. When this limit is
             reached, new packets are dropped. If the value is  lowered,
             packets  are  dropped so that the new limit is met. Default
             is 10000 packets.
      
         flow_limit
             Hard limit on the maximum  number  of  packets  queued  per
             flow.  Default value is 100.
      
      Use of the TCP_TX_DELAY option will increase the number of skbs in
      the FQ qdisc, so packets would be dropped if either of these limits
      is hit.
      
      Use of a jump label makes this support free at runtime for hosts
      that never use the option.
      
      Also note that TSQ (TCP Small Queues) limits are slightly changed
      with this patch: we need to account for the fact that artificially
      delayed skbs won't stop us from providing more skbs to feed the pipe
      (netem uses skb_orphan_partial() for this purpose, but FQ cannot use
      this trick).
      
      Because of that, using big delays might very well trigger
      old bugs in the TSO auto-defer logic and/or sndbuf-limited detection.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 31 May 2019, 3 commits
  23. 16 May 2019, 1 commit