1. 17 11月, 2020 2 次提交
  2. 24 10月, 2020 1 次提交
  3. 03 10月, 2020 1 次提交
    • C
      tcp: use sendpage_ok() to detect misused .sendpage · cf83a17e
      Coly Li 提交于
      commit a10674bf ("tcp: detecting the misuse of .sendpage for Slab
      objects") adds the checks for Slab pages, but the pages don't have
      page_count are still missing from the check.
      
      Network layer's sendpage method is not designed to send page_count 0
      pages neither, therefore both PageSlab() and page_count() should be
      both checked for the sending page. This is exactly what sendpage_ok()
      does.
      
      This patch uses sendpage_ok() in do_tcp_sendpages() to detect misused
      .sendpage, to make the code more robust.
      
      Fixes: a10674bf ("tcp: detecting the misuse of .sendpage for Slab objects")
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf83a17e
  4. 01 10月, 2020 1 次提交
  5. 15 9月, 2020 3 次提交
  6. 11 9月, 2020 2 次提交
  7. 25 8月, 2020 4 次提交
  8. 11 8月, 2020 1 次提交
    • J
      tcp: correct read of TFO keys on big endian systems · f19008e6
      Jason Baron 提交于
      When TFO keys are read back on big endian systems either via the global
      sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values
      don't match what was written.
      
      For example, on s390x:
      
      # echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key
      # cat /proc/sys/net/ipv4/tcp_fastopen_key
      02000000-01000000-04000000-03000000
      
      Instead of:
      
      # cat /proc/sys/net/ipv4/tcp_fastopen_key
      00000001-00000002-00000003-00000004
      
      Fix this by converting to the correct endianness on read. This was
      reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net
      selftest on s390x, which depends on the read value matching what was
      written. I've confirmed that the test now passes on big and little endian
      systems.
      Signed-off-by: NJason Baron <jbaron@akamai.com>
      Fixes: 438ac880 ("net: fastopen: robustness and endianness fixes for SipHash")
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Reported-and-tested-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f19008e6
  9. 01 8月, 2020 1 次提交
  10. 29 7月, 2020 1 次提交
  11. 25 7月, 2020 3 次提交
  12. 20 7月, 2020 1 次提交
  13. 10 7月, 2020 1 次提交
    • C
      tcp: make sure listeners don't initialize congestion-control state · ce69e563
      Christoph Paasch 提交于
      syzkaller found its way into setsockopt with TCP_CONGESTION "cdg".
      tcp_cdg_init() does a kcalloc to store the gradients. As sk_clone_lock
      just copies all the memory, the allocated pointer will be copied as
      well, if the app called setsockopt(..., TCP_CONGESTION) on the listener.
      If now the socket will be destroyed before the congestion-control
      has properly been initialized (through a call to tcp_init_transfer), we
      will end up freeing memory that does not belong to that particular
      socket, opening the door to a double-free:
      
      [   11.413102] ==================================================================
      [   11.414181] BUG: KASAN: double-free or invalid-free in tcp_cleanup_congestion_control+0x58/0xd0
      [   11.415329]
      [   11.415560] CPU: 3 PID: 4884 Comm: syz-executor.5 Not tainted 5.8.0-rc2 #80
      [   11.416544] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [   11.418148] Call Trace:
      [   11.418534]  <IRQ>
      [   11.418834]  dump_stack+0x7d/0xb0
      [   11.419297]  print_address_description.constprop.0+0x1a/0x210
      [   11.422079]  kasan_report_invalid_free+0x51/0x80
      [   11.423433]  __kasan_slab_free+0x15e/0x170
      [   11.424761]  kfree+0x8c/0x230
      [   11.425157]  tcp_cleanup_congestion_control+0x58/0xd0
      [   11.425872]  tcp_v4_destroy_sock+0x57/0x5a0
      [   11.426493]  inet_csk_destroy_sock+0x153/0x2c0
      [   11.427093]  tcp_v4_syn_recv_sock+0xb29/0x1100
      [   11.427731]  tcp_get_cookie_sock+0xc3/0x4a0
      [   11.429457]  cookie_v4_check+0x13d0/0x2500
      [   11.433189]  tcp_v4_do_rcv+0x60e/0x780
      [   11.433727]  tcp_v4_rcv+0x2869/0x2e10
      [   11.437143]  ip_protocol_deliver_rcu+0x23/0x190
      [   11.437810]  ip_local_deliver+0x294/0x350
      [   11.439566]  __netif_receive_skb_one_core+0x15d/0x1a0
      [   11.441995]  process_backlog+0x1b1/0x6b0
      [   11.443148]  net_rx_action+0x37e/0xc40
      [   11.445361]  __do_softirq+0x18c/0x61a
      [   11.445881]  asm_call_on_stack+0x12/0x20
      [   11.446409]  </IRQ>
      [   11.446716]  do_softirq_own_stack+0x34/0x40
      [   11.447259]  do_softirq.part.0+0x26/0x30
      [   11.447827]  __local_bh_enable_ip+0x46/0x50
      [   11.448406]  ip_finish_output2+0x60f/0x1bc0
      [   11.450109]  __ip_queue_xmit+0x71c/0x1b60
      [   11.451861]  __tcp_transmit_skb+0x1727/0x3bb0
      [   11.453789]  tcp_rcv_state_process+0x3070/0x4d3a
      [   11.456810]  tcp_v4_do_rcv+0x2ad/0x780
      [   11.457995]  __release_sock+0x14b/0x2c0
      [   11.458529]  release_sock+0x4a/0x170
      [   11.459005]  __inet_stream_connect+0x467/0xc80
      [   11.461435]  inet_stream_connect+0x4e/0xa0
      [   11.462043]  __sys_connect+0x204/0x270
      [   11.465515]  __x64_sys_connect+0x6a/0xb0
      [   11.466088]  do_syscall_64+0x3e/0x70
      [   11.466617]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   11.467341] RIP: 0033:0x7f56046dc469
      [   11.467844] Code: Bad RIP value.
      [   11.468282] RSP: 002b:00007f5604dccdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      [   11.469326] RAX: ffffffffffffffda RBX: 000000000068bf00 RCX: 00007f56046dc469
      [   11.470379] RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000004
      [   11.471311] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
      [   11.472286] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [   11.473341] R13: 000000000041427c R14: 00007f5604dcd5c0 R15: 0000000000000003
      [   11.474321]
      [   11.474527] Allocated by task 4884:
      [   11.475031]  save_stack+0x1b/0x40
      [   11.475548]  __kasan_kmalloc.constprop.0+0xc2/0xd0
      [   11.476182]  tcp_cdg_init+0xf0/0x150
      [   11.476744]  tcp_init_congestion_control+0x9b/0x3a0
      [   11.477435]  tcp_set_congestion_control+0x270/0x32f
      [   11.478088]  do_tcp_setsockopt.isra.0+0x521/0x1a00
      [   11.478744]  __sys_setsockopt+0xff/0x1e0
      [   11.479259]  __x64_sys_setsockopt+0xb5/0x150
      [   11.479895]  do_syscall_64+0x3e/0x70
      [   11.480395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   11.481097]
      [   11.481321] Freed by task 4872:
      [   11.481783]  save_stack+0x1b/0x40
      [   11.482230]  __kasan_slab_free+0x12c/0x170
      [   11.482839]  kfree+0x8c/0x230
      [   11.483240]  tcp_cleanup_congestion_control+0x58/0xd0
      [   11.483948]  tcp_v4_destroy_sock+0x57/0x5a0
      [   11.484502]  inet_csk_destroy_sock+0x153/0x2c0
      [   11.485144]  tcp_close+0x932/0xfe0
      [   11.485642]  inet_release+0xc1/0x1c0
      [   11.486131]  __sock_release+0xc0/0x270
      [   11.486697]  sock_close+0xc/0x10
      [   11.487145]  __fput+0x277/0x780
      [   11.487632]  task_work_run+0xeb/0x180
      [   11.488118]  __prepare_exit_to_usermode+0x15a/0x160
      [   11.488834]  do_syscall_64+0x4a/0x70
      [   11.489326]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Wei Wang fixed a part of these CDG-malloc issues with commit c1201444
      ("tcp: memset ca_priv data to 0 properly").
      
      This patch here fixes the listener-scenario: We make sure that listeners
      setting the congestion-control through setsockopt won't initialize it
      (thus CDG never allocates on listeners). For those who use AF_UNSPEC to
      reuse a socket, tcp_disconnect() is changed to cleanup afterwards.
      
      (The issue can be reproduced at least down to v4.4.x.)
      
      Cc: Wei Wang <weiwan@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 2b0a8c9e ("tcp: add CDG congestion control")
      Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce69e563
  14. 03 7月, 2020 1 次提交
    • E
      tcp: md5: allow changing MD5 keys in all socket states · 1ca0fafd
      Eric Dumazet 提交于
      This essentially reverts commit 72123032 ("tcp: md5: reject TCP_MD5SIG
      or TCP_MD5SIG_EXT on established sockets")
      
      Mathieu reported that many vendors BGP implementations can
      actually switch TCP MD5 on established flows.
      
      Quoting Mathieu :
         Here is a list of a few network vendors along with their behavior
         with respect to TCP MD5:
      
         - Cisco: Allows for password to be changed, but within the hold-down
           timer (~180 seconds).
         - Juniper: When password is initially set on active connection it will
           reset, but after that any subsequent password changes no network
           resets.
         - Nokia: No notes on if they flap the tcp connection or not.
         - Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until
           both sides are ok with new passwords.
         - Meta-Switch: Expects the password to be set before a connection is
           attempted, but no further info on whether they reset the TCP
           connection on a change.
         - Avaya: Disable the neighbor, then set password, then re-enable.
         - Zebos: Would normally allow the change when socket connected.
      
      We can revert my prior change because commit 9424e2e7 ("tcp: md5: fix potential
      overestimation of TCP option space") removed the leak of 4 kernel bytes to
      the wire that was the main reason for my patch.
      
      While doing my investigations, I found a bug when a MD5 key is changed, leading
      to these commits that stable teams want to consider before backporting this revert :
      
       Commit 6a2febec ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
       Commit e6ced831 ("tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers")
      
      Fixes: 72123032 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets"
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ca0fafd
  15. 02 7月, 2020 1 次提交
    • E
      tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers · e6ced831
      Eric Dumazet 提交于
      My prior fix went a bit too far, according to Herbert and Mathieu.
      
      Since we accept that concurrent TCP MD5 lookups might see inconsistent
      keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb()
      
      Clearing all key->key[] is needed to avoid possible KMSAN reports,
      if key->keylen is increased. Since tcp_md5_do_add() is not fast path,
      using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler.
      
      data_race() was added in linux-5.8 and will prevent KCSAN reports,
      this can safely be removed in stable backports, if data_race() is
      not yet backported.
      
      v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add()
      
      Fixes: 6a2febec ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Marco Elver <elver@google.com>
      Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6ced831
  16. 01 7月, 2020 1 次提交
    • E
      tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key() · 6a2febec
      Eric Dumazet 提交于
      MD5 keys are read with RCU protection, and tcp_md5_do_add()
      might update in-place a prior key.
      
      Normally, typical RCU updates would allocate a new piece
      of memory. In this case only key->key and key->keylen might
      be updated, and we do not care if an incoming packet could
      see the old key, the new one, or some intermediate value,
      since changing the key on a live flow is known to be problematic
      anyway.
      
      We only want to make sure that in the case key->keylen
      is changed, cpus in tcp_md5_hash_key() wont try to use
      uninitialized data, or crash because key->keylen was
      read twice to feed sg_init_one() and ahash_request_set_crypt()
      
      Fixes: 9ea88a15 ("tcp: md5: check md5 signature without socket lock")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a2febec
  17. 25 6月, 2020 1 次提交
  18. 10 6月, 2020 2 次提交
  19. 09 6月, 2020 1 次提交
  20. 29 5月, 2020 8 次提交
  21. 15 5月, 2020 1 次提交
    • E
      tcp: fix error recovery in tcp_zerocopy_receive() · e776af60
      Eric Dumazet 提交于
      If user provides wrong virtual address in TCP_ZEROCOPY_RECEIVE
      operation we want to return -EINVAL error.
      
      But depending on zc->recv_skip_hint content, we might return
      -EIO error if the socket has SOCK_DONE set.
      
      Make sure to return -EINVAL in this case.
      
      BUG: KMSAN: uninit-value in tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline]
      BUG: KMSAN: uninit-value in do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685
      CPU: 1 PID: 625 Comm: syz-executor.0 Not tainted 5.7.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline]
       do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685
       tcp_getsockopt+0xf8/0x1f0 net/ipv4/tcp.c:3728
       sock_common_getsockopt+0x13f/0x180 net/core/sock.c:3131
       __sys_getsockopt+0x533/0x7b0 net/socket.c:2177
       __do_sys_getsockopt net/socket.c:2192 [inline]
       __se_sys_getsockopt+0xe1/0x100 net/socket.c:2189
       __x64_sys_getsockopt+0x62/0x80 net/socket.c:2189
       do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:297
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45c829
      Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f1deeb72c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
      RAX: ffffffffffffffda RBX: 00000000004e01e0 RCX: 000000000045c829
      RDX: 0000000000000023 RSI: 0000000000000006 RDI: 0000000000000009
      RBP: 000000000078bf00 R08: 0000000020000200 R09: 0000000000000000
      R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000001d8 R14: 00000000004d3038 R15: 00007f1deeb736d4
      
      Local variable ----zc@do_tcp_getsockopt created at:
       do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670
       do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670
      
      Fixes: 05255b82 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e776af60
  22. 13 5月, 2020 1 次提交
    • E
      tcp: fix SO_RCVLOWAT hangs with fat skbs · 24adbc16
      Eric Dumazet 提交于
      We autotune rcvbuf whenever SO_RCVLOWAT is set to account for 100%
      overhead in tcp_set_rcvlowat()
      
      This works well when skb->len/skb->truesize ratio is bigger than 0.5
      
      But if we receive packets with small MSS, we can end up in a situation
      where not enough bytes are available in the receive queue to satisfy
      RCVLOWAT setting.
      As our sk_rcvbuf limit is hit, we send zero windows in ACK packets,
      preventing remote peer from sending more data.
      
      Even autotuning does not help, because it only triggers at the time
      user process drains the queue. If no EPOLLIN is generated, this
      can not happen.
      
      Note poll() has a similar issue, after commit
      c7004482 ("tcp: Respect SO_RCVLOWAT in tcp_poll().")
      
      Fixes: 03f45c88 ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      24adbc16
  23. 09 5月, 2020 1 次提交