1. 15 11月, 2017 1 次提交
  2. 10 11月, 2017 1 次提交
  3. 28 10月, 2017 10 次提交
  4. 27 10月, 2017 9 次提交
  5. 26 10月, 2017 1 次提交
    • E
      tcp/dccp: fix other lockdep splats accessing ireq_opt · 06f877d6
      Eric Dumazet 提交于
      In my first attempt to fix the lockdep splat, I forgot we could
      enter inet_csk_route_req() with a freshly allocated request socket,
      for which refcount has not yet been elevated, due to complex
      SLAB_TYPESAFE_BY_RCU rules.
      
      We either are in rcu_read_lock() section _or_ we own a refcount on the
      request.
      
      Correct RCU verb to use here is rcu_dereference_check(), although it is
      not possible to prove we actually own a reference on a shared
      refcount :/
      
      In v2, I added ireq_opt_deref() helper and use in three places, to fix other
      possible splats.
      
      [   49.844590]  lockdep_rcu_suspicious+0xea/0xf3
      [   49.846487]  inet_csk_route_req+0x53/0x14d
      [   49.848334]  tcp_v4_route_req+0xe/0x10
      [   49.850174]  tcp_conn_request+0x31c/0x6a0
      [   49.851992]  ? __lock_acquire+0x614/0x822
      [   49.854015]  tcp_v4_conn_request+0x5a/0x79
      [   49.855957]  ? tcp_v4_conn_request+0x5a/0x79
      [   49.858052]  tcp_rcv_state_process+0x98/0xdcc
      [   49.859990]  ? sk_filter_trim_cap+0x2f6/0x307
      [   49.862085]  tcp_v4_do_rcv+0xfc/0x145
      [   49.864055]  ? tcp_v4_do_rcv+0xfc/0x145
      [   49.866173]  tcp_v4_rcv+0x5ab/0xaf9
      [   49.868029]  ip_local_deliver_finish+0x1af/0x2e7
      [   49.870064]  ip_local_deliver+0x1b2/0x1c5
      [   49.871775]  ? inet_del_offload+0x45/0x45
      [   49.873916]  ip_rcv_finish+0x3f7/0x471
      [   49.875476]  ip_rcv+0x3f1/0x42f
      [   49.876991]  ? ip_local_deliver_finish+0x2e7/0x2e7
      [   49.878791]  __netif_receive_skb_core+0x6d3/0x950
      [   49.880701]  ? process_backlog+0x7e/0x216
      [   49.882589]  __netif_receive_skb+0x1d/0x5e
      [   49.884122]  process_backlog+0x10c/0x216
      [   49.885812]  net_rx_action+0x147/0x3df
      
      Fixes: a6ca7abe ("tcp/dccp: fix lockdep splat in inet_csk_route_req()")
      Fixes: c92e8c02 ("tcp/dccp: fix ireq->opt races")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nkernel test robot <fengguang.wu@intel.com>
      Reported-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06f877d6
  6. 24 10月, 2017 2 次提交
  7. 21 10月, 2017 1 次提交
    • E
      tcp/dccp: fix ireq->opt races · c92e8c02
      Eric Dumazet 提交于
      syzkaller found another bug in DCCP/TCP stacks [1]
      
      For the reasons explained in commit ce105008 ("tcp/dccp: fix
      ireq->pktopts race"), we need to make sure we do not access
      ireq->opt unless we own the request sock.
      
      Note the opt field is renamed to ireq_opt to ease grep games.
      
      [1]
      BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
      Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295
      
      CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x25b/0x340 mm/kasan/report.c:409
       __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
       ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
       tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
       tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
       tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
       __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
       tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
       tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
       tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
       tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x40c341
      RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
      RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
      R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000
      
      Allocated by task 3295:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       __do_kmalloc mm/slab.c:3725 [inline]
       __kmalloc+0x162/0x760 mm/slab.c:3734
       kmalloc include/linux/slab.h:498 [inline]
       tcp_v4_save_options include/net/tcp.h:1962 [inline]
       tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
       tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
       tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
       tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
       tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
       tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 3306:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xca/0x250 mm/slab.c:3820
       inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
       __sk_destruct+0xfd/0x910 net/core/sock.c:1560
       sk_destruct+0x47/0x80 net/core/sock.c:1595
       __sk_free+0x57/0x230 net/core/sock.c:1603
       sk_free+0x2a/0x40 net/core/sock.c:1614
       sock_put include/net/sock.h:1652 [inline]
       inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
       tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
       tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c92e8c02
  8. 20 10月, 2017 1 次提交
  9. 18 10月, 2017 1 次提交
  10. 07 10月, 2017 1 次提交
    • E
      tcp: implement rb-tree based retransmit queue · 75c119af
      Eric Dumazet 提交于
      Using a linear list to store all skbs in write queue has been okay
      for quite a while : O(N) is not too bad when N < 500.
      
      Things get messy when N is the order of 100,000 : Modern TCP stacks
      want 10Gbit+ of throughput even with 200 ms RTT flows.
      
      40 ns per cache line miss means a full scan can use 4 ms,
      blowing away CPU caches.
      
      SACK processing often can use various hints to avoid parsing
      whole retransmit queue. But with high packet losses and/or high
      reordering, hints no longer work.
      
      Sender has to process thousands of unfriendly SACK, accumulating
      a huge socket backlog, burning a cpu and massively dropping packets.
      
      Using an rb-tree for retransmit queue has been avoided for years
      because it added complexity and overhead, but now is the time
      to be more resistant and say no to quadratic behavior.
      
      1) RTX queue is no longer part of the write queue : already sent skbs
      are stored in one rb-tree.
      
      2) Since reaching the head of write queue no longer needs
      sk->sk_send_head, we added an union of sk_send_head and tcp_rtx_queue
      
      Tested:
      
       On receiver :
       netem on ingress : delay 150ms 200us loss 1
       GRO disabled to force stress and SACK storms.
      
      for f in `seq 1 10`
      do
       ./netperf -H lpaa6 -l30 -- -K bbr -o THROUGHPUT|tail -1
      done | awk '{print $0} {sum += $0} END {printf "%7u\n",sum}'
      
      Before patch :
      
      323.87
      351.48
      339.59
      338.62
      306.72
      204.07
      304.93
      291.88
      202.47
      176.88
         2840
      
      After patch:
      
      1700.83
      2207.98
      2070.17
      1544.26
      2114.76
      2124.89
      1693.14
      1080.91
      2216.82
      1299.94
        18053
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      75c119af
  11. 02 10月, 2017 3 次提交
  12. 01 10月, 2017 1 次提交
  13. 09 9月, 2017 1 次提交
  14. 24 8月, 2017 1 次提交
    • M
      tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg · 98aaa913
      Mike Maloney 提交于
      When SOF_TIMESTAMPING_RX_SOFTWARE is enabled for tcp sockets, return the
      timestamp corresponding to the highest sequence number data returned.
      
      Previously the skb->tstamp is overwritten when a TCP packet is placed
      in the out of order queue.  While the packet is in the ooo queue, save the
      timestamp in the TCB_SKB_CB.  This space is shared with the gso_*
      options which are only used on the tx path, and a previously unused 4
      byte hole.
      
      When skbs are coalesced either in the sk_receive_queue or the
      out_of_order_queue always choose the timestamp of the appended skb to
      maintain the invariant of returning the timestamp of the last byte in
      the recvmsg buffer.
      Signed-off-by: NMike Maloney <maloney@google.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      98aaa913
  15. 15 8月, 2017 1 次提交
  16. 08 8月, 2017 1 次提交
    • D
      net: ipv4: add second dif to inet socket lookups · 3fa6f616
      David Ahern 提交于
      Add a second device index, sdif, to inet socket lookups. sdif is the
      index for ingress devices enslaved to an l3mdev. It allows the lookups
      to consider the enslaved device as well as the L3 domain when searching
      for a socket.
      
      TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
      ingress index is obtained from IPCB using inet_sdif and after the cb move
      in  tcp_v4_rcv the tcp_v4_sdif helper is used.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fa6f616
  17. 07 8月, 2017 1 次提交
  18. 01 8月, 2017 2 次提交
    • F
      tcp: remove low_latency sysctl · b6690b14
      Florian Westphal 提交于
      Was only checked by the removed prequeue code.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6690b14
    • F
      tcp: remove prequeue support · e7942d06
      Florian Westphal 提交于
      prequeue is a tcp receive optimization that moves part of rx processing
      from bh to process context.
      
      This only works if the socket being processed belongs to a process that
      is blocked in recv on that socket.
      
      In practice, this doesn't happen anymore that often because nowadays
      servers tend to use an event driven (epoll) model.
      
      Even normal client applications (web browsers) commonly use many tcp
      connections in parallel.
      
      This has measureable impact only in netperf (which uses plain recv and
      thus allows prequeue use) from host to locally running vm (~4%), however,
      there were no changes when using netperf between two physical hosts with
      ixgbe interfaces.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7942d06
  19. 25 7月, 2017 1 次提交