1. 16 Oct 2015 (2 commits)
  2. 14 Oct 2015 (1 commit)
    • tcp/dccp: fix behavior of stale SYN_RECV request sockets · 4bdc3d66
      Eric Dumazet authored
      When a TCP/DCCP listener is closed, its pending SYN_RECV request
      sockets become stale, meaning the 3WHS cannot complete.

      But the current behavior is wrong:
      incoming packets finding such stale sockets are dropped.

      Instead, we need to clean up the request socket and perform
      another lookup (sketched below):
      - An incoming ACK will get a RST answer.
      - A SYN retransmit might find another listener, if available.
      - We expedite the cleanup of request sockets and of the old
        listener socket.
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
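      A minimal user-space sketch of this retry shape (the types, the
      tiny hash stand-in, and main() are illustrative, not the kernel
      code):

      /* On finding a NEW_SYN_RECV socket whose listener is no longer
       * listening, unhash the stale request and redo the lookup
       * instead of dropping the packet. */
      #include <stdio.h>
      #include <stddef.h>

      enum state { ST_LISTEN, ST_NEW_SYN_RECV, ST_CLOSE };

      struct sock {
          enum state state;
          struct sock *listener;  /* set for request sockets only */
          int hashed;             /* 1 while findable in the "ehash" */
      };

      static struct sock closed_listener = { ST_CLOSE, NULL, 0 };
      static struct sock stale_req = { ST_NEW_SYN_RECV, &closed_listener, 1 };

      /* Stand-in for the ehash lookup. */
      static struct sock *ehash_lookup(void)
      {
          return stale_req.hashed ? &stale_req : NULL;
      }

      int main(void)
      {
          struct sock *sk;
      lookup:
          sk = ehash_lookup();
          if (sk && sk->state == ST_NEW_SYN_RECV &&
              sk->listener->state != ST_LISTEN) {
              sk->hashed = 0;   /* expedited cleanup of the stale req */
              goto lookup;      /* retry: may hit a fresh listener */
          }
          if (!sk)
              printf("no match: an incoming ACK is answered with RST\n");
          return 0;
      }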
  3. 09 Oct 2015 (1 commit)
  4. 05 Oct 2015 (1 commit)
  5. 03 Oct 2015 (1 commit)
    • tcp/dccp: install syn_recv requests into ehash table · 079096f1
      Eric Dumazet authored
      In this patch, we insert request sockets into the regular
      TCP/DCCP ehash table (where ESTABLISHED and TIMEWAIT sockets
      live) instead of using the per-listener hash table.

      ACK packets then find SYN_RECV pseudo-sockets without having to
      locate and lock the listener.

      In nominal conditions, this halves the pressure on the listener
      lock.
      
      Note that this will allow for SO_REUSEPORT refinements,
      so that we can select a listener using cpu/numa affinities instead
      of the prior 'consistent hash', since only SYN packets will
      apply this selection logic.
      
      We will shrink listen_sock in the following patch to ease
      code review.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ying Cai <ycai@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
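      To illustrate the lookup split this enables, a small user-space
      model (the table stand-ins and names are made up for the sketch):

      /* Request sockets share the ehash with ESTABLISHED/TIMEWAIT
       * entries, so only a SYN ever needs the listener table. */
      #include <stdio.h>
      #include <stddef.h>

      struct sock { const char *kind; };

      static struct sock req_sock = { "SYN_RECV request (in ehash)" };
      static struct sock listener = { "LISTEN socket" };

      static struct sock *ehash_lookup(int tuple_is_known)
      {
          return tuple_is_known ? &req_sock : NULL;
      }

      static struct sock *ingress_lookup(int tuple_is_known, int is_syn)
      {
          struct sock *sk = ehash_lookup(tuple_is_known);

          if (sk)
              return sk;       /* ACK path: listener never locked */
          return is_syn ? &listener : NULL;   /* NULL => send RST */
      }

      int main(void)
      {
          printf("ACK -> %s\n", ingress_lookup(1, 0)->kind);
          printf("SYN -> %s\n", ingress_lookup(0, 1)->kind);
          return 0;
      }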
  6. 30 Sep 2015 (4 commits)
  7. 26 Sep 2015 (2 commits)
  8. 22 Sep 2015 (1 commit)
    • tcp/dccp: fix timewait races in timer handling · ed2e9239
      Eric Dumazet authored
      When creating a timewait socket, we need to arm the timer before
      allowing other cpus to find it. The signal that allows cpus to
      find the socket is setting tw_refcnt to a non-zero value.
      
      As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
      call inet_twsk_schedule() first.
      
      This also means we need to remove tw_refcnt changes from
      inet_twsk_schedule() and let the caller handle it.
      
      Note that because we use mod_timer_pinned(), we have the guarantee
      that the timer won't expire before we set tw_refcnt, as we run in
      BH context.

      To make things more readable, I introduced the
      inet_twsk_reschedule() helper.
      
      When rearming the timer, we can use mod_timer_pending() to make sure
      we do not rearm a canceled timer.
      
      Note: this bug can trigger only if packets of one flow can hit
      multiple cpus. This does not normally happen, unless flow
      steering is broken somehow. This explains why the bug was
      spotted ~5 months after its introduction.
      
      A similar fix is needed for SYN_RECV sockets in
      reqsk_queue_hash_req(), but it will be provided in a separate
      patch for proper tracking.
      
      Fixes: 789f558c ("tcp/dccp: get rid of central timewait timer")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Ying Cai <ycai@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
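      A compile-only sketch of the ordering this fix establishes; the
      function bodies are stand-ins, only the call order matters:

      /* Arm the timer first, then publish the socket via tw_refcnt,
       * so no other cpu can see a hashed timewait socket whose timer
       * is not yet armed. */
      #include <stdatomic.h>
      #include <stdbool.h>

      struct tw_sock {
          atomic_int tw_refcnt;   /* 0 = invisible to other cpus */
          bool timer_armed;
      };

      static void twsk_schedule(struct tw_sock *tw)
      {
          /* Arms the timer only; the tw_refcnt changes moved out of
           * here and into the hashdance. */
          tw->timer_armed = true;
      }

      static void twsk_hashdance(struct tw_sock *tw)
      {
          /* Publication point: after this store, lookups may find tw
           * and expect a live timer. */
          atomic_store(&tw->tw_refcnt, 3);
      }

      void tw_setup(struct tw_sock *tw)
      {
          twsk_schedule(tw);   /* 1: arm (cannot fire early: BH context) */
          twsk_hashdance(tw);  /* 2: make the socket findable */
      }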
  9. 16 Sep 2015 (1 commit)
  10. 27 Jul 2015 (1 commit)
  11. 16 Jun 2015 (1 commit)
  12. 24 Apr 2015 (1 commit)
    • inet: fix possible panic in reqsk_queue_unlink() · b357a364
      Eric Dumazet authored
      [ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at
       0000000000000080
      [ 3897.931025] IP: [<ffffffffa9f27686>] reqsk_timer_handler+0x1a6/0x243
      
      There is a race when reqsk_timer_handler() and tcp_check_req() call
      inet_csk_reqsk_queue_unlink() on the same req at the same time.
      
      Before commit fa76ce73 ("inet: get rid of central tcp/dccp listener
      timer"), the listener spinlock was held and the race could not
      happen.

      To solve this bug, we change reqsk_queue_unlink() so that it does
      not assume the req must be found, and we return a status so the
      caller can conditionally release a refcount on the request sock.

      This also means that tcp_check_req(), in the non-fastopen case,
      may or may not consume the req refcount, so tcp_v6_hnd_req() and
      tcp_v4_hnd_req() have to handle both outcomes properly.

      (The same remark applies to dccp_check_req() and its callers.)
      
      inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is
      called 4 times in tcp and 3 times in dccp.
      
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
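      The essence of the fix, modeled in self-contained user-space C
      (the atomics stand in for the kernel's locking; reqsk_queue_drop
      here is a simplified stand-in for inet_csk_reqsk_queue_drop()):

      /* Two paths may race to unlink the same req; only the winner
       * may drop the hash table's reference. */
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdlib.h>

      struct req {
          atomic_int refcnt;
          atomic_bool hashed;
      };

      static void reqsk_put(struct req *req)
      {
          if (atomic_fetch_sub(&req->refcnt, 1) == 1)
              free(req);
      }

      /* Returns a status instead of assuming the req is found. */
      static bool reqsk_queue_unlink(struct req *req)
      {
          bool expected = true;

          return atomic_compare_exchange_strong(&req->hashed,
                                                &expected, false);
      }

      void reqsk_queue_drop(struct req *req)
      {
          if (reqsk_queue_unlink(req))
              reqsk_put(req);  /* the table's ref, released once */
      }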
  13. 14 Apr 2015 (1 commit)
    • tcp/dccp: get rid of central timewait timer · 789f558c
      Eric Dumazet authored
      Using a timer wheel for timewait sockets was nice ~15 years ago,
      when memory was expensive and machines had a single processor.

      This does not scale: the code is ugly and a source of huge
      latencies (typically 30 ms have been seen, with cpus spinning on
      the death_lock spinlock).

      We can afford to use an extra 64 bytes per timewait sock and
      spread the timewait load across all cpus for better behavior.
      
      Tested:
      
      In the following test, /proc/sys/net/ipv4/tcp_tw_recycle is set
      to 1 on the target (lpaa24).

      Before the patch:
      
      lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
      419594
      
      lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
      437171
      
      While the test is running, we can observe 25 or even 33 ms
      latencies.
      
      lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
      ...
      1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms
      rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2
      
      lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
      ...
      1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms
      rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2
      
      After the patch:

      Throughput increases by about 90%:
      
      lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
      810442
      
      lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
      800992
      
      And latencies are kept minimal during this load, even though
      network utilization is 90% higher:
      
      lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
      ...
      1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms
      rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 24 Mar 2015 (4 commits)
    • inet: fix double request socket freeing · c6973669
      Fan Du authored
      Erik Hugne reported the following error:
      
      I'm hitting this warning on latest net-next when I try to SSH into
      a machine with eth0 added to a bridge (but I think the problem is
      older than that).
      
      Steps to reproduce:
      node2 ~ # brctl addif br0 eth0
      [  223.758785] device eth0 entered promiscuous mode
      node2 ~ # ip link set br0 up
      [  244.503614] br0: port 1(eth0) entered forwarding state
      [  244.505108] br0: port 1(eth0) entered forwarding state
      node2 ~ # [  251.160159] ------------[ cut here ]------------
      [  251.160831] WARNING: CPU: 0 PID: 3 at include/net/request_sock.h:102 tcp_v4_err+0x6b1/0x720()
      [  251.162077] Modules linked in:
      [  251.162496] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc3+ #18
      [  251.163334] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  251.164078]  ffffffff81a8365c ffff880038a6ba18 ffffffff8162ace4 0000000000009898
      [  251.165084]  0000000000000000 ffff880038a6ba58 ffffffff8104da85 ffff88003fa437c0
      [  251.166195]  ffff88003fa437c0 ffff88003fa74e00 ffff88003fa43bb8 ffff88003fad99a0
      [  251.167203] Call Trace:
      [  251.167533]  [<ffffffff8162ace4>] dump_stack+0x45/0x57
      [  251.168206]  [<ffffffff8104da85>] warn_slowpath_common+0x85/0xc0
      [  251.169239]  [<ffffffff8104db65>] warn_slowpath_null+0x15/0x20
      [  251.170271]  [<ffffffff81559d51>] tcp_v4_err+0x6b1/0x720
      [  251.171408]  [<ffffffff81630d03>] ? _raw_read_lock_irq+0x3/0x10
      [  251.172589]  [<ffffffff81534e20>] ? inet_del_offload+0x40/0x40
      [  251.173366]  [<ffffffff81569295>] icmp_socket_deliver+0x65/0xb0
      [  251.174134]  [<ffffffff815693a2>] icmp_unreach+0xc2/0x280
      [  251.174820]  [<ffffffff8156a82d>] icmp_rcv+0x2bd/0x3a0
      [  251.175473]  [<ffffffff81534ea2>] ip_local_deliver_finish+0x82/0x1e0
      [  251.176282]  [<ffffffff815354d8>] ip_local_deliver+0x88/0x90
      [  251.177004]  [<ffffffff815350f0>] ip_rcv_finish+0xf0/0x310
      [  251.177693]  [<ffffffff815357bc>] ip_rcv+0x2dc/0x390
      [  251.178336]  [<ffffffff814f5da3>] __netif_receive_skb_core+0x713/0xa20
      [  251.179170]  [<ffffffff814f7fca>] __netif_receive_skb+0x1a/0x80
      [  251.179922]  [<ffffffff814f97d4>] process_backlog+0x94/0x120
      [  251.180639]  [<ffffffff814f9612>] net_rx_action+0x1e2/0x310
      [  251.181356]  [<ffffffff81051267>] __do_softirq+0xa7/0x290
      [  251.182046]  [<ffffffff81051469>] run_ksoftirqd+0x19/0x30
      [  251.182726]  [<ffffffff8106cc23>] smpboot_thread_fn+0x153/0x1d0
      [  251.183485]  [<ffffffff8106cad0>] ? SyS_setgroups+0x130/0x130
      [  251.184228]  [<ffffffff8106935e>] kthread+0xee/0x110
      [  251.184871]  [<ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
      [  251.185690]  [<ffffffff81631108>] ret_from_fork+0x58/0x90
      [  251.186385]  [<ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
      [  251.187216] ---[ end trace c947fc7b24e42ea1 ]---
      [  259.542268] br0: port 1(eth0) entered forwarding state
      
      Remove the double calls to reqsk_put()
      
      [edumazet]:
      
      I got confused because reqsk_timer_handler() _has_ to call
      reqsk_put(req) after calling inet_csk_reqsk_queue_drop(), as
      the timer handler holds a reference on req.
      Signed-off-by: Fan Du <fan.du@intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Erik Hugne <erik.hugne@ericsson.com>
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: David S. Miller <davem@davemloft.net>
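      The reference discipline being restored here, as a small model
      (same illustrative conventions as the sketch under b357a364
      above; none of this is the kernel code itself):

      /* The timer handler owns one reference for the duration of the
       * callback and pairs it with exactly one reqsk_put(); other
       * paths must not put that reference on its behalf. */
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdlib.h>

      struct req {
          atomic_int refcnt;
          atomic_bool hashed;
      };

      static void reqsk_put(struct req *req)
      {
          if (atomic_fetch_sub(&req->refcnt, 1) == 1)
              free(req);
      }

      static void reqsk_queue_drop(struct req *req)
      {
          bool expected = true;

          /* Drop the table's ref only if we actually unlinked it. */
          if (atomic_compare_exchange_strong(&req->hashed, &expected,
                                             false))
              reqsk_put(req);
      }

      void reqsk_timer_handler(struct req *req)
      {
          /* ... retransmit the SYNACK or give up ... */
          reqsk_queue_drop(req);  /* the table's reference, maybe */
          reqsk_put(req);         /* the timer's own reference, always */
      }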
    • ipv6: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets · 52036a43
      Eric Dumazet authored
      dccp_v6_err() can now restrict lookups to the ehash table instead
      of the listeners.

      Note: this patch only creates the infrastructure, which means
      ICMP messages for request sockets are ignored until the
      conversion is complete.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets · 85645bab
      Eric Dumazet authored
      dccp_v4_err() can now restrict lookups to the ehash table instead
      of the listeners.

      Note: this patch only creates the infrastructure, which means
      ICMP messages for request sockets are ignored until the
      conversion is complete.

      The new dccp_req_err() helper is exported so that it can be used
      by IPv6 in the following patch.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
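      A rough model of the new error path (stand-in types; the real
      dccp_req_err() naturally does more than this):

      /* The ehash lookup can now return a request socket directly;
       * the ICMP error is absorbed by dropping that request, with no
       * listener lookup and no socket lock taken. */
      #include <stdbool.h>

      enum state { ST_NEW_SYN_RECV, ST_OTHER };

      struct sock {
          enum state state;
          bool hashed;
      };

      static void req_err(struct sock *req)
      {
          /* Drop the half-open connection; the peer will retransmit
           * its SYN if the error was transient. */
          req->hashed = false;
      }

      void icmp_err(struct sock *sk)
      {
          if (sk->state == ST_NEW_SYN_RECV) {
              req_err(sk);   /* handled without touching the listener */
              return;
          }
          /* ... full-socket error handling ... */
      }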
    • inet: remove sk_listener parameter from syn_ack_timeout() · 42cb80a2
      Eric Dumazet authored
      It is not needed: req->sk_listener points to the listener anyway.
      The request_sock argument can then be const.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
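      The shape of the API change, sketched (the field and callback
      names follow the text above and are illustrative):

      struct sock;

      struct request_sock {
          struct sock *sk_listener;   /* back-pointer to the listener */
      };

      /* Before: void syn_ack_timeout(struct sock *sk_listener,
       *                              struct request_sock *req);
       * After: the listener is reachable via req, so the parameter
       * goes away and req can become const. */
      void syn_ack_timeout(const struct request_sock *req);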
  15. 21 Mar 2015 (2 commits)
    • inet: get rid of central tcp/dccp listener timer · fa76ce73
      Eric Dumazet authored
      One of the major issues for TCP is the SYNACK retransmit handling,
      done by inet_csk_reqsk_queue_prune() and fired by the keepalive
      timer of a TCP_LISTEN socket.

      This function runs for awfully long times, with the socket lock
      held, meaning that other cpus needing this lock have to spin for
      hundreds of ms.

      SYNACKs are sent in huge bursts, likely to cause severe drops
      anyway.
      
      This model was OK 15 years ago when memory was very tight.
      
      We now can afford to have a timer per request sock.
      
      Timer invocations no longer need to lock the listener,
      and can be run from all cpus in parallel.
      
      With a following patch increasing the somaxconn width to 32 bits,
      I tested a listener with more than 4 million active request
      sockets and a steady SYN flood of ~200,000 SYN per second.
      The host was sending ~830,000 SYNACK per second.

      This is ~100 times more than we could achieve before this patch.
      
      Later, we will get rid of the listener hash and use ehash instead.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
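      A minimal model of the structural change (illustrative; the
      exponential backoff and the real retransmit are elided):

      /* Instead of one listener timer walking every pending request
       * under the socket lock, each request sock owns a timer that
       * handles its own SYNACK retransmits, on any cpu, without the
       * listener lock. */
      #include <stdbool.h>

      struct request_sock {
          int num_retrans;
          int max_retrans;
          bool dropped;
      };

      /* Fired independently per request sock. */
      void reqsk_timer_handler(struct request_sock *req)
      {
          if (req->num_retrans++ < req->max_retrans) {
              /* retransmit the SYNACK, re-arm with backoff */
              return;
          }
          req->dropped = true;   /* give up on this half-open conn */
      }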
    • inet: drop prev pointer handling in request sock · 52452c54
      Eric Dumazet authored
      When request socks are put in the ehash table, the whole notion
      of keeping a previous request around to update dl_next is
      pointless. (A model of why follows this entry.)

      Also, the following patch will get rid of the big purge timer, so
      we want to be able to delete a request sock without holding the
      listener lock.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
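      One way to see why 'prev' can go away: ehash-style chains are
      hlist-like, where each node carries a pointer to whatever points
      at it, so unlinking needs no list walk and no prev argument. A
      tiny model (assumed mechanism; names illustrative):

      /* pprev points at the previous node's next field (or at the
       * bucket head), so deletion is O(1) without knowing prev. */
      struct node {
          struct node *next;
          struct node **pprev;
      };

      void node_unlink(struct node *n)
      {
          *n->pprev = n->next;
          if (n->next)
              n->next->pprev = n->pprev;
      }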
  16. 19 Mar 2015 (3 commits)
  17. 18 Mar 2015 (1 commit)
  18. 15 Mar 2015 (1 commit)
  19. 13 Mar 2015 (2 commits)
  20. 12 Mar 2015 (2 commits)
  21. 11 Mar 2015 (1 commit)
  22. 03 Mar 2015 (1 commit)
  23. 11 Dec 2014 (1 commit)
  24. 24 Nov 2014 (1 commit)
  25. 19 Nov 2014 (3 commits)