1. 03 Oct, 2015 2 commits
  2. 30 Sep, 2015 1 commit
    • tcp: prepare fastopen code for upcoming listener changes · 0536fcc0
      Committed by Eric Dumazet
      While auditing TCP stack for upcoming 'lockless' listener changes,
      I found I had to change fastopen_init_queue() to properly init the object
      before publishing it.
      
      Otherwise another cpu could try to lock the spinlock before it gets
      properly initialized.
      
      Instead of adding appropriate barriers, just remove dynamic memory
      allocations :
      - Structure is 28 bytes on 64bit arches. Using additional 8 bytes
        for holding a pointer seems overkill.
      - Two listeners can share same cache line and performance would suffer.
      
      If we really want to save a few bytes, we would instead dynamically allocate
      whole struct request_sock_queue in the future.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
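      The design choice in this commit can be illustrated with a minimal
      userspace sketch, assuming C11 atomics as a stand-in for the kernel
      spinlock. Every name below (fastopen_queue_sketch,
      request_queue_sketch, and the helpers) is hypothetical and is not the
      kernel's definition. The sketch only shows that an embedded member is
      fully initialized by its owner's init routine, so there is no
      separately allocated object to publish through a pointer.

      ```c
      #include <stdatomic.h>

      /* Hypothetical stand-in for the per-listener fastopen queue. */
      struct fastopen_queue_sketch {
          atomic_flag lock;   /* stands in for the kernel spinlock */
          int qlen;
          int max_qlen;
      };

      /* Hypothetical stand-in for the owning request queue. */
      struct request_queue_sketch {
          int backlog;
          struct fastopen_queue_sketch fastopenq; /* embedded, not a pointer */
      };

      /* Initialize the whole object, lock included, before anyone can see it. */
      static void request_queue_sketch_init(struct request_queue_sketch *q,
                                            int backlog)
      {
          q->backlog = backlog;
          q->fastopenq.qlen = 0;
          q->fastopenq.max_qlen = 0;
          atomic_flag_clear(&q->fastopenq.lock);
      }

      /* Spin until the flag is acquired. */
      static void fastopen_sketch_lock(struct fastopen_queue_sketch *fq)
      {
          while (atomic_flag_test_and_set_explicit(&fq->lock,
                                                   memory_order_acquire))
              ;
      }

      static void fastopen_sketch_unlock(struct fastopen_queue_sketch *fq)
      {
          atomic_flag_clear_explicit(&fq->lock, memory_order_release);
      }
      ```

      A caller would run request_queue_sketch_init() once on the owning
      object and only then make that object reachable by other threads.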
  3. 11 Aug, 2015 1 commit
    • inet: fix races with reqsk timers · 2235f2ac
      Committed by Eric Dumazet
      reqsk_queue_destroy() and reqsk_queue_unlink() should use
      del_timer_sync() instead of del_timer() before calling reqsk_put(),
      otherwise we could free a req still used by another cpu.
      
      But before doing so, reqsk_queue_destroy() must release syn_wait_lock
      spinlock or risk a deadlock, as reqsk_timer_handler() might
      need to take this same spinlock from reqsk_queue_unlink() (called from
      inet_csk_reqsk_queue_drop()).
      
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 24 Mar, 2015 1 commit
  5. 21 Mar, 2015 1 commit
    • inet: get rid of central tcp/dccp listener timer · fa76ce73
      Committed by Eric Dumazet
      One of the major issues for TCP is the SYNACK rtx handling,
      done by inet_csk_reqsk_queue_prune(), fired by the keepalive
      timer of a TCP_LISTEN socket.
      
      This function runs for awfully long times, with the socket lock held,
      meaning that other cpus needing this lock have to spin for hundreds of ms.
      
      SYNACK are sent in huge bursts, likely to cause severe drops anyway.
      
      This model was OK 15 years ago when memory was very tight.
      
      We now can afford to have a timer per request sock.
      
      Timer invocations no longer need to lock the listener,
      and can be run from all cpus in parallel.
      
      With the following patch increasing somaxconn width to 32 bits,
      I tested a listener with more than 4 million active request sockets,
      and a steady SYNFLOOD of ~200,000 SYN per second.
      Host was sending ~830,000 SYNACK per second.
      
      This is ~100 times more than what we could achieve before this patch.
      
      Later, we will get rid of the listener hash and use ehash instead.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 18 Mar, 2015 1 commit
  7. 17 Mar, 2015 1 commit
  8. 26 Jun, 2014 1 commit
  9. 14 Feb, 2014 1 commit
  10. 15 Jan, 2013 1 commit
  11. 01 Sep, 2012 1 commit
    • tcp: TCP Fast Open Server - support TFO listeners · 8336886f
      Committed by Jerry Chu
      This patch builds on top of the previous patch to add the support
      for TFO listeners. This includes -
      
      1. allocating, properly initializing, and managing the per listener
      fastopen_queue structure when TFO is enabled
      
      2. changes to the inet_csk_accept code to support TFO. E.g., the
      request_sock can no longer be freed upon accept(), not until 3WHS
      finishes
      
      3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
      if it's a TFO socket
      
      4. properly closing a TFO listener, and a TFO socket before 3WHS
      finishes
      
      5. supporting TCP_FASTOPEN socket option
      
      6. modifying tcp_check_req() to check a TFO socket as well
      as request_sock
      
      7. supporting TCP's TFO cookie option
      
      8. adding a new SYN-ACK retransmit handler to use the timer directly
      off the TFO socket rather than the listener socket. Note that TFO
      server side will not retransmit anything other than SYN-ACK until
      the 3WHS is completed.
      
      The patch also contains an important function
      "reqsk_fastopen_remove()" to manage the somewhat complex relation
      between a listener, its request_sock, and the corresponding child
      socket. See the comment above the function for the detail.
      Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 07 Dec, 2011 1 commit
  13. 03 Dec, 2010 1 commit
    • tcp: Add timewait recycling bits to ipv6 connect code. · 493f377d
      Committed by David S. Miller
      This will also improve handling of ipv6 tcp socket request
      backlog when syncookies are not enabled.  When backlog
      becomes very deep, last quarter of backlog is limited to
      validated destinations.  Previously only ipv4 implemented
      this logic, but now ipv6 does too.
      
      Now we are only one step away from enabling timewait
      recycling for ipv6, and that step is simply filling in
      the implementation of tcp_v6_get_peer() and
      tcp_v6_tw_get_peer().
      Signed-off-by: David S. Miller <davem@davemloft.net>
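      The backlog rule described in this commit can be sketched as a small
      predicate. This is a simplified illustration, not the kernel code:
      the function name and parameters are hypothetical, and
      "dest_validated" stands in for whatever peer or route information the
      stack uses to treat a destination as validated.

      ```c
      #include <stdbool.h>

      /*
       * Hypothetical helper: decide whether a new SYN should be dropped.
       * When syncookies are off and less than a quarter of the backlog
       * remains free, only validated destinations are admitted.
       */
      static bool should_drop_syn_sketch(unsigned int max_syn_backlog,
                                         unsigned int queue_len,
                                         bool syncookies_enabled,
                                         bool dest_validated)
      {
          unsigned int remaining;

          if (syncookies_enabled)
              return false;   /* the rule only applies without syncookies */

          remaining = queue_len < max_syn_backlog
                          ? max_syn_backlog - queue_len
                          : 0;

          /* In the last quarter of the backlog, require validation. */
          return remaining < (max_syn_backlog >> 2) && !dest_validated;
      }
      ```

      With a backlog limit of 1024, for example, the check starts to
      reject unvalidated destinations once more than 768 requests are
      queued.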
  14. 22 Nov, 2010 1 commit
  15. 26 Jul, 2008 1 commit
  16. 29 Jan, 2008 1 commit
  17. 15 Nov, 2007 1 commit
    • [INET]: Fix potential kfree on vmalloc-ed area of request_sock_queue · dab6ba36
      Committed by Pavel Emelyanov
      The request_sock_queue's listen_opt is either vmalloc-ed or
      kmalloc-ed depending on the number of table entries. Thus it 
      is expected to be handled properly on free, which is done in 
      the reqsk_queue_destroy().
      
      However the error path in inet_csk_listen_start() calls 
      the lite version of reqsk_queue_destroy, called 
      __reqsk_queue_destroy, which calls the kfree unconditionally. 
      
      Fix this and move the __reqsk_queue_destroy into a .c file as 
      it looks too big to be inline.
      
      As David also noticed, this is an error recovery path only,
      so no locking is required and the lopt is known to be not NULL.
      
      reqsk_queue_yank_listen_sk is also now only used in
      net/core/request_sock.c so we should move it there too.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
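      The bug class fixed here, releasing memory with a routine that does
      not match the allocator used, can be illustrated with a userspace
      analogy. In this sketch calloc()/free() stand in for
      kmalloc()/kfree() and mmap()/munmap() stand in for vmalloc()/vfree();
      the names and the 4096-byte threshold are assumptions for
      illustration only.

      ```c
      #ifndef _DEFAULT_SOURCE
      #define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on strict compilers */
      #endif
      #include <stddef.h>
      #include <stdlib.h>
      #include <sys/mman.h>

      #define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

      /* Small tables come from the heap, large ones from a mapping. */
      static void *table_alloc_sketch(size_t size)
      {
          if (size > SKETCH_PAGE_SIZE) {
              void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              return p == MAP_FAILED ? NULL : p;
          }
          return calloc(1, size);
      }

      /*
       * The release path must repeat the same size test as the allocator.
       * Calling free() unconditionally would be wrong for large tables.
       * Returns 0 on success, -1 on failure.
       */
      static int table_free_sketch(void *p, size_t size)
      {
          if (p == NULL)
              return 0;
          if (size > SKETCH_PAGE_SIZE)
              return munmap(p, size);
          free(p);
          return 0;
      }
      ```

      Both the normal teardown path and the error path should go through
      the same release helper, which is the essence of the fix described
      above.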
  18. 03 Dec, 2006 1 commit
    • [NET]: Size listen hash tables using backlog hint · 72a3effa
      Committed by Eric Dumazet
      We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for
      each LISTEN socket, regardless of various parameters (listen backlog for
      example)
      
      On x86_64, this means order-1 allocations (might fail), even for 'small'
      sockets, expecting few connections. On the contrary, a huge server wanting a
      backlog of 50000 is slowed down a bit because of this fixed limit.
      
      This patch makes the sizing of the listen hash table a dynamic parameter,
      depending on:
      - net.core.somaxconn tunable (default is 128)
      - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128)
      - backlog value given by user application  (2nd parameter of listen())
      
      For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
      kmalloc().
      
      We still limit memory allocation with the two existing tunables (somaxconn &
      tcp_max_syn_backlog). So for standard setups, this patch actually reduces RAM
      usage.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
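      The sizing rule can be sketched as a pure function of the three
      inputs listed above. This is an approximation under stated
      assumptions: the commit message only says the size depends on the
      backlog and the two tunables, so the floor of 8 entries, the
      rounding up to a power of two, and the function names are
      illustrative choices, not a quotation of the kernel code.

      ```c
      #include <stdint.h>

      /* Round n up to the next power of two (n must be <= 2^31). */
      static uint32_t roundup_pow2_sketch(uint32_t n)
      {
          uint32_t p = 1;

          while (p < n)
              p <<= 1;
          return p;
      }

      /*
       * Hypothetical helper: derive the number of hash slots from the
       * listen() backlog, capped by the two tunables.
       */
      static uint32_t listen_table_entries_sketch(uint32_t backlog,
                                                  uint32_t somaxconn,
                                                  uint32_t max_syn_backlog)
      {
          uint32_t n = backlog;

          if (n > somaxconn)
              n = somaxconn;          /* cap by net.core.somaxconn */
          if (n > max_syn_backlog)
              n = max_syn_backlog;    /* cap by tcp_max_syn_backlog */
          if (n < 8)
              n = 8;                  /* assumed minimum table size */
          if (n > (1u << 30))
              n = 1u << 30;           /* keep the rounding loop bounded */

          return roundup_pow2_sketch(n + 1);
      }
      ```

      Under these assumptions, a server asking for a backlog of 50000 with
      somaxconn left at 128 still gets a 256-slot table, while a socket
      with a backlog of 1 gets 16 slots instead of the former fixed 512.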
  19. 10 Apr, 2006 1 commit
  20. 27 Mar, 2006 1 commit
  21. 28 Feb, 2006 1 commit
  22. 30 Aug, 2005 3 commits
  23. 19 Jun, 2005 3 commits