1. 07 10月, 2015 1 次提交
  2. 03 10月, 2015 8 次提交
  3. 30 9月, 2015 2 次提交
  4. 26 9月, 2015 2 次提交
  5. 22 9月, 2015 1 次提交
  6. 14 8月, 2015 1 次提交
  7. 11 8月, 2015 1 次提交
    • E
      inet: fix races with reqsk timers · 2235f2ac
      Eric Dumazet 提交于
      reqsk_queue_destroy() and reqsk_queue_unlink() should use
      del_timer_sync() instead of del_timer() before calling reqsk_put(),
      otherwise we could free a req still used by another cpu.
      
      But before doing so, reqsk_queue_destroy() must release syn_wait_lock
      spinlock or risk a dead lock, as reqsk_timer_handler() might
      need to take this same spinlock from reqsk_queue_unlink() (called from
      inet_csk_reqsk_queue_drop())
      
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2235f2ac
  8. 22 5月, 2015 2 次提交
    • E
      tcp: improve REUSEADDR/NOREUSEADDR cohabitation · 946f9eb2
      Eric Dumazet 提交于
      inet_csk_get_port() randomization effort tends to spread
      sockets on all the available range (ip_local_port_range)
      
      This is unfortunate because SO_REUSEADDR sockets have
      less requirements than non SO_REUSEADDR ones.
      
      If an application uses SO_REUSEADDR hint, it is to try to
      allow source ports being shared.
      
      So instead of picking a random port number in ip_local_port_range,
      lets try first in first half of the range.
      
      This gives more chances to use upper half of the range for the
      sockets with strong requirements (not using SO_REUSEADDR)
      
      Note this patch does not add a new sysctl, and only changes
      the way we try to pick port number.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Cc: Flavio Leitner <fbl@redhat.com>
      Acked-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      946f9eb2
    • E
      inet_hashinfo: remove bsocket counter · f5af1f57
      Eric Dumazet 提交于
      We no longer need bsocket atomic counter, as inet_csk_get_port()
      calls bind_conflict() regardless of its value, after commit
      2b05ad33 ("tcp: bind() fix autoselection to share ports")
      
      This patch removes overhead of maintaining this counter and
      double inet_csk_get_port() calls under pressure.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Cc: Flavio Leitner <fbl@redhat.com>
      Acked-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5af1f57
  9. 24 4月, 2015 1 次提交
    • E
      inet: fix possible panic in reqsk_queue_unlink() · b357a364
      Eric Dumazet 提交于
      [ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at
       0000000000000080
      [ 3897.931025] IP: [<ffffffffa9f27686>] reqsk_timer_handler+0x1a6/0x243
      
      There is a race when reqsk_timer_handler() and tcp_check_req() call
      inet_csk_reqsk_queue_unlink() on the same req at the same time.
      
      Before commit fa76ce73 ("inet: get rid of central tcp/dccp listener
      timer"), listener spinlock was held and race could not happen.
      
      To solve this bug, we change reqsk_queue_unlink() to not assume req
      must be found, and we return a status, to conditionally release a
      refcount on the request sock.
      
      This also means tcp_check_req() in non fastopen case might or not
      consume req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have
      to properly handle this.
      
      (Same remark for dccp_check_req() and its callers)
      
      inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is
      called 4 times in tcp and 3 times in dccp.
      
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b357a364
  10. 04 4月, 2015 1 次提交
  11. 24 3月, 2015 4 次提交
  12. 21 3月, 2015 2 次提交
    • E
      inet: get rid of central tcp/dccp listener timer · fa76ce73
      Eric Dumazet 提交于
      One of the major issue for TCP is the SYNACK rtx handling,
      done by inet_csk_reqsk_queue_prune(), fired by the keepalive
      timer of a TCP_LISTEN socket.
      
      This function runs for awful long times, with socket lock held,
      meaning that other cpus needing this lock have to spin for hundred of ms.
      
      SYNACK are sent in huge bursts, likely to cause severe drops anyway.
      
      This model was OK 15 years ago when memory was very tight.
      
      We now can afford to have a timer per request sock.
      
      Timer invocations no longer need to lock the listener,
      and can be run from all cpus in parallel.
      
      With following patch increasing somaxconn width to 32 bits,
      I tested a listener with more than 4 million active request sockets,
      and a steady SYNFLOOD of ~200,000 SYN per second.
      Host was sending ~830,000 SYNACK per second.
      
      This is ~100 times more what we could achieve before this patch.
      
      Later, we will get rid of the listener hash and use ehash instead.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa76ce73
    • E
      inet: drop prev pointer handling in request sock · 52452c54
      Eric Dumazet 提交于
      When request sock are put in ehash table, the whole notion
      of having a previous request to update dl_next is pointless.
      
      Also, following patch will get rid of big purge timer,
      so we want to delete a request sock without holding listener lock.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52452c54
  13. 18 3月, 2015 3 次提交
    • E
      inet: avoid fastopen lock for regular accept() · e3d95ad7
      Eric Dumazet 提交于
      It is not because a TCP listener is FastOpen ready that
      all incoming sockets actually used FastOpen.
      
      Avoid taking queue->fastopenq->lock if not needed.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3d95ad7
    • E
      tcp: rename struct tcp_request_sock listener · 9439ce00
      Eric Dumazet 提交于
      The listener field in struct tcp_request_sock is a pointer
      back to the listener. We now have req->rsk_listener, so TCP
      only needs one boolean and not a full pointer.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9439ce00
    • E
      inet: Clean up inet_csk_wait_for_connect() vs. might_sleep() · cb7cf8a3
      Eric Dumazet 提交于
      I got the following trace with current net-next kernel :
      
      [14723.885290] WARNING: CPU: 26 PID: 22658 at kernel/sched/core.c:7285 __might_sleep+0x89/0xa0()
      [14723.885325] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810e8734>] prepare_to_wait_exclusive+0x34/0xa0
      [14723.885355] CPU: 26 PID: 22658 Comm: netserver Not tainted 4.0.0-dbg-DEV #1379
      [14723.885359]  ffffffff81a223a8 ffff881fae9e7ca8 ffffffff81650b5d 0000000000000001
      [14723.885364]  ffff881fae9e7cf8 ffff881fae9e7ce8 ffffffff810a72e7 0000000000000000
      [14723.885367]  ffffffff81a57620 000000000000093a 0000000000000000 ffff881fae9e7e64
      [14723.885371] Call Trace:
      [14723.885377]  [<ffffffff81650b5d>] dump_stack+0x4c/0x65
      [14723.885382]  [<ffffffff810a72e7>] warn_slowpath_common+0x97/0xe0
      [14723.885386]  [<ffffffff810a73e6>] warn_slowpath_fmt+0x46/0x50
      [14723.885390]  [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [14723.885393]  [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0
      [14723.885396]  [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0
      [14723.885399]  [<ffffffff810ccdc9>] __might_sleep+0x89/0xa0
      [14723.885403]  [<ffffffff81581846>] lock_sock_nested+0x36/0xb0
      [14723.885406]  [<ffffffff815829a3>] ? release_sock+0x173/0x1c0
      [14723.885411]  [<ffffffff815ea1f7>] inet_csk_accept+0x157/0x2a0
      [14723.885415]  [<ffffffff810e8900>] ? abort_exclusive_wait+0xc0/0xc0
      [14723.885419]  [<ffffffff8161b96d>] inet_accept+0x2d/0x150
      [14723.885424]  [<ffffffff8157db6f>] SYSC_accept4+0xff/0x210
      [14723.885428]  [<ffffffff8165a451>] ? retint_swapgs+0xe/0x44
      [14723.885431]  [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [14723.885437]  [<ffffffff81369c0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [14723.885441]  [<ffffffff8157ef40>] SyS_accept+0x10/0x20
      [14723.885444]  [<ffffffff81659872>] system_call_fastpath+0x12/0x17
      [14723.885447] ---[ end trace ff74cd83355b1873 ]---
      
      In commit 26cabd31
      Peter added a sched_annotate_sleep() in sk_wait_event()
      
      Is the following patch needed as well ?
      
      Alternative would be to use sk_wait_event() from inet_csk_wait_for_connect()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb7cf8a3
  14. 17 3月, 2015 1 次提交
  15. 12 3月, 2015 1 次提交
    • E
      net: add real socket cookies · 33cf7c90
      Eric Dumazet 提交于
      A long standing problem in netlink socket dumps is the use
      of kernel socket addresses as cookies.
      
      1) It is a security concern.
      
      2) Sockets can be reused quite quickly, so there is
         no guarantee a cookie is used once and identify
         a flow.
      
      3) request sock, establish sock, and timewait socks
         for a given flow have different cookies.
      
      Part of our effort to bring better TCP statistics requires
      to switch to a different allocator.
      
      In this patch, I chose to use a per network namespace 64bit generator,
      and to use it only in the case a socket needs to be dumped to netlink.
      (This might be refined later if needed)
      
      Note that I tried to carry cookies from request sock, to establish sock,
      then timewait sockets.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Eric Salo <salo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33cf7c90
  16. 15 5月, 2014 1 次提交
  17. 14 5月, 2014 1 次提交
    • L
      net: support marking accepting TCP sockets · 84f39b08
      Lorenzo Colitti 提交于
      When using mark-based routing, sockets returned from accept()
      may need to be marked differently depending on the incoming
      connection request.
      
      This is the case, for example, if different socket marks identify
      different networks: a listening socket may want to accept
      connections from all networks, but each connection should be
      marked with the network that the request came in on, so that
      subsequent packets are sent on the correct network.
      
      This patch adds a sysctl to mark TCP sockets based on the fwmark
      of the incoming SYN packet. If enabled, and an unmarked socket
      receives a SYN, then the SYN packet's fwmark is written to the
      connection's inet_request_sock, and later written back to the
      accepted socket when the connection is established.  If the
      socket already has a nonzero mark, then the behaviour is the same
      as it is today, i.e., the listening socket's fwmark is used.
      
      Black-box tested using user-mode linux:
      
      - IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the
        mark of the incoming SYN packet.
      - The socket returned by accept() is marked with the mark of the
        incoming SYN packet.
      - Tested with syncookies=1 and syncookies=2.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84f39b08
  18. 09 5月, 2014 1 次提交
  19. 15 1月, 2014 1 次提交
  20. 11 10月, 2013 1 次提交
  21. 10 10月, 2013 1 次提交
    • E
      inet: includes a sock_common in request_sock · 634fb979
      Eric Dumazet 提交于
      TCP listener refactoring, part 5 :
      
      We want to be able to insert request sockets (SYN_RECV) into main
      ehash table instead of the per listener hash table to allow RCU
      lookups and remove listener lock contention.
      
      This patch includes the needed struct sock_common in front
      of struct request_sock
      
      This means there is no more inet6_request_sock IPv6 specific
      structure.
      
      Following inet_request_sock fields were renamed as they became
      macros to reference fields from struct sock_common.
      Prefix ir_ was chosen to avoid name collisions.
      
      loc_port   -> ir_loc_port
      loc_addr   -> ir_loc_addr
      rmt_addr   -> ir_rmt_addr
      rmt_port   -> ir_rmt_port
      iif        -> ir_iif
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      634fb979
  22. 04 10月, 2013 1 次提交
    • E
      inet: consolidate INET_TW_MATCH · 50805466
      Eric Dumazet 提交于
      TCP listener refactoring, part 2 :
      
      We can use a generic lookup, sockets being in whatever state, if
      we are sure all relevant fields are at the same place in all socket
      types (ESTABLISH, TIME_WAIT, SYN_RECV)
      
      This patch removes these macros :
      
       inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair
      
      And adds :
      
       sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr
      
      Then, INET_TW_MATCH() is really the same than INET_MATCH()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50805466
  23. 01 10月, 2013 1 次提交
  24. 18 3月, 2013 1 次提交
    • C
      tcp: Remove TCPCT · 1a2c6181
      Christoph Paasch 提交于
      TCPCT uses option-number 253, reserved for experimental use and should
      not be used in production environments.
      Further, TCPCT does not fully implement RFC 6013.
      
      As a nice side-effect, removing TCPCT increases TCP's performance for
      very short flows:
      
      Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
      for files of 1KB size.
      
      before this patch:
      	average (among 7 runs) of 20845.5 Requests/Second
      after:
      	average (among 7 runs) of 21403.6 Requests/Second
      Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a2c6181