1. 18 3月, 2013 1 次提交
    • C
      tcp: Remove TCPCT · 1a2c6181
      Christoph Paasch 提交于
      TCPCT uses option-number 253, reserved for experimental use and should
      not be used in production environments.
      Further, TCPCT does not fully implement RFC 6013.
      
      As a nice side-effect, removing TCPCT increases TCP's performance for
      very short flows:
      
      Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
      for files of 1KB size.
      
      before this patch:
      	average (among 7 runs) of 20845.5 Requests/Second
      after:
      	average (among 7 runs) of 21403.6 Requests/Second
      Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a2c6181
  2. 19 2月, 2013 2 次提交
  3. 12 1月, 2013 2 次提交
  4. 15 12月, 2012 1 次提交
    • C
      inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock · e337e24d
      Christoph Paasch 提交于
      If in either of the above functions inet_csk_route_child_sock() or
      __inet_inherit_port() fails, the newsk will not be freed:
      
      unreferenced object 0xffff88022e8a92c0 (size 1592):
        comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
        hex dump (first 32 bytes):
          0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00  ................
          02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e
          [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5
          [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd
          [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e
          [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b
          [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481
          [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b
          [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416
          [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc
          [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701
          [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4
          [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f
          [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233
          [<ffffffff814cee68>] ip_rcv+0x217/0x267
          [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553
          [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82
      
      This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
      a single sock_put() is not enough to free the memory. Additionally, things
      like xfrm, memcg, cookie_values,... may have been initialized.
      We have to free them properly.
      
      This is fixed by forcing a call to tcp_done(), ending up in
      inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
      because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
      xfrm,...
      
      Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
      force it entering inet_csk_destroy_sock. To avoid the warning in
      inet_csk_destroy_sock, inet_num has to be set to 0.
      As inet_csk_destroy_sock does a dec on orphan_count, we first have to
      increase it.
      
      Calling tcp_done() allows us to remove the calls to
      tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().
      
      A similar approach is taken for dccp by calling dccp_done().
      
      This is in the kernel since 093d2823 (tproxy: fix hash locking issue
      when using port redirection in __inet_inherit_port()), thus since
      version >= 2.6.37.
      Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e337e24d
  5. 04 11月, 2012 1 次提交
    • E
      tcp: better retrans tracking for defer-accept · e6c022a4
      Eric Dumazet 提交于
      For passive TCP connections using TCP_DEFER_ACCEPT facility,
      we incorrectly increment req->retrans each time timeout triggers
      while no SYNACK is sent.
      
      SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for
      which we received the ACK from client). Only the last SYNACK is sent
      so that we can receive again an ACK from client, to move the req into
      accept queue. We plan to change this later to avoid the useless
      retransmit (and potential problem as this SYNACK could be lost)
      
      TCP_INFO later gives wrong information to user, claiming imaginary
      retransmits.
      
      Decouple req->retrans field into two independent fields :
      
      num_retrans : number of retransmit
      num_timeout : number of timeouts
      
      num_timeout is the counter that is incremented at each timeout,
      regardless of actual SYNACK being sent or not, and used to
      compute the exponential timeout.
      
      Introduce inet_rtx_syn_ack() helper to increment num_retrans
      only if ->rtx_syn_ack() succeeded.
      
      Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
      when we re-send a SYNACK in answer to a (retransmitted) SYN.
      Prior to this patch, we were not counting these retransmits.
      
      Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
      only if a synack packet was successfully queued.
      Reported-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Cc: Elliott Hughes <enh@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6c022a4
  6. 16 8月, 2012 2 次提交
  7. 24 7月, 2012 1 次提交
    • D
      ipv4: Prepare for change of rt->rt_iif encoding. · 92101b3b
      David S. Miller 提交于
      Use inet_iif() consistently, and for TCP record the input interface of
      cached RX dst in inet sock.
      
      rt->rt_iif is going to be encoded differently, so that we can
      legitimately cache input routes in the FIB info more aggressively.
      
      When the input interface is "use SKB device index" the rt->rt_iif will
      be set to zero.
      
      This forces us to move the TCP RX dst cache installation into the ipv4
      specific code, and as well it should since doing the route caching for
      ipv6 is pointless at the moment since it is not inspected in the ipv6
      input paths yet.
      
      Also, remove the unlikely on dst->obsolete, all ipv4 dsts have
      obsolete set to a non-zero value to force invocation of the check
      callback.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92101b3b
  8. 21 7月, 2012 1 次提交
  9. 17 7月, 2012 1 次提交
    • D
      net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      David S. Miller 提交于
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future the routes will be without destination address, source address,
      etc. keying.  One ipv4 route will cover entire subnets, etc.
      
      In this environment we have to have a way to possess persistent storage
      for redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6700c270
  10. 16 7月, 2012 2 次提交
  11. 12 7月, 2012 3 次提交
  12. 11 7月, 2012 1 次提交
  13. 05 7月, 2012 1 次提交
  14. 23 6月, 2012 1 次提交
  15. 16 6月, 2012 1 次提交
    • D
      ipv6: Handle PMTU in ICMP error handlers. · 81aded24
      David S. Miller 提交于
      One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
      to handle the error pass the 32-bit info cookie in network byte order
      whereas ipv4 passes it around in host byte order.
      
      Like the ipv4 side, we have two helper functions.  One for when we
      have a socket context and one for when we do not.
      
      ip6ip6 tunnels are not handled here, because they handle PMTU events
      by essentially relaying another ICMP packet-too-big message back to
      the original sender.
      
      This patch allows us to get rid of rt6_do_pmtu_disc().  It handles all
      kinds of situations that simply cannot happen when we do the PMTU
      update directly using a fully resolved route.
      
      In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
      likely be removed or changed into a BUG_ON() check.  We should never
      have a prefixed ipv6 route when we get there.
      
      Another piece of strange history here is that TCP and DCCP, unlike in
      ipv4, never invoke the update_pmtu() method from their ICMP error
      handlers.  This is incredibly astonishing since this is the context
      where we have the most accurate context in which to make a PMTU
      update, namely we have a fully connected socket and associated cached
      socket route.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81aded24
  16. 17 5月, 2012 1 次提交
  17. 21 4月, 2012 2 次提交
  18. 20 4月, 2012 1 次提交
  19. 16 4月, 2012 1 次提交
  20. 15 4月, 2012 1 次提交
  21. 04 3月, 2012 2 次提交
  22. 12 1月, 2012 1 次提交
  23. 20 12月, 2011 2 次提交
  24. 17 12月, 2011 1 次提交
  25. 12 12月, 2011 1 次提交
  26. 10 12月, 2011 2 次提交
  27. 07 12月, 2011 2 次提交
  28. 02 12月, 2011 2 次提交
    • D
      dccp: Fix compile warning in probe code. · d984e619
      David S. Miller 提交于
      Commit 1386be55 ("dccp: fix
      auto-loading of dccp(_probe)") fixed a bug but created a new
      compiler warning:
      
      net/dccp/probe.c: In function ‘dccpprobe_init’:
      net/dccp/probe.c:166:2: warning: the omitted middle operand in ?: will always be ‘true’, suggest explicit middle operand [-Wparentheses]
      
      try_then_request_module() is built for situations where the
      "existence" test is some lookup function that returns a non-NULL
      object on success, and with a reference count of some kind held.
      
      Here we're looking for a success return of zero from the jprobe
      registry.
      
      Instead of fighting the way try_then_request_module() works, simply
      open code what we want to happen in a local helper function.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d984e619
    • D
      dccp: Evaluate ip_hdr() only once in dccp_v4_route_skb(). · 898f7358
      David S. Miller 提交于
      This also works around a bogus gcc warning generated by an
      upcoming patch from Eric Dumazet that rearranges the layout
      of struct flowi4.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      898f7358