1. 29 April 2011, 2 commits
    • ipv4: Get route daddr from flow key in inet_csk_route_req(). · 072d8c94
      Committed by David S. Miller
      Now that output route lookups update the flow key with the selected
      destination address, we can fetch it from fl4->daddr instead of
      rt->rt_dst (a sketch of the idea follows this entry).
      Signed-off-by: David S. Miller <davem@davemloft.net>
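      A minimal sketch of the idea, using a hypothetical helper
      (pick_route_daddr() is illustrative, not the kernel's code): once
      ip_route_output_flow() has performed destination selection, the chosen
      address can be read from the flow key instead of the route entry.

      #include <linux/err.h>
      #include <net/route.h>

      /* Hypothetical helper, for illustration only. */
      static __be32 pick_route_daddr(struct net *net, struct flowi4 *fl4,
                                     struct sock *sk)
      {
      	struct rtable *rt = ip_route_output_flow(net, fl4, sk);

      	if (IS_ERR(rt))
      		return 0;
      	ip_rt_put(rt);
      	/* Callers used to read rt->rt_dst here; after destination
      	 * selection the flow key already carries the chosen address. */
      	return fl4->daddr;
      }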
    • inet: add RCU protection to inet->opt · f6d8bd05
      Committed by Eric Dumazet
      We lack proper synchronization when manipulating inet->opt (struct
      ip_options).
      
      The problem is that ip_make_skb() calls ip_setup_cork(), which may make a
      copy of ipc->opt (struct ip_options) without any protection against
      another thread manipulating inet->opt.
      
      Another thread can change the inet->opt pointer and free the old one
      under us.
      
      Use RCU to protect inet->opt (changed to inet->inet_opt).
      
      Instead of handling atomic refcounts, just copy the ip_options when
      necessary, to avoid cache-line dirtying (the resulting access pattern is
      sketched after this entry).
      
      We can't insert an rcu_head in struct ip_options since it is included in
      skb->cb[], so this patch is large: it has to introduce a new
      ip_options_rcu structure.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
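      A minimal sketch of the access pattern described above, assuming the
      ip_options_rcu layout the patch introduces (an rcu_head followed by the
      embedded ip_options); the helper names copy_inet_opt() and set_inet_opt()
      are illustrative, not the kernel's:

      #include <linux/slab.h>
      #include <net/inet_sock.h>

      /* Reader: copy the options under RCU instead of taking a refcount. */
      static int copy_inet_opt(struct inet_sock *inet, struct ip_options_rcu **out)
      {
      	struct ip_options_rcu *opt, *copy = NULL;

      	rcu_read_lock();
      	opt = rcu_dereference(inet->inet_opt);
      	if (opt) {
      		copy = kmalloc(sizeof(*opt) + opt->opt.optlen, GFP_ATOMIC);
      		if (copy)
      			memcpy(copy, opt, sizeof(*opt) + opt->opt.optlen);
      	}
      	rcu_read_unlock();

      	*out = copy;
      	return (opt && !copy) ? -ENOMEM : 0;
      }

      /* Writer (socket lock held): publish the new options, free the old
       * block only after a grace period. */
      static void set_inet_opt(struct inet_sock *inet, struct ip_options_rcu *new_opt)
      {
      	struct ip_options_rcu *old;

      	old = rcu_dereference_protected(inet->inet_opt, 1);
      	rcu_assign_pointer(inet->inet_opt, new_opt);
      	if (old)
      		kfree_rcu(old, rcu);
      }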
  2. 14 April 2011, 1 commit
  3. 31 March 2011, 1 commit
  4. 13 March 2011, 4 commits
  5. 03 March 2011, 1 commit
  6. 02 March 2011, 2 commits
  7. 12 January 2011, 1 commit
  8. 10 December 2010, 1 commit
    • net: optimize INET input path further · 68835aba
      Committed by Eric Dumazet
      Follow-up to commit b178bb3d ("net: reorder struct sock fields").
      
      Optimize the INET input path a bit further by:
      
      1) Moving sk_refcnt close to sk_lock.

      This reduces the number of dirtied cache lines by one on 64-bit arches
      (with a 64-byte cache-line size).
      
      2) Moving inet_daddr & inet_rcv_saddr to the beginning of struct sock

      (the same cache line as hash / family / bound_dev_if / nulls_node)

      This reduces the number of cache lines accessed during lookups by one,
      and does not increase the size of inet and timewait sockets:
      inet and tw sockets now share the same place-holder for these fields.
      
      Before the patch:
      
      offsetof(struct sock, sk_refcnt) = 0x10
      offsetof(struct sock, sk_lock) = 0x40
      offsetof(struct sock, sk_receive_queue) = 0x60
      offsetof(struct inet_sock, inet_daddr) = 0x270
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x274
      
      After the patch:
      
      offsetof(struct sock, sk_refcnt) = 0x44
      offsetof(struct sock, sk_lock) = 0x48
      offsetof(struct sock, sk_receive_queue) = 0x68
      offsetof(struct inet_sock, inet_daddr) = 0x0
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x4
      
      compute_score() (UDP or TCP) now uses a single cache line per ignored
      item instead of two. (A small offsetof() illustration follows this
      entry.)
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
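      The before/after offsets above are easiest to read as 64-byte cache-line
      indices. A small, self-contained user-space illustration of that
      arithmetic, using a toy structure rather than the real struct sock:

      #include <stddef.h>
      #include <stdio.h>

      struct toy_sock {
      	unsigned int hash;        /* lookup keys packed together ...   */
      	unsigned int daddr;       /* ... so lookups touch one line     */
      	unsigned int rcv_saddr;
      	char pad[52];             /* padding up to the next line       */
      	int refcnt;               /* refcount next to the lock, so the */
      	int lock;                 /* rx path dirties a single line     */
      };

      int main(void)
      {
      	printf("daddr     -> cache line %zu\n", offsetof(struct toy_sock, daddr) / 64);
      	printf("rcv_saddr -> cache line %zu\n", offsetof(struct toy_sock, rcv_saddr) / 64);
      	printf("refcnt    -> cache line %zu\n", offsetof(struct toy_sock, refcnt) / 64);
      	printf("lock      -> cache line %zu\n", offsetof(struct toy_sock, lock) / 64);
      	return 0;
      }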
  9. 18 November 2010, 1 commit
  10. 13 July 2010, 1 commit
  11. 11 June 2010, 1 commit
  12. 16 May 2010, 1 commit
  13. 29 April 2010, 1 commit
  14. 23 April 2010, 1 commit
  15. 21 April 2010, 1 commit
  16. 18 January 2010, 1 commit
    • tcp: account SYN-ACK timeouts & retransmissions · 72659ecc
      Committed by Octavian Purdila
      Currently we don't increment the SYN-ACK timeout & retransmission
      counters, although we do increment the same stats for SYN. We seem to
      have lost the SYN-ACK accounting with the introduction of
      tcp_syn_recv_timer (commit 2248761e in the netdev-vger-cvs tree).
      
      This patch fixes the issue. In the process we also rename the v4/v6
      SYN-ACK retransmit functions for clarity, and add a new request_sock
      operation (syn_ack_timeout) so the code in inet_connection_sock.c can
      stay protocol-agnostic (see the sketch after this entry).
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
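      The new operation is essentially an ops-table callback, which is what
      keeps the generic timer code protocol-agnostic. A self-contained,
      simplified illustration of that shape (names and signatures here are
      illustrative, not the kernel's):

      #include <stdio.h>

      struct request_sock;

      struct request_sock_ops {
      	const char *name;
      	void (*syn_ack_timeout)(struct request_sock *req);
      };

      struct request_sock {
      	const struct request_sock_ops *rsk_ops;
      	int retrans;
      };

      static void tcp_v4_syn_ack_timeout(struct request_sock *req)
      {
      	printf("SYN-ACK timeout, retrans=%d: bump the timeout counter\n",
      	       req->retrans);
      }

      static const struct request_sock_ops tcp_request_sock_ops = {
      	.name = "TCP",
      	.syn_ack_timeout = tcp_v4_syn_ack_timeout,
      };

      /* Generic, protocol-agnostic retransmit timer path. */
      static void syn_ack_timer_fired(struct request_sock *req)
      {
      	req->retrans++;
      	req->rsk_ops->syn_ack_timeout(req);
      }

      int main(void)
      {
      	struct request_sock req = { .rsk_ops = &tcp_request_sock_ops };

      	syn_ack_timer_fired(&req);
      	return 0;
      }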
  17. 03 December 2009, 1 commit
  18. 26 November 2009, 1 commit
  19. 20 October 2009, 1 commit
  20. 19 October 2009, 1 commit
    • inet: rename some inet_sock fields · c720c7e8
      Committed by Eric Dumazet
      In order to have better cache layouts of struct sock (separate zones for
      the rx/tx paths), we need this preliminary patch.

      The goal is to move the fields used at lookup time into the first
      read-mostly cache line (inside struct sock_common) and to move sk_refcnt
      to a separate cache line (written only by the rx path).
      
      This patch adds an inet_ prefix to the daddr, rcv_saddr, dport, num,
      saddr, sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes (see the sketch
      after this entry).
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
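      A toy, self-contained sketch of the aliasing the rename enables (the
      structures and the macro here are illustrative, not the kernel headers):
      once the lookup fields live in the shared common part, the inet_-prefixed
      name can simply be a macro into it, much like sk_refcnt already is.

      #include <stdio.h>

      struct sock_common {
      	unsigned int skc_daddr;     /* lookup key in the shared part */
      };

      struct sock {
      	struct sock_common __sk_common;
      };

      struct inet_sock {
      	struct sock sk;
      };

      /* The inet_ prefix avoids clashing with other uses of "daddr". */
      #define inet_daddr sk.__sk_common.skc_daddr

      int main(void)
      {
      	struct inet_sock isk;

      	isk.inet_daddr = 0x7f000001;    /* example value */
      	printf("daddr = 0x%08x\n", isk.inet_daddr);
      	return 0;
      }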
  21. 08 October 2009, 1 commit
  22. 01 October 2009, 1 commit
  23. 02 February 2009, 1 commit
  24. 01 February 2009, 1 commit
    • inet: Fix virt-manager regression due to bind(0) changes. · 5add3009
      Committed by Stephen Hemminger
      From: Stephen Hemminger <shemminger@vyatta.com>
      
      Fix the regression introduced by a9d8f911
      ("inet: Allowing more than 64k connections and heavily optimize
      bind(0) time.")
      
      Based upon initial patches and feedback from Evgeniy Polyakov and
      Eric Dumazet.
      
      From Eric Dumazet:
      --------------------
      Also there might be a problem at line 175
      
      if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
      	spin_unlock(&head->lock);
      	goto again;
      }
      
      If we entered inet_csk_get_port() with a non-null snum, we can "goto
      again" when it was not expected.
      --------------------
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 22 January 2009, 1 commit
    • inet: Allowing more than 64k connections and heavily optimize bind(0) time. · a9d8f911
      Committed by Evgeniy Polyakov
      With a simple extension to the binding mechanism that allows binding more
      than 64k sockets (or a smaller number, depending on sysctl parameters),
      we have to traverse the whole bind hash table to find an empty bucket.
      And while that is not a problem for, say, 32k connections, bind()
      completion time grows exponentially (since after each successful binding
      we have to traverse one more bucket to find an empty one), even if we
      start each time from a random offset inside the hash table.
      
      So, when the hash table is full and we want to add another socket, we
      have to traverse the whole table no matter what, so effectively this is
      the worst-case performance and it is constant.
      
      The attached picture shows bind() time depending on the number of
      already-bound sockets.
      
      The green area corresponds to the usual bind-to-port-zero process, which
      turns on kernel port selection as described above. The red area is the
      bind process when the number of reuse-bound sockets is not limited to 64k
      (or by the sysctl parameters); it shows the same exponential growth
      (hidden by the green area) until the number of ports reaches the sysctl
      limit.
      
      At this point the bind hash table has exactly one reuse-enabled socket
      per bucket, but it is possible that they have different addresses. The
      kernel actually selects the first port to try at random, so at the
      beginning bind will take roughly constant time, but over time the number
      of ports to check after the random start increases. That gives the
      exponential growth, but because of the random selection, not every port
      selection necessarily takes longer than the previous one. So we have to
      consider the area below in the graph (if you could zoom in, you would
      find many different times placed there), so one area can hide another.
      
      The blue area corresponds to the port-selection optimization.
      
      This is a rather simple design approach: the hash table now maintains an
      (imprecise and racily updated) count of currently bound sockets, and when
      that count becomes greater than a predefined value (I use the maximum
      port range defined by the sysctls), we stop traversing the whole bind
      hash table and just stop at the first matching bucket after the random
      start (see the sketch after this entry). The limit roughly corresponds to
      the case where the bind hash table is full and we have turned on the
      mechanism allowing more reuse-enabled sockets to be bound, so it does not
      change the behaviour of other sockets.
      Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
      Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
      Signed-off-by: David S. Miller <davem@davemloft.net>
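      A self-contained sketch of that shortcut in plain C (invented names, not
      the kernel's inet_csk_get_port()): while the bound-socket count stays
      below the configured port range, keep scanning for a completely empty
      bucket; once the table is considered full, accept the first
      reuse-compatible bucket after the random start.

      #include <stdbool.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define PORT_LOW  32768
      #define PORT_HIGH 61000
      #define RANGE     (PORT_HIGH - PORT_LOW)

      struct bucket {
      	int owners;           /* sockets bound to this port   */
      	bool all_reusable;    /* every owner set SO_REUSEADDR */
      };

      static struct bucket table[RANGE];
      static int bound_count;       /* imprecise, racily updated in the real design */

      static int pick_port(void)
      {
      	bool saturated = bound_count >= RANGE;  /* "table is full" heuristic */
      	int start = rand() % RANGE;

      	for (int i = 0; i < RANGE; i++) {
      		int slot = (start + i) % RANGE;
      		struct bucket *b = &table[slot];

      		if (b->owners == 0)
      			return PORT_LOW + slot;   /* empty bucket: always usable */
      		if (saturated && b->all_reusable)
      			return PORT_LOW + slot;   /* shortcut: share a reusable bucket */
      	}
      	return -1;                                /* nothing usable */
      }

      int main(void)
      {
      	printf("picked port %d\n", pick_port());
      	return 0;
      }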
  26. 30 December 2008, 1 commit
  27. 15 December 2008, 1 commit
  28. 02 December 2008, 1 commit
  29. 26 November 2008, 1 commit
  30. 12 November 2008, 1 commit
  31. 03 November 2008, 1 commit
  32. 09 October 2008, 1 commit
    • inet: cleanup of local_port_range · 3c689b73
      Committed by Eric Dumazet
      I noticed that sysctl_local_port_range[] and its associated seqlock
      sysctl_local_port_range_lock were on separate cache lines. Moreover,
      sysctl_local_port_range[] was close to unrelated, heavily modified
      variables, leading to cache misses.
      
      Moving these two variables into one structure helps data locality, and
      moving that structure to the read_mostly section helps sharing of this
      data among CPUs (see the sketch after this entry).

      Also clean up the extern declarations (moved into the include file where
      they belong) and use the inet_get_local_port_range() accessor instead of
      accessing the port values directly.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
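      A simplified kernel-style sketch of the layout and accessor described
      above (details such as the initializer are illustrative): the two range
      values sit next to their seqlock in a single __read_mostly structure, and
      readers use a lockless sequence-read loop.

      #include <linux/seqlock.h>

      static struct local_ports {
      	seqlock_t lock;
      	int range[2];
      } sysctl_local_ports __read_mostly = {
      	.lock  = __SEQLOCK_UNLOCKED(sysctl_local_ports.lock),
      	.range = { 32768, 61000 },
      };

      void inet_get_local_port_range(int *low, int *high)
      {
      	unsigned int seq;

      	do {
      		seq = read_seqbegin(&sysctl_local_ports.lock);
      		*low  = sysctl_local_ports.range[0];
      		*high = sysctl_local_ports.range[1];
      	} while (read_seqretry(&sysctl_local_ports.lock, seq));
      }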
  33. 01 October 2008, 2 commits
  34. 26 July 2008, 1 commit