1. 05 Oct, 2010 1 commit
  2. 04 Oct, 2010 8 commits
  3. 01 Oct, 2010 3 commits
    • ipv4: rcu conversion in ip_route_output_slow · 0197aa38
      Eric Dumazet committed
      ip_route_output_slow() is enclosed in an rcu_read_lock() protected
      section, so that no references are taken/released on device, thanks to
      __ip_dev_find() & dev_get_by_index_rcu()
      
      Tested with ip route cache disabled, and a stress test :
      
      Before patch:
      
      elapsed time :
      
      real	1m38.347s
      user	0m11.909s
      sys	23m51.501s
      
      Profile:
      
      13788.00 22.7% ip_route_output_slow [kernel]
       7875.00 13.0% dst_destroy          [kernel]
       3925.00  6.5% fib_semantic_match   [kernel]
       3144.00  5.2% fib_rules_lookup     [kernel]
       3061.00  5.0% dst_alloc            [kernel]
       2276.00  3.7% rt_set_nexthop       [kernel]
       1762.00  2.9% fib_table_lookup     [kernel]
       1538.00  2.5% _raw_read_lock       [kernel]
       1358.00  2.2% ip_output            [kernel]
      
      After patch:
      
      real	1m28.808s
      user	0m13.245s
      sys	20m37.293s
      
      10950.00 17.2% ip_route_output_slow [kernel]
      10726.00 16.9% dst_destroy          [kernel]
       5170.00  8.1% fib_semantic_match   [kernel]
       3937.00  6.2% dst_alloc            [kernel]
       3635.00  5.7% rt_set_nexthop       [kernel]
       2900.00  4.6% fib_rules_lookup     [kernel]
       2240.00  3.5% fib_table_lookup     [kernel]
       1427.00  2.2% _raw_read_lock       [kernel]
       1157.00  1.8% kmem_cache_alloc     [kernel]
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: introduce __ip_dev_find() · 82efee14
      Eric Dumazet committed
      ip_dev_find(net, addr) finds a device given an IPv4 source address and
      takes a reference on it.
      
      Introduce __ip_dev_find(), taking a third argument, to optionally take
      the device reference. Callers not asking the reference to be taken
      should be in an rcu_read_lock() protected section.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
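The optional-reference lookup described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `struct mock_dev`, `mock_devs`, `mock_dev_find_opt()`, `mock_dev_find()` and `mock_dev_put()` are hypothetical stand-ins, a plain `int` replaces the kernel's device reference counter, and no RCU is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net_device: an address and a refcount. */
struct mock_dev {
    uint32_t addr;  /* IPv4 address assigned to the device (host order here) */
    int refcnt;     /* plain int; the kernel uses a real reference counter */
};

/* Hypothetical device table standing in for the kernel's lookup structures. */
static struct mock_dev mock_devs[] = {
    { 0x7f000001u, 0 },  /* 127.0.0.1 */
    { 0x0a010001u, 0 },  /* 10.1.0.1 */
};

/* Models __ip_dev_find(): the last argument decides whether a reference
 * is taken. A caller passing false would, in the kernel, have to stay
 * inside an rcu_read_lock() section while it uses the result. */
static struct mock_dev *mock_dev_find_opt(uint32_t addr, bool devref)
{
    for (size_t i = 0; i < sizeof(mock_devs) / sizeof(mock_devs[0]); i++) {
        if (mock_devs[i].addr == addr) {
            if (devref)
                mock_devs[i].refcnt++;  /* caller must drop this later */
            return &mock_devs[i];
        }
    }
    return NULL;
}

/* Models ip_dev_find(): always takes the reference. */
static struct mock_dev *mock_dev_find(uint32_t addr)
{
    return mock_dev_find_opt(addr, true);
}

/* Drops a reference previously taken by one of the lookups above. */
static void mock_dev_put(struct mock_dev *dev)
{
    assert(dev->refcnt > 0);
    dev->refcnt--;
}
```

A lookup with `devref` set to false leaves the counter untouched, which is the point of the change: hot paths that are already RCU-protected avoid two atomic operations per lookup.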
    • ipv4: __mkroute_output() speedup · dd28d1a0
      Eric Dumazet committed
      While doing stress tests with a disabled IP route cache, I found that
      __mkroute_output() was touching the in_device atomic refcount three times.
      
      Use RCU to touch it once to reduce cache line ping pongs.
      
      Before patch
      
      time to perform the test
      real	1m42.009s
      user	0m12.545s
      sys	25m0.726s
      
      Profile :
      
      16109.00 26.4% ip_route_output_slow   vmlinux
       7434.00 12.2% dst_destroy            vmlinux
       3280.00  5.4% fib_rules_lookup       vmlinux
       3252.00  5.3% fib_semantic_match     vmlinux
       2622.00  4.3% fib_table_lookup       vmlinux
       2535.00  4.1% dst_alloc              vmlinux
       1750.00  2.9% _raw_read_lock         vmlinux
       1532.00  2.5% rt_set_nexthop         vmlinux
      
      After patch
      
      real	1m36.503s
      user	0m12.977s
      sys	23m25.608s
      
      14234.00 22.4% ip_route_output_slow   vmlinux
       8717.00 13.7% dst_destroy            vmlinux
       4052.00  6.4% fib_rules_lookup       vmlinux
       3951.00  6.2% fib_semantic_match     vmlinux
       3191.00  5.0% dst_alloc              vmlinux
       1764.00  2.8% fib_table_lookup       vmlinux
       1692.00  2.7% _raw_read_lock         vmlinux
       1605.00  2.5% rt_set_nexthop         vmlinux
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 30 Sep, 2010 6 commits
  5. 29 Sep, 2010 3 commits
    • ipv4: Allow configuring subnets as local addresses · 4465b469
      Tom Herbert committed
      This patch allows a host to be configured to respond to any address in
      a specified range as if it were local, without actually needing to
      configure the address on an interface.  This is done through routing
      table configuration.  For instance, to configure a host to respond
      to any address in 10.1/16 received on eth0 as a local address we can do:
      
      ip rule add from all iif eth0 lookup 200
      ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200
      
      This host is now reachable by any 10.1/16 address (route lookup on
      input for packets received on eth0 can find the route).  On output, the
      rule will not be matched so that this host can still send packets to
      10.1/16 (not sent on loopback).  Presumably, external routing can be
      configured to make sense out of this.
      
      To make this work, we needed to modify the logic in finding the
      interface which is assigned a given source address for output
      (ip_dev_find).  We perform a normal fib_lookup instead of just a
      lookup on the local table, and in the lookup we ignore the input
      interface for matching.
      
      This patch is useful to implement IP-anycast for subnets of virtual
      addresses.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre: Fix dependencies wrt. ipv6. · 68c1f3a9
      David S. Miller committed
      The GRE tunnel driver needs to invoke icmpv6 helpers in the
      ipv6 stack when ipv6 support is enabled.
      
      Therefore if IPV6 is enabled, we have to enforce that GRE's
      enabling (modular or static) matches that of ipv6.
      Reported-by: Patrick McHardy <kaber@trash.net>
      Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out() · 4d22f7d3
      Damian Lukowski committed
      Fixes kernel Bugzilla Bug 18952
      
      This patch adds a syn_set parameter to the retransmits_timed_out()
      routine and updates its callers. If not set, TCP_RTO_MIN is taken
      as the calculation basis as before. If set, TCP_TIMEOUT_INIT is
      used instead, so that sysctl_syn_retries represents the actual
      number of SYN retransmissions in case no SYNACKs are received when
      establishing a new connection.
      Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
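To see why the calculation basis matters, the following userspace sketch computes the total time budget that an exponential backoff would allow for a given retry boundary. It is an illustration, not the kernel routine: the millisecond constants are assumed values for TCP_RTO_MIN, TCP_TIMEOUT_INIT and TCP_RTO_MAX, the helper names `ilog2_u()` and `backoff_budget_ms()` are hypothetical, and the doubling-then-linear formula is an assumption about how such a budget is typically derived.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, in milliseconds; not taken from the commit message. */
#define RTO_MIN_MS      200u     /* stand-in for TCP_RTO_MIN */
#define TIMEOUT_INIT_MS 3000u    /* stand-in for TCP_TIMEOUT_INIT */
#define RTO_MAX_MS      120000u  /* stand-in for TCP_RTO_MAX */

/* Integer base-2 logarithm, rounded down (hypothetical helper). */
static unsigned int ilog2_u(unsigned int v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Total time covered by `boundary` backed-off retransmissions.
 * syn_set selects the initial SYN timeout as the basis instead of the
 * minimum RTO, mirroring the parameter this commit describes. */
static unsigned int backoff_budget_ms(unsigned int boundary, bool syn_set)
{
    unsigned int rto_base = syn_set ? TIMEOUT_INIT_MS : RTO_MIN_MS;
    unsigned int thresh;

    /* Keeps the division and the logarithm below well-defined. */
    assert(rto_base > 0 && rto_base <= RTO_MAX_MS);

    /* Number of doublings before the timeout reaches the maximum. */
    thresh = ilog2_u(RTO_MAX_MS / rto_base);

    if (boundary <= thresh)
        return ((2u << boundary) - 1u) * rto_base;

    /* Past the threshold, each further retry adds the maximum timeout. */
    return ((2u << thresh) - 1u) * rto_base +
           (boundary - thresh) * RTO_MAX_MS;
}
```

Under these assumed constants, five retries give a budget of 12.6 seconds on the minimum-RTO basis but 189 seconds on the SYN basis, which shows how a SYN that starts at the larger initial timeout would otherwise be declared timed out after far fewer retransmissions than configured.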
  6. 28 Sep, 2010 5 commits
  7. 27 Sep, 2010 1 commit
  8. 25 Sep, 2010 1 commit
    • ip: take care of last fragment in ip_append_data · 59104f06
      Eric Dumazet committed
      While investigating a bit, I found that the ip_fragment() slow path was
      taken because ip_append_data() provides the following layout for a
      send(MTU + N*(MTU - 20)) syscall:
      
      - one skb with 1500 (mtu) bytes
      - N fragments of 1480 (mtu-20) bytes (before adding IP header)
      
      The last fragment gets 17 bytes of trail data because of the following bit:
      
      	if (datalen == length + fraggap)
      		alloclen += rt->dst.trailer_len;
      
      Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
      another bug?)
      
      In ip_fragment(), we notice the last fragment is too big, (1496 + 20) > mtu,
      so we take the slow path, building another skb chain.
      
      To avoid taking the slow path, we should correct ip_append_data()
      to make sure the last fragment has real trail space, under mtu...
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
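The size arithmetic in this commit message can be checked with a small userspace sketch. It only restates the numbers quoted above (1500-byte MTU, 20-byte IPv4 header, 1480-byte fragment payload, 16 bytes added by esp4) and does not reproduce the patch; `last_frag_fits()` and `max_last_payload()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV4_HDR_LEN 20u  /* IPv4 header without options */

/* True when payload + trailer + IPv4 header still fits within the MTU. */
static bool last_frag_fits(unsigned int mtu, unsigned int payload,
                           unsigned int trailer)
{
    return payload + trailer + IPV4_HDR_LEN <= mtu;
}

/* Largest payload the last fragment can carry while leaving real room
 * for `trailer` bytes under the MTU. */
static unsigned int max_last_payload(unsigned int mtu, unsigned int trailer)
{
    assert(mtu >= IPV4_HDR_LEN + trailer);
    return mtu - IPV4_HDR_LEN - trailer;
}
```

With these numbers, a full 1480-byte last fragment plus a 16-byte trailer needs 1516 bytes and does not fit, while capping the payload at `max_last_payload(1500, 16)`, that is 1464 bytes, keeps the fragment within the MTU.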
  9. 24 Sep, 2010 1 commit
  10. 23 Sep, 2010 4 commits
  11. 22 Sep, 2010 1 commit
    • ip: fix truesize mismatch in ip fragmentation · 3d13008e
      Eric Dumazet committed
      Special care should be taken when the slow path is hit in ip_fragment():
      
      When walking through frags, we transfer truesize ownership from skb to
      frags. Then if we hit a slow_path condition, we must undo this or risk
      uncharging frags->truesize twice and, in the end, having a negative socket
      sk_wmem_alloc counter, or even freeing the socket sooner than expected.
      
      Many thanks to Nick Bowler, who provided a very clean bug report and
      test program.
      
      Thanks to Jarek for reviewing my first patch and providing a V2.
      
      While Nick's bisection pointed to commit 2b85a34e (net: No more
      expensive sock_hold()/sock_put() on each tx), the underlying bug is older
      (2.6.12-rc5).
      
      A side effect is to extend work done in commit b2722b1c
      (ip_fragment: also adjust skb->truesize for packets not owned by a
      socket) to ipv6 as well.
      Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
      Tested-by: Nick Bowler <nbowler@elliptictech.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
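The transfer-then-undo accounting described in this commit can be modelled in userspace C. This is a sketch under stated assumptions, not the kernel code: `struct mock_buf` and `transfer_or_undo()` are hypothetical, `fast_path_ok` stands in for whatever per-fragment check triggers the slow path, and socket ownership is reduced to a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb in a fragment list. */
struct mock_buf {
    unsigned int truesize;  /* memory charged for this buffer */
    bool charged;           /* true once the buffer carries its own charge */
    bool fast_path_ok;      /* hypothetical per-fragment check result */
    struct mock_buf *next;
};

/* Walks the fragment list, moving each fragment's truesize out of the
 * head buffer. If a fragment fails the check, every transfer made so
 * far is undone so that nothing is later uncharged twice.
 * Returns true when the whole list could take the fast path. */
static bool transfer_or_undo(struct mock_buf *head, struct mock_buf *frags)
{
    struct mock_buf *f, *g;

    for (f = frags; f != NULL; f = f->next) {
        if (!f->fast_path_ok) {
            /* Undo: give the charge back for fragments before f. */
            for (g = frags; g != f; g = g->next) {
                g->charged = false;
                head->truesize += g->truesize;
            }
            return false;
        }
        assert(head->truesize >= f->truesize);
        f->charged = true;
        head->truesize -= f->truesize;
    }
    return true;
}
```

Without the undo loop, the head would keep its reduced truesize while the slow path goes on to account for the same fragments again, which is the double uncharge the message warns about.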
  12. 21 Sep, 2010 4 commits
  13. 20 Sep, 2010 1 commit
  14. 16 Sep, 2010 1 commit