1. 12 10月, 2010 1 次提交
    • E
      net dst: use a percpu_counter to track entries · fc66f95c
      Eric Dumazet 提交于
      struct dst_ops tracks number of allocated dst in an atomic_t field,
      subject to high cache line contention in stress workload.
      
      Switch to a percpu_counter, to reduce number of time we need to dirty a
      central location. Place it on a separate cache line to avoid dirtying
      read only fields.
      
      Stress test :
      
      (Sending 160.000.000 UDP frames,
      IP route cache disabled, dual E5540 @2.53GHz,
      32bit kernel, FIB_TRIE, SLUB/NUMA)
      
      Before:
      
      real    0m51.179s
      user    0m15.329s
      sys     10m15.942s
      
      After:
      
      real	0m45.570s
      user	0m15.525s
      sys	9m56.669s
      
      With a small reordering of struct neighbour fields, subject of a
      following patch, (to separate refcnt from other read mostly fields)
      
      real	0m41.841s
      user	0m15.261s
      sys	8m45.949s
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc66f95c
  2. 06 10月, 2010 2 次提交
    • E
      net neigh: RCU conversion of neigh hash table · d6bf7817
      Eric Dumazet 提交于
      David
      
      This is the first step for RCU conversion of neigh code.
      
      Next patches will convert hash_buckets[] and "struct neighbour" to RCU
      protected objects.
      
      Thanks
      
      [PATCH net-next] net neigh: RCU conversion of neigh hash table
      
      Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
      neigh_table", a new structure is defined :
      
      struct neigh_hash_table {
             struct neighbour        **hash_buckets;
             unsigned int            hash_mask;
             __u32                   hash_rnd;
             struct rcu_head         rcu;
      };
      
      And "struct neigh_table" has an RCU protected pointer to such a
      neigh_hash_table.
      
      This means the signature of (*hash)() function changed: We need to add a
      third parameter with the actual hash_rnd value, since this is not
      anymore a neigh_table field.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6bf7817
    • E
      net: add a core netdev->rx_dropped counter · caf586e5
      Eric Dumazet 提交于
      In various situations, a device provides a packet to our stack and we
      drop it before it enters protocol stack :
      - softnet backlog full (accounted in /proc/net/softnet_stat)
      - bad vlan tag (not accounted)
      - unknown/unregistered protocol (not accounted)
      
      We can handle a per-device counter of such dropped frames at core level,
      and automatically adds it to the device provided stats (rx_dropped), so
      that standard tools can be used (ifconfig, ip link, cat /proc/net/dev)
      
      This is a generalization of commit 8990f468 (net: rx_dropped
      accounting), thus reverting it.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      caf586e5
  3. 05 10月, 2010 1 次提交
  4. 04 10月, 2010 1 次提交
  5. 30 9月, 2010 3 次提交
    • E
      ip6tnl: percpu stats accounting · 8560f226
      Eric Dumazet 提交于
      Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
      
      Other seldom used fields are kept in netdev->stats structure, possibly
      unsafe.
      
      This is a preliminary work to support lockless transmit path, and
      correct RX stats, that are already unsafe.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8560f226
    • E
      sit: enable lockless xmits · 8df40d10
      Eric Dumazet 提交于
      SIT tunnels can benefit from lockless xmits, using NETIF_F_LLTX
      
      Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
      10000000 UDP frames via one sit tunnel (size:220 bytes per frame)
      
      Before patch :
      
      real	3m15.399s
      user	0m9.185s
      sys	51m55.403s
      
      75029.00 87.5% _raw_spin_lock            vmlinux
       1090.00  1.3% dst_release               vmlinux
        902.00  1.1% dev_queue_xmit            vmlinux
        627.00  0.7% sock_wfree                vmlinux
        613.00  0.7% ip6_push_pending_frames   ipv6.ko
        505.00  0.6% __ip_route_output_key     vmlinux
      
      After patch:
      
      real	1m1.387s
      user	0m12.489s
      sys	15m58.868s
      
      28239.00 23.3% dst_release               vmlinux
      13570.00 11.2% ip6_push_pending_frames   ipv6.ko
      13118.00 10.8% ip6_append_data           ipv6.ko
       7995.00  6.6% __ip_route_output_key     vmlinux
       7924.00  6.5% sk_dst_check              vmlinux
       5015.00  4.1% udpv6_sendmsg             ipv6.ko
       3594.00  3.0% sock_alloc_send_pskb      vmlinux
       3135.00  2.6% sock_wfree                vmlinux
       3055.00  2.5% ip6_sk_dst_lookup         ipv6.ko
       2473.00  2.0% ip_finish_output          vmlinux
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8df40d10
    • E
      sit: fix percpu stats accounting · dd4080ee
      Eric Dumazet 提交于
      commit 15fc1f70 (sit: percpu stats accounting) forgot the fallback
      tunnel case (sit0), and can crash pretty fast.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dd4080ee
  6. 29 9月, 2010 1 次提交
    • M
      ipv6: Implement Any-IP support for IPv6. · ab79ad14
      Maciej Żenczykowski 提交于
      AnyIP is the capability to receive packets and establish incoming
      connections on IPs we have not explicitly configured on the machine.
      
      An example use case is to configure a machine to accept all incoming
      traffic on eth0, and leave the policy of whether traffic for a given IP
      should be delivered to the machine up to the load balancer.
      
      Can be setup as follows:
        ip -6 rule from all iif eth0 lookup 200
        ip -6 route add local default dev lo table 200
      (in this case for all IPv6 addresses)
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab79ad14
  7. 28 9月, 2010 2 次提交
  8. 27 9月, 2010 1 次提交
    • N
      ipv6: add a missing unregister_pernet_subsys call · 2cc6d2bf
      Neil Horman 提交于
      Clean up a missing exit path in the ipv6 module init routines.  In
      addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys
      for the ipv6_addr_label_ops structure.  But if module loading fails, or if the
      ipv6 module is removed, there is no corresponding unregister_pernet_subsys call,
      which leaves a now-bogus address on the pernet_list, leading to oopses in
      subsequent registrations.  This patch cleans up both the failed load path and
      the unload path.  Tested by myself with good results.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      
       include/net/addrconf.h |    1 +
       net/ipv6/addrconf.c    |   11 ++++++++---
       net/ipv6/addrlabel.c   |    5 +++++
       3 files changed, 14 insertions(+), 3 deletions(-)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cc6d2bf
  9. 24 9月, 2010 1 次提交
  10. 22 9月, 2010 1 次提交
    • E
      ip: fix truesize mismatch in ip fragmentation · 3d13008e
      Eric Dumazet 提交于
      Special care should be taken when slow path is hit in ip_fragment() :
      
      When walking through frags, we transfert truesize ownership from skb to
      frags. Then if we hit a slow_path condition, we must undo this or risk
      uncharging frags->truesize twice, and in the end, having negative socket
      sk_wmem_alloc counter, or even freeing socket sooner than expected.
      
      Many thanks to Nick Bowler, who provided a very clean bug report and
      test program.
      
      Thanks to Jarek for reviewing my first patch and providing a V2
      
      While Nick bisection pointed to commit 2b85a34e (net: No more
      expensive sock_hold()/sock_put() on each tx), underlying bug is older
      (2.6.12-rc5)
      
      A side effect is to extend work done in commit b2722b1c
      (ip_fragment: also adjust skb->truesize for packets not owned by a
      socket) to ipv6 as well.
      Reported-and-bisected-by: NNick Bowler <nbowler@elliptictech.com>
      Tested-by: NNick Bowler <nbowler@elliptictech.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d13008e
  11. 21 9月, 2010 2 次提交
  12. 17 9月, 2010 1 次提交
  13. 16 9月, 2010 1 次提交
  14. 10 9月, 2010 1 次提交
  15. 09 9月, 2010 2 次提交
    • E
      udp: add rehash on connect() · 719f8358
      Eric Dumazet 提交于
      commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
      added a secondary hash on UDP, hashed on (local addr, local port).
      
      Problem is that following sequence :
      
      fd = socket(...)
      connect(fd, &remote, ...)
      
      not only selects remote end point (address and port), but also sets
      local address, while UDP stack stored in secondary hash table the socket
      while its local address was INADDR_ANY (or ipv6 equivalent)
      
      Sequence is :
       - autobind() : choose a random local port, insert socket in hash tables
                    [while local address is INADDR_ANY]
       - connect() : set remote address and port, change local address to IP
                    given by a route lookup.
      
      When an incoming UDP frame comes, if more than 10 sockets are found in
      primary hash table, we switch to secondary table, and fail to find
      socket because its local address changed.
      
      One solution to this problem is to rehash datagram socket if needed.
      
      We add a new rehash(struct socket *) method in "struct proto", and
      implement this method for UDP v4 & v6, using a common helper.
      
      This rehashing only takes care of secondary hash table, since primary
      hash (based on local port only) is not changed.
      Reported-by: NKrzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Tested-by: NKrzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      719f8358
    • E
      net: inet_add_protocol() can use cmpxchg() · e0386005
      Eric Dumazet 提交于
      Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and
      friends.
      
      inet_protos[] & inet6_protos[] are moved to read_mostly section
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0386005
  16. 08 9月, 2010 2 次提交
  17. 04 9月, 2010 2 次提交
  18. 02 9月, 2010 1 次提交
  19. 31 8月, 2010 2 次提交
  20. 24 8月, 2010 1 次提交
  21. 23 8月, 2010 1 次提交
  22. 18 8月, 2010 1 次提交
  23. 15 8月, 2010 1 次提交
  24. 02 8月, 2010 1 次提交
  25. 23 7月, 2010 2 次提交
  26. 19 7月, 2010 1 次提交
    • A
      IPv6: fix CoA check in RH2 input handler (mip6_rthdr_input()) · d9a9dc66
      Arnaud Ebalard 提交于
      The input handler for Type 2 Routing Header (mip6_rthdr_input())
      checks if the CoA in the packet matches the CoA in the XFRM state.
      
      Current check is buggy: it compares the adddress in the Type 2
      Routing Header, i.e. the HoA, against the expected CoA in the state.
      The comparison should be made against the address in the destination
      field of the IPv6 header.
      
      The bug remained unnoticed because the main (and possibly only current)
      user of the code (UMIP MIPv6 Daemon) initializes the XFRM state with the
      unspecified address, i.e. explicitly allows everything.
      
      Yoshifuji-san, can you ack that one?
      Signed-off-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9a9dc66
  27. 13 7月, 2010 1 次提交
  28. 12 7月, 2010 1 次提交
  29. 05 7月, 2010 2 次提交