1. 08 10月, 2009 1 次提交
    • E
      udp: dynamically size hash tables at boot time · f86dcc5a
      Eric Dumazet 提交于
      UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for
      several setups.
      
      4000 active UDP sockets -> 32 sockets per chain in average. An
      incoming frame has to lookup all sockets to find best match, so long
      chains hurt latency.
      
      Instead of a fixed size hash table that cant be perfect for every
      needs, let UDP stack choose its table size at boot time like tcp/ip
      route, using alloc_large_system_hash() helper
      
      Add an optional boot parameter, uhash_entries=x so that an admin can
      force a size between 256 and 65536 if needed, like thash_entries and
      rhash_entries.
      
      dmesg logs two new lines :
      [    0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
      [    0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)
      
      Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
      debugging spinlocks.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f86dcc5a
  2. 15 9月, 2009 1 次提交
  3. 29 10月, 2008 2 次提交
    • E
      udp: RCU handling for Unicast packets. · 271b72c7
      Eric Dumazet 提交于
      Goals are :
      
      1) Optimizing handling of incoming Unicast UDP frames, so that no memory
       writes should happen in the fast path.
      
       Note: Multicasts and broadcasts still will need to take a lock,
       because doing a full lockless lookup in this case is difficult.
      
      2) No expensive operations in the socket bind/unhash phases :
        - No expensive synchronize_rcu() calls.
      
        - No added rcu_head in socket structure, increasing memory needs,
        but more important, forcing us to use call_rcu() calls,
        that have the bad property of making sockets structure cold.
        (rcu grace period between socket freeing and its potential reuse
         make this socket being cold in CPU cache).
        David did a previous patch using call_rcu() and noticed a 20%
        impact on TCP connection rates.
        Quoting Cristopher Lameter :
         "Right. That results in cacheline cooldown. You'd want to recycle
          the object as they are cache hot on a per cpu basis. That is screwed
          up by the delayed regular rcu processing. We have seen multiple
          regressions due to cacheline cooldown.
          The only choice in cacheline hot sensitive areas is to deal with the
          complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
      
        - Because udp sockets are allocated from dedicated kmem_cache,
        use of SLAB_DESTROY_BY_RCU can help here.
      
      Theory of operation :
      ---------------------
      
      As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
      special attention must be taken by readers and writers.
      
      Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
      reused, inserted in a different chain or in worst case in the same chain
      while readers could do lookups in the same time.
      
      In order to avoid loops, a reader must check each socket found in a chain
      really belongs to the chain the reader was traversing. If it finds a
      mismatch, lookup must start again at the begining. This *restart* loop
      is the reason we had to use rdlock for the multicast case, because
      we dont want to send same message several times to the same socket.
      
      We use RCU only for fast path.
      Thus, /proc/net/udp still takes spinlocks.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      271b72c7
    • E
      udp: introduce struct udp_table and multiple spinlocks · 645ca708
      Eric Dumazet 提交于
      UDP sockets are hashed in a 128 slots hash table.
      
      This hash table is protected by *one* rwlock.
      
      This rwlock is readlocked each time an incoming UDP message is handled.
      
      This rwlock is writelocked each time a socket must be inserted in
      hash table (bind time), or deleted from this table (close time)
      
      This is not scalable on SMP machines :
      
      1) Even in read mode, lock() and unlock() are atomic operations and
       must dirty a contended cache line, shared by all cpus.
      
      2) A writer might be starved if many readers are 'in flight'. This can
       happen on a machine with some NIC receiving many UDP messages. User
       process can be delayed a long time at socket creation/dismantle time.
      
      This patch prepares RCU migration, by introducing 'struct udp_table
      and struct udp_hslot', and using one spinlock per chain, to reduce
      contention on central rwlock.
      
      Introducing one spinlock per chain reduces latencies, for port
      randomization on heavily loaded UDP servers. This also speedup
      bindings to specific ports.
      
      udp_lib_unhash() was uninlined, becoming to big.
      
      Some cleanups were done to ease review of following patch
      (RCUification of UDP Unicast lookups)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      645ca708
  4. 18 7月, 2008 1 次提交
  5. 12 6月, 2008 1 次提交
  6. 29 3月, 2008 4 次提交
  7. 25 3月, 2008 3 次提交
  8. 23 3月, 2008 1 次提交
  9. 21 3月, 2008 1 次提交
  10. 07 3月, 2008 1 次提交
    • D
      [UDP]: Revert udplite and code split. · db8dac20
      David S. Miller 提交于
      This reverts commit db1ed684 ("[IPV6]
      UDP: Rename IPv6 UDP files."), commit
      8be8af8f ("[IPV4] UDP: Move
      IPv4-specific bits to other file.") and commit
      e898d4db ("[UDP]: Allow users to
      configure UDP-Lite.").
      
      First, udplite is of such small cost, and it is a core protocol just
      like TCP and normal UDP are.
      
      We spent enormous amounts of effort to make udplite share as much code
      with core UDP as possible.  All of that work is less valuable if we're
      just going to slap a config option on udplite support.
      
      It is also causing build failures, as reported on linux-next, showing
      that the changeset was not tested very well.  In fact, this is the
      second build failure resulting from the udplite change.
      
      Finally, the config options provided was a bool, instead of a modular
      option.  Meaning the udplite code does not even get build tested
      by allmodconfig builds, and furthermore the user is not presented
      with a reasonable modular build option which is particularly needed
      by distribution vendors.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db8dac20
  11. 06 3月, 2008 1 次提交
  12. 04 3月, 2008 1 次提交
  13. 29 1月, 2008 1 次提交
  14. 07 11月, 2007 1 次提交
  15. 11 10月, 2007 1 次提交
  16. 08 6月, 2007 1 次提交
  17. 11 5月, 2007 1 次提交
  18. 26 4月, 2007 1 次提交
    • H
      [UDP]: Clean up UDP-Lite receive checksum · 759e5d00
      Herbert Xu 提交于
      This patch eliminates some duplicate code for the verification of
      receive checksums between UDP-Lite and UDP.  It does this by
      introducing __skb_checksum_complete_head which is identical to
      __skb_checksum_complete_head apart from the fact that it takes
      a length parameter rather than computing the first skb->len bytes.
      
      As a result UDP-Lite will be able to use hardware checksum offload
      for packets which do not use partial coverage checksums.  It also
      means that UDP-Lite loopback no longer does unnecessary checksum
      verification.
      
      If any NICs start support UDP-Lite this would also start working
      automatically.
      
      This patch removes the assumption that msg_flags has MSG_TRUNC clear
      upon entry in recvmsg.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      759e5d00
  19. 03 12月, 2006 2 次提交
    • A
      [NET]: Possible cleanups. · f5b99bcd
      Adrian Bunk 提交于
      This patch contains the following possible cleanups:
      - make the following needlessly global functions statis:
        - ipv4/tcp.c: __tcp_alloc_md5sig_pool()
        - ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup()
        - ipv4/udplite.c: udplite_rcv()
        - ipv4/udplite.c: udplite_err()
      - make the following needlessly global structs static:
        - ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops
        - ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific
        - ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops
      - net/ipv{4,6}/udplite.c: remove inline's from static functions
                                (gcc should know best when to inline them)
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5b99bcd
    • G
      [NET]: Supporting UDP-Lite (RFC 3828) in Linux · ba4e58ec
      Gerrit Renker 提交于
      This is a revision of the previously submitted patch, which alters
      the way files are organized and compiled in the following manner:
      
      	* UDP and UDP-Lite now use separate object files
      	* source file dependencies resolved via header files
      	  net/ipv{4,6}/udp_impl.h
      	* order of inclusion files in udp.c/udplite.c adapted
      	  accordingly
      
      [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828)
      
      This patch adds support for UDP-Lite to the IPv4 stack, provided as an
      extension to the existing UDPv4 code:
              * generic routines are all located in net/ipv4/udp.c
              * UDP-Lite specific routines are in net/ipv4/udplite.c
              * MIB/statistics support in /proc/net/snmp and /proc/net/udplite
              * shared API with extensions for partial checksum coverage
      
      [NET/IPv6]: Extension for UDP-Lite over IPv6
      
      It extends the existing UDPv6 code base with support for UDP-Lite
      in the same manner as per UDPv4. In particular,
              * UDPv6 generic and shared code is in net/ipv6/udp.c
              * UDP-Litev6 specific extensions are in net/ipv6/udplite.c
              * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6
              * support for IPV6_ADDRFORM
              * aligned the coding style of protocol initialisation with af_inet6.c
              * made the error handling in udpv6_queue_rcv_skb consistent;
                to return `-1' on error on all error cases
              * consolidation of shared code
      
      [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support
      
      The UDP-Lite patch further provides
              * API documentation for UDP-Lite
              * basic xfrm support
              * basic netfilter support for IPv4 and IPv6 (LOG target)
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba4e58ec