1. 02 11月, 2008 1 次提交
    • E
      udp: add a missing smp_wmb() in udp_lib_get_port() · c37ccc0d
      Eric Dumazet 提交于
      Corey Minyard spotted a missing memory barrier in udp_lib_get_port()
      
      We need to make sure a reader cannot read the new 'sk->sk_next' value
      and previous value of 'sk->sk_hash'. Or else, an item could be deleted
      from a chain, and inserted into another chain. If new chain was empty
      before the move, 'next' pointer is NULL, and lockless reader can
      not detect it missed following items in original chain.
      
      This patch is temporary, since we expect an upcoming patch
      to introduce another way of handling the problem.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c37ccc0d
  2. 31 10月, 2008 2 次提交
  3. 30 10月, 2008 2 次提交
  4. 29 10月, 2008 3 次提交
    • E
      udp: calculate udp_mem based on low memory instead of all memory · 8203efb3
      Eric Dumazet 提交于
      This patch mimics commit 57413ebc
      (tcp: calculate tcp_mem based on low memory instead of all memory)
      
      The udp_mem array which contains limits on the total amount of memory
      used by UDP sockets is calculated based on nr_all_pages.  On a 32 bits
      x86 system, we should base this on the number of lowmem pages.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8203efb3
    • E
      udp: RCU handling for Unicast packets. · 271b72c7
      Eric Dumazet 提交于
      Goals are :
      
      1) Optimizing handling of incoming Unicast UDP frames, so that no memory
       writes should happen in the fast path.
      
       Note: Multicasts and broadcasts still will need to take a lock,
       because doing a full lockless lookup in this case is difficult.
      
      2) No expensive operations in the socket bind/unhash phases :
        - No expensive synchronize_rcu() calls.
      
        - No added rcu_head in socket structure, increasing memory needs,
        but more important, forcing us to use call_rcu() calls,
        that have the bad property of making sockets structure cold.
        (rcu grace period between socket freeing and its potential reuse
         make this socket being cold in CPU cache).
        David did a previous patch using call_rcu() and noticed a 20%
        impact on TCP connection rates.
        Quoting Cristopher Lameter :
         "Right. That results in cacheline cooldown. You'd want to recycle
          the object as they are cache hot on a per cpu basis. That is screwed
          up by the delayed regular rcu processing. We have seen multiple
          regressions due to cacheline cooldown.
          The only choice in cacheline hot sensitive areas is to deal with the
          complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
      
        - Because udp sockets are allocated from dedicated kmem_cache,
        use of SLAB_DESTROY_BY_RCU can help here.
      
      Theory of operation :
      ---------------------
      
      As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
      special attention must be taken by readers and writers.
      
      Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
      reused, inserted in a different chain or in worst case in the same chain
      while readers could do lookups in the same time.
      
      In order to avoid loops, a reader must check each socket found in a chain
      really belongs to the chain the reader was traversing. If it finds a
      mismatch, lookup must start again at the begining. This *restart* loop
      is the reason we had to use rdlock for the multicast case, because
      we dont want to send same message several times to the same socket.
      
      We use RCU only for fast path.
      Thus, /proc/net/udp still takes spinlocks.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      271b72c7
    • E
      udp: introduce struct udp_table and multiple spinlocks · 645ca708
      Eric Dumazet 提交于
      UDP sockets are hashed in a 128 slots hash table.
      
      This hash table is protected by *one* rwlock.
      
      This rwlock is readlocked each time an incoming UDP message is handled.
      
      This rwlock is writelocked each time a socket must be inserted in
      hash table (bind time), or deleted from this table (close time)
      
      This is not scalable on SMP machines :
      
      1) Even in read mode, lock() and unlock() are atomic operations and
       must dirty a contended cache line, shared by all cpus.
      
      2) A writer might be starved if many readers are 'in flight'. This can
       happen on a machine with some NIC receiving many UDP messages. User
       process can be delayed a long time at socket creation/dismantle time.
      
      This patch prepares RCU migration, by introducing 'struct udp_table
      and struct udp_hslot', and using one spinlock per chain, to reduce
      contention on central rwlock.
      
      Introducing one spinlock per chain reduces latencies, for port
      randomization on heavily loaded UDP servers. This also speedup
      bindings to specific ports.
      
      udp_lib_unhash() was uninlined, becoming to big.
      
      Some cleanups were done to ease review of following patch
      (RCUification of UDP Unicast lookups)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      645ca708
  5. 14 10月, 2008 1 次提交
  6. 10 10月, 2008 1 次提交
  7. 09 10月, 2008 1 次提交
    • E
      udp: Improve port randomization · 9088c560
      Eric Dumazet 提交于
      Current UDP port allocation is suboptimal.
      We select the shortest chain to chose a port (out of 512)
      that will hash in this shortest chain.
      
      First, it can lead to give not so ramdom ports and ease
      give attackers more opportunities to break the system.
      
      Second, it can consume a lot of CPU to scan all table
      in order to find the shortest chain.
      
      Third, in some pathological cases we can fail to find
      a free port even if they are plenty of them.
      
      This patch zap the search for a short chain and only
      use one random seed. Problem of getting long chains
      should be addressed in another way, since we can
      obtain long chains with non random ports.
      
      Based on a report and patch from Vitaly Mayatskikh
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9088c560
  8. 08 10月, 2008 3 次提交
  9. 01 10月, 2008 1 次提交
  10. 16 9月, 2008 1 次提交
  11. 09 8月, 2008 1 次提交
  12. 22 7月, 2008 1 次提交
  13. 18 7月, 2008 1 次提交
  14. 17 7月, 2008 2 次提交
  15. 15 7月, 2008 2 次提交
  16. 06 7月, 2008 2 次提交
  17. 18 6月, 2008 1 次提交
    • E
      udp: sk_drops handling · cb61cb9b
      Eric Dumazet 提交于
      In commits 33c732c3 ([IPV4]: Add raw
      drops counter) and a92aa318 ([IPV6]:
      Add raw drops counter), Wang Chen added raw drops counter for
      /proc/net/raw & /proc/net/raw6
      
      This patch adds this capability to UDP sockets too (/proc/net/udp &
      /proc/net/udp6).
      
      This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also
      be examined for each udp socket.
      
      # grep Udp: /proc/net/snmp
      Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
      Udp: 23971006 75 899420 16390693 146348 0
      
      # cat /proc/net/udp
       sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt  ---
      uid  timeout inode ref pointer drops
       75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
        0        0 2358 2 ffff81082a538c80 0
      111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
        0        0 2286 2 ffff81042dd35c80 146348
      
      In this example, only port 111 (0x006F) was flooded by messages that
      user program could not read fast enough. 146348 messages were lost.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb61cb9b
  18. 17 6月, 2008 3 次提交
  19. 15 6月, 2008 1 次提交
  20. 12 6月, 2008 1 次提交
  21. 05 6月, 2008 1 次提交
  22. 02 5月, 2008 1 次提交
  23. 24 4月, 2008 1 次提交
  24. 14 4月, 2008 1 次提交
  25. 01 4月, 2008 2 次提交
  26. 29 3月, 2008 3 次提交