1. 03 9月, 2009 1 次提交
    • E
      ip: Report qdisc packet drops · 6ce9e7b5
      Eric Dumazet 提交于
      Christoph Lameter pointed out that packet drops at qdisc level where not
      accounted in SNMP counters. Only if application sets IP_RECVERR, drops
      are reported to user (-ENOBUFS errors) and SNMP counters updated.
      
      IP_RECVERR is used to enable extended reliable error message passing,
      but these are not needed to update system wide SNMP stats.
      
      This patch changes things a bit to allow SNMP counters to be updated,
      regardless of IP_RECVERR being set or not on the socket.
      
      Example after an UDP tx flood
      # netstat -s 
      ...
      IP:
          1487048 outgoing packets dropped
      ...
      Udp:
      ...
          SndbufErrors: 1487048
      
      
      send() syscalls, do however still return an OK status, to not
      break applications.
      
      Note : send() manual page explicitly says for -ENOBUFS error :
      
       "The output queue for a network interface was full.
        This generally indicates that the interface has stopped sending,
        but may be caused by transient congestion.
        (Normally, this does not occur in Linux. Packets are just silently
        dropped when a device queue overflows.) "
      
      This is not true for IP_RECVERR enabled sockets : a send() syscall
      that hit a qdisc drop returns an ENOBUFS error.
      
      Many thanks to Christoph, David, and last but not least, Alexey !
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ce9e7b5
  2. 18 7月, 2009 1 次提交
  3. 13 7月, 2009 1 次提交
  4. 18 6月, 2009 1 次提交
  5. 03 6月, 2009 1 次提交
  6. 11 4月, 2009 1 次提交
    • V
      ipv6: Fix NULL pointer dereference with time-wait sockets · 499923c7
      Vlad Yasevich 提交于
      Commit b2f5e7cd
      (ipv6: Fix conflict resolutions during ipv6 binding)
      introduced a regression where time-wait sockets were
      not treated correctly.  This resulted in the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
      IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70
      ...
      Call Trace:
      [<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
      [<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
      [<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400
      [<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6]
      [<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0
      [<ffffffff8056ed49>] sys_bind+0x89/0x100
      [<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c
      [<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b
      Tested-by: NBrian Haley <brian.haley@hp.com>
      Tested-by: NEd Tomlinson <edt@aei.ca>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      499923c7
  7. 25 3月, 2009 1 次提交
  8. 24 3月, 2009 1 次提交
    • V
      udp: Wrong locking code in udp seq_file infrastructure · 30842f29
      Vitaly Mayatskikh 提交于
      Reading zero bytes from /proc/net/udp or other similar files which use
      the same seq_file udp infrastructure panics kernel in that way:
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      -------------------------------------
      read/1985 is trying to release lock (&table->hash[i].lock) at:
      [<ffffffff81321d83>] udp_seq_stop+0x27/0x29
      but there are no more locks to release!
      
      other info that might help us debug this:
      1 lock held by read/1985:
       #0:  (&p->lock){--..}, at: [<ffffffff810eefb6>] seq_read+0x38/0x348
      
      stack backtrace:
      Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
      Call Trace:
       [<ffffffff81321d83>] ? udp_seq_stop+0x27/0x29
       [<ffffffff8106dab9>] print_unlock_inbalance_bug+0xd6/0xe1
       [<ffffffff8106db62>] lock_release_non_nested+0x9e/0x1c6
       [<ffffffff810ef030>] ? seq_read+0xb2/0x348
       [<ffffffff8106bdba>] ? mark_held_locks+0x68/0x86
       [<ffffffff81321d83>] ? udp_seq_stop+0x27/0x29
       [<ffffffff8106dde7>] lock_release+0x15d/0x189
       [<ffffffff8137163c>] _spin_unlock_bh+0x1e/0x34
       [<ffffffff81321d83>] udp_seq_stop+0x27/0x29
       [<ffffffff810ef239>] seq_read+0x2bb/0x348
       [<ffffffff810eef7e>] ? seq_read+0x0/0x348
       [<ffffffff8111aedd>] proc_reg_read+0x90/0xaf
       [<ffffffff810d878f>] vfs_read+0xa6/0x103
       [<ffffffff8106bfac>] ? trace_hardirqs_on_caller+0x12f/0x153
       [<ffffffff810d88a2>] sys_read+0x45/0x69
       [<ffffffff8101123a>] system_call_fastpath+0x16/0x1b
      BUG: scheduling while atomic: read/1985/0xffffff00
      INFO: lockdep is turned off.
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm ppdev snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event arc4 snd_s
      eq ecb thinkpad_acpi snd_seq_device iwl3945 hwmon sdhci_pci snd_pcm_oss sdhci rfkill mmc_core snd_mixer_oss i2c_i801 mac80211 yenta_socket ricoh_mmc i2c_core iTCO_wdt snd_pcm iTCO_vendor_support rs
      rc_nonstatic snd_timer snd lib80211 cfg80211 soundcore snd_page_alloc video parport_pc output parport e1000e [last unloaded: scsi_wait_scan]
      Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
      Call Trace:
       [<ffffffff8106b456>] ? __debug_show_held_locks+0x1b/0x24
       [<ffffffff81043660>] __schedule_bug+0x7e/0x83
       [<ffffffff8136ede9>] schedule+0xce/0x838
       [<ffffffff810d7972>] ? fsnotify_access+0x5f/0x67
       [<ffffffff810112d0>] ? sysret_careful+0xb/0x37
       [<ffffffff8106be9c>] ? trace_hardirqs_on_caller+0x1f/0x153
       [<ffffffff8137127b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff810112f6>] sysret_careful+0x31/0x37
      read[1985]: segfault at 7fffc479bfe8 ip 0000003e7420a180 sp 00007fffc479bfa0 error 6
      Kernel panic - not syncing: Aiee, killing interrupt handler!
      
      udp_seq_stop() tries to unlock not yet locked spinlock. The lock was lost
      during splitting global udp_hash_lock to subsequent spinlocks.
      
      Signed-off by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
      Acked-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30842f29
  9. 14 3月, 2009 1 次提交
  10. 16 2月, 2009 1 次提交
  11. 06 2月, 2009 2 次提交
  12. 03 2月, 2009 1 次提交
  13. 27 1月, 2009 1 次提交
  14. 26 11月, 2008 1 次提交
  15. 25 11月, 2008 1 次提交
    • E
      net: avoid a pair of dst_hold()/dst_release() in ip_append_data() · 2e77d89b
      Eric Dumazet 提交于
      We can reduce pressure on dst entry refcount that slowdown UDP transmit
      path on SMP machines. This pressure is visible on RTP servers when
      delivering content to mediagateways, especially big ones, handling
      thousand of streams. Several cpus send UDP frames to the same
      destination, hence use the same dst entry.
      
      This patch makes ip_append_data() eventually steal the refcount its
      callers had to take on the dst entry.
      
      This doesnt avoid all refcounting, but still gives speedups on SMP,
      on UDP/RAW transmit path
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e77d89b
  16. 20 11月, 2008 1 次提交
  17. 17 11月, 2008 1 次提交
  18. 02 11月, 2008 2 次提交
  19. 31 10月, 2008 2 次提交
  20. 30 10月, 2008 2 次提交
  21. 29 10月, 2008 3 次提交
    • E
      udp: calculate udp_mem based on low memory instead of all memory · 8203efb3
      Eric Dumazet 提交于
      This patch mimics commit 57413ebc
      (tcp: calculate tcp_mem based on low memory instead of all memory)
      
      The udp_mem array which contains limits on the total amount of memory
      used by UDP sockets is calculated based on nr_all_pages.  On a 32 bits
      x86 system, we should base this on the number of lowmem pages.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8203efb3
    • E
      udp: RCU handling for Unicast packets. · 271b72c7
      Eric Dumazet 提交于
      Goals are :
      
      1) Optimizing handling of incoming Unicast UDP frames, so that no memory
       writes should happen in the fast path.
      
       Note: Multicasts and broadcasts still will need to take a lock,
       because doing a full lockless lookup in this case is difficult.
      
      2) No expensive operations in the socket bind/unhash phases :
        - No expensive synchronize_rcu() calls.
      
        - No added rcu_head in socket structure, increasing memory needs,
        but more important, forcing us to use call_rcu() calls,
        that have the bad property of making sockets structure cold.
        (rcu grace period between socket freeing and its potential reuse
         make this socket being cold in CPU cache).
        David did a previous patch using call_rcu() and noticed a 20%
        impact on TCP connection rates.
        Quoting Cristopher Lameter :
         "Right. That results in cacheline cooldown. You'd want to recycle
          the object as they are cache hot on a per cpu basis. That is screwed
          up by the delayed regular rcu processing. We have seen multiple
          regressions due to cacheline cooldown.
          The only choice in cacheline hot sensitive areas is to deal with the
          complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
      
        - Because udp sockets are allocated from dedicated kmem_cache,
        use of SLAB_DESTROY_BY_RCU can help here.
      
      Theory of operation :
      ---------------------
      
      As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
      special attention must be taken by readers and writers.
      
      Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
      reused, inserted in a different chain or in worst case in the same chain
      while readers could do lookups in the same time.
      
      In order to avoid loops, a reader must check each socket found in a chain
      really belongs to the chain the reader was traversing. If it finds a
      mismatch, lookup must start again at the begining. This *restart* loop
      is the reason we had to use rdlock for the multicast case, because
      we dont want to send same message several times to the same socket.
      
      We use RCU only for fast path.
      Thus, /proc/net/udp still takes spinlocks.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      271b72c7
    • E
      udp: introduce struct udp_table and multiple spinlocks · 645ca708
      Eric Dumazet 提交于
      UDP sockets are hashed in a 128 slots hash table.
      
      This hash table is protected by *one* rwlock.
      
      This rwlock is readlocked each time an incoming UDP message is handled.
      
      This rwlock is writelocked each time a socket must be inserted in
      hash table (bind time), or deleted from this table (close time)
      
      This is not scalable on SMP machines :
      
      1) Even in read mode, lock() and unlock() are atomic operations and
       must dirty a contended cache line, shared by all cpus.
      
      2) A writer might be starved if many readers are 'in flight'. This can
       happen on a machine with some NIC receiving many UDP messages. User
       process can be delayed a long time at socket creation/dismantle time.
      
      This patch prepares RCU migration, by introducing 'struct udp_table
      and struct udp_hslot', and using one spinlock per chain, to reduce
      contention on central rwlock.
      
      Introducing one spinlock per chain reduces latencies, for port
      randomization on heavily loaded UDP servers. This also speedup
      bindings to specific ports.
      
      udp_lib_unhash() was uninlined, becoming to big.
      
      Some cleanups were done to ease review of following patch
      (RCUification of UDP Unicast lookups)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      645ca708
  22. 14 10月, 2008 1 次提交
  23. 10 10月, 2008 1 次提交
  24. 09 10月, 2008 1 次提交
    • E
      udp: Improve port randomization · 9088c560
      Eric Dumazet 提交于
      Current UDP port allocation is suboptimal.
      We select the shortest chain to chose a port (out of 512)
      that will hash in this shortest chain.
      
      First, it can lead to give not so ramdom ports and ease
      give attackers more opportunities to break the system.
      
      Second, it can consume a lot of CPU to scan all table
      in order to find the shortest chain.
      
      Third, in some pathological cases we can fail to find
      a free port even if they are plenty of them.
      
      This patch zap the search for a short chain and only
      use one random seed. Problem of getting long chains
      should be addressed in another way, since we can
      obtain long chains with non random ports.
      
      Based on a report and patch from Vitaly Mayatskikh
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9088c560
  25. 08 10月, 2008 3 次提交
  26. 01 10月, 2008 1 次提交
  27. 16 9月, 2008 1 次提交
  28. 09 8月, 2008 1 次提交
  29. 22 7月, 2008 1 次提交
  30. 18 7月, 2008 1 次提交
  31. 17 7月, 2008 2 次提交