1. 16 Dec, 2008 1 commit
    • ipv6: fix the outgoing interface selection order in udpv6_sendmsg() · 9f690db7
      Committed by Yang Hongyang
      1. When no interface is specified in an IPV6_PKTINFO ancillary data
         item, the interface specified in an IPV6_PKTINFO sticky option
         is used.
      
      RFC3542:
      6.7.  Summary of Outgoing Interface Selection
      
         This document and [RFC-3493] specify various methods that affect the
         selection of the packet's outgoing interface.  This subsection
         summarizes the ordering among those in order to ensure deterministic
         behavior.
      
         For a given outgoing packet on a given socket, the outgoing interface
         is determined in the following order:
      
         1. if an interface is specified in an IPV6_PKTINFO ancillary data
            item, the interface is used.
      
         2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky
            option, the interface is used.
      Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
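The first two rules of the RFC 3542 ordering quoted above can be modelled as a small function. This is only an illustrative sketch, not the kernel code: `select_outgoing_interface` and its parameters are hypothetical names, and an interface index of 0 (or `None`) is taken to mean "not specified".

```python
from typing import Optional


def select_outgoing_interface(ancillary_ifindex: Optional[int],
                              sticky_ifindex: Optional[int]) -> Optional[int]:
    """Model rules 1 and 2 of RFC 3542 section 6.7 (hypothetical helper).

    An index of 0 or None means the interface was not specified.
    """
    # Rule 1: a per-packet IPV6_PKTINFO ancillary data item wins.
    if ancillary_ifindex:
        return ancillary_ifindex
    # Rule 2: otherwise fall back to the IPV6_PKTINFO sticky option.
    if sticky_ifindex:
        return sticky_ifindex
    # Neither was given; later rules (not modelled here) would apply.
    return None
```

For example, `select_outgoing_interface(0, 2)` falls through to the sticky option and returns 2, which is the case this commit fixes.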
  2. 26 Nov, 2008 1 commit
  3. 17 Nov, 2008 1 commit
  4. 03 Nov, 2008 2 commits
    • udp: Fix the SNMP counter of UDP_MIB_INERRORS · 0856f939
      Committed by Wei Yongjun
      udpv6_recvmsg() receives not only IPv6 UDP packets but also IPv4 UDP
      packets, so when updating the UDP_MIB_INERRORS counter in
      udpv6_recvmsg(), we should check whether the packet is an IPv6 UDP
      packet or an IPv4 UDP packet.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: Fix the SNMP counter of UDP_MIB_INDATAGRAMS · f26ba175
      Committed by Wei Yongjun
      If a UDP echo is sent to xinetd/echo-dgram, the UDP reply is received
      at the sender, but the UDP_MIB_INDATAGRAMS SNMP counter is not
      incremented; UDP6_MIB_INDATAGRAMS is incremented instead.
      
        Endpoint A                      Endpoint B
        UDP Echo request ----------->
        (IPv4, Dst port=7)
                         <----------    UDP Echo Reply
                                        (IPv4, Src port=7)
      
      This bug was introduced by patch cb75994e.
      
      That patch does not count UDP[6]_MIB_INDATAGRAMS until
      udp[v6]_recvmsg(). Because xinetd uses an IPv6 socket to receive UDP
      messages, UDP6_MIB_INDATAGRAMS is incremented in udpv6_recvmsg()
      whenever a UDP packet is received, even if it is an IPv4 UDP packet.
      
      This patch fixes the problem.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
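The idea behind both counter fixes, choosing the MIB by the packet's protocol rather than by the receiving socket's family, can be sketched as follows. This is a simplified model, not the kernel code: `account_datagram` is a hypothetical helper, the packet protocol is assumed to be available as an EtherType value, and the `UDP6_MIB_INERRORS` key is an illustrative label formed by analogy with the names in the commit messages.

```python
from collections import Counter

ETH_P_IP = 0x0800    # EtherType for IPv4
ETH_P_IPV6 = 0x86DD  # EtherType for IPv6


def account_datagram(mib: Counter, packet_protocol: int, ok: bool) -> None:
    """Bump the counter that matches the packet, not the socket (sketch).

    An IPv6 socket may receive IPv4 UDP packets, so the packet's own
    protocol decides between the UDP and UDP6 counters.
    """
    prefix = "UDP" if packet_protocol == ETH_P_IP else "UDP6"
    field = "INDATAGRAMS" if ok else "INERRORS"
    mib[f"{prefix}_MIB_{field}"] += 1
```

With this rule, an IPv4 echo reply delivered through an IPv6 socket increments `UDP_MIB_INDATAGRAMS` and leaves `UDP6_MIB_INDATAGRAMS` untouched.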
  5. 02 Nov, 2008 1 commit
  6. 30 Oct, 2008 1 commit
    • udp: introduce sk_for_each_rcu_safenext() · 96631ed1
      Committed by Eric Dumazet
      Corey Minyard found a race added in commit 271b72c7
      (udp: RCU handling for Unicast packets.)
      
       "If the socket is moved from one list to another list in-between the
       time the hash is calculated and the next field is accessed, and the
       socket has moved to the end of the new list, the traversal will not
       complete properly on the list it should have, since the socket will
       be on the end of the new list and there's not a way to tell it's on a
       new list and restart the list traversal.  I think that this can be
       solved by pre-fetching the "next" field (with proper barriers) before
       checking the hash."
      
      This patch corrects this problem, introducing a new
      sk_for_each_rcu_safenext() macro.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
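The "pre-fetch the next field" idea from the quote above can be illustrated with a single-threaded model. This is only a sketch under strong assumptions: there are no memory barriers or real concurrency here, the concurrent move is simulated inside a `visit` callback, and `Node`, `walk_naive`, `walk_safenext`, and `run` are hypothetical names rather than kernel APIs.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


def walk_naive(head, visit):
    node = head
    while node is not None:
        visit(node)
        node = node.next  # read AFTER the node may have been moved


def walk_safenext(head, visit):
    node = head
    while node is not None:
        nxt = node.next   # read BEFORE examining the node
        visit(node)
        node = nxt


def run(walk):
    a, b, c = Node("a"), Node("b"), Node("c")
    a.next, b.next = b, c
    seen = []

    def visit(node):
        seen.append(node.name)
        if node is b:
            # Simulate b being moved to the tail of another chain.
            b.next = None

    walk(a, visit)
    return seen
```

In this model `run(walk_naive)` stops after `b` and never reaches `c`, while `run(walk_safenext)` still visits all three nodes of the original chain.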
  7. 29 Oct, 2008 2 commits
    • udp: RCU handling for Unicast packets. · 271b72c7
      Committed by Eric Dumazet
      Goals are :
      
      1) Optimizing handling of incoming Unicast UDP frames, so that no memory
       writes should happen in the fast path.
      
       Note: Multicasts and broadcasts still will need to take a lock,
       because doing a full lockless lookup in this case is difficult.
      
      2) No expensive operations in the socket bind/unhash phases :
        - No expensive synchronize_rcu() calls.
      
        - No added rcu_head in socket structure, increasing memory needs,
        but more important, forcing us to use call_rcu() calls,
        that have the bad property of making sockets structure cold.
        (rcu grace period between socket freeing and its potential reuse
         make this socket being cold in CPU cache).
        David did a previous patch using call_rcu() and noticed a 20%
        impact on TCP connection rates.
        Quoting Christoph Lameter :
         "Right. That results in cacheline cooldown. You'd want to recycle
          the object as they are cache hot on a per cpu basis. That is screwed
          up by the delayed regular rcu processing. We have seen multiple
          regressions due to cacheline cooldown.
          The only choice in cacheline hot sensitive areas is to deal with the
          complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
      
        - Because udp sockets are allocated from dedicated kmem_cache,
        use of SLAB_DESTROY_BY_RCU can help here.
      
      Theory of operation :
      ---------------------
      
      As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
      special attention must be taken by readers and writers.
      
      Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
      reused, inserted in a different chain or in the worst case in the same
      chain while readers are doing lookups at the same time.
      
      In order to avoid loops, a reader must check that each socket found in a
      chain really belongs to the chain the reader was traversing. If it finds
      a mismatch, the lookup must start again at the beginning. This *restart*
      loop is the reason we had to use rdlock for the multicast case, because
      we don't want to send the same message several times to the same socket.
      
      We use RCU only for fast path.
      Thus, /proc/net/udp still takes spinlocks.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
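The restart rule from the "Theory of operation" section can be sketched as a single-threaded model. It assumes a socket's hash is simply its port and that the chain index is the hash modulo the table size. `Sock`, `lookup`, and the `on_visit` hook (used only to simulate a concurrent writer) are hypothetical and do not mirror the kernel implementation.

```python
class Sock:
    def __init__(self, port):
        self.port = port
        self.hash = port   # assumption: hash is derived from the port
        self.next = None


def lookup(table, port, on_visit=None):
    """Return (socket or None, number of restarts)."""
    slot = port % len(table)
    restarts = 0
    while True:
        sk = table[slot]
        moved = False
        while sk is not None:
            if on_visit is not None:
                on_visit(sk)              # stand-in for a concurrent writer
            if sk.hash % len(table) != slot:
                moved = True              # socket now belongs to another chain
                break
            if sk.port == port:
                return sk, restarts
            sk = sk.next
        if not moved:
            return None, restarts
        restarts += 1                     # start again at the chain head
```

If a socket is recycled into another chain while the reader is standing on it, the hash check fails and the reader restarts from the head instead of following a `next` pointer that now belongs to a different chain.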
    • udp: introduce struct udp_table and multiple spinlocks · 645ca708
      Committed by Eric Dumazet
      UDP sockets are hashed in a 128 slots hash table.
      
      This hash table is protected by *one* rwlock.
      
      This rwlock is readlocked each time an incoming UDP message is handled.
      
      This rwlock is writelocked each time a socket must be inserted in
      hash table (bind time), or deleted from this table (close time)
      
      This is not scalable on SMP machines :
      
      1) Even in read mode, lock() and unlock() are atomic operations and
       must dirty a contended cache line, shared by all cpus.
      
      2) A writer might be starved if many readers are 'in flight'. This can
       happen on a machine with some NIC receiving many UDP messages. User
       process can be delayed a long time at socket creation/dismantle time.
      
      This patch prepares RCU migration, by introducing 'struct udp_table
      and struct udp_hslot', and using one spinlock per chain, to reduce
      contention on central rwlock.
      
      Introducing one spinlock per chain reduces latencies for port
      randomization on heavily loaded UDP servers. This also speeds up
      bindings to specific ports.
      
      udp_lib_unhash() was uninlined, as it had become too big.
      
      Some cleanups were done to ease review of following patch
      (RCUification of UDP Unicast lookups)
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
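The move from one table-wide lock to one lock per chain can be modelled in user space as below. This is a rough sketch rather than the kernel data structure: the class names echo `struct udp_table` and `struct udp_hslot` from the commit message, but the fields, methods, `threading.Lock` in place of a spinlock, and the port-modulo-size hash are all assumptions made for illustration.

```python
import threading

UDP_HTABLE_SIZE = 128  # "128 slots" per the commit message


class UdpHslot:
    """One hash chain with its own lock."""
    def __init__(self):
        self.lock = threading.Lock()
        self.socks = []


class UdpTable:
    def __init__(self, size=UDP_HTABLE_SIZE):
        self.hash = [UdpHslot() for _ in range(size)]

    def slot_for(self, port):
        return self.hash[port % len(self.hash)]

    def insert(self, port, sock):
        slot = self.slot_for(port)
        with slot.lock:               # only this chain is locked
            slot.socks.append((port, sock))

    def remove(self, port, sock):
        slot = self.slot_for(port)
        with slot.lock:
            slot.socks.remove((port, sock))

    def lookup(self, port):
        slot = self.slot_for(port)
        with slot.lock:
            for p, sock in slot.socks:
                if p == port:
                    return sock
        return None
```

Because each chain has its own lock, a bind on one port contends only with operations that hash to the same chain, not with the whole table.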
  8. 08 Oct, 2008 2 commits
  9. 09 Aug, 2008 1 commit
  10. 06 Jul, 2008 2 commits
  11. 18 Jun, 2008 1 commit
    • udp: sk_drops handling · cb61cb9b
      Committed by Eric Dumazet
      In commits 33c732c3 ([IPV4]: Add raw
      drops counter) and a92aa318 ([IPV6]:
      Add raw drops counter), Wang Chen added raw drops counter for
      /proc/net/raw & /proc/net/raw6
      
      This patch adds this capability to UDP sockets too (/proc/net/udp &
      /proc/net/udp6).
      
      This means that the 'RcvbufErrors' errors found in /proc/net/snmp can
      also be examined for each udp socket.
      
      # grep Udp: /proc/net/snmp
      Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
      Udp: 23971006 75 899420 16390693 146348 0
      
      # cat /proc/net/udp
       sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt  ---
      uid  timeout inode ref pointer drops
       75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
        0        0 2358 2 ffff81082a538c80 0
      111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
        0        0 2286 2 ffff81042dd35c80 146348
      
      In this example, only port 111 (0x006F) was flooded by messages that
      user program could not read fast enough. 146348 messages were lost.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
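The per-socket drops column can be read with a few lines of Python. This is a minimal sketch that assumes the column layout shown in the commit message, with each wrapped row (marked `---` above) joined back into one line. `parse_udp_line` and `SAMPLE` are illustrative names.

```python
def parse_udp_line(line):
    """Parse one data row of /proc/net/udp (layout as in the commit message)."""
    fields = line.split()
    # fields[1] is "local_address:port", both in hexadecimal.
    _addr, port_hex = fields[1].split(":")
    return {
        "slot": int(fields[0].rstrip(":")),
        "local_port": int(port_hex, 16),
        "drops": int(fields[12]),   # last column added by this patch
    }


# The two rows from the commit message, with the wrapped halves rejoined.
SAMPLE = [
    " 75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 "
    "00000000 0 0 2358 2 ffff81082a538c80 0",
    "111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 "
    "00000000 0 0 2286 2 ffff81042dd35c80 146348",
]

rows = [parse_udp_line(line) for line in SAMPLE]
```

On a live system the same function could be applied to every line of `/proc/net/udp` after the header, provided the layout matches.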
  12. 17 Jun, 2008 3 commits
  13. 15 Jun, 2008 1 commit
  14. 12 Jun, 2008 1 commit
  15. 05 Jun, 2008 3 commits
  16. 12 Apr, 2008 1 commit
    • [IPv6]: Change IPv6 unspecified destination address to ::1 for raw and un-connected sockets · 876c7f41
      Committed by Brian Haley
      This patch fixes a difference between IPv4 and IPv6 when sending packets
      to the unspecified address (either 0.0.0.0 or ::) when using raw or
      un-connected UDP sockets.  There are two cases where IPv6 either fails
      to send anything, or sends with the destination address set to ::.  For
      example:
      
      --> ping -c1 0.0.0.0
      PING 0.0.0.0 (127.0.0.1) 56(84) bytes of data.
      64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.032 ms
      
      --> ping6 -c1 ::
      PING ::(::) 56 data bytes
      ping: sendmsg: Invalid argument
      
      Doing a sendto("0.0.0.0") reveals:
      
      10:55:01.495090 IP localhost.32780 > localhost.7639: UDP, length 100
      
      Doing a sendto("::") reveals:
      
      10:56:13.262478 IP6 fe80::217:8ff:fe7d:4718.32779 > ::.7639: UDP, length 100
      
      If you issue a connect() first in the UDP case, it will be sent to ::1,
      similar to what happens with TCP.
      
      This restores the BSD-ism.
      Signed-off-by: Brian Haley <brian.haley@hp.com>
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
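The behaviour this patch describes, treating an unspecified destination as loopback, can be modelled with the standard `ipaddress` module. This is a user-space sketch of the rule only, not the kernel change. `effective_destination` is a hypothetical helper name.

```python
import ipaddress


def effective_destination(dest: str) -> str:
    """Map the unspecified address to loopback, as the commit describes."""
    addr = ipaddress.ip_address(dest)
    if addr.is_unspecified:
        # 0.0.0.0 -> 127.0.0.1 (existing IPv4 behaviour), :: -> ::1 (this patch)
        return "127.0.0.1" if addr.version == 4 else "::1"
    return dest
```

Under this rule a `sendto()` aimed at `::` would behave like one aimed at `::1`, matching the IPv4 output shown in the `ping` example.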
  17. 29 Mar, 2008 4 commits
  18. 26 Mar, 2008 3 commits
  19. 25 Mar, 2008 1 commit
  20. 23 Mar, 2008 1 commit
  21. 21 Mar, 2008 1 commit
  22. 08 Mar, 2008 1 commit
  23. 07 Mar, 2008 1 commit
    • [UDP]: Revert udplite and code split. · db8dac20
      Committed by David S. Miller
      This reverts commit db1ed684 ("[IPV6]
      UDP: Rename IPv6 UDP files."), commit
      8be8af8f ("[IPV4] UDP: Move
      IPv4-specific bits to other file.") and commit
      e898d4db ("[UDP]: Allow users to
      configure UDP-Lite.").
      
      First, udplite is of such small cost, and it is a core protocol just
      like TCP and normal UDP are.
      
      We spent enormous amounts of effort to make udplite share as much code
      with core UDP as possible.  All of that work is less valuable if we're
      just going to slap a config option on udplite support.
      
      It is also causing build failures, as reported on linux-next, showing
      that the changeset was not tested very well.  In fact, this is the
      second build failure resulting from the udplite change.
      
      Finally, the config option provided was a bool, instead of a modular
      option.  Meaning the udplite code does not even get build tested
      by allmodconfig builds, and furthermore the user is not presented
      with a reasonable modular build option which is particularly needed
      by distribution vendors.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 04 Mar, 2008 2 commits
  25. 01 Feb, 2008 1 commit
  26. 29 Jan, 2008 1 commit