1. 18 2月, 2009 1 次提交
    • D
      net: Kill skb_truesize_check(), it only catches false-positives. · 92a0acce
      David S. Miller 提交于
      A long time ago we had bugs, primarily in TCP, where we would modify
      skb->truesize (for TSO queue collapsing) in ways which would corrupt
      the socket memory accounting.
      
      skb_truesize_check() was added in order to try and catch this error
      more systematically.
      
      However this debugging check has morphed into a Frankenstein of sorts
      and these days it does nothing other than catch false-positives.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92a0acce
  2. 13 2月, 2009 1 次提交
    • A
      net: don't use in_atomic() in gfp_any() · 99709372
      Andrew Morton 提交于
      The problem is that in_atomic() will return false inside spinlocks if
      CONFIG_PREEMPT=n.  This will lead to deadlockable GFP_KERNEL allocations
      from spinlocked regions.
      
      Secondly, if CONFIG_PREEMPT=y, this bug solves itself because networking
      will instead use GFP_ATOMIC from this callsite.  Hence we won't get the
      might_sleep() debugging warnings which would have informed us of the buggy
      callsites.
      
      Solve both these problems by switching to in_interrupt().  Now, if someone
      runs a gfp_any() allocation from inside spinlock we will get the warning
      if CONFIG_PREEMPT=y.
      
      I reviewed all callsites and most of them were too complex for my little
      brain and none of them documented their interface requirements.  I have no
      idea what this patch will do.
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99709372
  3. 26 11月, 2008 2 次提交
  4. 17 11月, 2008 1 次提交
  5. 14 11月, 2008 1 次提交
    • I
      lockdep: include/linux/lockdep.h - fix warning in net/bluetooth/af_bluetooth.c · e8f6fbf6
      Ingo Molnar 提交于
      fix this warning:
      
        net/bluetooth/af_bluetooth.c:60: warning: ‘bt_key_strings’ defined but not used
        net/bluetooth/af_bluetooth.c:71: warning: ‘bt_slock_key_strings’ defined but not used
      
      this is a lockdep macro problem in the !LOCKDEP case.
      
      We cannot convert it to an inline because the macro works on multiple types,
      but we can mark the parameter used.
      
      [ also clean up a misaligned tab in sock_lock_init_class_and_name() ]
      
      [ also remove #ifdefs from around af_family_clock_key strings - which
        were certainly added to get rid of the ugly build warnings. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8f6fbf6
  6. 13 11月, 2008 1 次提交
  7. 12 11月, 2008 1 次提交
    • I
      lockdep: include/linux/lockdep.h - fix warning in net/bluetooth/af_bluetooth.c · e25cf3db
      Ingo Molnar 提交于
      fix this warning:
      
        net/bluetooth/af_bluetooth.c:60: warning: ‘bt_key_strings’ defined but not used
        net/bluetooth/af_bluetooth.c:71: warning: ‘bt_slock_key_strings’ defined but not used
      
      this is a lockdep macro problem in the !LOCKDEP case.
      
      We cannot convert it to an inline because the macro works on multiple types,
      but we can mark the parameter used.
      
      [ also clean up a misaligned tab in sock_lock_init_class_and_name() ]
      
      [ also remove #ifdefs from around af_family_clock_key strings - which
        were certainly added to get rid of the ugly build warnings. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e25cf3db
  8. 05 11月, 2008 1 次提交
  9. 31 10月, 2008 1 次提交
    • R
      net: delete excess kernel-doc notation · ad1d967c
      Randy Dunlap 提交于
      Remove excess kernel-doc function parameters from networking header
      & driver files:
      
      Warning(include/net/sock.h:946): Excess function parameter or struct member 'sk' description in 'sk_filter_release'
      Warning(include/linux/netdevice.h:1545): Excess function parameter or struct member 'cpu' description in 'netif_tx_lock'
      Warning(drivers/net/wan/z85230.c:712): Excess function parameter or struct member 'regs' description in 'z8530_interrupt'
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad1d967c
  10. 30 10月, 2008 1 次提交
    • E
      udp: introduce sk_for_each_rcu_safenext() · 96631ed1
      Eric Dumazet 提交于
      Corey Minyard found a race added in commit 271b72c7
      (udp: RCU handling for Unicast packets.)
      
       "If the socket is moved from one list to another list in-between the
       time the hash is calculated and the next field is accessed, and the
       socket has moved to the end of the new list, the traversal will not
       complete properly on the list it should have, since the socket will
       be on the end of the new list and there's not a way to tell it's on a
       new list and restart the list traversal.  I think that this can be
       solved by pre-fetching the "next" field (with proper barriers) before
       checking the hash."
      
      This patch corrects this problem, introducing a new
      sk_for_each_rcu_safenext() macro.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96631ed1
  11. 29 10月, 2008 3 次提交
    • E
      udp: RCU handling for Unicast packets. · 271b72c7
      Eric Dumazet 提交于
      Goals are :
      
      1) Optimizing handling of incoming Unicast UDP frames, so that no memory
       writes should happen in the fast path.
      
       Note: Multicasts and broadcasts still will need to take a lock,
       because doing a full lockless lookup in this case is difficult.
      
      2) No expensive operations in the socket bind/unhash phases :
        - No expensive synchronize_rcu() calls.
      
        - No added rcu_head in socket structure, increasing memory needs,
        but more important, forcing us to use call_rcu() calls,
        that have the bad property of making sockets structure cold.
        (rcu grace period between socket freeing and its potential reuse
         make this socket being cold in CPU cache).
        David did a previous patch using call_rcu() and noticed a 20%
        impact on TCP connection rates.
        Quoting Cristopher Lameter :
         "Right. That results in cacheline cooldown. You'd want to recycle
          the object as they are cache hot on a per cpu basis. That is screwed
          up by the delayed regular rcu processing. We have seen multiple
          regressions due to cacheline cooldown.
          The only choice in cacheline hot sensitive areas is to deal with the
          complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
      
        - Because udp sockets are allocated from dedicated kmem_cache,
        use of SLAB_DESTROY_BY_RCU can help here.
      
      Theory of operation :
      ---------------------
      
      As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
      special attention must be taken by readers and writers.
      
      Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
      reused, inserted in a different chain or in worst case in the same chain
      while readers could do lookups in the same time.
      
      In order to avoid loops, a reader must check each socket found in a chain
      really belongs to the chain the reader was traversing. If it finds a
      mismatch, lookup must start again at the begining. This *restart* loop
      is the reason we had to use rdlock for the multicast case, because
      we dont want to send same message several times to the same socket.
      
      We use RCU only for fast path.
      Thus, /proc/net/udp still takes spinlocks.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      271b72c7
    • E
      udp: introduce struct udp_table and multiple spinlocks · 645ca708
      Eric Dumazet 提交于
      UDP sockets are hashed in a 128 slots hash table.
      
      This hash table is protected by *one* rwlock.
      
      This rwlock is readlocked each time an incoming UDP message is handled.
      
      This rwlock is writelocked each time a socket must be inserted in
      hash table (bind time), or deleted from this table (close time)
      
      This is not scalable on SMP machines :
      
      1) Even in read mode, lock() and unlock() are atomic operations and
       must dirty a contended cache line, shared by all cpus.
      
      2) A writer might be starved if many readers are 'in flight'. This can
       happen on a machine with some NIC receiving many UDP messages. User
       process can be delayed a long time at socket creation/dismantle time.
      
      This patch prepares RCU migration, by introducing 'struct udp_table
      and struct udp_hslot', and using one spinlock per chain, to reduce
      contention on central rwlock.
      
      Introducing one spinlock per chain reduces latencies, for port
      randomization on heavily loaded UDP servers. This also speedup
      bindings to specific ports.
      
      udp_lib_unhash() was uninlined, becoming to big.
      
      Some cleanups were done to ease review of following patch
      (RCUification of UDP Unicast lookups)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      645ca708
    • A
      net: reduce structures when XFRM=n · def8b4fa
      Alexey Dobriyan 提交于
      ifdef out
      * struct sk_buff::sp		(pointer)
      * struct dst_entry::xfrm	(pointer)
      * struct sock::sk_policy	(2 pointers)
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      def8b4fa
  12. 08 10月, 2008 2 次提交
  13. 28 8月, 2008 1 次提交
  14. 17 7月, 2008 1 次提交
  15. 18 6月, 2008 2 次提交
    • D
      net: Add sk_set_socket() helper. · 972692e0
      David S. Miller 提交于
      In order to more easily grep for all things that set
      sk->sk_socket, add sk_set_socket() helper inline function.
      
      Suggested (although only half-seriously) by Evgeniy Polyakov.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      972692e0
    • E
      udp: sk_drops handling · cb61cb9b
      Eric Dumazet 提交于
      In commits 33c732c3 ([IPV4]: Add raw
      drops counter) and a92aa318 ([IPV6]:
      Add raw drops counter), Wang Chen added raw drops counter for
      /proc/net/raw & /proc/net/raw6
      
      This patch adds this capability to UDP sockets too (/proc/net/udp &
      /proc/net/udp6).
      
      This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also
      be examined for each udp socket.
      
      # grep Udp: /proc/net/snmp
      Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
      Udp: 23971006 75 899420 16390693 146348 0
      
      # cat /proc/net/udp
       sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt  ---
      uid  timeout inode ref pointer drops
       75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
        0        0 2358 2 ffff81082a538c80 0
      111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
        0        0 2286 2 ffff81042dd35c80 146348
      
      In this example, only port 111 (0x006F) was flooded by messages that
      user program could not read fast enough. 146348 messages were lost.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb61cb9b
  16. 17 6月, 2008 1 次提交
  17. 15 6月, 2008 1 次提交
  18. 16 4月, 2008 1 次提交
  19. 10 4月, 2008 1 次提交
  20. 01 4月, 2008 1 次提交
  21. 29 3月, 2008 4 次提交
  22. 26 3月, 2008 2 次提交
  23. 23 3月, 2008 2 次提交
  24. 22 3月, 2008 1 次提交
  25. 21 3月, 2008 1 次提交
    • P
      [NET]: Add per-connection option to set max TSO frame size · 82cc1a7a
      Peter P Waskiewicz Jr 提交于
      Update: My mailer ate one of Jarek's feedback mails...  Fixed the
      parameter in netif_set_gso_max_size() to be u32, not u16.  Fixed the
      whitespace issue due to a patch import botch.  Changed the types from
      u32 to unsigned int to be more consistent with other variables in the
      area.  Also brought the patch up to the latest net-2.6.26 tree.
      
      Update: Made gso_max_size container 32 bits, not 16.  Moved the
      location of gso_max_size within netdev to be less hotpath.  Made more
      consistent names between the sock and netdev layers, and added a
      define for the max GSO size.
      
      Update: Respun for net-2.6.26 tree.
      
      Update: changed max_gso_frame_size and sk_gso_max_size from signed to
      unsigned - thanks Stephen!
      
      This patch adds the ability for device drivers to control the size of
      the TSO frames being sent to them, per TCP connection.  By setting the
      netdevice's gso_max_size value, the socket layer will set the GSO
      frame size based on that value.  This will propogate into the TCP
      layer, and send TSO's of that size to the hardware.
      
      This can be desirable to help tune the bursty nature of TSO on a
      per-adapter basis, where one may have 1 GbE and 10 GbE devices
      coexisting in a system, one running multiqueue and the other not, etc.
      
      This can also be desirable for devices that cannot support full 64 KB
      TSO's, but still want to benefit from some level of segmentation
      offloading.
      Signed-off-by: NPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82cc1a7a
  26. 01 3月, 2008 1 次提交
  27. 19 2月, 2008 1 次提交
  28. 03 2月, 2008 1 次提交
    • A
      [SOCK] proto: Add hashinfo member to struct proto · ab1e0a13
      Arnaldo Carvalho de Melo 提交于
      This way we can remove TCP and DCCP specific versions of
      
      sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
      sk->sk_prot->hash:     inet_hash is directly used, only v6 need
                             a specific version to deal with mapped sockets
      sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly
      
      struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
      that inet_csk_get_port can find the per family routine.
      
      Now only the lookup routines receive as a parameter a struct inet_hashtable.
      
      With this we further reuse code, reducing the difference among INET transport
      protocols.
      
      Eventually work has to be done on UDP and SCTP to make them share this
      infrastructure and get as a bonus inet_diag interfaces so that iproute can be
      used with these protocols.
      
      net-2.6/net/ipv4/inet_hashtables.c:
        struct proto			     |   +8
        struct inet_connection_sock_af_ops |   +8
       2 structs changed
        __inet_hash_nolisten               |  +18
        __inet_hash                        | -210
        inet_put_port                      |   +8
        inet_bind_bucket_create            |   +1
        __inet_hash_connect                |   -8
       5 functions changed, 27 bytes added, 218 bytes removed, diff: -191
      
      net-2.6/net/core/sock.c:
        proto_seq_show                     |   +3
       1 function changed, 3 bytes added, diff: +3
      
      net-2.6/net/ipv4/inet_connection_sock.c:
        inet_csk_get_port                  |  +15
       1 function changed, 15 bytes added, diff: +15
      
      net-2.6/net/ipv4/tcp.c:
        tcp_set_state                      |   -7
       1 function changed, 7 bytes removed, diff: -7
      
      net-2.6/net/ipv4/tcp_ipv4.c:
        tcp_v4_get_port                    |  -31
        tcp_v4_hash                        |  -48
        tcp_v4_destroy_sock                |   -7
        tcp_v4_syn_recv_sock               |   -2
        tcp_unhash                         | -179
       5 functions changed, 267 bytes removed, diff: -267
      
      net-2.6/net/ipv6/inet6_hashtables.c:
        __inet6_hash |   +8
       1 function changed, 8 bytes added, diff: +8
      
      net-2.6/net/ipv4/inet_hashtables.c:
        inet_unhash                        | +190
        inet_hash                          | +242
       2 functions changed, 432 bytes added, diff: +432
      
      vmlinux:
       16 functions changed, 485 bytes added, 492 bytes removed, diff: -7
      
      /home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
        tcp_v6_get_port                    |  -31
        tcp_v6_hash                        |   -7
        tcp_v6_syn_recv_sock               |   -9
       3 functions changed, 47 bytes removed, diff: -47
      
      /home/acme/git/net-2.6/net/dccp/proto.c:
        dccp_destroy_sock                  |   -7
        dccp_unhash                        | -179
        dccp_hash                          |  -49
        dccp_set_state                     |   -7
        dccp_done                          |   +1
       5 functions changed, 1 bytes added, 242 bytes removed, diff: -241
      
      /home/acme/git/net-2.6/net/dccp/ipv4.c:
        dccp_v4_get_port                   |  -31
        dccp_v4_request_recv_sock          |   -2
       2 functions changed, 33 bytes removed, diff: -33
      
      /home/acme/git/net-2.6/net/dccp/ipv6.c:
        dccp_v6_get_port                   |  -31
        dccp_v6_hash                       |   -7
        dccp_v6_request_recv_sock          |   +5
       3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab1e0a13
  29. 01 2月, 2008 1 次提交
  30. 29 1月, 2008 1 次提交