1. 20 6月, 2006 1 次提交
  2. 18 6月, 2006 1 次提交
    • H
      [NET]: Add netif_tx_lock · 932ff279
      Herbert Xu 提交于
      Various drivers use xmit_lock internally to synchronise with their
      transmission routines.  They do so without setting xmit_lock_owner.
      This is fine as long as netpoll is not in use.
      
      With netpoll it is possible for deadlocks to occur if xmit_lock_owner
      isn't set.  This is because if a printk occurs while xmit_lock is held
      and xmit_lock_owner is not set can cause netpoll to attempt to take
      xmit_lock recursively.
      
      While it is possible to resolve this by getting netpoll to use
      trylock, it is suboptimal because netpoll's sole objective is to
      maximise the chance of getting the printk out on the wire.  So
      delaying or dropping the message is to be avoided as much as possible.
      
      So the only alternative is to always set xmit_lock_owner.  The
      following patch does this by introducing the netif_tx_lock family of
      functions that take care of setting/unsetting xmit_lock_owner.
      
      I renamed xmit_lock to _xmit_lock to indicate that it should not be
      used directly.  I didn't provide irq versions of the netif_tx_lock
      functions since xmit_lock is meant to be a BH-disabling lock.
      
      This is pretty much a straight text substitution except for a small
      bug fix in winbond.  It currently uses
      netif_stop_queue/spin_unlock_wait to stop transmission.  This is
      unsafe as an IRQ can potentially wake up the queue.  So it is safer to
      use netif_tx_disable.
      
      The hamradio bits used spin_lock_irq but it is unnecessary as
      xmit_lock must never be taken in an IRQ handler.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      932ff279
  3. 13 5月, 2006 1 次提交
    • S
      [NEIGH]: Fix IP-over-ATM and ARP interaction. · bd89efc5
      Simon Kelley 提交于
      The classical IP over ATM code maintains its own IPv4 <-> <ATM stuff>
      ARP table, using the standard neighbour-table code. The
      neigh_table_init function adds this neighbour table to a linked list
      of all neighbor tables which is used by the functions neigh_delete()
      neigh_add() and neightbl_set(), all called by the netlink code.
      
      Once the ATM neighbour table is added to the list, there are two
      tables with family == AF_INET there, and ARP entries sent via netlink
      go into the first table with matching family. This is indeterminate
      and often wrong.
      
      To see the bug, on a kernel with CLIP enabled, create a standard IPv4
      ARP entry by pinging an unused address on a local subnet. Then attempt
      to complete that entry by doing
      
      ip neigh replace <ip address> lladdr <some mac address> nud reachable
      
      Looking at the ARP tables by using 
      
      ip neigh show
      
      will reveal two ARP entries for the same address. One of these can be
      found in /proc/net/arp, and the other in /proc/net/atm/arp.
      
      This patch adds a new function, neigh_table_init_no_netlink() which
      does everything the neigh_table_init() does, except add the table to
      the netlink all-arp-tables chain. In addition neigh_table_init() has a
      check that all tables on the chain have a distinct address family.
      The init call in clip.c is changed to call
      neigh_table_init_no_netlink().
      
      Since ATM ARP tables are rather more complicated than can currently be
      handled by the available rtattrs in the netlink protocol, no
      functionality is lost by this patch, and non-ATM ARP manipulation via
      netlink is rescued. A more complete solution would involve a rtattr
      for ATM ARP entries and some way for the netlink code to give
      neigh_add and friends more information than just address family with
      which to find the correct ARP table.
      
      [ I've changed the assertion checking in neigh_table_init() to not
        use BUG_ON() while holding neigh_tbl_lock.  Instead we remember that
        we found an existing tbl with the same family, and after dropping
        the lock we'll give a diagnostic kernel log message and a stack dump.
        -DaveM ]
      Signed-off-by: NSimon Kelley <simon@thekelleys.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd89efc5
  4. 15 4月, 2006 6 次提交
  5. 21 3月, 2006 3 次提交
  6. 05 3月, 2006 1 次提交
  7. 14 2月, 2006 1 次提交
  8. 12 1月, 2006 2 次提交
  9. 11 1月, 2006 1 次提交
  10. 07 1月, 2006 1 次提交
  11. 04 1月, 2006 1 次提交
    • E
      [NET]: move struct proto_ops to const · 90ddc4f0
      Eric Dumazet 提交于
      I noticed that some of 'struct proto_ops' used in the kernel may share
      a cache line used by locks or other heavily modified data. (default
      linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
      least)
      
      This patch makes sure a 'struct proto_ops' can be declared as const,
      so that all cpus can share all parts of it without false sharing.
      
      This is not mandatory : a driver can still use a read/write structure
      if it needs to (and eventually a __read_mostly)
      
      I made a global stubstitute to change all existing occurences to make
      them const.
      
      This should reduce the possibility of false sharing on SMP, and
      speedup some socket system calls.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90ddc4f0
  12. 30 11月, 2005 5 次提交
  13. 09 10月, 2005 1 次提交
  14. 08 10月, 2005 1 次提交
  15. 07 10月, 2005 1 次提交
  16. 05 10月, 2005 1 次提交
  17. 04 10月, 2005 2 次提交
    • H
      [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl · e5ed6399
      Herbert Xu 提交于
      The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
      introduces __in_dev_get_rcu() to cover the second case.
      
      1) RCU with refcnt should use in_dev_get().
      2) RCU without refcnt should use __in_dev_get_rcu().
      3) All others must hold RTNL and use __in_dev_get_rtnl().
      
      There is one exception in net/ipv4/route.c which is in fact a pre-existing
      race condition.  I've marked it as such so that we remember to fix it.
      
      This patch is based on suggestions and prior work by Suzanne Wood and
      Paul McKenney.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5ed6399
    • E
      [INET]: speedup inet (tcp/dccp) lookups · 81c3d547
      Eric Dumazet 提交于
      Arnaldo and I agreed it could be applied now, because I have other
      pending patches depending on this one (Thank you Arnaldo)
      
      (The other important patch moves skc_refcnt in a separate cache line,
      so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)
      
      1) First some performance data :
      --------------------------------
      
      tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()
      
      The most time critical code is :
      
      sk_for_each(sk, node, &head->chain) {
           if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
               goto hit; /* You sunk my battleship! */
      }
      
      The sk_for_each() does use prefetch() hints but only the begining of
      "struct sock" is prefetched.
      
      As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
      away from the begining of "struct sock", it has to bring into CPU
      cache cold cache line. Each iteration has to use at least 2 cache
      lines.
      
      This can be problematic if some chains are very long.
      
      2) The goal
      -----------
      
      The idea I had is to change things so that INET_MATCH() may return
      FALSE in 99% of cases only using the data already in the CPU cache,
      using one cache line per iteration.
      
      3) Description of the patch
      ---------------------------
      
      Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
      filling a 32 bits hole on 64 bits platform.
      
      struct sock_common {
      	unsigned short		skc_family;
      	volatile unsigned char	skc_state;
      	unsigned char		skc_reuse;
      	int			skc_bound_dev_if;
      	struct hlist_node	skc_node;
      	struct hlist_node	skc_bind_node;
      	atomic_t		skc_refcnt;
      +	unsigned int		skc_hash;
      	struct proto		*skc_prot;
      };
      
      Store in this 32 bits field the full hash, not masked by (ehash_size -
      1) Using this full hash as the first comparison done in INET_MATCH
      permits us immediatly skip the element without touching a second cache
      line in case of a miss.
      
      Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
      sk_hash and tw_hash) already contains the slot number if we mask with
      (ehash_size - 1)
      
      File include/net/inet_hashtables.h
      
      64 bits platforms :
      #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
           (((__sk)->sk_hash == (__hash))
           ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie))   &&  \
           ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
           (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
      
      32bits platforms:
      #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
           (((__sk)->sk_hash == (__hash))                 &&  \
           (inet_sk(__sk)->daddr          == (__saddr))   &&  \
           (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
           (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
      
      
      - Adds a prefetch(head->chain.first) in 
      __inet_lookup_established()/__tcp_v4_check_established() and 
      __inet6_lookup_established()/__tcp_v6_check_established() and 
      __dccp_v4_check_established() to bring into cache the first element of the 
      list, before the {read|write}_lock(&head->lock);
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Acked-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81c3d547
  18. 30 9月, 2005 2 次提交
  19. 29 9月, 2005 3 次提交
  20. 10 9月, 2005 1 次提交
  21. 06 9月, 2005 1 次提交
  22. 30 8月, 2005 1 次提交
  23. 20 7月, 2005 2 次提交