1. 14 Nov, 2011: 1 commit
    • neigh: new unresolved queue limits · 8b5c171b
      Committed by Eric Dumazet
      On Wednesday 09 November 2011 at 16:21 -0500, David Miller wrote:
      > From: David Miller <davem@davemloft.net>
      > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
      >
      > > From: Eric Dumazet <eric.dumazet@gmail.com>
      > > Date: Wed, 09 Nov 2011 12:14:09 +0100
      > >
      > >> unres_qlen is the number of frames we are able to queue per unresolved
      > >> neighbour. Its default value (3) was never changed and is responsible
      > >> for strange drops, especially if IP fragments are used, or multiple
      > >> sessions start in parallel. Even a single tcp flow can hit this limit.
      > >  ...
      > >
      > > Ok, I've applied this, let's see what happens :-)
      >
      > Early answer, build fails.
      >
      > Please test build this patch with DECNET enabled and resubmit.  The
      > decnet neigh layer still refers to the removed ->queue_len member.
      >
      > Thanks.
      
      Ouch, this was fixed on one machine yesterday, but not the other one I
      used this morning, sorry.
      
      [PATCH V5 net-next] neigh: new unresolved queue limits
      
      unres_qlen is the number of frames we are able to queue per unresolved
      neighbour. Its default value (3) was never changed and is responsible
      for strange drops, especially if IP fragments are used, or multiple
      sessions start in parallel. Even a single tcp flow can hit this limit.
      
      $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
      PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
      8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
      Signed-off-by: David S. Miller <davem@davemloft.net>
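
      A small calculation makes the fragmentation case concrete. The sketch below assumes a 1500-byte MTU and a 20-byte IPv4 header without options. The patch states neither. The 8000-byte ping carries 8008 bytes of IP payload (8000 bytes of data plus an 8-byte ICMP header). Each fragment holds at most 1480 bytes of it, so a single echo request becomes 6 fragments, twice the default unres_qlen of 3. The helper names are hypothetical.

      ```c
      #include <assert.h>

      /* Assumed values for illustration: 1500-byte MTU, IPv4 header
       * without options, 8-byte ICMP echo header, default unres_qlen. */
      enum { MTU = 1500, IPV4_HDR = 20, ICMP_HDR = 8, DEFAULT_UNRES_QLEN = 3 };

      /* Number of IPv4 fragments needed to carry ip_payload bytes. */
      unsigned frag_count(unsigned ip_payload, unsigned mtu)
      {
          /* Every fragment except the last carries a multiple of 8 bytes. */
          unsigned per_frag = ((mtu - IPV4_HDR) / 8) * 8;
          return (ip_payload + per_frag - 1) / per_frag;
      }

      /* Does one ping of this size overflow the per-neighbour queue? */
      int ping_overflows_unres_queue(unsigned ping_size)
      {
          return frag_count(ping_size + ICMP_HDR, MTU) > DEFAULT_UNRES_QLEN;
      }
      ```

      Under the same assumptions, `ping -s 1472` still fits in a single frame. Only pings larger than that start competing for the three queue slots while the neighbour is unresolved.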
  2. 27 Jul, 2011: 1 commit
  3. 18 Jul, 2011: 1 commit
  4. 17 Jul, 2011: 4 commits
  5. 14 Jul, 2011: 1 commit
    • net: Embed hh_cache inside of struct neighbour. · f6b72b62
      Committed by David S. Miller
      Now that there is a one-to-one correspondence between neighbour
      and hh_cache entries, we no longer need:
      
      1) dynamic allocation
      2) attachment to dst->hh
      3) refcounting
      
      Initialization of the hh_cache entry is indicated by hh_len
      being non-zero, and such initialization is always done with
      the neighbour's lock held as a writer.
      Signed-off-by: David S. Miller <davem@davemloft.net>
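
      Here is a minimal userspace sketch of the embedded layout. A POSIX rwlock stands in for the neighbour's lock. The struct names, the 16-byte header buffer and both helpers are hypothetical. The sketch follows two rules from the message: hh_len == 0 means "not initialized", and the header is only written with the lock held as a writer. hh_len is re-checked under that lock in case another writer got there first.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <assert.h>
      #include <pthread.h>
      #include <string.h>

      struct hh_cache_sketch {
          unsigned int hh_len;          /* 0 means "not initialized yet" */
          unsigned char hh_data[16];    /* cached link-layer header */
      };

      struct neigh_sketch {
          pthread_rwlock_t lock;
          struct hh_cache_sketch hh;    /* embedded: no malloc, no refcount */
      };

      /* Initialize the cached header once, with the lock held as a writer. */
      int neigh_hh_init(struct neigh_sketch *n, const unsigned char *hdr,
                        unsigned int len)
      {
          if (len == 0 || len > sizeof n->hh.hh_data)
              return -1;
          pthread_rwlock_wrlock(&n->lock);
          if (n->hh.hh_len == 0) {      /* another writer may have won */
              memcpy(n->hh.hh_data, hdr, len);
              n->hh.hh_len = len;       /* non-zero marks it as usable */
          }
          pthread_rwlock_unlock(&n->lock);
          return 0;
      }

      int neigh_hh_ready(struct neigh_sketch *n)
      {
          pthread_rwlock_rdlock(&n->lock);
          int ready = n->hh.hh_len != 0;
          pthread_rwlock_unlock(&n->lock);
          return ready;
      }
      ```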
  6. 11 Jul, 2011: 1 commit
    • neigh: Store hash shift instead of mask. · cd089336
      Committed by David S. Miller
      And mask the hash function result by simply shifting
      down the "->hash_shift" most significant bits.
      
      Currently which bits we use is arbitrary since jhash
      produces entropy evenly across the whole hash function
      result.
      
      But soon we'll be using universal hashing functions,
      and in those cases more entropy exists in the higher
      bits than the lower bits, because they use multiplies.
      Signed-off-by: David S. Miller <davem@davemloft.net>
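
      The difference fits in a few lines. This sketch uses a multiplicative (Fibonacci) hash with the common golden-ratio constant 0x9E3779B9. That hash is an assumption standing in for the universal hashing mentioned above. It shows why the high bits matter. For keys that are multiples of 8, the product's low 3 bits are always zero, so masking puts every such key in bucket 0. Shifting down the top bits still spreads them.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Stand-in multiplicative hash (assumption, not the kernel's). */
      uint32_t mult_hash(uint32_t key)
      {
          return key * 0x9E3779B9u;      /* wraps modulo 2^32 */
      }

      /* Old scheme: keep the low bits, hash_mask = nbuckets - 1. */
      uint32_t bucket_by_mask(uint32_t hash, uint32_t hash_mask)
      {
          return hash & hash_mask;
      }

      /* New scheme: keep the hash_shift most significant bits.
       * Valid for 1 <= hash_shift <= 32; a shift by 32 would be UB. */
      uint32_t bucket_by_shift(uint32_t hash, unsigned int hash_shift)
      {
          return hash >> (32 - hash_shift);
      }
      ```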
  7. 19 Nov, 2010: 1 commit
  8. 12 Nov, 2010: 1 commit
  9. 12 Oct, 2010: 2 commits
    • neigh: reorder struct neighbour fields · e37ef961
      Committed by Eric Dumazet
      On Tuesday 12 October 2010 at 00:02 +0200, Eric Dumazet wrote:
      > Here is the followup patch.
      >
      > Thanks !
      >
      
      Oops, this was an old version; the up-to-date one also took care of the
      "used" field.
      
      I guess it's time for a sleep, sorry again.
      
      [PATCH net-next V2] neigh: reorder struct neighbour fields
      
      (refcnt) and (ha_lock, ha, used, dev, output, ops, primary_key) should
      be placed on separate cache lines.
      
      refcnt is written often, while the other fields are mostly read.
      
      This gave me good results on a stress test:
      
      Before:
      
      real    0m45.570s
      user    0m15.525s
      sys     9m56.669s
      
      After:
      
      real    0m41.841s
      user    0m15.261s
      sys     8m45.949s
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
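
      The idea can be sketched with C11 _Alignas, assuming 64-byte cache lines. The field names follow the message. Their types, the struct name and the exact grouping are assumptions; this is not the real struct neighbour. refcnt sits alone on the first line. Atomic increments from many CPUs then no longer invalidate the line that readers of ha[], dev, output and the other fields depend on.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stddef.h>

      #define CACHE_LINE 64                /* assumed cache-line size */

      struct net_device;                   /* opaque stand-ins */
      struct neigh_ops;
      struct sk_buff;

      struct neigh_layout {
          /* Write-hot: updated on every reference get/put. */
          _Alignas(CACHE_LINE) atomic_int refcnt;

          /* Read-mostly: start a fresh cache line. */
          _Alignas(CACHE_LINE) unsigned int ha_lock;
          unsigned char ha[8];
          unsigned long used;
          struct net_device *dev;
          int (*output)(struct sk_buff *skb);
          const struct neigh_ops *ops;
          unsigned char primary_key[4];
      };

      /* Cache-line index of a member, relative to a line-aligned object. */
      #define LINE_OF(member) (offsetof(struct neigh_layout, member) / CACHE_LINE)

      _Static_assert(LINE_OF(refcnt) != LINE_OF(ha_lock),
                     "refcnt must not share a line with read-mostly fields");
      ```

      The cost is padding. refcnt is only a few bytes but occupies a whole line. Heap objects must also come from an allocator that honours the alignment, such as C11 aligned_alloc.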
    • neigh: Protect neigh->ha[] with a seqlock · 0ed8ddf4
      Committed by Eric Dumazet
      Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
      dirtying the neighbour in stress situations (many different flows / dsts).
      
      Dirtying takes place because of read_lock(&n->lock) and n->used writes.
      
      Switching to a seqlock, and writing n->used only when jiffies changes,
      reduces dirtying.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
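
      Here is a rough C11 sketch of both ideas, under stated assumptions:
      - Writers are serialized elsewhere, so there is one writer at a time.
      - The sequence counter is hand-rolled in the style commonly used for seqlocks in the C11 memory model. It is not the kernel's seqlock_t.
      - jiffies is passed in as a plain number.

      Readers never write shared memory. They retry if the counter was odd or changed during the copy. neigh_touch() stores used only when the timestamp actually differs. All names are hypothetical.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <string.h>

      #define HA_LEN 6

      struct neigh_ha_sketch {
          atomic_uint seq;                    /* odd while a write is in progress */
          _Atomic unsigned char ha[HA_LEN];   /* relaxed atomics: no data race */
          atomic_ulong used;                  /* last-use timestamp (jiffies) */
      };

      /* Single writer: bump seq to odd, copy, bump seq back to even. */
      void ha_write(struct neigh_ha_sketch *n, const unsigned char *src)
      {
          unsigned s = atomic_load_explicit(&n->seq, memory_order_relaxed);
          atomic_store_explicit(&n->seq, s + 1, memory_order_relaxed);
          atomic_thread_fence(memory_order_release);
          for (int i = 0; i < HA_LEN; i++)
              atomic_store_explicit(&n->ha[i], src[i], memory_order_relaxed);
          atomic_store_explicit(&n->seq, s + 2, memory_order_release);
      }

      /* Reader: copy without writing shared state, retry on a torn read. */
      void ha_read(struct neigh_ha_sketch *n, unsigned char *dst)
      {
          unsigned s1, s2;
          do {
              s1 = atomic_load_explicit(&n->seq, memory_order_acquire);
              for (int i = 0; i < HA_LEN; i++)
                  dst[i] = atomic_load_explicit(&n->ha[i], memory_order_relaxed);
              atomic_thread_fence(memory_order_acquire);
              s2 = atomic_load_explicit(&n->seq, memory_order_relaxed);
          } while ((s1 & 1u) || s1 != s2);
      }

      /* Write 'used' only when the tick changed; returns 1 if it stored. */
      int neigh_touch(struct neigh_ha_sketch *n, unsigned long jiffies)
      {
          if (atomic_load_explicit(&n->used, memory_order_relaxed) == jiffies)
              return 0;                       /* cache line stays clean */
          atomic_store_explicit(&n->used, jiffies, memory_order_relaxed);
          return 1;
      }
      ```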
  10. 07 Oct, 2010: 1 commit
    • neigh: RCU conversion of struct neighbour · 767e97e1
      Committed by Eric Dumazet
      This is the second step for neighbour RCU conversion.
      
      (first was commit d6bf7817 : RCU conversion of neigh hash table)
      
      neigh_lookup() becomes lockless, but still takes a reference on the found
      neighbour (no more read_lock()/read_unlock() on tbl->lock).
      
      struct neighbour gets an additional rcu_head field and is freed after an
      RCU grace period.
      
      Future work would need to eventually avoid taking a reference on the
      neighbour for temporary dsts (DST_NOCACHE), but this would need
      dst->_neighbour to use a noref bit, as we did for skb->_dst.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
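
      Here is a userspace sketch of the lookup side. It assumes the usual "increment unless zero" pattern for taking the reference; the patch's exact code is not quoted here. Entries whose refcnt has already dropped to zero are about to be freed and must be skipped. The RCU read-side section and the grace-period free are not modeled. C11 acquire loads stand in for rcu_dereference(). All names are hypothetical.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stddef.h>

      struct nent {
          struct nent *_Atomic next;   /* published with release stores */
          unsigned int key;
          atomic_int refcnt;           /* 0: entry is being torn down */
      };

      /* Take a reference only if the entry is still alive. */
      int ref_inc_not_zero(atomic_int *r)
      {
          int old = atomic_load(r);
          while (old != 0)
              if (atomic_compare_exchange_weak(r, &old, old + 1))
                  return 1;
          return 0;                    /* dying: caller must not use it */
      }

      /* Lockless walk of one hash chain; returns a referenced entry. */
      struct nent *nent_lookup(struct nent *_Atomic *head, unsigned int key)
      {
          for (struct nent *n = atomic_load_explicit(head, memory_order_acquire);
               n != NULL;
               n = atomic_load_explicit(&n->next, memory_order_acquire))
              if (n->key == key)
                  return ref_inc_not_zero(&n->refcnt) ? n : NULL;
          return NULL;
      }
      ```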
  11. 06 Oct, 2010: 1 commit
    • net neigh: RCU conversion of neigh hash table · d6bf7817
      Committed by Eric Dumazet
      David
      
      This is the first step for RCU conversion of neigh code.
      
      Next patches will convert hash_buckets[] and "struct neighbour" to RCU
      protected objects.
      
      Thanks
      
      [PATCH net-next] net neigh: RCU conversion of neigh hash table
      
      Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
      neigh_table", a new structure is defined:
      
      struct neigh_hash_table {
             struct neighbour        **hash_buckets;
             unsigned int            hash_mask;
             __u32                   hash_rnd;
             struct rcu_head         rcu;
      };
      
      And "struct neigh_table" has an RCU protected pointer to such a
      neigh_hash_table.
      
      This means the signature of the (*hash)() function changes: we need to
      add a third parameter with the actual hash_rnd value, since hash_rnd is
      no longer a neigh_table field.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
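
      This sketch shows why hash_rnd becomes a parameter. Once readers may still be using an old neigh_hash_table while a new one is published, the seed must come from the table the reader actually holds. Several details are assumptions: the leading (pkey, dev) parameters, the stand-in struct net_device, and the mixing function, which is not jhash. The rcu_head is omitted.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      struct net_device { int ifindex; };   /* stand-in, not the kernel's */

      /* The table from the message, minus the rcu_head. */
      struct neigh_hash_table {
          void **hash_buckets;              /* struct neighbour ** upstream */
          unsigned int hash_mask;
          uint32_t hash_rnd;
      };

      /* New-style hash: the seed is an explicit third argument. */
      typedef uint32_t (*neigh_hash_fn)(const void *pkey,
                                        const struct net_device *dev,
                                        uint32_t hash_rnd);

      /* Hypothetical IPv4 hash; the mixing is an assumption, not jhash. */
      uint32_t ipv4_hash_sketch(const void *pkey, const struct net_device *dev,
                                uint32_t hash_rnd)
      {
          const unsigned char *k = pkey;
          uint32_t key = (uint32_t)k[0] << 24 | (uint32_t)k[1] << 16 |
                         (uint32_t)k[2] << 8 | (uint32_t)k[3];
          uint32_t h = (key ^ (uint32_t)dev->ifindex ^ hash_rnd) * 0x9E3779B1u;
          return h ^ (h >> 16);
      }

      /* A reader hashes with the seed of the table it actually loaded. */
      unsigned int neigh_bucket(const struct neigh_hash_table *nht,
                                neigh_hash_fn hash, const void *pkey,
                                const struct net_device *dev)
      {
          return hash(pkey, dev, nht->hash_rnd) & nht->hash_mask;
      }
      ```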
  12. 01 Oct, 2010: 1 commit
  13. 01 Jul, 2010: 1 commit
  14. 15 Apr, 2010: 1 commit
    • netfilter: bridge-netfilter: Fix MAC header handling with IP DNAT · e179e632
      Committed by Bart De Schuymer
      - fix IP DNAT on vlan- or pppoe-encapsulated traffic: The functions
      neigh_hh_output() or dst->neighbour->output() overwrite the complete
      Ethernet header, although we only need the destination MAC address.
      For encapsulated packets, they ended up overwriting the encapsulating
      header. The new code copies the Ethernet source MAC address and
      protocol number before calling dst->neighbour->output(). The Ethernet
      source MAC and protocol number are copied back in place in
      br_nf_pre_routing_finish_bridge_slow(). This also makes the IP DNAT
      more transparent because in the old scheme the source MAC of the
      bridge was copied into the source address in the Ethernet header. We
      also set skb->protocol to ETH_P_IP or ETH_P_IPV6 during the execution
      of the PF_INET or PF_INET6 hooks, respectively.
      
      - Speed up IP DNAT by calling neigh_hh_bridge() instead of
      neigh_hh_output(): if dst->hh is available, we already know the MAC
      address so we can just copy it.
      Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
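
      Here is the save/restore idea reduced to a byte buffer, as a hypothetical sketch. fake_neigh_output() stands in for dst->neighbour->output(). The restore happens in the same function here, whereas the patch restores the bytes later in br_nf_pre_routing_finish_bridge_slow(). With a VLAN tag, the protocol field holds 0x8100, which is what the old path clobbered.

      ```c
      #include <assert.h>
      #include <string.h>

      #define ETH_ALEN 6
      #define ETH_HLEN 14

      /* Stand-in for dst->neighbour->output(): rewrites the complete
       * 14-byte Ethernet header (dst MAC, src MAC, protocol). */
      void fake_neigh_output(unsigned char *eth, const unsigned char *dst_mac,
                             const unsigned char *bridge_mac)
      {
          memcpy(eth, dst_mac, ETH_ALEN);
          memcpy(eth + ETH_ALEN, bridge_mac, ETH_ALEN);
          eth[12] = 0x08;                  /* ETH_P_IP, 0x0800 */
          eth[13] = 0x00;
      }

      /* Save src MAC + protocol, let output() write the header, then put
       * them back so that only the destination MAC really changes. */
      void dnat_xmit(unsigned char *eth, const unsigned char *dst_mac,
                     const unsigned char *bridge_mac)
      {
          unsigned char saved[ETH_HLEN - ETH_ALEN];

          memcpy(saved, eth + ETH_ALEN, sizeof saved);
          fake_neigh_output(eth, dst_mac, bridge_mac);
          memcpy(eth + ETH_ALEN, saved, sizeof saved);
      }
      ```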
  15. 17 Feb, 2010: 2 commits
    • percpu: add __percpu sparse annotations to net · 7d720c3e
      Committed by Tejun Heo
      Add __percpu sparse annotations to net.
      
      These annotations are to make sparse consider percpu variables to be
      in a different address space and warn if accessed without going
      through percpu accessors.  This patch doesn't affect normal builds.
      
      The macro and type tricks around snmp stats make things a bit
      interesting.  DEFINE/DECLARE_SNMP_STAT() macros mark the target field
      as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly.  All
      snmp_mib_*() users which used to cast the argument to (void **) are
      updated to cast it to (void __percpu **).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net neigh: Decouple per interface neighbour table controls from binary sysctls · 54716e3b
      Committed by Eric W. Biederman
      Stop computing the number of neighbour table settings we have by
      counting the number of binary sysctls.  This behaviour was silly
      and meant that we could not add another neighbour table setting
      without also adding another binary sysctl.
      
      Don't pass the binary sysctl path for neighbour table entries
      into neigh_sysctl_register.  These parameters are no longer
      used and so are just dead code.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 12 Nov, 2009: 1 commit
    • sysctl net: Remove unused binary sysctl code · f8572d8f
      Committed by Eric W. Biederman
      Now that sys_sysctl is a compatibility wrapper around /proc/sys,
      all sysctl strategy routines, and all ctl_name and strategy
      entries in the sysctl tables are unused, and can be
      removed.
      
      In addition, neigh_sysctl_register has been modified to no longer
      take a strategy argument, and its callers have been modified not
      to pass one.
      
      Cc: "David Miller" <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  17. 04 Nov, 2009: 1 commit
  18. 03 Oct, 2009: 1 commit
  19. 02 Sep, 2009: 1 commit
  20. 03 Aug, 2009: 1 commit
    • neigh: Convert garbage collection from softirq to workqueue · e4c4e448
      Committed by Eric Dumazet
      The current neigh_periodic_timer() function is fired by a timer IRQ and
      scans one hash bucket each round (very little work, in fact).
      
      As we are supposed to scan the whole hash table in 15 seconds, this means
      neigh_periodic_timer() can be fired very often (depending on the number
      of concurrent hash entries stored in this table).
      
      Converting this to a workqueue permits scanning the whole table,
      minimizing icache pollution, and firing this work every 15 seconds,
      independently of hash table size.
      
      This 15-second delay is not a hard number, as the work is deferrable.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
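
      Some back-of-the-envelope arithmetic shows what "fired very often" means. It rests on two assumptions, since the real code worked in jiffies. First, the old timer spread the 15-second full-scan budget evenly across buckets. Second, it never fired more often than once per millisecond. With 1024 buckets the timer then fires roughly every 14 ms, about 68 times per second. The workqueue version instead runs one deferrable scan every 15 seconds, regardless of table size.

      ```c
      #include <assert.h>

      #define FULL_SCAN_MS 15000u   /* whole table scanned every 15 s */

      /* Old scheme (assumed model): one bucket per timer firing, so the
       * firing interval shrinks as the table grows. */
      unsigned int old_timer_interval_ms(unsigned int nbuckets)
      {
          unsigned int interval = FULL_SCAN_MS / nbuckets;
          return interval ? interval : 1u;   /* assumed 1 ms floor */
      }
      ```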
  21. 12 Nov, 2008: 2 commits
  22. 17 Jul, 2008: 1 commit
    • core: add stat to track unresolved discards in neighbor cache · 9a6d276e
      Committed by Neil Horman
      In __neigh_event_send, if we have a neighbour entry which is in
      NUD_INCOMPLETE state, we enqueue any outbound frames to that neighbour
      on the neighbour's arp_queue, which by default is capped at a length of
      3 skbs.  If that queue exceeds its set length, it will drop an skb on
      the queue to enqueue the newly arrived skb.  This results in a drop
      for which no statistic is incremented.  This patch adds an
      unresolved_discards stat to /proc/net/stat/ndisc_cache to track these
      lost frames.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
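
      Here is a minimal model of that drop path. It assumes the oldest queued frame is the one discarded; the message only says "an skb on the queue". The ring buffer, its names and the integer frame ids are hypothetical. unres_discards plays the role of the new statistic.

      ```c
      #include <assert.h>

      #define ARP_QUEUE_MAX 3   /* default cap mentioned above */

      /* Hypothetical model of neigh->arp_queue while NUD_INCOMPLETE. */
      struct unres_queue {
          int frames[ARP_QUEUE_MAX];
          unsigned int head, len;
          unsigned long unres_discards;   /* frames dropped to make room */
      };

      void unres_enqueue(struct unres_queue *q, int frame)
      {
          if (q->len == ARP_QUEUE_MAX) {  /* full: drop the oldest frame */
              q->head = (q->head + 1) % ARP_QUEUE_MAX;
              q->len--;
              q->unres_discards++;
          }
          q->frames[(q->head + q->len) % ARP_QUEUE_MAX] = frame;
          q->len++;
      }
      ```

      Suppose six frames reach a queue capped at 3, for example the six fragments of one large ping. Only the last three remain, and three discards are counted.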
  23. 26 Mar, 2008: 1 commit
  24. 25 Mar, 2008: 1 commit
    • [NEIGH]: Fix race between pneigh deletion and ipv6's ndisc_recv_ns (v3). · fa86d322
      Committed by Pavel Emelyanov
      Proxy neighbors do not have any reference counting, so any caller
      of pneigh_lookup (unless it's a netlink triggered add/del routine)
      should _not_ perform any actions on the found proxy entry. 
      
      There's one exception to this rule: ipv6's ndisc_recv_ns()
      uses the found entry to check the flags for NTF_ROUTER.
      
      This creates a race between the ndisc and pneigh_delete - after 
      the pneigh is returned to the caller, the nd_tbl.lock is dropped 
      and the deleting procedure may proceed.
      
      One fix would be to add reference counting, but this
      problem exists for ndisc only. Besides, such a patch would be too
      big for -rc4.
      
      So I propose to introduce a __pneigh_lookup() which is supposed
      to be called with the lock held and use it in ndisc code to check
      the flags on alive pneigh entry.
      
      
      Changes from v2:
      As David noticed, __pneigh_lookup() must be exported to the ipv6 module.
      checkpatch generates a warning on it, since the EXPORT_SYMBOL
      does not follow the symbol itself, but in this file all the
      exports come at the end, so I decided not to break this harmony.
      
      Changes from v1:
      Fixed comments from YOSHIFUJI (indentation of the prototype in the header
      and the pndisc_check_router() name) and a compilation fix pointed out
      by Daniel: is_routed was (falsely) considered uninitialized
      by gcc.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
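
      Here is a sketch of the locking rule, with hypothetical names and a POSIX rwlock standing in for nd_tbl.lock. The lookup helper may only be called with the lock held. The caller copies out what it needs, here the router flag, before unlocking. A concurrent delete then cannot free the entry while it is still in use. The NTF_ROUTER value is an assumption for illustration.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <assert.h>
      #include <pthread.h>
      #include <stddef.h>

      #define NTF_ROUTER 0x80              /* value assumed for this sketch */

      struct pneigh_sk {
          struct pneigh_sk *next;
          unsigned int key;
          unsigned char flags;
      };

      struct ptable_sk {
          pthread_rwlock_t lock;
          struct pneigh_sk *head;
      };

      /* Caller must hold t->lock; the result is valid only until unlock. */
      struct pneigh_sk *pneigh_lookup_locked(struct ptable_sk *t, unsigned int key)
      {
          for (struct pneigh_sk *p = t->head; p != NULL; p = p->next)
              if (p->key == key)
                  return p;
          return NULL;
      }

      /* Read the flag while the lock still protects the entry. */
      int check_router_sketch(struct ptable_sk *t, unsigned int key)
      {
          int is_router = 0;

          pthread_rwlock_rdlock(&t->lock);
          struct pneigh_sk *p = pneigh_lookup_locked(t, key);
          if (p != NULL)
              is_router = (p->flags & NTF_ROUTER) != 0;
          pthread_rwlock_unlock(&t->lock);
          return is_router;
      }
      ```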
  25. 04 Mar, 2008: 1 commit
  26. 29 Jan, 2008: 5 commits
  27. 26 Apr, 2007: 1 commit
  28. 26 Mar, 2007: 1 commit
    • [NET]: Fix neighbour destructor handling. · ecbb4169
      Committed by Alexey Kuznetsov
      ->neigh_destructor() is removed (it was not used) and replaced with
      ->neigh_cleanup(), which is called when a neighbor entry goes to the
      dead state. At this point everything is still valid: neigh->dev,
      neigh->parms etc.
      
      The device should guarantee that dead neighbor entries (neigh->dead !=
      0) do not get their private part initialized; otherwise nobody will
      clean it up.
      
      I think this is enough for ipoib, which is the only user of this thing.
      Initialization of the private part of neighbor entries happens in the
      ipoib start_xmit routine, which is not reached when the device is down.
      But it would be better to add an explicit test for neigh->dead in any
      case.
      Signed-off-by: David S. Miller <davem@davemloft.net>
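
      Here is a single-threaded sketch of the rule above, with hypothetical names. The transmit path refuses to attach private state to a dead entry, because the cleanup hook that would free it has already run. Locking between the two paths is omitted. In this sketch the cleanup hook also sets dead itself. The message implies the core marks the entry dead before ->neigh_cleanup() is called.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      struct neigh_sk {
          int dead;        /* non-zero once the entry is dead */
          void *priv;      /* driver-private part, attached lazily */
      };

      /* Transmit path: never initialize the private part of a dead entry,
       * since the cleanup hook has already run and would not free it. */
      int xmit_attach_priv(struct neigh_sk *n)
      {
          if (n->dead)
              return -1;
          if (n->priv == NULL) {
              n->priv = calloc(1, 32);   /* size is arbitrary here */
              if (n->priv == NULL)
                  return -1;
          }
          return 0;
      }

      /* neigh_cleanup-style hook: entry goes dead, private part is freed. */
      void neigh_cleanup_sk(struct neigh_sk *n)
      {
          n->dead = 1;
          free(n->priv);
          n->priv = NULL;
      }
      ```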
  29. 09 Dec, 2006: 1 commit
  30. 08 Dec, 2006: 1 commit