1. 06 12月, 2011 1 次提交
  2. 27 11月, 2011 3 次提交
  3. 14 11月, 2011 1 次提交
    • E
      neigh: new unresolved queue limits · 8b5c171b
      Eric Dumazet 提交于
      Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit :
      > From: David Miller <davem@davemloft.net>
      > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
      >
      > > From: Eric Dumazet <eric.dumazet@gmail.com>
      > > Date: Wed, 09 Nov 2011 12:14:09 +0100
      > >
      > >> unres_qlen is the number of frames we are able to queue per unresolved
      > >> neighbour. Its default value (3) was never changed and is responsible
      > >> for strange drops, especially if IP fragments are used, or multiple
      > >> sessions start in parallel. Even a single tcp flow can hit this limit.
      > >  ...
      > >
      > > Ok, I've applied this, let's see what happens :-)
      >
      > Early answer, build fails.
      >
      > Please test build this patch with DECNET enabled and resubmit.  The
      > decnet neigh layer still refers to the removed ->queue_len member.
      >
      > Thanks.
      
      Ouch, this was fixed on one machine yesterday, but not the other one I
      used this morning, sorry.
      
      [PATCH V5 net-next] neigh: new unresolved queue limits
      
      unres_qlen is the number of frames we are able to queue per unresolved
      neighbour. Its default value (3) was never changed and is responsible
      for strange drops, especially if IP fragments are used, or multiple
      sessions start in parallel. Even a single tcp flow can hit this limit.
      
      $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
      PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
      8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b5c171b
  4. 01 11月, 2011 1 次提交
  5. 02 8月, 2011 1 次提交
  6. 27 7月, 2011 1 次提交
  7. 18 7月, 2011 3 次提交
  8. 17 7月, 2011 2 次提交
  9. 02 7月, 2011 1 次提交
  10. 17 6月, 2011 1 次提交
  11. 10 6月, 2011 1 次提交
    • G
      rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Greg Rose 提交于
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      c7ac8679
  12. 08 5月, 2011 1 次提交
  13. 03 5月, 2011 1 次提交
    • E
      net: dont hold rtnl mutex during netlink dump callbacks · e67f88dd
      Eric Dumazet 提交于
      Four years ago, Patrick made a change to hold rtnl mutex during netlink
      dump callbacks.
      
      I believe it was a wrong move. This slows down concurrent dumps, making
      good old /proc/net/ files faster than rtnetlink in some situations.
      
      This occurred to me because one "ip link show dev ..." was _very_ slow
      on a workload adding/removing network devices in background.
      
      All dump callbacks are able to use RCU locking now, so this patch does
      roughly a revert of commits :
      
      1c2d670f : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks
      6313c1e0 : [RTNETLINK]: Remove unnecessary locking in dump callbacks
      
      This let writers fight for rtnl mutex and readers going full speed.
      
      It also takes care of phonet : phonet_route_get() is now called from rcu
      read section. I renamed it to phonet_route_get_rcu()
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
      Acked-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e67f88dd
  14. 29 4月, 2011 2 次提交
  15. 18 4月, 2011 1 次提交
  16. 17 4月, 2011 1 次提交
  17. 13 3月, 2011 2 次提交
  18. 03 3月, 2011 1 次提交
  19. 02 3月, 2011 1 次提交
  20. 18 2月, 2011 1 次提交
  21. 27 1月, 2011 1 次提交
    • D
      net: Implement read-only protection and COW'ing of metrics. · 62fa8a84
      David S. Miller 提交于
      Routing metrics are now copy-on-write.
      
      Initially a route entry points it's metrics at a read-only location.
      If a routing table entry exists, it will point there.  Else it will
      point at the all zero metric place-holder called 'dst_default_metrics'.
      
      The writeability state of the metrics is stored in the low bits of the
      metrics pointer, we have two bits left to spare if we want to store
      more states.
      
      For the initial implementation, COW is implemented simply via kmalloc.
      However future enhancements will change this to place the writable
      metrics somewhere else, in order to increase sharing.  Very likely
      this "somewhere else" will be the inetpeer cache.
      
      Note also that this means that metrics updates may transiently fail
      if we cannot COW the metrics successfully.
      
      But even by itself, this patch should decrease memory usage and
      increase cache locality especially for routing workloads.  In those
      cases the read-only metric copies stay in place and never get written
      to.
      
      TCP workloads where metrics get updated, and those rare cases where
      PMTU triggers occur, will take a very slight performance hit.  But
      that hit will be alleviated when the long-term writable metrics
      move to a more sharable location.
      
      Since the metrics storage went from a u32 array of RTAX_MAX entries to
      what is essentially a pointer, some retooling of the dst_entry layout
      was necessary.
      
      Most importantly, we need to preserve the alignment of the reference
      count so that it doesn't share cache lines with the read-mostly state,
      as per Eric Dumazet's alignment assertion checks.
      
      The only non-trivial bit here is the move of the 'flags' member into
      the writeable cacheline.  This is OK since we are always accessing the
      flags around the same moment when we made a modification to the
      reference count.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62fa8a84
  22. 20 1月, 2011 1 次提交
  23. 15 12月, 2010 1 次提交
  24. 14 12月, 2010 1 次提交
    • D
      net: Abstract default ADVMSS behind an accessor. · 0dbaee3b
      David S. Miller 提交于
      Make all RTAX_ADVMSS metric accesses go through a new helper function,
      dst_metric_advmss().
      
      Leave the actual default metric as "zero" in the real metric slot,
      and compute the actual default value dynamically via a new dst_ops
      AF specific callback.
      
      For stacked IPSEC routes, we use the advmss of the path which
      preserves existing behavior.
      
      Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates
      advmss on pmtu updates.  This inconsistency in advmss handling
      results in more raw metric accesses than I wish we ended up with.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0dbaee3b
  25. 10 12月, 2010 1 次提交
    • D
      net: Abstract away all dst_entry metrics accesses. · defb3519
      David S. Miller 提交于
      Use helper functions to hide all direct accesses, especially writes,
      to dst_entry metrics values.
      
      This will allow us to:
      
      1) More easily change how the metrics are stored.
      
      2) Implement COW for metrics.
      
      In particular this will help us put metrics into the inetpeer
      cache if that is what we end up doing.  We can make the _metrics
      member a pointer instead of an array, initially have it point
      at the read-only metrics in the FIB, and then on the first set
      grab an inetpeer entry and point the _metrics member there.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      defb3519
  26. 29 11月, 2010 1 次提交
  27. 18 11月, 2010 1 次提交
  28. 12 11月, 2010 1 次提交
  29. 11 11月, 2010 1 次提交
  30. 09 11月, 2010 1 次提交
  31. 02 11月, 2010 1 次提交
  32. 12 10月, 2010 1 次提交
    • E
      net dst: use a percpu_counter to track entries · fc66f95c
      Eric Dumazet 提交于
      struct dst_ops tracks number of allocated dst in an atomic_t field,
      subject to high cache line contention in stress workload.
      
      Switch to a percpu_counter, to reduce number of time we need to dirty a
      central location. Place it on a separate cache line to avoid dirtying
      read only fields.
      
      Stress test :
      
      (Sending 160.000.000 UDP frames,
      IP route cache disabled, dual E5540 @2.53GHz,
      32bit kernel, FIB_TRIE, SLUB/NUMA)
      
      Before:
      
      real    0m51.179s
      user    0m15.329s
      sys     10m15.942s
      
      After:
      
      real	0m45.570s
      user	0m15.525s
      sys	9m56.669s
      
      With a small reordering of struct neighbour fields, subject of a
      following patch, (to separate refcnt from other read mostly fields)
      
      real	0m41.841s
      user	0m15.261s
      sys	8m45.949s
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc66f95c
  33. 06 10月, 2010 1 次提交
    • E
      net neigh: RCU conversion of neigh hash table · d6bf7817
      Eric Dumazet 提交于
      David
      
      This is the first step for RCU conversion of neigh code.
      
      Next patches will convert hash_buckets[] and "struct neighbour" to RCU
      protected objects.
      
      Thanks
      
      [PATCH net-next] net neigh: RCU conversion of neigh hash table
      
      Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
      neigh_table", a new structure is defined :
      
      struct neigh_hash_table {
             struct neighbour        **hash_buckets;
             unsigned int            hash_mask;
             __u32                   hash_rnd;
             struct rcu_head         rcu;
      };
      
      And "struct neigh_table" has an RCU protected pointer to such a
      neigh_hash_table.
      
      This means the signature of (*hash)() function changed: We need to add a
      third parameter with the actual hash_rnd value, since this is not
      anymore a neigh_table field.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6bf7817