1. 09 Jun, 2011 (2 commits)
    • inetpeer: lower false sharing effect · 2b77bdde
      Authored by Eric Dumazet
      Profiles show false sharing in addr_compare(): refcnt/dtime changes
      dirty the first inet_peer cache line, where the keys used at lookup
      time reside. When many cpus call inet_getpeer() and inet_putpeer(),
      or need frag ids, addr_compare() climbs to 2nd position in
      "perf top".
      
      Before the patch, my udpflood bench (16 threads) on my 2x4x2 machine:
      
                   5784.00  9.7% csum_partial_copy_generic [kernel]
                   3356.00  5.6% addr_compare              [kernel]
                   2638.00  4.4% fib_table_lookup          [kernel]
                   2625.00  4.4% ip_fragment               [kernel]
                   1934.00  3.2% neigh_lookup              [kernel]
                   1617.00  2.7% udp_sendmsg               [kernel]
                   1608.00  2.7% __ip_route_output_key     [kernel]
                   1480.00  2.5% __ip_append_data          [kernel]
                   1396.00  2.3% kfree                     [kernel]
                   1195.00  2.0% kmem_cache_free           [kernel]
                   1157.00  1.9% inet_getpeer              [kernel]
                   1121.00  1.9% neigh_resolve_output      [kernel]
                   1012.00  1.7% dev_queue_xmit            [kernel]
      # time ./udpflood.sh
      
      real	0m44.511s
      user	0m20.020s
      sys	11m22.780s
      
      # time ./udpflood.sh
      
      real	0m44.099s
      user	0m20.140s
      sys	11m15.870s
      
      After the patch, addr_compare() no longer shows up in profiles:
      
                   4171.00 10.7% csum_partial_copy_generic   [kernel]
                   1787.00  4.6% fib_table_lookup            [kernel]
                   1756.00  4.5% ip_fragment                 [kernel]
                   1234.00  3.2% udp_sendmsg                 [kernel]
                   1191.00  3.0% neigh_lookup                [kernel]
                   1118.00  2.9% __ip_append_data            [kernel]
                   1022.00  2.6% kfree                       [kernel]
                    993.00  2.5% __ip_route_output_key       [kernel]
                    841.00  2.2% neigh_resolve_output        [kernel]
                    816.00  2.1% kmem_cache_free             [kernel]
                    658.00  1.7% ia32_sysenter_target        [kernel]
                    632.00  1.6% kmem_cache_alloc_node       [kernel]
      
      # time ./udpflood.sh
      
      real	0m41.587s
      user	0m19.190s
      sys	10m36.370s
      
      # time ./udpflood.sh
      
      real	0m41.486s
      user	0m19.290s
      sys	10m33.650s
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inetpeer: remove unused list · 4b9d9be8
      Authored by Eric Dumazet
      Andi Kleen and Tim Chen reported huge contention on the inetpeer
      unused_peers.lock on a memcached workload on a 40-core machine with
      the route cache disabled.

      It appears we constantly flip peer refcnts between 0 and 1, and we
      must insert/remove peers from unused_peers.list while holding a
      contended spinlock.
      
      Remove this list completely and perform garbage collection
      on-the-fly, at lookup time, using the expired nodes met during the
      tree traversal, as in the sketch below.
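
      A hedged user-space sketch of that idea (stand-in types, not the
      kernel code): the lookup walk records nodes whose refcnt is zero
      and whose last use is older than the TTL, and the caller reaps
      them once the walk is done, so no separate unused list or global
      lock is needed.

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdint.h>
      #include <time.h>

      struct peer {
              struct peer *avl_left, *avl_right;
              atomic_int   refcnt;
              time_t       dtime;           /* time of last put */
              uint32_t     v4daddr;
      };

      /* An expired node is one nobody references whose last use is
       * older than the configured TTL. */
      static bool peer_expired(const struct peer *p, time_t now, time_t ttl)
      {
              return atomic_load(&p->refcnt) == 0 && now - p->dtime > ttl;
      }

      /* While looking up 'key', collect up to 'max' expired nodes met
       * on the path; the caller unlinks and frees them afterwards. */
      static int lookup_collecting_expired(struct peer *n, uint32_t key,
                                           time_t ttl, struct peer **gc,
                                           int max)
      {
              time_t now = time(NULL);
              int cnt = 0;

              while (n) {
                      if (cnt < max && peer_expired(n, now, ttl))
                              gc[cnt++] = n;
                      if (key == n->v4daddr)
                              break;
                      n = key < n->v4daddr ? n->avl_left : n->avl_right;
              }
              return cnt;
      }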
      
      This removes a lot of code, makes locking more standard, and obsoletes
      two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
      removes two pointers in inet_peer structure.
      
      There is still a false sharing effect because refcnt is in the
      first cache line of the object [where the links and keys used by
      lookups are located]; we might move it to the end of the inet_peer
      structure to keep this first cache line mostly read by cpus.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <andi@firstfloor.org>
      CC: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 Apr, 2011 (1 commit)
  3. 11 Feb, 2011 (2 commits)
  4. 05 Feb, 2011 (1 commit)
  5. 28 Jan, 2011 (2 commits)
  6. 02 Dec, 2010 (2 commits)
  7. 01 Dec, 2010 (3 commits)
  8. 28 Oct, 2010 (1 commit)
  9. 17 Jun, 2010 (1 commit)
    • inetpeer: restore small inet_peer structures · 317fe0e6
      Authored by Eric Dumazet
      The addition of rcu_head to struct inet_peer added 16 bytes on
      64bit arches.

      That's a bit unfortunate, since the old size was exactly 64 bytes.
      
      This can be solved using a union between this rcu_head and four
      fields that are normally used only when a refcount is taken on
      inet_peer. rcu_head is used only when refcnt=-1, right before the
      structure is freed.
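
      A hypothetical sketch of that union (struct rcu_head is replaced by
      a stand-in so this compiles outside the kernel; field types are
      illustrative): the RCU callback storage overlaps fields that are
      only meaningful while a reference is held.

      #include <stdint.h>

      struct rcu_head_like { void *next; void (*func)(void *); };

      struct peer {
              void    *avl_left, *avl_right;
              uint32_t v4daddr;
              int      refcnt;                  /* -1 means queued for RCU free */
              union {
                      struct {                  /* valid only while refcnt >= 0 */
                              uint32_t rid;
                              uint32_t ip_id_count;
                              uint32_t tcp_ts;
                              uint32_t tcp_ts_stamp;
                      } live;
                      struct rcu_head_like rcu; /* used only when refcnt == -1  */
              } u;
      };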
      
      Add an inet_peer_refcheck() function to check this assertion for a while.
      
      We can bring back the SLAB_HWCACHE_ALIGN qualifier in kmem cache creation.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 16 Jun, 2010 (1 commit)
    • inetpeer: RCU conversion · aa1039e7
      Authored by Eric Dumazet
      inetpeer currently uses an AVL tree protected by an rwlock.
      
      It's possible to make most lookups use RCU:
      
      1) Add a struct rcu_head to struct inet_peer
      
      2) Add a lookup_rcu_bh() helper to perform a lockless, opportunistic
      lookup. This is a normal function, not a macro like lookup().
      
      3) Add a limit to the number of links followed by lookup_rcu_bh(),
      needed in case we fall into a loop (see the sketch after this list).
      
      4) Add an smp_wmb() in link_to_pool() right before node insert.
      
      5) Make unlink_from_pool() use atomic_cmpxchg() to make sure it can
      take the last reference to an inet_peer, since lockless readers could
      increase the refcount even while we hold peers.lock.
      
      6) Delay struct inet_peer freeing until after an RCU grace period so
      that lookup_rcu_bh() cannot crash.
      
      7) inet_getpeer() first attempts a lockless lookup.
         Note this lookup can fail even if the target is in the AVL tree,
      because a concurrent writer can leave the tree in a transiently
      inconsistent form.
         If this attempt fails, the lock is taken and a regular lookup is
      performed again.
      
      8) Convert peers.lock from an rwlock to a spinlock.
      
      9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because
      rcu_head adds 16 bytes on 64bit arches, doubling the effective size
      (64 -> 128 bytes).
      In a future patch it should be possible to revert this part, by
      putting the rcu field in a union sharing space with rid, ip_id_count,
      tcp_ts & tcp_ts_stamp, since these fields are manipulated only with
      refcnt > 0.
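
      A user-space sketch of steps 3, 5 and 7 (stand-in types and names,
      not the kernel code): the lockless walk bounds the number of links
      it follows and the caller retries under the lock on a miss, while
      the last reference is claimed with a compare-and-swap so a racing
      lockless reader cannot be lost.

      #include <stdatomic.h>
      #include <stddef.h>

      struct node {
              _Atomic(struct node *) avl_left, avl_right;
              unsigned key;
              atomic_int refcnt;
      };

      /* Steps 3 and 7: bounded lockless lookup. A concurrent writer may
       * rebalance the tree under us, so give up after max_links pointer
       * chases and let the caller retry with the lock held. */
      static struct node *lookup_rcu_like(_Atomic(struct node *) *rootp,
                                          unsigned key, int max_links)
      {
              struct node *n = atomic_load(rootp);

              while (n && max_links-- > 0) {
                      if (n->key == key)
                              return n;
                      n = atomic_load(key < n->key ? &n->avl_left
                                                   : &n->avl_right);
              }
              return NULL;    /* miss or suspected loop: take the lock */
      }

      /* Step 5: take the last reference only if no lockless reader bumped
       * the count meanwhile; on failure the node is still in use. */
      static int take_last_ref(struct node *n)
      {
              int one = 1;
              return atomic_compare_exchange_strong(&n->refcnt, &one, 0);
      }
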
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 14 Nov, 2009 (1 commit)
    • inetpeer: Optimize inet_getid() · 2c1409a0
      Authored by Eric Dumazet
      While investigating network latencies, I found inet_getid() was a
      contention point for some workloads, as inet_peer_idlock is shared
      by all inet_getid() users regardless of which peer is involved.
      
      One way to fix this is to make ip_id_count an atomic_t instead of a
      __u16, and use atomic_add_return(), as sketched below.
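
      A minimal sketch in user-space C (atomic_fetch_add standing in for
      the kernel's atomic_add_return; names are illustrative): one atomic
      read-modify-write per call replaces the shared inet_peer_idlock.

      #include <stdatomic.h>

      /* Return the next IP id for this peer, reserving 'more + 1' ids in
       * one atomic step; the pre-add value is the first id handed out. */
      static unsigned short getid_like(atomic_int *ip_id_count, int more)
      {
              return (unsigned short)atomic_fetch_add(ip_id_count, more + 1);
      }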
      
      To keep sizeof(struct inet_peer) = 64 on 64bit arches, tcp_ts_stamp
      is also converted to a __u32 instead of an "unsigned long".
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 04 Nov, 2009 (1 commit)
  13. 12 Jun, 2008 (1 commit)
  14. 13 Nov, 2007 (1 commit)
  15. 20 Oct, 2006 (1 commit)
  16. 16 Oct, 2006 (1 commit)
  17. 29 Sep, 2006 (1 commit)
  18. 04 Jan, 2006 (1 commit)
  19. 17 Apr, 2005 (1 commit)
    • Linux-2.6.12-rc2 · 1da177e4
      Authored by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!