Commits · 952e57ba3769d6fc6139b8a99c32ea2bb63f23e9 · openanolis / cloud-kernel

14 Jun, 2009 1 commit

net: use a deferred timer in rt_check_expire · 125bb8f5

Authored by Eric Dumazet on Jun 11, 2009

For the sake of power saver lovers, use a deferrable timer to fire
rt_check_expire()

As some big routers cache equilibrium depends on garbage collection
done in time, we take into account elapsed time between two
rt_check_expire() invocations to adjust the amount of slots we have to
check.

Based on an initial idea and patch from Tero Kristo
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Tero Kristo <tero.kristo@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

125bb8f5
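
The scaling described in this message can be illustrated with a small sketch. It is not the kernel code: `scan_goal`, its parameters, and the tick units are hypothetical names chosen for illustration. The idea is that when a deferrable timer fires late, the number of hash slots to check grows in proportion to the time elapsed since the previous run, capped at the table size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the kernel source): number of hash
 * slots to scan, proportional to the time since the previous scan.
 * A full pass over nr_slots is expected every gc_timeout ticks.
 */
static uint32_t scan_goal(uint64_t elapsed, uint64_t gc_timeout,
                          uint32_t nr_slots)
{
    uint64_t goal;

    assert(gc_timeout > 0);            /* avoid dividing by zero */
    goal = elapsed * nr_slots / gc_timeout;
    if (goal > nr_slots)               /* never scan more than the table */
        goal = nr_slots;
    return (uint32_t)goal;
}
```

With a 300-tick timeout and 1024 slots, a run that fires 60 ticks after the previous one would scan 204 slots, while a run delayed by 600 ticks would scan the whole table.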

03 Jun, 2009 2 commits

net: skb->dst accessors · adf30907

Authored by Eric Dumazet on Jun 02, 2009

Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adf30907
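
A minimal userspace sketch of the three accessors, assuming toy stand-ins for the kernel structures. The `refcnt` and `dst_priv` fields and the simplified `dst_release()` are assumptions made for illustration; only the accessor names and signatures come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; the fields are assumptions. */
struct dst_entry { int refcnt; };
struct sk_buff   { struct dst_entry *dst_priv; };

static struct dst_entry *skb_dst(const struct sk_buff *skb)
{
    return skb->dst_priv;
}

static void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
{
    skb->dst_priv = dst;
}

/* Simplified release: only drops one reference. */
static void dst_release(struct dst_entry *dst)
{
    if (dst) {
        assert(dst->refcnt > 0);
        dst->refcnt--;
    }
}

/* Replaces the open-coded pair: dst_release(skb->dst); skb->dst = NULL; */
static void skb_dst_drop(struct sk_buff *skb)
{
    dst_release(skb_dst(skb));
    skb_dst_set(skb, NULL);
}
```

Hiding the field behind accessors means callers no longer depend on how the pointer is stored inside the buffer.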

net: skb->rtable accessor · 511c3f92

Authored by Eric Dumazet on Jun 02, 2009

Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb

Delete skb->rtable field

Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

511c3f92

21 May, 2009 2 commits

net: fix rtable leak in net/ipv4/route.c · 1ddbcb00

Authored by Eric Dumazet on May 19, 2009

Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because it's a perfect one:

begin_of_quotation
 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
 patch has at least one critical flaw, and another problem.

 rt_intern_hash calculates rthi pointer, which is later used for new entry
 insertion. The same loop calculates cand pointer which is used to clean the
 list. If the pointers are the same, rtable leak occurs, as first the cand is
 removed then the new entry is appended to it.

 This leak leads to unregister_netdevice problem (usage count > 0).

 Another problem of the patch is that it tries to insert the entries in certain
 order, to facilitate counting of entries distinct by all but QoS parameters.
 Unfortunately, referencing an existing rtable entry moves it to list beginning,
 to speed up further lookups, so the carefully built order is destroyed.

 For the first problem the simplest patch is to set rthi=0 when rthi==cand, but
 it will also destroy the ordering.
end_of_quotation

Problematic commit is 1080d709
(net: implement emergency route cache rebulds when gc_elasticity is exceeded)

Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.

A possible fix is to make rt_intern_hash() simpler, and only make
rt_check_expire() a little bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticed.
Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ddbcb00

net: fix length computation in rt_check_expire() · cf8da764

Authored by Eric Dumazet on May 19, 2009

rt_check_expire() computes average and standard deviation of chain lengths,
but does not correctly reset length to 0 at the beginning of each chain.
This probably gives overflows for sum2 (and sum) on loaded machines instead
of meaningful results.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf8da764
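
The bug class described here can be sketched in a few lines. The following is a toy userspace illustration, not the kernel loop: `struct node`, `struct stats`, and `scan_chains` are hypothetical names. The point is that the per-chain counter has to start from 0 for every bucket before it feeds the running sums.

```c
#include <assert.h>
#include <stddef.h>

struct node  { struct node *next; };
struct stats { unsigned long sum, sum2; };

/* Accumulate the sum and the sum of squares of all chain lengths. */
static struct stats scan_chains(struct node *const *buckets, size_t n)
{
    struct stats st = { 0, 0 };
    size_t i;

    assert(buckets != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long length = 0;   /* reset for every chain */
        const struct node *p;

        for (p = buckets[i]; p != NULL; p = p->next)
            length++;
        st.sum  += length;
        st.sum2 += length * length;
    }
    return st;
}
```

For chains of length 2, 0 and 1 this yields `sum == 3` and `sum2 == 5`. If `length` were declared outside the loop and never reset, the same input would be counted as 2, 2 and 3, and the sums would keep inflating as more buckets are scanned.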

27 Apr, 2009 1 commit

ipv4: Limit size of route cache hash table · c9503e0f

Authored by Anton Blanchard on Apr 27, 2009

Right now we have no upper limit on the size of the route cache hash table.
On a 128GB POWER6 box it ends up as 32MB:

IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)

It would be nice to cap this for memory consumption reasons, but a massive
hashtable also causes a significant spike when measuring OS jitter.

With a 32MB hashtable and 4 million entries, rt_worker_func is taking
5 ms to complete. On another system with more memory it's taking 14 ms.
Even though rt_worker_func does call cond_resched() to limit its impact,
in an HPC environment we want to keep all sources of OS jitter to a minimum.

With the patch applied we limit the number of entries to 512k which
can still be overridden by using the rt_entries boot option:

IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)

With this patch rt_worker_func now takes 0.460 ms on the same system.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9503e0f
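
The sizes in the two boot log lines follow from simple arithmetic, sketched below. The 8 bytes per bucket is an assumption inferred from the quoted log (33554432 / 4194304), and `cap_entries` and `table_bytes` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>

/* Assumption: 8 bytes per bucket, inferred from the quoted boot log. */
#define BUCKET_BYTES 8UL

/* Clamp the requested number of hash entries to an upper limit. */
static unsigned long cap_entries(unsigned long wanted, unsigned long limit)
{
    assert(limit > 0);
    return wanted > limit ? limit : wanted;
}

/* Memory used by a table with the given number of buckets. */
static unsigned long table_bytes(unsigned long entries)
{
    return entries * BUCKET_BYTES;
}
```

Under that assumption, 4194304 entries occupy 33554432 bytes (32MB), and capping at 524288 entries brings the table down to 4194304 bytes (4MB), matching the figures in the message.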

25 Feb, 2009 1 commit

alloc_percpu: add align argument to __alloc_percpu, fix · 0dcec8c2

Authored by Ingo Molnar on Feb 25, 2009

Impact: build fix

API was changed, but not all usage sites were converted:

 net/ipv4/route.c: In function ‘ip_rt_init’:
 net/ipv4/route.c:3379: error: too few arguments to function ‘__alloc_percpu’

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

0dcec8c2

01 Feb, 2009 1 commit

net: replace uses of __constant_{endian} · 09640e63

Authored by Harvey Harrison on Feb 01, 2009

Base versions handle constant folding now.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09640e63

23 Jan, 2009 1 commit

netns: ipmr: enable namespace support in ipv4 multicast routing code · 4feb88e5

Authored by Benjamin Thery on Jan 22, 2009

This last patch makes the appropriate changes to use and propagate the
network namespace where needed in IPv4 multicast routing code.

This consists mainly in replacing all the remaining init_net occurrences
with current netns pointer retrieved from sockets, net devices or
mfc_caches depending on the routines' contexts.

Some routines receive a new 'struct net' parameter to propagate the current
netns:
* vif_add/vif_delete
* ipmr_new_tunnel
* mroute_clean_tables
* ipmr_cache_find
* ipmr_cache_report
* ipmr_cache_unresolved
* ipmr_mfc_add/ipmr_mfc_delete
* ipmr_get_route
* rt_fill_info (in route.c)
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

4feb88e5

30 Dec, 2008 1 commit

cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits: net · 0f23174a

Authored by Rusty Russell on Dec 29, 2008

In future all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids.  So use that instead of NR_CPUS in iterators
and other comparisons.

This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f23174a

26 Nov, 2008 1 commit

netns xfrm: lookup in netns · 52479b62

Authored by Alexey Dobriyan on Nov 25, 2008

Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.

Take it from socket or netdevice. Stub DECnet to init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52479b62

12 Nov, 2008 1 commit

net: remove struct dst_entry::entry_size · 6bb3ce25

Authored by Alexey Dobriyan on Nov 11, 2008

Unused after kmem_cache_zalloc() conversion.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6bb3ce25

04 Nov, 2008 1 commit

net: '&' redux · 6d9f239a

Authored by Alexey Dobriyan on Nov 03, 2008

I want to compile out proc_* and sysctl_* handlers totally and
stub them to NULL depending on config options, however usage of &
will prevent this, since taking the address of a NULL pointer will break
compilation.

So, drop & in front of every ->proc_handler and every ->strategy
handler, it was never needed in fact.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d9f239a
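
The reasoning above can be shown with a small standalone sketch. `NO_HANDLERS`, `my_handler`, and `struct toy_ctl_entry` are hypothetical names for this illustration and do not come from the kernel. In C a function name already converts to a function pointer, so the `&` is redundant, and once the name may be a macro expanding to `NULL`, writing `&my_handler` would no longer compile.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_t)(int);

static int real_handler(int v)
{
    return v + 1;
}

/* NO_HANDLERS is a hypothetical config switch for this sketch. */
#ifdef NO_HANDLERS
#define my_handler NULL           /* "&my_handler" would not compile */
#else
#define my_handler real_handler
#endif

struct toy_ctl_entry {
    handler_t proc_handler;
};

/* No '&': works for both a function name and a NULL stub. */
static struct toy_ctl_entry entry = { .proc_handler = my_handler };

static int call_handler(const struct toy_ctl_entry *e, int v)
{
    assert(e != NULL);
    return e->proc_handler ? e->proc_handler(v) : -1;
}
```

Built without `NO_HANDLERS`, `entry.proc_handler` compares equal to `&real_handler`, which shows the two spellings denote the same pointer.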

31 Oct, 2008 1 commit

net: replace NIPQUAD() in net/ipv4/ net/ipv6/ · 673d57e7

Authored by Harvey Harrison on Oct 31, 2008

Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

673d57e7

29 Oct, 2008 2 commits

net: don't use INIT_RCU_HEAD · 93adcc80

Authored by Alexey Dobriyan on Oct 28, 2008

call_rcu() will unconditionally rewrite RCU head anyway.
Applies to:
	struct neigh_parms
	struct neigh_table
	struct net
	struct cipso_v4_doi
	struct in_ifaddr
	struct in_device
	rt->u.dst
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93adcc80

net: reduce structures when XFRM=n · def8b4fa

Authored by Alexey Dobriyan on Oct 28, 2008

ifdef out
* struct sk_buff::sp		(pointer)
* struct dst_entry::xfrm	(pointer)
* struct sock::sk_policy	(2 pointers)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

def8b4fa

28 Oct, 2008 1 commit

net: implement emergency route cache rebulds when gc_elasticity is exceeded · 1080d709

Authored by Neil Horman on Oct 27, 2008

This is a patch to provide on demand route cache rebuilding. Currently, our
route cache is rebuilt periodically regardless of need. This introduces
unneeded periodic latency. This patch offers a better approach. Using code
provided by Eric Dumazet, we compute the standard deviation of the average hash
bucket chain length while running rt_check_expire. Should any given chain
length grow to larger than average plus 4 standard deviations, we trigger an
emergency hash table rebuild for that net namespace. This allows for the common
case in which chains are well behaved and do not grow unevenly to not incur any
latency at all, while those systems (which may be being maliciously attacked),
only rebuild when the attack is detected. This patch takes 2 other factors into
account:
1) chains with multiple entries that differ by attributes that do not affect the
hash value are only counted once, so as not to unduly bias system to rebuilding
if features like QoS are heavily used
2) if rebuilding crosses a certain threshold (which is adjustable via the added
sysctl in this patch), route caching is disabled entirely for that net
namespace, since constant rebuilding is less efficient than no caching at all

Tested successfully by me.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1080d709
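
The "average plus 4 standard deviations" rule can be sketched as follows. This is a simplified illustration in plain integer arithmetic, not the code from the patch: `isqrt`, `chain_length_limit`, and `needs_rebuild` are hypothetical names, and the inputs are assumed to be the running sum and sum of squares of the chain lengths over `n` buckets.

```c
#include <assert.h>

/* Integer square root by linear search; adequate for small inputs. */
static unsigned long isqrt(unsigned long x)
{
    unsigned long r = 0;

    while ((r + 1) * (r + 1) <= x)
        r++;
    return r;
}

/* Limit = average chain length + 4 standard deviations. */
static unsigned long chain_length_limit(unsigned long sum,
                                        unsigned long sum2,
                                        unsigned long n)
{
    unsigned long avg, mean_sq, var;

    assert(n > 0);
    avg = sum / n;
    mean_sq = sum2 / n;
    /* Guard against inconsistent inputs instead of underflowing. */
    var = mean_sq > avg * avg ? mean_sq - avg * avg : 0;
    return avg + 4 * isqrt(var);
}

/* A chain longer than the limit would trigger an emergency rebuild. */
static int needs_rebuild(unsigned long chain_len, unsigned long limit)
{
    return chain_len > limit;
}
```

For chain lengths 1, 1, 3, 3 (sum 8, sum of squares 20, 4 buckets) the average is 2 and the standard deviation is 1, so the limit is 6: a chain of 7 entries would trigger a rebuild and a chain of 6 would not.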

17 Oct, 2008 2 commits

ipv4: Add a missing rcu_assign_pointer() in routing cache. · 00269b54

Authored by Eric Dumazet on Oct 16, 2008

rt_intern_hash() is doing an update of a RCU guarded hash chain
without using rcu_assign_pointer() or equivalent barrier.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

00269b54
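
The role of the barrier can be approximated in userspace with C11 atomics. This is an analogy, not the kernel primitive: a release store stands in for rcu_assign_pointer(), an acquire load is used as a conservative stand-in on the reader side, and `struct entry`, `chain_insert`, and `chain_first` are hypothetical names. Writers are assumed to be serialized by a lock that is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct entry {
    int key;
    struct entry *next;
};

static _Atomic(struct entry *) chain_head;

/* Writers are assumed to be serialized by a lock that is not shown. */
static void chain_insert(struct entry *e)
{
    assert(e != NULL);
    e->next = atomic_load_explicit(&chain_head, memory_order_relaxed);
    /* Release store: e->key and e->next are visible before the pointer. */
    atomic_store_explicit(&chain_head, e, memory_order_release);
}

/* Lock-free reader side: acquire load pairs with the release store. */
static struct entry *chain_first(void)
{
    return atomic_load_explicit(&chain_head, memory_order_acquire);
}
```

Without the ordering on the store, a concurrent reader could observe the new head pointer before the fields of the entry it points to are initialized.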

sysctl: simplify ->strategy · f221e726

Authored by Alexey Dobriyan on Oct 15, 2008

name and nlen parameters passed to ->strategy hook are unused, remove
them.  In general ->strategy hook should know what it's doing, and not
do something tricky for which, say, pointer to original userspace array
may be needed (name).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net> [ networking bits ]
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f221e726

01 Oct, 2008 1 commit

ipv4: Loosen source address check on IPv4 output · a210d01a

Authored by Julian Anastasov on Oct 01, 2008

ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. This obviously makes using
such addresses impossible.

This patch introduces a flowi flag which makes omitting this check
possible. The new flag provides a way of handling transparent and
non-transparent connections differently.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

a210d01a

28 Aug, 2008 1 commit

ip: speedup /proc/net/rt_cache handling · a6272665

Authored by Eric Dumazet on Aug 28, 2008

When scanning route cache hash table, we can avoid taking locks for
empty buckets.  Both /proc/net/rt_cache and NETLINK RTM_GETROUTE
interface are taken into account.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6272665
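
The optimization can be pictured with a toy table scan. The sketch below is single-threaded and uses a counter in place of a real lock, so `bucket_lock`, `bucket_unlock`, `lock_count`, and `count_entries` are all hypothetical names for illustration. It only shows the control flow: peek at the bucket head first, and take the lock only when there is something to walk.

```c
#include <assert.h>
#include <stddef.h>

struct node   { struct node *next; };
struct bucket { struct node *chain; };

static unsigned long lock_count;   /* counts mock lock acquisitions */

static void bucket_lock(struct bucket *b)   { (void)b; lock_count++; }
static void bucket_unlock(struct bucket *b) { (void)b; }

static size_t count_entries(struct bucket *table, size_t n)
{
    size_t i, total = 0;

    assert(table != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const struct node *p;

        /* Unlocked peek; real concurrent code needs an atomic read here. */
        if (table[i].chain == NULL)
            continue;

        bucket_lock(&table[i]);
        for (p = table[i].chain; p != NULL; p = p->next)
            total++;
        bucket_unlock(&table[i]);
    }
    return total;
}
```

With three buckets of which only one is populated, the scan takes the lock once instead of three times, which is where the saving on a sparse table would come from.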

27 Aug, 2008 1 commit

ipv4: mode 0555 in ipv4_skeleton · d994af0d

Authored by Hugh Dickins on Aug 27, 2008

vpnc on today's kernel says Cannot open "/proc/sys/net/ipv4/route/flush":
d--------- 0 root root 0 2008-08-26 11:32 /proc/sys/net/ipv4/route
d--------- 0 root root 0 2008-08-26 19:16 /proc/sys/net/ipv4/neigh
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

d994af0d

26 Aug, 2008 1 commit

ipv4: sysctl fixes · 2f4520d3

Authored by Al Viro on Aug 25, 2008

net.ipv4.neigh should be a part of skeleton to avoid ordering problems.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f4520d3

16 Aug, 2008 1 commit

ipv4: Disable route secret interval on zero interval · c6153b5b

Authored by Herbert Xu on Aug 15, 2008

Let me first state that disabling the route cache hash rebuild
should not be done without extensive analysis on the risk profile
and careful deliberation.

However, there are times when this can be done safely or for
testing.  For example, when you have mechanisms for ensuring
that offending parties do not exist in your network.

This patch lets the user disable the rebuild if the interval is
set to zero.  This also incidentally fixes a divide-by-zero error
with name-spaces.

In addition, this patch makes the effect of an interval change
immediate rather than it taking effect at the next rebuild as
is currently the case.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6153b5b

07 Aug, 2008 1 commit

ipv4: Fix over-ifdeffing of ip_static_sysctl_init. · 11d46123

Authored by David S. Miller on Aug 06, 2008

Noticed by Paulius Zaleckas.
Signed-off-by: David S. Miller <davem@davemloft.net>

11d46123

06 Aug, 2008 1 commit

ipv4: replace dst_metric() with dst_mtu() in net/ipv4/route.c. · 6d273f8d

Authored by Rami Rosen on Aug 06, 2008

This patch replaces dst_metric() with dst_mtu() in net/ipv4/route.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d273f8d

01 Aug, 2008 2 commits

[PATCH] ipv4_static_sysctl_init() should be under CONFIG_SYSCTL · a1bc6eb4

Authored by Al Viro on Jul 30, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a1bc6eb4

net/ipv4/route.c: fix build error · 8a9204db

Authored by Ingo Molnar on Jul 31, 2008

fix:

net/ipv4/route.c: In function 'ip_static_sysctl_init':
net/ipv4/route.c:3225: error: 'ipv4_route_path' undeclared (first use in this function)
net/ipv4/route.c:3225: error: (Each undeclared identifier is reported only once
net/ipv4/route.c:3225: error: for each function it appears in.)
net/ipv4/route.c:3225: error: 'ipv4_route_table' undeclared (first use in this function)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a9204db

28 Jul, 2008 1 commit

missing bits of net-namespace / sysctl · eeb61f71

Authored by Al Viro on Jul 27, 2008

Piss-poor sysctl registration API strikes again, film at 11...

What we really need is _pathname_ required to be present in already
registered table, so that kernel could warn about bad order.  That's the
next target for sysctl stuff (and generally saner and more explicit
order of initialization of ipv[46] internals wouldn't hurt either).

For the time being, here are full fixups required by ..._rotable()
stuff; we make per-net sysctl sets descendants of "ro" one and make sure
that sufficient skeleton is there before we start registering per-net
sysctls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

eeb61f71

27 Jul, 2008 2 commits

net: missing bits of net-namespace / sysctl · 6f9f489a

Authored by Al Viro on Jul 27, 2008

Piss-poor sysctl registration API strikes again, film at 11...
What we really need is _pathname_ required to be present in
already registered table, so that kernel could warn about bad
order.  That's the next target for sysctl stuff (and generally
saner and more explicit order of initialization of ipv[46]
internals wouldn't hurt either).

For the time being, here are full fixups required by ..._rotable()
stuff; we make per-net sysctl sets descendants of "ro" one and
make sure that sufficient skeleton is there before we start registering
per-net sysctls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f9f489a

netns: fix ip_rt_frag_needed rt_is_expired · 6c3b8fc6

Authored by Hugh Dickins on Jul 26, 2008

Running recent kernels, and using a particular vpn gateway, I've been
having to edit my mails down to get them accepted by the smtp server.

Git bisect led to commit e84f84f2 -
netns: place rt_genid into struct net.  The conversion from a != test
to rt_is_expired() put one negative too many: and now my mail works.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c3b8fc6
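
The "one negative too many" can be made concrete with a toy model. The signatures below are assumptions for illustration (the real rt_is_expired() takes a route entry, not two integers), and the `skip_*` helpers are hypothetical names. The sketch assumes the original condition skipped entries whose generation id differed from the current one.

```c
#include <assert.h>

/* Toy version: an entry is expired when its generation id is stale. */
static int rt_is_expired(int entry_genid, int current_genid)
{
    return entry_genid != current_genid;
}

/* Original form of the check: skip entries whose genid differs. */
static int skip_old_style(int entry_genid, int current_genid)
{
    return entry_genid != current_genid;
}

/* Correct conversion: no extra negation. */
static int skip_converted(int entry_genid, int current_genid)
{
    return rt_is_expired(entry_genid, current_genid);
}

/* Buggy conversion: "one negative too many". */
static int skip_buggy(int entry_genid, int current_genid)
{
    return !rt_is_expired(entry_genid, current_genid);
}
```

In this model the buggy form skips exactly the entries that are still valid and processes the stale ones, which is consistent with the symptom described above.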

17 Jul, 2008 1 commit

mib: add net to IP_INC_STATS_BH · 7c73a6fa

Authored by Pavel Emelyanov on Jul 16, 2008

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c73a6fa

08 Jul, 2008 1 commit

ipv4: remove flush_mutex from ipv4_sysctl_rtcache_flush · 81c684d1

Authored by Denis V. Lunev on Jul 08, 2008

It is possible to avoid locking at all in ipv4_sysctl_rtcache_flush by
defining local ctl_table on the stack.

The patch is based on the suggestion from Eric W. Biederman.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

81c684d1

06 Jul, 2008 7 commits

netns: selective flush of rt_cache · 32cb5b4e

Authored by Denis V. Lunev on Jul 05, 2008

The dst cache is marked as expired on a per-namespace basis by the previous
patch. Right now we have to implement selective cache shrinking. This
procedure has been ported from the older OpenVZ codebase.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

32cb5b4e

netns: place rt_genid into struct net · e84f84f2

Authored by Denis V. Lunev on Jul 05, 2008

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e84f84f2

ipv4: pass current value of rt_genid into rt_hash · b00180de

Authored by Denis V. Lunev on Jul 05, 2008

Basically, there is no difference between calling atomic_read internally and
passing the value as a parameter, as rt_hash is inline.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b00180de

netns: add struct net parameter to rt_cache_invalidate · 86c657f6

Authored by Denis V. Lunev on Jul 05, 2008

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

86c657f6

netns: make rt_secret_rebuild timer per namespace · 9f5e97e5

Authored by Denis V. Lunev on Jul 05, 2008

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f5e97e5

netns: register net.ipv4.route.flush in each namespace · 39a23e75

Authored by Denis V. Lunev on Jul 05, 2008

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

39a23e75

ipv4: remove static flush_delay variable · 639e104f

Authored by Denis V. Lunev on Jul 05, 2008

flush delay is used as an external storage for net.ipv4.route.flush sysctl
entry. It is write-only.

The ctl_table->data for this entry is used once. Fix this case to point
to the stack to remove global variable. Do this to avoid additional
variable on struct net in the next patch.

Possible race (as it was before) accessing this local variable is removed
using flush_mutex.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

639e104f

openanolis / cloud-kernel synced successfully about 1 year ago
