1. 27 Jan 2011, 1 commit
    • net: Implement read-only protection and COW'ing of metrics. · 62fa8a84
      Committed by David S. Miller
      Routing metrics are now copy-on-write.
      
      Initially a route entry points its metrics at a read-only location.
      If a routing table entry exists, it will point there.  Else it will
      point at the all zero metric place-holder called 'dst_default_metrics'.
      
      The writability state of the metrics is stored in the low bits of the
      metrics pointer; we have two bits left to spare if we want to store
      more states.
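
      A minimal sketch of the low-bit tagging this describes (the flag and
      helper names below are illustrative, not necessarily the patch's own):

          /* _metrics is an unsigned long; the pointer it carries is
           * word-aligned, freeing its low bits to hold state. */
          #define DST_METRICS_READ_ONLY  0x1UL
          #define DST_METRICS_FLAGS      0x3UL

          static inline bool dst_metrics_read_only(const struct dst_entry *dst)
          {
                  return dst->_metrics & DST_METRICS_READ_ONLY;
          }

          static inline u32 *dst_metrics_ptr(struct dst_entry *dst)
          {
                  return (u32 *)(dst->_metrics & ~DST_METRICS_FLAGS);
          }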
      
      For the initial implementation, COW is implemented simply via kmalloc.
      However future enhancements will change this to place the writable
      metrics somewhere else, in order to increase sharing.  Very likely
      this "somewhere else" will be the inetpeer cache.
      
      Note also that this means that metrics updates may transiently fail
      if we cannot COW the metrics successfully.
      
      But even by itself, this patch should decrease memory usage and
      increase cache locality especially for routing workloads.  In those
      cases the read-only metric copies stay in place and never get written
      to.
      
      TCP workloads where metrics get updated, and those rare cases where
      PMTU triggers occur, will take a very slight performance hit.  But
      that hit will be alleviated when the long-term writable metrics
      move to a more sharable location.
      
      Since the metrics storage went from a u32 array of RTAX_MAX entries to
      what is essentially a pointer, some retooling of the dst_entry layout
      was necessary.
      
      Most importantly, we need to preserve the alignment of the reference
      count so that it doesn't share cache lines with the read-mostly state,
      as per Eric Dumazet's alignment assertion checks.
      
      The only non-trivial bit here is the move of the 'flags' member into
      the writable cache line.  This is OK because we always access the
      flags around the same moment that we modify the reference count.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 14 Jan 2011, 1 commit
    • netfilter: fix Kconfig dependencies · c7066f70
      Committed by Patrick McHardy
      Fix the dependencies of the netfilter realm match: it depends on
      NET_CLS_ROUTE, which itself depends on NET_SCHED; this dependency is
      missing from netfilter.
      
      Since matching on realms is also useful without NET_SCHED enabled, and
      the option really only controls whether the tclassid member is included
      in route and dst entries, rename the config option to IP_ROUTE_CLASSID
      and move it out of the traffic-scheduling context to get rid of the
      NET_SCHED dependency.
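
      A rough sketch of the resulting Kconfig shape (simplified; the real
      entries carry more dependencies and help text):

          config IP_ROUTE_CLASSID
                  bool

          config NETFILTER_XT_MATCH_REALM
                  tristate '"realm" match support'
                  select IP_ROUTE_CLASSID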
      Reported-by: Vladis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  3. 15 Dec 2010, 1 commit
  4. 14 Dec 2010, 1 commit
    • net: Abstract default ADVMSS behind an accessor. · 0dbaee3b
      Committed by David S. Miller
      Make all RTAX_ADVMSS metric accesses go through a new helper function,
      dst_metric_advmss().
      
      Leave the actual default metric as "zero" in the real metric slot,
      and compute the actual default value dynamically via a new
      AF-specific dst_ops callback.
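
      A sketch of the helper's shape (illustrative; the point is the
      fallback through the new AF-specific dst_ops callback):

          static inline u32 dst_metric_advmss(const struct dst_entry *dst)
          {
                  u32 advmss = dst_metric_raw(dst, RTAX_ADVMSS);

                  /* zero in the metric slot means "compute the default" */
                  if (!advmss)
                          advmss = dst->ops->default_advmss(dst);

                  return advmss;
          }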
      
      For stacked IPSEC routes, we use the advmss of the path which
      preserves existing behavior.
      
      Unlike ipv4/ipv6, DECnet ties the advmss to the mtu and thus updates
      advmss on pmtu updates.  This inconsistency in advmss handling
      results in more raw metric accesses than I wish we had ended up with.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 13 Dec 2010, 2 commits
    • ipv4: Don't pre-seed hoplimit metric. · 323e126f
      Committed by David S. Miller
      Always go through a new ip4_dst_hoplimit() helper, just like ipv6.
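
      A sketch of what such a helper might look like (illustrative; the
      fallback to the sysctl is the essential part):

          static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
          {
                  int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);

                  /* no per-route value: fall back to the sysctl default */
                  if (hoplimit == 0)
                          hoplimit = sysctl_ip_default_ttl;
                  return hoplimit;
          }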
      
      This allowed several simplifications:
      
      1) The interim dst_metric_hoplimit() can go, as it's no longer
         used.
      
      2) The sysctl_ip_default_ttl entry no longer needs to use
         ipv4_doint_and_flush, since the sysctl is not cached in
         routing cache metrics any longer.
      
      3) ipv4_doint_and_flush no longer needs to be exported and
         therefore can be marked static.
      
      When ipv4_doint_and_flush_strategy was removed some time ago,
      the external declaration in ip.h was mistakenly left around,
      so kill that off too.
      
      We have to move the sysctl_ip_default_ttl declaration into ipv4's
      route cache definition header, net/route.h, because net/ip.h (where
      the declaration lives now) has a back dependency on net/route.h.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 5170ae82
  6. 10 Dec 2010, 1 commit
    • net: Abstract away all dst_entry metrics accesses. · defb3519
      Committed by David S. Miller
      Use helper functions to hide all direct accesses, especially writes,
      to dst_entry metrics values.
      
      This will allow us to:
      
      1) More easily change how the metrics are stored.
      
      2) Implement COW for metrics.
      
      In particular this will help us put metrics into the inetpeer
      cache if that is what we end up doing.  We can make the _metrics
      member a pointer instead of an array, initially have it point
      at the read-only metrics in the FIB, and then on the first set
      grab an inetpeer entry and point the _metrics member there.
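
      A sketch of the accessor pair (illustrative; the real patch adds a
      larger family of helpers):

          /* metric constants (RTAX_*) are 1-based, the array 0-based */
          static inline u32 dst_metric_raw(const struct dst_entry *dst,
                                           const int metric)
          {
                  return dst->_metrics[metric - 1];
          }

          static inline void dst_metric_set(struct dst_entry *dst,
                                            int metric, u32 val)
          {
                  dst->_metrics[metric - 1] = val;
          }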
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
  7. 09 Nov 2010, 1 commit
  8. 28 Oct 2010, 1 commit
  9. 04 Oct 2010, 1 commit
    • net: introduce DST_NOCACHE flag · c7d4426a
      Committed by Eric Dumazet
      While doing stress tests with the IP route cache disabled on
      multi-queue devices, I noticed very high contention on one rwlock
      used in the neighbour code.
      
      When many cpus try to send frames (possibly using a high-performance
      multiqueue device) to the same neighbour, they fight for the
      neigh->lock rwlock in order to call neigh_hh_init(), and fight on
      hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test()).
      
      But we don't need to call neigh_hh_init() for dsts that are used only
      once.  It costs at least four atomic operations on two contended cache
      lines, plus the high contention on the neigh->lock rwlock.
      
      Introduce a new dst flag, DST_NOCACHE, that is set when the dst was
      not inserted in the route cache.
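
      A sketch of how the flag would be consulted on output (flag value and
      surrounding code illustrative):

          #define DST_NOCACHE  0x0010

          /* only build the cached hardware header for dsts that
           * will actually be reused */
          if (dev->header_ops->cache && !(dst->flags & DST_NOCACHE))
                  neigh_hh_init(neigh, dst, dst->ops->protocol);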
      
      With the stress-test bench, sending 160,000,000 frames to one
      neighbour, the results are:
      
      Before patch:
      
      real	2m28.406s
      user	0m11.781s
      sys	36m17.964s
      
      
      After patch:
      
      real	1m26.532s
      user	0m12.185s
      sys	20m3.903s
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 28 Sep 2010, 1 commit
  11. 27 Sep 2010, 1 commit
  12. 05 Jun 2010, 1 commit
  13. 18 May 2010, 2 commits
    • net: Introduce skb_tunnel_rx() helper · d19d56dd
      Committed by Eric Dumazet
      skb rxhash should be cleared when a skb is handled by a tunnel before
      being delivered again, so that correct packet steering can take place.

      There are other cleanups and accounting that we can factor into a new
      helper, skb_tunnel_rx().
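
      A sketch of what the factored-out helper might look like
      (illustrative; the real helper may reset more per-skb state):

          static inline void skb_tunnel_rx(struct sk_buff *skb,
                                           struct net_device *dev)
          {
                  /* account the packet to the tunnel device */
                  dev->stats.rx_packets++;
                  dev->stats.rx_bytes += skb->len;

                  skb->dev = dev;
                  skb->rxhash = 0;  /* force fresh packet steering */
                  nf_reset(skb);
          }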
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add a noref bit on skb dst · 7fee226a
      Committed by Eric Dumazet
      Use the low-order bit of skb->_skb_dst to tell that the dst is not
      refcounted.
      
      Change _skb_dst to _skb_refdst to make sure all uses are caught.
      
      skb_dst() returns the dst regardless of whether the noref bit is set,
      but with a lockdep check to make sure a noref dst is not handed out
      if the current user is not rcu protected.
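
      A sketch of the bit encoding (mask names illustrative):

          #define SKB_DST_NOREF   1UL
          #define SKB_DST_PTRMASK ~(SKB_DST_NOREF)

          static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
          {
                  /* lockdep check elided in this sketch */
                  return (struct dst_entry *)(skb->_skb_refdst &
                                              SKB_DST_PTRMASK);
          }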
      
      A new skb_dst_set_noref() helper sets a non-refcounted dst on a skb
      (with a lockdep check).
      
      skb_dst_drop() drops a reference only if the skb dst was refcounted.
      
      The skb_dst_force() helper is used to force a refcount on the dst
      when the skb is queued and no longer RCU protected.
      
      Use skb_dst_force() in __sk_add_backlog(), in __dev_xmit_skb() (if
      !IFF_XMIT_DST_RELEASE or the skb is enqueued on a qdisc queue), in
      sock_queue_rcv_skb(), and in __nf_queue().
      
      Use skb_dst_force() in dev_requeue_skb().
      
      Note: dst_use_noref() still dirties the dst; we might later change it
      to do one dirtying per jiffy.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 13 Apr 2010, 1 commit
    • net: sk_dst_cache RCUification · b6c6712a
      Committed by Eric Dumazet
      With the latest CONFIG_PROVE_RCU stuff, I felt comfortable enough to
      make this work.
      
      sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock).

      This rwlock is read-locked for a very small amount of time, and dst
      entries are already freed after an RCU grace period.  This calls for
      RCU again :)
      
      This patch converts sk_dst_lock to a spinlock and uses RCU for readers.
      
      __sk_dst_get() is supposed to be called with rcu_read_lock() held or
      with the socket locked by the user, so use the appropriate
      rcu_dereference_check() condition
      (rcu_read_lock_held() || sock_owned_by_user(sk)).
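
      A sketch of the resulting accessor (close to what the text describes;
      details illustrative):

          static inline struct dst_entry *__sk_dst_get(struct sock *sk)
          {
                  return rcu_dereference_check(sk->sk_dst_cache,
                                               rcu_read_lock_held() ||
                                               sock_owned_by_user(sk));
          }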
      
      This patch avoids two atomic ops per tx packet on connected UDP
      sockets, for example, and lets sk_dst_lock be dirtied much less.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 24 Dec 2009, 1 commit
    • net: Add rtnetlink init_rcvwnd to set the TCP initial receive window · 31d12926
      Committed by Laurent Chavey
      Add rtnetlink init_rcvwnd to set the TCP initial receive window size
      advertised by passive and active TCP connections.
      The current Linux TCP implementation limits the advertised TCP initial
      receive window to the one prescribed by slow start.  For short-lived
      TCP connections used for transaction-type traffic (i.e. http requests),
      bounding the advertised TCP initial receive window results in increased
      latency to complete the transaction.
      Setting the initial congestion window is already supported via
      rtnetlink init_cwnd, but that feature is of limited use without the
      ability to also set a larger TCP initial receive window.
      The rtnetlink init_rcvwnd allows increasing the TCP initial receive
      window, letting TCP connections advertise a larger receive window
      than the one bounded by slow start.
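
      A sketch of how the per-route value could be plumbed into window
      selection (argument list abridged and illustrative):

          /* RTAX_INITRWND carries the configured initial receive window */
          tcp_select_initial_window(tcp_full_space(sk), mss,
                                    &tp->rcv_wnd, &tp->window_clamp,
                                    sysctl_tcp_window_scaling, &rcv_wscale,
                                    dst_metric(dst, RTAX_INITRWND));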
      Signed-off-by: Laurent Chavey <chavey@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 16 Dec 2009, 1 commit
    • tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. · bb5b7c11
      Committed by David S. Miller
      It creates a regression, triggering badness for SYN_RECV
      sockets, for example:
      
      [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
      [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
      [19148.023035] REGS: eeecbd30 TRAP: 0700   Not tainted  (2.6.32)
      [19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002442  XER: 00000000
      [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
      
      This is likely caused by the change in the 'estab' parameter
      passed to tcp_parse_options() when invoked by the functions
      in net/ipv4/tcp_minisocks.c.
      
      But even if that is fixed, the ->conn_request() changes made in
      this patch series are fundamentally wrong.  They try to use the
      listening socket's 'dst' to probe the route settings.  The
      listening socket doesn't even have a route, and you can't
      get the right route (the child request's one) until much later,
      after we set up all of the state, and it must be done by hand.
      
      This stuff really isn't ready, so the best thing to do is a
      full revert.  This reverts the following commits:
      
      f55017a9
      022c3f7d
      1aba721e
      cda42ebd
      345cda2f
      dc343475
      05eaade2
      6a2a2d6b
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 05 Nov 2009, 1 commit
  18. 04 Nov 2009, 1 commit
  19. 29 Oct 2009, 1 commit
  20. 21 Oct 2009, 1 commit
  21. 02 Sep 2009, 1 commit
    • netns: embed ip6_dst_ops directly · 86393e52
      Committed by Alexey Dobriyan
      struct net::ipv6.ip6_dst_ops is separately dynamically allocated,
      but there is no fundamental reason for it.  Embed it directly into
      struct netns_ipv6.
      
      For that:
      * move struct dst_ops into a separate header to fix circular
        dependencies (I honestly tried not to; it's pretty much impossible
        to do any other way)
      * drop the dynamic allocation; allocate together with the netns
      
      For a change, remove struct dst_ops::dst_net; it's deducible using
      container_of() given a dst_ops pointer.
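
      A sketch of that container_of() derivation (helper name hypothetical):

          static inline struct net *ip6_dst_ops_net(struct dst_ops *ops)
          {
                  /* ops is embedded in struct netns_ipv6, which is
                   * itself embedded in struct net */
                  return container_of(ops, struct net, ipv6.ip6_dst_ops);
          }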
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 03 Jun 2009, 1 commit
  23. 26 Nov 2008, 1 commit
  24. 17 Nov 2008, 1 commit
    • net: make sure struct dst_entry refcount is aligned on 64 bytes · 5635c10d
      Committed by Eric Dumazet
      As found in the past (commit f1dd9c37,
      "[NET]: Fix tbench regression in 2.6.25-rc1"), it is really
      important that the struct dst_entry refcount is aligned on a cache
      line.
      
      We cannot use __attribute__((aligned)), so manually pad the structure
      for 32- and 64-bit arches.
      
      for 32bit : offsetof(struct dst_entry, __refcnt) is 0x80
      for 64bit : offsetof(struct dst_entry, __refcnt) is 0xc0
      
      As it is not possible to guess the cache line size at compile time,
      we use a generic value of 64 bytes, which satisfies many current
      arches.  (Using 128-byte alignment on 64-bit arches would waste 64
      bytes.)
      
      Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" that
      would break this alignment.
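
      A sketch of such a compile-time check (placement and exact expression
      illustrative):

          /* e.g. in dst_alloc(): trip the build if __refcnt ever moves
           * off its 64-byte boundary */
          BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63);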
      
      "tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
      (2350 MB/s instead of 2250 MB/s)
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 12 Nov 2008, 1 commit
  26. 29 Oct 2008, 1 commit
  27. 05 Aug 2008, 1 commit
  28. 19 Jul 2008, 1 commit
    • tcp: RTT metrics scaling · c1e20f7c
      Committed by Stephen Hemminger
      Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in
      kernel units (jiffies), and this leaks out through the netlink API to
      user space, where the length of a jiffy is unknown.
      
      This patch changes the kernel to convert to/from milliseconds.  This
      changes the ABI, but milliseconds seemed like the most natural unit
      for these parameters.  Values available via syscall in
      /proc/net/rt_cache and via netlink will be in milliseconds.
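
      A sketch of the conversion at the user-visible boundary (dump
      direction; illustrative):

          /* RTT-class metrics are stored in jiffies internally but
           * exported in milliseconds */
          if (metric == RTAX_RTT || metric == RTAX_RTTVAR ||
              metric == RTAX_RTO_MIN)
                  val = jiffies_to_msecs(dst_metric(dst, metric));
          else
                  val = dst_metric(dst, metric);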
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 28 Mar 2008, 1 commit
  30. 13 Mar 2008, 1 commit
    • [NET]: Fix tbench regression in 2.6.25-rc1 · f1dd9c37
      Committed by Zhang Yanmin
      Compared with kernel 2.6.24, the tbench result shows a regression
      with 2.6.25-rc1.
      
      1) On the 2 quad-core processor Stoakley: 4%.
      2) On the 4 quad-core processor Tigerton: more than 30%.
      
      A bisect located the patch below.
      
      b4ce9277 is the first bad commit
      commit b4ce9277
      Author: Herbert Xu <herbert@gondor.apana.org.au>
      Date:   Tue Nov 13 21:33:32 2007 -0800
      
          [IPV6]: Move nfheader_len into rt6_info
      
          The dst member nfheader_len is only used by IPv6.  It's also currently
          creating a rather ugly alignment hole in struct dst.  Therefore this patch
          moves it from there into struct rt6_info.
      
      The above patch changes the cache line alignment, especially of the
      member __refcnt.  I did a test, adding 2 unsigned long paddings before
      lastuse, so the 3 members lastuse/__refcnt/__use moved to the next
      cache line; the performance was recovered.
      
      I created a patch to rearrange the members in struct dst_entry
      (sketched after the list below).

      With Eric's and Valdis Kletnieks' suggestions, I made a finer
      arrangement.
      
      1) Move tclassid under ops in the CONFIG_NET_CLS_ROUTE=y case, so
         sizeof(dst_entry)=200 whether CONFIG_NET_CLS_ROUTE=y or n.  I
         tested many patches on my 16-core Tigerton, moving tclassid to
         different places.  It looks like tclassid can also have an impact
         on performance: if tclassid is moved before metrics, or not moved
         at all, the performance isn't good, so I moved it behind metrics.
      
      2) Add comments before __refcnt.
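
      A sketch of the resulting layout (abridged; the comment paraphrases
      the intent rather than quoting the patch):

          struct dst_entry {
                  /* ... read-mostly fields: ops, metrics, tclassid ... */

                  /*
                   * __refcnt wants to be on a different cache line from
                   * the read-mostly fields above
                   */
                  atomic_t        __refcnt;       /* client references */
                  int             __use;
                  unsigned long   lastuse;
                  /* ... */
          };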
      
      On the 16-core Tigerton:
      
      If CONFIG_NET_CLS_ROUTE=y, the result with the patch below is about
      18% better than without it;

      if CONFIG_NET_CLS_ROUTE=n, the result with the patch below is about
      30% better than without it.

      With 32-bit 2.6.25-rc1 on the 8-core Stoakley, the new patch doesn't
      introduce a regression.
      
      Thanks to Eric, Valdis, and David!
      Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
      Acked-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  31. 29 Jan 2008, 8 commits