Commits · 9eb43e765368f835d92c93844ebce30da7efeb84 · openeuler / raspberrypi-kernel

04 Aug, 2012 1 commit

ipv4: Introduce IN_DEV_NET_ROUTE_LOCALNET · 9eb43e76

Authored by Eric Dumazet on Aug 03, 2012

performance profiles show a high cost in the IN_DEV_ROUTE_LOCALNET()
call done in ip_route_input_slow(), because of multiple dereferences,
even if cache lines are clean and available in cpu caches.

Since we already have the 'net' pointer, introduce
IN_DEV_NET_ROUTE_LOCALNET() macro avoiding two dereferences
(dev_net(in_dev->dev))

Also change the tests to use IN_DEV_NET_ROUTE_LOCALNET() only if saddr
and/or daddr are loopback addresses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
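
The ordering described in the commit above (test the addresses first, consult the per-namespace setting only when a loopback address is involved) can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: `struct net_conf_sketch` and both function names are hypothetical, and addresses are plain host-order `uint32_t` values rather than `__be32`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-namespace route_localnet setting. */
struct net_conf_sketch {
	bool route_localnet;
};

/* True for 127.0.0.0/8, with the address given in host byte order. */
static inline bool is_loopback_sketch(uint32_t addr)
{
	return (addr & 0xff000000u) == 0x7f000000u;
}

/*
 * Returns true when the packet should be treated as martian.
 * The cheap address test runs first, so the configuration is only
 * read when saddr or daddr is actually a loopback address.
 */
static inline bool loopback_is_martian_sketch(const struct net_conf_sketch *conf,
					      uint32_t saddr, uint32_t daddr)
{
	if (is_loopback_sketch(saddr) || is_loopback_sketch(daddr))
		return !conf->route_localnet;
	return false;
}
```

For ordinary traffic neither address is in 127.0.0.0/8, so the configuration lookup is skipped entirely, which is the saving the commit message is after.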

02 Aug, 2012 1 commit

ipv4: route.c cleanup · e33cdac0

Authored by Eric Dumazet on Aug 01, 2012

Remove unused includes after IP cache removal
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Aug, 2012 4 commits

ipv4: Properly purge netdev references on uncached routes. · caacf05e

Authored by David S. Miller on Jul 31, 2012

When a device is unregistered, we have to purge all of the
references to it that may exist in the entire system.

If a route is uncached, we currently have no way of accomplishing
this.

So create a global list that is scanned when a network device goes
down.  This mirrors the logic in net/core/dst.c's dst_ifdown().
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Cache routes in nexthop exception entries. · c5038a83

Authored by David S. Miller on Jul 31, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: percpu nh_rth_output cache · d26b3a7c

Authored by Eric Dumazet on Jul 31, 2012

Input path is mostly run under RCU and doesn't touch dst refcnt.

But output path on forwarding or UDP workloads hits the dst refcount
badly, and we have a lot of false sharing, for example
in ipv4_mtu() when reading rt->rt_pmtu.

Using a percpu cache for nh_rth_output gives a nice performance
increase at a small cost.

24 udpflood test on my 24 cpu machine (dummy0 output device)
(each process sends 1,000,000 udp frames, 24 processes are started)

before : 5.24 s
after : 2.06 s
For reference, time on linux-3.5 : 6.60 s
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Restore old dst_free() behavior. · 54764bb6

Authored by Eric Dumazet on Jul 31, 2012

commit 404e0a8b (net: ipv4: fix RCU races on dst refcounts) tried
to solve a race but added a problem at device/fib dismantle time :

We really want to call dst_free() as soon as possible, even if sockets
still have dst in their cache.
dst_release() calls in free_fib_info_rcu() are not welcomed.

Root of the problem was that now we also cache output routes (in
nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
rt_free(), because output route lookups are done in process context.

Based on feedback and initial patch from David Miller (adding another
call_rcu_bh() call in fib, but it appears it was not the right fix)

I left the inet_sk_rx_dst_set() helper and added __rcu attributes
to nh_rth_output and nh_rth_input to better document what is going on in
this code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Jul, 2012 1 commit

net: ipv4: fix RCU races on dst refcounts · 404e0a8b

Authored by Eric Dumazet on Jul 29, 2012

commit c6cffba4 (ipv4: Fix input route performance regression.)
added various fatal races with dst refcounts.

crashes happen on tcp workloads if routes are added/deleted at the same
time.

The dst_free() calls from free_fib_info_rcu() are clearly racy.

We need instead regular dst refcounting (dst_release()) and make
sure dst_release() is aware of RCU grace periods :

Add DST_RCU_FREE flag so that dst_release() respects an RCU grace period
before dst destruction for cached dst

Introduce a new inet_sk_rx_dst_set() helper, using atomic_inc_not_zero()
to make sure we don't increase a zero refcount (on a dst currently
waiting for an RCU grace period before destruction)

rt_cache_route() must take a reference on the new cached route, and
release it if it was not able to install it.

With this patch, my machines survive various benchmarks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
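
The "increment only if not already zero" rule mentioned in the commit above can be illustrated with C11 atomics. This is a user-space sketch of the general technique, not the kernel's `atomic_inc_not_zero()` implementation; the function name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still alive (count > 0).
 * A count of zero means the object is already waiting for destruction,
 * so a new reference must not be handed out.
 */
static bool refcount_inc_not_zero_sketch(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

A caller that gets `false` back has to behave as if the cached entry did not exist, for example by falling back to a fresh lookup.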

27 Jul, 2012 1 commit

ipv4: Fix input route performance regression. · c6cffba4

Authored by David S. Miller on Jul 26, 2012

With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.

Reinstate the noref path when we hit a cached route in the FIB
nexthops.

With help from Eric Dumazet.
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Jul, 2012 1 commit

ipv4: rt_cache_valid must check expired routes · 4331debc

Authored by Eric Dumazet on Jul 25, 2012

commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.)
introduced rt_cache_valid() helper. It unfortunately doesn't check if
route is expired before caching it.

I noticed sk_setup_caps() was constantly called on a tcp workload.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Jul, 2012 3 commits

ipv4: Change rt->rt_iif encoding. · 13378cad

Authored by David S. Miller on Jul 23, 2012

On input packet processing, rt->rt_iif will be zero if we should
use skb->dev->ifindex.

Since we access rt->rt_iif consistently via inet_iif(), that is
the only spot whose interpretation has to be adjusted.
Signed-off-by: David S. Miller <davem@davemloft.net>
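
The encoding described in the commit above (zero means "use the device the packet arrived on") amounts to a single fallback in the accessor. The sketch below is a user-space illustration; the three `iif_*` structs are hypothetical stand-ins for `struct net_device`, `struct rtable` and `struct sk_buff`, modelling only the fields mentioned in the message.

```c
/* Hypothetical stand-ins; only the fields discussed above are modelled. */
struct iif_net_device {
	int ifindex;
};

struct iif_rtable {
	int rt_iif;	/* 0: take the index from the receiving device */
};

struct iif_sk_buff {
	const struct iif_net_device *dev;
	const struct iif_rtable *rt;
};

/* Resolve the input interface index under the new rt_iif encoding. */
static inline int inet_iif_sketch(const struct iif_sk_buff *skb)
{
	int iif = skb->rt->rt_iif;

	return iif ? iif : skb->dev->ifindex;
}
```

Because every reader goes through one accessor, a cached input route no longer has to carry a per-device value, which is what allows it to be shared more widely.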

ipv4: Prepare for change of rt->rt_iif encoding. · 92101b3b

Authored by David S. Miller on Jul 23, 2012

Use inet_iif() consistently, and for TCP record the input interface of
cached RX dst in inet sock.

rt->rt_iif is going to be encoded differently, so that we can
legitimately cache input routes in the FIB info more aggressively.

When the input interface is "use SKB device index" the rt->rt_iif will
be set to zero.

This forces us to move the TCP RX dst cache installation into the ipv4
specific code, and as well it should since doing the route caching for
ipv6 is pointless at the moment since it is not inspected in the ipv6
input paths yet.

Also, remove the unlikely on dst->obsolete, all ipv4 dsts have
obsolete set to a non-zero value to force invocation of the check
callback.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Remove all RTCF_DIRECTSRC handliing. · fe3edf45

Authored by David S. Miller on Jul 23, 2012

The last and final kernel user, ICMP address replies,
has been removed.
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Jul, 2012 17 commits

ipv4: Kill rt->fi · 2860583f

Authored by David S. Miller on Jul 17, 2012

It's not really needed.

We only grabbed a reference to the fib_info for the sake of fib_info
local metrics.

However, fib_info objects are freed using RCU, as are therefore their
private metrics (if any).

We would have triggered a route cache flush if we eliminated a
reference to a fib_info object in the routing tables.

Therefore, any existing cached routes will first check and see that
they have been invalidated before an errant reference to these
metric values would occur.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Turn rt->rt_route_iif into rt->rt_is_input. · 9917e1e8

Authored by David S. Miller on Jul 17, 2012

That is this value's only use, as a boolean to indicate whether
a route is an input route or not.

So implement it that way, using a u16 gap present in the struct
already.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Kill rt->rt_oif · 4fd551d7

Authored by David S. Miller on Jul 17, 2012

Never actually used.

It was being set on output routes to the original OIF specified in the
flow key used for the lookup.

Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of
the flowi4_oif and flowi4_iif values, thanks to feedback from Julian
Anastasov.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Dirty less cache lines in route caching paths. · 93ac5341

Authored by David S. Miller on Jul 17, 2012

Don't bother incrementing dst->__use and setting dst->lastuse,
they are completely pointless and just slow things down.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code. · ba3f7f04

Authored by David S. Miller on Jul 17, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Cache input routes in fib_info nexthops. · d2d68ba9

Authored by David S. Miller on Jul 17, 2012

Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions.  (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).

However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Cache output routes in fib_info nexthops. · f2bb4bed

Authored by David S. Miller on Jul 17, 2012

If we have an output route that lacks nexthop exceptions, we can cache
it in the FIB info nexthop.

Such routes will have DST_HOST cleared because such routes refer to a
family of destinations, rather than just one.

The sequence of the handling of exceptions during route lookup is
adjusted to make the logic work properly.

Before we allocate the route, we lookup the exception.

Then we know if we will cache this route or not, and therefore whether
DST_HOST should be set on the allocated route.

Then we use DST_HOST to key off whether we should store the resulting
route, during rt_set_nexthop(), in the FIB nexthop cache.

With help from Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Kill routes during PMTU/redirect updates. · ceb33206

Authored by David S. Miller on Jul 17, 2012

Mark them obsolete so there will be a re-lookup to fetch the
FIB nexthop exception info.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Document dst->obsolete better. · f5b0a874

Authored by David S. Miller on Jul 19, 2012

Add a big comment explaining how the field works, and use defines
instead of magic constants for the values assigned to it.

Suggested by Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Adjust semantics of rt->rt_gateway. · f8126f1d

Authored by David S. Miller on Jul 13, 2012

In order to allow prefixed routes, we have to adjust how rt_gateway
is set and interpreted.

The new interpretation is:

1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr

2) rt_gateway != 0, destination requires a nexthop gateway

Abstract the fetching of the proper nexthop value using a new
inline helper, rt_nexthop(), as suggested by Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
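
The two cases listed in the commit above reduce to a single selection, which is what a helper along the lines of rt_nexthop() can encapsulate. The sketch below is a user-space illustration only: `struct gw_rtable` is a hypothetical stand-in for the kernel's `struct rtable`, and plain `uint32_t` replaces `__be32`.

```c
#include <stdint.h>

/* Hypothetical stand-in; only the field discussed above is modelled. */
struct gw_rtable {
	uint32_t rt_gateway;	/* 0 means the destination is on-link */
};

/*
 * Pick the address to resolve at the link layer:
 *   rt_gateway == 0 -> on-link, use the packet's destination address
 *   rt_gateway != 0 -> send via the gateway
 */
static inline uint32_t rt_nexthop_sketch(const struct gw_rtable *rt,
					 uint32_t daddr)
{
	return rt->rt_gateway ? rt->rt_gateway : daddr;
}
```

With this convention a single route entry can serve a whole prefix of on-link destinations, because the per-packet destination is supplied by the caller instead of being stored in the route.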

ipv4: Remove 'rt_dst' from 'struct rtable' · f1ce3062

Authored by David S. Miller on Jul 12, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Remove 'rt_mark' from 'struct rtable' · b4869889

Authored by David Miller on Jul 01, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Kill 'rt_src' from 'struct rtable' · d6c0a4f6

Authored by David Miller on Jul 01, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Remove rt_key_{src,dst,tos} from struct rtable. · 1a00fee4

Authored by David Miller on Jul 01, 2012

They are always used in contexts where they can be reconstituted,
or where the finally resolved rt->rt_{src,dst} is semantically
equivalent.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Kill ip_route_input_noref(). · 38a424e4

Authored by David Miller on Jul 01, 2012

The "noref" argument to ip_route_input_common() is now always ignored
because we do not cache routes, and in that case we must always grab
a reference to the resulting 'dst'.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Delete routing cache. · 89aef892

Authored by David S. Miller on Jul 17, 2012

The ipv4 routing cache is non-deterministic, performance wise, and is
subject to reasonably easy to launch denial of service attacks.

The routing cache works great for well behaved traffic, and the world
was a much friendlier place when the tradeoffs that led to the routing
cache's design were considered.

What it boils down to is that the performance of the routing cache is
a product of the traffic patterns seen by a system rather than being a
product of the contents of the routing tables.  The former is
controllable by external entities.

Even for "well behaved" legitimate traffic, high volume sites can see
hit rates in the routing cache of only ~10%.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: show pmtu in route list · 521f5490

Authored by Julian Anastasov on Jul 20, 2012

Override the metrics with rt_pmtu
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Jul, 2012 2 commits

ipv4: Fix again the time difference calculation · f31fd383

Authored by Julian Anastasov on Jul 19, 2012

Fix again the diff value in rt_bind_exception
after collision of two latest patches, my original commit
actually fixed the same problem.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: use seqlock for nh_exceptions · aee06da6

Authored by Julian Anastasov on Jul 18, 2012

Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
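
The "restart lookup on detected seqlock change" behaviour in the commit above follows the usual sequence-counter pattern: a reader samples the counter, copies the data, and retries if the counter was odd or has moved. The sketch below shows that pattern in user space with C11 atomics. It is not the kernel's seqlock API; the struct and function names are hypothetical, a single writer is assumed, and default sequentially consistent atomics are used for simplicity.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical exception entry guarded by a sequence counter. */
struct seq_entry_sketch {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	_Atomic uint32_t pmtu;
	_Atomic uint32_t gw;
};

/* Writer side; assumes writers are serialised elsewhere. */
static void seq_entry_write(struct seq_entry_sketch *e,
			    uint32_t pmtu, uint32_t gw)
{
	atomic_fetch_add(&e->seq, 1);	/* counter becomes odd */
	atomic_store(&e->pmtu, pmtu);
	atomic_store(&e->gw, gw);
	atomic_fetch_add(&e->seq, 1);	/* counter becomes even again */
}

/* Reader side: retry until a consistent snapshot has been copied. */
static void seq_entry_read(struct seq_entry_sketch *e,
			   uint32_t *pmtu, uint32_t *gw)
{
	unsigned int start;

	do {
		start = atomic_load(&e->seq);
		*pmtu = atomic_load(&e->pmtu);
		*gw = atomic_load(&e->gw);
	} while ((start & 1u) || atomic_load(&e->seq) != start);
}
```

Readers never block the writer; they only pay for a retry when an update overlaps their read, which suits data that is read far more often than it is written.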

19 Jul, 2012 1 commit

ipv4: Fix time difference calculation in rt_bind_exception(). · 7fed84f6

Authored by David S. Miller on Jul 19, 2012

Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Jul, 2012 2 commits

ipv4: fix rcu splat · 5abf7f7e

Authored by Eric Dumazet on Jul 17, 2012

free_nh_exceptions() should use rcu_dereference_protected(..., 1)
since it's called after one RCU grace period.

Also add some const-ification in recent code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix nexthop exception hash computation. · d3a25c98

Authored by David S. Miller on Jul 17, 2012

Need to mask it with (FNHE_HASH_SIZE - 1).
Signed-off-by: David S. Miller <davem@davemloft.net>
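
Masking with (size - 1) keeps a hash value inside the table only when the size is a power of two, which holds for the 2048-entry table described in the "Add FIB nexthop exceptions" commit. The sketch below illustrates the idea in user space; the macro and function names are hypothetical, and the shift-and-xor mixing step is an illustrative assumption rather than a statement about the kernel's exact hash.

```c
#include <stdint.h>

/* Table size taken from the commit message; must be a power of two. */
#define FNHE_HASH_SIZE_SKETCH 2048u

/* Map a destination address to a bucket index in [0, 2048). */
static inline uint32_t fnhe_hash_sketch(uint32_t daddr)
{
	uint32_t hval = daddr;

	/* Illustrative mixing so that high bits influence the index. */
	hval ^= (hval >> 11) ^ (hval >> 22);

	/* Without this mask the value could index past the table. */
	return hval & (FNHE_HASH_SIZE_SKETCH - 1);
}
```

Dropping the final mask leaves a full 32-bit value, which is the kind of out-of-range index the one-line fix above guards against.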

17 Jul, 2012 2 commits

ipv4: Add FIB nexthop exceptions. · 4895c771

Authored by David S. Miller on Jul 17, 2012

In a regime where we have subnetted route entries, we need a way to
store persistent storage about destination specific learned values
such as redirects and PMTU values.

This is implemented here via nexthop exceptions.

The initial implementation is a 2048 entry hash table with reclaiming
starting at chain length 5.  A more sophisticated scheme can be
devised if that proves necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270

Authored by David S. Miller on Jul 17, 2012

This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Jul, 2012 1 commit

ipv4: Don't store a rule pointer in fib_result. · 85b91b03

Authored by David S. Miller on Jul 13, 2012

We only use it to fetch the rule's tclassid, so just store the
tclassid there instead.

This also decreases the size of fib_result by a full 8 bytes on
64-bit.  On 32-bit it's a wash.
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Jul, 2012 3 commits

ipv4: Fix warnings in ip_do_redirect() for some configurations. · 99ee038d

Authored by David S. Miller on Jul 12, 2012

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add dummy dst_ops->redirect method where needed. · b587ee3b

Authored by David S. Miller on Jul 12, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Kill ip_rt_redirect(). · 1f42539d

Authored by David S. Miller on Jul 11, 2012

No longer needed, as the protocol handlers now all properly
propagate the redirect back into the routing code.
Signed-off-by: David S. Miller <davem@davemloft.net>