Commits · db138908ccff404b9920f18f6244f4bff2368c04 · openeuler / Kernel

13 Mar, 2011 · 2 commits

ipv6: Convert to use flowi6 where applicable. · 4c9483b2

Committed by David S. Miller on Mar 12, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Put flowi_* prefix on AF independent members of struct flowi · 1d28f42c

Committed by David S. Miller on Mar 12, 2011

I intend to turn struct flowi into a union of AF specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.
Signed-off-by: David S. Miller <davem@davemloft.net>

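The layout described above (a common structure placed first in every AF-specific variant, the way struct sock_common is used) can be sketched in plain C. All `_sketch` names are hypothetical and simplified for illustration; they are not the definitions this series eventually introduced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified flow key; not the kernel's struct flowi. */
struct flowi_common_sketch {
	int      flowic_oif;    /* output interface index */
	uint32_t flowic_mark;   /* firewall mark */
	uint8_t  flowic_proto;  /* L4 protocol */
};

struct flowi4_sketch {
	struct flowi_common_sketch common;  /* must come first */
	uint32_t daddr, saddr;              /* IPv4 addresses */
};

struct flowi6_sketch {
	struct flowi_common_sketch common;  /* must come first */
	uint8_t daddr[16], saddr[16];       /* IPv6 addresses */
};

union flowi_sketch {
	struct flowi_common_sketch common;
	struct flowi4_sketch ip4;
	struct flowi6_sketch ip6;
};

/* AF-independent code reads the shared prefix without knowing which
 * variant was filled in, because every variant starts with it. */
static int flow_oif(const union flowi_sketch *fl)
{
	return fl->common.flowic_oif;
}
```

Because every variant starts with the same `struct flowi_common_sketch`, AF-independent helpers such as `flow_oif()` need no knowledge of the address family.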

10 3月, 2011 1 次提交

ipv6: Don't create clones of host routes. · 7343ff31

Committed by David S. Miller on Mar 09, 2011

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=29252
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30462

In commit d80bc0fd ("ipv6: Always
clone offlink routes.") we forced the kernel to always clone offlink
routes.

The reason we do that is to make sure we never bind an inetpeer to a
prefixed route.

The logic turned on here has existed in the tree for many years,
but was always off due to a protecting CPP define.  So perhaps
it's no surprise that there is a logic bug here.

The problem is that we cannot clone a route that is already a
host route (i.e. has DST_HOST set): if we do, an identical
entry already exists in the routing tree and therefore the
ip6_rt_ins() call is going to fail.

This sets off a series of failures and high CPU usage, because when
ip6_rt_ins() fails we loop retrying this operation a few times in
order to handle a race between two threads trying to clone and insert
the same host route at the same time.

Fix this by simply using the route as-is when DST_HOST is set.

Reported-by: slash@ac.auone-net.jp
Reported-by: Ernst Sjöstrand <ernstp@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

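A minimal sketch of the fixed decision, assuming a toy route type: `SKETCH_DST_HOST`, `struct route_sketch` and `route_for_lookup()` are hypothetical stand-ins, not the kernel's rt6_info handling. The point is only the early return for routes that already carry the host flag, so no duplicate entry is ever offered for insertion.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_DST_HOST 0x1  /* hypothetical stand-in for DST_HOST */

struct route_sketch {
	unsigned int flags;
	int          is_clone;
};

/* Returns the route to use for a lookup result. A host route is already
 * fully specified; cloning it would only yield an entry identical to one
 * already in the tree, which the insertion step would reject. */
static struct route_sketch *route_for_lookup(struct route_sketch *rt)
{
	struct route_sketch *clone;

	assert(rt != NULL);
	if (rt->flags & SKETCH_DST_HOST)
		return rt;                 /* use as-is: nothing to clone */

	clone = malloc(sizeof(*clone));
	if (!clone)
		return NULL;
	clone->flags = rt->flags | SKETCH_DST_HOST; /* clone is a host route */
	clone->is_clone = 1;
	return clone;
}
```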

04 Mar, 2011 · 1 commit

ipv6: Use ERR_CAST in addrconf_dst_alloc. · 29546a64

Committed by David S. Miller on Mar 03, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

02 Mar, 2011 · 2 commits

xfrm: Handle blackhole route creation via afinfo. · 2774c131

Committed by David S. Miller on Mar 01, 2011

That way we don't have to potentially do this in every xfrm_lookup()
caller.
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv6: Normalize arguments to ip6_dst_blackhole(). · 69ead7af

Committed by David S. Miller on Mar 01, 2011

Return a dst pointer which is potentially error encoded.

Don't pass original dst pointer by reference, pass a struct net
instead of a socket, and elide the flow argument since it is
unnecessary.
Signed-off-by: David S. Miller <davem@davemloft.net>

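"Error encoded" refers to the kernel convention of returning a negative errno inside a pointer value. The sketch below re-creates that idea in userspace under hypothetical `sketch_` names; the real helpers (ERR_PTR(), IS_ERR(), PTR_ERR(), ERR_CAST()) live in <linux/err.h> and are not copied here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* The top SKETCH_MAX_ERRNO addresses are never valid object pointers,
 * so a negative errno can travel through a pointer return value. */
#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* ERR_CAST() idea: hand an error pointer on under another pointer type. */
static inline void *sketch_err_cast(const void *ptr)
{
	return (void *)ptr;
}
```

A caller then performs a single `sketch_is_err()` check instead of receiving the original pointer by reference plus a separate status code.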

26 Feb, 2011 · 2 commits

ipv6: variable next is never used in this function · e9476e95

Committed by Hagen Paul Pfeifer on Feb 25, 2011

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

sysctl: ipv6: use correct net in ipv6_sysctl_rtcache_flush · c486da34

Committed by Lucian Adrian Grijincu on Feb 24, 2011

Before this patch issuing these commands:

  fd = open("/proc/sys/net/ipv6/route/flush")
  unshare(CLONE_NEWNET)
  write(fd, "stuff")

would flush the newly created net, not the original one.

The equivalent ipv4 code is correct (stores the net inside ->extra1).
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

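The difference between reading the namespace at write time and capturing it when the sysctl table is registered can be modelled in userspace. Everything here is hypothetical: the struct names, and the `current_net` global standing in for the writer's current namespace. Only the use of `->extra1` comes from the commit message.

```c
#include <assert.h>

/* A "namespace" here only counts how often it was flushed. */
struct net_sketch {
	int flush_count;
};

/* Models the writer's current namespace; unshare() would change it. */
static struct net_sketch *current_net;

/* Models a sysctl table entry; extra1 is set at registration time. */
struct ctl_entry_sketch {
	void *extra1;
};

/* Buggy: flushes whatever namespace the writer lives in right now. */
static void flush_buggy(struct ctl_entry_sketch *e)
{
	(void)e;
	current_net->flush_count++;
}

/* Fixed: flushes the namespace the file was registered for. */
static void flush_fixed(struct ctl_entry_sketch *e)
{
	struct net_sketch *net = e->extra1;

	net->flush_count++;
}
```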

19 Feb, 2011 · 1 commit

net: provide default_advmss() methods to blackhole dst_ops · 214f45c9

Committed by Eric Dumazet on Feb 18, 2011

Commit 0dbaee3b (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init() when
dst_metric_advmss() calls dst->ops->default_advmss() on a blackhole
dst whose dst_ops do not provide that method.
Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


18 Feb, 2011 · 1 commit

net: Add initial_ref arg to dst_alloc(). · 3c7bd1a1

Committed by David S. Miller on Feb 16, 2011

This allows avoiding multiple writes to the initial __refcnt.

The simplest cases that want an initial reference of "1"
in ipv4 and ipv6 have been converted; the rest have been left
alone and kept at the existing "0".
Signed-off-by: David S. Miller <davem@davemloft.net>

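The saving is easiest to see with a toy allocator. `struct dst_sketch` and `dst_alloc_sketch()` are hypothetical; the sketch only shows that passing the initial count lets the allocator write the reference counter once, instead of initialising it to 0 and then taking a reference with a second write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct dst_sketch {
	atomic_int refcnt;
};

/* The caller states the initial reference count up front, so the
 * counter is written exactly once. The older pattern was roughly
 * "allocate with 0, then hold", i.e. two writes to the same field. */
static struct dst_sketch *dst_alloc_sketch(int initial_ref)
{
	struct dst_sketch *dst = malloc(sizeof(*dst));

	if (!dst)
		return NULL;
	atomic_init(&dst->refcnt, initial_ref);
	return dst;
}
```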

11 Feb, 2011 · 1 commit

inet: Create a mechanism for upward inetpeer propagation into routes. · 6431cbc2

Committed by David S. Miller on Feb 07, 2011

If we didn't have a routing cache, we would not be able to properly
propagate certain kinds of dynamic path attributes, for example
PMTU information and redirects.

The reason is that if we didn't have a routing cache, then there would
be no way to lookup all of the active cached routes hanging off of
sockets, tunnels, IPSEC bundles, etc.

Consider the case where we created a cached route, but no inetpeer
entry existed and also we were not asked to pre-COW the route metrics
and therefore did not force the creation of a new inetpeer entry.

If we later get a PMTU message, or a redirect, and store this
information in a new inetpeer entry, there is no way to teach that
cached route about the newly existing inetpeer entry.

The facilities implemented here handle this problem.

First we create a generation ID.  When we create a cached route of any
kind, we remember the generation ID at the time of attachment.  Any
time we force-create an inetpeer entry in response to new path
information, we bump that generation ID.

The dst_ops->check() callback is where the knowledge of this event
is propagated.  If the global generation ID does not equal the one
stored in the cached route, and the cached route has not attached
to an inetpeer yet, we look it up and attach if one is found.  Now
that we've updated the cached route's information, we update the
route's generation ID too.

This clears the way for implementing PMTU and redirects directly in
the inetpeer cache.  There is absolutely no need to consult cached
route information in order to maintain this information.

At this point nothing bumps the inetpeer genids, that comes in the
later changes which handle PMTUs and redirects using inetpeers.
Signed-off-by: David S. Miller <davem@davemloft.net>

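The generation-ID check can be modelled with a single global counter. This is a simplified, hypothetical sketch: a one-slot `peer_table` stands in for the inetpeer lookup by destination, and `route_check()` stands in for the dst_ops->check() callback.

```c
#include <assert.h>
#include <stddef.h>

struct peer_sketch {
	unsigned int pmtu;
};

struct cached_route_sketch {
	struct peer_sketch *peer;  /* NULL until attached */
	unsigned int genid;        /* global genid seen at the last check */
};

static unsigned int peer_genid;         /* bumped on new path info */
static struct peer_sketch *peer_table;  /* one-slot "inetpeer cache" */

/* New path information forced creation of a peer entry: bump the genid. */
static void peer_learn(struct peer_sketch *p)
{
	peer_table = p;
	peer_genid++;
}

/* If a peer may have appeared since this route last looked, try to
 * attach to it, then record the current generation. */
static void route_check(struct cached_route_sketch *rt)
{
	if (rt->genid != peer_genid) {
		if (!rt->peer)
			rt->peer = peer_table;  /* may still be NULL */
		rt->genid = peer_genid;
	}
}
```

The check is cheap on the common path (one integer comparison), and the lookup only runs after some path information has changed.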

09 Feb, 2011 · 1 commit

net: Kill NETEVENT_PMTU_UPDATE. · 8d13a2a9

Committed by David S. Miller on Feb 08, 2011

Nobody actually does anything in response to the event,
so just kill it off.
Signed-off-by: David S. Miller <davem@davemloft.net>


01 Feb, 2011 · 1 commit

net: Add default_mtu() methods to blackhole dst_ops · ec831ea7

Committed by Roland Dreier on Jan 31, 2011

When an IPSEC SA is still being set up, __xfrm_lookup() will return
-EREMOTE and so ip_route_output_flow() will return a blackhole route.
This can happen in a sendmsg() call, and after d33e4553 ("net: Abstract
default MTU metric calculation behind an accessor.") this leads to a
crash in ip_append_data() because the blackhole dst_ops have no
default_mtu() method and so dst_mtu() calls a NULL pointer.

Fix this by adding default_mtu() methods (that simply return 0, matching
the old behavior) to the blackhole dst_ops.

The IPv4 part of this patch fixes a crash that I saw when using an IPSEC
VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it
looks to be needed as well.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

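The crash pattern and the fix can be shown with a trimmed-down ops table. The `_sketch` types and functions are hypothetical; `dst_mtu_sketch()` only mirrors the described behaviour of dst_mtu() (fall back to the ops method when the metric is unset), and the blackhole method returns 0 as the commit message states.

```c
#include <assert.h>
#include <stddef.h>

struct dst_sketch;

/* Hypothetical, trimmed-down ops table in the style of struct dst_ops. */
struct dst_ops_sketch {
	unsigned int (*default_mtu)(const struct dst_sketch *dst);
};

struct dst_sketch {
	const struct dst_ops_sketch *ops;
	unsigned int mtu_metric;  /* 0 means "unset, ask the ops" */
};

/* The accessor calls the method unconditionally when the metric is 0,
 * so any ops table reachable here must provide it; a NULL entry would
 * be a NULL function-pointer call. */
static unsigned int dst_mtu_sketch(const struct dst_sketch *dst)
{
	unsigned int mtu = dst->mtu_metric;

	if (!mtu)
		mtu = dst->ops->default_mtu(dst);
	return mtu;
}

/* The fix: give the blackhole ops a method that returns 0. */
static unsigned int blackhole_default_mtu(const struct dst_sketch *dst)
{
	(void)dst;
	return 0;
}

static const struct dst_ops_sketch blackhole_ops = {
	.default_mtu = blackhole_default_mtu,
};
```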

28 Jan, 2011 · 2 commits

net: Store ipv4/ipv6 COW'd metrics in inetpeer cache. · 06582540

Committed by David S. Miller on Jan 27, 2011

Please note that the IPSEC dst entry metrics keep using
the generic metrics COW'ing mechanism using kmalloc/kfree.

This gives the IPSEC routes an opportunity to use metrics
which are unique to their encapsulated paths.
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv6: Remove route peer binding assertions. · 8f2771f2

Committed by David S. Miller on Jan 27, 2011

They are bogus.  The basic idea is that I wanted to make sure
that prefixed routes never bind to peers.

The test I used was whether RTF_CACHE was set.

But first of all, the RTF_CACHE flag is set at different spots
depending upon which ip6_rt_copy() caller you're talking about.

I've validated all of the code paths, and even in the future
where we bind peers more aggressively (for route metric COW'ing)
we never bind to prefixed routes, only fully specified ones.
This even applies when addrconf or icmp6 routes are allocated.
Signed-off-by: David S. Miller <davem@davemloft.net>


27 Jan, 2011 · 1 commit

net: Implement read-only protection and COW'ing of metrics. · 62fa8a84

Committed by David S. Miller on Jan 26, 2011

Routing metrics are now copy-on-write.

Initially a route entry points its metrics at a read-only location.
If a routing table entry exists, it will point there.  Else it will
point at the all-zero metric place-holder called 'dst_default_metrics'.

The writeability state of the metrics is stored in the low bits of the
metrics pointer; we have two bits left to spare if we want to store
more states.

For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing.  Very likely
this "somewhere else" will be the inetpeer cache.

Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.

But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads.  In those
cases the read-only metric copies stay in place and never get written
to.

TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit.  But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.

Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.

Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.

The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline.  This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.
Signed-off-by: David S. Miller <davem@davemloft.net>

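Storing state in the low bits of an aligned pointer, plus copy-on-write on the first update, can be sketched as follows. The names, the metric count and the two-bit flag mask are assumptions for illustration; the kernel's own helpers and constants are not reproduced, and malloc() stands in for the kmalloc-based COW of this initial implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_RTAX_MAX       16                /* hypothetical metric count */
#define SKETCH_METRICS_RDONLY ((uintptr_t)0x1)  /* low bit: shared, read-only */
#define SKETCH_METRICS_FLAGS  ((uintptr_t)0x3)  /* low bits free on u32 arrays */

/* All-zero, read-only fallback, like 'dst_default_metrics'. */
static const uint32_t default_metrics_sketch[SKETCH_RTAX_MAX];

struct dst_sketch {
	uintptr_t metrics;  /* array pointer with state in the low bits */
};

static uint32_t *metrics_ptr(const struct dst_sketch *dst)
{
	return (uint32_t *)(dst->metrics & ~SKETCH_METRICS_FLAGS);
}

static int metrics_read_only(const struct dst_sketch *dst)
{
	return (dst->metrics & SKETCH_METRICS_RDONLY) != 0;
}

/* New routes point at shared metrics (a table entry or the default). */
static void dst_init_metrics_sketch(struct dst_sketch *dst, const uint32_t *src)
{
	dst->metrics = (uintptr_t)src | SKETCH_METRICS_RDONLY;
}

static uint32_t dst_metric_get_sketch(const struct dst_sketch *dst, int idx)
{
	assert(idx >= 0 && idx < SKETCH_RTAX_MAX);
	return metrics_ptr(dst)[idx];
}

/* The first write copies the shared array; returns -1 if the copy fails,
 * mirroring the note that metric updates may transiently fail. */
static int dst_metric_set_sketch(struct dst_sketch *dst, int idx, uint32_t val)
{
	assert(idx >= 0 && idx < SKETCH_RTAX_MAX);
	if (metrics_read_only(dst)) {
		uint32_t *copy = malloc(SKETCH_RTAX_MAX * sizeof(*copy));

		if (!copy)
			return -1;
		memcpy(copy, metrics_ptr(dst), SKETCH_RTAX_MAX * sizeof(*copy));
		dst->metrics = (uintptr_t)copy;  /* flag bit clear: writable */
	}
	metrics_ptr(dst)[idx] = val;
	return 0;
}
```

Routes that are only ever read never leave the shared array, which is where the memory and cache-locality gain described above comes from.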

25 Jan, 2011 · 1 commit

ipv6: Always clone offlink routes. · d80bc0fd

Committed by David S. Miller on Jan 24, 2011

For offlink routes, do not distinguish between route creation
triggered by PMTU and creation triggered by a route lookup; always
clone them.
Reported-by: PK <runningdoglackey@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


19 Dec, 2010 · 1 commit

ipv6: fib6_ifdown cleanup · bc3ef660

Committed by stephen hemminger on Dec 16, 2010

Remove (unnecessary) casts to make code cleaner.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


17 Dec, 2010 · 1 commit

ipv6: delete expired route in ip6_pmtu_deliver · d3052b55

Committed by Andrey Vagin on Dec 11, 2010

The first big packets sent to a "low-MTU" client correctly
trigger the creation of a temporary route containing the reduced MTU.

But after the temporary route has expired, a new ICMPv6 "packet too big"
message is sent, and rt6_pmtu_discovery finds the previous EXPIRED route,
checks that its MTU isn't bigger than the one in the ICMP packet, and does
nothing until the temporary route is deleted by gc.

I ran a simple experiment:
while :; do
    time ( dd if=/dev/zero bs=10K count=1 | ssh hostname dd of=/dev/null ) || break;
done

The "time" reports real 0m0.197s if the temporary route hasn't expired, but
it reports real 0m52.837s (!!!!) immediately after the temporary route has
expired.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

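A minimal model of the behaviour change, assuming a single cached route and plain integer timestamps (the kernel compares jiffies with time_after_eq()-style helpers so wraparound is handled). `pmtu_update_sketch()` is hypothetical and does not reproduce rt6_pmtu_discovery(); it only shows why an expired entry must be dropped before the "MTU isn't smaller, do nothing" check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the cached PMTU route found by the lookup. */
struct pmtu_route_sketch {
	bool present;
	unsigned int mtu;
	unsigned long expires;  /* absolute time; 0 = never */
};

static bool route_expired(const struct pmtu_route_sketch *rt, unsigned long now)
{
	return rt->expires && now >= rt->expires;  /* no wraparound handling */
}

/* Handle a "packet too big" report carrying new_mtu. */
static void pmtu_update_sketch(struct pmtu_route_sketch *rt,
			       unsigned int new_mtu, unsigned long now,
			       unsigned long timeout)
{
	/* The fix: an expired entry is stale, so drop it instead of letting
	 * it suppress the update until garbage collection runs. */
	if (rt->present && route_expired(rt, now))
		rt->present = false;

	if (rt->present && new_mtu >= rt->mtu)
		return;  /* live entry already has a small enough MTU */

	rt->present = true;
	rt->mtu = new_mtu;
	rt->expires = now + timeout;
}
```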

15 Dec, 2010 · 1 commit

net: Abstract default MTU metric calculation behind an accessor. · d33e4553

Committed by David S. Miller on Dec 14, 2010

Like RTAX_ADVMSS, make the default calculation go through a dst_ops
method rather than caching the computation in the routing cache
entries.

Now dst metrics are pretty much left as-is when new entries are
created, thus optimizing metric sharing becomes a real possibility.
Signed-off-by: David S. Miller <davem@davemloft.net>


14 Dec, 2010 · 1 commit

net: Abstract default ADVMSS behind an accessor. · 0dbaee3b

Committed by David S. Miller on Dec 13, 2010

Make all RTAX_ADVMSS metric accesses go through a new helper function,
dst_metric_advmss().

Leave the actual default metric as "zero" in the real metric slot,
and compute the actual default value dynamically via a new dst_ops
AF specific callback.

For stacked IPSEC routes, we use the advmss of the path which
preserves existing behavior.

Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates
advmss on pmtu updates.  This inconsistency in advmss handling
results in more raw metric accesses than I wish we ended up with.
Signed-off-by: David S. Miller <davem@davemloft.net>

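The accessor-plus-callback arrangement can be sketched as below. All names are hypothetical. The IPv6 default follows the usual computation (path MTU minus a 40-byte IPv6 header and a 20-byte TCP header, clamped to a configured minimum); treat the exact clamping as an assumption rather than a copy of the kernel's IPv6 callback.

```c
#include <assert.h>

#define SKETCH_IPV6_HDR_LEN 40
#define SKETCH_TCP_HDR_LEN  20
#define SKETCH_IPV6_MAXPLEN 65535

struct dst_sketch;

struct dst_ops_sketch {
	unsigned int (*default_advmss)(const struct dst_sketch *dst);
};

struct dst_sketch {
	const struct dst_ops_sketch *ops;
	unsigned int mtu;
	unsigned int advmss_metric;  /* stays 0 unless explicitly configured */
	unsigned int min_advmss;     /* stands in for the min_adv_mss sysctl */
};

/* Accessor in the spirit of dst_metric_advmss(): the raw slot wins,
 * otherwise the AF-specific callback computes the default on demand. */
static unsigned int dst_metric_advmss_sketch(const struct dst_sketch *dst)
{
	unsigned int advmss = dst->advmss_metric;

	if (!advmss)
		advmss = dst->ops->default_advmss(dst);
	return advmss;
}

static unsigned int ipv6_default_advmss_sketch(const struct dst_sketch *dst)
{
	unsigned int mss = dst->mtu - SKETCH_IPV6_HDR_LEN - SKETCH_TCP_HDR_LEN;

	if (mss < dst->min_advmss)
		mss = dst->min_advmss;
	if (mss > SKETCH_IPV6_MAXPLEN - SKETCH_TCP_HDR_LEN)
		mss = SKETCH_IPV6_MAXPLEN;  /* "any MSS, rely on PMTU discovery" */
	return mss;
}

static const struct dst_ops_sketch ipv6_ops_sketch = {
	.default_advmss = ipv6_default_advmss_sketch,
};
```

Because the default is computed on each call rather than stored in the metric slot, the slot stays zero and the metrics remain shareable between routes.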

13 Dec, 2010 · 3 commits

ipv6: Demark default hoplimit as zero. · a02e4b7d

Committed by David S. Miller on Dec 12, 2010

This is for consistency with ipv4.  Using "-1" makes
no sense.

It was made this way a long time ago merely to be consistent
with how the ipv6 socket hoplimit "default" is stored.
Signed-off-by: David S. Miller <davem@davemloft.net>

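With zero as the "unset" marker, a helper can fall back to a configured default without special-casing -1. The sketch is hypothetical: `dst_hoplimit_sketch()` only mirrors the idea behind ip6_dst_hoplimit(), and the per-device and global defaults (including the value 64) are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-device / per-namespace configuration. */
struct idev_cnf_sketch {
	int hop_limit;
};

struct dst_hl_sketch {
	int hoplimit_metric;                 /* 0 = "use the default" */
	const struct idev_cnf_sketch *idev;  /* may be NULL */
};

static const struct idev_cnf_sketch all_cnf_sketch = { .hop_limit = 64 };

/* An explicit metric wins; otherwise use the device setting, and
 * failing that the global one. */
static int dst_hoplimit_sketch(const struct dst_hl_sketch *dst)
{
	int hoplimit = dst->hoplimit_metric;

	if (hoplimit == 0)
		hoplimit = dst->idev ? dst->idev->hop_limit
				     : all_cnf_sketch.hop_limit;
	return hoplimit;
}
```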

net: Abstract RTAX_HOPLIMIT metric accesses behind helper. · 5170ae82

Committed by David S. Miller on Dec 12, 2010

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Use ip6_dst_hoplimit() instead of direct dst_metric() calls. · abbf46ae

Committed by David S. Miller on Dec 12, 2010

Signed-off-by: David S. Miller <davem@davemloft.net>

10 Dec, 2010 · 1 commit

net: Abstract away all dst_entry metrics accesses. · defb3519

Committed by David S. Miller on Dec 08, 2010

Use helper functions to hide all direct accesses, especially writes,
to dst_entry metrics values.

This will allow us to:

1) More easily change how the metrics are stored.

2) Implement COW for metrics.

In particular this will help us put metrics into the inetpeer
cache if that is what we end up doing.  We can make the _metrics
member a pointer instead of an array, initially have it point
at the read-only metrics in the FIB, and then on the first set
grab an inetpeer entry and point the _metrics member there.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


01 Dec, 2010 · 1 commit

ipv6: Add infrastructure to bind inet_peer objects to routes. · b3419363

Committed by David S. Miller on Nov 30, 2010

They are only allowed on cached ipv6 routes.
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Nov, 2010 · 1 commit

ipv6: kill two unused macro definition · d3c15cab

Committed by Shan Wei on Nov 24, 2010

1. IPV6_TLV_TEL_DST_SIZE
This has not been used since it was created several years ago.

2. RT6_INFO_LEN
Commit 33120b30 killed all references to RT6_INFO_LEN, but this definition remained.

commit 33120b30
Author: Alexey Dobriyan <adobriyan@sw.ru>
Date:   Tue Nov 6 05:27:11 2007 -0800

    [IPV6]: Convert /proc/net/ipv6_route to seq_file interface
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


18 Nov, 2010 · 1 commit

net: use the macros defined for the members of flowi · 5811662b

Committed by Changli Gao on Nov 12, 2010

Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


13 Nov, 2010 · 1 commit

ipv6: Warn users if maximum number of routes is reached. · 40385653

Committed by Ben Greear on Nov 08, 2010

This gives users at least some clue as to what the problem
might be and how to go about fixing it.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


04 Nov, 2010 · 1 commit

net dst: fix percpu_counter list corruption and poison overwritten · 41bb78b4

Committed by Xiaotian Feng on Nov 02, 2010

There are some percpu_counter list corruption and poison overwritten warnings
in recent kernels, which are caused by fc66f95c.

Commit fc66f95c switched to a percpu_counter: ip6_route_net_init() initializes
the percpu_counter for dst entries, but the percpu_counter is never destroyed
in ip6_route_net_exit(). So if the related data is freed by the kernel, the freed
percpu_counter is still on the list, and inserting or removing another
percpu_counter corrupts the list. Also, if the insert/remove operation modifies
the ->prev/->next pointers of the freed object, the poison is overwritten.

With the following patch, the percpu_counter list corruption and poison overwritten
warnings disappeared.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

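Why a missing destroy call corrupts the list can be seen with a toy intrusive list. The assumption here is that every initialised counter is linked on a global list, as percpu_counters are when CPU hotplug support is built in. All names are hypothetical, and the exit function shows the call the fix adds.

```c
#include <assert.h>
#include <stddef.h>

struct counter_sketch {
	long count;
	struct counter_sketch *prev, *next;
};

/* Global list head, initially pointing at itself (empty list). */
static struct counter_sketch counters_head = {
	0, &counters_head, &counters_head
};

static void counter_init_sketch(struct counter_sketch *c)
{
	c->count = 0;
	c->next = counters_head.next;
	c->prev = &counters_head;
	counters_head.next->prev = c;
	counters_head.next = c;
}

/* Must run before the memory holding c is freed. */
static void counter_destroy_sketch(struct counter_sketch *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->next = c->prev = NULL;
}

/* Pernet init/exit pair modelled loosely on the route code. */
static void route_net_init_sketch(struct counter_sketch *entries)
{
	counter_init_sketch(entries);
}

static void route_net_exit_sketch(struct counter_sketch *entries)
{
	counter_destroy_sketch(entries);  /* the call the fix adds */
}
```

Without the destroy call, freeing the counter would leave `counters_head` pointing into freed memory, and the next `counter_init_sketch()` would write `->prev` there: the same corruption and poison-overwrite pattern described above.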

12 Oct, 2010 · 1 commit

net dst: use a percpu_counter to track entries · fc66f95c

Committed by Eric Dumazet on Oct 08, 2010

struct dst_ops tracks the number of allocated dst entries in an atomic_t field,
which is subject to high cache line contention in stress workloads.

Switch to a percpu_counter, to reduce the number of times we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read-only fields.

Stress test:

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)

Before:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

After:

real	0m45.570s
user	0m15.525s
sys	9m56.669s

With a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)

real	0m41.841s
user	0m15.261s
sys	8m45.949s
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

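The idea behind switching to a percpu_counter can be modelled single-threaded with an explicit CPU index. This is a hypothetical sketch: the real implementation uses per-CPU variables, preemption control and a spinlock around the fold, none of which are reproduced. Each CPU accumulates a local delta and only touches the shared count once the delta reaches the batch size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4
#define SKETCH_BATCH   32

struct pcpu_counter_sketch {
	int64_t count;                  /* shared, approximate total */
	int32_t local[SKETCH_NR_CPUS];  /* per-CPU pending deltas */
	unsigned long central_writes;   /* times 'count' was dirtied */
};

static void pcpu_add_sketch(struct pcpu_counter_sketch *c, int cpu, int32_t amount)
{
	int32_t v = c->local[cpu] + amount;

	if (v >= SKETCH_BATCH || v <= -SKETCH_BATCH) {
		c->count += v;   /* fold the batch into the shared count */
		c->local[cpu] = 0;
		c->central_writes++;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap read: may lag the true value by up to NR_CPUS * BATCH. */
static int64_t pcpu_read_sketch(const struct pcpu_counter_sketch *c)
{
	return c->count;
}

/* Exact read: adds every CPU's pending delta. */
static int64_t pcpu_sum_sketch(const struct pcpu_counter_sketch *c)
{
	int64_t sum = c->count;

	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += c->local[i];
	return sum;
}
```

For 1000 increments spread evenly over 4 CPUs, the shared count is written 28 times instead of 1000, which is the kind of reduction in central cache-line dirtying the commit is after.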

04 Oct, 2010 · 1 commit

net: Fix IPv6 PMTU disc. w/ asymmetric routes · ae878ae2

Committed by Maciej Żenczykowski on Oct 03, 2010

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


29 Sep, 2010 · 1 commit

ipv6: Implement Any-IP support for IPv6. · ab79ad14

Committed by Maciej Żenczykowski on Sep 27, 2010

AnyIP is the capability to receive packets and establish incoming
connections on IPs we have not explicitly configured on the machine.

An example use case is to configure a machine to accept all incoming
traffic on eth0, and leave the policy of whether traffic for a given IP
should be delivered to the machine up to the load balancer.

It can be set up as follows:
  ip -6 rule add from all iif eth0 lookup 200
  ip -6 route add local default dev lo table 200
(in this case for all IPv6 addresses)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


28 Sep, 2010 · 1 commit

ipv6: add IPv6 to neighbour table overflow warning · 7e1b33e5

Committed by Ulrich Weber on Sep 27, 2010

IPv4 and IPv6 have separate neighbour tables, so
the warning messages should be distinguishable.

[ Add a suitable message prefix on the ipv4 side as well -DaveM ]
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


24 Sep, 2010 · 1 commit

net: return operator cleanup · a02cec21

Committed by Eric Dumazet on Sep 22, 2010

Change "return (EXPR);" to "return EXPR;"

return is not a function; parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


15 Aug, 2010 · 1 commit

ipv6: remove sysctl jiffies conversion on gc_elasticity and min_adv_mss · f3d3f616

Committed by Min Zhang on Aug 14, 2010

sysctl outputs the ipv6 gc_elasticity and min_adv_mss values divided by
HZ. However, they are not in units of jiffies: ip6_rt_min_advmss
refers to a packet size, and ip6_rt_gc_elasticity is used as a shift count,
as in expire>>ip6_rt_gc_elasticity. So replace the jiffies conversion
handler with the regular handler for them.

This affects scripts that currently work by assuming the divide by HZ;
they will yield different results with this patch in place.
Signed-off-by: Min Zhang <mzhang@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

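The unit mismatch is easy to see numerically. `SKETCH_HZ` is a hypothetical tick rate (HZ depends on the kernel configuration), and the two read functions only model what a jiffies-converting handler and a plain integer handler report for the same stored value; the write side is not modelled.

```c
#include <assert.h>

#define SKETCH_HZ 1000  /* hypothetical; the real HZ is a build option */

/* Jiffies-converting read: assumes the value is in jiffies and reports
 * seconds, so a packet size like 1220 shows up as 1. */
static int read_as_seconds(int stored)
{
	return stored / SKETCH_HZ;
}

/* Plain integer read: reports the stored value unchanged. */
static int read_as_integer(int stored)
{
	return stored;
}

/* gc_elasticity is a shift count applied to an interval, so it has no
 * time unit of its own. */
static unsigned long shrink_expire(unsigned long expire, int elasticity)
{
	return expire >> elasticity;
}
```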

15 Jun, 2010 · 1 commit

ipv6: RCU changes in ipv6_get_mtu() and ip6_dst_hoplimit() · c68f24cc

Committed by Eric Dumazet on Jun 14, 2010

Use RCU to avoid atomic ops on idev refcnt in ipv6_get_mtu()
and ip6_dst_hoplimit().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


11 Jun, 2010 · 1 commit

net-next: remove useless union keyword · d8d1f30b

Committed by Changli Gao on Jun 10, 2010

Remove the useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in the union, the union keyword isn't useful.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


29 May, 2010 · 1 commit

IPv6: fix Mobile IPv6 regression · 6057fd78

Committed by Brian Haley on May 28, 2010

Commit f4f914b5 (net: ipv6 bind to device issue) caused
a regression with Mobile IPv6 when it changed the meaning
of fl->oif to become a strict requirement of the route
lookup.  Instead, only force strict mode when
sk->sk_bound_dev_if is set on the calling socket, getting
the intended behavior and fixing the regression.
Tested-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

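The behavioural difference can be sketched as a single predicate plus a toy lookup. Everything below is hypothetical and heavily simplified: the real route lookup may have further conditions that require a strict interface match, and Mobile IPv6 specifics are not modelled. It only shows that an unbound socket keeps a fallback while a device-bound socket does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sock_sketch {
	int bound_dev_if;  /* SO_BINDTODEVICE ifindex, 0 = unbound */
};

struct route_sketch {
	int ifindex;       /* device the route goes out of */
};

/* After the fix: the requested oif is a hard requirement only when the
 * calling socket is bound to a device. */
static bool oif_is_strict(const struct sock_sketch *sk)
{
	return sk != NULL && sk->bound_dev_if != 0;
}

/* Toy lookup: prefer a route via oif; fall back only in non-strict mode. */
static const struct route_sketch *
lookup_sketch(const struct route_sketch *routes, size_t n, int oif, bool strict)
{
	for (size_t i = 0; i < n; i++)
		if (routes[i].ifindex == oif)
			return &routes[i];
	if (strict || n == 0)
		return NULL;
	return &routes[0];
}
```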

18 May, 2010 · 1 commit

net: Remove unnecessary returns from void function()s · 3fa21e07

Committed by Joe Perches on May 17, 2010

This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
  xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


openeuler / Kernel: synced successfully 12 months ago
