提交 · 9a9c56cb34e65000d1f0a4b7553399bfcf7c5a52 · openeuler / raspberrypi-kernel

22 3月, 2013 1 次提交

rtnetlink: Remove passing of attributes into rtnl_doit functions · 661d2967

由 Thomas Graf 提交于 3月 21, 2013

With decnet converted, we can finally get rid of rta_buf and its
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

661d2967

20 2月, 2013 1 次提交

net: ipv4: fix waring -Wunused-variable · 082c7ca4

由 Gao feng 提交于 2月 19, 2013

the vars ip_rt_gc_timeout is used only when
CONFIG_SYSCTL is selected.

move these vars into CONFIG_SYSCTL.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

082c7ca4

19 2月, 2013 1 次提交

net: proc: change proc_net_fops_create to proc_create · d4beaa66

由 Gao feng 提交于 2月 18, 2013

Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.

It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4beaa66

23 1月, 2013 1 次提交

ipv4: Fix route refcount on pmtu discovery · b44108db

由 Steffen Klassert 提交于 1月 22, 2013

git commit 9cb3a50c (ipv4: Invalidate the socket cached route on
pmtu events if possible) introduced a refcount problem. We don't
get a refcount on the route if we get it from__sk_dst_get(), but
we need one if we want to reuse this route because __sk_dst_set()
releases the refcount of the old route. This patch adds proper
refcount handling for that case. We introduce a 'new' flag to
indicate that we are going to use a new route and we release the
old route only if we replace it by a new one.
Reported-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b44108db

22 1月, 2013 1 次提交

ipv4: Invalidate the socket cached route on pmtu events if possible · 9cb3a50c

由 Steffen Klassert 提交于 1月 21, 2013

The route lookup in ipv4_sk_update_pmtu() might return a route
different from the route we cached at the socket. This is because
standart routes are per cpu, so each cpu has it's own struct rtable.
This means that we do not invalidate the socket cached route if the
NET_RX_SOFTIRQ is not served by the same cpu that the sending socket
uses. As a result, the cached route reused until we disconnect.

With this patch we invalidate the socket cached route if possible.
If the socket is owened by the user, we can't update the cached
route directly. A followup patch will implement socket release
callback functions for datagram sockets to handle this case.
Reported-by: NYurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9cb3a50c

17 1月, 2013 2 次提交

ipv4: Don't update the pmtu on mtu locked routes · fa1e492a

由 Steffen Klassert 提交于 1月 16, 2013

Routes with locked mtu should not use learned pmtu informations,
so do not update the pmtu on these routes.
Reported-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa1e492a

ipv4: Remove output route check in ipv4_mtu · 38d523e2

由 Steffen Klassert 提交于 1月 16, 2013

The output route check was introduced with git commit 261663b0
(ipv4: Don't use the cached pmtu informations for input routes)
during times when we cached the pmtu informations on the
inetpeer. Now the pmtu informations are back in the routes,
so this check is obsolete. It also had some unwanted side effects,
as reported by Timo Teras and Lukas Tribus.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Acked-by: NTimo Teräs <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38d523e2

08 12月, 2012 1 次提交

ipv4/route/rtnl: get mcast attributes when dst is multicast · 8caaf7b6

由 Nicolas Dichtel 提交于 12月 04, 2012

Commit f1ce3062 (ipv4: Remove 'rt_dst' from 'struct rtable') removes the
call to ipmr_get_route(), which will get multicast parameters of the route.

I revert the part of the patch that remove this call. I think the goal was only
to get rid of rt_dst field.

The patch is only compiled-tested. My first idea was to remove ipmr_get_route()
because rt_fill_info() was the only user, but it seems the previous patch cleans
the code a bit too much ;-)
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8caaf7b6

23 11月, 2012 1 次提交

ipv4: do not cache looped multicasts · 63617421

由 Julian Anastasov 提交于 11月 22, 2012

	Starting from 3.6 we cache output routes for
multicasts only when using route to 224/4. For local receivers
we can set RTCF_LOCAL flag depending on the membership but
in such case we use maddr and saddr which are not caching
keys as before. Additionally, we can not use same place to
cache routes that differ in RTCF_LOCAL flag value.

	Fix it by caching only RTCF_MULTICAST entries
without RTCF_LOCAL (send-only, no loopback). As a side effect,
we avoid unneeded lookup for fnhe when not caching because
multicasts are not redirected and they do not learn PMTU.

	Thanks to Maxime Bizon for showing the caching
problems in __mkroute_output for 3.6 kernels: different
RTCF_LOCAL flag in cache can lead to wrong ip_mc_output or
ip_output call and the visible problem is that traffic can
not reach local receivers via loopback.
Reported-by: NMaxime Bizon <mbizon@freebox.fr>
Tested-by: NMaxime Bizon <mbizon@freebox.fr>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63617421

19 11月, 2012 1 次提交

net: Don't export sysctls to unprivileged users · 464dc801

由 Eric W. Biederman 提交于 11月 16, 2012

In preparation for supporting the creation of network namespaces
by unprivileged users, modify all of the per net sysctl exports
and refuse to allow them to unprivileged users.

This makes it safe for unprivileged users in general to access
per net sysctls, and allows sysctls to be exported to unprivileged
users on an individual basis as they are deemed safe.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

464dc801

13 11月, 2012 1 次提交

xfrm: Fix the gc threshold value for ipv4 · 703fb94e

由 Steffen Klassert 提交于 11月 13, 2012

The xfrm gc threshold value depends on ip_rt_max_size. This
value was set to INT_MAX with the routing cache removal patch,
so we start doing garbage collecting when we have INT_MAX/2
IPsec routes cached. Fix this by going back to the static
threshold of 1024 routes.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

703fb94e

19 10月, 2012 1 次提交

ipv4: Fix flushing of cached routing informations · 13d82bf5

由 Steffen Klassert 提交于 10月 17, 2012

Currently we can not flush cached pmtu/redirect informations via
the ipv4_sysctl_rtcache_flush sysctl. We need to check the rt_genid
of the old route and reset the nh exeption if the old route is
expired when we bind a new route to a nh exeption.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13d82bf5

11 10月, 2012 1 次提交

ipv4: fix route mark sparse warning · 68aaed54

由 stephen hemminger 提交于 10月 10, 2012

Sparse complains about RTA_MARK which is should be host order according
to include file and usage in iproute.

net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types)
net/ipv4/route.c:2223:46: expected restricted __be32 [usertype] value
net/ipv4/route.c:2223:46: got unsigned int [unsigned] [usertype] flowic_mark
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68aaed54

09 10月, 2012 7 次提交

ipv4: Add FLOWI_FLAG_KNOWN_NH · c92b9655

由 Julian Anastasov 提交于 10月 08, 2012

Add flag to request that output route should be
returned with known rt_gateway, in case we want to use
it as nexthop for neighbour resolving.

	The returned route can be cached as follows:

- in NH exception: because the cached routes are not shared
	with other destinations
- in FIB NH: when using gateway because all destinations for
	NH share same gateway

	As last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c92b9655

ipv4: introduce rt_uses_gateway · 155e8336

由 Julian Anastasov 提交于 10月 08, 2012

Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. By this way we force the neighbour
resolving to work with the routed destination but we
can use different address in the packet, feature needed
for IPVS-DR where original packet for virtual IP is routed
via route to real IP.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

155e8336

ipv4: make sure nh_pcpu_rth_output is always allocated · f8a17175

由 Julian Anastasov 提交于 10月 08, 2012

Avoid checking nh_pcpu_rth_output in fast path,
abort fib_info creation on alloc_percpu failure.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8a17175

ipv4: fix sending of redirects · e81da0e1

由 Julian Anastasov 提交于 10月 08, 2012

After "Cache input routes in fib_info nexthops" (commit
d2d68ba9) and "Elide fib_validate_source() completely when possible"
(commit 7a9bc9b8) we can not send ICMP redirects. It seems we
should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
the same fib_info can be used for traffic that is not redirected,
eg. from other input devices or from sources that are not in same subnet.

	As result, we have to disable the caching of RTCF_DOREDIRECT
flag and to force source validation for the case when forwarding
traffic to the input device. If traffic comes from directly connected
source we allow redirection as it was done before both changes.

	Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
is disabled, this can avoid source address validation and to
help caching the routes.

	After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d) we should make sure our ICMP_REDIR_HOST messages
contain daddr instead of 0.0.0.0 when target is directly connected.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e81da0e1

ipv4: Don't report stale pmtu values to userspace · ee9a8f7a

由 Steffen Klassert 提交于 10月 08, 2012

We report cached pmtu values even if they are already expired.
Change this to not report these values after they are expired
and fix a race in the expire time calculation, as suggested by
Eric Dumazet.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee9a8f7a

ipv4: Don't create nh exeption when the device mtu is smaller than the reported pmtu · 7f92d334

由 Steffen Klassert 提交于 10月 07, 2012

When a local tool like tracepath tries to send packets bigger than
the device mtu, we create a nh exeption and set the pmtu to device
mtu. The device mtu does not expire, so check if the device mtu is
smaller than the reported pmtu and don't crerate a nh exeption in
that case.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f92d334

ipv4: Always invalidate or update the route on pmtu events · d851c12b

由 Steffen Klassert 提交于 10月 07, 2012

Some protocols, like IPsec still cache routes. So we need to invalidate
the old route on pmtu events to avoid the reuse of stale routes.
We also need to update the mtu and expire time of the route if we already
use a nh exception route, otherwise we ignore newly learned pmtu values
after the first expiration.

With this patch we always invalidate or update the route on pmtu events.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d851c12b

19 9月, 2012 3 次提交

netns: move net->ipv4.rt_genid to net->rt_genid · b42664f8

由 Nicolas Dichtel 提交于 9月 10, 2012

This commit prepares the use of rt_genid by both IPv4 and IPv6.
Initialization is left in IPv4 part.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b42664f8

net: rt_cache_flush() cleanup · 2885da72

由 Eric Dumazet 提交于 9月 07, 2012

We dont use jhash anymore since route cache removal,
so we can get rid of get_random_bytes() calls for rt_genid
changes.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2885da72

ipv4/route: arg delay is useless in rt_cache_flush() · bafa6d9d

由 Nicolas Dichtel 提交于 9月 07, 2012

Since route cache deletion (89aef892), delay is no
more used. Remove it.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bafa6d9d

11 9月, 2012 1 次提交

netlink: Rename pid to portid to avoid confusion · 15e47304

由 Eric W. Biederman 提交于 9月 07, 2012

It is a frequent mistake to confuse the netlink port identifier with a
process identifier.  Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

15e47304

08 9月, 2012 2 次提交

net: rt_cache_flush() cleanup · ba8bd0ea

由 Eric Dumazet 提交于 9月 07, 2012

We dont use jhash anymore since route cache removal,
so we can get rid of get_random_bytes() calls for rt_genid
changes.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba8bd0ea

ipv4/route: arg delay is useless in rt_cache_flush() · 4ccfe6d4

由 Nicolas Dichtel 提交于 9月 07, 2012

Since route cache deletion (89aef892), delay is no
more used. Remove it.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ccfe6d4

01 9月, 2012 1 次提交

ipv4: Minor logic clean-up in ipv4_mtu · 98d75c37

由 Alexander Duyck 提交于 8月 27, 2012

In ipv4_mtu there is some logic where we are testing for a non-zero value
and a timer expiration, then setting the value to zero, and then testing if
the value is zero we set it to a value based on the dst. Instead of
bothering with the extra steps it is easier to just cleanup the logic so
that we set it to the dst based value if it is zero or if the timer has
expired.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>

98d75c37

31 8月, 2012 1 次提交

ipv4: must use rcu protection while calling fib_lookup · c5ae7d41

由 Eric Dumazet 提交于 8月 28, 2012

Following lockdep splat was reported by Pavel Roskin :

[ 1570.586223] ===============================
[ 1570.586225] [ INFO: suspicious RCU usage. ]
[ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
[ 1570.586229] -------------------------------
[ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
[ 1570.586233]
[ 1570.586233] other info that might help us debug this:
[ 1570.586233]
[ 1570.586236]
[ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
[ 1570.586238] 2 locks held by Chrome_IOThread/4467:
[ 1570.586240]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff814f2c0c>] release_sock+0x2c/0xa0
[ 1570.586253]  #1:  (fnhe_lock){+.-...}, at: [<ffffffff815302fc>] update_or_create_fnhe+0x2c/0x270
[ 1570.586260]
[ 1570.586260] stack backtrace:
[ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
[ 1570.586265] Call Trace:
[ 1570.586271]  [<ffffffff810976ed>] lockdep_rcu_suspicious+0xfd/0x130
[ 1570.586275]  [<ffffffff8153042c>] update_or_create_fnhe+0x15c/0x270
[ 1570.586278]  [<ffffffff815305b3>] __ip_rt_update_pmtu+0x73/0xb0
[ 1570.586282]  [<ffffffff81530619>] ip_rt_update_pmtu+0x29/0x90
[ 1570.586285]  [<ffffffff815411dc>] inet_csk_update_pmtu+0x2c/0x80
[ 1570.586290]  [<ffffffff81558d1e>] tcp_v4_mtu_reduced+0x2e/0xc0
[ 1570.586293]  [<ffffffff81553bc4>] tcp_release_cb+0xa4/0xb0
[ 1570.586296]  [<ffffffff814f2c35>] release_sock+0x55/0xa0
[ 1570.586300]  [<ffffffff815442ef>] tcp_sendmsg+0x4af/0xf50
[ 1570.586305]  [<ffffffff8156fc60>] inet_sendmsg+0x120/0x230
[ 1570.586308]  [<ffffffff8156fb40>] ? inet_sk_rebuild_header+0x40/0x40
[ 1570.586312]  [<ffffffff814f4bdd>] ? sock_update_classid+0xbd/0x3b0
[ 1570.586315]  [<ffffffff814f4c50>] ? sock_update_classid+0x130/0x3b0
[ 1570.586320]  [<ffffffff814ec435>] do_sock_write+0xc5/0xe0
[ 1570.586323]  [<ffffffff814ec4a3>] sock_aio_write+0x53/0x80
[ 1570.586328]  [<ffffffff8114bc83>] do_sync_write+0xa3/0xe0
[ 1570.586332]  [<ffffffff8114c5a5>] vfs_write+0x165/0x180
[ 1570.586335]  [<ffffffff8114c805>] sys_write+0x45/0x90
[ 1570.586340]  [<ffffffff815d2722>] system_call_fastpath+0x16/0x1b
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NPavel Roskin <proski@gnu.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5ae7d41

24 8月, 2012 1 次提交

ipv4: take rt_uncached_lock only if needed · 78df76a0

由 Eric Dumazet 提交于 8月 24, 2012

Multicast traffic allocates dst with DST_NOCACHE, but dst is
not inserted into rt_uncached_list.

This slowdown multicast workloads on SMP because rt_uncached_lock is
contended.

Change the test before taking the lock to actually check the dst
was inserted into rt_uncached_list.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

78df76a0

23 8月, 2012 1 次提交

ipv4: properly update pmtu · 9b04f350

由 Eric Dumazet 提交于 8月 21, 2012

Sylvain Munault reported following info :

 - TCP connection get "stuck" with data in send queue when doing
   "large" transfers ( like typing 'ps ax' on a ssh connection )
 - Only happens on path where the PMTU is lower than the MTU of
   the interface
 - Is not present right after boot, it only appears 10-20min after
   boot or so. (and that's inside the _same_ TCP connection, it works
   fine at first and then in the same ssh session, it'll get stuck)
 - Definitely seems related to fragments somehow since I see a router
   sending ICMP message saying fragmentation is needed.
 - Exact same setup works fine with kernel 3.5.1

Problem happens when the 10 minutes (ip_rt_mtu_expires) expiration
period is over.

ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
but dst_set_expires() does nothing because dst.expires is already set.

It seems we want to set the expires field to a new value, regardless
of prior one.

With help from Julian Anastasov.
Reported-by: NSylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
CC: Julian Anastasov <ja@ssi.bg>
Tested-by: NSylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b04f350

15 8月, 2012 1 次提交

ipv4: Cache local output routes · 7bd86cc2

由 Yan, Zheng 提交于 8月 12, 2012

Commit caacf05e causes big drop of UDP loop back performance.
The cause of the regression is that we do not cache the local output
routes. Each time we send a datagram from unconnected UDP socket,
the kernel allocates a dst_entry and adds it to the rt_uncached_list.
It creates lock contention on the rt_uncached_lock.
Reported-by: NAlex Shi <alex.shi@intel.com>
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7bd86cc2

10 8月, 2012 1 次提交

net: Loopback ifindex is constant now · 1fb9489b

由 Pavel Emelyanov 提交于 8月 08, 2012

As pointed out, there are places, that access net->loopback_dev->ifindex
and after ifindex generation is made per-net this value becomes constant
equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fb9489b

04 8月, 2012 1 次提交

ipv4: Introduce IN_DEV_NET_ROUTE_LOCALNET · 9eb43e76

由 Eric Dumazet 提交于 8月 03, 2012

performance profiles show a high cost in the IN_DEV_ROUTE_LOCALNET()
call done in ip_route_input_slow(), because of multiple dereferences,
even if cache lines are clean and available in cpu caches.

Since we already have the 'net' pointer, introduce
IN_DEV_NET_ROUTE_LOCALNET() macro avoiding two dereferences
(dev_net(in_dev->dev))

Also change the tests to use IN_DEV_NET_ROUTE_LOCALNET() only if saddr
or/and daddr are loopback addresse.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9eb43e76

02 8月, 2012 1 次提交

ipv4: route.c cleanup · e33cdac0

由 Eric Dumazet 提交于 8月 01, 2012

Remove unused includes after IP cache removal
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e33cdac0

01 8月, 2012 4 次提交

ipv4: Properly purge netdev references on uncached routes. · caacf05e

由 David S. Miller 提交于 7月 31, 2012

When a device is unregistered, we have to purge all of the
references to it that may exist in the entire system.

If a route is uncached, we currently have no way of accomplishing
this.

So create a global list that is scanned when a network device goes
down.  This mirrors the logic in net/core/dst.c's dst_ifdown().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

caacf05e

D
ipv4: Cache routes in nexthop exception entries. · c5038a83
由 David S. Miller 提交于 7月 31, 2012
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
c5038a83

ipv4: percpu nh_rth_output cache · d26b3a7c

由 Eric Dumazet 提交于 7月 31, 2012

Input path is mostly run under RCU and doesnt touch dst refcnt

But output path on forwarding or UDP workloads hits
badly dst refcount, and we have lot of false sharing, for example
in ipv4_mtu() when reading rt->rt_pmtu

Using a percpu cache for nh_rth_output gives a nice performance
increase at a small cost.

24 udpflood test on my 24 cpu machine (dummy0 output device)
(each process sends 1.000.000 udp frames, 24 processes are started)

before : 5.24 s
after : 2.06 s
For reference, time on linux-3.5 : 6.60 s
Signed-off-by: NEric Dumazet <edumazet@google.com>
Tested-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d26b3a7c

ipv4: Restore old dst_free() behavior. · 54764bb6

由 Eric Dumazet 提交于 7月 31, 2012

commit 404e0a8b (net: ipv4: fix RCU races on dst refcounts) tried
to solve a race but added a problem at device/fib dismantle time :

We really want to call dst_free() as soon as possible, even if sockets
still have dst in their cache.
dst_release() calls in free_fib_info_rcu() are not welcomed.

Root of the problem was that now we also cache output routes (in
nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
rt_free(), because output route lookups are done in process context.

Based on feedback and initial patch from David Miller (adding another
call_rcu_bh() call in fib, but it appears it was not the right fix)

I left the inet_sk_rx_dst_set() helper and added __rcu attributes
to nh_rth_output and nh_rth_input to better document what is going on in
this code.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54764bb6

31 7月, 2012 1 次提交

net: ipv4: fix RCU races on dst refcounts · 404e0a8b

由 Eric Dumazet 提交于 7月 29, 2012

commit c6cffba4 (ipv4: Fix input route performance regression.)
added various fatal races with dst refcounts.

crashes happen on tcp workloads if routes are added/deleted at the same
time.

The dst_free() calls from free_fib_info_rcu() are clearly racy.

We need instead regular dst refcounting (dst_release()) and make
sure dst_release() is aware of RCU grace periods :

Add DST_RCU_FREE flag so that dst_release() respects an RCU grace period
before dst destruction for cached dst

Introduce a new inet_sk_rx_dst_set() helper, using atomic_inc_not_zero()
to make sure we dont increase a zero refcount (On a dst currently
waiting an rcu grace period before destruction)

rt_cache_route() must take a reference on the new cached route, and
release it if was not able to install it.

With this patch, my machines survive various benchmarks.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

404e0a8b

27 7月, 2012 1 次提交

ipv4: Fix input route performance regression. · c6cffba4

由 David S. Miller 提交于 7月 26, 2012

With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.

Reinstate the noref path when we hit a cached route in the FIB
nexthops.

With help from Eric Dumazet.
Reported-by: NAlexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6cffba4