提交 · deed49df7390d5239024199e249190328f1651e7 · openanolis / cloud-kernel

19 2月, 2016 1 次提交

route: check and remove route cache when we get route · deed49df

由 Xin Long 提交于 2月 18, 2016

Since the gc of ipv4 route was removed, the route cached would has
no chance to be removed, and even it has been timeout, it still could
be used, cause no code to check it's expires.

Fix this issue by checking  and removing route cache when we get route.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

deed49df

09 1月, 2016 1 次提交

ipv4: eliminate endianness warnings in ip_fib.h · 0797cbd8

由 Lance Richardson 提交于 1月 06, 2016

fib_multipath_hash() computes a hash using __be32 values, force
cast these to u32 to pacify sparse.
Signed-off-by: NLance Richardson <lrichard@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0797cbd8

02 11月, 2015 1 次提交

ipv4: fix to not remove local route on link down · 4f823def

由 Julian Anastasov 提交于 10月 30, 2015

When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event
we should not delete the local routes if the local address
is still present. The confusion comes from the fact that both
fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN
constant. Fix it by returning back the variable 'force'.

Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up
ifconfig dummy0 down
ip route list table local | grep dummy | grep host
local 192.168.168.1 dev dummy0  proto kernel  scope host  src 192.168.168.1

Fixes: 8a3d0316 ("net: track link-status of ipv4 nexthops")
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f823def

07 10月, 2015 1 次提交

net: Refactor path selection in __ip_route_output_key_hash · 3ce58d84

由 David Ahern 提交于 10月 05, 2015

VRF device needs the same path selection following lookup to set source
address. Rather than duplicating code, move existing code into a
function that is exported to modules.

Code move only; no functional change.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ce58d84

05 10月, 2015 1 次提交

ipv4: L3 hash-based multipath · 0e884c78

由 Peter Nørlund 提交于 9月 30, 2015

Replaces the per-packet multipath with a hash-based multipath using
source and destination address.
Signed-off-by: NPeter Nørlund <pch@ordbogen.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e884c78

21 9月, 2015 1 次提交

net: Fix behaviour of unreachable, blackhole and prohibit routes · 0315e382

由 Nikola Forró 提交于 9月 17, 2015

Man page of ip-route(8) says following about route types:

  unreachable - these destinations are unreachable.  Packets are dis‐
  carded and the ICMP message host unreachable is generated.  The local
  senders get an EHOSTUNREACH error.

  blackhole - these destinations are unreachable.  Packets are dis‐
  carded silently.  The local senders get an EINVAL error.

  prohibit - these destinations are unreachable.  Packets are discarded
  and the ICMP message communication administratively prohibited is
  generated.  The local senders get an EACCES error.

In the inet6 address family, this was correct, except the local senders
got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
In the inet address family, all three route types generated ICMP message
net unreachable, and the local senders got ENETUNREACH error.

In both address families all three route types now behave consistently
with documentation.
Signed-off-by: NNikola Forró <nforro@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0315e382

25 7月, 2015 1 次提交

ipv4: consider TOS in fib_select_default · 2392debc

由 Julian Anastasov 提交于 7月 22, 2015

fib_select_default considers alternative routes only when
res->fi is for the first alias in res->fa_head. In the
common case this can happen only when the initial lookup
matches the first alias with highest TOS value. This
prevents the alternative routes to require specific TOS.

This patch solves the problem as follows:

- routes that require specific TOS should be returned by
fib_select_default only when TOS matches, as already done
in fib_table_lookup. This rule implies that depending on the
TOS we can have many different lists of alternative gateways
and we have to keep the last used gateway (fa_default) in first
alias for the TOS instead of using single tb_default value.

- as the aliases are ordered by many keys (TOS desc,
fib_priority asc), we restrict the possible results to
routes with matching TOS and lowest metric (fib_priority)
and routes that match any TOS, again with lowest metric.

For example, packet with TOS 8 can not use gw3 (not lowest
metric), gw4 (different TOS) and gw6 (not lowest metric),
all other gateways can be used:

tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
tos 8 via gw2 metric 2
tos 8 via gw3 metric 3
tos 4 via gw4
tos 0 via gw5
tos 0 via gw6 metric 1
Reported-by: NHagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2392debc

22 7月, 2015 1 次提交

ipv4: support for fib route lwtunnel encap attributes · 571e7226

由 Roopa Prabhu 提交于 7月 21, 2015

This patch adds support in ipv4 fib functions to parse user
provided encap attributes and attach encap state data to fib_nh
and rtable.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

571e7226

24 6月, 2015 2 次提交

net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f

由 Andy Gospodarek 提交于 6月 23, 2015

This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, will report to userspace that a route is
dead and will no longer resolve to this nexthop when performing a fib
lookup.  This will signal to userspace that the route will not be
selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
if the sysctl is enabled and link is down.  This was done as without it
the netlink listeners would have no idea whether or not a nexthop would
be selected.   The kernel only sets RTNH_F_DEAD internally if the
interface has IFF_UP cleared.

With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
    cache

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down.  Now interface p8p1 is linked-up:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 dev p8p1  src 80.0.0.1
    cache

and the output changes to what one would expect.

If the sysctl is not set, the following output would be expected when
p8p1 is down:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2

Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.

v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set.  Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.

v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.

v4: Drop binary sysctl

v5: Whitespace fixups from Dave

v6: Style changes from Dave and checkpatch suggestions

v7: One more checkpatch fixup
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0eeb075f

net: track link-status of ipv4 nexthops · 8a3d0316

由 Andy Gospodarek 提交于 6月 23, 2015

Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off.  No action is taken,
but additional flags are passed to userspace to indicate carrier status.

This also includes a cleanup to fib_disable_ip to more clearly indicate
what event made the function call to replace the more cryptic force
option previously used.

v2: Split out kernel functionality into 2 patches, this patch simply
sets and clears new nexthop flag RTNH_F_LINKDOWN.

v3: Cleanups suggested by Alex as well as a bug noticed in
fib_sync_down_dev and fib_sync_up when multipath was not enabled.

v5: Whitespace and variable declaration fixups suggested by Dave.

v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
all.
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a3d0316

12 3月, 2015 1 次提交

ipv4: FIB Local/MAIN table collapse · 0ddcf43d

由 Alexander Duyck 提交于 3月 06, 2015

This patch is meant to collapse local and main into one by converting
tb_data from an array to a pointer.  Doing this allows us to point the
local table into the main while maintaining the same variables in the
table.

As such the tb_data was converted from an array to a pointer, and a new
array called data is added in order to still provide an object for tb_data
to point to.

In order to track the origin of the fib aliases a tb_id value was added in
a hole that existed on 64b systems.  Using this we can also reverse the
merge in the event that custom FIB rules are enabled.

With this patch I am seeing an improvement of 20ns to 30ns for routing
lookups as long as custom rules are not enabled, with custom rules enabled
we fall back to split tables and the original behavior.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ddcf43d

06 3月, 2015 1 次提交

switchdev: don't support custom ip rules, for now · 104616e7

由 Scott Feldman 提交于 3月 05, 2015

Keep switchdev FIB offload model simple for now and don't allow custom ip
rules.
Signed-off-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

104616e7

05 3月, 2015 1 次提交

fib_trie: Make fib_table rcu safe · a7e53531

由 Alexander Duyck 提交于 3月 04, 2015

The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock however after looking over the code I found
several spots where the tables were being accessed as just standard
pointers without any protections. This change fixes that so that all of
the proper protections are in place when accessing the table to take RCU
replacement or removal of the table into account.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7e53531

28 2月, 2015 1 次提交

fib_trie: Convert fib_alias to hlist from list · 56315f9e

由 Alexander Duyck 提交于 2月 25, 2015

There isn't any advantage to having it as a list and by making it an hlist
we make the fib_alias more compatible with the list_info in terms of the
type of list used.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56315f9e

01 1月, 2015 1 次提交

fib_trie: Push rcu_read_lock/unlock to callers · 345e9b54

由 Alexander Duyck 提交于 12月 31, 2014

This change is to start cleaning up some of the rcu_read_lock/unlock
handling.  I realized while reviewing the code there are several spots that
I don't believe are being handled correctly or are masking warnings by
locally calling rcu_read_lock/unlock instead of calling them at the correct
level.

A common example is a call to fib_get_table followed by fib_table_lookup.
The rcu_read_lock/unlock ought to wrap both but there are several spots where
they were not wrapped.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

345e9b54

09 12月, 2014 1 次提交

fib_trie: Fix /proc/net/fib_trie when CONFIG_IP_MULTIPLE_TABLES is not defined · a5a519b2

由 Alexander Duyck 提交于 12月 02, 2014

In recent testing I had disabled CONFIG_IP_MULTIPLE_TABLES and as a result
when I ran "cat /proc/net/fib_trie" the main trie was displayed multiple
times. I found that the problem line of code was in the function
fib_trie_seq_next. Specifically the line below caused the indexes to go in
the opposite direction of our traversal:

h = tb->tb_id & (FIB_TABLE_HASHSZ - 1);

This issue was that the RT tables are defined such that RT_TABLE_LOCAL is ID
255, while it is located at TABLE_LOCAL_INDEX of 0, and RT_TABLE_MAIN is 254
with a TABLE_MAIN_INDEX of 1. This means that the above line will return 1
for the local table and 0 for main. The result is that fib_trie_seq_next
will return NULL at the end of the local table, fib_trie_seq_start will
return the start of the main table, and then fib_trie_seq_next will loop on
main forever as h will always return 0.

The fix for this is to reverse the ordering of the two tables. It has the
advantage of making it so that the tables now print in the same order
regardless of if multiple tables are enabled or not. In order to make the
definition consistent with the multiple tables case I simply masked the to
RT_TABLE_XXX values by (FIB_TABLE_HASHSZ - 1). This way the two table
layouts should always stay consistent.

Fixes: 93456b6d ("[IPV4]: Unify access to the routing tables")
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5a519b2

06 9月, 2014 2 次提交

ipv4: harden fnhe_hashfun() · d546c621

由 Eric Dumazet 提交于 9月 04, 2014

Lets make this hash function a bit secure, as ICMP attacks are still
in the wild.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d546c621

ipv4: fix a race in update_or_create_fnhe() · caa41527

由 Eric Dumazet 提交于 9月 03, 2014

nh_exceptions is effectively used under rcu, but lacks proper
barriers. Between kzalloc() and setting of nh->nh_exceptions(),
we need a proper memory barrier.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

caa41527

22 9月, 2013 1 次提交

ip*.h: Remove extern from function prototypes · 5c3a0fd7

由 Joe Perches 提交于 9月 21, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c3a0fd7

02 7月, 2013 1 次提交

ipv4: remove fib_update_nh_saddrs() declaration. · b48410b4

由 Rami Rosen 提交于 7月 01, 2013

This patch removes the fib_update_nh_saddrs() declaration from
include/net/ip_fib.h, as the fib_update_nh_saddrs() method was removed in
coomit 436c3b66 ("ipv4: Invalidate nexthop cache nh_saddr more correctly").
Signed-off-by: NRami Rosen <ramirose@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b48410b4

29 6月, 2013 1 次提交

ipv4: use next hop exceptions also for input routes · 2ffae99d

由 Timo Teräs 提交于 6月 27, 2013

Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
assmued that "locally destined, and routed packets, never trigger
PMTU events or redirects that will be processed by us".

However, it seems that tunnel devices do trigger PMTU events in certain
cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
skb_dst(skb)->ops->update_pmtu to propage mtu information from the
outer flows. These can cause the inner flow mtu to be decreased. If
next hop exceptions are not consulted for pmtu, IP fragmentation will
not be done properly for these routes.

It also seems that we really need to have the PMTU information always
for netfilter TCPMSS clamp-to-pmtu feature to work properly.

So for the time being, cache separate copies of input routes for
each next hop exception.
Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
Reviewed-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ffae99d

03 6月, 2013 1 次提交

ipv4: use separate genid for next hop exceptions · 5aad1de5

由 Timo Teräs 提交于 5月 27, 2013

commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
added the support to flush learned pmtu information.

However, using rt_genid is quite heavy as it is bumped on route
add/change and multicast events amongst other places. These can
happen quite often, especially if using dynamic routing protocols.

While this is ok with routes (as they are just recreated locally),
the pmtu information is learned from remote systems and the icmp
notification can come with long delays. It is worthy to have separate
genid to avoid excessive pmtu resets.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5aad1de5

13 3月, 2013 1 次提交

ipv4: fix definition of FIB_TABLE_HASHSZ · 5b9e12db

由 Denis V. Lunev 提交于 3月 13, 2013

a long time ago by the commit

  commit 93456b6d
  Author: Denis V. Lunev <den@openvz.org>
  Date:   Thu Jan 10 03:23:38 2008 -0800

    [IPV4]: Unify access to the routing tables.

the defenition of FIB_HASH_TABLE size has obtained wrong dependency:
it should depend upon CONFIG_IP_MULTIPLE_TABLES (as was in the original
code) but it was depended from CONFIG_IP_ROUTE_MULTIPATH

This patch returns the situation to the original state.

The problem was spotted by Tingwei Liu.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
CC: Tingwei Liu <tingw.liu@gmail.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5b9e12db

05 10月, 2012 1 次提交

ipv4: add a fib_type to fib_info · f4ef85bb

由 Eric Dumazet 提交于 10月 04, 2012

commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.

This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.

David suggested to add fib_type to fib_info so that we dont
inadvertently share same fib_info for different purposes.

With help from Julian Anastasov who provided very helpful
hints, reproduced here :

<quote>
        Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:

broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src
192.168.0.1
192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1

        The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.

        And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.

</quote>

Many thanks to Chris Clayton for his patience and help.
Reported-by: NChris Clayton <chris2553@googlemail.com>
Bisected-by: NChris Clayton <chris2553@googlemail.com>
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Tested-by: NChris Clayton <chris2553@googlemail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f4ef85bb

01 8月, 2012 3 次提交

D
ipv4: Cache routes in nexthop exception entries. · c5038a83
由 David S. Miller 提交于 7月 31, 2012
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
c5038a83

ipv4: percpu nh_rth_output cache · d26b3a7c

由 Eric Dumazet 提交于 7月 31, 2012

Input path is mostly run under RCU and doesnt touch dst refcnt

But output path on forwarding or UDP workloads hits
badly dst refcount, and we have lot of false sharing, for example
in ipv4_mtu() when reading rt->rt_pmtu

Using a percpu cache for nh_rth_output gives a nice performance
increase at a small cost.

24 udpflood test on my 24 cpu machine (dummy0 output device)
(each process sends 1.000.000 udp frames, 24 processes are started)

before : 5.24 s
after : 2.06 s
For reference, time on linux-3.5 : 6.60 s
Signed-off-by: NEric Dumazet <edumazet@google.com>
Tested-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d26b3a7c

ipv4: Restore old dst_free() behavior. · 54764bb6

由 Eric Dumazet 提交于 7月 31, 2012

commit 404e0a8b (net: ipv4: fix RCU races on dst refcounts) tried
to solve a race but added a problem at device/fib dismantle time :

We really want to call dst_free() as soon as possible, even if sockets
still have dst in their cache.
dst_release() calls in free_fib_info_rcu() are not welcomed.

Root of the problem was that now we also cache output routes (in
nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
rt_free(), because output route lookups are done in process context.

Based on feedback and initial patch from David Miller (adding another
call_rcu_bh() call in fib, but it appears it was not the right fix)

I left the inet_sk_rx_dst_set() helper and added __rcu attributes
to nh_rth_output and nh_rth_input to better document what is going on in
this code.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54764bb6

21 7月, 2012 2 次提交

ipv4: Cache input routes in fib_info nexthops. · d2d68ba9

由 David S. Miller 提交于 7月 17, 2012

Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions.  (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).

However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2d68ba9

ipv4: Cache output routes in fib_info nexthops. · f2bb4bed

由 David S. Miller 提交于 7月 17, 2012

If we have an output route that lacks nexthop exceptions, we can cache
it in the FIB info nexthop.

Such routes will have DST_HOST cleared because such routes refer to a
family of destinations, rather than just one.

The sequence of the handling of exceptions during route lookup is
adjusted to make the logic work properly.

Before we allocate the route, we lookup the exception.

Then we know if we will cache this route or not, and therefore whether
DST_HOST should be set on the allocated route.

Then we use DST_HOST to key off whether we should store the resulting
route, during rt_set_nexthop(), in the FIB nexthop cache.

With help from Eric Dumazet.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2bb4bed

20 7月, 2012 1 次提交

ipv4: use seqlock for nh_exceptions · aee06da6

由 Julian Anastasov 提交于 7月 18, 2012

Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aee06da6

17 7月, 2012 1 次提交

ipv4: Add FIB nexthop exceptions. · 4895c771

由 David S. Miller 提交于 7月 17, 2012

In a regime where we have subnetted route entries, we need a way to
store persistent storage about destination specific learned values
such as redirects and PMTU values.

This is implemented here via nexthop exceptions.

The initial implementation is a 2048 entry hash table with relaiming
starting at chain length 5.  A more sophisticated scheme can be
devised if that proves necessary.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4895c771

13 7月, 2012 2 次提交

ipv4: Don't store a rule pointer in fib_result. · 85b91b03

由 David S. Miller 提交于 7月 13, 2012

We only use it to fetch the rule's tclassid, so just store the
tclassid there instead.

This also decreases the size of fib_result by a full 8 bytes on
64-bit.  On 32-bits it's a wash.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

85b91b03

D
ipv4: Remove tb_peers from fib_table. · 391e5c22
由 David S. Miller 提交于 7月 12, 2012
```
No longer used.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
391e5c22

11 7月, 2012 1 次提交

ipv4: Fix crashes in fib_rules_tclass(). · e044a651

由 David S. Miller 提交于 7月 10, 2012

All paths assume, when CONFIG_IP_MULTIPLE_TABLES is enabled, that any
successful call to fib_lookup() will initialize the fib_result->r
value to something.

We violated that expectation in the new fib_lookup() fast path.
Reported-by: NOr Gerlitz <ogerlitz@mellanox.com>
Tested-by: NEric Dumazet <eric.dumazet@gmail.com>
Tested-by: NGreg Rose <gregory.v.rose@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e044a651

06 7月, 2012 1 次提交

ipv4: Avoid overhead when no custom FIB rules are installed. · f4530fa5

由 David S. Miller 提交于 7月 05, 2012

If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.

It's just pure overhead.

Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.

Also, move fib_num_tclassid_users into the ipv4 network namespace.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f4530fa5

29 6月, 2012 2 次提交

ipv4: Elide fib_validate_source() completely when possible. · 7a9bc9b8

由 David S. Miller 提交于 6月 29, 2012

If rpfilter is off (or the SKB has an IPSEC path) and there are not
tclassid users, we don't have to do anything at all when
fib_validate_source() is invoked besides setting the itag to zero.

We monitor tclassid uses with a counter (modified only under RTNL and
marked __read_mostly) and we protect the fib_validate_source() real
work with a test against this counter and whether rpfilter is to be
done.

Having a way to know whether we need no tclassid processing or not
also opens the door for future optimized rpfilter algorithms that do
not perform full FIB lookups.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a9bc9b8

ipv4: Adjust in_dev handling in fib_validate_source() · 9e56e380

由 David S. Miller 提交于 6月 28, 2012

Checking for in_dev being NULL is pointless.

In fact, all of our callers have in_dev precomputed already,
so just pass it in and remove the NULL checking.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e56e380

28 6月, 2012 2 次提交

D
ipv4: Kill rt->rt_spec_dst, no longer used. · 41347dcd
由 David S. Miller 提交于 6月 28, 2012
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
41347dcd

ipv4: Create and use fib_compute_spec_dst() helper. · 35ebf65e

由 David S. Miller 提交于 6月 28, 2012

The specific destination is the host we direct unicast replies to.
Usually this is the original packet source address, but if we are
responding to a multicast or broadcast packet we have to use something
different.

Specifically we must use the source address we would use if we were to
send a packet to the unicast source of the original packet.

The routing cache precomputes this value, but we want to remove that
precomputation because it creates a hard dependency on the expensive
rpfilter source address validation which we'd like to make cheaper.

There are only three places where this matters:

1) ICMP replies.

2) pktinfo CMSG

3) IP options

Now there will be no real users of rt->rt_spec_dst and we can simply
remove it altogether.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35ebf65e

11 6月, 2012 1 次提交
- D
  inet: Add inetpeer tree roots to the FIB tables. · 8e773277
  由 David S. Miller 提交于 6月 11, 2012
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  8e773277

openanolis / cloud-kernel 11 个月 前同步成功

openanolis / cloud-kernel
11 个月前同步成功