提交 · a5829f536b3d11a57617e83d4bcb2b7d70671e98 · openanolis / cloud-kernel

30 1月, 2016 1 次提交

fib_trie: Fix shift by 32 in fib_table_lookup · a5829f53

由 Alexander Duyck 提交于 1月 28, 2016

The fib_table_lookup function had a shift by 32 that triggered a UBSAN
warning. This was due to the fact that I had placed the shift first and
then followed it with the check for the suffix length to ignore the
undefined behavior. If we reorder this so that we verify the suffix is
less than 32 before shifting the value we can avoid the issue.
Reported-by: NToralf Förster <toralf.foerster@gmx.de>
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5829f53

28 10月, 2015 1 次提交

fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key · c2229fe1

由 Alexander Duyck 提交于 10月 27, 2015

We were computing the child index in cases where the key value we were
looking for was actually less than the base key of the tnode. As a result
we were getting incorrect index values that would cause us to skip over
some children.

To fix this I have added a test that will force us to use child index 0 if
the key we are looking for is less than the key of the current tnode.

Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
Reported-by: NBrian Rak <brak@gameservers.com>
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2229fe1

18 9月, 2015 1 次提交

net: Fix vti use case with oif in dst lookups · 58189ca7

由 David Ahern 提交于 9月 15, 2015

Steffen reported that the recent change to add oif to dst lookups breaks
the VTI use case. The problem is that with the oif set in the flow struct
the comparison to the nh_oif is triggered. Fix by splitting the
FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache
bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare
nh oif (FLOWI_FLAG_SKIP_NH_OIF).

Fixes: 42a7b32b ("xfrm: Add oif to dst lookups")
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58189ca7

30 8月, 2015 1 次提交

net: FIB tracepoints · f6d3c192

由 David Ahern 提交于 8月 28, 2015

A few useful tracepoints developing VRF driver.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6d3c192

14 8月, 2015 2 次提交

net: Use VRF device index for lookups on TX · 613d09b3

由 David Ahern 提交于 8月 13, 2015

As with ingress use the index of VRF master device for route lookups on
egress. However, the oif should only be used to direct the lookups to a
specific table. Routes in the table are not based on the VRF device but
rather interfaces that are part of the VRF so do not consider the oif for
lookups within the table. The FLOWI_FLAG_VRFSRC is used to control this
latter part.
Signed-off-by: NShrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

613d09b3

ipv4: off-by-one in continuation handling in /proc/net/route · 25b97c01

由 Andy Whitcroft 提交于 8月 13, 2015

When generating /proc/net/route we emit a header followed by a line for
each route.  When a short read is performed we will restart this process
based on the open file descriptor.  When calculating the start point we
fail to take into account that the 0th entry is the header.  This leads
us to skip the first entry when doing a continuation read.

This can be easily seen with the comparison below:

  while read l; do echo "$l"; done </proc/net/route >A
  cat /proc/net/route >B
  diff -bu A B | grep '^[+-]'

On my example machine I have approximatly 10KB of route output.  There we
see the very first non-title element is lost in the while read case,
and an entry around the 8K mark in the cat case:

  +wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0
  -tun1  00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0

Fix up the off-by-one when reaquiring position on continuation.

Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
BugLink: http://bugs.launchpad.net/bugs/1483440Acked-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NAndy Whitcroft <apw@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25b97c01

28 7月, 2015 1 次提交

fib_trie: Drop unnecessary calls to leaf_pull_suffix · 1513069e

由 Alexander Duyck 提交于 7月 27, 2015

It was reported that update_suffix was taking a long time on systems where
a large number of leaves were attached to a single node.  As it turns out
fib_table_flush was calling update_suffix for each leaf that didn't have all
of the aliases stripped from it.  As a result, on this large node removing
one leaf would result in us calling update_suffix for every other leaf on
the node.

The fix is to just remove the calls to leaf_pull_suffix since they are
redundant as we already have a call in resize that will go through and
update the suffix length for the node before we exit out of
fib_table_flush or fib_table_flush_external.
Reported-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1513069e

25 7月, 2015 1 次提交

ipv4: consider TOS in fib_select_default · 2392debc

由 Julian Anastasov 提交于 7月 22, 2015

fib_select_default considers alternative routes only when
res->fi is for the first alias in res->fa_head. In the
common case this can happen only when the initial lookup
matches the first alias with highest TOS value. This
prevents the alternative routes to require specific TOS.

This patch solves the problem as follows:

- routes that require specific TOS should be returned by
fib_select_default only when TOS matches, as already done
in fib_table_lookup. This rule implies that depending on the
TOS we can have many different lists of alternative gateways
and we have to keep the last used gateway (fa_default) in first
alias for the TOS instead of using single tb_default value.

- as the aliases are ordered by many keys (TOS desc,
fib_priority asc), we restrict the possible results to
routes with matching TOS and lowest metric (fib_priority)
and routes that match any TOS, again with lowest metric.

For example, packet with TOS 8 can not use gw3 (not lowest
metric), gw4 (different TOS) and gw6 (not lowest metric),
all other gateways can be used:

tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
tos 8 via gw2 metric 2
tos 8 via gw3 metric 3
tos 4 via gw4
tos 0 via gw5
tos 0 via gw6 metric 1
Reported-by: NHagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2392debc

24 6月, 2015 1 次提交

net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f

由 Andy Gospodarek 提交于 6月 23, 2015

This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, will report to userspace that a route is
dead and will no longer resolve to this nexthop when performing a fib
lookup.  This will signal to userspace that the route will not be
selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
if the sysctl is enabled and link is down.  This was done as without it
the netlink listeners would have no idea whether or not a nexthop would
be selected.   The kernel only sets RTNH_F_DEAD internally if the
interface has IFF_UP cleared.

With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
    cache

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down.  Now interface p8p1 is linked-up:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 dev p8p1  src 80.0.0.1
    cache

and the output changes to what one would expect.

If the sysctl is not set, the following output would be expected when
p8p1 is down:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2

Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.

v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set.  Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.

v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.

v4: Drop binary sysctl

v5: Whitespace fixups from Dave

v6: Style changes from Dave and checkpatch suggestions

v7: One more checkpatch fixup
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0eeb075f

22 6月, 2015 1 次提交

ipv4: include NLM_F_APPEND flag in append route notifications · a2bb6d7d

由 Roopa Prabhu 提交于 6月 17, 2015

This patch adds NLM_F_APPEND flag to struct nlmsg_hdr->nlmsg_flags
in newroute notifications if the route add was an append.
(This is similar to how NLM_F_REPLACE is already part of new
route replace notifications today)

This helps userspace determine if the route add operation was
an append.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2bb6d7d

08 6月, 2015 1 次提交

fib_trie: coding style: Use pointer after check · f38b24c9

由 Firo Yang 提交于 6月 08, 2015

As Alexander Duyck pointed out that:
struct tnode {
        ...
        struct key_vector kv[1];
}
The kv[1] member of struct tnode is an arry that refernced by
a null pointer will not crash the system, like this:
struct tnode *p = NULL;
struct key_vector *kv = p->kv;
As such p->kv doesn't actually dereference anything, it is simply a
means for getting the offset to the array from the pointer p.

This patch make the code more regular to avoid making people feel
odd when they look at the code.
Signed-off-by: NFiro Yang <firogm@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f38b24c9

27 5月, 2015 1 次提交

ipv4: Fix fib_trie.c build, missing linux/vmalloc.h include. · ffa915d0

由 David S. Miller 提交于 5月 27, 2015

We used to get this indirectly I supposed, but no longer do.

Either way, an explicit include should have been done in the
first place.

   net/ipv4/fib_trie.c: In function '__node_free_rcu':
>> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
      vfree(n);
      ^
   net/ipv4/fib_trie.c: In function 'tnode_alloc':
>> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
      return vzalloc(size);
      ^
>> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer without a cast
   cc1: some warnings being treated as errors
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ffa915d0

23 5月, 2015 1 次提交

ipv4: fill in table id when replacing a route · d4e64c29

由 Michal Kubeček 提交于 5月 22, 2015

When replacing an IPv4 route, tb_id member of the new fib_alias
structure is not set in the replace code path so that the new route is
ignored.

Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Acked-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4e64c29

15 5月, 2015 1 次提交

rename RTNH_F_EXTERNAL to RTNH_F_OFFLOAD · eea39946

由 Roopa Prabhu 提交于 5月 13, 2015

RTNH_F_EXTERNAL today is printed as "offload" in iproute2 output.

This patch renames the flag to be consistent with what the user sees.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eea39946

13 5月, 2015 1 次提交

switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/ · ebb9a03a

由 Jiri Pirko 提交于 5月 10, 2015

Turned out that "switchdev" sticks. So just unify all related terms to use
this prefix.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NScott Feldman <sfeldma@gmail.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ebb9a03a

04 4月, 2015 2 次提交

ipv4: coding style: comparison for inequality with NULL · 00db4124

由 Ian Morris 提交于 4月 03, 2015

The ipv4 code uses a mixture of coding styles. In some instances check
for non-NULL pointer is done as x != NULL and sometimes as x. x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00db4124

ipv4: coding style: comparison for equality with NULL · 51456b29

由 Ian Morris 提交于 4月 03, 2015

The ipv4 code uses a mixture of coding styles. In some instances check
for NULL pointer is done as x == NULL and sometimes as !x. !x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51456b29

24 3月, 2015 1 次提交

fib_trie: Fix regression in handling of inflate/halve failure · b6f15f82

由 Alexander Duyck 提交于 3月 23, 2015

When I updated the code to address a possible null pointer dereference in
resize I ended up reverting an exception handling fix for the suffix length
in the event that inflate or halve failed. This change is meant to correct
that by reverting the earlier fix and instead simply getting the parent
again after inflate has been completed to avoid the possible null pointer
issue.

Fixes: ddb4b9a1 ("fib_trie: Address possible NULL pointer dereference in resize")
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6f15f82

13 3月, 2015 1 次提交

fib_trie: Provide a deterministic order for fib_alias w/ tables merged · 0b65bd97

由 Alexander Duyck 提交于 3月 12, 2015

This change makes it so that we should always have a deterministic ordering
for the main and local aliases within the merged table when two leaves
overlap.

So for example if we have a leaf with a key of 192.168.254.0. If we
previously added two aliases with a prefix length of 24 from both local and
main the first entry would be first and the second would be second. When I
was coding this I had added a WARN_ON should such a situation occur as I
wasn't sure how likely it would be. However this WARN_ON has been
triggered so this is something that should be addressed.

With this patch the ordering of the aliases is as follows. First they are
sorted on prefix length, then on their table ID, then tos, and finally
priority. This way what we end up doing is essentially interleaving the
two tables on what used to be leaf_info structure boundaries.

Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b65bd97

12 3月, 2015 2 次提交

fib_trie: Only display main table in /proc/net/route · 654eff45

由 Alexander Duyck 提交于 3月 11, 2015

When we merged the tries for local and main I had overlooked the iterator
for /proc/net/route. As a result it was outputting both local and main
when the two tries were merged.

This patch resolves that by only providing output for aliases that are
actually in the main trie. As a result we should go back to the original
behavior which I assume will be necessary to maintain legacy support.

Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

654eff45

ipv4: FIB Local/MAIN table collapse · 0ddcf43d

由 Alexander Duyck 提交于 3月 06, 2015

This patch is meant to collapse local and main into one by converting
tb_data from an array to a pointer.  Doing this allows us to point the
local table into the main while maintaining the same variables in the
table.

As such the tb_data was converted from an array to a pointer, and a new
array called data is added in order to still provide an object for tb_data
to point to.

In order to track the origin of the fib aliases a tb_id value was added in
a hole that existed on 64b systems.  Using this we can also reverse the
merge in the event that custom FIB rules are enabled.

With this patch I am seeing an improvement of 20ns to 30ns for routing
lookups as long as custom rules are not enabled, with custom rules enabled
we fall back to split tables and the original behavior.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ddcf43d

11 3月, 2015 2 次提交

fib_trie: Address possible NULL pointer dereference in resize · ddb4b9a1

由 Alexander Duyck 提交于 3月 10, 2015

If the inflate call failed it would return NULL. As a result tp would be
set to NULL and cause use to trigger a NULL pointer dereference in
should_halve if the inflate failed on the first attempt.

In order to prevent this we should decrement max_work before we actually
attempt to inflate as this will force us to exit before attempting to halve
a node we should have inflated. In order to keep things symmetric between
inflate and halve I went ahead and also moved the decrement of max_work for
the halve case as well so we take care of that before we actually attempt
to halve the tnode.

Fixes: 88bae714 ("fib_trie: Add key vector to root, return parent key_vector in resize")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddb4b9a1

fib_trie: Correctly handle case of key == 0 in leaf_walk_rcu · 3ec320dd

由 Alexander Duyck 提交于 3月 10, 2015

In the case of a trie that had no tnodes with a key of 0 the initial
look-up would fail resulting in an out-of-bounds cindex on the first tnode.
This resulted in an entire trie being skipped.

In order resolve this I have updated the cindex logic in the initial
look-up so that if the key is zero we will always traverse the child zero
path.

Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
Reported-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ec320dd

10 3月, 2015 1 次提交

switchdev: add netlink flags to IPv4 FIB add op · f8f21471

由 Scott Feldman 提交于 3月 09, 2015

Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op
to allow driver to 1) optimize hardware updates, 2) handle ip route prepend
and append commands correctly.
Suggested-by: NJamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NScott Feldman <sfeldma@gmail.com>
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8f21471

07 3月, 2015 10 次提交

fib_trie: Add key vector to root, return parent key_vector in resize · 88bae714

由 Alexander Duyck 提交于 3月 06, 2015

This change makes it so that the root of the trie contains a key_vector, by
doing this we make room to essentially collapse the entire trie by at least
one cache line as we can store the information about the tnode or leaf that
is pointed to in the root.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88bae714

fib_trie: Move parent from key_vector to tnode · f23e59fb

由 Alexander Duyck 提交于 3月 06, 2015

This change pulls the parent pointer from the key_vector and places it in
the tnode structure.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f23e59fb

fib_trie: Pull empty_children and full_children into tnode · 6e22d174

由 Alexander Duyck 提交于 3月 06, 2015

This pulls the information about the child array out of the key_vector and
places it in the tnode since that is where it is needed.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e22d174

fib_trie: Move rcu from key_vector to tnode, add accessors. · 56ca2adf

由 Alexander Duyck 提交于 3月 06, 2015

RCU is only needed once for the entire node, not once per key_vector so we
can pull that out and move it to the tnode structure.

In addition add accessors to be used inside the RCU functions so that we
can more easily get from the key vector to either the tnode or the trie
pointers.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56ca2adf

fib_trie: Add tnode struct as a container for fields not needed in key_vector · dc35dbed

由 Alexander Duyck 提交于 3月 06, 2015

This change pulls the fields not explicitly needed in the key_vector and
placed them in the new tnode structure.  By doing this we will eventually
be able to reduce the key_vector down to 16 bytes on 64 bit systems, and
12 bytes on 32 bit systems.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc35dbed

fib_trie: Rename tnode_child_length to child_length · 2e1ac88a

由 Alexander Duyck 提交于 3月 06, 2015

We are now checking the length of a key_vector instead of a tnode so it
makes sense to probably just rename this to child_length since it would
probably even be applicable to a leaf.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e1ac88a

fib_trie: replace tnode_get_child functions with get_child macros · 754baf8d

由 Alexander Duyck 提交于 3月 06, 2015

I am replacing the tnode_get_child call with get_child since we are
techically pulling the child out of a key_vector now and not a tnode.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

754baf8d

fib_trie: Rename tnode to key_vector · 35c6edac

由 Alexander Duyck 提交于 3月 06, 2015

Rename the tnode to key_vector.  The key_vector will be the eventual
container for all of the information needed by either a leaf or a tnode.
The final result should be much smaller than the 40 bytes currently needed
for either one.

This also updates the trie struct so that it contains an array of size 1 of
tnode pointers.  This is to bring the structure more inline with how an
actual tnode itself is configured.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35c6edac

fib_trie: Return pointer to tnode pointer in resize/inflate/halve · 8d8e810c

由 Alexander Duyck 提交于 3月 06, 2015

Resize related functions now all return a pointer to the pointer that
references the object that was resized.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d8e810c

fib_trie: Minor cleanups to fib_table_flush_external · 72be7260

由 Alexander Duyck 提交于 3月 06, 2015

This change just does a couple of minor cleanups on
fib_table_flush_external.  Specifically it addresses the fact that resize
was being called even though nothing was being removed from the table, and
it drops an unecessary indent since we could just call continue on the
inverse of the fi && flag check.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72be7260

06 3月, 2015 3 次提交

ipv4: Fix unused variable warnings in fib_table_flush_external. · 23375a0f

由 David S. Miller 提交于 3月 06, 2015

net/ipv4/fib_trie.c: In function ‘fib_table_flush_external’:
net/ipv4/fib_trie.c:1572:6: warning: unused variable ‘found’ [-Wunused-variable]
  int found = 0;
      ^
net/ipv4/fib_trie.c:1571:16: warning: unused variable ‘slen’ [-Wunused-variable]
  unsigned char slen;
                ^
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23375a0f

fib: hook IPv4 fib for hardware offload · 8e05fd71

由 Scott Feldman 提交于 3月 05, 2015

Call into the switchdev driver any time an IPv4 fib entry is
added/modified/deleted from the kernel's FIB.  The switchdev driver may or
may not install the route to the offload device.  In the case where the
driver tries to install the route and something goes wrong (device's routing
table is full, etc), then all of the offloaded routes will be flushed from the
device, route forwarding falls back to the kernel, and no more routes are
offloading.

We can refine this logic later.  For now, use the simplist model of offloading
routes up to the point of failure, and then on failure, undo everything and
mark IPv4 offloading disabled.
Signed-off-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e05fd71

switchdev: don't support custom ip rules, for now · 104616e7

由 Scott Feldman 提交于 3月 05, 2015

Keep switchdev FIB offload model simple for now and don't allow custom ip
rules.
Signed-off-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

104616e7

05 3月, 2015 3 次提交

fib_trie: Prevent allocating tnode if bits is too big for size_t · 1de3d87b

由 Alexander Duyck 提交于 3月 04, 2015

This patch adds code to prevent us from attempting to allocate a tnode with
a size larger than what can be represented by size_t.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1de3d87b

fib_trie: Update last spot w/ idx >> n->bits code and explanation · 71e8b67d

由 Alexander Duyck 提交于 3月 04, 2015

This change updates the fib_table_lookup function so that it is in sync
with the fib_find_node function in terms of the explanation for the index
check based on the bits value.

I have also updated it from doing a mask to just doing a compare as I have
found that seems to provide more options to the compiler as I have seen it
turn this into a shift of the value and test under some circumstances.

In addition I addressed one minor issue in which we kept computing the key
^ n->key when checking the fib aliases.  I pulled the xor out of the loop
in order to reduce the number of memory reads in the lookup.  As a result
we should save a couple cycles since the xor is only done once much earlier
in the lookup.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71e8b67d

fib_trie: Make fib_table rcu safe · a7e53531

由 Alexander Duyck 提交于 3月 04, 2015

The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock however after looking over the code I found
several spots where the tables were being accessed as just standard
pointers without any protections. This change fixes that so that all of
the proper protections are in place when accessing the table to take RCU
replacement or removal of the table into account.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7e53531

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功