提交 · c589fa10f1ed2d963c3ceaa40005d30231a0e556 · openeuler / Kernel

20 4月, 2021 1 次提交

nexthop: Restart nexthop dump based on last dumped nexthop identifier · 9e46fb65

由 Ido Schimmel 提交于 4月 16, 2021

Currently, a multi-part nexthop dump is restarted based on the number of
nexthops that have been dumped so far. This can result in a lot of
nexthops not being dumped when nexthops are simultaneously deleted:

 # ip nexthop | wc -l
 65536
 # ip nexthop flush
 Dump was interrupted and may be inconsistent.
 Flushed 36040 nexthops
 # ip nexthop | wc -l
 29496

Instead, restart the dump based on the nexthop identifier (fixed number)
of the last successfully dumped nexthop:

 # ip nexthop | wc -l
 65536
 # ip nexthop flush
 Dump was interrupted and may be inconsistent.
 Flushed 65536 nexthops
 # ip nexthop | wc -l
 0
Reported-by: NMaksym Yaremchuk <maksymy@nvidia.com>
Tested-by: NMaksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e46fb65

29 3月, 2021 1 次提交

nexthop: Rename artifacts related to legacy multipath nexthop groups · de1d1ee3

由 Petr Machata 提交于 3月 26, 2021

After resilient next-hop groups have been added recently, there are two
types of multipath next-hop groups: the legacy "mpath", and the new
"resilient". Calling the legacy next-hop group type "mpath" is unfortunate,
because that describes the fact that a packet could be forwarded in one of
several paths, which is also true for the resilient next-hop groups.

Therefore, to make the naming clearer, rename various artifacts to reflect
the assumptions made. Therefore as of this patch:

- The flag for multipath groups is nh_grp_entry::is_multipath. This
  includes the legacy and resilient groups, as well as any future group
  types that behave as multipath groups.
  Functions that assume this have "mpath" in the name.

- The flag for legacy multipath groups is nh_grp_entry::hash_threshold.
  Functions that assume this have "hthr" in the name.

- The flag for resilient groups is nh_grp_entry::resilient.
  Functions that assume this have "res" in the name.

Besides the above, struct nh_grp_entry::mpath was renamed to ::hthr as
well.

UAPI artifacts were obviously left intact.
Suggested-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de1d1ee3

12 3月, 2021 13 次提交

nexthop: Enable resilient next-hop groups · 15e1dd57

由 Petr Machata 提交于 3月 11, 2021

Now that all the code is in place, stop rejecting requests to create
resilient next-hop groups.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

15e1dd57

nexthop: Notify userspace about bucket migrations · 0b4818aa

由 Petr Machata 提交于 3月 11, 2021

Nexthop replacements et.al. are notified through netlink, but if a delayed
work migrates buckets on the background, userspace will stay oblivious.
Notify these as RTM_NEWNEXTHOPBUCKET events.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b4818aa

nexthop: Add netlink handlers for bucket get · 187d4c6b

由 Petr Machata 提交于 3月 11, 2021

Allow getting (but not setting) individual buckets to inspect the next hop
mapped therein, idle time, and flags.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

187d4c6b

nexthop: Add netlink handlers for bucket dump · 8a1bbabb

由 Petr Machata 提交于 3月 11, 2021

Add a dump handler for resilient next hop buckets. When next-hop group ID
is given, it walks buckets of that group, otherwise it walks buckets of all
groups. It then dumps the buckets whose next hops match the given filtering
criteria.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a1bbabb

nexthop: Add netlink handlers for resilient nexthop groups · a2601e2b

由 Petr Machata 提交于 3月 11, 2021

Implement the netlink messages that allow creation and dumping of resilient
nexthop groups.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2601e2b

nexthop: Allow reporting activity of nexthop buckets · cfc15c1d

由 Ido Schimmel 提交于 3月 11, 2021

The kernel periodically checks the idle time of nexthop buckets to
determine if they are idle and can be re-populated with a new nexthop.

When the resilient nexthop group is offloaded to hardware, the kernel
will not see activity on nexthop buckets unless it is reported from
hardware.

Add a function that can be periodically called by device drivers to
report activity on nexthop buckets after querying it from the underlying
device.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfc15c1d

nexthop: Allow setting "offload" and "trap" indication of nexthop buckets · 56ad5ba3

由 Ido Schimmel 提交于 3月 11, 2021

Add a function that can be called by device drivers to set "offload" or
"trap" indication on nexthop buckets following nexthop notifications and
other changes such as a neighbour becoming invalid.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56ad5ba3

nexthop: Implement notifiers for resilient nexthop groups · 7c37c7e0

由 Petr Machata 提交于 3月 11, 2021

Implement the following notifications towards drivers:

- NEXTHOP_EVENT_REPLACE, when a resilient nexthop group is created.

- NEXTHOP_EVENT_BUCKET_REPLACE any time there is a change in assignment of
  next hops to hash table buckets. That includes replacements, deletions,
  and delayed upkeep cycles. Some bucket notifications can be vetoed by the
  driver, to make it possible to propagate bucket busy-ness flags from the
  HW back to the algorithm. Some are however forced, e.g. if a next hop is
  deleted, all buckets that use this next hop simply must be migrated,
  whether the HW wishes so or not.

- NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE, before a resilient nexthop group is
  replaced. Usually the driver will get the bucket notifications as well,
  and could veto those. But in some cases, a bucket may not be migrated
  immediately, but during delayed upkeep, and that is too late to roll the
  transaction back. This notification allows the driver to take a look and
  veto the new proposed group up front, before anything is committed.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c37c7e0

nexthop: Add implementation of resilient next-hop groups · 283a72a5

由 Petr Machata 提交于 3月 11, 2021

At this moment, there is only one type of next-hop group: an mpath group,
which implements the hash-threshold algorithm.

To select a next hop, hash-threshold algorithm first assigns a range of
hashes to each next hop in the group, and then selects the next hop by
comparing the SKB hash with the individual ranges. When a next hop is
removed from the group, the ranges are recomputed, which leads to
reassignment of parts of hash space from one next hop to another. While
there will usually be some overlap between the previous and the new
distribution, some traffic flows change the next hop that they resolve to.
That causes problems e.g. as established TCP connections are reset, because
the traffic is forwarded to a server that is not familiar with the
connection.

Resilient hashing is a technique to address the above problem. Resilient
next-hop group has another layer of indirection between the group itself
and its constituent next hops: a hash table. The selection algorithm uses a
straightforward modulo operation to choose a hash bucket, and then reads
the next hop that this bucket contains, and forwards traffic there.

This indirection brings an important feature. In the hash-threshold
algorithm, the range of hashes associated with a next hop must be
continuous. With a hash table, mapping between the hash table buckets and
the individual next hops is arbitrary. Therefore when a next hop is deleted
the buckets that held it are simply reassigned to other next hops. When
weights of next hops in a group are altered, it may be possible to choose a
subset of buckets that are currently not used for forwarding traffic, and
use those to satisfy the new next-hop distribution demands, keeping the
"busy" buckets intact. This way, established flows are ideally kept being
forwarded to the same endpoints through the same paths as before the
next-hop group change.

In a nutshell, the algorithm works as follows. Each next hop has a number
of buckets that it wants to have, according to its weight and the number of
buckets in the hash table. In case of an event that might cause bucket
allocation change, the numbers for individual next hops are updated,
similarly to how ranges are updated for mpath group next hops. Following
that, a new "upkeep" algorithm runs, and for idle buckets that belong to a
next hop that is currently occupying more buckets than it wants (it is
"overweight"), it migrates the buckets to one of the next hops that has
fewer buckets than it wants (it is "underweight"). If, after this, there
are still underweight next hops, another upkeep run is scheduled to a
future time.

Chances are there are not enough "idle" buckets to satisfy the new demands.
The algorithm has knobs to select both what it means for a bucket to be
idle, and for whether and when to forcefully migrate buckets if there keeps
being an insufficient number of idle buckets.

There are three users of the resilient data structures.

- The forwarding code accesses them under RCU, and does not modify them
except for updating the time a selected bucket was last used.

- Netlink code, running under RTNL, which may modify the data.

- The delayed upkeep code, which may modify the data. This runs unlocked,
and mutual exclusion between the RTNL code and the delayed upkeep is
maintained by canceling the delayed work synchronously before the RTNL
code touches anything. Later it restarts the delayed work if necessary.

The RTNL code has to implement next-hop group replacement, next hop
removal, etc. For removal, the mpath code uses a neat trick of having a
backup next hop group structure, doing the necessary changes offline, and
then RCU-swapping them in. However, the hash tables for resilient hashing
are about an order of magnitude larger than the groups themselves (the size
might be e.g. 4K entries), and it was felt that keeping two of them is an
overkill. Both the primary next-hop group and the spare therefore use the
same resilient table, and writers are careful to keep all references valid
for the forwarding code. The hash table references next-hop group entries
from the next-hop group that is currently in the primary role (i.e. not
spare). During the transition from primary to spare, the table references a
mix of both the primary group and the spare. When a next hop is deleted,
the corresponding buckets are not set to NULL, but instead marked as empty,
so that the pointer is valid and can be used by the forwarding code. The
buckets are then migrated to a new next-hop group entry during upkeep. The
only times that the hash table is invalid is the very beginning and very
end of its lifetime. Between those points, it is always kept valid.

This patch introduces the core support code itself. It does not handle
notifications towards drivers, which are kept as if the group were an mpath
one. It does not handle netlink either. The only bit currently exposed to
user space is the new next-hop group type, and that is currently bounced.
There is therefore no way to actually access this code.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

283a72a5

nexthop: Add netlink defines and enumerators for resilient NH groups · 710ec562

由 Ido Schimmel 提交于 3月 11, 2021

- RTM_NEWNEXTHOP et.al. that handle resilient groups will have a new nested
  attribute, NHA_RES_GROUP, whose elements are attributes NHA_RES_GROUP_*.

- RTM_NEWNEXTHOPBUCKET et.al. is a suite of new messages that will
  currently serve only for dumping of individual buckets of resilient next
  hop groups. For nexthop group buckets, these messages will carry a nested
  attribute NHA_RES_BUCKET, whose elements are attributes NHA_RES_BUCKET_*.

  There are several reasons why a new suite of messages is created for
  nexthop buckets instead of overloading the information on the existing
  RTM_{NEW,DEL,GET}NEXTHOP messages.

  First, a nexthop group can contain a large number of nexthop buckets (4k
  is not unheard of). This imposes limits on the amount of information that
  can be encoded for each nexthop bucket given a netlink message is limited
  to 64k bytes.

  Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at
  this point, in the future it can be extended to provide user space with
  control over nexthop buckets configuration.

- The new group type is NEXTHOP_GRP_TYPE_RES. Note that nexthop code is
  adjusted to bounce groups with that type for now.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

710ec562

nexthop: Add a dedicated flag for multipath next-hop groups · 90e1a9e2

由 Petr Machata 提交于 3月 11, 2021

With the introduction of resilient nexthop groups, there will be two types
of multipath groups: the current hash-threshold "mpath" ones, and resilient
groups. Both are multipath, but to determine the fact, the system needs to
consider two flags. This might prove costly in the datapath. Therefore,
introduce a new flag, that should be set for next-hop groups that have more
than one nexthop, and should be considered multipath.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90e1a9e2

nexthop: __nh_notifier_single_info_init(): Make nh_info an argument · 96a85625

由 Petr Machata 提交于 3月 11, 2021

The cited function currently uses rtnl_dereference() to get nh_info from a
handed-in nexthop. However, under the resilient hashing scheme, this
function will not always be called under RTNL, sometimes the mutual
exclusion will be achieved differently. Therefore move the nh_info
extraction from the function to its callers to make it possible to use a
different synchronization guarantee.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96a85625

nexthop: Pass nh_config to replace_nexthop() · 597f48e4

由 Petr Machata 提交于 3月 11, 2021

Currently, replace assumes that the new group that is given is a
fully-formed object. But mpath groups really only have one attribute, and
that is the constituent next hop configuration. This may not be universally
true. From the usability perspective, it is desirable to allow the replace
operation to adjust just the constituent next hop configuration and leave
the group attributes as such intact.

But the object that keeps track of whether an attribute was or was not
given is the nh_config object, not the next hop or next-hop group. To allow
(selective) attribute updates during NH group replacement, propagate `cfg'
to replace_nexthop() and further to replace_nexthop_grp().
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

597f48e4

05 3月, 2021 1 次提交

nexthop: Do not flush blackhole nexthops when loopback goes down · 76c03bf8

由 Ido Schimmel 提交于 3月 04, 2021

As far as user space is concerned, blackhole nexthops do not have a
nexthop device and therefore should not be affected by the
administrative or carrier state of any netdev.

However, when the loopback netdev goes down all the blackhole nexthops
are flushed. This happens because internally the kernel associates
blackhole nexthops with the loopback netdev.

This behavior is both confusing to those not familiar with kernel
internals and also diverges from the legacy API where blackhole IPv4
routes are not flushed when the loopback netdev goes down:

 # ip route add blackhole 198.51.100.0/24
 # ip link set dev lo down
 # ip route show 198.51.100.0/24
 blackhole 198.51.100.0/24

Blackhole IPv6 routes are flushed, but at least user space knows that
they are associated with the loopback netdev:

 # ip -6 route show 2001:db8:1::/64
 blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium

Fix this by only flushing blackhole nexthops when the loopback netdev is
unregistered.

Fixes: ab84be7e ("net: Initial nexthop code")
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reported-by: NDonald Sharp <sharpd@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76c03bf8

29 1月, 2021 12 次提交

nexthop: Extract a helper for validation of get/del RTNL requests · 0bccf8ed

由 Petr Machata 提交于 1月 28, 2021

Validation of messages for get / del of a next hop is the same as will be
validation of messages for get of a resilient next hop group bucket. The
difference is that policy for resilient next hop group buckets is a
superset of that used for next-hop get.

It is therefore possible to reuse the code that validates the nhmsg fields,
extracts the next-hop ID, and validates that. To that end, extract from
nh_valid_get_del_req() a helper __nh_valid_get_del_req() that does just
that.

Make the nlh argument const so that the function can be called from the
dump context, which only has a const nlh. Propagate the constness to
nh_valid_get_del_req().
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

0bccf8ed

nexthop: Add a callback parameter to rtm_dump_walk_nexthops() · e948217d

由 Petr Machata 提交于 1月 28, 2021

In order to allow different handling for next-hop tree dumper and for
bucket dumper, parameterize the next-hop tree walker with a callback. Add
rtm_dump_nexthop_cb() with just the bits relevant for next-hop tree
dumping.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

e948217d

nexthop: Extract a helper for walking the next-hop tree · cbee1807

由 Petr Machata 提交于 1月 28, 2021

Extract from rtm_dump_nexthop() a helper to walk the next hop tree. A
separate function for this will be reusable from the bucket dumper.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

cbee1807

nexthop: Strongly-type context of rtm_dump_nexthop() · a6fbbaa6

由 Petr Machata 提交于 1月 28, 2021

The dump operations need to keep state from one invocation to another. A
scratch area is dedicated for this purpose in the passed-in argument, cb,
namely via two aliased arrays, struct netlink_callback.args and .ctx.

Dumping of buckets will end up having to iterate over next hops as well,
and it would be nice to be able to reuse the iteration logic with the NH
dumper. The fact that the logic currently relies on fixed index to the
.args array, and the indices would have to be coordinated between the two
dumpers, makes this somewhat awkward.

To make the access patters clearer, introduce a helper struct with a NH
index, and instead of using the .args array directly, use it through this
structure.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

a6fbbaa6

nexthop: Extract a common helper for parsing dump attributes · b9ebea12

由 Petr Machata 提交于 1月 28, 2021

Requests to dump nexthops have many attributes in common with those that
requests to dump buckets of resilient NH groups will have. However, they
have different policies. To allow reuse of this code, extract a
policy-agnostic wrapper out of nh_valid_dump_req(), and convert this
function into a thin wrapper around it.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b9ebea12

nexthop: Extract dump filtering parameters into a single structure · 56450ec6

由 Petr Machata 提交于 1月 28, 2021

Requests to dump nexthops have many attributes in common with those that
requests to dump buckets of resilient NH groups will have. In order to make
reuse of this code simpler, convert the code to use a single structure with
filtering configuration instead of passing around the parameters one by
one.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

56450ec6

nexthop: Dispatch notifier init()/fini() by group type · da230501

由 Petr Machata 提交于 1月 28, 2021

After there are several next-hop group types, initialization and
finalization of notifier type needs to reflect the actual type. Transform
nh_notifier_grp_info_init() and _fini() to make extending them easier.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

da230501

nexthop: Use enum to encode notification type · 09ad6bec

由 Ido Schimmel 提交于 1月 28, 2021

Currently there are only two types of in-kernel nexthop notification.
The two are distinguished by the 'is_grp' boolean field in 'struct
nh_notifier_info'.

As more notification types are introduced for more next-hop group types, a
boolean is not an easily extensible interface. Instead, convert it to an
enum.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

09ad6bec

nexthop: Assert the invariant that a NH group is of only one type · 720ccd9a

由 Petr Machata 提交于 1月 28, 2021

Most of the code that deals with nexthop groups relies on the fact that the
group is of exactly one well-known type. Currently there is only one type,
"mpath", but as more next-hop group types come, it becomes desirable to
have a central place where the setting is validated. Introduce such place
into nexthop_create_group(), such that the check is done before the code
that relies on that invariant is invoked.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

720ccd9a

nexthop: Introduce to struct nh_grp_entry a per-type union · b9bae61b

由 Petr Machata 提交于 1月 28, 2021

The values that a next-hop group needs to keep track of depend on the group
type. Introduce a union to separate fields specific to the mpath groups
from fields specific to other group types.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b9bae61b

nexthop: Dispatch nexthop_select_path() by group type · 79bc55e3

由 Petr Machata 提交于 1月 28, 2021

The logic for selecting path depends on the next-hop group type. Adapt the
nexthop_select_path() to dispatch according to the group type.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

79bc55e3

nexthop: Rename nexthop_free_mpath · 5d1f0f09

由 David Ahern 提交于 1月 28, 2021

nexthop_free_mpath really should be nexthop_free_group. Rename it.
Signed-off-by: NDavid Ahern <dsahern@kernel.org>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

5d1f0f09

21 1月, 2021 3 次提交

nexthop: Specialize rtm_nh_policy · 643d0878

由 Petr Machata 提交于 1月 20, 2021

This policy is currently only used for creation of new next hops and new
next hop groups. Rename it accordingly and remove the two attributes that
are not valid in that context: NHA_GROUPS and NHA_MASTER.

For consistency with other policies, do not mention policy array size in
the declarator, and replace NHA_MAX for ARRAY_SIZE as appropriate.

Note that with this commit, NHA_MAX and __NHA_MAX are not used anymore.
Leave them in purely as a user API.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

643d0878

nexthop: Use a dedicated policy for nh_valid_dump_req() · 44551bff

由 Petr Machata 提交于 1月 20, 2021

This function uses the global nexthop policy, but only accepts four
particular attributes. Create a new policy that only includes the four
supported attributes, and use it. Convert the loop to a series of ifs.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

44551bff

nexthop: Use a dedicated policy for nh_valid_get_del_req() · 60f5ad5e

由 Petr Machata 提交于 1月 20, 2021

This function uses the global nexthop policy only to then bounce all
arguments except for NHA_ID. Instead, just create a new policy that
only includes the one allowed attribute.
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

60f5ad5e

08 1月, 2021 3 次提交

nexthop: Bounce NHA_GATEWAY in FDB nexthop groups · b19218b2

由 Petr Machata 提交于 1月 07, 2021

The function nh_check_attr_group() is called to validate nexthop groups.
The intention of that code seems to have been to bounce all attributes
above NHA_GROUP_TYPE except for NHA_FDB. However instead it bounces all
these attributes except when NHA_FDB attribute is present--then it accepts
them.

NHA_FDB validation that takes place before, in rtm_to_nh_config(), already
bounces NHA_OIF, NHA_BLACKHOLE, NHA_ENCAP and NHA_ENCAP_TYPE. Yet further
back, NHA_GROUPS and NHA_MASTER are bounced unconditionally.

But that still leaves NHA_GATEWAY as an attribute that would be accepted in
FDB nexthop groups (with no meaning), so long as it keeps the address
family as unspecified:

 # ip nexthop add id 1 fdb via 127.0.0.1
 # ip nexthop add id 10 fdb via default group 1

The nexthop code is still relatively new and likely not used very broadly,
and the FDB bits are newer still. Even though there is a reproducer out
there, it relies on an improbable gateway arguments "via default", "via
all" or "via any". Given all this, I believe it is OK to reformulate the
condition to do the right thing and bounce NHA_GATEWAY.

Fixes: 38428d68 ("nexthop: support for fdb ecmp nexthops")
Signed-off-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b19218b2

nexthop: Unlink nexthop group entry in error path · 7b01e53e

由 Ido Schimmel 提交于 1月 07, 2021

In case of error, remove the nexthop group entry from the list to which
it was previously added.

Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

7b01e53e

nexthop: Fix off-by-one error in error path · 07e61a97

由 Ido Schimmel 提交于 1月 07, 2021

A reference was not taken for the current nexthop entry, so do not try
to put it in the error path.

Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

07e61a97

07 11月, 2020 6 次提交

nexthop: Replay nexthops when registering a notifier · 975ff7f3

由 Ido Schimmel 提交于 11月 04, 2020

When registering a new notifier to the nexthop notification chain,
replay all the existing nexthops to the new notifier so that it will
have a complete picture of the available nexthops.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

975ff7f3

nexthop: Pass extack to register_nexthop_notifier() · ce7e9c8a

由 Ido Schimmel 提交于 11月 04, 2020

This will be used by the next patch which extends the function to replay
all the existing nexthops to the notifier block being registered.

Device drivers will be able to pass extack to the function since it is
passed to them upon reload from devlink.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

ce7e9c8a

nexthop: Emit a notification when a nexthop group is reduced · 833a1065

由 Ido Schimmel 提交于 11月 04, 2020

When a single nexthop is deleted, the configuration of all the groups
using the nexthop is effectively modified. In this case, emit a
notification in the nexthop notification chain for each modified group
so that listeners would not need to keep track of which nexthops are
member in which groups.

In the rare cases where the notification fails, emit an error to the
kernel log. This is done by allocating extack on the stack and printing
the error logged by the listener that rejected the notification.

Changes since RFC:
* Allocate extack on the stack
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

833a1065

nexthop: Emit a notification when a nexthop group is modified · f17bc33d

由 Ido Schimmel 提交于 11月 04, 2020

When a single nexthop is replaced, the configuration of all the groups
using the nexthop is effectively modified. In this case, emit a
notification in the nexthop notification chain for each modified group
so that listeners would not need to keep track of which nexthops are
member in which groups.

The notification can only be emitted after the new configuration (i.e.,
'struct nh_info') is pointed at by the old shell (i.e., 'struct
nexthop'). Before that the configuration of the nexthop groups is still
the same as before the replacement.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

f17bc33d

nexthop: Emit a notification when a single nexthop is replaced · 8c09c9f9

由 Ido Schimmel 提交于 11月 04, 2020

The notification is emitted after all the validation checks were
performed, but before the new configuration (i.e., 'struct nh_info') is
pointed at by the old shell (i.e., 'struct nexthop'). This prevents the
need to perform rollback in case the notification is vetoed.

The next patch will also emit a replace notification for all the nexthop
groups in which the nexthop is used.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

8c09c9f9

nexthop: Emit a notification when a nexthop group is replaced · d144cc5f

由 Ido Schimmel 提交于 11月 04, 2020

Emit a notification in the nexthop notification chain when an existing
nexthop group is replaced.

The notification is emitted after all the validation checks were
performed, but before the new configuration (i.e., 'struct nh_grp') is
pointed at by the old shell (i.e., 'struct nexthop'). This prevents the
need to perform rollback in case the notification is vetoed.
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

d144cc5f

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功