1. 11 Mar, 2021 1 commit
  2. 09 Feb, 2021 2 commits
    • IPv6: Extend 'fib_notify_on_flag_change' sysctl · 6fad361a
      Committed by Amit Cohen
      Add the value '2' to 'fib_notify_on_flag_change' to allow sending
      notifications only for failed route installation.
      
      A separate value is added for such notifications because there are fewer
      of them, so they do not impact performance, and some users will find them
      more important.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fad361a
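The three sysctl values described in commit 6fad361a can be sketched as a small decision function. This is a simplified model, not the kernel code: `should_notify` is a hypothetical helper, the flag constants are believed to match include/uapi/linux/rtnetlink.h but should be checked against your headers, and value 2 is modelled as "notify only when the offload-failed flag changes".

```python
# Route flag values (assumed to match include/uapi/linux/rtnetlink.h).
RTM_F_OFFLOAD = 0x4000
RTM_F_TRAP = 0x8000
RTM_F_OFFLOAD_FAILED = 0x20000000

HW_FLAGS = RTM_F_OFFLOAD | RTM_F_TRAP | RTM_F_OFFLOAD_FAILED


def should_notify(sysctl_value, old_flags, new_flags):
    """Hypothetical model of fib_notify_on_flag_change.

    0: never emit RTM_NEWROUTE on a hardware flag change
    1: emit on any hardware flag change
    2: emit only when the offload-failed flag changes
    """
    changed = (old_flags ^ new_flags) & HW_FLAGS
    if sysctl_value == 0:
        return False
    if sysctl_value == 1:
        return changed != 0
    if sysctl_value == 2:
        return bool(changed & RTM_F_OFFLOAD_FAILED)
    raise ValueError("unsupported sysctl value: %r" % (sysctl_value,))
```

With value 2, a route moving from "not offloaded" to "offloaded" stays silent, while a route whose offload fails produces a notification.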
    • IPv6: Add "offload failed" indication to routes · 0c5fcf9e
      Committed by Amit Cohen
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel, but not
      necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      To avoid such cases, a previous patch set added the ability to emit
      RTM_NEWROUTE notifications whenever the RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed. This behavior is controlled by a sysctl.
      
      With the above mentioned behavior, it is possible to know from user-space
      if the route was offloaded, but if the offload fails there is no indication
      to user-space. Following a failure, a routing daemon will wait indefinitely
      for a notification that will never come.
      
      This patch adds an "offload_failed" indication to IPv6 routes, so that
      users will have better visibility into the offload process.
      
      'struct fib6_info' is extended with a new field that indicates whether route
      offload failed. Note that the new field is added using an unused bit, so
      there is no need to increase the struct size.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c5fcf9e
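From the point of view of a routing daemon, the "offload failed" indication in commit 0c5fcf9e turns an indefinite wait into a three-way decision. The sketch below is an illustration only: `hw_state` and `may_advertise` are hypothetical names, and the flag values are assumed to match include/uapi/linux/rtnetlink.h.

```python
# Route flag values (assumed to match include/uapi/linux/rtnetlink.h).
RTM_F_OFFLOAD = 0x4000
RTM_F_TRAP = 0x8000
RTM_F_OFFLOAD_FAILED = 0x20000000


def hw_state(rtm_flags):
    """Classify a route from the flags carried in an RTM_NEWROUTE message."""
    if rtm_flags & RTM_F_OFFLOAD_FAILED:
        return "failed"        # hardware rejected the route; stop waiting
    if rtm_flags & (RTM_F_OFFLOAD | RTM_F_TRAP):
        return "in_hardware"   # route is programmed (forwarding or trapping)
    return "pending"           # kernel acknowledged it, hardware not yet


def may_advertise(rtm_flags):
    """A cautious daemon advertises only routes known to be in hardware."""
    return hw_state(rtm_flags) == "in_hardware"
```

Without the failure flag, the "failed" and "pending" cases would be indistinguishable to user space.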
  3. 05 Feb, 2021 1 commit
  4. 04 Feb, 2021 2 commits
  5. 03 Feb, 2021 1 commit
    • net: ipv6: Emit notification when fib hardware flags are changed · 907eea48
      Committed by Amit Cohen
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel,
      but not necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead
      to a routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      It is also possible for a route already installed in hardware to change
      its action and therefore its flags. For example, a host route that is
      trapping packets can be "promoted" to perform decapsulation following
      the installation of an IPinIP/VXLAN tunnel.
      
      Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed. The aim is to provide an indication to user-space
      (e.g., routing daemons) about the state of the route in hardware.
      
      Introduce a sysctl that controls this behavior.
      
      Keep the default value at 0 (i.e., do not emit notifications) for several
      reasons:
      - Multiple RTM_NEWROUTE notifications per route might confuse existing
        routing daemons.
      - Convergence reasons in routing daemons.
      - The extra notifications will negatively impact the insertion rate.
      - Not all users are interested in these notifications.
      
      Move fib6_info_hw_flags_set() to a C file because it is no longer a short
      function.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      907eea48
  6. 27 Jan, 2021 1 commit
    • net: allow user to set metric on default route learned via Router Advertisement · 6b2e04bc
      Committed by Praveen Chaudhary
      For IPv4, the default route is learned via DHCPv4 and the user can change its
      metric in /etc/network/interfaces. For IPv6, however, the default route can
      be learned via RA, for which a fixed metric value of 1024 is currently used.
      
      Ideally, the user should be able to configure the metric of the IPv6 default
      route just as for IPv4. This patch adds a sysctl for that purpose.
      
      Logs:
      
      For IPv4:
      
      Config in /etc/network/interfaces:
      auto eth0
      iface eth0 inet dhcp
          metric 4261413864
      
      IPv4 Kernel Route Table:
      $ ip route list
      default via 172.21.47.1 dev eth0 metric 4261413864
      
      FRR Table, if a static route is configured:
      [In real scenario, it is useful to prefer BGP learned default route over DHCPv4 default route.]
      Codes: K - kernel route, C - connected, S - static, R - RIP,
             O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
             T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
             > - selected route, * - FIB route
      
      S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03
      K   0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m
      
      i.e. User can prefer Default Router learned via Routing Protocol in IPv4.
      Similar behavior is not possible for IPv6, without this fix.
      
      After fix [for IPv6]:
      sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489705
      
      IP monitor: [When IPv6 RA is received]
      default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705  pref high
      
      Kernel IPv6 routing table
      $ ip -6 route list
      default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 21sec hoplimit 64 pref high
      
      FRR Table, if a static route is configured:
      [In real scenario, it is useful to prefer BGP learned default route over IPv6 RA default route.]
      Codes: K - kernel route, C - connected, S - static, R - RIPng,
             O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
             v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
             > - selected route, * - FIB route
      
      S>* ::/0 [20/0] is directly connected, eth0, 00:00:06
      K   ::/0 [119/1001] via fe80::xx16:xxxx:feb3:ce8e, eth0, 6d07h43m
      
      If the metric is changed later, the effect is seen only when the next IPv6
      RA is received, because the default route must be fully controlled by RA
      messages. Below, the metric is changed from 1996489705 to 1996489704.
      
      $ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704
      net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704
      
      IP monitor:
      [On next IPv6 RA msg, Kernel deletes prev route and installs new route with updated metric]
      
      Deleted default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 3sec hoplimit 64 pref high
      default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489704 pref high
      Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
      Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20210125214430.24079-1-pchaudhary@linkedin.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6b2e04bc
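The "takes effect on the next RA" behaviour logged in commit 6b2e04bc can be modelled in a few lines. This is a toy model of the observable route events only, not the kernel implementation; `on_router_advertisement` is a hypothetical function, and `installed_metric` is `None` when no RA default route exists yet.

```python
def on_router_advertisement(installed_metric, configured_metric):
    """Return the route events a monitor would see when an RA is processed.

    installed_metric:  metric of the current RA default route, or None
    configured_metric: current value of the ra_defrtr_metric sysctl
    """
    events = []
    if installed_metric is not None and installed_metric != configured_metric:
        # The sysctl changed since the route was installed: drop the old route.
        events.append(("delete", installed_metric))
        installed_metric = None
    if installed_metric is None:
        # Install the default route with the metric configured right now.
        events.append(("add", configured_metric))
    return events
```

Changing the sysctl alone produces no event in this model; the delete/add pair appears only once an RA arrives, which matches the `ip monitor` output in the commit message.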
  7. 20 Nov, 2020 1 commit
  8. 07 Nov, 2020 1 commit
  9. 10 Oct, 2020 1 commit
    • net: ipv6: Discard next-hop MTU less than minimum link MTU · 4a65dff8
      Committed by Georg Kohmann
      When an ICMPV6_PKT_TOOBIG message reports a next-hop MTU that is less than
      the IPv6 minimum link MTU, the estimated path MTU is reduced to the minimum
      link MTU. This behaviour breaks TAHI IPv6 Core Conformance Test v6LC4.1.6:
      Packet Too Big Less than IPv6 MTU.
      
      Referring to RFC 8201 section 4: "If a node receives a Packet Too Big
      message reporting a next-hop MTU that is less than the IPv6 minimum link
      MTU, it must discard it. A node must not reduce its estimate of the Path
      MTU below the IPv6 minimum link MTU on receipt of a Packet Too Big
      message."
      
      Drop the path MTU update if reported MTU is less than the minimum link MTU.
      Signed-off-by: Georg Kohmann <geokohma@cisco.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4a65dff8
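The rule from RFC 8201 that commit 4a65dff8 enforces can be written as a small function. This is a sketch of the rule, not the kernel code path: `update_pmtu` is a hypothetical name, and 1280 is the IPv6 minimum link MTU.

```python
IPV6_MIN_MTU = 1280  # IPv6 minimum link MTU in bytes


def update_pmtu(current_pmtu, reported_mtu):
    """Return the new path MTU estimate after a Packet Too Big message."""
    if reported_mtu < IPV6_MIN_MTU:
        # Below the minimum link MTU: discard the report entirely
        # instead of clamping the estimate down to 1280.
        return current_pmtu
    # A Packet Too Big message may only lower the estimate, never raise it.
    return min(current_pmtu, reported_mtu)
```

Before the fix, a report of 1000 bytes would have pulled a 1500-byte estimate down to 1280; with the rule above the estimate stays at 1500.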
  10. 22 Sep, 2020 1 commit
  11. 12 Sep, 2020 1 commit
  12. 29 Jul, 2020 1 commit
  13. 26 Jul, 2020 1 commit
  14. 22 Jul, 2020 1 commit
  15. 08 Jul, 2020 1 commit
    • ipv6: Fix use of anycast address with loopback · aea23c32
      Committed by David Ahern
      Thomas reported a regression with IPv6 and anycast using the following
      reproducer:
      
          echo 1 >  /proc/sys/net/ipv6/conf/all/forwarding
          ip -6 a add fc12::1/16 dev lo
          sleep 2
          echo "pinging lo"
          ping6 -c 2 fc12::
      
      The conversion of addrconf_f6i_alloc to use ip6_route_info_create missed
      the use of fib6_is_reject which checks addresses added to the loopback
      interface and sets the REJECT flag as needed. Update fib6_is_reject for
      loopback checks to handle RTF_ANYCAST addresses.
      
      Fixes: c7a1ce39 ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create")
      Reported-by: thomas.gambier@nexedi.com
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aea23c32
  16. 07 Jul, 2020 1 commit
    • ipv6: fib6_select_path can not use out path for nexthop objects · 34fe5a1c
      Committed by David Ahern
      Brian reported a crash in IPv6 code when using rpfilter with a setup
      running FRR and external nexthop objects. The root cause of the crash
      is fib6_select_path setting fib6_nh in the result to NULL because of
      an improper check for nexthop objects.
      
      More specifically, rpfilter invokes ip6_route_lookup with flowi6_oif
      set causing fib6_select_path to be called with have_oif_match set.
      fib6_select_path has an early check on have_oif_match and jumps to the
      out label, which presumes a builtin fib6_nh. This path is invalid for
      nexthop objects: for external nexthops, fib6_select_path needs to just
      return if the fib6_nh has already been set in the result; otherwise it
      returns after the call to nexthop_path_fib6_result. Update the check
      on have_oif_match to not bail on external nexthops.
      
      Update selftests for this problem.
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Reported-by: Brian Rak <brak@choopa.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34fe5a1c
  17. 24 Jun, 2020 1 commit
    • ipv6: fib6: avoid indirect calls from fib6_rule_lookup · 55cced4f
      Committed by Brian Vazquez
      It was reported that a considerable amount of cycles were spent on the
      expensive indirect calls on fib6_rule_lookup. This patch introduces an
      inline helper called pol_route_func that uses the indirect_call_wrappers
      to avoid the indirect calls.
      
      This patch saves around 50ns per call.
      
      Performance was measured on the receiver by checking the amount of
      syncookies that server was able to generate under a synflood load.
      
      Traffic was generated using trafgen[1] which was pushing around 1Mpps on
      a single queue. The receiver was using only one rx queue, which helps to
      create a bottleneck and makes the experiment rx-bound.
      
      These are the syncookies generated over 10s from the different runs:
      
      Without the patch:
      TcpExtSyncookiesSent            3553749            0.0
      TcpExtSyncookiesSent            3550895            0.0
      TcpExtSyncookiesSent            3553845            0.0
      TcpExtSyncookiesSent            3541050            0.0
      TcpExtSyncookiesSent            3539921            0.0
      TcpExtSyncookiesSent            3557659            0.0
      TcpExtSyncookiesSent            3526812            0.0
      TcpExtSyncookiesSent            3536121            0.0
      TcpExtSyncookiesSent            3529963            0.0
      TcpExtSyncookiesSent            3536319            0.0
      
      With the patch:
      TcpExtSyncookiesSent            3611786            0.0
      TcpExtSyncookiesSent            3596682            0.0
      TcpExtSyncookiesSent            3606878            0.0
      TcpExtSyncookiesSent            3599564            0.0
      TcpExtSyncookiesSent            3601304            0.0
      TcpExtSyncookiesSent            3609249            0.0
      TcpExtSyncookiesSent            3617437            0.0
      TcpExtSyncookiesSent            3608765            0.0
      TcpExtSyncookiesSent            3620205            0.0
      TcpExtSyncookiesSent            3601895            0.0
      
      Without the patch the average is 354263 pkt/s or 2822 ns/pkt and with
      the patch the average is 360738 pkt/s or 2772 ns/pkt which gives an
      estimate of 50 ns per packet.
      
      [1] http://netsniff-ng.org/
      
      Changelog since v1:
       - Change ordering in the ICW (Paolo Abeni)
      
      Cc: Luigi Rizzo <lrizzo@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55cced4f
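The per-packet estimate in commit 55cced4f can be reproduced from the twenty counter samples listed in the message. The sketch below only redoes that arithmetic; `ns_per_packet` is a hypothetical helper, and each sample is taken to cover the 10 s window stated in the commit.

```python
# TcpExtSyncookiesSent samples copied from the commit message (10 s each).
without_patch = [3553749, 3550895, 3553845, 3541050, 3539921,
                 3557659, 3526812, 3536121, 3529963, 3536319]
with_patch = [3611786, 3596682, 3606878, 3599564, 3601304,
              3609249, 3617437, 3608765, 3620205, 3601895]


def ns_per_packet(samples, seconds=10):
    """Return (packets per second, nanoseconds per packet) for the samples."""
    pkts_per_sec = sum(samples) / len(samples) / seconds
    return pkts_per_sec, 1e9 / pkts_per_sec


rate_without, ns_without = ns_per_packet(without_patch)
rate_with, ns_with = ns_per_packet(with_patch)
saving_ns = ns_without - ns_with

print(f"without: {rate_without:.0f} pkt/s, {ns_without:.0f} ns/pkt")
print(f"with:    {rate_with:.0f} pkt/s, {ns_with:.0f} ns/pkt")
print(f"saving:  {saving_ns:.1f} ns/pkt")
```

The averages agree with the figures quoted in the message up to rounding, and the difference comes out at roughly 50 ns per packet.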
  18. 23 May, 2020 1 commit
  19. 19 May, 2020 1 commit
  20. 14 May, 2020 3 commits
  21. 10 May, 2020 1 commit
  22. 09 May, 2020 2 commits
  23. 08 May, 2020 1 commit
    • Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu" · 09454fd0
      Committed by Maciej Żenczykowski
      This reverts commit 19bda36c:
      
      | ipv6: add mtu lock check in __ip6_rt_update_pmtu
      |
      | Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu.
      | It leaded to that mtu lock doesn't really work when receiving the pkt
      | of ICMPV6_PKT_TOOBIG.
      |
      | This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4
      | did in __ip_rt_update_pmtu.
      
      The above reasoning is incorrect.  IPv6 *requires* icmp based pmtu to work.
      There's already a comment to this effect elsewhere in the kernel:
      
        $ git grep -p -B1 -A3 'RTAX_MTU lock'
        net/ipv6/route.c=4813=
      
        static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg)
        ...
          /* In IPv6 pmtu discovery is not optional,
             so that RTAX_MTU lock cannot disable it.
             We still use this lock to block changes
             caused by addrconf/ndisc.
          */
      
      This reverts to the pre-4.9 behaviour.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Fixes: 19bda36c ("ipv6: add mtu lock check in __ip6_rt_update_pmtu")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09454fd0
  24. 02 May, 2020 1 commit
    • ipv6: Use global sernum for dst validation with nexthop objects · 8f34e53b
      Committed by David Ahern
      Nik reported a bug with pcpu dst cache when nexthop objects are
      used illustrated by the following:
          $ ip netns add foo
          $ ip -netns foo li set lo up
          $ ip -netns foo addr add 2001:db8:11::1/128 dev lo
          $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
          $ ip li add veth1 type veth peer name veth2
          $ ip li set veth1 up
          $ ip addr add 2001:db8:10::1/64 dev veth1
          $ ip li set dev veth2 netns foo
          $ ip -netns foo li set veth2 up
          $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
          $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
          $ ip -6 route add 2001:db8:11::1/128 nhid 100
      
          Create a pcpu entry on cpu 0:
          $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
      
          Re-add the route entry:
          $ ip -6 ro del 2001:db8:11::1
          $ ip -6 route add 2001:db8:11::1/128 nhid 100
      
          Route get on cpu 0 returns the stale pcpu:
          $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
          RTNETLINK answers: Network is unreachable
      
          While cpu 1 works:
          $ taskset -a -c 1 ip -6 route get 2001:db8:11::1
          2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium
      
      Conversion of FIB entries to work with external nexthop objects
      missed an important difference between IPv4 and IPv6 - how dst
      entries are invalidated when the FIB changes. IPv4 has a per-network
      namespace generation id (rt_genid) that is bumped on changes to the FIB.
      Checking if a dst_entry is still valid means comparing rt_genid in the
      rtable to the current value of rt_genid for the namespace.
      
      IPv6 also has a per network namespace counter, fib6_sernum, but the
      count is saved per fib6_node. With the per-node counter only dst_entries
      based on fib entries under the node are invalidated when changes are
      made to the routes - limiting the scope of invalidations. IPv6 uses a
      reference in the rt6_info, 'from', to track the corresponding fib entry
      used to create the dst_entry. When validating a dst_entry, the 'from'
      is used to backtrack to the fib6_node and check the sernum of it to the
      cookie passed to the dst_check operation.
      
      With the inline format (nexthop definition inline with the fib6_info),
      dst_entries cached in the fib6_nh have a 1:1 correlation between fib
      entries, nexthop data and dst_entries. With external nexthops, IPv6
      looks more like IPv4 which means multiple fib entries across disparate
      fib6_nodes can all reference the same fib6_nh. That means validation
      of dst_entries based on external nexthops needs to use the IPv4 format
      - the per-network namespace counter.
      
      Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
      rt6_get_cookie to return sernum if it is set and update dst_check for
      IPv6 to look for sernum being set and base the check on it if so. Finally,
      rt6_get_pcpu_route needs to validate the cached entry before returning
      a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
      __mkroute_output for IPv4).
      
      This problem only affects routes using the new, external nexthops.
      
      Thanks to the kbuild test robot for catching the IS_ENABLED needed
      around rt_genid_ipv6 before I sent this out.
      
      Fixes: 5b98324e ("ipv6: Allow routes to use nexthop objects")
      Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f34e53b
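The validation idea in commit 8f34e53b, checking a cached per-CPU dst against a namespace-wide counter before returning it, can be illustrated with a toy model. Nothing here mirrors kernel data structures: `Netns`, `SharedNexthop` and `get_pcpu_route` are hypothetical names, and the cache stores only the counter value recorded when the entry was created.

```python
class Netns:
    """Toy network namespace with one counter bumped on every FIB change."""

    def __init__(self):
        self.sernum = 1

    def fib_changed(self):
        self.sernum += 1


class SharedNexthop:
    """Toy external nexthop: its per-CPU cache is shared by all routes using it."""

    def __init__(self):
        self.pcpu = {}  # cpu id -> sernum recorded when the entry was cached


def get_pcpu_route(ns, nh, cpu):
    """Return 'cached' if the per-CPU entry is still valid, else rebuild it."""
    recorded = nh.pcpu.get(cpu)
    if recorded is not None and recorded == ns.sernum:
        return "cached"
    # Missing or stale entry: create a fresh one stamped with the current counter.
    nh.pcpu[cpu] = ns.sernum
    return "fresh"
```

In the reproducer from the commit message, deleting and re-adding the route corresponds to `fib_changed()`; the next lookup on CPU 0 then rebuilds the entry instead of returning the stale one.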
  25. 29 Apr, 2020 2 commits
  26. 27 Apr, 2020 1 commit
  27. 30 Mar, 2020 1 commit
  28. 24 Mar, 2020 1 commit
  29. 13 Mar, 2020 1 commit
  30. 17 Feb, 2020 1 commit
    • ipv6: Fix nlmsg_flags when splitting a multipath route · afecdb37
      Committed by Benjamin Poirier
      When splitting an RTA_MULTIPATH request into multiple routes and adding the
      second and later components, we must not simply remove NLM_F_REPLACE but
      instead replace it by NLM_F_CREATE. Otherwise, it may look like the netlink
      message was malformed.
      
      For example,
      	ip route add 2001:db8::1/128 dev dummy0
      	ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 \
      		nexthop via fe80::30:2 dev dummy0
      results in the following warnings:
      [ 1035.057019] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE
      [ 1035.057517] IPv6: NLM_F_CREATE should be set when creating new route
      
      This patch makes the nlmsg sequence look equivalent for __ip6_ins_rt() to
      what it would get if the multipath route had been added in multiple netlink
      operations:
      	ip route add 2001:db8::1/128 dev dummy0
      	ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0
      	ip route append 2001:db8::1/128 nexthop via fe80::30:2 dev dummy0
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      afecdb37
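The flag handling described in commit afecdb37 amounts to a small bit manipulation applied after the first nexthop of a multipath request. The sketch below shows only the step named in the message (swap NLM_F_REPLACE for NLM_F_CREATE); `flags_for_nexthops` is a hypothetical helper, the constants are the standard netlink values, and any other flag handling the kernel performs is left out.

```python
# Netlink "new" request modifier flags (include/uapi/linux/netlink.h).
NLM_F_REPLACE = 0x100
NLM_F_CREATE = 0x400


def flags_for_nexthops(request_flags, nexthop_count):
    """Return the nlmsg_flags used for each route split out of RTA_MULTIPATH."""
    flags = request_flags
    per_route = []
    for _ in range(nexthop_count):
        per_route.append(flags)
        # Second and later nexthops must be created next to the first one,
        # so NLM_F_REPLACE is replaced by NLM_F_CREATE rather than just cleared.
        flags = (flags & ~NLM_F_REPLACE) | NLM_F_CREATE
    return per_route
```

For an `ip route change` with two nexthops, which sends NLM_F_REPLACE alone, the first route keeps NLM_F_REPLACE and the second carries NLM_F_CREATE, so neither of the warnings quoted in the message is triggered.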
  31. 15 Jan, 2020 1 commit
  32. 25 Dec, 2019 3 commits