1. 10 Feb, 2021 20 commits
  2. 09 Feb, 2021 20 commits
    • D
      Merge branch 'route-offload-failure' · 5ea3c72c
      David S. Miller committed
      net: Add support for route offload failure notifications
      
      Ido Schimmel says:
      
      ====================
      This is a complementary series to the one merged in commit 389cb1ec
      ("Merge branch 'add-notifications-when-route-hardware-flags-change'").
      
      The previous series added RTM_NEWROUTE notifications to user space
      whenever a route was successfully installed in hardware or when its
      state in hardware changed. This allows routing daemons to delay
      advertisement of routes until they are installed in hardware.
      
      However, if route installation failed, a routing daemon will wait
      indefinitely for a notification that will never come. The aim of this
      series is to provide a failure notification via a new flag
      (RTM_F_OFFLOAD_FAILED) in the RTM_NEWROUTE message. Upon such a
      notification a routing daemon may decide to withdraw the route from the
      FIB.
      
      Series overview:
      
      Patch #1 adds the new RTM_F_OFFLOAD_FAILED flag
      
      Patches #2-#3 and #4-#5 add failure notifications to IPv4 and IPv6,
      respectively
      
      Patches #6-#8 teach netdevsim to fail route installation via a new knob
      in debugfs
      
      Patch #9 extends mlxsw to mark routes with the new flag
      
      Patch #10 adds test cases for the new notification over netdevsim
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5ea3c72c
    • A
      selftests: netdevsim: Test route offload failure notifications · 9ee53e37
      Amit Cohen committed
      Add cases to verify that when debugfs variable "fail_route_offload" is
      set, notification with "rt_offload_failed" flag is received.
      
      Extend the existing cases to verify that when sysctl
      "fib_notify_on_flag_change" is set to 2, the kernel emits notifications
      only for failed route installation.
      
      $ ./fib_notifications.sh
      TEST: IPv4 route addition				[ OK ]
      TEST: IPv4 route deletion				[ OK ]
      TEST: IPv4 route replacement				[ OK ]
      TEST: IPv4 route offload failed				[ OK ]
      TEST: IPv6 route addition				[ OK ]
      TEST: IPv6 route deletion				[ OK ]
      TEST: IPv6 route replacement				[ OK ]
      TEST: IPv6 route offload failed				[ OK ]
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9ee53e37
    • A
      mlxsw: spectrum_router: Set offload_failed flag · a4cb1c02
      Amit Cohen committed
      When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and route insertion
      fails, FIB abort is triggered.
      
      After aborting, set the appropriate hardware flag to make the kernel emit
      RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4cb1c02
    • A
      netdevsim: fib: Add debugfs to debug route offload failure · 134c7532
      Amit Cohen committed
      Add "fail_route_offload" flag to disallow offloading routes.
      It is needed to test "offload failed" notifications.
      
      Create the flag as part of nsim_fib_create() under fib directory and set
      it to false by default.
      
      When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and
      "fail_route_offload" value is true, set the appropriate hardware flag to
      make the kernel emit RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED
      flag.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      134c7532
    • I
      netdevsim: dev: Initialize FIB module after debugfs · f57ab5b7
      Ido Schimmel committed
      Initialize the dummy FIB offload module after debugfs, so that the FIB
      module could create its own directory there.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f57ab5b7
    • A
      netdevsim: fib: Do not warn if route was not found for several events · 484a4dfb
      Amit Cohen committed
      The next patch will add the ability to fail route offload controlled by
      debugfs variable called "fail_route_offload".
      
      If we vetoed the addition, we might get a delete or append notification
      for a route we do not have. Therefore, do not warn if route was not found.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      484a4dfb
    • A
      IPv6: Extend 'fib_notify_on_flag_change' sysctl · 6fad361a
      Amit Cohen committed
      Add the value '2' to 'fib_notify_on_flag_change' to allow sending
      notifications only for failed route installation.
      
      A separate value is added for such notifications because there are fewer
      of them, so they do not impact performance, and some users will find them
      more important.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fad361a
    • A
      IPv6: Add "offload failed" indication to routes · 0c5fcf9e
      Amit Cohen committed
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel, but not
      necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      To avoid such cases, the previous patch set added the ability to emit
      RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed; this behavior is controlled by a sysctl.
      
      With the above mentioned behavior, it is possible to know from user-space
      if the route was offloaded, but if the offload fails there is no indication
      to user-space. Following a failure, a routing daemon will wait indefinitely
      for a notification that will never come.
      
      This patch adds an "offload_failed" indication to IPv6 routes, so that
      users will have better visibility into the offload process.
      
      'struct fib6_info' is extended with new field that indicates if route
      offload failed. Note that the new field is added using unused bit and
      therefore there is no need to increase struct size.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c5fcf9e
    • A
      IPv4: Extend 'fib_notify_on_flag_change' sysctl · 648106c3
      Amit Cohen committed
      Add the value '2' to 'fib_notify_on_flag_change' to allow sending
      notifications only for failed route installation.
      
      A separate value is added for such notifications because there are fewer
      of them, so they do not impact performance, and some users will find them
      more important.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      648106c3
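The three sysctl values can be read as a small decision rule. The sketch below is an illustrative model only, not kernel code: the constant names and `should_notify()` are hypothetical, and the meanings of 0 and 1 (no notifications / notify on any hardware flag change) are assumed from the earlier series rather than stated in this commit message.

```python
# Illustrative model of 'fib_notify_on_flag_change'; names are hypothetical.
NOTIFY_OFF = 0          # assumed: never emit flag-change notifications
NOTIFY_ALL = 1          # assumed: emit on any hardware flag change
NOTIFY_FAILED_ONLY = 2  # new value: emit only when "offload failed" changes


def should_notify(sysctl_value, offload_failed_changed, other_flag_changed):
    """Return True if an RTM_NEWROUTE notification would be emitted."""
    if sysctl_value == NOTIFY_ALL:
        return offload_failed_changed or other_flag_changed
    if sysctl_value == NOTIFY_FAILED_ONLY:
        return offload_failed_changed
    return False
```

With the value set to 2, a route that merely becomes offloaded produces no notification, while a failed installation still does.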
    • A
      IPv4: Add "offload failed" indication to routes · 36c5100e
      Amit Cohen committed
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel, but not
      necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      To avoid such cases, the previous patch set added the ability to emit
      RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed; this behavior is controlled by a sysctl.
      
      With the above mentioned behavior, it is possible to know from user-space
      if the route was offloaded, but if the offload fails there is no indication
      to user-space. Following a failure, a routing daemon will wait indefinitely
      for a notification that will never come.
      
      This patch adds an "offload_failed" indication to IPv4 routes, so that
      users will have better visibility into the offload process.
      
      'struct fib_alias' and 'struct fib_rt_info' are extended with a new field
      that indicates if route offload failed. Note that the new field is added
      using an unused bit, so there is no need to increase the structs' size.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36c5100e
    • A
      rtnetlink: Add RTM_F_OFFLOAD_FAILED flag · 49fc2513
      Amit Cohen committed
      The flag indicates to user space that route offload failed.
      
      Previous patch set added the ability to emit RTM_NEWROUTE notifications
      whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, but if the offload
      fails there is no indication to user-space.
      
      The flag will be used in subsequent patches by netdevsim and mlxsw to
      indicate to user space that route offload failed, so that users will
      have better visibility into the offload process.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      49fc2513
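As a rough illustration of how a routing daemon might interpret the flags of a received RTM_NEWROUTE message, here is a minimal sketch. The numeric values are what I recall from include/uapi/linux/rtnetlink.h and should be verified against your kernel headers; `describe_offload_state()` and its return labels are hypothetical and not part of any kernel or library API.

```python
# Flag values assumed from include/uapi/linux/rtnetlink.h; verify locally.
RTM_F_OFFLOAD = 0x4000
RTM_F_TRAP = 0x8000
RTM_F_OFFLOAD_FAILED = 0x20000000


def describe_offload_state(rtm_flags):
    """Map the rtm_flags field of a route message to a coarse state label."""
    if rtm_flags & RTM_F_OFFLOAD_FAILED:
        return "offload_failed"  # daemon may decide to withdraw the route
    if rtm_flags & RTM_F_OFFLOAD:
        return "offloaded"       # route is installed in hardware
    if rtm_flags & RTM_F_TRAP:
        return "trap"            # matching packets are trapped to the CPU
    return "no_hw_flag"          # hypothetical label: no hardware state yet
```

A daemon that delays advertisement until "offloaded" could use "offload_failed" as the signal to stop waiting.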
    • D
      Merge tag 'mlx5-updates-2021-02-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 08cbabb7
      David S. Miller committed
      mlx5-updates-2021-02-04
      
      Vlad Buslov says:
      =================
      
      Implement support for VF tunneling
      
      Abstract
      
      Currently, mlx5 only supports configuration with tunnel endpoint IP address on
      uplink representor. Remove implicit and explicit assumptions of tunnel always
      being terminated on uplink and implement necessary infrastructure for
      configuring tunnels on VF representors and updating rules on such tunnels
      according to routing changes.
      
      SW TC model
      
      From TC perspective VF tunnel configuration requires two rules in both
      directions:
      
      TX rules
      
      1. Rule that redirects packets from UL to VF rep that has the tunnel
      endpoint IP address:
      
      $ tc -s filter show dev enp8s0f0 ingress
      filter protocol ip pref 4 flower chain 0
      filter protocol ip pref 4 flower chain 0 handle 0x1
        dst_mac 16:c9:a0:2d:69:2c
        src_mac 0c:42:a1:58:ab:e4
        eth_type ipv4
        ip_flags nofrag
        in_hw in_hw_count 1
              action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen
              index 3 ref 1 bind 1 installed 377 sec used 0 sec
              Action statistics:
              Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0)
              Sent software 0 bytes 0 pkt
              Sent hardware 114096 bytes 952 pkt
              backlog 0b 0p requeues 0
              cookie 878fa48d8c423fc08c3b6ca599b50a97
              no_percpu
              used_hw_stats delayed
      
      2. Rule that decapsulates the tunneled flow and redirects to destination VF
      representor:
      
      $ tc -s filter show dev vxlan_sys_4789 ingress
      filter protocol ip pref 4 flower chain 0
      filter protocol ip pref 4 flower chain 0 handle 0x1
        dst_mac ca:2e:a7:3f:f5:0f
        src_mac 0a:40:bd:30:89:99
        eth_type ipv4
        enc_dst_ip 7.7.7.5
        enc_src_ip 7.7.7.1
        enc_key_id 98
        enc_dst_port 4789
        enc_tos 0
        ip_flags nofrag
        in_hw in_hw_count 1
              action order 1: tunnel_key  unset pipe
               index 2 ref 1 bind 1 installed 434 sec used 434 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats delayed
      
              action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
              index 4 ref 1 bind 1 installed 434 sec used 0 sec
              Action statistics:
              Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
              Sent software 0 bytes 0 pkt
              Sent hardware 129936 bytes 1082 pkt
              backlog 0b 0p requeues 0
              cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
              no_percpu
              used_hw_stats delayed
      
      RX rules
      
      1. Rule that encapsulates the tunneled flow and redirects packets from
      source VF rep to tunnel device:
      
      $ tc -s filter show dev enp8s0f0_1 ingress
      filter protocol ip pref 4 flower chain 0
      filter protocol ip pref 4 flower chain 0 handle 0x1
        dst_mac 0a:40:bd:30:89:99
        src_mac ca:2e:a7:3f:f5:0f
        eth_type ipv4
        ip_tos 0/0x3
        ip_flags nofrag
        in_hw in_hw_count 1
              action order 1: tunnel_key  set
              src_ip 7.7.7.5
              dst_ip 7.7.7.1
              key_id 98
              dst_port 4789
              nocsum
              ttl 64 pipe
               index 1 ref 1 bind 1 installed 411 sec used 411 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              no_percpu
              used_hw_stats delayed
      
              action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen
              index 1 ref 1 bind 1 installed 411 sec used 0 sec
              Action statistics:
              Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0)
              Sent software 0 bytes 0 pkt
              Sent hardware 5615833 bytes 4028 pkt
              backlog 0b 0p requeues 0
              cookie bb406d45d343bf7ade9690ae80c7cba4
              no_percpu
              used_hw_stats delayed
      
      2. Rule that redirects from tunnel device to UL rep:
      
      $ tc -s filter show dev vxlan_sys_4789 ingress
      filter protocol ip pref 4 flower chain 0
      filter protocol ip pref 4 flower chain 0 handle 0x1
        dst_mac ca:2e:a7:3f:f5:0f
        src_mac 0a:40:bd:30:89:99
        eth_type ipv4
        enc_dst_ip 7.7.7.5
        enc_src_ip 7.7.7.1
        enc_key_id 98
        enc_dst_port 4789
        enc_tos 0
        ip_flags nofrag
        in_hw in_hw_count 1
              action order 1: tunnel_key  unset pipe
               index 2 ref 1 bind 1 installed 434 sec used 434 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats delayed
      
              action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
              index 4 ref 1 bind 1 installed 434 sec used 0 sec
              Action statistics:
              Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
              Sent software 0 bytes 0 pkt
              Sent hardware 129936 bytes 1082 pkt
              backlog 0b 0p requeues 0
              cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
              no_percpu
              used_hw_stats delayed
      
      HW offloads model
      
      For hardware offload the goal is to match packet on both rules without exposing
      it to software on tunnel endpoint VF. In order to achieve this for tx, TC
      implementation marks encap rules with tunnel endpoint on mlx5 VF of same eswitch
      with MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE flag and adds header modification
      rule to overwrite packet source port to the value of tunnel VF. Eswitch code is
      modified to recirculate such packets after source port value is changed, which
      allows second tx rules to match.
      
      For rx path indirect table infrastructure is used to allow fully processing VF
      tunnel traffic in hardware. To implement such pipeline driver needs to program
      the hardware after matching on UL rule to overwrite source vport from UL to
      tunnel VF and recirculate the packet to the root table to allow matching on the
      rule installed on tunnel VF. For this, indirect table matches all encapsulated
      traffic by tunnel parameters and all other IP traffic is sent to tunnel VF by
      the miss rule. Such configuration will cause packet to appear on VF representor
      instead of VF itself if packet has been matched by indirect table rule based on
      tunnel parameters but missed on second rule (after recirculation). Handle such
      case by marking packets processed by indirect table with special 0xFFF value in
      reg_c1 and extending slow table with additional flow group that matches on
      reg_c0 (source port value set by indirect tables) and reg_c1 (special 0xFFF
      mark). When creating offloads fdb tables, install one rule per VF vport to match
      on recirculated miss packets and redirect them to appropriate VF vport.
      
      Routing events
      
      In order to support routing changes and migration of tunnel device between
      different endpoint VFs, implement routing infrastructure and update it with FIB
      events. Routing entry table is introduced to mlx5 TC. Every rx and tx VF tunnel
      rule is attached to a routing entry, which is shared for rules of same tunnel.
      On FIB event the work is scheduled to delete/recreate all rules of affected
      tunnel.
      
      Note: only vxlan tunnel type is supported by this series.
      
      =================
      08cbabb7
    • H
      cxgb4: remove unused vpd_cap_addr · 4429c5fc
      Heiner Kallweit committed
      It is likely that this is a leftover from T3 driver heritage. cxgb4 uses
      the PCI core VPD access code that handles detection of VPD capabilities.
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4429c5fc
    • V
      net: bridge: use switchdev for port flags set through sysfs too · 8043c845
      Vladimir Oltean committed
      Looking through patchwork I don't see that there was any consensus to
      use switchdev notifiers only in case of netlink provided port flags but
      not sysfs (as a sort of deprecation, punishment or anything like that),
      so we should probably keep the user interface consistent in terms of
      functionality.
      
      http://patchwork.ozlabs.org/project/netdev/patch/20170605092043.3523-3-jiri@resnulli.us/
      http://patchwork.ozlabs.org/project/netdev/patch/20170608064428.4785-3-jiri@resnulli.us/
      
      Fixes: 3922285d ("net: bridge: Add support for offloading port attributes")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8043c845
    • P
      selftests: tc-testing: u32: Add tests covering sample option · 373e13bc
      Phil Sutter committed
      Kernel's key folding basically consists of shifting away least
      significant zero bits in mask and masking the resulting value with
      (divisor - 1). Test that u32's 'sample' option behaves identically.
      Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      373e13bc
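The key folding described in this commit can be sketched as follows. This is a simplified model, not the kernel's code: `fold_key()` is a hypothetical helper, byte-order conversion is ignored, the divisor is assumed to be a power of two, and applying the mask to the key before shifting is my reading of the u32 classifier rather than something the commit message states.

```python
def fold_key(key, mask, divisor):
    """Fold a masked key into a hash bucket index in [0, divisor)."""
    if mask == 0:
        return 0
    # Index of the lowest set bit in the mask, i.e. the number of
    # least significant zero bits to shift away.
    shift = (mask & -mask).bit_length() - 1
    return ((key & mask) >> shift) & (divisor - 1)
```

For example, with mask 0x0000ff00 the low eight zero bits are shifted away, so key 0x0000ab00 folds to bucket 0xab when the divisor is 256.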
    • X
      rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket · 1a9b86c9
      Xin Long committed
      rxrpc_open_socket() currently uses sock_create_kern() and
      kernel_bind() to create a udp tunnel socket, and other kernel
      APIs to set it up. This code can be replaced with the udp tunnel
      APIs udp_sock_create() and setup_udp_tunnel_sock(), which
      simplifies rxrpc_open_socket().
      
      Note that with this patch, the udp tunnel socket will always
      bind to a random port if transport is not provided by users,
      which is suggested by David Howells, thanks!
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a9b86c9
    • A
      net-sysfs: Add rtnl locking for getting Tx queue traffic class · b2f17564
      Alexander Duyck committed
      In order to access the subordinate dev for a device we should be holding
      the rtnl_lock when outside of the transmit path. The existing code was not
      doing that for the sysfs dump function and as a result we were open to a
      possible race.
      
      To resolve that take the rtnl lock prior to accessing the sb_dev field of
      the Tx queue and release it after we have retrieved the tc for the queue.
      Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b2f17564
    • W
      nfc: st-nci: Remove unnecessary variable · 796c9015
      wengjianfeng committed
      The variable r is initialized to 0 at the beginning of the
      function and is never reassigned before the function returns it.
      Therefore, we do not need to define the variable r; just return
      0 directly at the end of the function.
      Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      796c9015
    • Y
      selftests/net: so_txtime: remove unneeded semicolon · c85b3bb7
      Yang Li committed
      Eliminate the following coccicheck warning:
      ./tools/testing/selftests/net/so_txtime.c:199:3-4: Unneeded semicolon
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c85b3bb7
    • A
      seg6: fool-proof the processing of SRv6 behavior attributes · 300a0fd8
      Andrea Mayer committed
      The set of required attributes for a given SRv6 behavior is identified
      using a bitmap stored in an unsigned long, since the initial design of SRv6
      networking in Linux. Recently the same approach has been used for
      identifying the optional attributes.
      
      However, the number of attributes supported by SRv6 behaviors depends on
      the size of the unsigned long type which changes with the architecture.
      Indeed, on a 64-bit architecture, an SRv6 behavior can support up to 64
      attributes while on a 32-bit architecture it can support at most 32
      attributes.
      
      To fool-proof the processing of SRv6 behaviors we verify, at compile time,
      that the set of all supported SRv6 attributes can be encoded into a bitmap
      stored in an unsigned long. Otherwise, kernel build fails forcing
      developers to reconsider adding a new attribute or extend the total
      number of supported attributes by the SRv6 behaviors.
      
      Moreover, we replace all patterns (1 << i) with the macro SEG6_F_ATTR(i) in
      order to address potential overflow issues caused by 32-bit signed
      arithmetic.
      
      Thanks to Colin Ian King for catching the overflow problem, providing a
      solution and inspiring this patch.
      Thanks to Jakub Kicinski for his useful suggestions during the design of
      this patch.
      
      v2:
       - remove the SEG6_LOCAL_MAX_SUPP which is not strictly needed: it can
         be derived from the unsigned long type. Thanks to David Ahern for
         pointing it out.
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20210206170934.5982-1-andrea.mayer@uniroma2.it
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      300a0fd8
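To make the overflow concern concrete, the sketch below models what can happen when (1 << i) is evaluated in 32-bit signed arithmetic and the result is then stored in a 64-bit unsigned long. It is a simulation under stated assumptions, not kernel code: all function names are hypothetical, the wrap-around models common two's-complement behavior (in C, overflowing a signed shift is formally undefined), and `attrs_fit()` only mirrors the idea of the compile-time check described above.

```python
U64_MASK = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def int_shift_mask(i):
    """Model (1 << i) computed as a 32-bit int, then widened to 64 bits."""
    return to_signed32(1 << i) & U64_MASK  # sign extension on widening


def ulong_shift_mask(i):
    """Model (1UL << i) computed directly in a 64-bit unsigned long."""
    return (1 << i) & U64_MASK


def attrs_fit(max_attr, bits_per_long):
    """Attributes 0..max_attr need max_attr + 1 bits in the bitmap."""
    return max_attr + 1 <= bits_per_long
```

In this model the two variants agree for i below 31, but at i = 31 the signed version sets every upper bit as well, which would mark attributes 32 to 63 as present.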