1. 13 7月, 2022 1 次提交
  2. 12 4月, 2022 1 次提交
  3. 01 4月, 2022 1 次提交
    • N
      net: ipv4: fix route with nexthop object delete warning · 6bf92d70
      Nikolay Aleksandrov 提交于
      FRR folks have hit a kernel warning[1] while deleting routes[2] which is
      caused by trying to delete a route pointing to a nexthop id without
      specifying nhid but matching on an interface. That is, a route is found
      but we hit a warning while matching it. The warning is from
      fib_info_nh() in include/net/nexthop.h because we run it on a fib_info
      with nexthop object. The call chain is:
       inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
      nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
      the fib_info and triggering the warning). The fix is to not do any
      matching in that branch if the fi has a nexthop object because those are
      managed separately. I.e. we should match when deleting without nh spec and
      should fail when deleting a nexthop route with old-style nh spec because
      nexthop objects are managed separately, e.g.:
       $ ip r show 1.2.3.4/32
       1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
      
       $ ip r del 1.2.3.4/32
       $ ip r del 1.2.3.4/32 nhid 12
       <both should work>
      
       $ ip r del 1.2.3.4/32 dev dummy0
       <should fail with ESRCH>
      
      [1]
       [  523.462226] ------------[ cut here ]------------
       [  523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460
       [  523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd
       [  523.462274]  videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse
       [  523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P           OE     5.16.18-200.fc35.x86_64 #1
       [  523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020
       [  523.462303] RIP: 0010:fib_nh_match+0x210/0x460
       [  523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00
       [  523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286
       [  523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0
       [  523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380
       [  523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000
       [  523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031
       [  523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0
       [  523.462311] FS:  00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000
       [  523.462313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [  523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0
       [  523.462315] Call Trace:
       [  523.462316]  <TASK>
       [  523.462320]  fib_table_delete+0x1a9/0x310
       [  523.462323]  inet_rtm_delroute+0x93/0x110
       [  523.462325]  rtnetlink_rcv_msg+0x133/0x370
       [  523.462327]  ? _copy_to_iter+0xb5/0x6f0
       [  523.462330]  ? rtnl_calcit.isra.0+0x110/0x110
       [  523.462331]  netlink_rcv_skb+0x50/0xf0
       [  523.462334]  netlink_unicast+0x211/0x330
       [  523.462336]  netlink_sendmsg+0x23f/0x480
       [  523.462338]  sock_sendmsg+0x5e/0x60
       [  523.462340]  ____sys_sendmsg+0x22c/0x270
       [  523.462341]  ? import_iovec+0x17/0x20
       [  523.462343]  ? sendmsg_copy_msghdr+0x59/0x90
       [  523.462344]  ? __mod_lruvec_page_state+0x85/0x110
       [  523.462348]  ___sys_sendmsg+0x81/0xc0
       [  523.462350]  ? netlink_seq_start+0x70/0x70
       [  523.462352]  ? __dentry_kill+0x13a/0x180
       [  523.462354]  ? __fput+0xff/0x250
       [  523.462356]  __sys_sendmsg+0x49/0x80
       [  523.462358]  do_syscall_64+0x3b/0x90
       [  523.462361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
       [  523.462364] RIP: 0033:0x7f24552aa337
       [  523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
       [  523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       [  523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337
       [  523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003
       [  523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
       [  523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
       [  523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040
       [  523.462373]  </TASK>
       [  523.462374] ---[ end trace ba537bc16f6bf4ed ]---
      
      [2] https://github.com/FRRouting/frr/issues/6412
      
      Fixes: 4c7e8084 ("ipv4: Plumb support for nexthop object in a fib_info")
      Signed-off-by: NNikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bf92d70
  4. 16 3月, 2022 1 次提交
    • D
      net: Add l3mdev index to flow struct and avoid oif reset for port devices · 40867d74
      David Ahern 提交于
      The fundamental premise of VRF and l3mdev core code is binding a socket
      to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
      Legacy code resets flowi_oif to the l3mdev losing any original port
      device binding. Ben (among others) has demonstrated use cases where the
      original port device binding is important and needs to be retained.
      This patch handles that by adding a new entry to the common flow struct
      that can indicate the l3mdev index for later rule and table matching
      avoiding the need to reset flowi_oif.
      
      In addition to allowing more use cases that require port device binds,
      this patch brings a few datapath simplications:
      
      1. l3mdev_fib_rule_match is only called when walking fib rules and
         always after l3mdev_update_flow. That allows an optimization to bail
         early for non-VRF type uses cases when flowi_l3mdev is not set. Also,
         only that index needs to be checked for the FIB table id.
      
      2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
         (e.g., VRF) device. By resetting flowi_oif only for this case the
         FLOWI_FLAG_SKIP_NH_OIF flag is not longer needed and can be removed,
         removing several checks in the datapath. The flowi_iif path can be
         simplified to only be called if the it is not loopback (loopback can
         not be assigned to an L3 domain) and the l3mdev index is not already
         set.
      
      3. Avoid another device lookup in the output path when the fib lookup
         returns a reject failure.
      
      Note: 2 functional tests for local traffic with reject fib rules are
      updated to reflect the new direct failure at FIB lookup time for ping
      rather than the failure on packet path. The current code fails like this:
      
          HINT: Fails since address on vrf device is out of device scope
          COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
          ping: Warning: source address might be selected on device other than: eth1
          PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.
      
          --- 172.16.3.1 ping statistics ---
          1 packets transmitted, 0 received, 100% packet loss, time 0ms
      
      where the test now directly fails:
      
          HINT: Fails since address on vrf device is out of device scope
          COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
          ping: connect: No route to host
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Tested-by: NBen Greear <greearb@candelatech.com>
      Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      40867d74
  5. 18 2月, 2022 1 次提交
    • E
      ipv4: fix data races in fib_alias_hw_flags_set · 9fcf986c
      Eric Dumazet 提交于
      fib_alias_hw_flags_set() can be used by concurrent threads,
      and is only RCU protected.
      
      We need to annotate accesses to following fields of struct fib_alias:
      
          offload, trap, offload_failed
      
      Because of READ_ONCE()WRITE_ONCE() limitations, make these
      field u8.
      
      BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set
      
      read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
       fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
       nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
       nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       process_scheduled_works kernel/workqueue.c:2370 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
       fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
       nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
       nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       process_scheduled_works kernel/workqueue.c:2370 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00 -> 0x02
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e8-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 90b93f1b ("ipv4: Add "offload" and "trap" indications to routes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      9fcf986c
  6. 08 2月, 2022 1 次提交
  7. 25 1月, 2022 1 次提交
  8. 20 1月, 2022 2 次提交
  9. 16 1月, 2022 1 次提交
    • E
      ipv4: update fib_info_cnt under spinlock protection · 0a6e6b3c
      Eric Dumazet 提交于
      In the past, free_fib_info() was supposed to be called
      under RTNL protection.
      
      This eventually was no longer the case.
      
      Instead of enforcing RTNL it seems we simply can
      move fib_info_cnt changes to occur when fib_info_lock
      is held.
      
      v2: David Laight suggested to update fib_info_cnt
      only when an entry is added/deleted to/from the hash table,
      as fib_info_cnt is used to make sure hash table size
      is optimal.
      
      BUG: KCSAN: data-race in fib_create_info / free_fib_info
      
      write to 0xffffffff86e243a0 of 4 bytes by task 26429 on cpu 0:
       fib_create_info+0xe78/0x3440 net/ipv4/fib_semantics.c:1428
       fib_table_insert+0x148/0x10c0 net/ipv4/fib_trie.c:1224
       fib_magic+0x195/0x1e0 net/ipv4/fib_frontend.c:1087
       fib_add_ifaddr+0xd0/0x2e0 net/ipv4/fib_frontend.c:1109
       fib_netdev_event+0x178/0x510 net/ipv4/fib_frontend.c:1466
       notifier_call_chain kernel/notifier.c:83 [inline]
       raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:391
       __dev_notify_flags+0x1d3/0x3b0
       dev_change_flags+0xa2/0xc0 net/core/dev.c:8872
       do_setlink+0x810/0x2410 net/core/rtnetlink.c:2719
       rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
       __rtnl_newlink net/core/rtnetlink.c:3396 [inline]
       rtnl_newlink+0xb10/0x13b0 net/core/rtnetlink.c:3506
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2496
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x726/0x840 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2492
       __do_sys_sendmsg net/socket.c:2501 [inline]
       __se_sys_sendmsg net/socket.c:2499 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffffffff86e243a0 of 4 bytes by task 31505 on cpu 1:
       free_fib_info+0x35/0x80 net/ipv4/fib_semantics.c:252
       fib_info_put include/net/ip_fib.h:575 [inline]
       nsim_fib4_rt_destroy drivers/net/netdevsim/fib.c:294 [inline]
       nsim_fib4_rt_replace drivers/net/netdevsim/fib.c:403 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:431 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x15ca/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       process_scheduled_works kernel/workqueue.c:2361 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2447
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000d2d -> 0x00000d2e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 31505 Comm: kworker/1:21 Not tainted 5.16.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 48bb9eb4 ("netdevsim: fib: Add dummy implementation for FIB offload")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Ido Schimmel <idosch@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a6e6b3c
  10. 31 12月, 2021 3 次提交
  11. 08 12月, 2021 1 次提交
  12. 02 12月, 2021 1 次提交
  13. 24 9月, 2021 1 次提交
  14. 05 8月, 2021 1 次提交
  15. 03 8月, 2021 1 次提交
  16. 30 7月, 2021 1 次提交
  17. 18 5月, 2021 1 次提交
  18. 09 2月, 2021 1 次提交
    • A
      IPv4: Add "offload failed" indication to routes · 36c5100e
      Amit Cohen 提交于
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel, but not
      necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      To avoid such cases, previous patch set added the ability to emit
      RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed, this behavior is controlled by sysctl.
      
      With the above mentioned behavior, it is possible to know from user-space
      if the route was offloaded, but if the offload fails there is no indication
      to user-space. Following a failure, a routing daemon will wait indefinitely
      for a notification that will never come.
      
      This patch adds an "offload_failed" indication to IPv4 routes, so that
      users will have better visibility into the offload process.
      
      'struct fib_alias', and 'struct fib_rt_info' are extended with new field
      that indicates if route offload failed. Note that the new field is added
      using unused bit and therefore there is no need to increase structs size.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36c5100e
  19. 03 2月, 2021 2 次提交
  20. 17 11月, 2020 1 次提交
  21. 12 11月, 2020 1 次提交
  22. 07 11月, 2020 1 次提交
    • I
      rtnetlink: Add RTNH_F_TRAP flag · 968a83f8
      Ido Schimmel 提交于
      The flag indicates to user space that the nexthop is not programmed to
      forward packets in hardware, but rather to trap them to the CPU. This is
      needed, for example, when the MAC of the nexthop neighbour is not
      resolved and packets should reach the CPU to trigger neighbour
      resolution.
      
      The flag will be used in subsequent patches by netdevsim to test nexthop
      objects programming to device drivers and in the future by mlxsw as
      well.
      
      Changes since RFC:
      * Reword commit message
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      968a83f8
  23. 19 6月, 2020 1 次提交
    • G
      net: Fix the arp error in some cases · 5eea3a63
      guodeqing 提交于
      ie.,
      $ ifconfig eth0 6.6.6.6 netmask 255.255.255.0
      
      $ ip rule add from 6.6.6.6 table 6666
      
      $ ip route add 9.9.9.9 via 6.6.6.6
      
      $ ping -I 6.6.6.6 9.9.9.9
      PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.
      
      3 packets transmitted, 0 received, 100% packet loss, time 2079ms
      
      $ arp
      Address     HWtype  HWaddress           Flags Mask            Iface
      6.6.6.6             (incomplete)                              eth0
      
      The arp request address is error, this is because fib_table_lookup in
      fib_check_nh lookup the destnation 9.9.9.9 nexthop, the scope of
      the fib result is RT_SCOPE_LINK,the correct scope is RT_SCOPE_HOST.
      Here I add a check of whether this is RT_TABLE_MAIN to solve this problem.
      
      Fixes: 3bfd8472 ("net: Use passed in table for nexthop lookups")
      Signed-off-by: Nguodeqing <geffrey.guo@huawei.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5eea3a63
  24. 29 4月, 2020 1 次提交
    • R
      net: ipv4: add sysctl for nexthop api compatibility mode · 4f80116d
      Roopa Prabhu 提交于
      Current route nexthop API maintains user space compatibility
      with old route API by default. Dumps and netlink notifications
      support both new and old API format. In systems which have
      moved to the new API, this compatibility mode cancels some
      of the performance benefits provided by the new nexthop API.
      
      This patch adds new sysctl nexthop_compat_mode which is on
      by default but provides the ability to turn off compatibility
      mode allowing systems to run entirely with the new routing
      API. Old route API behaviour and support is not modified by this
      sysctl.
      
      Uses a single sysctl to cover both ipv4 and ipv6 following
      other sysctls. Covers dumps and delete notifications as
      suggested by David Ahern.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f80116d
  25. 23 4月, 2020 1 次提交
    • D
      ipv4: Update fib_select_default to handle nexthop objects · 7c74b0be
      David Ahern 提交于
      A user reported [0] hitting the WARN_ON in fib_info_nh:
      
          [ 8633.839816] ------------[ cut here ]------------
          [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
          ...
          [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
          ...
          [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
          [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
          [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
          [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
          [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
          [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
          [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
          [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
          [ 8633.839867] Call Trace:
          [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
          [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
          [ 8633.839876]  ip_route_output_flow+0x1a/0x50
          [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
          [ 8633.839880]  ip4_datagram_connect+0x28/0x40
          [ 8633.839882]  __sys_connect+0xd6/0x100
          ...
      
      The WARN_ON is triggered in fib_select_default which is invoked when
      there are multiple default routes. Update the function to use
      fib_info_nhc and convert the nexthop checks to use fib_nh_common.
      
      Add test case that covers the affected code path.
      
      [0] https://github.com/FRRouting/frr/issues/6089
      
      Fixes: 493ced1a ("ipv4: Allow routes to use nexthop objects")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c74b0be
  26. 30 3月, 2020 1 次提交
  27. 13 3月, 2020 1 次提交
  28. 15 1月, 2020 2 次提交
    • I
      ipv4: Add "offload" and "trap" indications to routes · 90b93f1b
      Ido Schimmel 提交于
      When performing L3 offload, routes and nexthops are usually programmed
      into two different tables in the underlying device. Therefore, the fact
      that a nexthop resides in hardware does not necessarily mean that all
      the associated routes also reside in hardware and vice-versa.
      
      While the kernel can signal to user space the presence of a nexthop in
      hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
      for routes. In addition, the fact that a route resides in hardware does
      not necessarily mean that the traffic is offloaded. For example,
      unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
      packets to the CPU so that the kernel will be able to generate the
      appropriate ICMP error packet.
      
      This patch adds an "offload" and "trap" indications to IPv4 routes, so
      that users will have better visibility into the offload process.
      
      'struct fib_alias' is extended with two new fields that indicate if the
      route resides in hardware or not and if it is offloading traffic from
      the kernel or trapping packets to it. Note that the new fields are added
      in the 6 bytes hole and therefore the struct still fits in a single
      cache line [1].
      
      Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
      route's key in order to set the flags.
      
      The indications are dumped to user space via a new flags (i.e.,
      'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
      ancillary header.
      
      v2:
      * Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()
      
      [1]
      struct fib_alias {
              struct hlist_node  fa_list;                      /*     0    16 */
              struct fib_info *          fa_info;              /*    16     8 */
              u8                         fa_tos;               /*    24     1 */
              u8                         fa_type;              /*    25     1 */
              u8                         fa_state;             /*    26     1 */
              u8                         fa_slen;              /*    27     1 */
              u32                        tb_id;                /*    28     4 */
              s16                        fa_default;           /*    32     2 */
              u8                         offload:1;            /*    34: 0  1 */
              u8                         trap:1;               /*    34: 1  1 */
              u8                         unused:6;             /*    34: 2  1 */
      
              /* XXX 5 bytes hole, try to pack */
      
              struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */
      
              /* size: 56, cachelines: 1, members: 12 */
              /* sum members: 50, holes: 1, sum holes: 5 */
              /* sum bitfield members: 8 bits (1 bytes) */
              /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
              /* last cacheline: 56 bytes */
      } __attribute__((__aligned__(8)));
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90b93f1b
    • I
      ipv4: Encapsulate function arguments in a struct · 1e301fd0
      Ido Schimmel 提交于
      fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages
      using the passed arguments. Currently, the function takes 11 arguments,
      6 of which are attributes of the route being dumped (e.g., prefix, TOS).
      
      The next patch will need the function to also dump to user space an
      indication if the route is present in hardware or not. Instead of
      passing yet another argument, change the function to take a struct
      containing the different route attributes.
      
      v2:
      * Name last argument of fib_dump_info()
      * Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could
        later be passed to fib_alias_hw_flags_set()
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e301fd0
  29. 08 11月, 2019 1 次提交
  30. 05 9月, 2019 1 次提交
    • D
      net: Properly update v4 routes with v6 nexthop · 7bdf4de1
      Donald Sharp 提交于
      When creating a v4 route that uses a v6 nexthop from a nexthop group.
      Allow the kernel to properly send the nexthop as v6 via the RTA_VIA
      attribute.
      
      Broken behavior:
      
      $ ip nexthop add via fe80::9 dev eth0
      $ ip nexthop show
      id 1 via fe80::9 dev eth0 scope link
      $ ip route add 4.5.6.7/32 nhid 1
      $ ip route show
      default via 10.0.2.2 dev eth0
      4.5.6.7 nhid 1 via 254.128.0.0 dev eth0
      10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
      $
      
      Fixed behavior:
      
      $ ip nexthop add via fe80::9 dev eth0
      $ ip nexthop show
      id 1 via fe80::9 dev eth0 scope link
      $ ip route add 4.5.6.7/32 nhid 1
      $ ip route show
      default via 10.0.2.2 dev eth0
      4.5.6.7 nhid 1 via inet6 fe80::9 dev eth0
      10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
      $
      
      v2, v3: Addresses code review comments from David Ahern
      
      Fixes: dcb1ecb5 (“ipv4: Prepare for fib6_nh from a nexthop object”)
      Signed-off-by: NDonald Sharp <sharpd@cumulusnetworks.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7bdf4de1
  31. 11 6月, 2019 2 次提交
  32. 10 6月, 2019 1 次提交
  33. 06 6月, 2019 1 次提交
  34. 05 6月, 2019 1 次提交
    • D
      ipv4: Plumb support for nexthop object in a fib_info · 4c7e8084
      David Ahern 提交于
      Add 'struct nexthop' and nh_list list_head to fib_info. nh_list is the
      fib_info side of the nexthop <-> fib_info relationship.
      
      Add fi_list list_head to 'struct nexthop' to track fib_info entries
      using a nexthop instance. Add __remove_nexthop_fib and add it to
      __remove_nexthop to walk the new list_head and mark those fib entries
      as dead when the nexthop is deleted.
      
      Add a few nexthop helpers for use when a nexthop is added to fib_info:
      - nexthop_cmp to determine if 2 nexthops are the same
      - nexthop_path_fib_result to select a path for a multipath
        'struct nexthop'
      - nexthop_fib_nhc to select a specific fib_nh_common within a
        multipath 'struct nexthop'
      
      Update existing fib_info_nhc to use nexthop_fib_nhc if a fib_info uses
      a 'struct nexthop', and mark fib_info_nh as only used for the non-nexthop
      case.
      
      Update the fib_info functions to check for fi->nh and take a different
      path as needed:
      - free_fib_info_rcu - put the nexthop object reference
      - fib_release_info - remove the fib_info from the nexthop's fi_list
      - nh_comp - use nexthop_cmp when either fib_info references a nexthop
        object
      - fib_info_hashfn - use the nexthop id for the hashing vs the oif of
        each fib_nh in a fib_info
      - fib_nlmsg_size - add space for the RTA_NH_ID attribute
      - fib_create_info - verify nexthop reference can be taken, verify
        nexthop spec is valid for fib entry, and add fib_info to fi_list for
        a nexthop
      - fib_select_multipath - use the new nexthop_path_fib_result to select a
        path when nexthop objects are used
      - fib_table_lookup - if the 'struct nexthop' is a blackhole nexthop, treat
        it the same as a fib entry using 'blackhole'
      
      The bulk of the changes are in fib_semantics.c and most of that is
      moving the existing change_nexthops into an else branch.
      
      Update the nexthop code to walk fi_list on a nexthop deleted to remove
      fib entries referencing it.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c7e8084