1. 12 Mar, 2021 1 commit
    • nexthop: Pass nh_config to replace_nexthop() · 597f48e4
      Petr Machata committed
      Currently, replace assumes that the new group that is given is a
      fully-formed object. But mpath groups really only have one attribute, and
      that is the constituent next hop configuration. This may not be universally
      true. From the usability perspective, it is desirable to allow the replace
      operation to adjust just the constituent next hop configuration and leave
      the group attributes as such intact.
      
      But the object that keeps track of whether an attribute was or was not
      given is the nh_config object, not the next hop or next-hop group. To allow
      (selective) attribute updates during NH group replacement, propagate `cfg'
      to replace_nexthop() and further to replace_nexthop_grp().
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
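The point about the config object being the only place that records whether an attribute was given can be sketched in user-space C. This is a minimal model under stated assumptions, not the kernel code: `demo_cfg`, `demo_group`, the `timeout` attribute with its `has_timeout` flag, and `demo_replace_group()` are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request object: holds each value and whether it was given. */
struct demo_cfg {
    const int *members;     /* constituent next hop IDs (always given) */
    size_t num_members;
    bool has_timeout;       /* was the hypothetical attribute supplied? */
    unsigned int timeout;
};

/* Hypothetical group object: stores values only, not where they came from. */
struct demo_group {
    const int *members;
    size_t num_members;
    unsigned int timeout;
};

/* Replace the members; update the attribute only if the request carried it. */
void demo_replace_group(struct demo_group *grp, const struct demo_cfg *cfg)
{
    assert(grp != NULL && cfg != NULL);

    grp->members = cfg->members;
    grp->num_members = cfg->num_members;
    if (cfg->has_timeout)
        grp->timeout = cfg->timeout;
}
```

In this model, a group whose `timeout` is 120 keeps that value when its members are replaced by a request that leaves `has_timeout` unset. That is only possible because the replace function sees the request object rather than a second, fully-formed group.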
  2. 05 Mar, 2021 1 commit
    • nexthop: Do not flush blackhole nexthops when loopback goes down · 76c03bf8
      Ido Schimmel committed
      As far as user space is concerned, blackhole nexthops do not have a
      nexthop device and therefore should not be affected by the
      administrative or carrier state of any netdev.
      
      However, when the loopback netdev goes down all the blackhole nexthops
      are flushed. This happens because internally the kernel associates
      blackhole nexthops with the loopback netdev.
      
      This behavior is both confusing to those not familiar with kernel
      internals and also diverges from the legacy API where blackhole IPv4
      routes are not flushed when the loopback netdev goes down:
      
       # ip route add blackhole 198.51.100.0/24
       # ip link set dev lo down
       # ip route show 198.51.100.0/24
       blackhole 198.51.100.0/24
      
      Blackhole IPv6 routes are flushed, but at least user space knows that
      they are associated with the loopback netdev:
      
       # ip -6 route show 2001:db8:1::/64
       blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium
      
      Fix this by only flushing blackhole nexthops when the loopback netdev is
      unregistered.
      
      Fixes: ab84be7e ("net: Initial nexthop code")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reported-by: Donald Sharp <sharpd@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 29 Jan, 2021 12 commits
  4. 21 Jan, 2021 3 commits
  5. 08 Jan, 2021 3 commits
  6. 07 Nov, 2020 11 commits
  7. 20 Oct, 2020 1 commit
    • nexthop: Fix performance regression in nexthop deletion · df6afe2f
      Ido Schimmel committed
      While insertion of 16k nexthops all using the same netdev ('dummy10')
      takes less than a second, deletion takes about 130 seconds:
      
      # time -p ip -b nexthop.batch
      real 0.29
      user 0.01
      sys 0.15
      
      # time -p ip link set dev dummy10 down
      real 131.03
      user 0.06
      sys 0.52
      
      This is because of repeated calls to synchronize_rcu() whenever a
      nexthop is removed from a nexthop group:
      
      # /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
      ...
          b'finish_task_switch'
          b'schedule'
          b'schedule_timeout'
          b'wait_for_completion'
          b'__wait_rcu_gp'
          b'synchronize_rcu.part.0'
          b'synchronize_rcu'
          b'__remove_nexthop'
          b'remove_nexthop'
          b'nexthop_flush_dev'
          b'nh_netdev_event'
          b'raw_notifier_call_chain'
          b'call_netdevice_notifiers_info'
          b'__dev_notify_flags'
          b'dev_change_flags'
          b'do_setlink'
          b'__rtnl_newlink'
          b'rtnl_newlink'
          b'rtnetlink_rcv_msg'
          b'netlink_rcv_skb'
          b'rtnetlink_rcv'
          b'netlink_unicast'
          b'netlink_sendmsg'
          b'____sys_sendmsg'
          b'___sys_sendmsg'
          b'__sys_sendmsg'
          b'__x64_sys_sendmsg'
          b'do_syscall_64'
          b'entry_SYSCALL_64_after_hwframe'
          -                ip (277)
              126554955
      
      Since nexthops are always deleted under RTNL, synchronize_net() can be
      used instead. It will call synchronize_rcu_expedited() which only blocks
      for several microseconds as opposed to multiple milliseconds like
      synchronize_rcu().
      
      With this patch deletion of 16k nexthops takes less than a second:
      
      # time -p ip link set dev dummy10 down
      real 0.12
      user 0.00
      sys 0.04
      
      Tested with fib_nexthops.sh which includes torture tests that prompted
      the initial change:
      
      # ./fib_nexthops.sh
      ...
      Tests passed: 134
      Tests failed:   0
      
      Fixes: 90f33bff ("nexthops: don't modify published nexthop groups")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20201016172914.643282-1-idosch@idosch.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
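The timings in this commit message can be turned into a rough per-deletion cost. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes "16k" means 16384 nexthops, that the whole wall-clock time is spent waiting for grace periods, and that there is one wait per deleted nexthop. `demo_ms_per_deletion()` is a hypothetical helper name.

```c
#include <assert.h>

/* Average blocking time per deletion, in milliseconds. */
double demo_ms_per_deletion(double total_seconds, unsigned int deletions)
{
    assert(deletions > 0);

    return total_seconds * 1000.0 / deletions;
}
```

Under those assumptions, `demo_ms_per_deletion(131.03, 16384)` gives roughly 8.0 ms per nexthop before the patch, and `demo_ms_per_deletion(0.12, 16384)` gives roughly 0.0073 ms (about 7.3 microseconds) after it. Both figures are in line with the "multiple milliseconds" and "several microseconds" quoted for synchronize_rcu() and synchronize_rcu_expedited().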
  8. 16 Sep, 2020 2 commits
  9. 27 Aug, 2020 5 commits
  10. 23 Aug, 2020 1 commit
    • net: nexthop: don't allow empty NHA_GROUP · eeaac363
      Nikolay Aleksandrov committed
      Currently the nexthop code will use an empty NHA_GROUP attribute, but it
      requires at least 1 entry in order to function properly. Otherwise we
      end up dereferencing null or random pointers all over the place due to
      not having any nh_grp_entry members allocated; the nexthop code relies on
      having at least the first member present. Empty NHA_GROUP doesn't make
      any sense so just disallow it.

      Also add a WARN_ON for any future users of nexthop_create_group().
      
       BUG: kernel NULL pointer dereference, address: 0000000000000080
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       RIP: 0010:fib_check_nexthop+0x4a/0xaa
       Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
       RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
       RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
       RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
       RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
       R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
       R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
       FS:  00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
       Call Trace:
        fib_create_info+0x64d/0xaf7
        fib_table_insert+0xf6/0x581
        ? __vma_adjust+0x3b6/0x4d4
        inet_rtm_newroute+0x56/0x70
        rtnetlink_rcv_msg+0x1e3/0x20d
        ? rtnl_calcit.isra.0+0xb8/0xb8
        netlink_rcv_skb+0x5b/0xac
        netlink_unicast+0xfa/0x17b
        netlink_sendmsg+0x334/0x353
        sock_sendmsg_nosec+0xf/0x3f
        ____sys_sendmsg+0x1a0/0x1fc
        ? copy_msghdr_from_user+0x4c/0x61
        ___sys_sendmsg+0x63/0x84
        ? handle_mm_fault+0xa39/0x11b5
        ? sockfd_lookup_light+0x72/0x9a
        __sys_sendmsg+0x50/0x6e
        do_syscall_64+0x54/0xbe
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f10dacc0bb7
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
       RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
       RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
       RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
       R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
       Modules linked in:
       CR2: 0000000000000080
      
      CC: David Ahern <dsahern@gmail.com>
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
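Rejecting an empty group attribute amounts to a length check on the attribute payload before any member is read. The following is a minimal user-space sketch of such a check, not the kernel's validation code: `demo_grp_entry` is a hypothetical stand-in for one group member entry, and `demo_check_group_len()` is an invented helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one group member entry in the payload. */
struct demo_grp_entry {
    uint32_t id;         /* ID of the member nexthop */
    uint8_t weight;
    uint8_t reserved1;
    uint16_t reserved2;
};

/* The check below relies on a fixed, padding-free entry size. */
static_assert(sizeof(struct demo_grp_entry) == 8, "unexpected entry size");

/* Return 0 if the payload holds one or more whole entries, else -EINVAL. */
int demo_check_group_len(size_t payload_len)
{
    if (payload_len == 0)
        return -EINVAL;   /* empty group: no first member to rely on */
    if (payload_len % sizeof(struct demo_grp_entry) != 0)
        return -EINVAL;   /* trailing partial entry */
    return 0;
}
```

With this check, a zero-length payload is refused up front, so later code that assumes a first member exists is never reached with an empty group.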