1. 29 1月, 2021 4 次提交
  2. 21 1月, 2021 3 次提交
  3. 08 1月, 2021 3 次提交
  4. 07 11月, 2020 11 次提交
  5. 20 10月, 2020 1 次提交
    • I
      nexthop: Fix performance regression in nexthop deletion · df6afe2f
      Ido Schimmel 提交于
      While insertion of 16k nexthops all using the same netdev ('dummy10')
      takes less than a second, deletion takes about 130 seconds:
      
      # time -p ip -b nexthop.batch
      real 0.29
      user 0.01
      sys 0.15
      
      # time -p ip link set dev dummy10 down
      real 131.03
      user 0.06
      sys 0.52
      
      This is because of repeated calls to synchronize_rcu() whenever a
      nexthop is removed from a nexthop group:
      
      # /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
      ...
          b'finish_task_switch'
          b'schedule'
          b'schedule_timeout'
          b'wait_for_completion'
          b'__wait_rcu_gp'
          b'synchronize_rcu.part.0'
          b'synchronize_rcu'
          b'__remove_nexthop'
          b'remove_nexthop'
          b'nexthop_flush_dev'
          b'nh_netdev_event'
          b'raw_notifier_call_chain'
          b'call_netdevice_notifiers_info'
          b'__dev_notify_flags'
          b'dev_change_flags'
          b'do_setlink'
          b'__rtnl_newlink'
          b'rtnl_newlink'
          b'rtnetlink_rcv_msg'
          b'netlink_rcv_skb'
          b'rtnetlink_rcv'
          b'netlink_unicast'
          b'netlink_sendmsg'
          b'____sys_sendmsg'
          b'___sys_sendmsg'
          b'__sys_sendmsg'
          b'__x64_sys_sendmsg'
          b'do_syscall_64'
          b'entry_SYSCALL_64_after_hwframe'
          -                ip (277)
              126554955
      
      Since nexthops are always deleted under RTNL, synchronize_net() can be
      used instead. It will call synchronize_rcu_expedited() which only blocks
      for several microseconds as opposed to multiple milliseconds like
      synchronize_rcu().
      
      With this patch deletion of 16k nexthops takes less than a second:
      
      # time -p ip link set dev dummy10 down
      real 0.12
      user 0.00
      sys 0.04
      
      Tested with fib_nexthops.sh which includes torture tests that prompted
      the initial change:
      
      # ./fib_nexthops.sh
      ...
      Tests passed: 134
      Tests failed:   0
      
      Fixes: 90f33bff ("nexthops: don't modify published nexthop groups")
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20201016172914.643282-1-idosch@idosch.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      df6afe2f
  6. 16 9月, 2020 2 次提交
  7. 27 8月, 2020 5 次提交
  8. 23 8月, 2020 1 次提交
    • N
      net: nexthop: don't allow empty NHA_GROUP · eeaac363
      Nikolay Aleksandrov 提交于
      Currently the nexthop code will use an empty NHA_GROUP attribute, but it
      requires at least 1 entry in order to function properly. Otherwise we
      end up derefencing null or random pointers all over the place due to not
      having any nh_grp_entry members allocated, nexthop code relies on having at
      least the first member present. Empty NHA_GROUP doesn't make any sense so
      just disallow it.
      Also add a WARN_ON for any future users of nexthop_create_group().
      
       BUG: kernel NULL pointer dereference, address: 0000000000000080
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       RIP: 0010:fib_check_nexthop+0x4a/0xaa
       Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
       RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
       RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
       RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
       RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
       R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
       R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
       FS:  00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
       Call Trace:
        fib_create_info+0x64d/0xaf7
        fib_table_insert+0xf6/0x581
        ? __vma_adjust+0x3b6/0x4d4
        inet_rtm_newroute+0x56/0x70
        rtnetlink_rcv_msg+0x1e3/0x20d
        ? rtnl_calcit.isra.0+0xb8/0xb8
        netlink_rcv_skb+0x5b/0xac
        netlink_unicast+0xfa/0x17b
        netlink_sendmsg+0x334/0x353
        sock_sendmsg_nosec+0xf/0x3f
        ____sys_sendmsg+0x1a0/0x1fc
        ? copy_msghdr_from_user+0x4c/0x61
        ___sys_sendmsg+0x63/0x84
        ? handle_mm_fault+0xa39/0x11b5
        ? sockfd_lookup_light+0x72/0x9a
        __sys_sendmsg+0x50/0x6e
        do_syscall_64+0x54/0xbe
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f10dacc0bb7
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
       RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
       RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
       RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
       R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
       Modules linked in:
       CR2: 0000000000000080
      
      CC: David Ahern <dsahern@gmail.com>
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeaac363
  9. 11 6月, 2020 1 次提交
    • D
      nexthop: Fix fdb labeling for groups · ce9ac056
      David Ahern 提交于
      fdb nexthops are marked with a flag. For standalone nexthops, a flag was
      added to the nh_info struct. For groups that flag was added to struct
      nexthop when it should have been added to the group information. Fix
      by removing the flag from the nexthop struct and adding a flag to nh_group
      that mirrors nh_info and is really only a caching of the individual types.
      Add a helper, nexthop_is_fdb, for use by the vxlan code and fixup the
      internal code to use the flag from either nh_info or nh_group.
      
      v2
      - propagate fdb_nh in remove_nh_grp_entry
      
      Fixes: 38428d68 ("nexthop: support for fdb ecmp nexthops")
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce9ac056
  10. 02 6月, 2020 1 次提交
  11. 28 5月, 2020 1 次提交
  12. 27 5月, 2020 2 次提交
  13. 23 5月, 2020 2 次提交
  14. 21 5月, 2020 1 次提交
    • S
      net: nlmsg_cancel() if put fails for nhmsg · d69100b8
      Stephen Worley 提交于
      Fixes data remnant seen when we fail to reserve space for a
      nexthop group during a larger dump.
      
      If we fail the reservation, we goto nla_put_failure and
      cancel the message.
      
      Reproduce with the following iproute2 commands:
      =====================
      ip link add dummy1 type dummy
      ip link add dummy2 type dummy
      ip link add dummy3 type dummy
      ip link add dummy4 type dummy
      ip link add dummy5 type dummy
      ip link add dummy6 type dummy
      ip link add dummy7 type dummy
      ip link add dummy8 type dummy
      ip link add dummy9 type dummy
      ip link add dummy10 type dummy
      ip link add dummy11 type dummy
      ip link add dummy12 type dummy
      ip link add dummy13 type dummy
      ip link add dummy14 type dummy
      ip link add dummy15 type dummy
      ip link add dummy16 type dummy
      ip link add dummy17 type dummy
      ip link add dummy18 type dummy
      ip link add dummy19 type dummy
      ip link add dummy20 type dummy
      ip link add dummy21 type dummy
      ip link add dummy22 type dummy
      ip link add dummy23 type dummy
      ip link add dummy24 type dummy
      ip link add dummy25 type dummy
      ip link add dummy26 type dummy
      ip link add dummy27 type dummy
      ip link add dummy28 type dummy
      ip link add dummy29 type dummy
      ip link add dummy30 type dummy
      ip link add dummy31 type dummy
      ip link add dummy32 type dummy
      
      ip link set dummy1 up
      ip link set dummy2 up
      ip link set dummy3 up
      ip link set dummy4 up
      ip link set dummy5 up
      ip link set dummy6 up
      ip link set dummy7 up
      ip link set dummy8 up
      ip link set dummy9 up
      ip link set dummy10 up
      ip link set dummy11 up
      ip link set dummy12 up
      ip link set dummy13 up
      ip link set dummy14 up
      ip link set dummy15 up
      ip link set dummy16 up
      ip link set dummy17 up
      ip link set dummy18 up
      ip link set dummy19 up
      ip link set dummy20 up
      ip link set dummy21 up
      ip link set dummy22 up
      ip link set dummy23 up
      ip link set dummy24 up
      ip link set dummy25 up
      ip link set dummy26 up
      ip link set dummy27 up
      ip link set dummy28 up
      ip link set dummy29 up
      ip link set dummy30 up
      ip link set dummy31 up
      ip link set dummy32 up
      
      ip link set dummy33 up
      ip link set dummy34 up
      
      ip link set vrf-red up
      ip link set vrf-blue up
      
      ip link set dummyVRFred up
      ip link set dummyVRFblue up
      
      ip ro add 1.1.1.1/32 dev dummy1
      ip ro add 1.1.1.2/32 dev dummy2
      ip ro add 1.1.1.3/32 dev dummy3
      ip ro add 1.1.1.4/32 dev dummy4
      ip ro add 1.1.1.5/32 dev dummy5
      ip ro add 1.1.1.6/32 dev dummy6
      ip ro add 1.1.1.7/32 dev dummy7
      ip ro add 1.1.1.8/32 dev dummy8
      ip ro add 1.1.1.9/32 dev dummy9
      ip ro add 1.1.1.10/32 dev dummy10
      ip ro add 1.1.1.11/32 dev dummy11
      ip ro add 1.1.1.12/32 dev dummy12
      ip ro add 1.1.1.13/32 dev dummy13
      ip ro add 1.1.1.14/32 dev dummy14
      ip ro add 1.1.1.15/32 dev dummy15
      ip ro add 1.1.1.16/32 dev dummy16
      ip ro add 1.1.1.17/32 dev dummy17
      ip ro add 1.1.1.18/32 dev dummy18
      ip ro add 1.1.1.19/32 dev dummy19
      ip ro add 1.1.1.20/32 dev dummy20
      ip ro add 1.1.1.21/32 dev dummy21
      ip ro add 1.1.1.22/32 dev dummy22
      ip ro add 1.1.1.23/32 dev dummy23
      ip ro add 1.1.1.24/32 dev dummy24
      ip ro add 1.1.1.25/32 dev dummy25
      ip ro add 1.1.1.26/32 dev dummy26
      ip ro add 1.1.1.27/32 dev dummy27
      ip ro add 1.1.1.28/32 dev dummy28
      ip ro add 1.1.1.29/32 dev dummy29
      ip ro add 1.1.1.30/32 dev dummy30
      ip ro add 1.1.1.31/32 dev dummy31
      ip ro add 1.1.1.32/32 dev dummy32
      
      ip next add id 1 via 1.1.1.1 dev dummy1
      ip next add id 2 via 1.1.1.2 dev dummy2
      ip next add id 3 via 1.1.1.3 dev dummy3
      ip next add id 4 via 1.1.1.4 dev dummy4
      ip next add id 5 via 1.1.1.5 dev dummy5
      ip next add id 6 via 1.1.1.6 dev dummy6
      ip next add id 7 via 1.1.1.7 dev dummy7
      ip next add id 8 via 1.1.1.8 dev dummy8
      ip next add id 9 via 1.1.1.9 dev dummy9
      ip next add id 10 via 1.1.1.10 dev dummy10
      ip next add id 11 via 1.1.1.11 dev dummy11
      ip next add id 12 via 1.1.1.12 dev dummy12
      ip next add id 13 via 1.1.1.13 dev dummy13
      ip next add id 14 via 1.1.1.14 dev dummy14
      ip next add id 15 via 1.1.1.15 dev dummy15
      ip next add id 16 via 1.1.1.16 dev dummy16
      ip next add id 17 via 1.1.1.17 dev dummy17
      ip next add id 18 via 1.1.1.18 dev dummy18
      ip next add id 19 via 1.1.1.19 dev dummy19
      ip next add id 20 via 1.1.1.20 dev dummy20
      ip next add id 21 via 1.1.1.21 dev dummy21
      ip next add id 22 via 1.1.1.22 dev dummy22
      ip next add id 23 via 1.1.1.23 dev dummy23
      ip next add id 24 via 1.1.1.24 dev dummy24
      ip next add id 25 via 1.1.1.25 dev dummy25
      ip next add id 26 via 1.1.1.26 dev dummy26
      ip next add id 27 via 1.1.1.27 dev dummy27
      ip next add id 28 via 1.1.1.28 dev dummy28
      ip next add id 29 via 1.1.1.29 dev dummy29
      ip next add id 30 via 1.1.1.30 dev dummy30
      ip next add id 31 via 1.1.1.31 dev dummy31
      ip next add id 32 via 1.1.1.32 dev dummy32
      
      i=100
      
      while [ $i -le 200 ]
      do
      ip next add id $i group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
      
      	echo $i
      
      	((i++))
      
      done
      
      ip next add id 999 group 1/2/3/4/5/6
      
      ip next ls
      
      ========================
      
      Fixes: ab84be7e ("net: Initial nexthop code")
      Signed-off-by: NStephen Worley <sworley@cumulusnetworks.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d69100b8
  15. 18 5月, 2020 1 次提交
  16. 29 4月, 2020 1 次提交
    • R
      net: ipv4: add sysctl for nexthop api compatibility mode · 4f80116d
      Roopa Prabhu 提交于
      Current route nexthop API maintains user space compatibility
      with old route API by default. Dumps and netlink notifications
      support both new and old API format. In systems which have
      moved to the new API, this compatibility mode cancels some
      of the performance benefits provided by the new nexthop API.
      
      This patch adds new sysctl nexthop_compat_mode which is on
      by default but provides the ability to turn off compatibility
      mode allowing systems to run entirely with the new routing
      API. Old route API behaviour and support is not modified by this
      sysctl.
      
      Uses a single sysctl to cover both ipv4 and ipv6 following
      other sysctls. Covers dumps and delete notifications as
      suggested by David Ahern.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f80116d