1. 08 Jan, 2021 2 commits
  2. 07 Nov, 2020 11 commits
  3. 20 Oct, 2020 1 commit
    • nexthop: Fix performance regression in nexthop deletion · df6afe2f
      Ido Schimmel committed
      While insertion of 16k nexthops all using the same netdev ('dummy10')
      takes less than a second, deletion takes about 130 seconds:
      
      # time -p ip -b nexthop.batch
      real 0.29
      user 0.01
      sys 0.15
      
      # time -p ip link set dev dummy10 down
      real 131.03
      user 0.06
      sys 0.52
      
      This is because of repeated calls to synchronize_rcu() whenever a
      nexthop is removed from a nexthop group:
      
      # /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
      ...
          b'finish_task_switch'
          b'schedule'
          b'schedule_timeout'
          b'wait_for_completion'
          b'__wait_rcu_gp'
          b'synchronize_rcu.part.0'
          b'synchronize_rcu'
          b'__remove_nexthop'
          b'remove_nexthop'
          b'nexthop_flush_dev'
          b'nh_netdev_event'
          b'raw_notifier_call_chain'
          b'call_netdevice_notifiers_info'
          b'__dev_notify_flags'
          b'dev_change_flags'
          b'do_setlink'
          b'__rtnl_newlink'
          b'rtnl_newlink'
          b'rtnetlink_rcv_msg'
          b'netlink_rcv_skb'
          b'rtnetlink_rcv'
          b'netlink_unicast'
          b'netlink_sendmsg'
          b'____sys_sendmsg'
          b'___sys_sendmsg'
          b'__sys_sendmsg'
          b'__x64_sys_sendmsg'
          b'do_syscall_64'
          b'entry_SYSCALL_64_after_hwframe'
          -                ip (277)
              126554955
      
      Since nexthops are always deleted under RTNL, synchronize_net() can be
      used instead. It will call synchronize_rcu_expedited() which only blocks
      for several microseconds as opposed to multiple milliseconds like
      synchronize_rcu().
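
As a rough check on the numbers above, the off-CPU total in the stack trace can be divided by the number of nexthops. This is only a sketch: it assumes that offcputime reports its totals in microseconds, that "16k" means 16384, and that each removal waits for exactly one grace period. `avg_block_us` is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Average time spent blocked per call, in microseconds.
 * total_blocked_us: summed off-CPU time for the whole operation.
 * calls: number of blocking calls (assumed: one per nexthop removed). */
static double avg_block_us(uint64_t total_blocked_us, uint64_t calls)
{
	assert(calls > 0);
	return (double)total_blocked_us / (double)calls;
}
```

Under those assumptions, `avg_block_us(126554955, 16384)` comes to roughly 7.7 ms per call, which is in line with the "multiple milliseconds" attributed to synchronize_rcu().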
      
      With this patch deletion of 16k nexthops takes less than a second:
      
      # time -p ip link set dev dummy10 down
      real 0.12
      user 0.00
      sys 0.04
      
      Tested with fib_nexthops.sh which includes torture tests that prompted
      the initial change:
      
      # ./fib_nexthops.sh
      ...
      Tests passed: 134
      Tests failed:   0
      
      Fixes: 90f33bff ("nexthops: don't modify published nexthop groups")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20201016172914.643282-1-idosch@idosch.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 16 Sep, 2020 2 commits
  5. 27 Aug, 2020 5 commits
  6. 23 Aug, 2020 1 commit
    • net: nexthop: don't allow empty NHA_GROUP · eeaac363
      Nikolay Aleksandrov committed
      Currently the nexthop code will use an empty NHA_GROUP attribute, but it
      requires at least 1 entry in order to function properly. Otherwise we
      end up dereferencing null or random pointers all over the place because
      no nh_grp_entry members are allocated; the nexthop code relies on having
      at least the first member present. An empty NHA_GROUP doesn't make any
      sense, so just disallow it.
      Also add a WARN_ON for any future users of nexthop_create_group().
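
The kind of check described above can be sketched in userspace as a length test on the attribute payload: reject a zero-length group and any length that is not a whole number of entries. The struct below is a local stand-in modelled on the uapi group entry layout, and `group_attr_len_ok` is a hypothetical name; this illustrates the idea and is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for one group entry (assumed layout: 8 bytes). */
struct grp_entry_sketch {
	uint32_t id;      /* nexthop id of the member */
	uint8_t  weight;  /* member weight */
	uint8_t  resvd1;
	uint16_t resvd2;
};

static_assert(sizeof(struct grp_entry_sketch) == 8,
	      "sketch assumes an 8-byte group entry");

/* Accept only payloads holding at least one complete entry. */
static bool group_attr_len_ok(size_t len)
{
	return len != 0 && len % sizeof(struct grp_entry_sketch) == 0;
}
```

An empty attribute (`len == 0`) is rejected up front, so later code can rely on the first member being present.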
      
       BUG: kernel NULL pointer dereference, address: 0000000000000080
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       RIP: 0010:fib_check_nexthop+0x4a/0xaa
       Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
       RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
       RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
       RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
       RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
       R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
       R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
       FS:  00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
       Call Trace:
        fib_create_info+0x64d/0xaf7
        fib_table_insert+0xf6/0x581
        ? __vma_adjust+0x3b6/0x4d4
        inet_rtm_newroute+0x56/0x70
        rtnetlink_rcv_msg+0x1e3/0x20d
        ? rtnl_calcit.isra.0+0xb8/0xb8
        netlink_rcv_skb+0x5b/0xac
        netlink_unicast+0xfa/0x17b
        netlink_sendmsg+0x334/0x353
        sock_sendmsg_nosec+0xf/0x3f
        ____sys_sendmsg+0x1a0/0x1fc
        ? copy_msghdr_from_user+0x4c/0x61
        ___sys_sendmsg+0x63/0x84
        ? handle_mm_fault+0xa39/0x11b5
        ? sockfd_lookup_light+0x72/0x9a
        __sys_sendmsg+0x50/0x6e
        do_syscall_64+0x54/0xbe
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f10dacc0bb7
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
       RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
       RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
       RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
       R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
       Modules linked in:
       CR2: 0000000000000080
      
      CC: David Ahern <dsahern@gmail.com>
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 11 Jun, 2020 1 commit
    • nexthop: Fix fdb labeling for groups · ce9ac056
      David Ahern committed
      fdb nexthops are marked with a flag. For standalone nexthops, a flag was
      added to the nh_info struct. For groups that flag was added to struct
      nexthop when it should have been added to the group information. Fix
      by removing the flag from the nexthop struct and adding a flag to nh_group
      that mirrors nh_info and is really only a caching of the individual types.
      Add a helper, nexthop_is_fdb, for use by the vxlan code and fixup the
      internal code to use the flag from either nh_info or nh_group.
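
A minimal sketch of such a helper, assuming simplified stand-in structs: the flag lives on the per-nexthop info for standalone nexthops and on the group information for groups, and the helper picks whichever applies. All names ending in `_sketch` are hypothetical, and the RCU handling a kernel helper would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct info_sketch  { bool fdb_nh; };  /* standalone nexthop config */
struct group_sketch { bool fdb_nh; };  /* cached from the group's members */

struct nexthop_sketch {
	bool is_group;
	union {
		struct info_sketch  *info;  /* valid when !is_group */
		struct group_sketch *grp;   /* valid when is_group */
	} u;
};

/* Report whether a nexthop (single or group) is an fdb nexthop. */
static bool is_fdb_sketch(const struct nexthop_sketch *nh)
{
	assert(nh != NULL);
	if (nh->is_group)
		return nh->u.grp->fdb_nh;
	return nh->u.info->fdb_nh;
}
```

Callers never need to know which of the two structures carries the flag.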
      
      v2
      - propagate fdb_nh in remove_nh_grp_entry
      
      Fixes: 38428d68 ("nexthop: support for fdb ecmp nexthops")
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 02 Jun, 2020 1 commit
  9. 28 May, 2020 1 commit
  10. 27 May, 2020 2 commits
  11. 23 May, 2020 2 commits
  12. 21 May, 2020 1 commit
    • net: nlmsg_cancel() if put fails for nhmsg · d69100b8
      Stephen Worley committed
      Fixes data remnant seen when we fail to reserve space for a
      nexthop group during a larger dump.
      
      If we fail the reservation, we goto nla_put_failure and
      cancel the message.
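
The cancel-on-failure pattern can be illustrated with a plain byte buffer: remember where the record started, and if any later append fails, trim the buffer back to that point so no partial record is left behind. This is a userspace sketch with hypothetical names (`msgbuf`, `buf_put`, `append_record`); it does not use the kernel netlink API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct msgbuf {
	unsigned char data[64];
	size_t len;  /* bytes currently used */
};

/* Append n bytes; fail without side effects if they do not fit. */
static bool buf_put(struct msgbuf *b, const void *src, size_t n)
{
	if (n > sizeof(b->data) - b->len)
		return false;
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return true;
}

/* Append a header followed by a payload as one record.
 * On any failure, roll the buffer back to where the record began. */
static bool append_record(struct msgbuf *b, const void *hdr, size_t hlen,
			  const void *payload, size_t plen)
{
	size_t start = b->len;

	assert(b->len <= sizeof(b->data));
	if (!buf_put(b, hdr, hlen))
		goto put_failure;
	if (!buf_put(b, payload, plen))
		goto put_failure;
	return true;

put_failure:
	b->len = start;  /* cancel: drop the partially written record */
	return false;
}
```

Without the rollback, a header written before the failed payload would remain in the buffer, which is the kind of remnant the commit describes.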
      
      Reproduce with the following iproute2 commands:
      =====================
      ip link add dummy1 type dummy
      ip link add dummy2 type dummy
      ip link add dummy3 type dummy
      ip link add dummy4 type dummy
      ip link add dummy5 type dummy
      ip link add dummy6 type dummy
      ip link add dummy7 type dummy
      ip link add dummy8 type dummy
      ip link add dummy9 type dummy
      ip link add dummy10 type dummy
      ip link add dummy11 type dummy
      ip link add dummy12 type dummy
      ip link add dummy13 type dummy
      ip link add dummy14 type dummy
      ip link add dummy15 type dummy
      ip link add dummy16 type dummy
      ip link add dummy17 type dummy
      ip link add dummy18 type dummy
      ip link add dummy19 type dummy
      ip link add dummy20 type dummy
      ip link add dummy21 type dummy
      ip link add dummy22 type dummy
      ip link add dummy23 type dummy
      ip link add dummy24 type dummy
      ip link add dummy25 type dummy
      ip link add dummy26 type dummy
      ip link add dummy27 type dummy
      ip link add dummy28 type dummy
      ip link add dummy29 type dummy
      ip link add dummy30 type dummy
      ip link add dummy31 type dummy
      ip link add dummy32 type dummy
      
      ip link set dummy1 up
      ip link set dummy2 up
      ip link set dummy3 up
      ip link set dummy4 up
      ip link set dummy5 up
      ip link set dummy6 up
      ip link set dummy7 up
      ip link set dummy8 up
      ip link set dummy9 up
      ip link set dummy10 up
      ip link set dummy11 up
      ip link set dummy12 up
      ip link set dummy13 up
      ip link set dummy14 up
      ip link set dummy15 up
      ip link set dummy16 up
      ip link set dummy17 up
      ip link set dummy18 up
      ip link set dummy19 up
      ip link set dummy20 up
      ip link set dummy21 up
      ip link set dummy22 up
      ip link set dummy23 up
      ip link set dummy24 up
      ip link set dummy25 up
      ip link set dummy26 up
      ip link set dummy27 up
      ip link set dummy28 up
      ip link set dummy29 up
      ip link set dummy30 up
      ip link set dummy31 up
      ip link set dummy32 up
      
      ip link set dummy33 up
      ip link set dummy34 up
      
      ip link set vrf-red up
      ip link set vrf-blue up
      
      ip link set dummyVRFred up
      ip link set dummyVRFblue up
      
      ip ro add 1.1.1.1/32 dev dummy1
      ip ro add 1.1.1.2/32 dev dummy2
      ip ro add 1.1.1.3/32 dev dummy3
      ip ro add 1.1.1.4/32 dev dummy4
      ip ro add 1.1.1.5/32 dev dummy5
      ip ro add 1.1.1.6/32 dev dummy6
      ip ro add 1.1.1.7/32 dev dummy7
      ip ro add 1.1.1.8/32 dev dummy8
      ip ro add 1.1.1.9/32 dev dummy9
      ip ro add 1.1.1.10/32 dev dummy10
      ip ro add 1.1.1.11/32 dev dummy11
      ip ro add 1.1.1.12/32 dev dummy12
      ip ro add 1.1.1.13/32 dev dummy13
      ip ro add 1.1.1.14/32 dev dummy14
      ip ro add 1.1.1.15/32 dev dummy15
      ip ro add 1.1.1.16/32 dev dummy16
      ip ro add 1.1.1.17/32 dev dummy17
      ip ro add 1.1.1.18/32 dev dummy18
      ip ro add 1.1.1.19/32 dev dummy19
      ip ro add 1.1.1.20/32 dev dummy20
      ip ro add 1.1.1.21/32 dev dummy21
      ip ro add 1.1.1.22/32 dev dummy22
      ip ro add 1.1.1.23/32 dev dummy23
      ip ro add 1.1.1.24/32 dev dummy24
      ip ro add 1.1.1.25/32 dev dummy25
      ip ro add 1.1.1.26/32 dev dummy26
      ip ro add 1.1.1.27/32 dev dummy27
      ip ro add 1.1.1.28/32 dev dummy28
      ip ro add 1.1.1.29/32 dev dummy29
      ip ro add 1.1.1.30/32 dev dummy30
      ip ro add 1.1.1.31/32 dev dummy31
      ip ro add 1.1.1.32/32 dev dummy32
      
      ip next add id 1 via 1.1.1.1 dev dummy1
      ip next add id 2 via 1.1.1.2 dev dummy2
      ip next add id 3 via 1.1.1.3 dev dummy3
      ip next add id 4 via 1.1.1.4 dev dummy4
      ip next add id 5 via 1.1.1.5 dev dummy5
      ip next add id 6 via 1.1.1.6 dev dummy6
      ip next add id 7 via 1.1.1.7 dev dummy7
      ip next add id 8 via 1.1.1.8 dev dummy8
      ip next add id 9 via 1.1.1.9 dev dummy9
      ip next add id 10 via 1.1.1.10 dev dummy10
      ip next add id 11 via 1.1.1.11 dev dummy11
      ip next add id 12 via 1.1.1.12 dev dummy12
      ip next add id 13 via 1.1.1.13 dev dummy13
      ip next add id 14 via 1.1.1.14 dev dummy14
      ip next add id 15 via 1.1.1.15 dev dummy15
      ip next add id 16 via 1.1.1.16 dev dummy16
      ip next add id 17 via 1.1.1.17 dev dummy17
      ip next add id 18 via 1.1.1.18 dev dummy18
      ip next add id 19 via 1.1.1.19 dev dummy19
      ip next add id 20 via 1.1.1.20 dev dummy20
      ip next add id 21 via 1.1.1.21 dev dummy21
      ip next add id 22 via 1.1.1.22 dev dummy22
      ip next add id 23 via 1.1.1.23 dev dummy23
      ip next add id 24 via 1.1.1.24 dev dummy24
      ip next add id 25 via 1.1.1.25 dev dummy25
      ip next add id 26 via 1.1.1.26 dev dummy26
      ip next add id 27 via 1.1.1.27 dev dummy27
      ip next add id 28 via 1.1.1.28 dev dummy28
      ip next add id 29 via 1.1.1.29 dev dummy29
      ip next add id 30 via 1.1.1.30 dev dummy30
      ip next add id 31 via 1.1.1.31 dev dummy31
      ip next add id 32 via 1.1.1.32 dev dummy32
      
      i=100
      
      while [ $i -le 200 ]
      do
      ip next add id $i group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
      
      	echo $i
      
      	((i++))
      
      done
      
      ip next add id 999 group 1/2/3/4/5/6
      
      ip next ls
      
      ========================
      
      Fixes: ab84be7e ("net: Initial nexthop code")
      Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 18 May, 2020 1 commit
  14. 29 Apr, 2020 2 commits
  15. 13 Mar, 2020 1 commit
  16. 27 Jan, 2020 1 commit
    • net: include struct nhmsg size in nh nlmsg size · f9e95555
      Stephen Worley committed
      Include the size of struct nhmsg when calculating
      how much of a payload to allocate in a new netlink nexthop
      notification message.
      
      Without this, we will fail to fill the skbuff at certain nexthop
      group sizes.
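
To make the size calculation concrete, here is a userspace sketch of how the payload for a group notification might be summed: the fixed header first, then each attribute rounded up to netlink's 4-byte alignment. The structs are local stand-ins with assumed 8-byte layouts, the attribute set (id, group entries, group type) is a simplification, and `group_msg_payload` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN4(n)    (((size_t)(n) + 3u) & ~(size_t)3u)
#define ATTR_HDRLEN  4u  /* netlink attribute header: u16 len + u16 type */

/* Stand-in for the fixed per-message nexthop header. */
struct nhmsg_sketch {
	uint8_t  family, scope, protocol, resvd;
	uint32_t flags;
};

/* Stand-in for one group entry. */
struct grp_entry_sketch {
	uint32_t id;
	uint8_t  weight, resvd1;
	uint16_t resvd2;
};

static_assert(sizeof(struct nhmsg_sketch) == 8, "assumed header size");
static_assert(sizeof(struct grp_entry_sketch) == 8, "assumed entry size");

/* Space one attribute occupies, including header and padding. */
static size_t attr_total_size(size_t payload)
{
	return ALIGN4(ATTR_HDRLEN + payload);
}

/* Payload needed for a group notification with num_members entries. */
static size_t group_msg_payload(size_t num_members)
{
	size_t sz = ALIGN4(sizeof(struct nhmsg_sketch));  /* the fixed header */

	sz += attr_total_size(sizeof(uint32_t));          /* nexthop id */
	sz += attr_total_size(num_members * sizeof(struct grp_entry_sketch));
	sz += attr_total_size(sizeof(uint16_t));          /* group type */
	return sz;
}
```

In this sketch a 19-member group needs 180 bytes; leaving out the first term would under-count by the eight bytes of the fixed header.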
      
      You can reproduce the failure with the following iproute2 commands:
      
      ip link add dummy1 type dummy
      ip link add dummy2 type dummy
      ip link add dummy3 type dummy
      ip link add dummy4 type dummy
      ip link add dummy5 type dummy
      ip link add dummy6 type dummy
      ip link add dummy7 type dummy
      ip link add dummy8 type dummy
      ip link add dummy9 type dummy
      ip link add dummy10 type dummy
      ip link add dummy11 type dummy
      ip link add dummy12 type dummy
      ip link add dummy13 type dummy
      ip link add dummy14 type dummy
      ip link add dummy15 type dummy
      ip link add dummy16 type dummy
      ip link add dummy17 type dummy
      ip link add dummy18 type dummy
      ip link add dummy19 type dummy
      
      ip ro add 1.1.1.1/32 dev dummy1
      ip ro add 1.1.1.2/32 dev dummy2
      ip ro add 1.1.1.3/32 dev dummy3
      ip ro add 1.1.1.4/32 dev dummy4
      ip ro add 1.1.1.5/32 dev dummy5
      ip ro add 1.1.1.6/32 dev dummy6
      ip ro add 1.1.1.7/32 dev dummy7
      ip ro add 1.1.1.8/32 dev dummy8
      ip ro add 1.1.1.9/32 dev dummy9
      ip ro add 1.1.1.10/32 dev dummy10
      ip ro add 1.1.1.11/32 dev dummy11
      ip ro add 1.1.1.12/32 dev dummy12
      ip ro add 1.1.1.13/32 dev dummy13
      ip ro add 1.1.1.14/32 dev dummy14
      ip ro add 1.1.1.15/32 dev dummy15
      ip ro add 1.1.1.16/32 dev dummy16
      ip ro add 1.1.1.17/32 dev dummy17
      ip ro add 1.1.1.18/32 dev dummy18
      ip ro add 1.1.1.19/32 dev dummy19
      
      ip next add id 1 via 1.1.1.1 dev dummy1
      ip next add id 2 via 1.1.1.2 dev dummy2
      ip next add id 3 via 1.1.1.3 dev dummy3
      ip next add id 4 via 1.1.1.4 dev dummy4
      ip next add id 5 via 1.1.1.5 dev dummy5
      ip next add id 6 via 1.1.1.6 dev dummy6
      ip next add id 7 via 1.1.1.7 dev dummy7
      ip next add id 8 via 1.1.1.8 dev dummy8
      ip next add id 9 via 1.1.1.9 dev dummy9
      ip next add id 10 via 1.1.1.10 dev dummy10
      ip next add id 11 via 1.1.1.11 dev dummy11
      ip next add id 12 via 1.1.1.12 dev dummy12
      ip next add id 13 via 1.1.1.13 dev dummy13
      ip next add id 14 via 1.1.1.14 dev dummy14
      ip next add id 15 via 1.1.1.15 dev dummy15
      ip next add id 16 via 1.1.1.16 dev dummy16
      ip next add id 17 via 1.1.1.17 dev dummy17
      ip next add id 18 via 1.1.1.18 dev dummy18
      ip next add id 19 via 1.1.1.19 dev dummy19
      
      ip next add id 1111 group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
      ip next del id 1111
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 22 Nov, 2019 1 commit
  18. 23 Aug, 2019 1 commit
  19. 11 Jun, 2019 2 commits
    • nexthops: add support for replace · 7bf4796d
      David Ahern committed
      Add support for atomically updating a nexthop config.
      
      When updating a nexthop, walk the lists of associated fib entries and
      verify the new config is valid. Replace is done by swapping nh_info
      for single nexthops - new config is applied to old nexthop struct, and
      old config is moved to new nexthop struct. For nexthop groups the same
      applies but for nh_group. In addition for groups the nh_parent reference
      needs to be updated. The old config is released by calling __remove_nexthop
      on the 'new' nexthop which now has the old config. This is done to avoid
      messing around with the list_heads that track which fib entries are
      using the nexthop.
      
      After the swap of config data, bump the sequence counters for FIB entries
      to invalidate any dst entries and send notifications to userspace. The
      notifications include the new nexthop spec as well as any fib entries
      using the updated nexthop struct.
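
The swap-based replace described above can be sketched as follows: the long-lived object keeps its identity (and anything that points at it), only the config pointers trade places, and the old config is then released through the temporary object. The types and `replace_config` are hypothetical, and the sketch ignores RCU, notifications and validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct config_sketch { int gateway; };

struct nh_sketch {
	int id;                     /* identity seen by users of the object */
	struct config_sketch *cfg;  /* replaceable configuration */
};

/* Move new_nh's config into old, then free the config old used to hold.
 * Users holding a pointer to 'old' see the new config afterwards. */
static void replace_config(struct nh_sketch *old, struct nh_sketch *new_nh)
{
	struct config_sketch *tmp;

	assert(old != NULL && new_nh != NULL);

	tmp = old->cfg;
	old->cfg = new_nh->cfg;
	new_nh->cfg = tmp;

	/* new_nh now carries the old config; release it there. */
	free(new_nh->cfg);
	new_nh->cfg = NULL;
}
```

Because `old` itself is never replaced, lists that track which entries use it do not have to be touched.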
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nexthops: Add ipv6 helper to walk all fib6_nh in a nexthop struct · f88c9aa1
      David Ahern committed
      IPv6 has traditionally had a single fib6_nh per fib6_info. With
      nexthops we can have multiple fib6_nh associated with a fib6_info.
      Add a nexthop helper to invoke a callback for each fib6_nh in a
      'struct nexthop'. If the callback returns non-0, the loop is
      stopped and the return value passed to the caller.
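
The callback contract described above (visit each element, stop on the first non-zero return and hand that value back) can be sketched over a plain array. `path_sketch`, `for_each_path` and the example callback `find_ifindex` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

struct path_sketch { int ifindex; };

typedef int (*path_cb)(struct path_sketch *path, void *arg);

/* Invoke cb on each path; stop early and return cb's value if non-zero. */
static int for_each_path(struct path_sketch *paths, size_t n,
			 path_cb cb, void *arg)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = cb(&paths[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback state: look for a path on a given interface. */
struct find_ctx {
	int target;   /* ifindex to look for */
	int visited;  /* number of paths inspected so far */
};

/* Returns 1 (stopping the walk) when the target interface is found. */
static int find_ifindex(struct path_sketch *path, void *arg)
{
	struct find_ctx *ctx = arg;

	ctx->visited++;
	return path->ifindex == ctx->target;
}
```

A return of 0 from `for_each_path` means every path was visited without the callback asking to stop.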
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 05 Jun, 2019 1 commit
    • ipv6: Plumb support for nexthop object in a fib6_info · f88d8ea6
      David Ahern committed
      Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
      fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info
      referencing a nexthop object can not have 'sibling' entries (the old way
      of doing multipath routes), the nh_list is a union with fib6_siblings.
      
      Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
      using a nexthop instance. Update __remove_nexthop_fib to walk f6i_list
      and delete fib entries using the nexthop.
      
      Add a few nexthop helpers for use when a nexthop is added to fib6_info:
      - nexthop_fib6_nh - return first fib6_nh in a nexthop object
      - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
        if the fib6_info references a nexthop object
      - nexthop_path_fib6_result - similar to ipv4, select a path within a
        multipath nexthop object. If the nexthop is a blackhole, set
        fib6_result type to RTN_BLACKHOLE, and set the REJECT flag
      
      Update the fib6_info references to check for nh and take a different path
      as needed:
      - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
        be coalesced with other fib entries into a multipath route
      - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
        a nexthop
      - addrconf (host routes), RAs and info entries (anything configured via
        ndisc) do not use nexthop objects
      - fib6_info_destroy_rcu - put reference to nexthop object
      - fib6_purge_rt - drop fib6_info from f6i_list
      - fib6_select_path - update to use the new nexthop_path_fib6_result when
        fib entry uses a nexthop object
      - rt6_device_match - update to catch use of nexthop object as a blackhole
        and set fib6_type and flags.
      - ip6_route_info_create - don't add space for fib6_nh if fib entry is
        going to reference a nexthop object, take a reference to nexthop object,
        disallow use of source routing
      - rt6_nlmsg_size - add space for RTA_NH_ID
      - add rt6_fill_node_nexthop to add nexthop data on a dump
      
      As with ipv4, most of the changes push existing code into the else branch
      of whether the fib entry uses a nexthop object.
      
      Update the nexthop code to walk f6i_list when a nexthop is deleted and
      remove fib entries referencing it.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>