1. 23 6月, 2021 1 次提交
    • E
      vxlan: add missing rcu_read_lock() in neigh_reduce() · 85e8b032
      Eric Dumazet 提交于
      syzbot complained in neigh_reduce(), because rcu_read_lock_bh()
      is treated differently than rcu_read_lock()
      
      WARNING: suspicious RCU usage
      5.13.0-rc6-syzkaller #0 Not tainted
      -----------------------------
      include/net/addrconf.h:313 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      3 locks held by kworker/0:0/5:
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x871/0x1600 kernel/workqueue.c:2247
       #1: ffffc90000ca7da8 ((work_completion)(&port->wq)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1600 kernel/workqueue.c:2251
       #2: ffffffff8bf795c0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1da/0x3130 net/core/dev.c:4180
      
      stack backtrace:
      CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.13.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events ipvlan_process_multicast
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       __in6_dev_get include/net/addrconf.h:313 [inline]
       __in6_dev_get include/net/addrconf.h:311 [inline]
       neigh_reduce drivers/net/vxlan.c:2167 [inline]
       vxlan_xmit+0x34d5/0x4c30 drivers/net/vxlan.c:2919
       __netdev_start_xmit include/linux/netdevice.h:4944 [inline]
       netdev_start_xmit include/linux/netdevice.h:4958 [inline]
       xmit_one net/core/dev.c:3654 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3670
       __dev_queue_xmit+0x2133/0x3130 net/core/dev.c:4246
       ipvlan_process_multicast+0xa99/0xd70 drivers/net/ipvlan/ipvlan_core.c:287
       process_one_work+0x98d/0x1600 kernel/workqueue.c:2276
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2422
       kthread+0x3b1/0x4a0 kernel/kthread.c:313
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      Fixes: f564f45c ("vxlan: add ipv6 proxy support")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85e8b032
  2. 31 3月, 2021 1 次提交
  3. 26 3月, 2021 1 次提交
    • A
      vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply · 30a93d2b
      Antoine Tenart 提交于
      When the interface is part of a bridge or an Open vSwitch port and a
      packet exceed a PMTU estimate, an ICMP reply is sent to the sender. When
      using the external mode (collect metadata) the source and destination
      addresses are reversed, so that Open vSwitch can match the packet
      against an existing (reverse) flow.
      
      But inverting the source and destination addresses in the shared
      ip_tunnel_info will make following packets of the flow to use a wrong
      destination address (packets will be tunnelled to itself), if the flow
      isn't updated. Which happens with Open vSwitch, until the flow times
      out.
      
      Fixes this by uncloning the skb's ip_tunnel_info before inverting its
      source and destination addresses, so that the modification will only be
      made for the PTMU packet, not the following ones.
      
      Fixes: fc68c995 ("vxlan: Support for PMTU discovery on directly bridged links")
      Tested-by: NEelco Chaudron <echaudro@redhat.com>
      Reviewed-by: NEelco Chaudron <echaudro@redhat.com>
      Signed-off-by: NAntoine Tenart <atenart@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30a93d2b
  4. 14 3月, 2021 1 次提交
  5. 24 2月, 2021 1 次提交
    • T
      vxlan: move debug check after netdev unregister · 92584ddf
      Taehee Yoo 提交于
      The debug check must be done after unregister_netdevice_many() call --
      the hlist_del_rcu() for this is done inside .ndo_stop.
      
      This is the same with commit 0fda7600 ("geneve: move debug check after
      netdev unregister")
      
      Test commands:
          ip netns del A
          ip netns add A
          ip netns add B
      
          ip netns exec B ip link add vxlan0 type vxlan vni 100 local 10.0.0.1 \
      	    remote 10.0.0.2 dstport 4789 srcport 4789 4789
          ip netns exec B ip link set vxlan0 netns A
          ip netns exec A ip link set vxlan0 up
          ip netns del B
      
      Splat looks like:
      [   73.176249][    T7] ------------[ cut here ]------------
      [   73.178662][    T7] WARNING: CPU: 4 PID: 7 at drivers/net/vxlan.c:4743 vxlan_exit_batch_net+0x52e/0x720 [vxlan]
      [   73.182597][    T7] Modules linked in: vxlan openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_core nfp mlxfw ixgbevf tls sch_fq_codel nf_tables nfnetlink ip_tables x_tables unix
      [   73.190113][    T7] CPU: 4 PID: 7 Comm: kworker/u16:0 Not tainted 5.11.0-rc7+ #838
      [   73.193037][    T7] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   73.196986][    T7] Workqueue: netns cleanup_net
      [   73.198946][    T7] RIP: 0010:vxlan_exit_batch_net+0x52e/0x720 [vxlan]
      [   73.201509][    T7] Code: 00 01 00 00 0f 84 39 fd ff ff 48 89 ca 48 c1 ea 03 80 3c 1a 00 0f 85 a6 00 00 00 89 c2 48 83 c2 02 49 8b 14 d4 48 85 d2 74 ce <0f> 0b eb ca e8 b9 51 db dd 84 c0 0f 85 4a fe ff ff 48 c7 c2 80 bc
      [   73.208813][    T7] RSP: 0018:ffff888100907c10 EFLAGS: 00010286
      [   73.211027][    T7] RAX: 000000000000003c RBX: dffffc0000000000 RCX: ffff88800ec411f0
      [   73.213702][    T7] RDX: ffff88800a278000 RSI: ffff88800fc78c70 RDI: ffff88800fc78070
      [   73.216169][    T7] RBP: ffff88800b5cbdc0 R08: fffffbfff424de61 R09: fffffbfff424de61
      [   73.218463][    T7] R10: ffffffffa126f307 R11: fffffbfff424de60 R12: ffff88800ec41000
      [   73.220794][    T7] R13: ffff888100907d08 R14: ffff888100907c50 R15: ffff88800fc78c40
      [   73.223337][    T7] FS:  0000000000000000(0000) GS:ffff888114800000(0000) knlGS:0000000000000000
      [   73.225814][    T7] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   73.227616][    T7] CR2: 0000562b5cb4f4d0 CR3: 0000000105fbe001 CR4: 00000000003706e0
      [   73.229700][    T7] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   73.231820][    T7] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   73.233844][    T7] Call Trace:
      [   73.234698][    T7]  ? vxlan_err_lookup+0x3c0/0x3c0 [vxlan]
      [   73.235962][    T7]  ? ops_exit_list.isra.11+0x93/0x140
      [   73.237134][    T7]  cleanup_net+0x45e/0x8a0
      [ ... ]
      
      Fixes: 57b61127 ("vxlan: speedup vxlan tunnels dismantle")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Link: https://lore.kernel.org/r/20210221154552.11749-1-ap420073@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      92584ddf
  6. 19 1月, 2021 1 次提交
  7. 08 1月, 2021 1 次提交
  8. 11 12月, 2020 1 次提交
  9. 03 12月, 2020 1 次提交
  10. 01 12月, 2020 2 次提交
  11. 10 11月, 2020 1 次提交
  12. 07 11月, 2020 2 次提交
  13. 04 11月, 2020 1 次提交
  14. 06 10月, 2020 1 次提交
  15. 27 9月, 2020 1 次提交
    • J
      Revert "vxlan: move encapsulation warning" · 435be28b
      Jakub Kicinski 提交于
      This reverts commit 546c044c.
      
      Nothing prevents user from sending frames to "external" VxLAN devices.
      In fact kernel itself may generate icmp chatter.
      
      This is fine, such frames should be dropped.
      
      The point of the "missing encapsulation" warning was that
      frames with missing encap should not make it into vxlan_xmit_one().
      And vxlan_xmit() drops them cleanly, so let it just do that.
      
      Without this revert the warning is triggered by the udp_tunnel_nic.sh
      test, but the minimal repro is:
      
      $ ip link add vxlan0 type vxlan \
           	      	     group 239.1.1.1 \
      		     dev lo \
      		     dstport 1234 \
      		     external
      $ ip li set dev vxlan0 up
      
      [  419.165981] vxlan0: Missing encapsulation instructions
      [  419.166551] WARNING: CPU: 0 PID: 1041 at drivers/net/vxlan.c:2889 vxlan_xmit+0x15c0/0x1fc0 [vxlan]
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      435be28b
  16. 26 9月, 2020 5 次提交
  17. 06 8月, 2020 1 次提交
    • H
      Revert "vxlan: fix tos value before xmit" · a0dced17
      Hangbin Liu 提交于
      This reverts commit 71130f29.
      
      In commit 71130f29 ("vxlan: fix tos value before xmit") we want to
      make sure the tos value are filtered by RT_TOS() based on RFC1349.
      
             0     1     2     3     4     5     6     7
          +-----+-----+-----+-----+-----+-----+-----+-----+
          |   PRECEDENCE    |          TOS          | MBZ |
          +-----+-----+-----+-----+-----+-----+-----+-----+
      
      But RFC1349 has been obsoleted by RFC2474. The new DSCP field defined like
      
             0     1     2     3     4     5     6     7
          +-----+-----+-----+-----+-----+-----+-----+-----+
          |          DS FIELD, DSCP           | ECN FIELD |
          +-----+-----+-----+-----+-----+-----+-----+-----+
      
      So with
      
      IPTOS_TOS_MASK          0x1E
      RT_TOS(tos)		((tos)&IPTOS_TOS_MASK)
      
      the first 3 bits DSCP info will get lost.
      
      To take all the DSCP info in xmit, we should revert the patch and just push
      all tos bits to ip_tunnel_ecn_encap(), which will handling ECN field later.
      
      Fixes: 71130f29 ("vxlan: fix tos value before xmit")
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NGuillaume Nault <gnault@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a0dced17
  18. 05 8月, 2020 2 次提交
    • S
      vxlan: Support for PMTU discovery on directly bridged links · fc68c995
      Stefano Brivio 提交于
      If the interface is a bridge or Open vSwitch port, and we can't
      forward a packet because it exceeds the local PMTU estimate,
      trigger an ICMP or ICMPv6 reply to the sender, using the same
      interface to forward it back.
      
      If metadata collection is enabled, reverse destination and source
      addresses, so that Open vSwitch is able to match this packet against
      the existing, reverse flow.
      
      v2: Use netif_is_any_bridge_port() (David Ahern)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc68c995
    • S
      tunnels: PMTU discovery support for directly bridged IP packets · 4cb47a86
      Stefano Brivio 提交于
      It's currently possible to bridge Ethernet tunnels carrying IP
      packets directly to external interfaces without assigning them
      addresses and routes on the bridged network itself: this is the case
      for UDP tunnels bridged with a standard bridge or by Open vSwitch.
      
      PMTU discovery is currently broken with those configurations, because
      the encapsulation effectively decreases the MTU of the link, and
      while we are able to account for this using PMTU discovery on the
      lower layer, we don't have a way to relay ICMP or ICMPv6 messages
      needed by the sender, because we don't have valid routes to it.
      
      On the other hand, as a tunnel endpoint, we can't fragment packets
      as a general approach: this is for instance clearly forbidden for
      VXLAN by RFC 7348, section 4.3:
      
         VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
         fragment encapsulated VXLAN packets due to the larger frame size.
         The destination VTEP MAY silently discard such VXLAN fragments.
      
      The same paragraph recommends that the MTU over the physical network
      accomodates for encapsulations, but this isn't a practical option for
      complex topologies, especially for typical Open vSwitch use cases.
      
      Further, it states that:
      
         Other techniques like Path MTU discovery (see [RFC1191] and
         [RFC1981]) MAY be used to address this requirement as well.
      
      Now, PMTU discovery already works for routed interfaces, we get
      route exceptions created by the encapsulation device as they receive
      ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
      we already rebuild those messages with the appropriate MTU and route
      them back to the sender.
      
      Add the missing bits for bridged cases:
      
      - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
        to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
        RFC 4443 section 2.4 for ICMPv6. This function is already called by
        UDP tunnels
      
      - a new function generating those ICMP or ICMPv6 replies. We can't
        reuse icmp_send() and icmp6_send() as we don't see the sender as a
        valid destination. This doesn't need to be generic, as we don't
        cover any other type of ICMP errors given that we only provide an
        encapsulation function to the sender
      
      While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
      we might receive GSO buffers here, and the passed headroom already
      includes the inner MAC length, so we don't have to account for it
      a second time (that would imply three MAC headers on the wire, but
      there are just two).
      
      This issue became visible while bridging IPv6 packets with 4500 bytes
      of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
      bytes of encapsulation headroom, we would advertise MTU as 3950, and
      we would reject fragmented IPv6 datagrams of 3958 bytes size on the
      wire. We're exclusively dealing with network MTU here, though, so we
      could get Ethernet frames up to 3964 octets in that case.
      
      v2:
      - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
      - split IPv4/IPv6 functions (David Ahern)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cb47a86
  19. 02 8月, 2020 1 次提交
    • T
      vxlan: fix memleak of fdb · fda2ec62
      Taehee Yoo 提交于
      When vxlan interface is deleted, all fdbs are deleted by vxlan_flush().
      vxlan_flush() flushes fdbs but it doesn't delete fdb, which contains
      all-zeros-mac because it is deleted by vxlan_uninit().
      But vxlan_uninit() deletes only the fdb, which contains both all-zeros-mac
      and default vni.
      So, the fdb, which contains both all-zeros-mac and non-default vni
      will not be deleted.
      
      Test commands:
          ip link add vxlan0 type vxlan dstport 4789 external
          ip link set vxlan0 up
          bridge fdb add to 00:00:00:00:00:00 dst 172.0.0.1 dev vxlan0 via lo \
      	    src_vni 10000 self permanent
          ip link del vxlan0
      
      kmemleak reports as follows:
      unreferenced object 0xffff9486b25ced88 (size 96):
        comm "bridge", pid 2151, jiffies 4294701712 (age 35506.901s)
        hex dump (first 32 bytes):
          02 00 00 00 ac 00 00 01 40 00 09 b1 86 94 ff ff  ........@.......
          46 02 00 00 00 00 00 00 a7 03 00 00 12 b5 6a 6b  F.............jk
        backtrace:
          [<00000000c10cf651>] vxlan_fdb_append.part.51+0x3c/0xf0 [vxlan]
          [<000000006b31a8d9>] vxlan_fdb_create+0x184/0x1a0 [vxlan]
          [<0000000049399045>] vxlan_fdb_update+0x12f/0x220 [vxlan]
          [<0000000090b1ef00>] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
          [<0000000056633c2c>] rtnl_fdb_add+0x187/0x270
          [<00000000dd5dfb6b>] rtnetlink_rcv_msg+0x264/0x490
          [<00000000fc44dd54>] netlink_rcv_skb+0x4a/0x110
          [<00000000dff433e7>] netlink_unicast+0x18e/0x250
          [<00000000b87fb421>] netlink_sendmsg+0x2e9/0x400
          [<000000002ed55153>] ____sys_sendmsg+0x237/0x260
          [<00000000faa51c66>] ___sys_sendmsg+0x88/0xd0
          [<000000006c3982f1>] __sys_sendmsg+0x4e/0x80
          [<00000000a8f875d2>] do_syscall_64+0x56/0xe0
          [<000000003610eefa>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff9486b1c40080 (size 128):
        comm "bridge", pid 2157, jiffies 4294701754 (age 35506.866s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 f8 dc 42 b2 86 94 ff ff  ..........B.....
          6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        backtrace:
          [<00000000a2981b60>] vxlan_fdb_create+0x67/0x1a0 [vxlan]
          [<0000000049399045>] vxlan_fdb_update+0x12f/0x220 [vxlan]
          [<0000000090b1ef00>] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
          [<0000000056633c2c>] rtnl_fdb_add+0x187/0x270
          [<00000000dd5dfb6b>] rtnetlink_rcv_msg+0x264/0x490
          [<00000000fc44dd54>] netlink_rcv_skb+0x4a/0x110
          [<00000000dff433e7>] netlink_unicast+0x18e/0x250
          [<00000000b87fb421>] netlink_sendmsg+0x2e9/0x400
          [<000000002ed55153>] ____sys_sendmsg+0x237/0x260
          [<00000000faa51c66>] ___sys_sendmsg+0x88/0xd0
          [<000000006c3982f1>] __sys_sendmsg+0x4e/0x80
          [<00000000a8f875d2>] do_syscall_64+0x56/0xe0
          [<000000003610eefa>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 3ad7a4b1 ("vxlan: support fdb and learning in COLLECT_METADATA mode")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fda2ec62
  20. 30 7月, 2020 1 次提交
    • I
      vxlan: Ensure FDB dump is performed under RCU · b5141915
      Ido Schimmel 提交于
      The commit cited below removed the RCU read-side critical section from
      rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked
      without RCU protection.
      
      This results in the following warning [1] in the VXLAN driver, which
      relied on the callback being invoked from an RCU read-side critical
      section.
      
      Fix this by calling rcu_read_lock() in the VXLAN driver, as already done
      in the bridge driver.
      
      [1]
      WARNING: suspicious RCU usage
      5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted
      -----------------------------
      drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by bridge/166:
       #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090
      
      stack backtrace:
      CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
      Call Trace:
       dump_stack+0x100/0x184
       lockdep_rcu_suspicious+0x153/0x15d
       vxlan_fdb_dump+0x51e/0x6d0
       rtnl_fdb_dump+0x4dc/0xad0
       netlink_dump+0x540/0x1090
       __netlink_dump_start+0x695/0x950
       rtnetlink_rcv_msg+0x802/0xbd0
       netlink_rcv_skb+0x17a/0x480
       rtnetlink_rcv+0x22/0x30
       netlink_unicast+0x5ae/0x890
       netlink_sendmsg+0x98a/0xf40
       __sys_sendto+0x279/0x3b0
       __x64_sys_sendto+0xe6/0x1a0
       do_syscall_64+0x54/0xa0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fe14fa2ade0
      Code: Bad RIP value.
      RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0
      RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003
      RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      Fixes: 5e6d2435 ("bridge: netlink dump interface at par with brctl")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5141915
  21. 11 7月, 2020 1 次提交
    • J
      udp_tunnel: add central NIC RX port offload infrastructure · cc4e3835
      Jakub Kicinski 提交于
      Cater to devices which:
       (a) may want to sleep in the callbacks;
       (b) only have IPv4 support;
       (c) need all the programming to happen while the netdev is up.
      
      Drivers attach UDP tunnel offload info struct to their netdevs,
      where they declare how many UDP ports of various tunnel types
      they support. Core takes care of tracking which ports to offload.
      
      Use a fixed-size array since this matches what almost all drivers
      do, and avoids a complexity and uncertainty around memory allocations
      in an atomic context.
      
      Make sure that tunnel drivers don't try to replay the ports when
      new NIC netdev is registered. Automatic replays would mess up
      reference counting, and will be removed completely once all drivers
      are converted.
      
      v4:
       - use a #define NULL to avoid build issues with CONFIG_INET=n.
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc4e3835
  22. 26 6月, 2020 1 次提交
  23. 11 6月, 2020 2 次提交
    • D
      vxlan: Remove access to nexthop group struct · 50cb8769
      David Ahern 提交于
      vxlan driver should be using helpers to access nexthop struct
      internals. Remove open check if whether nexthop is multipath in
      favor of the existing nexthop_is_multipath helper. Add a new
      helper, nexthop_has_v4, to cover the need to check has_v4 in
      a group.
      
      Fixes: 1274e1cc ("vxlan: ecmp support for mac fdb entries")
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50cb8769
    • D
      nexthop: Fix fdb labeling for groups · ce9ac056
      David Ahern 提交于
      fdb nexthops are marked with a flag. For standalone nexthops, a flag was
      added to the nh_info struct. For groups that flag was added to struct
      nexthop when it should have been added to the group information. Fix
      by removing the flag from the nexthop struct and adding a flag to nh_group
      that mirrors nh_info and is really only a caching of the individual types.
      Add a helper, nexthop_is_fdb, for use by the vxlan code and fixup the
      internal code to use the flag from either nh_info or nh_group.
      
      v2
      - propagate fdb_nh in remove_nh_grp_entry
      
      Fixes: 38428d68 ("nexthop: support for fdb ecmp nexthops")
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce9ac056
  24. 10 6月, 2020 1 次提交
    • C
      net: change addr_list_lock back to static key · 845e0ebb
      Cong Wang 提交于
      The dynamic key update for addr_list_lock still causes troubles,
      for example the following race condition still exists:
      
      CPU 0:				CPU 1:
      (RCU read lock)			(RTNL lock)
      dev_mc_seq_show()		netdev_update_lockdep_key()
      				  -> lockdep_unregister_key()
       -> netif_addr_lock_bh()
      
      because lockdep doesn't provide an API to update it atomically.
      Therefore, we have to move it back to static keys and use subclass
      for nest locking like before.
      
      In commit 1a33e10e ("net: partially revert dynamic lockdep key
      changes"), I already reverted most parts of commit ab92d68f
      ("net: core: add generic lockdep keys").
      
      This patch reverts the rest and also part of commit f3b0a18b
      ("net: remove unnecessary variables and callback"). After this
      patch, addr_list_lock changes back to using static keys and
      subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
      not have to change back to ->ndo_get_lock_subclass().
      
      And hopefully this reduces some syzbot lockdep noises too.
      
      Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      845e0ebb
  25. 02 6月, 2020 2 次提交
  26. 31 5月, 2020 2 次提交
  27. 25 5月, 2020 1 次提交
    • I
      vxlan: Do not assume RTNL is held in vxlan_fdb_info() · 06ec313e
      Ido Schimmel 提交于
      vxlan_fdb_info() is not always called with RTNL held or from an RCU
      read-side critical section. For example, in the following call path:
      
      vxlan_cleanup()
        vxlan_fdb_destroy()
          vxlan_fdb_notify()
            __vxlan_fdb_notify()
              vxlan_fdb_info()
      
      The use of rtnl_dereference() can therefore result in the following
      splat [1].
      
      Fix this by dereferencing the nexthop under RCU read-side critical
      section.
      
      [1]
      [May24 22:56] =============================
      [  +0.004676] WARNING: suspicious RCU usage
      [  +0.004614] 5.7.0-rc5-custom-16219-g201392003491 #2772 Not tainted
      [  +0.007116] -----------------------------
      [  +0.004657] drivers/net/vxlan.c:276 suspicious rcu_dereference_check() usage!
      [  +0.008164]
                    other info that might help us debug this:
      
      [  +0.009126]
                    rcu_scheduler_active = 2, debug_locks = 1
      [  +0.007504] 5 locks held by bash/6892:
      [  +0.004392]  #0: ffff8881d47e3410 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: __do_execve_file.isra.27+0x392/0x23c0
      [  +0.011795]  #1: ffff8881d47e34b0 (&sig->exec_update_mutex){+.+.}-{3:3}, at: flush_old_exec+0x510/0x2030
      [  +0.010947]  #2: ffff8881a141b0b0 (ptlock_ptr(page)#2){+.+.}-{2:2}, at: unmap_page_range+0x9c0/0x2590
      [  +0.010585]  #3: ffff888230009d50 ((&vxlan->age_timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x800
      [  +0.010192]  #4: ffff888183729bc8 (&vxlan->hash_lock[h]){+.-.}-{2:2}, at: vxlan_cleanup+0x133/0x4a0
      [  +0.010382]
                    stack backtrace:
      [  +0.005103] CPU: 1 PID: 6892 Comm: bash Not tainted 5.7.0-rc5-custom-16219-g201392003491 #2772
      [  +0.009675] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  +0.010155] Call Trace:
      [  +0.002775]  <IRQ>
      [  +0.002313]  dump_stack+0xfd/0x178
      [  +0.003895]  lockdep_rcu_suspicious+0x14a/0x153
      [  +0.005157]  vxlan_fdb_info+0xe39/0x12a0
      [  +0.004775]  __vxlan_fdb_notify+0xb8/0x160
      [  +0.004672]  vxlan_fdb_notify+0x8e/0xe0
      [  +0.004370]  vxlan_fdb_destroy+0x117/0x330
      [  +0.004662]  vxlan_cleanup+0x1aa/0x4a0
      [  +0.004329]  call_timer_fn+0x1c4/0x800
      [  +0.004357]  run_timer_softirq+0x129d/0x17e0
      [  +0.004762]  __do_softirq+0x24c/0xaef
      [  +0.004232]  irq_exit+0x167/0x190
      [  +0.003767]  smp_apic_timer_interrupt+0x1dd/0x6a0
      [  +0.005340]  apic_timer_interrupt+0xf/0x20
      [  +0.004620]  </IRQ>
      
      Fixes: 1274e1cc ("vxlan: ecmp support for mac fdb entries")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NAmit Cohen <amitc@mellanox.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06ec313e
  28. 23 5月, 2020 2 次提交
    • R
      vxlan: support for nexthop notifiers · c7cdbe2e
      Roopa Prabhu 提交于
      vxlan driver registers for nexthop add/del notifiers to
      cleanup fdb entries pointing to such nexthops.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7cdbe2e
    • R
      vxlan: ecmp support for mac fdb entries · 1274e1cc
      Roopa Prabhu 提交于
      Todays vxlan mac fdb entries can point to multiple remote
      ips (rdsts) with the sole purpose of replicating
      broadcast-multicast and unknown unicast packets to those remote ips.
      
      E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
      load balanced to remote switches (vteps) belonging to the
      same multi-homed ethernet segment (E-VPN multihoming is analogous
      to multi-homed LAG implementations, but with the inter-switch
      peerlink replaced with a vxlan tunnel). In other words it needs
      support for mac ecmp. Furthermore, for faster convergence, E-VPN
      multihoming needs the ability to update fdb ecmp nexthops independent
      of the fdb entries.
      
      New route nexthop API is perfect for this usecase.
      This patch extends the vxlan fdb code to take a nexthop id
      pointing to an ecmp nexthop group.
      
      Changes include:
      - New NDA_NH_ID attribute for fdbs
      - Use the newly added fdb nexthop groups
      - makes vxlan rdsts and nexthop handling code mutually
        exclusive
      - since this is a new use-case and the requirement is for ecmp
      nexthop groups, the fdb add and update path checks that the
      nexthop is really an ecmp nexthop group. This check can be relaxed
      in the future, if we want to introduce replication fdb nexthop groups
      and allow its use in lieu of current rdst lists.
      - fdb update requests with nexthop id's only allowed for existing
      fdb's that have nexthop id's
      - learning will not override an existing fdb entry with nexthop
      group
      - I have wrapped the switchdev offload code around the presence of
      rdst
      
      [1] E-VPN RFC https://tools.ietf.org/html/rfc7432
      [2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
      [3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf
      
      Includes a null check fix in vxlan_xmit from Nikolay
      
      v2 - Fixed build issue:
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1274e1cc
  29. 24 4月, 2020 1 次提交