1. 02 6月, 2020 2 次提交
  2. 31 5月, 2020 2 次提交
  3. 25 5月, 2020 1 次提交
    • I
      vxlan: Do not assume RTNL is held in vxlan_fdb_info() · 06ec313e
      Ido Schimmel 提交于
      vxlan_fdb_info() is not always called with RTNL held or from an RCU
      read-side critical section. For example, in the following call path:
      
      vxlan_cleanup()
        vxlan_fdb_destroy()
          vxlan_fdb_notify()
            __vxlan_fdb_notify()
              vxlan_fdb_info()
      
      The use of rtnl_dereference() can therefore result in the following
      splat [1].
      
      Fix this by dereferencing the nexthop under RCU read-side critical
      section.
      
      [1]
      [May24 22:56] =============================
      [  +0.004676] WARNING: suspicious RCU usage
      [  +0.004614] 5.7.0-rc5-custom-16219-g201392003491 #2772 Not tainted
      [  +0.007116] -----------------------------
      [  +0.004657] drivers/net/vxlan.c:276 suspicious rcu_dereference_check() usage!
      [  +0.008164]
                    other info that might help us debug this:
      
      [  +0.009126]
                    rcu_scheduler_active = 2, debug_locks = 1
      [  +0.007504] 5 locks held by bash/6892:
      [  +0.004392]  #0: ffff8881d47e3410 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: __do_execve_file.isra.27+0x392/0x23c0
      [  +0.011795]  #1: ffff8881d47e34b0 (&sig->exec_update_mutex){+.+.}-{3:3}, at: flush_old_exec+0x510/0x2030
      [  +0.010947]  #2: ffff8881a141b0b0 (ptlock_ptr(page)#2){+.+.}-{2:2}, at: unmap_page_range+0x9c0/0x2590
      [  +0.010585]  #3: ffff888230009d50 ((&vxlan->age_timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x800
      [  +0.010192]  #4: ffff888183729bc8 (&vxlan->hash_lock[h]){+.-.}-{2:2}, at: vxlan_cleanup+0x133/0x4a0
      [  +0.010382]
                    stack backtrace:
      [  +0.005103] CPU: 1 PID: 6892 Comm: bash Not tainted 5.7.0-rc5-custom-16219-g201392003491 #2772
      [  +0.009675] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  +0.010155] Call Trace:
      [  +0.002775]  <IRQ>
      [  +0.002313]  dump_stack+0xfd/0x178
      [  +0.003895]  lockdep_rcu_suspicious+0x14a/0x153
      [  +0.005157]  vxlan_fdb_info+0xe39/0x12a0
      [  +0.004775]  __vxlan_fdb_notify+0xb8/0x160
      [  +0.004672]  vxlan_fdb_notify+0x8e/0xe0
      [  +0.004370]  vxlan_fdb_destroy+0x117/0x330
      [  +0.004662]  vxlan_cleanup+0x1aa/0x4a0
      [  +0.004329]  call_timer_fn+0x1c4/0x800
      [  +0.004357]  run_timer_softirq+0x129d/0x17e0
      [  +0.004762]  __do_softirq+0x24c/0xaef
      [  +0.004232]  irq_exit+0x167/0x190
      [  +0.003767]  smp_apic_timer_interrupt+0x1dd/0x6a0
      [  +0.005340]  apic_timer_interrupt+0xf/0x20
      [  +0.004620]  </IRQ>
      
      Fixes: 1274e1cc ("vxlan: ecmp support for mac fdb entries")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NAmit Cohen <amitc@mellanox.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06ec313e
  4. 23 5月, 2020 2 次提交
    • R
      vxlan: support for nexthop notifiers · c7cdbe2e
      Roopa Prabhu 提交于
      vxlan driver registers for nexthop add/del notifiers to
      cleanup fdb entries pointing to such nexthops.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7cdbe2e
    • R
      vxlan: ecmp support for mac fdb entries · 1274e1cc
      Roopa Prabhu 提交于
      Todays vxlan mac fdb entries can point to multiple remote
      ips (rdsts) with the sole purpose of replicating
      broadcast-multicast and unknown unicast packets to those remote ips.
      
      E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
      load balanced to remote switches (vteps) belonging to the
      same multi-homed ethernet segment (E-VPN multihoming is analogous
      to multi-homed LAG implementations, but with the inter-switch
      peerlink replaced with a vxlan tunnel). In other words it needs
      support for mac ecmp. Furthermore, for faster convergence, E-VPN
      multihoming needs the ability to update fdb ecmp nexthops independent
      of the fdb entries.
      
      New route nexthop API is perfect for this usecase.
      This patch extends the vxlan fdb code to take a nexthop id
      pointing to an ecmp nexthop group.
      
      Changes include:
      - New NDA_NH_ID attribute for fdbs
      - Use the newly added fdb nexthop groups
      - makes vxlan rdsts and nexthop handling code mutually
        exclusive
      - since this is a new use-case and the requirement is for ecmp
      nexthop groups, the fdb add and update path checks that the
      nexthop is really an ecmp nexthop group. This check can be relaxed
      in the future, if we want to introduce replication fdb nexthop groups
      and allow its use in lieu of current rdst lists.
      - fdb update requests with nexthop id's only allowed for existing
      fdb's that have nexthop id's
      - learning will not override an existing fdb entry with nexthop
      group
      - I have wrapped the switchdev offload code around the presence of
      rdst
      
      [1] E-VPN RFC https://tools.ietf.org/html/rfc7432
      [2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
      [3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf
      
      Includes a null check fix in vxlan_xmit from Nikolay
      
      v2 - Fixed build issue:
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1274e1cc
  5. 24 4月, 2020 1 次提交
  6. 19 3月, 2020 1 次提交
  7. 03 1月, 2020 2 次提交
  8. 10 12月, 2019 1 次提交
  9. 05 12月, 2019 1 次提交
    • S
      net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup · 6c8991f4
      Sabrina Dubroca 提交于
      ipv6_stub uses the ip6_dst_lookup function to allow other modules to
      perform IPv6 lookups. However, this function skips the XFRM layer
      entirely.
      
      All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
      ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
      which calls xfrm_lookup_route(). This patch fixes this inconsistent
      behavior by switching the stub to ip6_dst_lookup_flow, which also calls
      xfrm_lookup_route().
      
      This requires some changes in all the callers, as these two functions
      take different arguments and have different return types.
      
      Fixes: 5f81bd2e ("ipv6: export a stub for IPv6 symbols used by vxlan")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c8991f4
  10. 13 11月, 2019 1 次提交
  11. 31 10月, 2019 2 次提交
  12. 30 10月, 2019 1 次提交
  13. 25 10月, 2019 1 次提交
    • T
      vxlan: add adjacent link to limit depth level · 0ce1822c
      Taehee Yoo 提交于
      Current vxlan code doesn't limit the number of nested devices.
      Nested devices would be handled recursively and this routine needs
      huge stack memory. So, unlimited nested devices could make
      stack overflow.
      
      In order to fix this issue, this patch adds adjacent links.
      The adjacent link APIs internally check the depth level.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add vxlan0 type vxlan id 0 group 239.1.1.1 dev dummy0 \
      	    dstport 4789
          for i in {1..100}
          do
      	    let A=$i-1
      	    ip link add vxlan$i type vxlan id $i group 239.1.1.1 \
      		    dev vxlan$A dstport 4789
          done
          ip link del dummy0
      
      The top upper link is vxlan100 and the lowest link is vxlan0.
      When vxlan0 is deleting, the upper devices will be deleted recursively.
      It needs huge stack memory so it makes stack overflow.
      
      Splat looks like:
      [  229.628477] =============================================================================
      [  229.629785] BUG page->ptl (Not tainted): Padding overwritten. 0x0000000026abf214-0x0000000091f6abb2
      [  229.629785] -----------------------------------------------------------------------------
      [  229.629785]
      [  229.655439] ==================================================================
      [  229.629785] INFO: Slab 0x00000000ff7cfda8 objects=19 used=19 fp=0x00000000fe33776c flags=0x200000000010200
      [  229.655688] BUG: KASAN: stack-out-of-bounds in unmap_single_vma+0x25a/0x2e0
      [  229.655688] Read of size 8 at addr ffff888113076928 by task vlan-network-in/2334
      [  229.655688]
      [  229.629785] Padding 0000000026abf214: 00 80 14 0d 81 88 ff ff 68 91 81 14 81 88 ff ff  ........h.......
      [  229.629785] Padding 0000000001e24790: 38 91 81 14 81 88 ff ff 68 91 81 14 81 88 ff ff  8.......h.......
      [  229.629785] Padding 00000000b39397c8: 33 30 62 a7 ff ff ff ff ff eb 60 22 10 f1 ff 1f  30b.......`"....
      [  229.629785] Padding 00000000bc98f53a: 80 60 07 13 81 88 ff ff 00 80 14 0d 81 88 ff ff  .`..............
      [  229.629785] Padding 000000002aa8123d: 68 91 81 14 81 88 ff ff f7 21 17 a7 ff ff ff ff  h........!......
      [  229.629785] Padding 000000001c8c2369: 08 81 14 0d 81 88 ff ff 03 02 00 00 00 00 00 00  ................
      [  229.629785] Padding 000000004e290c5d: 21 90 a2 21 10 ed ff ff 00 00 00 00 00 fc ff df  !..!............
      [  229.629785] Padding 000000000e25d731: 18 60 07 13 81 88 ff ff c0 8b 13 05 81 88 ff ff  .`..............
      [  229.629785] Padding 000000007adc7ab3: b3 8a b5 41 00 00 00 00                          ...A....
      [  229.629785] FIX page->ptl: Restoring 0x0000000026abf214-0x0000000091f6abb2=0x5a
      [  ... ]
      
      Fixes: acaf4e70 ("net: vxlan: when lower dev unregisters remove vxlan dev as well")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ce1822c
  14. 02 7月, 2019 1 次提交
    • T
      vxlan: do not destroy fdb if register_netdevice() is failed · 7c31e54a
      Taehee Yoo 提交于
      __vxlan_dev_create() destroys FDB using specific pointer which indicates
      a fdb when error occurs.
      But that pointer should not be used when register_netdevice() fails because
      register_netdevice() internally destroys fdb when error occurs.
      
      This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev
      internally.
      Instead, a new function vxlan_fdb_insert() is added to link fdb to vxlan
      dev.
      
      vxlan_fdb_insert() is called after calling register_netdevice().
      This routine can avoid situation that ->ndo_uninit() destroys fdb entry
      in error path of register_netdevice().
      Hence, error path of __vxlan_dev_create() routine can have an opportunity
      to destroy default fdb entry by hand.
      
      Test command
          ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \
      	    dev enp0s9 dstport 4789
      
      Splat looks like:
      [  213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ #256
      [  213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan]
      [  213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d
      [  213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202
      [  213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000
      [  213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0
      [  213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000
      [  213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200
      [  213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0
      [  213.402178] FS:  00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000
      [  213.402178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0
      [  213.402178] Call Trace:
      [  213.402178]  __vxlan_dev_create+0x3a9/0x7d0 [vxlan]
      [  213.402178]  ? vxlan_changelink+0x740/0x740 [vxlan]
      [  213.402178]  ? rcu_read_unlock+0x60/0x60 [vxlan]
      [  213.402178]  ? __kasan_kmalloc.constprop.3+0xa0/0xd0
      [  213.402178]  vxlan_newlink+0x8d/0xc0 [vxlan]
      [  213.402178]  ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan]
      [  213.554119]  ? __netlink_ns_capable+0xc3/0xf0
      [  213.554119]  __rtnl_newlink+0xb75/0x1180
      [  213.554119]  ? rtnl_link_unregister+0x230/0x230
      [ ... ]
      
      Fixes: 0241b836 ("vxlan: fix default fdb entry netlink notify ordering during netdev create")
      Suggested-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c31e54a
  15. 19 6月, 2019 1 次提交
  16. 12 6月, 2019 1 次提交
  17. 07 6月, 2019 1 次提交
  18. 06 6月, 2019 1 次提交
  19. 30 3月, 2019 1 次提交
  20. 19 3月, 2019 1 次提交
  21. 11 3月, 2019 1 次提交
    • E
      vxlan: test dev->flags & IFF_UP before calling gro_cells_receive() · 59cbf56f
      Eric Dumazet 提交于
      Same reasons than the ones explained in commit 4179cb5a
      ("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
      
      netif_rx() or gro_cells_receive() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      A similar protocol is used for gro_cells infrastructure, as
      gro_cells_destroy() will be called only after a full rcu
      grace period is observed after IFF_UP has been cleared.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Otherwise we risk use-after-free and/or crashes.
      
      Fixes: d342894c ("vxlan: virtual extensible lan")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59cbf56f
  22. 09 3月, 2019 2 次提交
  23. 27 2月, 2019 1 次提交
  24. 25 2月, 2019 1 次提交
  25. 12 2月, 2019 1 次提交
    • E
      vxlan: test dev->flags & IFF_UP before calling netif_rx() · 4179cb5a
      Eric Dumazet 提交于
      netif_rx() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Otherwise we risk use-after-free and/or crashes.
      
      Note this patch also fixes a small issue that came
      with commit ce6502a8 ("vxlan: fix a use after free
      in vxlan_encap_bypass"), since the dev->stats.rx_dropped
      change was done on the wrong device.
      
      Fixes: d342894c ("vxlan: virtual extensible lan")
      Fixes: ce6502a8 ("vxlan: fix a use after free in vxlan_encap_bypass")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Petr Machata <petrm@mellanox.com>
      Cc: Ido Schimmel <idosch@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4179cb5a
  26. 08 2月, 2019 1 次提交
    • P
      net: vxlan: Free a leaked vetoed multicast rdst · fc4aa1ca
      Petr Machata 提交于
      When an rdst is rejected by a driver, the current code removes it from
      the remote list, but neglects to free it. This is triggered by
      tools/testing/selftests/drivers/net/mlxsw/vxlan_fdb_veto.sh and shows as
      the following kmemleak trace:
      
      unreferenced object 0xffff88817fa3d888 (size 96):
        comm "softirq", pid 0, jiffies 4372702718 (age 165.252s)
        hex dump (first 32 bytes):
          02 00 00 00 c6 33 64 03 80 f5 a2 61 81 88 ff ff  .....3d....a....
          06 df 71 ae ff ff ff ff 0c 00 00 00 04 d2 6a 6b  ..q...........jk
        backtrace:
          [<00000000296b27ac>] kmem_cache_alloc_trace+0x1ae/0x370
          [<0000000075c86dc6>] vxlan_fdb_append.part.12+0x62/0x3b0 [vxlan]
          [<00000000e0414b63>] vxlan_fdb_update+0xc61/0x1020 [vxlan]
          [<00000000f330c4bd>] vxlan_fdb_add+0x2e8/0x3d0 [vxlan]
          [<0000000008f81c2c>] rtnl_fdb_add+0x4c2/0xa10
          [<00000000bdc4b270>] rtnetlink_rcv_msg+0x6dd/0x970
          [<000000006701f2ce>] netlink_rcv_skb+0x290/0x410
          [<00000000c08a5487>] rtnetlink_rcv+0x15/0x20
          [<00000000d5f54b1e>] netlink_unicast+0x43f/0x5e0
          [<00000000db4336bb>] netlink_sendmsg+0x789/0xcd0
          [<00000000e1ee26b6>] sock_sendmsg+0xba/0x100
          [<00000000ba409802>] ___sys_sendmsg+0x631/0x960
          [<000000003c332113>] __sys_sendmsg+0xea/0x180
          [<00000000f4139144>] __x64_sys_sendmsg+0x78/0xb0
          [<000000006d1ddc59>] do_syscall_64+0x94/0x410
          [<00000000c8defa9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Move vxlan_dst_free() up and schedule a call thereof to plug this leak.
      
      Fixes: 61f46fe8 ("vxlan: Allow vetoing of FDB notifications")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc4aa1ca
  27. 18 1月, 2019 8 次提交