1. 16 3月, 2020 1 次提交
    • S
      net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup · d93d588d
      Sabrina Dubroca 提交于
      mainline inclusion
      from mainline-v5.5-rc1
      commit 6c8991f41546c3c472503dff1ea9daaddf9331c2
      category: bugfix
      bugzilla: 13690
      CVE: CVE-2020-1749
      
      -------------------------------------------------
      
      ipv6_stub uses the ip6_dst_lookup function to allow other modules to
      perform IPv6 lookups. However, this function skips the XFRM layer
      entirely.
      
      All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
      ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
      which calls xfrm_lookup_route(). This patch fixes this inconsistent
      behavior by switching the stub to ip6_dst_lookup_flow, which also calls
      xfrm_lookup_route().
      
      This requires some changes in all the callers, as these two functions
      take different arguments and have different return types.
      
      Fixes: 5f81bd2e ("ipv6: export a stub for IPv6 symbols used by vxlan")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Conflicts:
        include/net/addrconf.h
        drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
        net/core/lwt_bpf.c
        net/tipc/udp_media.c
        net/ipv6/addrconf_core.c
        net/ipv6/af_inet6.c
        drivers/infiniband/core/addr.c
      [yyl: adjust context]
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: NWenan Mao <maowenan@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      d93d588d
  2. 05 3月, 2020 1 次提交
  3. 27 12月, 2019 7 次提交
    • X
      vxlan: check tun_info options_len properly · aa61e5cd
      Xin Long 提交于
      [ Upstream commit eadf52cf1852196a1363044dcda22fa5d7f296f7 ]
      
      This patch is to improve the tun_info options_len by dropping
      the skb when TUNNEL_VXLAN_OPT is set but options_len is less
      than vxlan_metadata. This can void a potential out-of-bounds
      access on ip_tun_info.
      
      Fixes: ee122c79 ("vxlan: Flow based tunneling")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      aa61e5cd
    • Z
      vxlan: Don't call gro_cells_destroy() before device is unregistered · 365e20e8
      Zhiqiang Liu 提交于
      [ Upstream commit cc4807bb609230d8959fd732b0bf3bd4c2de8eac ]
      
      Commit ad6c9986bcb62 ("vxlan: Fix GRO cells race condition between
      receive and link delete") fixed a race condition for the typical case a vxlan
      device is dismantled from the current netns. But if a netns is dismantled,
      vxlan_destroy_tunnels() is called to schedule a unregister_netdevice_queue()
      of all the vxlan tunnels that are related to this netns.
      
      In vxlan_destroy_tunnels(), gro_cells_destroy() is called and finished before
      unregister_netdevice_queue(). This means that the gro_cells_destroy() call is
      done too soon, for the same reasons explained in above commit.
      
      So we need to fully respect the RCU rules, and thus must remove the
      gro_cells_destroy() call or risk use after-free.
      
      Fixes: 58ce31cc ("vxlan: GRO support at tunnel layer")
      Signed-off-by: NSuanming.Mou <mousuanming@huawei.com>
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NZhiqiang Liu <liuzhiqiang26@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      365e20e8
    • E
      vxlan: test dev->flags & IFF_UP before calling gro_cells_receive() · 61b6a4d7
      Eric Dumazet 提交于
      [ Upstream commit 59cbf56fcd98ba2a715b6e97c4e43f773f956393 ]
      
      Same reasons than the ones explained in commit 4179cb5a4c92
      ("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
      
      netif_rx() or gro_cells_receive() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      A similar protocol is used for gro_cells infrastructure, as
      gro_cells_destroy() will be called only after a full rcu
      grace period is observed after IFF_UP has been cleared.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Otherwise we risk use-after-free and/or crashes.
      
      Fixes: d342894c ("vxlan: virtual extensible lan")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      61b6a4d7
    • S
      vxlan: Fix GRO cells race condition between receive and link delete · 5afd3840
      Stefano Brivio 提交于
      [ Upstream commit ad6c9986bcb627c7c22b8f9e9a934becc27df87c ]
      
      If we receive a packet while deleting a VXLAN device, there's a chance
      vxlan_rcv() is called at the same time as vxlan_dellink(). This is fine,
      except that vxlan_dellink() should never ever touch stuff that's still in
      use, such as the GRO cells list.
      
      Otherwise, vxlan_rcv() crashes while queueing packets via
      gro_cells_receive().
      
      Move the gro_cells_destroy() to vxlan_uninit(), which runs after the RCU
      grace period is elapsed and nothing needs the gro_cells anymore.
      
      This is now done in the same way as commit 8e816df8 ("geneve: Use GRO
      cells infrastructure.") originally implemented for GENEVE.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Fixes: 58ce31cc ("vxlan: GRO support at tunnel layer")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      5afd3840
    • E
      vxlan: test dev->flags & IFF_UP before calling netif_rx() · 945e6b7e
      Eric Dumazet 提交于
      mainline inclusion
      from mainline-v5.0-rc7
      commit 4179cb5a4c92
      category: bugfix
      bugzilla: 9534
      CVE: NA
      
      -------------------------------------------------
      
      netif_rx() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Otherwise we risk use-after-free and/or crashes.
      
      Note this patch also fixes a small issue that came
      with commit ce6502a8 ("vxlan: fix a use after free
      in vxlan_encap_bypass"), since the dev->stats.rx_dropped
      change was done on the wrong device.
      
      Fixes: d342894c ("vxlan: virtual extensible lan")
      Fixes: ce6502a8 ("vxlan: fix a use after free in vxlan_encap_bypass")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Petr Machata <petrm@mellanox.com>
      Cc: Ido Schimmel <idosch@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NZhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      945e6b7e
    • P
      vxlan: changelink: Fix handling of default remotes · af568256
      Petr Machata 提交于
      mainline inclusion
      from mainline-4.20
      commit ce5e098f7a10b4bf8e948c12fa350320c5c3afad
      category: bugfix
      bugzilla: 6009
      CVE: NA
      
      -------------------------------------------------
      
      Default remotes are stored as FDB entries with an Ethernet address of
      00:00:00:00:00:00. When a request is made to change a remote address of
      a VXLAN device, vxlan_changelink() first deletes the existing default
      remote, and then creates a new FDB entry.
      
      This works well as long as the list of default remotes matches exactly
      the configuration of a VXLAN remote address. Thus when the VXLAN device
      has a remote of X, there should be exactly one default remote FDB entry
      X. If the VXLAN device has no remote address, there should be no such
      entry.
      
      Besides using "ip link set", it is possible to manipulate the list of
      default remotes by using the "bridge fdb". It is therefore easy to break
      the above condition. Under such circumstances, the __vxlan_fdb_delete()
      call doesn't delete the FDB entry itself, but just one remote. The
      following vxlan_fdb_create() then creates a new FDB entry, leading to a
      situation where two entries exist for the address 00:00:00:00:00:00,
      each with a different subset of default remotes.
      
      An even more obvious breakage rooted in the same cause can be observed
      when a remote address is configured for a VXLAN device that did not have
      one before. In that case vxlan_changelink() doesn't remove any remote,
      and just creates a new FDB entry for the new address:
      
      $ ip link add name vx up type vxlan id 2000 dstport 4789
      $ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent
      $ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent
      $ ip link set dev vx type vxlan remote 192.0.2.30
      $ bridge fdb sh dev vx | grep 00:00:00:00:00:00
      00:00:00:00:00:00 dst 192.0.2.30 self permanent <- new entry, 1 rdst
      00:00:00:00:00:00 dst 192.0.2.20 self permanent <- orig. entry, 2 rdsts
      00:00:00:00:00:00 dst 192.0.2.30 self permanent
      
      To fix this, instead of calling vxlan_fdb_create() directly, defer to
      vxlan_fdb_update(). That has logic to handle the duplicates properly.
      Additionally, it also handles notifications, so drop that call from
      changelink as well.
      
      Fixes: 0241b836 ("vxlan: fix default fdb entry netlink notify ordering during netdev create")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      af568256
    • P
      vxlan: Fix error path in __vxlan_dev_create() · e44201e4
      Petr Machata 提交于
      mainline inclusion
      from mainline-4.20
      commit 6db924687139
      category: bugfix
      bugzilla: 6186
      CVE: NA
      
      -------------------------------------------------
      
      When a failure occurs in rtnl_configure_link(), the current code
      calls unregister_netdevice() to roll back the earlier call to
      register_netdevice(), and jumps to errout, which calls
      vxlan_fdb_destroy().
      
      However unregister_netdevice() calls transitively ndo_uninit, which is
      vxlan_uninit(), and that already takes care of deleting the default FDB
      entry by calling vxlan_fdb_delete_default(). Since the entry added
      earlier in __vxlan_dev_create() is exactly the default entry, the
      cleanup code in the errout block always leads to double free and thus a
      panic.
      
      Besides, since vxlan_fdb_delete_default() always destroys the FDB entry
      with notification enabled, the deletion of the default entry is notified
      even before the addition was notified.
      
      Instead, move the unregister_netdevice() call after the manual destroy,
      which solves both problems.
      
      Fixes: 0241b836 ("vxlan: fix default fdb entry netlink notify ordering during netdev create")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e44201e4
  4. 18 10月, 2018 2 次提交
  5. 27 9月, 2018 1 次提交
  6. 23 7月, 2018 3 次提交
  7. 07 7月, 2018 3 次提交
  8. 02 7月, 2018 1 次提交
    • S
      net: fix use-after-free in GRO with ESP · 603d4cf8
      Sabrina Dubroca 提交于
      Since the addition of GRO for ESP, gro_receive can consume the skb and
      return -EINPROGRESS. In that case, the lower layer GRO handler cannot
      touch the skb anymore.
      
      Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted
      some of the gro_receive handlers that can lead to ESP's gro_receive so
      that they wouldn't access the skb when -EINPROGRESS is returned, but
      missed other spots, mainly in tunneling protocols.
      
      This patch finishes the conversion to using skb_gro_flush_final(), and
      adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
      GUE.
      
      Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      603d4cf8
  9. 29 6月, 2018 1 次提交
  10. 26 6月, 2018 1 次提交
    • D
      net: Convert GRO SKB handling to list_head. · d4546c25
      David Miller 提交于
      Manage pending per-NAPI GRO packets via list_head.
      
      Return an SKB pointer from the GRO receive handlers.  When GRO receive
      handlers return non-NULL, it means that this SKB needs to be completed
      at this time and removed from the NAPI queue.
      
      Several operations are greatly simplified by this transformation,
      especially timing out the oldest SKB in the list when gro_count
      exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
      in reverse order.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4546c25
  11. 18 4月, 2018 1 次提交
  12. 28 3月, 2018 1 次提交
  13. 28 2月, 2018 1 次提交
  14. 26 1月, 2018 1 次提交
  15. 20 12月, 2017 1 次提交
    • X
      vxlan: update skb dst pmtu on tx path · a93bf0ff
      Xin Long 提交于
      Unlike ip tunnels, now vxlan doesn't do any pmtu update for
      upper dst pmtu, even if it doesn't match the lower dst pmtu
      any more.
      
      The problem can be reproduced when reducing the vxlan lower
      dev's pmtu when running netperf. In jianlin's testing, the
      performance went to 1/7 of the previous.
      
      This patch is to update the upper dst pmtu to match the lower
      dst pmtu on tx path so that packets can be sent out even when
      lower dev's pmtu has been changed.
      
      It also works for metadata dst.
      
      Note that this patch doesn't process any pmtu icmp packet.
      But even in the future, the support for pmtu icmp packets
      process of udp tunnels will also needs this.
      
      The same thing will be done for geneve in another patch.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a93bf0ff
  16. 19 12月, 2017 1 次提交
  17. 17 12月, 2017 1 次提交
    • A
      vxlan: restore dev->mtu setting based on lower device · f870c1ff
      Alexey Kodanev 提交于
      Stefano Brivio says:
          Commit a985343b ("vxlan: refactor verification and
          application of configuration") introduced a change in the
          behaviour of initial MTU setting: earlier, the MTU for a link
          created on top of a given lower device, without an initial MTU
          specification, was set to the MTU of the lower device minus
          headroom as a result of this path in vxlan_dev_configure():
      
      	if (!conf->mtu)
      		dev->mtu = lowerdev->mtu -
      			   (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
      
          which is now gone. Now, the initial MTU, in absence of a
          configured value, is simply set by ether_setup() to ETH_DATA_LEN
          (1500 bytes).
      
          This breaks userspace expectations in case the MTU of
          the lower device is higher than 1500 bytes minus headroom.
      
      This patch restores the previous behaviour on newlink operation. Since
      max_mtu can be negative and we update dev->mtu directly, also check it
      for valid minimum.
      Reported-by: NJunhan Yan <juyan@redhat.com>
      Fixes: a985343b ("vxlan: refactor verification and application of configuration")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f870c1ff
  18. 29 11月, 2017 1 次提交
  19. 14 11月, 2017 2 次提交
  20. 05 10月, 2017 1 次提交
    • K
      timer: Remove init_timer_deferrable() in favor of timer_setup() · df7e828c
      Kees Cook 提交于
      This refactors the only users of init_timer_deferrable() to use
      the new timer_setup() and from_timer(). Removes definition of
      init_timer_deferrable().
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: David S. Miller <davem@davemloft.net> # for networking parts
      Acked-by: Sebastian Reichel <sre@kernel.org> # for drivers/hsi parts
      Cc: linux-mips@linux-mips.org
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: linux1394-devel@lists.sourceforge.net
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: linux-s390@vger.kernel.org
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: Wim Van Sebroeck <wim@iguana.be>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Harish Patil <harish.patil@cavium.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Manish Chopra <manish.chopra@cavium.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-pm@vger.kernel.org
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: linux-watchdog@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-wireless@vger.kernel.org
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Michael Reed <mdr@sgi.com>
      Cc: netdev@vger.kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Link: https://lkml.kernel.org/r/1507159627-127660-6-git-send-email-keescook@chromium.org
      df7e828c
  21. 30 8月, 2017 1 次提交
    • J
      vxlan: factor out VXLAN-GPE next protocol · fa20e0e3
      Jiri Benc 提交于
      The values are shared between VXLAN-GPE and NSH. Originally probably by
      coincidence but I notified both working groups about this last year and they
      seem to keep the values in sync since then.
      
      Hopefully they'll get a single IANA registry for the values, too. (I asked
      them for that.)
      
      Factor out the code to be shared by the NSH implementation.
      
      NSH and MPLS values are added in this patch, too. For MPLS, the drafts
      incorrectly assign only a single value, while we have two MPLS ethertypes.
      I raised the problem with both groups. For now, I assume the value is for
      unicast.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa20e0e3
  22. 14 8月, 2017 1 次提交
  23. 02 8月, 2017 1 次提交
    • K
      vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP · be73b304
      K. Den 提交于
      In the case that GRO is turned on and the original received packet is
      CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
      csum-unnecessary point, which for instance could occur if the packet
      comes from another Linux guest on the same Linux host, we have to do
      either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
      csum_start properly reset considering RCO.
      
      However, since b7fe10e5("gro: Fix remcsum offload to deal with frags
      in GRO") that barrier in such case could be skipped if GRO turned on,
      hence we pass over it and the inner L4 validation mistakenly reckons
      it as a bad csum.
      
      This patch makes remcsum_offload being reset at the same time of GRO
      remcsum cleanup, so as to make it work in such case as before.
      
      Fixes: b7fe10e5 ("gro: Fix remcsum offload to deal with frags in GRO")
      Signed-off-by: NKoichiro Den <den@klaipeden.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be73b304
  24. 25 7月, 2017 2 次提交
  25. 05 7月, 2017 1 次提交
  26. 03 7月, 2017 2 次提交