1. 23 Feb 2018: 1 commit
    • macvlan: fix use-after-free in macvlan_common_newlink() · 4e14bf42
      Committed by Alexey Kodanev
      The following use-after-free was reported by KASan when running
      the LTP macvtap01 test on 4.16-rc2:
      
      [10642.528443] BUG: KASAN: use-after-free in
                     macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
      [10642.626607] Read of size 8 at addr ffff880ba49f2100 by task ip/18450
      ...
      [10642.963873] Call Trace:
      [10642.994352]  dump_stack+0x5c/0x7c
      [10643.035325]  print_address_description+0x75/0x290
      [10643.092938]  kasan_report+0x28d/0x390
      [10643.137971]  ? macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
      [10643.207963]  macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
      [10643.275978]  macvtap_newlink+0x171/0x260 [macvtap]
      [10643.334532]  rtnl_newlink+0xd4f/0x1300
      ...
      [10646.256176] Allocated by task 18450:
      [10646.299964]  kasan_kmalloc+0xa6/0xd0
      [10646.343746]  kmem_cache_alloc_trace+0xf1/0x210
      [10646.397826]  macvlan_common_newlink+0x6de/0x14a0 [macvlan]
      [10646.464386]  macvtap_newlink+0x171/0x260 [macvtap]
      [10646.522728]  rtnl_newlink+0xd4f/0x1300
      ...
      [10647.022028] Freed by task 18450:
      [10647.061549]  __kasan_slab_free+0x138/0x180
      [10647.111468]  kfree+0x9e/0x1c0
      [10647.147869]  macvlan_port_destroy+0x3db/0x650 [macvlan]
      [10647.211411]  rollback_registered_many+0x5b9/0xb10
      [10647.268715]  rollback_registered+0xd9/0x190
      [10647.319675]  register_netdevice+0x8eb/0xc70
      [10647.370635]  macvlan_common_newlink+0xe58/0x14a0 [macvlan]
      [10647.437195]  macvtap_newlink+0x171/0x260 [macvtap]
      
      Commit d02fd6e7 ("macvlan: Fix one possible double free") handles
      the case when register_netdevice() invokes ndo_uninit() on error and
      as a result frees the port. But the 'macvlan_port_get_rtnl(dev)' check
      (which returns dev->rx_handler_data), added by that commit to prevent
      a double free, is not quite correct:

      * for macvlan it always returns NULL, because 'lowerdev' is the device
        that was used to register the rx handler (port) in macvlan_port_create(),
        as well as to unregister it in macvlan_port_destroy().
      * for macvtap it always returns a valid pointer, because macvtap registers
        its own rx handler before macvlan_common_newlink().
      
      Fixes: d02fd6e7 ("macvlan: Fix one possible double free")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
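Taken together, the two bullets mean that the check on dev either never fires (macvlan) or always fires (macvtap). In the macvtap case, the error path then touches a port that ndo_uninit() has already freed, which matches the KASan report. The minimal userspace sketch below shows the corrected reasoning: it asks whether the port still exists on lowerdev, where the rx handler was actually registered. The structs, the create flag and every helper here are hypothetical stand-ins, not the kernel's code or the actual diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; not the kernel's structs or functions. */
struct port {
    int users;
};

struct netdev {
    void *rx_handler_data; /* private data of the registered rx handler */
};

static int port_frees; /* number of times a port has been freed */

/* The port is registered as lowerdev's rx handler data... */
static int port_create(struct netdev *lowerdev)
{
    struct port *p = calloc(1, sizeof(*p));
    if (!p)
        return -1;
    lowerdev->rx_handler_data = p;
    return 0;
}

/* ...and unregistered from lowerdev when destroyed. */
static void port_destroy(struct netdev *lowerdev)
{
    free(lowerdev->rx_handler_data);
    lowerdev->rx_handler_data = NULL;
    port_frees++;
}

/* Stand-in for a failing register_netdevice(): its ndo_uninit() path
 * may or may not already have destroyed the port. */
static int register_fails(struct netdev *lowerdev, bool uninit_frees_port)
{
    if (uninit_frees_port)
        port_destroy(lowerdev);
    return -1;
}

/* Error path keyed on lowerdev, where the port actually lives. */
static int newlink(struct netdev *lowerdev, bool uninit_frees_port)
{
    bool create = false;

    if (!lowerdev->rx_handler_data) {
        if (port_create(lowerdev))
            return -1;
        create = true;
    }
    int err = register_fails(lowerdev, uninit_frees_port);
    if (err && create && lowerdev->rx_handler_data)
        port_destroy(lowerdev); /* still allocated: free it exactly once */
    return err;
}
```

In both failure modes the port is freed exactly once, either by the simulated ndo_uninit() or by the error path, never by both.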
  2. 03 Jan 2018: 1 commit
  3. 19 Oct 2017: 1 commit
    • macvlan/macvtap: Add support for L2 forwarding offloads with macvtap · 56fd2b2c
      Committed by Alexander Duyck
      This patch reverts the earlier commit b13ba1b8 ("macvlan: forbid L2
      fowarding offload for macvtap"). The original patch no longer fixes what
      it used to, because the underlying structure of macvtap has changed.
      Specifically, macvtap originally pulled packets directly off the
      lowerdev. However, commit 6acf54f1 ("macvtap: Add support of packet
      capture on macvtap device.") changed that code so that macvtap listens
      on the macvtap device itself instead of the lower device. As a result,
      the L2 forwarding offload should now provide the performance advantage
      of skipping the checks on the lower dev without introducing any
      regression.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 15 Oct 2017: 2 commits
  5. 05 Oct 2017: 1 commit
  6. 21 Sep 2017: 1 commit
  7. 19 Aug 2017: 1 commit
  8. 18 Jul 2017: 1 commit
  9. 27 Jun 2017: 3 commits
  10. 22 Jun 2017: 4 commits
  11. 15 Jun 2017: 1 commit
  12. 08 Jun 2017: 1 commit
    • net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      Committed by David S. Miller
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init(). However, these resources can be released
      in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation, netdev->destructor() almost
      universally also does a free_netdev().

      This creates a problem for the logic in register_netdevice(),
      because all callers of register_netdevice() manage the freeing
      of the netdev and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds but something else fails inside
      of register_netdevice(), it does call netdev_ops->ndo_uninit(). But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      netdev_ops->ndo_init() succeeds, by invoking both
      netdev_ops->ndo_uninit() and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: David S. Miller <davem@davemloft.net>
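The resulting lifecycle can be modeled in a few lines. This is a userspace sketch under simplifying assumptions. The structs carry only the fields the message names (ndo_init, ndo_uninit, priv_destructor, needs_free_netdev), and register_model(), unregister_finish_model() and the demo driver are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; field names echo the message above. */
struct net_device;

struct netdev_ops {
    int (*ndo_init)(struct net_device *dev);
    void (*ndo_uninit)(struct net_device *dev);
};

struct net_device {
    const struct netdev_ops *netdev_ops;
    void (*priv_destructor)(struct net_device *dev); /* private state only */
    bool needs_free_netdev; /* free the netdev at the end of unregister? */
    void *priv;
};

static int netdevs_freed, privs_freed;

static void free_netdev(struct net_device *dev)
{
    free(dev);
    netdevs_freed++;
}

/* Failure after ndo_init() succeeded: undo init and private state,
 * but leave free_netdev() to the caller. */
static int register_model(struct net_device *dev, bool fail_after_init)
{
    if (dev->netdev_ops->ndo_init && dev->netdev_ops->ndo_init(dev))
        return -1;
    if (fail_after_init) {
        if (dev->netdev_ops->ndo_uninit)
            dev->netdev_ops->ndo_uninit(dev);
        if (dev->priv_destructor)
            dev->priv_destructor(dev);
        return -1;
    }
    return 0;
}

/* End of unregister, once all references are gone. */
static void unregister_finish_model(struct net_device *dev)
{
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
    if (dev->needs_free_netdev)
        free_netdev(dev);
}

/* Example driver: ndo_init() allocates, priv_destructor() releases. */
static int demo_init(struct net_device *dev)
{
    dev->priv = malloc(64);
    return dev->priv ? 0 : -1;
}

static void demo_priv_destructor(struct net_device *dev)
{
    free(dev->priv);
    dev->priv = NULL;
    privs_freed++;
}

static const struct netdev_ops demo_ops = { demo_init, NULL };

static struct net_device *demo_alloc(void)
{
    struct net_device *dev = calloc(1, sizeof(*dev));
    if (dev) {
        dev->netdev_ops = &demo_ops;
        dev->priv_destructor = demo_priv_destructor;
        dev->needs_free_netdev = true;
    }
    return dev;
}
```

The failure path shows why the split matters. Private state is released by priv_destructor(), while the netdev itself is freed only once, by whoever owns it at that point.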
  13. 16 May 2017: 1 commit
    • macvlan: Fix performance issues with vlan tagged packets · 70957eae
      Committed by Vlad Yasevich
      Macvlan always turns on offload features that have a software
      fallback (NETIF_GSO_SOFTWARE). This allows much higher guest-to-guest
      throughput over macvtap.

      However, macvtap does not turn on these features for vlan tagged traffic.
      As a result, depending on the HW that macvtap is configured on, the
      performance of guest-to-guest communication over a vlan is very
      inconsistent. If the HW supports TSO/UFO over vlans, then the
      performance will be fine. If not, the performance will suffer
      greatly, since the VM may continue using TSO/UFO, forcing the host
      to segment the traffic and possibly overflow the macvtap queue.

      This patch adds the always-on offloads to vlan_features. This
      makes sure that any vlan tagged traffic between two guests will not
      be segmented needlessly.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
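The change is essentially feature-mask arithmetic on the upper device's vlan_features. The sketch below uses made-up bit values (the kernel's NETIF_F_* constants differ) and a hypothetical must_segment_early() predicate. It shows how, in this model, OR-ing in the always-on software offloads keeps a large TSO frame unsegmented at the upper device even when the lower hardware lacks TSO for VLANs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the kernel's NETIF_F_* values differ. */
#define F_SG   (UINT32_C(1) << 0)
#define F_CSUM (UINT32_C(1) << 1)
#define F_TSO  (UINT32_C(1) << 2)
#define F_UFO  (UINT32_C(1) << 3)
#define F_GSO_SOFTWARE (F_TSO | F_UFO) /* offloads with a software fallback */
#define F_ALWAYS_ON    (F_SG | F_CSUM | F_GSO_SOFTWARE)

/* VLAN features of the upper device: what the lower device supports for
 * VLAN traffic, optionally plus the always-on software offloads. */
static uint32_t upper_vlan_features(uint32_t lower_vlan_features,
                                    bool add_always_on)
{
    uint32_t f = lower_vlan_features;
    if (add_always_on)
        f |= F_ALWAYS_ON;
    return f;
}

/* In this model, a large TSO frame must be segmented at the upper device
 * when that device does not advertise TSO. */
static bool must_segment_early(uint32_t features)
{
    return (features & F_TSO) == 0;
}
```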
  14. 25 Apr 2017: 1 commit
  15. 12 Feb 2017: 1 commit
  16. 21 Jan 2017: 1 commit
  17. 09 Jan 2017: 1 commit
  18. 08 Dec 2016: 1 commit
  19. 24 Nov 2016: 1 commit
  20. 22 Nov 2016: 1 commit
  21. 15 Nov 2016: 1 commit
  22. 10 Nov 2016: 1 commit
  23. 21 Oct 2016: 1 commit
    • net: use core MTU range checking in core net infra · 91572088
      Committed by Jarod Wilson
      geneve:
      - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
      - This one isn't quite as straightforward as the others and could use
        some closer inspection and testing
      
      macvlan:
      - set min/max_mtu
      
      tun:
      - set min/max_mtu, remove tun_net_change_mtu
      
      vxlan:
      - Merge __vxlan_change_mtu back into vxlan_change_mtu
      - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      - This one is also not as straightforward and could use closer inspection
        and testing from vxlan folks
      
      bridge:
      - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      
      openvswitch:
      - set min/max_mtu, remove internal_dev_change_mtu
      - note: max_mtu wasn't checked previously; it has now been set to 65535,
        which is the largest possible size supported
      
      sch_teql:
      - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)
      
      macsec:
      - min_mtu = 0, max_mtu = 65535
      
      macvlan:
      - min_mtu = 0, max_mtu = 65535
      
      ntb_netdev:
      - min_mtu = 0, max_mtu = 65535
      
      veth:
      - min_mtu = 68, max_mtu = 65535
      
      8021q:
      - min_mtu = 0, max_mtu = 65535
      
      CC: netdev@vger.kernel.org
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Tom Herbert <tom@herbertland.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Paolo Abeni <pabeni@redhat.com>
      CC: Jiri Benc <jbenc@redhat.com>
      CC: WANG Cong <xiyou.wangcong@gmail.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Pravin B Shelar <pshelar@ovn.org>
      CC: Sabrina Dubroca <sd@queasysnail.net>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Pravin Shelar <pshelar@nicira.com>
      CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
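With this change, a driver only declares its range, and the core rejects out-of-range values before the driver is involved. Below is a minimal sketch of such a core-side check. It assumes that a max_mtu of 0 means "no upper bound." struct mtu_dev and set_mtu() are hypothetical, and the limits used with it come from the list above (veth 68..65535, macvlan 0..65535).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device with a driver-declared MTU range. */
struct mtu_dev {
    int mtu;
    int min_mtu;
    int max_mtu; /* 0 is treated as "no upper bound" in this sketch */
};

/* Core-side range check; drivers no longer validate in change_mtu. */
static int set_mtu(struct mtu_dev *dev, int new_mtu)
{
    if (new_mtu < 0 || new_mtu < dev->min_mtu)
        return -EINVAL;
    if (dev->max_mtu > 0 && new_mtu > dev->max_mtu)
        return -EINVAL;
    dev->mtu = new_mtu;
    return 0;
}
```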
  24. 14 Aug 2016: 1 commit
    • net: remove type_check from dev_get_nest_level() · 952fcfd0
      Committed by Sabrina Dubroca
      The idea for type_check in dev_get_nest_level() was to count the number
      of nested devices of the same type (currently, only macvlan or vlan
      devices).
      This prevented the false positive lockdep warning on configurations such
      as:
      
      eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
      
      However, this doesn't prevent a warning on a configuration such as:
      
      eth0 <--- macvlan0 <--- vlan0
      eth1 <--- vlan1 <--- macvlan1
      
      In this case, all the locks end up with a nesting subclass of 1, so
      lockdep thinks that there is still a deadlock:
      
      - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
        take (vlan_netdev_xmit_lock_key, 1)
      - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
        take (macvlan_netdev_addr_lock_key, 1)
      
      By removing the linktype check in dev_get_nest_level() and always
      incrementing the nesting depth, lockdep considers this configuration
      valid.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
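The difference between the two counting rules can be checked mechanically. In this hypothetical model, each lock is identified by (device type, nesting subclass). Locks are taken from the lower device upward, as in the message's example, where macvlan0's lock is held while vlan0's is taken. An inversion is reported when two chains take the same pair of keys in opposite orders. This is neither lockdep's algorithm nor dev_get_nest_level() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device types; chains are listed bottom (real device) first. */
enum dev_type { PHYS, MACVLAN, VLAN };

struct lock_key {
    enum dev_type cls;
    int subclass;
};

/* Nesting subclass of chain[i]: with type_check, count only devices of the
 * same type from the bottom up to i; without it, use the full depth. */
static int nest_level(const enum dev_type *chain, int i, bool type_check)
{
    if (!type_check)
        return i;
    int n = 0;
    for (int j = 0; j <= i; j++)
        if (chain[j] == chain[i])
            n++;
    return n;
}

static struct lock_key key_of(const enum dev_type *chain, int i,
                              bool type_check)
{
    struct lock_key k = { chain[i], nest_level(chain, i, type_check) };
    return k;
}

static bool key_eq(struct lock_key a, struct lock_key b)
{
    return a.cls == b.cls && a.subclass == b.subclass;
}

/* Upper devices' locks (index >= 1) are taken bottom-up. Report an
 * inversion if chain a takes key x before y while chain b takes y before x. */
static bool order_inverted(const enum dev_type *a, int na,
                           const enum dev_type *b, int nb, bool type_check)
{
    for (int i = 1; i < na; i++)
        for (int j = i + 1; j < na; j++) {
            struct lock_key x = key_of(a, i, type_check);
            struct lock_key y = key_of(a, j, type_check);
            for (int k = 1; k < nb; k++)
                for (int l = k + 1; l < nb; l++)
                    if (key_eq(key_of(b, k, type_check), y) &&
                        key_eq(key_of(b, l, type_check), x))
                        return true;
        }
    return false;
}
```

With type_check, both configurations from the message yield subclass 1 for every upper device, so the orders conflict. Counting every level gives vlan0 and macvlan1 subclass 2, and the conflict disappears.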
  25. 10 Jun 2016: 1 commit
  26. 02 Jun 2016: 2 commits
    • macvlan: Avoid unnecessary multicast cloning · 9c127a01
      Committed by Herbert Xu
      Currently we always queue a multicast packet for further processing,
      even if none of the macvlan devices are subscribed to the address.
      
      This patch optimises this by adding a global multicast filter for
      a macvlan_port.
      
      Note that this patch doesn't handle the broadcast addresses of the
      individual macvlan devices correctly if they are not all identical
      to the broadcast address of vlan->lowerdev. However, this is already
      broken, because there is no mechanism in place to update the individual
      multicast filters when you change the broadcast address.

      If someone cares enough, they should fix this by collecting all
      broadcast addresses for a macvlan, as we do for multicast and unicast.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
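One way to build such a port-wide filter is a hash bitmap formed as the union of every device's filter and consulted before a multicast packet is queued. The sketch below assumes a 256-bit filter and an FNV-1a based hash. Both are illustrative choices, not macvlan's actual mc_hash() or filter size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MC_FILTER_BITS 8
#define MC_FILTER_SZ (1u << MC_FILTER_BITS) /* assumed filter size: 256 bits */

struct mc_filter {
    uint8_t bits[MC_FILTER_SZ / 8];
};

/* Hypothetical hash: FNV-1a over the MAC, folded to MC_FILTER_BITS bits. */
static unsigned mc_hash(const uint8_t addr[6])
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < 6; i++) {
        h ^= addr[i];
        h *= 16777619u;
    }
    return (h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24)) & (MC_FILTER_SZ - 1);
}

static void filter_set(struct mc_filter *f, const uint8_t addr[6])
{
    unsigned b = mc_hash(addr);
    f->bits[b / 8] |= (uint8_t)(1u << (b % 8));
}

static bool filter_test(const struct mc_filter *f, const uint8_t addr[6])
{
    unsigned b = mc_hash(addr);
    return f->bits[b / 8] & (1u << (b % 8));
}

/* Port-wide filter: the union of every macvlan device's filter. */
static void port_filter_rebuild(struct mc_filter *port,
                                const struct mc_filter *devs, int ndevs)
{
    memset(port, 0, sizeof(*port));
    for (int d = 0; d < ndevs; d++)
        for (unsigned i = 0; i < sizeof(port->bits); i++)
            port->bits[i] |= devs[d].bits[i];
}
```

A set bit may be a hash collision. In that case the packet is still queued and later dropped by the per-device filters. A clear bit, however, reliably means no device is subscribed, and that is what allows the clone to be skipped.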
    • macvlan: Fix potential use-after free for broadcasts · 260916df
      Committed by Herbert Xu
      When we postpone a broadcast packet we save the source port in
      the skb if it is local.  However, the source port can disappear
      before we get a chance to process the packet.
      
      This patch fixes this by holding a ref count on the netdev.
      
      It also delays the skb->cb modification until after we allocate
      the new skb as you should not modify shared skbs.
      
      Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
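The fix follows the usual pattern of pinning an object with a reference while it sits in a queue. Below is a hypothetical userspace model. Here, struct netdev is a plain refcounted struct, and dev_hold()/dev_put() only mimic the kernel helpers of the same names. The skb->cb ordering part of the fix is not modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted device; not the kernel's struct net_device. */
struct netdev {
    int refcnt;
};

static int devs_freed;

static struct netdev *dev_alloc(void)
{
    struct netdev *d = calloc(1, sizeof(*d));
    if (d)
        d->refcnt = 1; /* reference owned by the registration */
    return d;
}

static void dev_hold(struct netdev *d)
{
    d->refcnt++;
}

static void dev_put(struct netdev *d)
{
    if (--d->refcnt == 0) {
        free(d);
        devs_freed++;
    }
}

/* A deferred broadcast remembers its source device. */
struct deferred_bcast {
    struct netdev *src;
};

/* Enqueue: take a reference so src outlives the queued packet. */
static void bcast_enqueue(struct deferred_bcast *q, struct netdev *src)
{
    dev_hold(src);
    q->src = src;
}

/* Work-queue handler: src is still valid here; drop the ref afterwards.
 * Returns the refcount observed while processing. */
static int bcast_process(struct deferred_bcast *q)
{
    int seen = q->src->refcnt; /* safe: the queued packet holds a ref */
    dev_put(q->src);
    q->src = NULL;
    return seen;
}
```

If the source device goes away while the packet is queued, only its registration reference is dropped. The memory is released when the work-queue handler puts the last reference.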
  27. 27 Apr 2016: 1 commit
  28. 18 Mar 2016: 1 commit
  29. 26 Feb 2016: 1 commit
  30. 18 Feb 2016: 1 commit
  31. 30 Jan 2016: 1 commit
    • macvlan: make operstate and carrier more accurate · de7d244d
      Committed by Nikolay Aleksandrov
      Currently, when a macvlan is being initialized and the lower device is
      netif_carrier_ok(), the macvlan device doesn't run through
      rfc2863_policy() and is left with UNKNOWN operstate. Fix it by adding an
      unconditional linkwatch event for the new macvlan device. A similar fix
      is already used by the 8021q device (see register_vlan_dev()). Also fix
      the inconsistent state when the lower device has been down and its
      carrier was changed (when a device is down, NETDEV_CHANGE doesn't get
      generated). The second issue can be seen, e.g., with a macvlan on top of
      an 8021q device that has been down while its real device changed carrier
      states: after the 8021q device is set up, the macvlan device keeps the
      same carrier state as before, even though the 8021q device may now have
      a different state.
      Example for case 1:
      4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
      state UP mode DEFAULT group default qlen 1000
      
      $ ip l add l eth2 macvl0 type macvlan
      $ ip l set macvl0 up
      $ ip l sh macvl0
      72: macvl0@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
      noqueue state UNKNOWN mode DEFAULT group default
          link/ether f6:0b:54:0a:9d:a3 brd ff:ff:ff:ff:ff:ff
      
      Example for case 2 (order is important):
      Prestate: eth2 UP/CARRIER, vlan1 down, vlan1-macvlan down
      $ ip l set vlan1-macvlan up
      $ ip l sh vlan1-macvlan
      71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
      qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
      
      [ eth2 loses CARRIER before vlan1 has been UP-ed ]
      
      $ ip l sh eth2
      4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
      state DOWN mode DEFAULT group default qlen 1000
          link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff
      $ ip l sh vlan1-macvlan
      71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
      qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
      $ ip l set vlan1 up
      $ ip l sh vlan1
      70: vlan1@eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
      noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
          link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff
      $ ip l sh vlan1-macvlan
      71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
      qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
      
      vlan1-macvlan is still UP, still has carrier and is still in the same
      operstate as before. After the patch, in case 1 macvl0 has state UP as
      it should, and in case 2 vlan1-macvlan has state LOWERLAYERDOWN again as
      it should. Note that while a macvlan's lower device is down, their
      carrier and thus operstate can go out of sync, but that will be fixed
      once the lower device goes up again.
      This behaviour seems to have been present since the beginning of git
      history.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
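Case 1 comes down to events firing only on carrier transitions. The sketch below models that under three assumptions. First, a new device starts with carrier on. Second, a carrier change generates an event only when the value actually changes. Third, default_operstate() is a simplified RFC 2863 mapping: no carrier gives DOWN (or LOWERLAYERDOWN for a stacked device), otherwise UP. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified subset of RFC 2863 operational states. */
enum oper { OPER_UNKNOWN, OPER_DOWN, OPER_LOWERLAYERDOWN, OPER_UP };

struct dev {
    bool carrier;
    bool stacked; /* has a lower device */
    enum oper operstate;
};

/* Simplified policy: no carrier means DOWN, or LOWERLAYERDOWN for a
 * stacked device; otherwise UP. */
static enum oper default_operstate(const struct dev *d)
{
    if (!d->carrier)
        return d->stacked ? OPER_LOWERLAYERDOWN : OPER_DOWN;
    return OPER_UP;
}

static void linkwatch_event(struct dev *d)
{
    d->operstate = default_operstate(d);
}

/* Carrier helper that only generates an event on a transition. */
static void set_carrier(struct dev *d, bool on)
{
    if (d->carrier != on) {
        d->carrier = on;
        linkwatch_event(d);
    }
}

/* Creating an upper device: copy the lower carrier, then optionally fire
 * an unconditional event, as the commit above does. */
static void new_upper(struct dev *upper, const struct dev *lower,
                      bool unconditional_event)
{
    upper->carrier = true; /* assumed initial state of a new device */
    upper->stacked = true;
    upper->operstate = OPER_UNKNOWN;
    set_carrier(upper, lower->carrier);
    if (unconditional_event)
        linkwatch_event(upper);
}
```

Without the unconditional event, the upper device never leaves UNKNOWN, because copying an already-on carrier is not a transition.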
  32. 16 Dec 2015: 2 commits