1. 27 Jun 2017: 1 commit
  2. 15 Jun 2017: 1 commit
  3. 08 Jun 2017: 1 commit
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller committed
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally also does a free_netdev().
      
      This creates a problem for the logic in register_netdevice(),
      because all callers of register_netdevice() manage the freeing
      of the netdev and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
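
The following is a minimal userspace sketch in C that makes the new contract concrete. It is not kernel code. struct fake_netdev, fake_register(), fake_unregister(), fake_free_netdev() and the demo_* callbacks are hypothetical stand-ins, and the sketch collapses the wait for all references to drop. It only shows the ordering described above:

- On a late registration failure, both ndo_uninit() and priv_destructor() run, and the caller still frees the device.
- At unregister time, priv_destructor() runs, and the device is freed only if needs_free_netdev is set.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; field names mirror the commit text. */
struct fake_netdev {
	void *priv_res;                                   /* set up by ndo_init() */
	int (*ndo_init)(struct fake_netdev *dev);
	void (*ndo_uninit)(struct fake_netdev *dev);
	void (*priv_destructor)(struct fake_netdev *dev); /* private state only */
	bool needs_free_netdev;                           /* free dev at unregister? */
};

static int freed_devices; /* number of fake_free_netdev() calls */

static void fake_free_netdev(struct fake_netdev *dev)
{
	freed_devices++;
	free(dev);
}

/* Registration: on a failure after ndo_init() succeeded, undo everything
 * except the final free, which stays with the caller. */
static int fake_register(struct fake_netdev *dev, bool later_step_fails)
{
	int err = dev->ndo_init ? dev->ndo_init(dev) : 0;

	if (err)
		return err;
	if (later_step_fails) {
		if (dev->ndo_uninit)
			dev->ndo_uninit(dev);
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		return -1;
	}
	return 0;
}

/* Teardown, assumed to run once every reference is gone. */
static void fake_unregister(struct fake_netdev *dev)
{
	if (dev->ndo_uninit)
		dev->ndo_uninit(dev);
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	if (dev->needs_free_netdev)
		fake_free_netdev(dev);
}

/* Example driver callbacks (hypothetical). */
static int demo_init(struct fake_netdev *dev)
{
	dev->priv_res = malloc(64);
	return dev->priv_res ? 0 : -1;
}

static void demo_priv_destructor(struct fake_netdev *dev)
{
	free(dev->priv_res);
	dev->priv_res = NULL;
}
```

priv_destructor() never frees the device itself. As a result, the error path and the normal teardown can share it without a double free.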
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 16 May 2017: 1 commit
      macvlan: Fix performance issues with vlan tagged packets · 70957eae
      Vlad Yasevich committed
      Macvlan always turns on offload features that have a software
      fallback (NETIF_GSO_SOFTWARE).  This allows much faster guest-to-guest
      communication over macvtap.
      
      However, macvtap does not turn on these features for vlan tagged traffic.
      As a result, depending on the HW that macvtap is configured on, the
      performance of guest-to-guest communication over a vlan is very
      inconsistent.  If the HW supports TSO/UFO over vlans, then the
      performance will be fine.  If not, the performance will suffer
      greatly, since the VM may continue using TSO/UFO and will force the host
      to segment the traffic and possibly overflow the macvtap queue.
      
      This patch adds the always-on offloads to vlan_features.  This
      makes sure that any vlan tagged traffic between two guests will not
      be segmented needlessly.
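
The feature computation can be roughly illustrated with the following C sketch. It uses made-up FEAT_* bits instead of the real NETIF_F_* constants. The COPYABLE_FEATURES and FEAT_GSO_SOFTWARE masks are assumptions, not macvlan's actual MACVLAN_FEATURES or its always-on set.

```c
#include <stdint.h>

/* Hypothetical feature bits for illustration only. */
#define FEAT_SG        (1u << 0)
#define FEAT_HW_CSUM   (1u << 1)
#define FEAT_TSO       (1u << 2)
#define FEAT_UFO       (1u << 3)
#define FEAT_LRO       (1u << 4)

/* Offloads that have a software (GSO) fallback. */
#define FEAT_GSO_SOFTWARE  (FEAT_TSO | FEAT_UFO)
/* Features the upper device may copy from the lower device (assumed). */
#define COPYABLE_FEATURES  (FEAT_SG | FEAT_HW_CSUM | FEAT_TSO | FEAT_UFO)

/* Before: vlan_features only reflect what the lower device offers. */
static uint32_t vlan_features_old(uint32_t lower_vlan_features)
{
	return lower_vlan_features & COPYABLE_FEATURES;
}

/* After: software-backed offloads are always on, so tagged guest traffic
 * is not segmented early just because the NIC lacks TSO/UFO on vlans. */
static uint32_t vlan_features_new(uint32_t lower_vlan_features)
{
	return vlan_features_old(lower_vlan_features) | FEAT_GSO_SOFTWARE;
}
```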
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 25 Apr 2017: 1 commit
  6. 12 Feb 2017: 1 commit
  7. 21 Jan 2017: 1 commit
  8. 09 Jan 2017: 1 commit
  9. 08 Dec 2016: 1 commit
  10. 24 Nov 2016: 1 commit
  11. 22 Nov 2016: 1 commit
  12. 15 Nov 2016: 1 commit
  13. 10 Nov 2016: 1 commit
  14. 21 Oct 2016: 1 commit
      net: use core MTU range checking in core net infra · 91572088
      Jarod Wilson committed
      geneve:
      - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
      - This one isn't quite as straightforward as others and could use some
        closer inspection and testing
      
      macvlan:
      - set min/max_mtu
      
      tun:
      - set min/max_mtu, remove tun_net_change_mtu
      
      vxlan:
      - Merge __vxlan_change_mtu back into vxlan_change_mtu
      - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      - This one is also not as straightforward and could use closer inspection
        and testing from vxlan folks
      
      bridge:
      - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      
      openvswitch:
      - set min/max_mtu, remove internal_dev_change_mtu
      - note: max_mtu wasn't checked previously; it's now set to 65535, which
        is the largest possible size supported
      
      sch_teql:
      - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)
      
      macsec:
      - min_mtu = 0, max_mtu = 65535
      
      macvlan:
      - min_mtu = 0, max_mtu = 65535
      
      ntb_netdev:
      - min_mtu = 0, max_mtu = 65535
      
      veth:
      - min_mtu = 68, max_mtu = 65535
      
      8021q:
      - min_mtu = 0, max_mtu = 65535
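
The common idea above is that drivers publish min_mtu/max_mtu and the core rejects out-of-range values before calling the driver. Below is a minimal C sketch of that check, loosely modeled on the core dev_set_mtu() path. struct mtu_dev and core_set_mtu() are hypothetical names. Treating max_mtu == 0 as "no upper bound" is an assumption of the sketch.

```c
#include <errno.h>

/* Minimal model of a device carrying its own MTU bounds. */
struct mtu_dev {
	int mtu;
	int min_mtu;
	int max_mtu;  /* 0 is treated here as "no upper bound" */
};

/* Centralized range check; a driver's own change_mtu hook (if any) would
 * run only after this succeeds. */
static int core_set_mtu(struct mtu_dev *dev, int new_mtu)
{
	if (new_mtu < 0 || new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && new_mtu > dev->max_mtu)
		return -EINVAL;
	dev->mtu = new_mtu;
	return 0;
}
```

With the veth bounds listed above (68 to 65535), the check refuses a request for 67 and accepts 9000. Drivers that keep dynamic checks, such as vxlan and bridge, would still apply their stricter test in change_mtu after this generic one.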
      
      CC: netdev@vger.kernel.org
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Tom Herbert <tom@herbertland.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Paolo Abeni <pabeni@redhat.com>
      CC: Jiri Benc <jbenc@redhat.com>
      CC: WANG Cong <xiyou.wangcong@gmail.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Pravin B Shelar <pshelar@ovn.org>
      CC: Sabrina Dubroca <sd@queasysnail.net>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Pravin Shelar <pshelar@nicira.com>
      CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 14 Aug 2016: 1 commit
      net: remove type_check from dev_get_nest_level() · 952fcfd0
      Sabrina Dubroca committed
      The idea for type_check in dev_get_nest_level() was to count the number
      of nested devices of the same type (currently, only macvlan or vlan
      devices).
      This prevented the false positive lockdep warning on configurations such
      as:
      
      eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
      
      However, this doesn't prevent a warning on a configuration such as:
      
      eth0 <--- macvlan0 <--- vlan0
      eth1 <--- vlan1 <--- macvlan1
      
      In this case, all the locks end up with a nesting subclass of 1, so
      lockdep thinks that there is still a deadlock:
      
      - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
        take (vlan_netdev_xmit_lock_key, 1)
      - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
        take (macvlan_netdev_addr_lock_key, 1)
      
      By removing the linktype check in dev_get_nest_level() and always
      incrementing the nesting depth, lockdep considers this configuration
      valid.
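
After the change, the nesting rule amounts to a recursive depth computation over lower devices, sketched here in C. struct nest_dev and nest_level() are illustrative names, and the fixed MAX_LOWER array is a simplification. The point is only that every stacked device adds one level, whatever its type.

```c
#include <stddef.h>

#define MAX_LOWER 4

struct nest_dev {
	const char *name;
	struct nest_dev *lower[MAX_LOWER];  /* devices this one is stacked on */
	int nlower;
};

/* A device with no lower devices is level 0; otherwise it sits one level
 * above its deepest lower device, with no type check. */
static int nest_level(const struct nest_dev *dev)
{
	int max_nest = -1;

	for (int i = 0; i < dev->nlower; i++) {
		int nest = nest_level(dev->lower[i]);
		if (nest > max_nest)
			max_nest = nest;
	}
	return max_nest + 1;
}
```

For eth0 <--- macvlan0 <--- vlan0 <--- macvlan1, this gives levels 0, 1, 2 and 3. In the two-chain example, vlan0 ends up at level 2 while vlan1 is at level 1. Roughly speaking, the two acquisition orders then no longer use the same (lock class, subclass) pairs.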
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 10 Jun 2016: 1 commit
  17. 02 Jun 2016: 2 commits
      macvlan: Avoid unnecessary multicast cloning · 9c127a01
      Herbert Xu committed
      Currently we always queue a multicast packet for further processing,
      even if none of the macvlan devices are subscribed to the address.
      
      This patch optimises this by adding a global multicast filter for
      a macvlan_port.
      
      Note that this patch doesn't handle the broadcast addresses of the
      individual macvlan devices correctly, if they are not all identical
      to vlan->lowerdev.  However, this is already broken because there
      is no mechanism in place to update the individual multicast filters
      when you change the broadcast address.
      
      If someone cares enough they should fix this by collecting all
      broadcast addresses for a macvlan as we do for multicast and unicast.
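
A port-wide filter of this kind can be approximated with a small hash bitmap:

1. Each macvlan sets one bit per subscribed group.
2. The port keeps the OR of all of them.
3. The receive path queues a multicast frame only if the frame's bit is set.

The C sketch below is illustrative only. The table size and mc_hash() are assumptions, not the hash macvlan actually uses.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MC_FILTER_BITS 256  /* illustrative size */

struct mc_filter {
	uint8_t bits[MC_FILTER_BITS / 8];
};

/* Hypothetical hash over the low four bytes of the MAC address. */
static unsigned mc_hash(const uint8_t addr[6])
{
	return (unsigned)(addr[2] ^ addr[3] ^ addr[4] ^ addr[5]) % MC_FILTER_BITS;
}

static void mc_filter_clear(struct mc_filter *f)
{
	memset(f->bits, 0, sizeof(f->bits));
}

static void mc_filter_add(struct mc_filter *f, const uint8_t addr[6])
{
	unsigned h = mc_hash(addr);
	f->bits[h / 8] |= (uint8_t)(1u << (h % 8));
}

static bool mc_filter_test(const struct mc_filter *f, const uint8_t addr[6])
{
	unsigned h = mc_hash(addr);
	return (f->bits[h / 8] & (1u << (h % 8))) != 0;
}

/* Port-wide filter: OR in one macvlan device's filter. */
static void mc_filter_merge(struct mc_filter *port, const struct mc_filter *dev)
{
	for (size_t i = 0; i < sizeof(port->bits); i++)
		port->bits[i] |= dev->bits[i];
}
```

A hash filter can report false positives but never false negatives. It can therefore only skip work for groups that certainly have no subscriber.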
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      macvlan: Fix potential use-after free for broadcasts · 260916df
      Herbert Xu committed
      When we postpone a broadcast packet we save the source port in
      the skb if it is local.  However, the source port can disappear
      before we get a chance to process the packet.
      
      This patch fixes this by holding a ref count on the netdev.
      
      It also delays the skb->cb modification until after we allocate
      the new skb as you should not modify shared skbs.
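
The fix follows the usual pattern of pinning an object for as long as deferred work refers to it. The C sketch below models this in userspace. rc_hold(), rc_put(), struct pending_bc and the other names are hypothetical stand-ins for the kernel's device reference counting, not its API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted device. */
struct rc_dev {
	int refcnt;
	bool unregistered;
};

static int rc_dev_frees; /* counts actual frees */

static void rc_hold(struct rc_dev *d) { d->refcnt++; }

static void rc_put(struct rc_dev *d)
{
	if (--d->refcnt == 0 && d->unregistered) {
		rc_dev_frees++;
		free(d);
	}
}

/* A postponed broadcast remembers its local source port. */
struct pending_bc {
	struct rc_dev *src;
};

static void bc_enqueue(struct pending_bc *p, struct rc_dev *src)
{
	rc_hold(src);  /* keep src alive until the worker runs */
	p->src = src;
}

static void bc_process(struct pending_bc *p)
{
	/* ... deliver to every port except p->src ... */
	rc_put(p->src);  /* drop the reference taken at enqueue time */
	p->src = NULL;
}

static void rc_unregister(struct rc_dev *d)
{
	d->unregistered = true;
	rc_put(d);  /* drop the registration reference */
}
```

Even if the source port is unregistered while the broadcast is still queued, the sketch frees it only after the worker has finished with it.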
      
      Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 27 Apr 2016: 1 commit
  19. 18 Mar 2016: 1 commit
  20. 26 Feb 2016: 1 commit
  21. 18 Feb 2016: 1 commit
  22. 30 Jan 2016: 1 commit
      macvlan: make operstate and carrier more accurate · de7d244d
      Nikolay Aleksandrov committed
      Currently, when a macvlan is being initialized and the lower device is
      netif_carrier_ok(), the macvlan device doesn't run through
      rfc2863_policy() and is left with UNKNOWN operstate. Fix it by adding an
      unconditional linkwatch event for the new macvlan device. A similar fix is
      already used by the 8021q device (see register_vlan_dev()). Also fix the
      inconsistent state when the lower device has been down and its carrier
      was changed (when a device is down, NETDEV_CHANGE doesn't get generated).
      The second issue can be seen, e.g., when we have a macvlan on top of an
      8021q device which has been down while its real device has been changing
      carrier states: after setting the 8021q device up, the macvlan device keeps
      the same carrier state as before, even though the 8021q device can now
      have a different state.

      Example for case 1:
      4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
      state UP mode DEFAULT group default qlen 1000
      
      $ ip l add l eth2 macvl0 type macvlan
      $ ip l set macvl0 up
      $ ip l sh macvl0
      72: macvl0@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
      noqueue state UNKNOWN mode DEFAULT group default
          link/ether f6:0b:54:0a:9d:a3 brd ff:ff:ff:ff:ff:ff
      
      Example for case 2 (order is important):
      Prestate: eth2 UP/CARRIER, vlan1 down, vlan1-macvlan down
      $ ip l set vlan1-macvlan up
      $ ip l sh vlan1-macvlan
      71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
      qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
      
      [ eth2 loses CARRIER before vlan1 has been UP-ed ]
      
      $ ip l sh eth2
      4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
      state DOWN mode DEFAULT group default qlen 1000
          link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff
      $ ip l sh vlan1-macvlan
      71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
      qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
      $ ip l set vlan1 up
      $ ip l sh vlan1
      70: vlan1@eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
      noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
          link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff
      $ ip l sh vlan1-macvlan
      71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
      qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
      
      vlan1-macvlan is still UP, still has carrier and is still in the same
      operstate as before. After the patch, in case 1 macvl0 has state UP as it
      should, and in case 2 vlan1-macvlan has state LOWERLAYERDOWN again as it
      should. Note that while the lower macvlan device is down, their carrier
      and thus operstate can go out of sync, but that will be fixed once the
      lower device goes up again.
      This behaviour seems to have been present since the beginning of git history.
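
The following sketch shows the kind of state mapping that the linkwatch event applies. It is a userspace model built on assumptions. The enum, struct link_dev and its stacked flag are illustrative; stacked stands in for a device whose iflink differs from its ifindex. macvlan_sync_sketch() compresses the linkwatch machinery into a single call.

```c
#include <stdbool.h>

enum oper_state {
	OPER_UNKNOWN, OPER_DOWN, OPER_LOWERLAYERDOWN, OPER_DORMANT, OPER_UP
};

struct link_dev {
	bool carrier;
	bool dormant;
	bool stacked;               /* sits on top of a lower device */
	enum oper_state operstate;
};

/* RFC 2863-style mapping from carrier/dormant flags to an operstate. */
static enum oper_state operstate_from_flags(const struct link_dev *d)
{
	if (!d->carrier)
		return d->stacked ? OPER_LOWERLAYERDOWN : OPER_DOWN;
	if (d->dormant)
		return OPER_DORMANT;
	return OPER_UP;
}

/* Mirror the lower device's carrier and recompute the operstate
 * unconditionally, instead of leaving it UNKNOWN or stale. */
static void macvlan_sync_sketch(struct link_dev *upper,
				const struct link_dev *lower)
{
	upper->carrier = lower->carrier;
	upper->operstate = operstate_from_flags(upper);
}
```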
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 16 Dec 2015: 2 commits
  24. 18 Nov 2015: 1 commit
  25. 13 Oct 2015: 1 commit
  26. 04 Aug 2015: 1 commit
  27. 04 May 2015: 1 commit
  28. 03 Apr 2015: 1 commit
  29. 03 Mar 2015: 1 commit
  30. 24 Jan 2015: 1 commit
  31. 10 Dec 2014: 2 commits
      macvlan: play well with ipvlan device · d6b00fec
      Mahesh Bandewar committed
      If the device is already used as an ipvlan port, refuse to
      use it as a macvlan port at an early stage of port creation.
      
      	thost1:~# ip link add link eth0 ipvl0 type ipvlan
      	thost1:~# echo $?
      	0
      	thost1:~# ip link add link eth0 mvl0 type macvlan
      	RTNETLINK answers: Device or resource busy
      	thost1:~# echo $?
      	2
      	thost1:~#
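
The early check amounts to testing a flag on the lower device before any macvlan port state is allocated. In the C sketch below, the PF_* bits, struct port_dev and macvlan_port_create_sketch() are hypothetical names, not the kernel's.

```c
#include <errno.h>

/* Hypothetical priv_flags bits on the lower device. */
#define PF_MACVLAN_PORT (1u << 0)
#define PF_IPVLAN_PORT  (1u << 1)

struct port_dev {
	unsigned priv_flags;
};

/* Refuse early, before any macvlan port state is set up. */
static int macvlan_port_create_sketch(struct port_dev *lower)
{
	if (lower->priv_flags & PF_IPVLAN_PORT)
		return -EBUSY;  /* reported as "Device or resource busy" */
	lower->priv_flags |= PF_MACVLAN_PORT;
	return 0;
}
```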
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      macvlan: allow setting LRO independently of lower device · 62dbe830
      Michal Kubeček committed
      Since commit fbe168ba ("net: generic dev_disable_lro() stacked
      device handling"), dev_disable_lro() zeroes NETIF_F_LRO feature flag
      first for a macvlan device and then for its lower device. As an attempt
      to set NETIF_F_LRO to zero is ignored, dev_disable_lro() issues a
      warning and taints kernel.
      
      Allowing NETIF_F_LRO to be set independently of the lower device
      consists of three parts:
      
        - add the flag to hw_features to allow toggling it
        - allow setting it to 0 even if lower device has the flag set
        - add the flag to MACVLAN_FEATURES to restore copying from lower
          device on macvlan creation
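
The interplay with dev_disable_lro() can be modeled in C as follows. The feat_dev type, the bit values and the helper names are hypothetical. The sketch captures only the first two parts listed above. It assumes that the upper device may have LRO on only while the lower device has it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FEAT_SG  (1u << 0)  /* hypothetical bit values */
#define FEAT_LRO (1u << 1)

struct feat_dev {
	uint32_t features;     /* currently active */
	uint32_t hw_features;  /* bits the user may toggle */
	struct feat_dev *lower;
};

/* Apply a requested feature set; bits outside hw_features cannot change.
 * Assumption: LRO on the upper device requires LRO on the lower device,
 * but turning it off is always allowed. */
static void set_features_sketch(struct feat_dev *dev, uint32_t wanted)
{
	uint32_t f = (dev->features & ~dev->hw_features) |
		     (wanted & dev->hw_features);

	if (dev->lower)
		f &= dev->lower->features | ~FEAT_LRO;
	dev->features = f;
}

/* Clear LRO on the upper device first, then on its lower device. Returns
 * false in the situation where the real helper would warn and taint. */
static bool disable_lro_sketch(struct feat_dev *upper)
{
	set_features_sketch(upper, upper->features & ~FEAT_LRO);
	set_features_sketch(upper->lower, upper->lower->features & ~FEAT_LRO);
	return !(upper->features & FEAT_LRO) &&
	       !(upper->lower->features & FEAT_LRO);
}
```

When LRO is missing from the upper device's hw_features, the request to clear it is silently ignored, which mirrors the warning described above.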
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 03 Dec 2014: 1 commit
  33. 30 Nov 2014: 1 commit
  34. 26 Oct 2014: 1 commit
  35. 11 Oct 2014: 2 commits
      macvlan: optimize the receive path · d1dd9119
      jbaron@akamai.com committed
      The netif_rx() call on the fast path of macvlan_handle_frame() appears to
      be there to ensure that we properly throttle incoming packets. However, it
      would appear as though the proper throttling is already in place for all
      possible ingress paths, and that the call is redundant. If packets are arriving
      from the physical NIC, we've already throttled them by this point. Otherwise,
      if they are coming via macvlan_queue_xmit(), it calls either
      'dev_forward_skb()', which ends up calling netif_rx_internal(), or else in
      the broadcast case, we are throttling via macvlan_broadcast_enqueue().
      
      The test results below are from off the box to an lxc instance running macvlan.
      Once the transactions/sec stop increasing, the CPU idle time has gone to 0.
      Results are from a quad core Intel E3-1270 V2@3.50GHz box with a bnx2x 10G card.
      
      for i in {10,100,200,300,400,500};
      do super_netperf $i -H $ip -t TCP_RR; done
      Average of 5 runs.
      
      instances   trans/sec               trans/sec
                  (3.17-rc7-net-next)     (3.17-rc7-net-next + this patch)
      ---------   -------------------     --------------------------------
      10          208101                  211534 (+1.6%)
      100         839493                  850162 (+1.3%)
      200         845071                  844053 (-0.12%)
      300         816330                  819623 (+0.4%)
      400         778700                  789938 (+1.4%)
      500         735984                  754408 (+2.5%)
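
The percentages in the table can be rechecked as (patched - baseline) / baseline x 100. The small C helper below does this for the six rows. The array and function names are only for illustration.

```c
/* Average trans/sec per row of the table above. */
static const double rr_baseline[] = {208101, 839493, 845071, 816330, 778700, 735984};
static const double rr_patched[]  = {211534, 850162, 844053, 819623, 789938, 754408};

/* Relative change in percent from baseline to patched. */
static double pct_change(double baseline, double patched)
{
	return (patched - baseline) / baseline * 100.0;
}
```

For example, the first row gives (211534 - 208101) / 208101 x 100, about 1.65%, which the table reports as +1.6%.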
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      macvlan: pass 'bool' type to macvlan_count_rx() · 4c979935
      jbaron@akamai.com committed
      Pass the last argument to macvlan_count_rx() as the correct bool type.
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  36. 08 Oct 2014: 1 commit
      net: better IFF_XMIT_DST_RELEASE support · 02875878
      Eric Dumazet committed
      Testing xmit_more support with netperf and connected UDP sockets,
      I found strange dst refcount false sharing.
      
      Current handling of IFF_XMIT_DST_RELEASE is not optimal.
      
      Dropping dst in validate_xmit_skb() is certainly too late in case
      a packet was queued by CPU X but dequeued by CPU Y.
      
      The logical point to take care of drop/force is in __dev_queue_xmit()
      before even taking qdisc lock.
      
      As Julian Anastasov pointed out, the need for skb_dst() might come from
      some packet schedulers or classifiers.
      
      This patch adds a new helper to cleanly express the needs of various
      drivers or qdiscs/classifiers.
      
      Drivers that need skb_dst() in their ndo_start_xmit() should call the
      following helper in their setup instead of the previous code:
      
      	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
      ->
      	netif_keep_dst(dev);
      
      Instead of using a single bit, we use two bits, one of which may be
      rebuilt by the bonding/team drivers.
      
      The other one is permanent and blocks IFF_XMIT_DST_RELEASE from being
      rebuilt in bonding/team. We could eventually add something smarter
      later.
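
The two-bit scheme can be sketched in userspace C as follows. keep_dst_sketch() clears both bits, as the text describes for the helper that replaces the single-bit clear. master_rebuild_sketch() shows one plausible way a bonding/team-style master could recompute only the rebuildable bit. The bit values, the names and the exact rebuild rule are assumptions, not the kernel's code.

```c
#include <stdbool.h>

/* Hypothetical bit positions. */
#define XMIT_DST_RELEASE       (1u << 0)  /* may be rebuilt by a master */
#define XMIT_DST_RELEASE_PERM  (1u << 1)  /* permanent; blocks rebuilding */

struct dst_dev {
	unsigned priv_flags;
};

/* A device that needs skb_dst() in its xmit path clears both bits. */
static void keep_dst_sketch(struct dst_dev *dev)
{
	dev->priv_flags &= ~(XMIT_DST_RELEASE | XMIT_DST_RELEASE_PERM);
}

/* Recompute the master's rebuildable bit: release dst only if every slave
 * does and the master's own permanent bit is still set. */
static void master_rebuild_sketch(struct dst_dev *master,
				  const struct dst_dev *slaves, int n)
{
	bool all_release = true;

	for (int i = 0; i < n; i++)
		if (!(slaves[i].priv_flags & XMIT_DST_RELEASE))
			all_release = false;
	master->priv_flags &= ~XMIT_DST_RELEASE;
	if (all_release && (master->priv_flags & XMIT_DST_RELEASE_PERM))
		master->priv_flags |= XMIT_DST_RELEASE;
}
```

After keep_dst_sketch() runs on the master, no later rebuild can turn XMIT_DST_RELEASE back on. That is the role the commit assigns to the permanent bit.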
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>