1. 24 2月, 2022 1 次提交
  2. 09 2月, 2022 2 次提交
    • X
      vlan: move dev_put into vlan_dev_uninit · d6ff94af
      Xin Long 提交于
      Shuang Li reported an QinQ issue by simply doing:
      
        # ip link add dummy0 type dummy
        # ip link add link dummy0 name dummy0.1 type vlan id 1
        # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
        # rmmod 8021q
      
       unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1
      
      When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp
      and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
      before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().
      
      When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of
      NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However,
      due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
      netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
      release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
      vlan_dev_free().
      
      This issue was introduced by Commit 563bcbae ("net: vlan: fix a UAF in
      vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
      vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
      netdev_wait_allrefs().
      
      Fixes: 563bcbae ("net: vlan: fix a UAF in vlan_dev_real_dev()")
      Reported-by: NShuang Li <shuali@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6ff94af
    • X
      vlan: introduce vlan_dev_free_egress_priority · 37aa50c5
      Xin Long 提交于
      This patch is to introduce vlan_dev_free_egress_priority() to
      free egress priority for vlan dev, and keep vlan_dev_uninit()
      static as .ndo_uninit. It makes the code more clear and safer
      when adding new code in vlan_dev_uninit() in the future.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37aa50c5
  3. 08 12月, 2021 1 次提交
  4. 27 11月, 2021 1 次提交
    • Z
      net: vlan: fix underflow for the real_dev refcnt · 01d9cc2d
      Ziyang Xuan 提交于
      Inject error before dev_hold(real_dev) in register_vlan_dev(),
      and execute the following testcase:
      
      ip link add dev dummy1 type dummy
      ip link add name dummy1.100 link dummy1 type vlan id 100
      ip link del dev dummy1
      
      When the dummy netdevice is removed, we will get a WARNING as following:
      
      =======================================================================
      refcount_t: decrement hit 0; leaking memory.
      WARNING: CPU: 2 PID: 0 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0
      
      and an endless loop of:
      
      =======================================================================
      unregister_netdevice: waiting for dummy1 to become free. Usage count = -1073741824
      
      That is because dev_put(real_dev) in vlan_dev_free() be called without
      dev_hold(real_dev) in register_vlan_dev(). It makes the refcnt of real_dev
      underflow.
      
      Move the dev_hold(real_dev) to vlan_dev_init() which is the call-back of
      ndo_init(). That makes dev_hold() and dev_put() for vlan's real_dev
      symmetrical.
      
      Fixes: 563bcbae ("net: vlan: fix a UAF in vlan_dev_real_dev()")
      Reported-by: NPetr Machata <petrm@nvidia.com>
      Suggested-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
      Link: https://lore.kernel.org/r/20211126015942.2918542-1-william.xuanziyang@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      01d9cc2d
  5. 22 11月, 2021 2 次提交
  6. 03 11月, 2021 1 次提交
    • Z
      net: vlan: fix a UAF in vlan_dev_real_dev() · 563bcbae
      Ziyang Xuan 提交于
      The real_dev of a vlan net_device may be freed after
      unregister_vlan_dev(). Access the real_dev continually by
      vlan_dev_real_dev() will trigger the UAF problem for the
      real_dev like following:
      
      ==================================================================
      BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
      Call Trace:
       kasan_report.cold+0x83/0xdf
       vlan_dev_real_dev+0xf9/0x120
       is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
       is_eth_port_of_netdev_filter+0x28/0x40
       ib_enum_roce_netdev+0x1a3/0x300
       ib_enum_all_roce_netdevs+0xc7/0x140
       netdevice_event_work_handler+0x9d/0x210
      ...
      
      Freed by task 9288:
       kasan_save_stack+0x1b/0x40
       kasan_set_track+0x1c/0x30
       kasan_set_free_info+0x20/0x30
       __kasan_slab_free+0xfc/0x130
       slab_free_freelist_hook+0xdd/0x240
       kfree+0xe4/0x690
       kvfree+0x42/0x50
       device_release+0x9f/0x240
       kobject_put+0x1c8/0x530
       put_device+0x1b/0x30
       free_netdev+0x370/0x540
       ppp_destroy_interface+0x313/0x3d0
      ...
      
      Move the put_device(real_dev) to vlan_dev_free(). Ensure
      real_dev not be freed before vlan_dev unregistered.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
      Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: NJason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      563bcbae
  7. 02 10月, 2021 1 次提交
  8. 28 7月, 2021 1 次提交
    • A
      dev_ioctl: split out ndo_eth_ioctl · a7605370
      Arnd Bergmann 提交于
      Most users of ndo_do_ioctl are ethernet drivers that implement
      the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
      timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
      
      Separate these from the few drivers that use ndo_do_ioctl to
      implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
      
      This is a purely cosmetic change intended to help readers find
      their way through the implementation.
      
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Vladimir Oltean <olteanv@gmail.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NJason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7605370
  9. 04 6月, 2021 1 次提交
  10. 25 3月, 2021 2 次提交
  11. 15 1月, 2021 1 次提交
  12. 24 8月, 2020 1 次提交
  13. 29 6月, 2020 1 次提交
  14. 10 6月, 2020 1 次提交
    • C
      net: change addr_list_lock back to static key · 845e0ebb
      Cong Wang 提交于
      The dynamic key update for addr_list_lock still causes troubles,
      for example the following race condition still exists:
      
      CPU 0:				CPU 1:
      (RCU read lock)			(RTNL lock)
      dev_mc_seq_show()		netdev_update_lockdep_key()
      				  -> lockdep_unregister_key()
       -> netif_addr_lock_bh()
      
      because lockdep doesn't provide an API to update it atomically.
      Therefore, we have to move it back to static keys and use subclass
      for nest locking like before.
      
      In commit 1a33e10e ("net: partially revert dynamic lockdep key
      changes"), I already reverted most parts of commit ab92d68f
      ("net: core: add generic lockdep keys").
      
      This patch reverts the rest and also part of commit f3b0a18b
      ("net: remove unnecessary variables and callback"). After this
      patch, addr_list_lock changes back to using static keys and
      subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
      not have to change back to ->ndo_get_lock_subclass().
      
      And hopefully this reduces some syzbot lockdep noises too.
      
      Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      845e0ebb
  15. 08 5月, 2020 1 次提交
  16. 05 5月, 2020 1 次提交
  17. 08 1月, 2020 1 次提交
    • E
      vlan: fix memory leak in vlan_dev_set_egress_priority · 9bbd917e
      Eric Dumazet 提交于
      There are few cases where the ndo_uninit() handler might be not
      called if an error happens while device is initialized.
      
      Since vlan_newlink() calls vlan_changelink() before
      trying to register the netdevice, we need to make sure
      vlan_dev_uninit() has been called at least once,
      or we might leak allocated memory.
      
      BUG: memory leak
      unreferenced object 0xffff888122a206c0 (size 32):
        comm "syz-executor511", pid 7124, jiffies 4294950399 (age 32.240s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 61 73 00 00 00 00 00 00 00 00  ......as........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000000eb3bb85>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<000000000eb3bb85>] slab_post_alloc_hook mm/slab.h:586 [inline]
          [<000000000eb3bb85>] slab_alloc mm/slab.c:3320 [inline]
          [<000000000eb3bb85>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
          [<000000007b99f620>] kmalloc include/linux/slab.h:556 [inline]
          [<000000007b99f620>] vlan_dev_set_egress_priority+0xcc/0x150 net/8021q/vlan_dev.c:194
          [<000000007b0cb745>] vlan_changelink+0xd6/0x140 net/8021q/vlan_netlink.c:126
          [<0000000065aba83a>] vlan_newlink+0x135/0x200 net/8021q/vlan_netlink.c:181
          [<00000000fb5dd7a2>] __rtnl_newlink+0x89a/0xb80 net/core/rtnetlink.c:3305
          [<00000000ae4273a1>] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3363
          [<00000000decab39f>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
          [<00000000accba4ee>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
          [<00000000319fe20f>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
          [<00000000d51938dc>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
          [<00000000d51938dc>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
          [<00000000e539ac79>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
          [<000000006250c27e>] sock_sendmsg_nosec net/socket.c:639 [inline]
          [<000000006250c27e>] sock_sendmsg+0x54/0x70 net/socket.c:659
          [<00000000e2a156d1>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
          [<000000008c87466e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
          [<00000000110e3054>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
          [<00000000d71077c8>] __do_sys_sendmsg net/socket.c:2426 [inline]
          [<00000000d71077c8>] __se_sys_sendmsg net/socket.c:2424 [inline]
          [<00000000d71077c8>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
      
      Fixe: 07b5b17e ("[VLAN]: Use rtnl_link API")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9bbd917e
  18. 26 12月, 2019 1 次提交
  19. 25 10月, 2019 2 次提交
    • T
      net: remove unnecessary variables and callback · f3b0a18b
      Taehee Yoo 提交于
      This patch removes variables and callback these are related to the nested
      device structure.
      devices that can be nested have their own nest_level variable that
      represents the depth of nested devices.
      In the previous patch, new {lower/upper}_level variables are added and
      they replace old private nest_level variable.
      So, this patch removes all 'nest_level' variables.
      
      In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
      to get lockdep subclass value, which is actually lower nested depth value.
      But now, they use the dynamic lockdep key to avoid lockdep warning instead
      of the subclass.
      So, this patch removes ->ndo_get_lock_subclass() callback.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3b0a18b
    • T
      net: core: add generic lockdep keys · ab92d68f
      Taehee Yoo 提交于
      Some interface types could be nested.
      (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
      These interface types should set lockdep class because, without lockdep
      class key, lockdep always warn about unexisting circular locking.
      
      In the current code, these interfaces have their own lockdep class keys and
      these manage itself. So that there are so many duplicate code around the
      /driver/net and /net/.
      This patch adds new generic lockdep keys and some helper functions for it.
      
      This patch does below changes.
      a) Add lockdep class keys in struct net_device
         - qdisc_running, xmit, addr_list, qdisc_busylock
         - these keys are used as dynamic lockdep key.
      b) When net_device is being allocated, lockdep keys are registered.
         - alloc_netdev_mqs()
      c) When net_device is being free'd llockdep keys are unregistered.
         - free_netdev()
      d) Add generic lockdep key helper function
         - netdev_register_lockdep_key()
         - netdev_unregister_lockdep_key()
         - netdev_update_lockdep_key()
      e) Remove unnecessary generic lockdep macro and functions
      f) Remove unnecessary lockdep code of each interfaces.
      
      After this patch, each interface modules don't need to maintain
      their lockdep keys.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab92d68f
  20. 05 6月, 2019 1 次提交
    • A
      net: vlan: Inherit MPLS features from parent device · 8b6912a5
      Ariel Levkovich 提交于
      During the creation of the VLAN interface net device,
      the various device features and offloads are being set based
      on the parent device's features.
      The code initiates the basic, vlan and encapsulation features
      but doesn't address the MPLS features set and they remain blank.
      As a result, all device offloads that have significant performance
      effect are disabled for MPLS traffic going via this VLAN device such
      as checksumming and TSO.
      
      This patch makes sure that MPLS features are also set for the
      VLAN device based on the parent which will allow HW offloads of
      checksumming and TSO to be performed on MPLS tagged packets.
      Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b6912a5
  21. 31 5月, 2019 1 次提交
  22. 21 5月, 2019 1 次提交
    • G
      vlan: Mark expected switch fall-through · fa2c52be
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch
      cases where we are expecting to fall through.
      
      This patch fixes the following warning:
      
      net/8021q/vlan_dev.c: In function ‘vlan_dev_ioctl’:
      net/8021q/vlan_dev.c:374:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
         if (!net_eq(dev_net(dev), &init_net))
            ^
      net/8021q/vlan_dev.c:376:2: note: here
        case SIOCGMIIPHY:
        ^~~~
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable
      -Wimplicit-fallthrough.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa2c52be
  23. 10 5月, 2019 1 次提交
  24. 20 4月, 2019 2 次提交
  25. 05 4月, 2019 1 次提交
    • C
      vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x · 0a89eb92
      Chris Leech 提交于
      Way back in 3c9c36bc the
      ndo_fcoe_get_wwn pointer was switched from depending on CONFIG_FCOE to
      CONFIG_LIBFCOE in order to allow building FCoE support into the bnx2x
      driver and used by bnx2fc without including the generic software fcoe
      module.
      
      But, FCoE is generally used over an 802.1q VLAN, and the implementation
      of ndo_fcoe_get_wwn in the 8021q module was not similarly changed.  The
      result is that if CONFIG_FCOE is disabled, then bnz2fc cannot make a
      call to ndo_fcoe_get_wwn through the 8021q interface to the underlying
      bnx2x interface.  The bnx2fc driver then falls back to a potentially
      different mapping of Ethernet MAC to Fibre Channel WWN, creating an
      incompatibility with the fabric and target configurations when compared
      to the WWNs used by pre-boot firmware and differently-configured
      kernels.
      
      So make the conditional inclusion of FCoE code in 8021q match the
      conditional inclusion in netdevice.h
      Signed-off-by: NChris Leech <cleech@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a89eb92
  26. 25 2月, 2019 1 次提交
  27. 08 11月, 2018 1 次提交
    • D
      net: vlan: add support for tunnel offload · 7dad9937
      Davide Caratti 提交于
      GSO tunneled packets are always segmented in software before they are
      transmitted by a VLAN, even when the lower device can offload tunnel
      encapsulation and VLAN together (i.e., some bits in NETIF_F_GSO_ENCAP_ALL
      mask are set in the lower device 'vlan_features'). If we let VLANs have
      the same tunnel offload capabilities as their lower device, throughput
      can improve significantly when CPU is limited on the transmitter side.
      
       - set NETIF_F_GSO_ENCAP_ALL bits in the VLAN 'hw_features', to ensure
       that 'features' will have those bits zeroed only when the lower device
       has no hardware support for tunnel encapsulation.
       - for the same reason, copy GSO-related bits of 'hw_enc_features' from
       lower device to VLAN, and ensure to update that value when the lower
       device changes its features.
       - set NETIF_F_HW_CSUM bit in the VLAN 'hw_enc_features' if 'real_dev'
       is able to compute checksums at least for a kind of packets, like done
       with commit 8403debe ("vlan: Keep NETIF_F_HW_CSUM similar to other
       software devices"). This avoids software segmentation due to mismatching
       checksum capabilities between VLAN's 'features' and 'hw_enc_features'.
      Reported-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7dad9937
  28. 20 10月, 2018 1 次提交
    • D
      netpoll: allow cleanup to be synchronous · c9fbd71f
      Debabrata Banerjee 提交于
      This fixes a problem introduced by:
      commit 2cde6acd ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")
      
      When using netconsole on a bond, __netpoll_cleanup can asynchronously
      recurse multiple times, each __netpoll_free_async call can result in
      more __netpoll_free_async's. This means there is now a race between
      cleanup_work queues on multiple netpoll_info's on multiple devices and
      the configuration of a new netpoll. For example if a netconsole is set
      to enable 0, reconfigured, and enable 1 immediately, this netconsole
      will likely not work.
      
      Given the reason for __netpoll_free_async is it can be called when rtnl
      is not locked, if it is locked, we should be able to execute
      synchronously. It appears to be locked everywhere it's called from.
      
      Generalize the design pattern from the teaming driver for current
      callers of __netpoll_free_async.
      
      CC: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NDebabrata Banerjee <dbanerje@akamai.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9fbd71f
  29. 08 5月, 2018 1 次提交
  30. 02 4月, 2018 1 次提交
  31. 16 6月, 2017 1 次提交
    • J
      networking: make skb_push & __skb_push return void pointers · d58ff351
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
          @@
          expression SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - fn(SKB, LEN)[0]
          + *(u8 *)fn(SKB, LEN)
      
      Note that the last part there converts from push(...)[0] to the
      more idiomatic *(u8 *)push(...).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d58ff351
  32. 09 6月, 2017 1 次提交
  33. 08 6月, 2017 1 次提交
    • D
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller 提交于
      Network devices can allocate reasources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally does also a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf124db5
  34. 09 5月, 2017 1 次提交
    • V
      vlan: Keep NETIF_F_HW_CSUM similar to other software devices · 8403debe
      Vlad Yasevich 提交于
      Vlan devices, like all other software devices, enable
      NETIF_F_HW_CSUM feature.  However, unlike all the othe other
      software devices, vlans will switch to using IP|IPV6_CSUM
      features, if the underlying devices uses them.  In these situations,
      checksum offload features on the vlan device can't be controlled
      via ethtool.
      
      This patch makes vlans keep HW_CSUM feature if the underlying
      device supports checksum offloading.  This makes vlan devices
      behave like other software devices, and restores control to the
      user.
      
      A side-effect is that some offload settings (typically UFO)
      may be enabled on the vlan device while being disabled on the HW.
      However, the GSO code will correctly process the packets. This
      actually results in slightly better raw throughput.
      Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8403debe
  35. 22 3月, 2017 1 次提交
    • A
      net/8021q: create device with all possible features in wanted_features · 88997e42
      Andrey Vagin 提交于
      wanted_features is a set of features which have to be enabled if a
      hardware allows that.
      
      Currently when a vlan device is created, its wanted_features is set to
      current features of its base device.
      
      The problem is that the base device can get new features and they are
      not propagated to vlan-s of this device.
      
      If we look at bonding devices, they doesn't have this problem and this
      patch suggests to fix this issue by the same way how it works for bonding
      devices.
      
      We meet this problem, when we try to create a vlan device over a bonding
      device. When a system are booting, real devices require time to be
      initialized, so bonding devices created without slaves, then vlan
      devices are created and only then ethernet devices are added to the
      bonding device. As a result we have vlan devices with disabled
      scatter-gather.
      
      * create a bonding device
        $ ip link add bond0 type bond
        $ ethtool -k bond0 | grep scatter
        scatter-gather: off
      	tx-scatter-gather: off [requested on]
      	tx-scatter-gather-fraglist: off [requested on]
      
      * create a vlan device
        $ ip link add link bond0 name bond0.10 type vlan id 10
        $ ethtool -k bond0.10 | grep scatter
        scatter-gather: off
      	tx-scatter-gather: off
      	tx-scatter-gather-fraglist: off
      
      * Add a slave device to bond0
        $ ip link set dev eth0 master bond0
      
      And now we can see that the bond0 device has got the scatter-gather
      feature, but the bond0.10 hasn't got it.
      [root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter
      scatter-gather: on
      	tx-scatter-gather: on
      	tx-scatter-gather-fraglist: on
      [root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter
      scatter-gather: off
      	tx-scatter-gather: off
      	tx-scatter-gather-fraglist: off
      
      With this patch the vlan device will get all new features from the
      bonding device.
      
      Here is a call trace how features which are set in this patch reach
      dev->wanted_features.
      
      register_netdevice
         vlan_dev_init
      	...
      	dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG |
      		       NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE |
      		       NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC |
      		       NETIF_F_ALL_FCOE;
      
      	dev->features |= dev->hw_features;
      	...
          dev->wanted_features = dev->features & dev->hw_features;
          __netdev_update_features(dev);
              vlan_dev_fix_features
      	   ...
      
      Cc: Alexey Kuznetsov <kuznet@virtuozzo.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88997e42