1. 30 10月, 2019 1 次提交
    • T
      bonding: fix using uninitialized mode_lock · ad9bd8da
      Taehee Yoo 提交于
      When a bonding interface is being created, it setups its mode and options.
      At that moment, it uses mode_lock so mode_lock should be initialized
      before that moment.
      
      rtnl_newlink()
      	rtnl_create_link()
      		alloc_netdev_mqs()
      			->setup() //bond_setup()
      	->newlink //bond_newlink
      		bond_changelink()
      		register_netdevice()
      			->ndo_init() //bond_init()
      
      After commit 089bca2c ("bonding: use dynamic lockdep key instead of
      subclass"), mode_lock is initialized in bond_init().
      So in the bond_changelink(), un-initialized mode_lock can be used.
      mode_lock should be initialized in bond_setup().
      This patch partially reverts commit 089bca2c ("bonding: use dynamic
      lockdep key instead of subclass")
      
      Test command:
          ip link add bond0 type bond mode 802.3ad lacp_rate 0
      
      Splat looks like:
      [   60.615127] INFO: trying to register non-static key.
      [   60.615900] the code is fine but needs lockdep annotation.
      [   60.616697] turning off the locking correctness validator.
      [   60.617490] CPU: 1 PID: 957 Comm: ip Not tainted 5.4.0-rc3+ #109
      [   60.618350] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   60.619481] Call Trace:
      [   60.619918]  dump_stack+0x7c/0xbb
      [   60.620453]  register_lock_class+0x1215/0x14d0
      [   60.621131]  ? alloc_netdev_mqs+0x7b3/0xcc0
      [   60.621771]  ? is_bpf_text_address+0x86/0xf0
      [   60.622416]  ? is_dynamic_key+0x230/0x230
      [   60.623032]  ? unwind_get_return_address+0x5f/0xa0
      [   60.623757]  ? create_prof_cpu_mask+0x20/0x20
      [   60.624408]  ? arch_stack_walk+0x83/0xb0
      [   60.625023]  __lock_acquire+0xd8/0x3de0
      [   60.625616]  ? stack_trace_save+0x82/0xb0
      [   60.626225]  ? stack_trace_consume_entry+0x160/0x160
      [   60.626957]  ? deactivate_slab.isra.80+0x2c5/0x800
      [   60.627668]  ? register_lock_class+0x14d0/0x14d0
      [   60.628380]  ? alloc_netdev_mqs+0x7b3/0xcc0
      [   60.629020]  ? save_stack+0x69/0x80
      [   60.629574]  ? save_stack+0x19/0x80
      [   60.630121]  ? __kasan_kmalloc.constprop.4+0xa0/0xd0
      [   60.630859]  ? __kmalloc_node+0x16f/0x480
      [   60.631472]  ? alloc_netdev_mqs+0x7b3/0xcc0
      [   60.632121]  ? rtnl_create_link+0x2ed/0xad0
      [   60.634388]  ? __rtnl_newlink+0xad4/0x11b0
      [   60.635024]  lock_acquire+0x164/0x3b0
      [   60.635608]  ? bond_3ad_update_lacp_rate+0x91/0x200 [bonding]
      [   60.636463]  _raw_spin_lock_bh+0x38/0x70
      [   60.637084]  ? bond_3ad_update_lacp_rate+0x91/0x200 [bonding]
      [   60.637930]  bond_3ad_update_lacp_rate+0x91/0x200 [bonding]
      [   60.638753]  ? bond_3ad_lacpdu_recv+0xb30/0xb30 [bonding]
      [   60.639552]  ? bond_opt_get_val+0x180/0x180 [bonding]
      [   60.640307]  ? ___slab_alloc+0x5aa/0x610
      [   60.640925]  bond_option_lacp_rate_set+0x71/0x140 [bonding]
      [   60.641751]  __bond_opt_set+0x1ff/0xbb0 [bonding]
      [   60.643217]  ? kasan_unpoison_shadow+0x30/0x40
      [   60.643924]  bond_changelink+0x9a4/0x1700 [bonding]
      [   60.644653]  ? memset+0x1f/0x40
      [   60.742941]  ? bond_slave_changelink+0x1a0/0x1a0 [bonding]
      [   60.752694]  ? alloc_netdev_mqs+0x8ea/0xcc0
      [   60.753330]  ? rtnl_create_link+0x2ed/0xad0
      [   60.753964]  bond_newlink+0x1e/0x60 [bonding]
      [   60.754612]  __rtnl_newlink+0xb9f/0x11b0
      [ ... ]
      
      Reported-by: syzbot+8da67f407bcba2c72e6e@syzkaller.appspotmail.com
      Reported-by: syzbot+0d083911ab18b710da71@syzkaller.appspotmail.com
      Fixes: 089bca2c ("bonding: use dynamic lockdep key instead of subclass")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad9bd8da
  2. 25 10月, 2019 4 次提交
    • T
      net: remove unnecessary variables and callback · f3b0a18b
      Taehee Yoo 提交于
      This patch removes variables and callback these are related to the nested
      device structure.
      devices that can be nested have their own nest_level variable that
      represents the depth of nested devices.
      In the previous patch, new {lower/upper}_level variables are added and
      they replace old private nest_level variable.
      So, this patch removes all 'nest_level' variables.
      
      In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
      to get lockdep subclass value, which is actually lower nested depth value.
      But now, they use the dynamic lockdep key to avoid lockdep warning instead
      of the subclass.
      So, this patch removes ->ndo_get_lock_subclass() callback.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3b0a18b
    • T
      bonding: use dynamic lockdep key instead of subclass · 089bca2c
      Taehee Yoo 提交于
      All bonding device has same lockdep key and subclass is initialized with
      nest_level.
      But actual nest_level value can be changed when a lower device is attached.
      And at this moment, the subclass should be updated but it seems to be
      unsafe.
      So this patch makes bonding use dynamic lockdep key instead of the
      subclass.
      
      Test commands:
          ip link add bond0 type bond
      
          for i in {1..5}
          do
      	    let A=$i-1
      	    ip link add bond$i type bond
      	    ip link set bond$i master bond$A
          done
          ip link set bond5 master bond0
      
      Splat looks like:
      [  307.992912] WARNING: possible recursive locking detected
      [  307.993656] 5.4.0-rc3+ #96 Tainted: G        W
      [  307.994367] --------------------------------------------
      [  307.995092] ip/761 is trying to acquire lock:
      [  307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
      [  307.997045]
      	       but task is already holding lock:
      [  307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
      [  307.999215]
      	       other info that might help us debug this:
      [  308.000251]  Possible unsafe locking scenario:
      
      [  308.001137]        CPU0
      [  308.001533]        ----
      [  308.001915]   lock(&(&bond->stats_lock)->rlock#2/2);
      [  308.002609]   lock(&(&bond->stats_lock)->rlock#2/2);
      [  308.003302]
      		*** DEADLOCK ***
      
      [  308.004310]  May be due to missing lock nesting notation
      
      [  308.005319] 3 locks held by ip/761:
      [  308.005830]  #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
      [  308.006894]  #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
      [  308.008243]  #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding]
      [  308.009422]
      	       stack backtrace:
      [  308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G        W         5.4.0-rc3+ #96
      [  308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  308.012179] Call Trace:
      [  308.012601]  dump_stack+0x7c/0xbb
      [  308.013089]  __lock_acquire+0x269d/0x3de0
      [  308.013669]  ? register_lock_class+0x14d0/0x14d0
      [  308.014318]  lock_acquire+0x164/0x3b0
      [  308.014858]  ? bond_get_stats+0xb8/0x500 [bonding]
      [  308.015520]  _raw_spin_lock_nested+0x2e/0x60
      [  308.016129]  ? bond_get_stats+0xb8/0x500 [bonding]
      [  308.017215]  bond_get_stats+0xb8/0x500 [bonding]
      [  308.018454]  ? bond_arp_rcv+0xf10/0xf10 [bonding]
      [  308.019710]  ? rcu_read_lock_held+0x90/0xa0
      [  308.020605]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [  308.021286]  ? bond_get_stats+0x9f/0x500 [bonding]
      [  308.021953]  dev_get_stats+0x1ec/0x270
      [  308.022508]  bond_get_stats+0x1d1/0x500 [bonding]
      
      Fixes: d3fff6c4 ("net: add netdev_lockdep_set_classes() helper")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      089bca2c
    • T
      bonding: fix unexpected IFF_BONDING bit unset · 65de65d9
      Taehee Yoo 提交于
      The IFF_BONDING means bonding master or bonding slave device.
      ->ndo_add_slave() sets IFF_BONDING flag and ->ndo_del_slave() unsets
      IFF_BONDING flag.
      
      bond0<--bond1
      
      Both bond0 and bond1 are bonding device and these should keep having
      IFF_BONDING flag until they are removed.
      But bond1 would lose IFF_BONDING at ->ndo_del_slave() because that routine
      do not check whether the slave device is the bonding type or not.
      This patch adds the interface type check routine before removing
      IFF_BONDING flag.
      
      Test commands:
          ip link add bond0 type bond
          ip link add bond1 type bond
          ip link set bond1 master bond0
          ip link set bond1 nomaster
          ip link del bond1 type bond
          ip link add bond1 type bond
      
      Splat looks like:
      [  226.665555] proc_dir_entry 'bonding/bond1' already registered
      [  226.666440] WARNING: CPU: 0 PID: 737 at fs/proc/generic.c:361 proc_register+0x2a9/0x3e0
      [  226.667571] Modules linked in: bonding af_packet sch_fq_codel ip_tables x_tables unix
      [  226.668662] CPU: 0 PID: 737 Comm: ip Not tainted 5.4.0-rc3+ #96
      [  226.669508] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  226.670652] RIP: 0010:proc_register+0x2a9/0x3e0
      [  226.671612] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 39 01 00 00 48 8b 04 24 48 89 ea 48 c7 c7 a0 0b 14 9f 48 8b b0 e
      0 00 00 00 e8 07 e7 88 ff <0f> 0b 48 c7 c7 40 2d a5 9f e8 59 d6 23 01 48 8b 4c 24 10 48 b8 00
      [  226.675007] RSP: 0018:ffff888050e17078 EFLAGS: 00010282
      [  226.675761] RAX: dffffc0000000008 RBX: ffff88805fdd0f10 RCX: ffffffff9dd344e2
      [  226.676757] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f6b8c
      [  226.677751] RBP: ffff8880507160f3 R08: ffffed100d940019 R09: ffffed100d940019
      [  226.678761] R10: 0000000000000001 R11: ffffed100d940018 R12: ffff888050716008
      [  226.679757] R13: ffff8880507160f2 R14: dffffc0000000000 R15: ffffed100a0e2c1e
      [  226.680758] FS:  00007fdc217cc0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
      [  226.681886] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  226.682719] CR2: 00007f49313424d0 CR3: 0000000050e46001 CR4: 00000000000606f0
      [  226.683727] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  226.684725] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  226.685681] Call Trace:
      [  226.687089]  proc_create_seq_private+0xb3/0xf0
      [  226.687778]  bond_create_proc_entry+0x1b3/0x3f0 [bonding]
      [  226.691458]  bond_netdev_event+0x433/0x970 [bonding]
      [  226.692139]  ? __module_text_address+0x13/0x140
      [  226.692779]  notifier_call_chain+0x90/0x160
      [  226.693401]  register_netdevice+0x9b3/0xd80
      [  226.694010]  ? alloc_netdev_mqs+0x854/0xc10
      [  226.694629]  ? netdev_change_features+0xa0/0xa0
      [  226.695278]  ? rtnl_create_link+0x2ed/0xad0
      [  226.695849]  bond_newlink+0x2a/0x60 [bonding]
      [  226.696422]  __rtnl_newlink+0xb9f/0x11b0
      [  226.696968]  ? rtnl_link_unregister+0x220/0x220
      [ ... ]
      
      Fixes: 0b680e75 ("[PATCH] bonding: Add priv_flag to avoid event mishandling")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65de65d9
    • T
      net: core: add generic lockdep keys · ab92d68f
      Taehee Yoo 提交于
      Some interface types could be nested.
      (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
      These interface types should set lockdep class because, without lockdep
      class key, lockdep always warn about unexisting circular locking.
      
      In the current code, these interfaces have their own lockdep class keys and
      these manage itself. So that there are so many duplicate code around the
      /driver/net and /net/.
      This patch adds new generic lockdep keys and some helper functions for it.
      
      This patch does below changes.
      a) Add lockdep class keys in struct net_device
         - qdisc_running, xmit, addr_list, qdisc_busylock
         - these keys are used as dynamic lockdep key.
      b) When net_device is being allocated, lockdep keys are registered.
         - alloc_netdev_mqs()
      c) When net_device is being free'd llockdep keys are unregistered.
         - free_netdev()
      d) Add generic lockdep key helper function
         - netdev_register_lockdep_key()
         - netdev_unregister_lockdep_key()
         - netdev_update_lockdep_key()
      e) Remove unnecessary generic lockdep macro and functions
      f) Remove unnecessary lockdep code of each interfaces.
      
      After this patch, each interface modules don't need to maintain
      their lockdep keys.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab92d68f
  3. 10 10月, 2019 1 次提交
    • E
      bonding: fix potential NULL deref in bond_update_slave_arr · a7137534
      Eric Dumazet 提交于
      syzbot got a NULL dereference in bond_update_slave_arr() [1],
      happening after a failure to allocate bond->slave_arr
      
      A workqueue (bond_slave_arr_handler) is supposed to retry
      the allocation later, but if the slave is removed before
      the workqueue had a chance to complete, bond->slave_arr
      can still be NULL.
      
      [1]
      
      Failed to build slave-array.
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN PTI
      Modules linked in:
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:bond_update_slave_arr.cold+0xc6/0x198 drivers/net/bonding/bond_main.c:4039
      RSP: 0018:ffff88018fe33678 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000290b000
      RDX: 0000000000000000 RSI: ffffffff82b63037 RDI: ffff88019745ea20
      RBP: ffff88018fe33760 R08: ffff880170754280 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff88019745ea00 R14: 0000000000000000 R15: ffff88018fe338b0
      FS:  00007febd837d700(0000) GS:ffff8801dad00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004540a0 CR3: 00000001c242e005 CR4: 00000000001626f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       [<ffffffff82b5b45e>] __bond_release_one+0x43e/0x500 drivers/net/bonding/bond_main.c:1923
       [<ffffffff82b5b966>] bond_release drivers/net/bonding/bond_main.c:2039 [inline]
       [<ffffffff82b5b966>] bond_do_ioctl+0x416/0x870 drivers/net/bonding/bond_main.c:3562
       [<ffffffff83ae25f4>] dev_ifsioc+0x6f4/0x940 net/core/dev_ioctl.c:328
       [<ffffffff83ae2e58>] dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:495
       [<ffffffff83995ffd>] sock_do_ioctl+0x1bd/0x300 net/socket.c:1088
       [<ffffffff83996a80>] sock_ioctl+0x300/0x5d0 net/socket.c:1196
       [<ffffffff81b124db>] vfs_ioctl fs/ioctl.c:47 [inline]
       [<ffffffff81b124db>] file_ioctl fs/ioctl.c:501 [inline]
       [<ffffffff81b124db>] do_vfs_ioctl+0xacb/0x1300 fs/ioctl.c:688
       [<ffffffff81b12dc6>] SYSC_ioctl fs/ioctl.c:705 [inline]
       [<ffffffff81b12dc6>] SyS_ioctl+0xb6/0xe0 fs/ioctl.c:696
       [<ffffffff8101ccc8>] do_syscall_64+0x528/0x770 arch/x86/entry/common.c:305
       [<ffffffff84400091>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: ee637714 ("bonding: Simplify the xmit function for modes that use xmit_hash")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      a7137534
  4. 09 8月, 2019 1 次提交
    • Y
      bonding: Add vlan tx offload to hw_enc_features · d595b03d
      YueHaibing 提交于
      As commit 30d8177e ("bonding: Always enable vlan tx offload")
      said, we should always enable bonding's vlan tx offload, pass the
      vlan packets to the slave devices with vlan tci, let them to handle
      vlan implementation.
      
      Now if encapsulation protocols like VXLAN is used, skb->encapsulation
      may be set, then the packet is passed to vlan device which based on
      bonding device. However in netif_skb_features(), the check of
      hw_enc_features:
      
      	 if (skb->encapsulation)
                       features &= dev->hw_enc_features;
      
      clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
      in same issue in commit 30d8177e like this:
      
      vlan_dev_hard_start_xmit
        -->dev_queue_xmit
          -->validate_xmit_skb
            -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
            -->validate_xmit_vlan
              -->__vlan_hwaccel_push_inside //skb->tci is cleared
      ...
       --> bond_start_xmit
         --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
           --> __skb_flow_dissect // nhoff point to IP header
              -->  case htons(ETH_P_8021Q)
                   // skb_vlan_tag_present is false, so
                   vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
                   //vlan point to ip header wrongly
      
      Fixes: b2a103e6 ("bonding: convert to ndo_fix_features")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d595b03d
  5. 23 7月, 2019 1 次提交
    • T
      bonding: Force slave speed check after link state recovery for 802.3ad · 12185dfe
      Thomas Falcon 提交于
      The following scenario was encountered during testing of logical
      partition mobility on pseries partitions with bonded ibmvnic
      adapters in LACP mode.
      
      1. Driver receives a signal that the device has been
         swapped, and it needs to reset to initialize the new
         device.
      
      2. Driver reports loss of carrier and begins initialization.
      
      3. Bonding driver receives NETDEV_CHANGE notifier and checks
         the slave's current speed and duplex settings. Because these
         are unknown at the time, the bond sets its link state to
         BOND_LINK_FAIL and handles the speed update, clearing
         AD_PORT_LACP_ENABLE.
      
      4. Driver finishes recovery and reports that the carrier is on.
      
      5. Bond receives a new notification and checks the speed again.
         The speeds are valid but miimon has not altered the link
         state yet.  AD_PORT_LACP_ENABLE remains off.
      
      Because the slave's link state is still BOND_LINK_FAIL,
      no further port checks are made when it recovers. Though
      the slave devices are operational and have valid speed
      and duplex settings, the bond will not send LACPDU's. The
      simplest fix I can see is to force another speed check
      in bond_miimon_commit. This way the bond will update
      AD_PORT_LACP_ENABLE if needed when transitioning from
      BOND_LINK_FAIL to BOND_LINK_UP.
      
      CC: Jarod Wilson <jarod@redhat.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NThomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12185dfe
  6. 05 7月, 2019 1 次提交
    • V
      bonding: add an option to specify a delay between peer notifications · 07a4ddec
      Vincent Bernat 提交于
      Currently, gratuitous ARP/ND packets are sent every `miimon'
      milliseconds. This commit allows a user to specify a custom delay
      through a new option, `peer_notif_delay'.
      
      Like for `updelay' and `downdelay', this delay should be a multiple of
      `miimon' to avoid managing an additional work queue. The configuration
      logic is copied from `updelay' and `downdelay'. However, the default
      value cannot be set using a module parameter: Netlink or sysfs should
      be used to configure this feature.
      
      When setting `miimon' to 100 and `peer_notif_delay' to 500, we can
      observe the 500 ms delay is respected:
      
          20:30:19.354693 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
          20:30:19.874892 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
          20:30:20.394919 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
          20:30:20.914963 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
      
      In bond_mii_monitor(), I have tried to keep the lock logic readable.
      The change is due to the fact we cannot rely on a notification to
      lower the value of `bond->send_peer_notif' as `NETDEV_NOTIFY_PEERS' is
      only triggered once every N times, while we need to decrement the
      counter each time.
      
      iproute2 also needs to be updated to be able to specify this new
      attribute through `ip link'.
      Signed-off-by: NVincent Bernat <vincent@bernat.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07a4ddec
  7. 04 7月, 2019 1 次提交
  8. 03 7月, 2019 1 次提交
    • E
      bonding/main: fix NULL dereference in bond_select_active_slave() · b8bd72d3
      Eric Dumazet 提交于
      A bonding master can be up while best_slave is NULL.
      
      [12105.636318] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [12105.638204] mlx4_en: eth1: Linkstate event 1 -> 1
      [12105.648984] IP: bond_select_active_slave+0x125/0x250
      [12105.653977] PGD 0 P4D 0
      [12105.656572] Oops: 0000 [#1] SMP PTI
      [12105.660487] gsmi: Log Shutdown Reason 0x03
      [12105.664620] Modules linked in: kvm_intel loop act_mirred uhaul vfat fat stg_standard_ftl stg_megablocks stg_idt stg_hdi stg elephant_dev_num stg_idt_eeprom w1_therm wire i2c_mux_pca954x i2c_mux mlx4_i2c i2c_usb cdc_acm ehci_pci ehci_hcd i2c_iimc mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core [last unloaded: kvm_intel]
      [12105.685686] mlx4_core 0000:03:00.0: dispatching link up event for port 2
      [12105.685700] mlx4_en: eth2: Linkstate event 2 -> 1
      [12105.685700] mlx4_en: eth2: Link Up (linkstate)
      [12105.724452] Workqueue: bond0 bond_mii_monitor
      [12105.728854] RIP: 0010:bond_select_active_slave+0x125/0x250
      [12105.734355] RSP: 0018:ffffaf146a81fd88 EFLAGS: 00010246
      [12105.739637] RAX: 0000000000000003 RBX: ffff8c62b03c6900 RCX: 0000000000000000
      [12105.746838] RDX: 0000000000000000 RSI: ffffaf146a81fd08 RDI: ffff8c62b03c6000
      [12105.754054] RBP: ffffaf146a81fdb8 R08: 0000000000000001 R09: ffff8c517d387600
      [12105.761299] R10: 00000000001075d9 R11: ffffffffaceba92f R12: 0000000000000000
      [12105.768553] R13: ffff8c8240ae4800 R14: 0000000000000000 R15: 0000000000000000
      [12105.775748] FS:  0000000000000000(0000) GS:ffff8c62bfa40000(0000) knlGS:0000000000000000
      [12105.783892] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [12105.789716] CR2: 0000000000000000 CR3: 0000000d0520e001 CR4: 00000000001626f0
      [12105.796976] Call Trace:
      [12105.799446]  [<ffffffffac31d387>] bond_mii_monitor+0x497/0x6f0
      [12105.805317]  [<ffffffffabd42643>] process_one_work+0x143/0x370
      [12105.811225]  [<ffffffffabd42c7a>] worker_thread+0x4a/0x360
      [12105.816761]  [<ffffffffabd48bc5>] kthread+0x105/0x140
      [12105.821865]  [<ffffffffabd42c30>] ? rescuer_thread+0x380/0x380
      [12105.827757]  [<ffffffffabd48ac0>] ? kthread_associate_blkcg+0xc0/0xc0
      [12105.834266]  [<ffffffffac600241>] ret_from_fork+0x51/0x60
      
      Fixes: e2a7420d ("bonding/main: convert to using slave printk macros")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NJohn Sperbeck <jsperbeck@google.com>
      Cc: Jarod Wilson <jarod@redhat.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8bd72d3
  9. 26 6月, 2019 1 次提交
    • Y
      bonding: Always enable vlan tx offload · 30d8177e
      YueHaibing 提交于
      We build vlan on top of bonding interface, which vlan offload
      is off, bond mode is 802.3ad (LACP) and xmit_hash_policy is
      BOND_XMIT_POLICY_ENCAP34.
      
      Because vlan tx offload is off, vlan tci is cleared and skb push
      the vlan header in validate_xmit_vlan() while sending from vlan
      devices. Then in bond_xmit_hash, __skb_flow_dissect() fails to
      get information from protocol headers encapsulated within vlan,
      because 'nhoff' is points to IP header, so bond hashing is based
      on layer 2 info, which fails to distribute packets across slaves.
      
      This patch always enable bonding's vlan tx offload, pass the vlan
      packets to the slave devices with vlan tci, let them to handle
      vlan implementation.
      
      Fixes: 278339a4 ("bonding: propogate vlan_features to bonding master")
      Suggested-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30d8177e
  10. 10 6月, 2019 3 次提交
    • J
      bonding/main: convert to using slave printk macros · e2a7420d
      Jarod Wilson 提交于
      All of these printk instances benefit from having both master and slave
      device information included, so convert to using a standardized macro
      format and remove redundant information.
      Suggested-by: NJoe Perches <joe@perches.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2a7420d
    • J
      bonding: fix error messages in bond_do_fail_over_mac · f43b6530
      Jarod Wilson 提交于
      Passing the bond name again to debug output when referencing slave is wrong.
      We're trying to set the bond's MAC to that of the new_active slave, so adjust
      the error message slightly and pass in the slave's name, not the bond's.
      Then we're trying to set the MAC on the old active slave, but putting the
      new active slave's name in the output. While we're at it, clarify the
      error messages so you know which one actually triggered.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f43b6530
    • J
      bonding: improve event debug usability · 75466dce
      Jarod Wilson 提交于
      Seeing bonding debug log data along the lines of "event: 5" is a bit spartan,
      and often requires a lookup table if you don't remember what every event is.
      Make use of netdev_cmd_to_name for an improved debugging experience, so for
      the prior example, you'll see: "bond_netdev_event received NETDEV_REGISTER"
      instead (both are prefixed with the device for which the event pertains).
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      75466dce
  11. 05 6月, 2019 1 次提交
    • A
      net: bonding: Inherit MPLS features from slave devices · 2e770b50
      Ariel Levkovich 提交于
      When setting the bonding interface net device features,
      the kernel code doesn't address the slaves' MPLS features
      and doesn't inherit them.
      
      Therefore, HW offloads that enhance performance such as
      checksumming and TSO are disabled for MPLS tagged traffic
      flowing via the bonding interface.
      
      The patch add the inheritance of the MPLS features from the
      slave devices with a similar logic to setting the bonding device's
      VLAN and encapsulation features.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e770b50
  12. 27 5月, 2019 1 次提交
    • J
      bonding/802.3ad: fix slave link initialization transition states · 33403121
      Jarod Wilson 提交于
      Once in a while, with just the right timing, 802.3ad slaves will fail to
      properly initialize, winding up in a weird state, with a partner system
      mac address of 00:00:00:00:00:00. This started happening after a fix to
      properly track link_failure_count tracking, where an 802.3ad slave that
      reported itself as link up in the miimon code, but wasn't able to get a
      valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
      BOND_LINK_DOWN. That was the proper thing to do for the general "my link
      went down" case, but has created a link initialization race that can put
      the interface in this odd state.
      
      The simple fix is to instead set the slave link to BOND_LINK_DOWN again,
      if the link has never been up (last_link_up == 0), so the link state
      doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed
      in this case, it simply hasn't been up yet, and this prevents the
      unnecessary state change from DOWN to FAIL and getting stuck in an init
      failure w/o a partner mac.
      
      Fixes: ea53abfa ("bonding/802.3ad: fix link_failure_count tracking")
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Tested-by: NHeesoon Kim <Heesoon.Kim@stratus.com>
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33403121
  13. 16 4月, 2019 1 次提交
    • S
      bonding: fix event handling for stacked bonds · 92480b39
      Sabrina Dubroca 提交于
      When a bond is enslaved to another bond, bond_netdev_event() only
      handles the event as if the bond is a master, and skips treating the
      bond as a slave.
      
      This leads to a refcount leak on the slave, since we don't remove the
      adjacency to its master and the master holds a reference on the slave.
      
      Reproducer:
        ip link add bondL type bond
        ip link add bondU type bond
        ip link set bondL master bondU
        ip link del bondL
      
      No "Fixes:" tag, this code is older than git history.
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92480b39
  14. 21 3月, 2019 1 次提交
    • P
      net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce
      Paolo Abeni 提交于
      After the previous patch, all the callers of ndo_select_queue()
      provide as a 'fallback' argument netdev_pick_tx.
      The only exceptions are nested calls to ndo_select_queue(),
      which pass down the 'fallback' available in the current scope
      - still netdev_pick_tx.
      
      We can drop such argument and replace fallback() invocation with
      netdev_pick_tx(). This avoids an indirect call per xmit packet
      in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
      with device drivers implementing such ndo. It also clean the code
      a bit.
      
      Tested with ixgbe and CONFIG_FCOE=m
      
      With pktgen using queue xmit:
      threads		vanilla 	patched
      		(kpps)		(kpps)
      1		2334		2428
      2		4166		4278
      4		7895		8100
      
       v1 -> v2:
       - rebased after helper's name change
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a350ecce
  15. 25 2月, 2019 1 次提交
  16. 22 2月, 2019 1 次提交
    • M
      bonding: fix PACKET_ORIGDEV regression · 3c963a33
      Michal Soltys 提交于
      This patch fixes a subtle PACKET_ORIGDEV regression which was a side
      effect of fixes introduced by:
      
      6a9e461f bonding: pass link-local packets to bonding master also.
      
      ... to:
      
      b89f04c6 bonding: deliver link-local packets with skb->dev set to link that packets arrived on
      
      While 6a9e461f restored pre-b89f04c6 presence of link-local
      packets on bonding masters (which is required e.g. by linux bridges
      participating in spanning tree or needed for lab-like setups created
      with group_fwd_mask) it also caused the originating device
      information to be lost due to cloning.
      
      Maciej Żenczykowski proposed another solution that doesn't require
      packet cloning and retains original device information - instead of
      returning RX_HANDLER_PASS for all link-local packets it's now limited
      only to packets from inactive slaves.
      
      At the same time, packets passed to bonding masters retain correct
      information about the originating device and PACKET_ORIGDEV can be used
      to determine it.
      
      This elegantly solves all issues so far:
      
      - link-local packets that were removed from bonding masters
      - LLDP daemons being forced to explicitly bind to slave interfaces
      - PACKET_ORIGDEV having no effect on bond interfaces
      
      Fixes: 6a9e461f (bonding: pass link-local packets to bonding master also.)
      Reported-by: NVincent Bernat <vincent@bernat.ch>
      Signed-off-by: NMichal Soltys <soltys@ziu.info>
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c963a33
  17. 11 1月, 2019 1 次提交
    • W
      bonding: update nest level on unlink · 001e465f
      Willem de Bruijn 提交于
      A network device stack with multiple layers of bonding devices can
      trigger a false positive lockdep warning. Adding lockdep nest levels
      fixes this. Update the level on both enslave and unlink, to avoid the
      following series of events ..
      
          ip netns add test
          ip netns exec test bash
          ip link set dev lo addr 00:11:22:33:44:55
          ip link set dev lo down
      
          ip link add dev bond1 type bond
          ip link add dev bond2 type bond
      
          ip link set dev lo master bond1
          ip link set dev bond1 master bond2
      
          ip link set dev bond1 nomaster
          ip link set dev bond2 master bond1
      
      .. from still generating a splat:
      
          [  193.652127] ======================================================
          [  193.658231] WARNING: possible circular locking dependency detected
          [  193.664350] 4.20.0 #8 Not tainted
          [  193.668310] ------------------------------------------------------
          [  193.674417] ip/15577 is trying to acquire lock:
          [  193.678897] 00000000a40e3b69 (&(&bond->stats_lock)->rlock#3/3){+.+.}, at: bond_get_stats+0x58/0x290
          [  193.687851]
          	       but task is already holding lock:
          [  193.693625] 00000000807b9d9f (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0x58/0x290
      
          [..]
      
          [  193.851092]        lock_acquire+0xa7/0x190
          [  193.855138]        _raw_spin_lock_nested+0x2d/0x40
          [  193.859878]        bond_get_stats+0x58/0x290
          [  193.864093]        dev_get_stats+0x5a/0xc0
          [  193.868140]        bond_get_stats+0x105/0x290
          [  193.872444]        dev_get_stats+0x5a/0xc0
          [  193.876493]        rtnl_fill_stats+0x40/0x130
          [  193.880797]        rtnl_fill_ifinfo+0x6c5/0xdc0
          [  193.885271]        rtmsg_ifinfo_build_skb+0x86/0xe0
          [  193.890091]        rtnetlink_event+0x5b/0xa0
          [  193.894320]        raw_notifier_call_chain+0x43/0x60
          [  193.899225]        netdev_change_features+0x50/0xa0
          [  193.904044]        bond_compute_features.isra.46+0x1ab/0x270
          [  193.909640]        bond_enslave+0x141d/0x15b0
          [  193.913946]        do_set_master+0x89/0xa0
          [  193.918016]        do_setlink+0x37c/0xda0
          [  193.921980]        __rtnl_newlink+0x499/0x890
          [  193.926281]        rtnl_newlink+0x48/0x70
          [  193.930238]        rtnetlink_rcv_msg+0x171/0x4b0
          [  193.934801]        netlink_rcv_skb+0xd1/0x110
          [  193.939103]        rtnetlink_rcv+0x15/0x20
          [  193.943151]        netlink_unicast+0x3b5/0x520
          [  193.947544]        netlink_sendmsg+0x2fd/0x3f0
          [  193.951942]        sock_sendmsg+0x38/0x50
          [  193.955899]        ___sys_sendmsg+0x2ba/0x2d0
          [  193.960205]        __x64_sys_sendmsg+0xad/0x100
          [  193.964687]        do_syscall_64+0x5a/0x460
          [  193.968823]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 7e2556e4 ("bonding: avoid lockdep confusion in bond_get_stats()")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      001e465f
  18. 14 12月, 2018 3 次提交
  19. 07 12月, 2018 1 次提交
  20. 05 11月, 2018 1 次提交
    • J
      bonding/802.3ad: fix link_failure_count tracking · ea53abfa
      Jarod Wilson 提交于
      Commit 4d2c0cda set slave->link to
      BOND_LINK_DOWN for 802.3ad bonds whenever invalid speed/duplex values
      were read, to fix a problem with slaves getting into weird states, but
      in the process, broke tracking of link failures, as going straight to
      BOND_LINK_DOWN when a link is indeed down (cable pulled, switch rebooted)
      means we broke out of bond_miimon_inspect()'s BOND_LINK_DOWN case because
      !link_state was already true, we never incremented commit, and never got
      a chance to call bond_miimon_commit(), where slave->link_failure_count
      would be incremented. I believe the simple fix here is to mark the slave
      as BOND_LINK_FAIL, and let bond_miimon_inspect() transition the link from
      _FAIL to either _UP or _DOWN, and in the latter case, we now get proper
      incrementing of link_failure_count again.
      
      Fixes: 4d2c0cda ("bonding: speed/duplex update at NETDEV_UP event")
      CC: Mahesh Bandewar <maheshb@google.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      CC: stable@vger.kernel.org
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea53abfa
  21. 20 10月, 2018 1 次提交
    • D
      netpoll: allow cleanup to be synchronous · c9fbd71f
      Debabrata Banerjee 提交于
      This fixes a problem introduced by:
      commit 2cde6acd ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")
      
      When using netconsole on a bond, __netpoll_cleanup can asynchronously
      recurse multiple times, each __netpoll_free_async call can result in
      more __netpoll_free_async's. This means there is now a race between
      cleanup_work queues on multiple netpoll_info's on multiple devices and
      the configuration of a new netpoll. For example if a netconsole is set
      to enable 0, reconfigured, and enable 1 immediately, this netconsole
      will likely not work.
      
      Given the reason for __netpoll_free_async is it can be called when rtnl
      is not locked, if it is locked, we should be able to execute
      synchronously. It appears to be locked everywhere it's called from.
      
      Generalize the design pattern from the teaming driver for current
      callers of __netpoll_free_async.
      
      CC: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NDebabrata Banerjee <dbanerje@akamai.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9fbd71f
  22. 03 10月, 2018 1 次提交
  23. 27 9月, 2018 2 次提交
    • M
      bonding: avoid possible dead-lock · d4859d74
      Mahesh Bandewar 提交于
      Syzkaller reported this on a slightly older kernel but it's still
      applicable to the current kernel -
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.18.0-next-20180823+ #46 Not tainted
      ------------------------------------------------------
      syz-executor4/26841 is trying to acquire lock:
      00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652
      
      but task is already holding lock:
      00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
      00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (rtnl_mutex){+.+.}:
             __mutex_lock_common kernel/locking/mutex.c:925 [inline]
             __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
             mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
             rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
             bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline]
             bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320
             process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
             worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
             kthread+0x35a/0x420 kernel/kthread.c:246
             ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
      
      -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}:
             process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129
             worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
             kthread+0x35a/0x420 kernel/kthread.c:246
             ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
      
      -> #0 ((wq_completion)bond_dev->name){+.+.}:
             lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
             flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
             drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
             destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
             __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
             bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
             register_netdevice+0x337/0x1100 net/core/dev.c:8410
             bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
             rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
             rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
             netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
             rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
             netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
             netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
             netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
             sock_sendmsg_nosec net/socket.c:622 [inline]
             sock_sendmsg+0xd5/0x120 net/socket.c:632
             ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
             __sys_sendmsg+0x11d/0x290 net/socket.c:2153
             __do_sys_sendmsg net/socket.c:2162 [inline]
             __se_sys_sendmsg net/socket.c:2160 [inline]
             __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
             do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      other info that might help us debug this:
      
      Chain exists of:
        (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock((work_completion)(&(&nnw->work)->work));
                                     lock(rtnl_mutex);
        lock((wq_completion)bond_dev->name);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor4/26841:
      
      stack backtrace:
      CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222
       check_prev_add kernel/locking/lockdep.c:1862 [inline]
       check_prevs_add kernel/locking/lockdep.c:1975 [inline]
       validate_chain kernel/locking/lockdep.c:2416 [inline]
       __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412
       lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
       flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
       drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
       destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
       __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
       bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
       register_netdevice+0x337/0x1100 net/core/dev.c:8410
       bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
       rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
       rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
       __sys_sendmsg+0x11d/0x290 net/socket.c:2153
       __do_sys_sendmsg net/socket.c:2162 [inline]
       __se_sys_sendmsg net/socket.c:2160 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457089
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4859d74
    • M
      bonding: pass link-local packets to bonding master also. · 6a9e461f
      Mahesh Bandewar 提交于
      Commit b89f04c6 ("bonding: deliver link-local packets with
      skb->dev set to link that packets arrived on") changed the behavior
      of how link-local-multicast packets are processed. The change in
      the behavior broke some legacy use cases where these packets are
      expected to arrive on bonding master device also.
      
      This patch passes the packet to the stack with the link it arrived
      on as well as passes to the bonding-master device to preserve the
      legacy use case.
      
      Fixes: b89f04c6 ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on")
      Reported-by: NMichal Soltys <soltys@ziu.info>
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a9e461f
  24. 24 9月, 2018 1 次提交
  25. 02 8月, 2018 1 次提交
    • E
      bonding: avoid lockdep confusion in bond_get_stats() · 7e2556e4
      Eric Dumazet 提交于
      syzbot found that the following sequence produces a LOCKDEP splat [1]
      
      ip link add bond10 type bond
      ip link add bond11 type bond
      ip link set bond11 master bond10
      
      To fix this, we can use the already provided nest_level.
      
      This patch also provides correct nesting for dev->addr_list_lock
      
      [1]
      WARNING: possible recursive locking detected
      4.18.0-rc6+ #167 Not tainted
      --------------------------------------------
      syz-executor751/4439 is trying to acquire lock:
      (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
      (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
      
      but task is already holding lock:
      (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
      (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&bond->stats_lock)->rlock);
        lock(&(&bond->stats_lock)->rlock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by syz-executor751/4439:
       #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
       #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
       #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
       #2: (____ptrval____) (rcu_read_lock){....}, at: bond_get_stats+0x0/0x560 include/linux/compiler.h:215
      
      stack backtrace:
      CPU: 0 PID: 4439 Comm: syz-executor751 Not tainted 4.18.0-rc6+ #167
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
       check_deadlock kernel/locking/lockdep.c:1809 [inline]
       validate_chain kernel/locking/lockdep.c:2405 [inline]
       __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435
       lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
       spin_lock include/linux/spinlock.h:310 [inline]
       bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
       dev_get_stats+0x10f/0x470 net/core/dev.c:8316
       bond_get_stats+0x232/0x560 drivers/net/bonding/bond_main.c:3432
       dev_get_stats+0x10f/0x470 net/core/dev.c:8316
       rtnl_fill_stats+0x4d/0xac0 net/core/rtnetlink.c:1169
       rtnl_fill_ifinfo+0x1aa6/0x3fb0 net/core/rtnetlink.c:1611
       rtmsg_ifinfo_build_skb+0xc8/0x190 net/core/rtnetlink.c:3268
       rtmsg_ifinfo_event.part.30+0x45/0xe0 net/core/rtnetlink.c:3300
       rtmsg_ifinfo_event net/core/rtnetlink.c:3297 [inline]
       rtnetlink_event+0x144/0x170 net/core/rtnetlink.c:4716
       notifier_call_chain+0x180/0x390 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735
       call_netdevice_notifiers net/core/dev.c:1753 [inline]
       netdev_features_change net/core/dev.c:1321 [inline]
       netdev_change_features+0xb3/0x110 net/core/dev.c:7759
       bond_compute_features.isra.47+0x585/0xa50 drivers/net/bonding/bond_main.c:1120
       bond_enslave+0x1b25/0x5da0 drivers/net/bonding/bond_main.c:1755
       bond_do_ioctl+0x7cb/0xae0 drivers/net/bonding/bond_main.c:3528
       dev_ifsioc+0x43c/0xb30 net/core/dev_ioctl.c:327
       dev_ioctl+0x1b5/0xcc0 net/core/dev_ioctl.c:493
       sock_do_ioctl+0x1d3/0x3e0 net/socket.c:992
       sock_ioctl+0x30d/0x680 net/socket.c:1093
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:500 [inline]
       do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:684
       ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
       __do_sys_ioctl fs/ioctl.c:708 [inline]
       __se_sys_ioctl fs/ioctl.c:706 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440859
      Code: e8 2c af 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffc51a92878 EFLAGS: 00000213 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440859
      RDX: 0000000020000040 RSI: 0000000000008990 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000022d5880 R11: 0000000000000213 R12: 0000000000007390
      R13: 0000000000401db0 R14: 0000000000000000 R15: 0000000000000000
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e2556e4
  26. 10 7月, 2018 1 次提交
  27. 13 6月, 2018 1 次提交
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
  28. 25 5月, 2018 1 次提交
  29. 24 5月, 2018 1 次提交
    • W
      gso: limit udp gso to egress-only virtual devices · 8eea1ca8
      Willem de Bruijn 提交于
      Until the udp receive stack supports large packets (UDP GRO), GSO
      packets must not loop from the egress to the ingress path.
      
      Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual
      devices through NETIF_F_GSO_ENCAP_ALL as this included devices that
      may loop packets, such as veth and macvlan.
      
      Instead add it to specific devices that forward to another device's
      egress path, bonding and team.
      
      Fixes: 83aa025f ("udp: add gso support to virtual devices")
      CC: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8eea1ca8
  30. 18 5月, 2018 1 次提交
  31. 17 5月, 2018 2 次提交
新手
引导
客服 返回
顶部