1. 04 6月, 2019 1 次提交
    • J
      bonding/802.3ad: fix slave link initialization transition states · 3cde0a25
      Jarod Wilson 提交于
      [ Upstream commit 334031219a84b9994594015aab85ed7754c80176 ]
      
      Once in a while, with just the right timing, 802.3ad slaves will fail to
      properly initialize, winding up in a weird state, with a partner system
      mac address of 00:00:00:00:00:00. This started happening after a fix to
      properly track link_failure_count tracking, where an 802.3ad slave that
      reported itself as link up in the miimon code, but wasn't able to get a
      valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
      BOND_LINK_DOWN. That was the proper thing to do for the general "my link
      went down" case, but has created a link initialization race that can put
      the interface in this odd state.
      
      The simple fix is to instead set the slave link to BOND_LINK_DOWN again,
      if the link has never been up (last_link_up == 0), so the link state
      doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed
      in this case, it simply hasn't been up yet, and this prevents the
      unnecessary state change from DOWN to FAIL and getting stuck in an init
      failure w/o a partner mac.
      
      Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking")
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Tested-by: NHeesoon Kim <Heesoon.Kim@stratus.com>
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3cde0a25
  2. 17 5月, 2019 1 次提交
    • J
      bonding: fix arp_validate toggling in active-backup mode · 9c2cda31
      Jarod Wilson 提交于
      [ Upstream commit a9b8a2b39ce65df45687cf9ef648885c2a99fe75 ]
      
      There's currently a problem with toggling arp_validate on and off with an
      active-backup bond. At the moment, you can start up a bond, like so:
      
      modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1
      ip link set bond0 down
      echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
      echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
      ip link set bond0 up
      ip addr add 192.168.1.2/24 dev bond0
      
      Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
      
      echo 1 > /sys/class/net/bond0/bonding/arp_validate
      
      Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
      arp_validate off again, the link falls flat on it's face:
      
      echo 0 > /sys/class/net/bond0/bonding/arp_validate
      dmesg
      ...
      [133191.911987] bond0: Setting arp_validate to none (0)
      [133194.257793] bond0: bond_should_notify_peers: slave ens4f0
      [133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
      [133194.259000] bond0: making interface ens4f1 the new active one
      [133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
      [133197.331191] bond0: now running without any active interface!
      
      The problem lies in bond_options.c, where passing in arp_validate=0
      results in bond->recv_probe getting set to NULL. This flies directly in
      the face of commit 3fe68df9, which says we need to set recv_probe =
      bond_arp_recv, even if we're not using arp_validate. Said commit fixed
      this in bond_option_arp_interval_set, but missed that we can get to that
      same state in bond_option_arp_validate_set as well.
      
      One solution would be to universally set recv_probe = bond_arp_recv here
      as well, but I don't think bond_option_arp_validate_set has any business
      touching recv_probe at all, and that should be left to the arp_interval
      code, so we can just make things much tidier here.
      
      Fixes: 3fe68df9 ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c2cda31
  3. 08 5月, 2019 1 次提交
  4. 27 4月, 2019 1 次提交
  5. 19 3月, 2019 1 次提交
    • M
      bonding: fix PACKET_ORIGDEV regression · 795cb33c
      Michal Soltys 提交于
      [ Upstream commit 3c963a3306eada999be5ebf4f293dfa3d3945487 ]
      
      This patch fixes a subtle PACKET_ORIGDEV regression which was a side
      effect of fixes introduced by:
      
      6a9e461f bonding: pass link-local packets to bonding master also.
      
      ... to:
      
      b89f04c6 bonding: deliver link-local packets with skb->dev set to link that packets arrived on
      
      While 6a9e461f restored pre-b89f04c6 presence of link-local
      packets on bonding masters (which is required e.g. by linux bridges
      participating in spanning tree or needed for lab-like setups created
      with group_fwd_mask) it also caused the originating device
      information to be lost due to cloning.
      
      Maciej Żenczykowski proposed another solution that doesn't require
      packet cloning and retains original device information - instead of
      returning RX_HANDLER_PASS for all link-local packets it's now limited
      only to packets from inactive slaves.
      
      At the same time, packets passed to bonding masters retain correct
      information about the originating device and PACKET_ORIGDEV can be used
      to determine it.
      
      This elegantly solves all issues so far:
      
      - link-local packets that were removed from bonding masters
      - LLDP daemons being forced to explicitly bind to slave interfaces
      - PACKET_ORIGDEV having no effect on bond interfaces
      
      Fixes: 6a9e461f (bonding: pass link-local packets to bonding master also.)
      Reported-by: NVincent Bernat <vincent@bernat.ch>
      Signed-off-by: NMichal Soltys <soltys@ziu.info>
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      795cb33c
  6. 23 1月, 2019 1 次提交
    • W
      bonding: update nest level on unlink · d2898aae
      Willem de Bruijn 提交于
      [ Upstream commit 001e465f09a18857443489a57e74314a3368c805 ]
      
      A network device stack with multiple layers of bonding devices can
      trigger a false positive lockdep warning. Adding lockdep nest levels
      fixes this. Update the level on both enslave and unlink, to avoid the
      following series of events ..
      
          ip netns add test
          ip netns exec test bash
          ip link set dev lo addr 00:11:22:33:44:55
          ip link set dev lo down
      
          ip link add dev bond1 type bond
          ip link add dev bond2 type bond
      
          ip link set dev lo master bond1
          ip link set dev bond1 master bond2
      
          ip link set dev bond1 nomaster
          ip link set dev bond2 master bond1
      
      .. from still generating a splat:
      
          [  193.652127] ======================================================
          [  193.658231] WARNING: possible circular locking dependency detected
          [  193.664350] 4.20.0 #8 Not tainted
          [  193.668310] ------------------------------------------------------
          [  193.674417] ip/15577 is trying to acquire lock:
          [  193.678897] 00000000a40e3b69 (&(&bond->stats_lock)->rlock#3/3){+.+.}, at: bond_get_stats+0x58/0x290
          [  193.687851]
          	       but task is already holding lock:
          [  193.693625] 00000000807b9d9f (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0x58/0x290
      
          [..]
      
          [  193.851092]        lock_acquire+0xa7/0x190
          [  193.855138]        _raw_spin_lock_nested+0x2d/0x40
          [  193.859878]        bond_get_stats+0x58/0x290
          [  193.864093]        dev_get_stats+0x5a/0xc0
          [  193.868140]        bond_get_stats+0x105/0x290
          [  193.872444]        dev_get_stats+0x5a/0xc0
          [  193.876493]        rtnl_fill_stats+0x40/0x130
          [  193.880797]        rtnl_fill_ifinfo+0x6c5/0xdc0
          [  193.885271]        rtmsg_ifinfo_build_skb+0x86/0xe0
          [  193.890091]        rtnetlink_event+0x5b/0xa0
          [  193.894320]        raw_notifier_call_chain+0x43/0x60
          [  193.899225]        netdev_change_features+0x50/0xa0
          [  193.904044]        bond_compute_features.isra.46+0x1ab/0x270
          [  193.909640]        bond_enslave+0x141d/0x15b0
          [  193.913946]        do_set_master+0x89/0xa0
          [  193.918016]        do_setlink+0x37c/0xda0
          [  193.921980]        __rtnl_newlink+0x499/0x890
          [  193.926281]        rtnl_newlink+0x48/0x70
          [  193.930238]        rtnetlink_rcv_msg+0x171/0x4b0
          [  193.934801]        netlink_rcv_skb+0xd1/0x110
          [  193.939103]        rtnetlink_rcv+0x15/0x20
          [  193.943151]        netlink_unicast+0x3b5/0x520
          [  193.947544]        netlink_sendmsg+0x2fd/0x3f0
          [  193.951942]        sock_sendmsg+0x38/0x50
          [  193.955899]        ___sys_sendmsg+0x2ba/0x2d0
          [  193.960205]        __x64_sys_sendmsg+0xad/0x100
          [  193.964687]        do_syscall_64+0x5a/0x460
          [  193.968823]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 7e2556e4 ("bonding: avoid lockdep confusion in bond_get_stats()")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2898aae
  7. 21 12月, 2018 1 次提交
    • T
      bonding: fix 802.3ad state sent to partner when unbinding slave · aa4540d8
      Toni Peltonen 提交于
      [ Upstream commit 3b5b3a33 ]
      
      Previously when unbinding a slave the 802.3ad implementation only told
      partner that the port is not suitable for aggregation by setting the port
      aggregation state from aggregatable to individual. This is not enough. If the
      physical layer still stays up and we only unbinded this port from the bond there
      is nothing in the aggregation status alone to prevent the partner from sending
      traffic towards us. To ensure that the partner doesn't consider this
      port at all anymore we should also disable collecting and distributing to
      signal that this actor is going away. Also clear AD_STATE_SYNCHRONIZATION to
      ensure partner exits collecting + distributing state.
      
      I have tested this behaviour againts Arista EOS switches with mlx5 cards
      (physical link stays up even when interface is down) and simulated
      the same situation virtually Linux <-> Linux with two network namespaces
      running two veth device pairs. In both cases setting aggregation to
      individual doesn't alone prevent traffic from being to sent towards this
      port given that the link stays up in partners end. Partner still keeps
      it's end in collecting + distributing state and continues until timeout is
      reached. In most cases this means we are losing the traffic partner sends
      towards our port while we wait for timeout. This is most visible with slow
      periodic time (LACP rate slow).
      
      Other open source implementations like Open VSwitch and libreswitch, and
      vendor implementations like Arista EOS, seem to disable collecting +
      distributing to when doing similar port disabling/detaching/removing change.
      With this patch kernel implementation would behave the same way and ensure
      partner doesn't consider our actor viable anymore.
      Signed-off-by: NToni Peltonen <peltzi@peltzi.fi>
      Signed-off-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Acked-by: NJonathan Toppins <jtoppins@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      aa4540d8
  8. 21 11月, 2018 1 次提交
    • J
      bonding/802.3ad: fix link_failure_count tracking · 218b6e82
      Jarod Wilson 提交于
      commit ea53abfab960909d622ca37bcfb8e1c5378d21cc upstream.
      
      Commit 4d2c0cda set slave->link to
      BOND_LINK_DOWN for 802.3ad bonds whenever invalid speed/duplex values
      were read, to fix a problem with slaves getting into weird states, but
      in the process, broke tracking of link failures, as going straight to
      BOND_LINK_DOWN when a link is indeed down (cable pulled, switch rebooted)
      means we broke out of bond_miimon_inspect()'s BOND_LINK_DOWN case because
      !link_state was already true, we never incremented commit, and never got
      a chance to call bond_miimon_commit(), where slave->link_failure_count
      would be incremented. I believe the simple fix here is to mark the slave
      as BOND_LINK_FAIL, and let bond_miimon_inspect() transition the link from
      _FAIL to either _UP or _DOWN, and in the latter case, we now get proper
      incrementing of link_failure_count again.
      
      Fixes: 4d2c0cda ("bonding: speed/duplex update at NETDEV_UP event")
      CC: Mahesh Bandewar <maheshb@google.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      CC: stable@vger.kernel.org
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      218b6e82
  9. 04 11月, 2018 1 次提交
  10. 03 10月, 2018 1 次提交
  11. 27 9月, 2018 2 次提交
    • M
      bonding: avoid possible dead-lock · d4859d74
      Mahesh Bandewar 提交于
      Syzkaller reported this on a slightly older kernel but it's still
      applicable to the current kernel -
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.18.0-next-20180823+ #46 Not tainted
      ------------------------------------------------------
      syz-executor4/26841 is trying to acquire lock:
      00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652
      
      but task is already holding lock:
      00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
      00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (rtnl_mutex){+.+.}:
             __mutex_lock_common kernel/locking/mutex.c:925 [inline]
             __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
             mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
             rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
             bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline]
             bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320
             process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
             worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
             kthread+0x35a/0x420 kernel/kthread.c:246
             ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
      
      -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}:
             process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129
             worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
             kthread+0x35a/0x420 kernel/kthread.c:246
             ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
      
      -> #0 ((wq_completion)bond_dev->name){+.+.}:
             lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
             flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
             drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
             destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
             __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
             bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
             register_netdevice+0x337/0x1100 net/core/dev.c:8410
             bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
             rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
             rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
             netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
             rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
             netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
             netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
             netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
             sock_sendmsg_nosec net/socket.c:622 [inline]
             sock_sendmsg+0xd5/0x120 net/socket.c:632
             ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
             __sys_sendmsg+0x11d/0x290 net/socket.c:2153
             __do_sys_sendmsg net/socket.c:2162 [inline]
             __se_sys_sendmsg net/socket.c:2160 [inline]
             __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
             do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      other info that might help us debug this:
      
      Chain exists of:
        (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock((work_completion)(&(&nnw->work)->work));
                                     lock(rtnl_mutex);
        lock((wq_completion)bond_dev->name);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor4/26841:
      
      stack backtrace:
      CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222
       check_prev_add kernel/locking/lockdep.c:1862 [inline]
       check_prevs_add kernel/locking/lockdep.c:1975 [inline]
       validate_chain kernel/locking/lockdep.c:2416 [inline]
       __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412
       lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
       flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
       drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
       destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
       __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
       bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
       register_netdevice+0x337/0x1100 net/core/dev.c:8410
       bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
       rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
       rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
       __sys_sendmsg+0x11d/0x290 net/socket.c:2153
       __do_sys_sendmsg net/socket.c:2162 [inline]
       __se_sys_sendmsg net/socket.c:2160 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457089
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4859d74
    • M
      bonding: pass link-local packets to bonding master also. · 6a9e461f
      Mahesh Bandewar 提交于
      Commit b89f04c6 ("bonding: deliver link-local packets with
      skb->dev set to link that packets arrived on") changed the behavior
      of how link-local-multicast packets are processed. The change in
      the behavior broke some legacy use cases where these packets are
      expected to arrive on bonding master device also.
      
      This patch passes the packet to the stack with the link it arrived
      on as well as passes to the bonding-master device to preserve the
      legacy use case.
      
      Fixes: b89f04c6 ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on")
      Reported-by: NMichal Soltys <soltys@ziu.info>
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a9e461f
  12. 24 9月, 2018 1 次提交
  13. 02 8月, 2018 1 次提交
    • E
      bonding: avoid lockdep confusion in bond_get_stats() · 7e2556e4
      Eric Dumazet 提交于
      syzbot found that the following sequence produces a LOCKDEP splat [1]
      
      ip link add bond10 type bond
      ip link add bond11 type bond
      ip link set bond11 master bond10
      
      To fix this, we can use the already provided nest_level.
      
      This patch also provides correct nesting for dev->addr_list_lock
      
      [1]
      WARNING: possible recursive locking detected
      4.18.0-rc6+ #167 Not tainted
      --------------------------------------------
      syz-executor751/4439 is trying to acquire lock:
      (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
      (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
      
      but task is already holding lock:
      (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
      (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&bond->stats_lock)->rlock);
        lock(&(&bond->stats_lock)->rlock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by syz-executor751/4439:
       #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
       #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline]
       #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
       #2: (____ptrval____) (rcu_read_lock){....}, at: bond_get_stats+0x0/0x560 include/linux/compiler.h:215
      
      stack backtrace:
      CPU: 0 PID: 4439 Comm: syz-executor751 Not tainted 4.18.0-rc6+ #167
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
       check_deadlock kernel/locking/lockdep.c:1809 [inline]
       validate_chain kernel/locking/lockdep.c:2405 [inline]
       __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435
       lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
       spin_lock include/linux/spinlock.h:310 [inline]
       bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426
       dev_get_stats+0x10f/0x470 net/core/dev.c:8316
       bond_get_stats+0x232/0x560 drivers/net/bonding/bond_main.c:3432
       dev_get_stats+0x10f/0x470 net/core/dev.c:8316
       rtnl_fill_stats+0x4d/0xac0 net/core/rtnetlink.c:1169
       rtnl_fill_ifinfo+0x1aa6/0x3fb0 net/core/rtnetlink.c:1611
       rtmsg_ifinfo_build_skb+0xc8/0x190 net/core/rtnetlink.c:3268
       rtmsg_ifinfo_event.part.30+0x45/0xe0 net/core/rtnetlink.c:3300
       rtmsg_ifinfo_event net/core/rtnetlink.c:3297 [inline]
       rtnetlink_event+0x144/0x170 net/core/rtnetlink.c:4716
       notifier_call_chain+0x180/0x390 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735
       call_netdevice_notifiers net/core/dev.c:1753 [inline]
       netdev_features_change net/core/dev.c:1321 [inline]
       netdev_change_features+0xb3/0x110 net/core/dev.c:7759
       bond_compute_features.isra.47+0x585/0xa50 drivers/net/bonding/bond_main.c:1120
       bond_enslave+0x1b25/0x5da0 drivers/net/bonding/bond_main.c:1755
       bond_do_ioctl+0x7cb/0xae0 drivers/net/bonding/bond_main.c:3528
       dev_ifsioc+0x43c/0xb30 net/core/dev_ioctl.c:327
       dev_ioctl+0x1b5/0xcc0 net/core/dev_ioctl.c:493
       sock_do_ioctl+0x1d3/0x3e0 net/socket.c:992
       sock_ioctl+0x30d/0x680 net/socket.c:1093
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:500 [inline]
       do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:684
       ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
       __do_sys_ioctl fs/ioctl.c:708 [inline]
       __se_sys_ioctl fs/ioctl.c:706 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440859
      Code: e8 2c af 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffc51a92878 EFLAGS: 00000213 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440859
      RDX: 0000000020000040 RSI: 0000000000008990 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000022d5880 R11: 0000000000000213 R12: 0000000000007390
      R13: 0000000000401db0 R14: 0000000000000000 R15: 0000000000000000
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e2556e4
  14. 23 7月, 2018 1 次提交
  15. 22 7月, 2018 1 次提交
    • J
      bonding: set default miimon value for non-arp modes if not set · c1f897ce
      Jarod Wilson 提交于
      For some time now, if you load the bonding driver and configure bond
      parameters via sysfs using minimal config options, such as specifying
      nothing but the mode, relying on defaults for everything else, modes
      that cannot use arp monitoring (802.3ad, balance-tlb, balance-alb) all
      wind up with both arp_interval=0 (as it should be) and miimon=0, which
      means the miimon monitor thread never actually runs. This is particularly
      problematic for 802.3ad.
      
      For example, from an LNST recipe I've set up:
      
      $ modprobe bonding max_bonds=0"
      $ echo "+t_bond0" > /sys/class/net/bonding_masters"
      $ ip link set t_bond0 down"
      $ echo "802.3ad" > /sys/class/net/t_bond0/bonding/mode"
      $ ip link set ens1f1 down"
      $ echo "+ens1f1" > /sys/class/net/t_bond0/bonding/slaves"
      $ ip link set ens1f0 down"
      $ echo "+ens1f0" > /sys/class/net/t_bond0/bonding/slaves"
      $ ethtool -i t_bond0"
      $ ip link set ens1f1 up"
      $ ip link set ens1f0 up"
      $ ip link set t_bond0 up"
      $ ip addr add 192.168.9.1/24 dev t_bond0"
      $ ip addr add 2002::1/64 dev t_bond0"
      
      This bond comes up okay, but things look slightly suspect in
      /proc/net/bonding/t_bond0 output:
      
      $ grep -i mii /proc/net/bonding/t_bond0
      MII Status: up
      MII Polling Interval (ms): 0
      MII Status: up
      MII Status: up
      
      Now, pull a cable on one of the ports in the bond, then reconnect it, and
      you'll see:
      
      Slave Interface: ens1f0
      MII Status: down
      Speed: 1000 Mbps
      Duplex: full
      
      I believe this became a major issue as of commit 4d2c0cda, which for
      802.3ad bonds, sets slave->link = BOND_LINK_DOWN, with a comment about
      relying on link monitoring via miimon to set it correctly, but since the
      miimon work queue never runs, the link just stays marked down.
      
      If we simply tweak bond_option_mode_set() slightly, we can check for the
      non-arp modes having no miimon value set, and insert BOND_DEFAULT_MIIMON,
      which gets things back in full working order. This problem exists as far
      back as 4.14, and might be worth fixing in all stable trees since, though
      the work-around is to simply specify an miimon value yourself.
      Reported-by: NBob Ball <ball@umich.edu>
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Acked-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1f897ce
  16. 10 7月, 2018 1 次提交
  17. 13 6月, 2018 1 次提交
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
  18. 08 6月, 2018 1 次提交
  19. 25 5月, 2018 1 次提交
  20. 24 5月, 2018 1 次提交
    • W
      gso: limit udp gso to egress-only virtual devices · 8eea1ca8
      Willem de Bruijn 提交于
      Until the udp receive stack supports large packets (UDP GRO), GSO
      packets must not loop from the egress to the ingress path.
      
      Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual
      devices through NETIF_F_GSO_ENCAP_ALL as this included devices that
      may loop packets, such as veth and macvlan.
      
      Instead add it to specific devices that forward to another device's
      egress path, bonding and team.
      
      Fixes: 83aa025f ("udp: add gso support to virtual devices")
      CC: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8eea1ca8
  21. 18 5月, 2018 1 次提交
  22. 17 5月, 2018 5 次提交
  23. 16 5月, 2018 1 次提交
  24. 12 5月, 2018 2 次提交
  25. 11 5月, 2018 2 次提交
  26. 23 4月, 2018 1 次提交
    • X
      bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave · ddea788c
      Xin Long 提交于
      After Commit 8a8efa22 ("bonding: sync netpoll code with bridge"), it
      would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev
      if bond->dev->npinfo was set.
      
      However now slave_dev npinfo is set with bond->dev->npinfo before calling
      slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called
      in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup().
      It causes that the lower dev of this slave dev can't set its npinfo.
      
      One way to reproduce it:
      
        # modprobe bonding
        # brctl addbr br0
        # brctl addif br0 eth1
        # ifconfig bond0 192.168.122.1/24 up
        # ifenslave bond0 eth2
        # systemctl restart netconsole
        # ifenslave bond0 br0
        # ifconfig eth2 down
        # systemctl restart netconsole
      
      The netpoll won't really work.
      
      This patch is to remove that slave_dev npinfo setting in bond_enslave().
      
      Fixes: 8a8efa22 ("bonding: sync netpoll code with bridge")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddea788c
  27. 28 3月, 2018 1 次提交
  28. 27 3月, 2018 4 次提交
  29. 28 2月, 2018 1 次提交
  30. 21 12月, 2017 1 次提交