1. 07 Jul, 2021 3 commits
    • bonding: fix suspicious RCU usage in bond_ipsec_del_sa() · a22c39b8
      Taehee Yoo committed
      bond_ipsec_del_sa() uses rcu_dereference() to read
      bond->curr_active_slave, but neither it nor its caller holds the RCU
      read lock, so a warning occurs.
      So add rcu_read_lock().
      
      Test commands:
          ip netns add A
          ip netns exec A bash
          modprobe netdevsim
          echo "1 1" > /sys/bus/netdevsim/new_device
          ip link add bond0 type bond
          ip link set eth0 master bond0
          ip link set eth0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
      transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
      dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
          ip x s f
      
      Splat looks like:
      =============================
      WARNING: suspicious RCU usage
      5.13.0-rc3+ #1168 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:448 suspicious rcu_dereference_check()
      usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by ip/705:
       #0: ffff888106701780 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
      at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
       #1: ffff8880075b0098 (&x->lock){+.-.}-{2:2},
      at: xfrm_state_delete+0x16/0x30
      
      stack backtrace:
      CPU: 6 PID: 705 Comm: ip Not tainted 5.13.0-rc3+ #1168
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_del_sa+0x16a/0x1c0 [bonding]
       __xfrm_state_delete+0x51f/0x730
       xfrm_state_delete+0x1e/0x30
       xfrm_state_flush+0x22f/0x390
       xfrm_flush_sa+0xd8/0x260 [xfrm_user]
       ? xfrm_flush_policy+0x290/0x290 [xfrm_user]
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix null dereference in bond_ipsec_add_sa() · 105cd17a
      Taehee Yoo committed
      If the bond has no real device enslaved, bond->curr_active_slave is
      NULL. bond_ipsec_add_sa() dereferences bond->curr_active_slave without
      a NULL check, so a null-ptr-deref occurs.
      
      Test commands:
          ip link add bond0 type bond
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi \
      0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
      dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
      
      Splat looks like:
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 4 PID: 680 Comm: ip Not tainted 5.13.0-rc3+ #1168
      RIP: 0010:bond_ipsec_add_sa+0xc4/0x2e0 [bonding]
      Code: 85 21 02 00 00 4d 8b a6 48 0c 00 00 e8 75 58 44 ce 85 c0 0f 85 14
      01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02
      00 0f 85 fc 01 00 00 48 8d bb e0 02 00 00 4d 8b 2c 24 48
      RSP: 0018:ffff88810946f508 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff88810b4e8040 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: ffffffff8fe34280 RDI: ffff888115abe100
      RBP: ffff88810946f528 R08: 0000000000000003 R09: fffffbfff2287e11
      R10: 0000000000000001 R11: ffff888115abe0c8 R12: 0000000000000000
      R13: ffffffffc0aea9a0 R14: ffff88800d7d2000 R15: ffff88810b4e8330
      FS:  00007efc5552e680(0000) GS:ffff888119c00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055c2530dbf40 CR3: 0000000103056004 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? netlink_ack+0x9d0/0x9d0
       ? netlink_deliver_tap+0x17c/0xa50
       xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
       netlink_unicast+0x41c/0x610
       ? netlink_attachskb+0x710/0x710
       netlink_sendmsg+0x6b9/0xb70
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix suspicious RCU usage in bond_ipsec_add_sa() · b648eba4
      Taehee Yoo committed
      bond_ipsec_add_sa() uses rcu_dereference() to read
      bond->curr_active_slave, but neither it nor its caller holds the RCU
      read lock, so a warning occurs.
      So add rcu_read_lock().
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add bond0 type bond
          ip link set dummy0 master bond0
          ip link set dummy0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \
      	    mode transport \
      	    reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      	    0x44434241343332312423222114131211f4f3f2f1 128 sel \
      	    src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \
      	    dev bond0 dir in
      
      Splat looks like:
      =============================
      WARNING: suspicious RCU usage
      5.13.0-rc3+ #1168 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ip/684:
       #0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
      at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
      stack backtrace:
      CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_add_sa+0x18c/0x1f0 [bonding]
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? netlink_ack+0x9d0/0x9d0
       ? netlink_deliver_tap+0x17c/0xa50
       xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
       netlink_unicast+0x41c/0x610
       ? netlink_attachskb+0x710/0x710
       netlink_sendmsg+0x6b9/0xb70
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Jun, 2021 1 commit
  3. 23 Jun, 2021 1 commit
    • bonding: avoid adding slave device with IFF_MASTER flag · 3c9ef511
      Di Zhu committed
      The following steps will definitely cause the kernel to crash:
      	ip link add vrf1 type vrf table 1
      	modprobe bonding.ko max_bonds=1
      	echo "+vrf1" >/sys/class/net/bond0/bonding/slaves
      	rmmod bonding
      
      The root cause: adding the VRF device as a slave fails, and the error
      path performs some cleanup. Because the VRF device has the IFF_MASTER
      flag, the cleanup does not clear its IFF_BONDING flag. Then, when the
      bonding module is unloaded, unregister_netdevice_notifier() treats the
      VRF device as a bond master and interprets netdev_priv() as
      struct bonding{} when it is actually struct net_vrf{}.

      The processing logic of bond_enslave() does not appear to allow
      enslaving a device that has the IFF_MASTER flag, so add an explicit
      check for this situation.
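
The check described above can be modelled in user space. This is only a sketch, not the kernel patch: `can_enslave` is a hypothetical name, and returning -EPERM is an assumption about which errno the check reports. The flag values are those of IFF_UP and IFF_MASTER in the UAPI headers.

```python
# Illustrative user-space model of the enslave check; not kernel code.
IFF_UP = 0x1        # interface is up
IFF_MASTER = 0x400  # device is a master (bond master, VRF, ...)
EPERM = 1           # assumed errno for a rejected enslave


def can_enslave(slave_flags: int) -> int:
    """Return 0 if the device may be enslaved, or -EPERM if it is a master."""
    if slave_flags & IFF_MASTER:
        return -EPERM
    return 0


vrf_flags = IFF_UP | IFF_MASTER  # a VRF device carries IFF_MASTER
dummy_flags = IFF_UP             # an ordinary device does not

print(can_enslave(vrf_flags))
print(can_enslave(dummy_flags))
```

Rejecting the device before any enslave work starts means there is no partially completed state for the error path to clean up.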
      Signed-off-by: Di Zhu <zhudi21@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 16 Jun, 2021 1 commit
    • net: bonding: Use per-cpu rr_tx_counter · 848ca918
      Jussi Maki committed
      The round-robin rr_tx_counter was shared across CPUs leading to
      significant cache thrashing at high packet rates. This patch switches
      the round-robin packet counter to use a per-cpu variable to decide
      the destination slave.
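
A minimal user-space sketch of the idea, assuming each CPU owns one counter slot and the slave index is that counter modulo the number of slaves. `PerCpuCounter` and `next_slave` are hypothetical names; the real driver uses kernel per-cpu variables rather than a Python list.

```python
# Illustrative model of a per-CPU round-robin counter; not kernel code.
class PerCpuCounter:
    def __init__(self, n_cpus: int) -> None:
        # One independent counter per CPU, so CPUs never write to the
        # same slot (the property that avoids cache-line bouncing).
        self.values = [0] * n_cpus

    def next_slave(self, cpu: int, n_slaves: int) -> int:
        """Pick the next slave index using only this CPU's counter."""
        index = self.values[cpu] % n_slaves
        self.values[cpu] += 1
        return index


rr = PerCpuCounter(n_cpus=2)
print([rr.next_slave(cpu=0, n_slaves=3) for _ in range(4)])
print(rr.next_slave(cpu=1, n_slaves=3))
```

Each CPU still cycles through the slaves in order, but the rotation is no longer globally ordered across CPUs. That is the trade-off made in exchange for removing the shared write.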
      
      On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh
      (-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after
      this patch.
      
      "perf top -e cache_misses" before:
          12.31%  [bonding]       [k] bond_xmit_roundrobin_slave_get
          10.59%  [sch_fq_codel]  [k] fq_codel_dequeue
           9.34%  [kernel]        [k] skb_release_data
      after:
          15.42%  [sch_fq_codel]  [k] fq_codel_dequeue
          10.06%  [kernel]        [k] __memset
           9.12%  [kernel]        [k] skb_release_data
      Signed-off-by: Jussi Maki <joamaki@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 04 Jun, 2021 1 commit
  6. 21 May, 2021 2 commits
  7. 18 May, 2021 1 commit
  8. 22 Apr, 2021 1 commit
    • bonding: 3ad: Fix the conflict between bond_update_slave_arr and the state machine · 83d686a6
      jinyiting committed
      With the bond in mode 4 (802.3ad), bringing a normally negotiated bond
      down and then up again can leave bond->slave_arr set to NULL.
      
      Test commands:
         ifconfig bond1 down
         ifconfig bond1 up
      
      The conflict occurs in the following process:
      
      __dev_open (CPU A)
      --bond_open
        --queue_delayed_work(bond->wq,&bond->ad_work,0);
        --bond_update_slave_arr
          --bond_3ad_get_active_agg_info
      
      ad_work(CPU B)
      --bond_3ad_state_machine_handler
        --ad_agg_selection_logic
      
      ad_work runs on CPU B. In ad_agg_selection_logic(), every
      agg->is_active is cleared. If bond_3ad_get_active_agg_info() runs on
      CPU A before CPU B has selected the new active aggregator, it fails
      and bond->slave_arr is set to NULL. Because the best aggregator in
      ad_agg_selection_logic() has not changed, the slave array is not
      updated again.

      The conflict occurs because ad_agg_selection_logic() clears
      agg->is_active under mode_lock, while bond_open() ->
      bond_update_slave_arr() inspects agg->is_active outside the lock.

      Also, bond_update_slave_arr() may legitimately sleep when allocating
      memory, so replace the WARN_ON with a call to might_sleep().
      Signed-off-by: jinyiting <jinyiting@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 30 Mar, 2021 1 commit
  10. 13 Mar, 2021 1 commit
  11. 09 Mar, 2021 1 commit
  12. 20 Jan, 2021 1 commit
    • bonding: add a vlan+srcmac tx hashing option · 7b8fc010
      Jarod Wilson committed
      This comes from an end-user request, where they're running multiple VMs on
      hosts with bonded interfaces connected to some interesting switch topologies,
      where 802.3ad isn't an option. They're currently running a proprietary
      solution that effectively achieves load-balancing of VMs and bandwidth
      utilization improvements with a similar form of transmission algorithm.
      
      Basically, each VM has its own vlan, so it always sends its traffic out
      the same interface, unless that interface fails. Traffic gets split
      between the interfaces, maintaining a consistent path, with failover still
      available if an interface goes down.
      
      Unlike bond_eth_hash(), this hash function is using the full source MAC
      address instead of just the last byte, as there are so few components to
      the hash, and in the no-vlan case, we would be returning just the last
      byte of the source MAC as the hash value. It's entirely possible to have
      two NICs in a bond with the same last byte of their MAC, but not the same
      MAC, so this adjustment should guarantee distinct hashes in all cases.
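
A user-space sketch of a hash built from the full source MAC and an optional VLAN ID, in the spirit of the paragraph above. It is an assumption-laden model rather than the driver's code: `vlan_srcmac_hash` is a hypothetical name, and folding the MAC as two 24-bit halves combined with XOR is one plausible way to use all six bytes.

```python
# Illustrative model of a vlan+srcmac hash; not kernel code.
from typing import Optional


def vlan_srcmac_hash(src_mac: bytes, vlan_id: Optional[int] = None) -> int:
    """Fold the whole 6-byte source MAC (and the VLAN ID, if any) into a hash."""
    if len(src_mac) != 6:
        raise ValueError("a MAC address has exactly 6 bytes")
    vendor = int.from_bytes(src_mac[:3], "big")  # first three bytes
    device = int.from_bytes(src_mac[3:], "big")  # last three bytes
    value = vendor ^ device
    if vlan_id is not None:
        value ^= vlan_id                         # untagged frames skip this
    return value


# Two MACs that share the last byte but differ elsewhere.
mac_a = bytes.fromhex("525400aabb01")
mac_b = bytes.fromhex("525400aacc01")
print(hex(vlan_srcmac_hash(mac_a)), hex(vlan_srcmac_hash(mac_b)))
```

A hash over only the last byte would give both addresses the same value. The sketch shows this one pair hashing differently and makes no claim about every possible pair.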
      
      This has been rudimentarily tested to provide similar results to the
      proprietary solution it is aiming to replace. A patch for iproute2 is also
      posted, to properly support the new mode there as well.
      
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Thomas Davis <tadavis@lbl.gov>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Link: https://lore.kernel.org/r/20210119010927.1191922-1-jarod@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  13. 19 Jan, 2021 3 commits
    • net/bonding: Implement TLS TX device offload · 89df6a81
      Tariq Toukan committed
      Implement TLS TX device offload for bonding interfaces.
      This allows kTLS sockets running on a bond to benefit from the
      device offload on capable lower devices.
      
      To allow a simple and fast maintenance of the TLS context in SW and
      lower devices, we bind the TLS socket to a specific lower dev.
      To achieve a behavior similar to SW kTLS, we support only balance-xor
      and 802.3ad modes, with xmit_hash_policy=layer3+4. This is enforced
      in bond_sk_check(), done in a previous patch.
      
      For the above configuration, the SW implementation keeps picking the
      same exact lower dev for all the socket's SKBs. The device offload
      behaves similarly, making the decision once at the connection creation.
      
      Per socket, the TLS module should work directly with the lowest netdev
      in chain, to call the tls_dev_ops operations.
      
      As the bond interface is being bypassed by the TLS module, interacting
      directly against the lower devs, there is no way for the bond interface
      to disable its device offload capabilities, as long as the mode/policy
      config allows it.
      Hence, the feature flag is not directly controllable, but just reflects
      the current offload status based on the logic under bond_sk_check().
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/bonding: Implement ndo_sk_get_lower_dev · 007feb87
      Tariq Toukan committed
      Add ndo_sk_get_lower_dev() implementation for bond interfaces.
      
      Support only for the cases where the socket's and SKBs' hash
      yields identical value for the whole connection lifetime.
      
      Here we restrict it to L3+4 sockets only, with
      xmit_hash_policy==LAYER34 and bond modes xor/802.3ad.
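
The restriction above amounts to a predicate over the bond mode and the transmit hash policy. The sketch below models it in user space. `sk_lower_dev_supported` is a hypothetical name, and the enum members are intended to mirror the kernel's mode and policy numbering but exist here only for illustration.

```python
# Illustrative model of the mode/policy restriction; not kernel code.
from enum import IntEnum


class BondMode(IntEnum):
    ROUNDROBIN = 0
    ACTIVEBACKUP = 1
    XOR = 2
    BROADCAST = 3
    IEEE_8023AD = 4
    TLB = 5
    ALB = 6


class XmitPolicy(IntEnum):
    LAYER2 = 0
    LAYER34 = 1
    LAYER23 = 2


def sk_lower_dev_supported(mode: BondMode, policy: XmitPolicy) -> bool:
    """True only when every SKB of a socket hashes to the same lower device."""
    return (mode in (BondMode.XOR, BondMode.IEEE_8023AD)
            and policy == XmitPolicy.LAYER34)


print(sk_lower_dev_supported(BondMode.IEEE_8023AD, XmitPolicy.LAYER34))
print(sk_lower_dev_supported(BondMode.ROUNDROBIN, XmitPolicy.LAYER34))
```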
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/bonding: Take IP hash logic into a helper · 5b998545
      Tariq Toukan committed
      The L3 hash logic will be used by a later patch for one more use
      case.
      Move it into a helper function for better code reuse.
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  14. 15 Jan, 2021 1 commit
  15. 08 Dec, 2020 1 commit
  16. 22 Nov, 2020 1 commit
    • bonding: wait for sysfs kobject destruction before freeing struct slave · b9ad3e9f
      Jamie Iles committed
      syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, releasing a
      struct slave device could result in the following splat:
      
        kobject: 'bonding_slave' (00000000cecdd4fe): kobject_release, parent 0000000074ceb2b2 (delayed 1000)
        bond0 (unregistering): (slave bond_slave_1): Releasing backup interface
        ------------[ cut here ]------------
        ODEBUG: free active (active state 0) object type: timer_list hint: workqueue_select_cpu_near kernel/workqueue.c:1549 [inline]
        ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 kernel/workqueue.c:1600
        WARNING: CPU: 1 PID: 842 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 lib/debugobjects.c:485
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 1 PID: 842 Comm: kworker/u4:4 Tainted: G S                5.9.0-rc8+ #96
        Hardware name: linux,dummy-virt (DT)
        Workqueue: netns cleanup_net
        Call trace:
         dump_backtrace+0x0/0x4d8 include/linux/bitmap.h:239
         show_stack+0x34/0x48 arch/arm64/kernel/traps.c:142
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x174/0x1f8 lib/dump_stack.c:118
         panic+0x360/0x7a0 kernel/panic.c:231
         __warn+0x244/0x2ec kernel/panic.c:600
         report_bug+0x240/0x398 lib/bug.c:198
         bug_handler+0x50/0xc0 arch/arm64/kernel/traps.c:974
         call_break_hook+0x160/0x1d8 arch/arm64/kernel/debug-monitors.c:322
         brk_handler+0x30/0xc0 arch/arm64/kernel/debug-monitors.c:329
         do_debug_exception+0x184/0x340 arch/arm64/mm/fault.c:864
         el1_dbg+0x48/0xb0 arch/arm64/kernel/entry-common.c:65
         el1_sync_handler+0x170/0x1c8 arch/arm64/kernel/entry-common.c:93
         el1_sync+0x80/0x100 arch/arm64/kernel/entry.S:594
         debug_print_object+0x180/0x240 lib/debugobjects.c:485
         __debug_check_no_obj_freed lib/debugobjects.c:967 [inline]
         debug_check_no_obj_freed+0x200/0x430 lib/debugobjects.c:998
         slab_free_hook mm/slub.c:1536 [inline]
         slab_free_freelist_hook+0x190/0x210 mm/slub.c:1577
         slab_free mm/slub.c:3138 [inline]
         kfree+0x13c/0x460 mm/slub.c:4119
         bond_free_slave+0x8c/0xf8 drivers/net/bonding/bond_main.c:1492
         __bond_release_one+0xe0c/0xec8 drivers/net/bonding/bond_main.c:2190
         bond_slave_netdev_event drivers/net/bonding/bond_main.c:3309 [inline]
         bond_netdev_event+0x8f0/0xa70 drivers/net/bonding/bond_main.c:3420
         notifier_call_chain+0xf0/0x200 kernel/notifier.c:83
         __raw_notifier_call_chain kernel/notifier.c:361 [inline]
         raw_notifier_call_chain+0x44/0x58 kernel/notifier.c:368
         call_netdevice_notifiers_info+0xbc/0x150 net/core/dev.c:2033
         call_netdevice_notifiers_extack net/core/dev.c:2045 [inline]
         call_netdevice_notifiers net/core/dev.c:2059 [inline]
         rollback_registered_many+0x6a4/0xec0 net/core/dev.c:9347
         unregister_netdevice_many.part.0+0x2c/0x1c0 net/core/dev.c:10509
         unregister_netdevice_many net/core/dev.c:10508 [inline]
         default_device_exit_batch+0x294/0x338 net/core/dev.c:10992
         ops_exit_list.isra.0+0xec/0x150 net/core/net_namespace.c:189
         cleanup_net+0x44c/0x888 net/core/net_namespace.c:603
         process_one_work+0x96c/0x18c0 kernel/workqueue.c:2269
         worker_thread+0x3f0/0xc30 kernel/workqueue.c:2415
         kthread+0x390/0x498 kernel/kthread.c:292
         ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:925
      
      This is a potential use-after-free if the sysfs nodes are being accessed
      whilst removing the struct slave, so wait for the object destruction to
      complete before freeing the struct slave itself.
      
      Fixes: 07699f9a ("bonding: add sysfs /slave dir for bond slave devices.")
      Fixes: a068aab4 ("bonding: Fix reference count leak in bond_sysfs_slave_add.")
      Cc: Qiushi Wu <wu000273@umn.edu>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Jamie Iles <jamie@nuviainc.com>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/r/20201120142827.879226-1-jamie@nuviainc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  17. 04 Nov, 2020 1 commit
  18. 29 Sep, 2020 1 commit
  19. 26 Sep, 2020 1 commit
    • bonding: set dev->needed_headroom in bond_setup_by_slave() · f32f1933
      Eric Dumazet committed
      syzbot managed to crash a host by creating a bond
      with a GRE device.
      
      For non-Ethernet devices, bonding calls bond_setup_by_slave()
      instead of ether_setup(), and unfortunately dev->needed_headroom
      was not copied from the newly added member.
      
      [  171.243095] skbuff: skb_under_panic: text:ffffffffa184b9ea len:116 put:20 head:ffff883f84012dc0 data:ffff883f84012dbc tail:0x70 end:0xd00 dev:bond0
      [  171.243111] ------------[ cut here ]------------
      [  171.243112] kernel BUG at net/core/skbuff.c:112!
      [  171.243117] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  171.243469] gsmi: Log Shutdown Reason 0x03
      [  171.243505] Call Trace:
      [  171.243506]  <IRQ>
      [  171.243512]  [<ffffffffa171be59>] skb_push+0x49/0x50
      [  171.243516]  [<ffffffffa184b9ea>] ipgre_header+0x2a/0xf0
      [  171.243520]  [<ffffffffa17452d7>] neigh_connected_output+0xb7/0x100
      [  171.243524]  [<ffffffffa186f1d3>] ip6_finish_output2+0x383/0x490
      [  171.243528]  [<ffffffffa186ede2>] __ip6_finish_output+0xa2/0x110
      [  171.243531]  [<ffffffffa186acbc>] ip6_finish_output+0x2c/0xa0
      [  171.243534]  [<ffffffffa186abe9>] ip6_output+0x69/0x110
      [  171.243537]  [<ffffffffa186ac90>] ? ip6_output+0x110/0x110
      [  171.243541]  [<ffffffffa189d952>] mld_sendpack+0x1b2/0x2d0
      [  171.243544]  [<ffffffffa189d290>] ? mld_send_report+0xf0/0xf0
      [  171.243548]  [<ffffffffa189c797>] mld_ifc_timer_expire+0x2d7/0x3b0
      [  171.243551]  [<ffffffffa189c4c0>] ? mld_gq_timer_expire+0x50/0x50
      [  171.243556]  [<ffffffffa0fea270>] call_timer_fn+0x30/0x130
      [  171.243559]  [<ffffffffa0fea17c>] expire_timers+0x4c/0x110
      [  171.243563]  [<ffffffffa0fea0e3>] __run_timers+0x213/0x260
      [  171.243566]  [<ffffffffa0fecb7d>] ? ktime_get+0x3d/0xa0
      [  171.243570]  [<ffffffffa0ff9c4e>] ? clockevents_program_event+0x7e/0xe0
      [  171.243574]  [<ffffffffa0f7e5d5>] ? sched_clock_cpu+0x15/0x190
      [  171.243577]  [<ffffffffa0fe973d>] run_timer_softirq+0x1d/0x40
      [  171.243581]  [<ffffffffa1c00152>] __do_softirq+0x152/0x2f0
      [  171.243585]  [<ffffffffa0f44e1f>] irq_exit+0x9f/0xb0
      [  171.243588]  [<ffffffffa1a02e1d>] smp_apic_timer_interrupt+0xfd/0x1a0
      [  171.243591]  [<ffffffffa1a01ea6>] apic_timer_interrupt+0x86/0x90
      
      Fixes: f5184d26 ("net: Allow netdevices to specify needed head/tailroom")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 24 Aug, 2020 1 commit
  21. 19 Aug, 2020 1 commit
    • bonding: fix active-backup failover for current ARP slave · 0410d071
      Jiri Wiesner committed
      When the ARP monitor is used for link detection, ARP replies are
      validated for all slaves (arp_validate=3) and fail_over_mac is set to
      active, two slaves of an active-backup bond may get stuck in a state
      where both of them are active and pass packets that they receive to
      the bond. This state makes IPv6 duplicate address detection fail. The
      state is reached thus:
      1. The current active slave goes down because the ARP target
         is not reachable.
      2. The current ARP slave is chosen and made active.
      3. A new slave is enslaved. This new slave becomes the current active
         slave and can reach the ARP target.
      As a result, the current ARP slave stays active after the enslave
      action has finished and the log is littered with "PROBE BAD" messages:
      > bond0: PROBE: c_arp ens10 && cas ens11 BAD
      The workaround is to remove the slave with "going back" status from
      the bond and re-enslave it. This issue was encountered when DPDK PMD
      interfaces were being enslaved to an active-backup bond.
      
      It would be possible to fix the issue in bond_enslave() or
      bond_change_active_slave() but the ARP monitor was fixed instead to
      keep most of the actions changing the current ARP slave in the ARP
      monitor code. The current ARP slave is set as inactive and backup
      during the commit phase. A new state, BOND_LINK_FAIL, has been
      introduced for slaves in the context of the ARP monitor. This allows
      administrators to see how slaves are rotated for sending ARP requests
      and attempts are made to find a new active slave.
      
      Fixes: b2220cad ("bonding: refactor ARP active-backup monitor")
      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 17 Aug, 2020 1 commit
    • bonding: fix a potential double-unregister · 83270702
      Cong Wang committed
      When we tear down a network namespace, we unregister all
      the netdevices within it. So we may queue a slave device
      and a bonding device together in the same unregister queue.
      
      If the only slave device is non-ethernet, it would
      automatically unregister the bonding device as well. Thus,
      we may end up unregistering the bonding device twice.
      
      Work around this special case by checking reg_state.
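
The guard can be pictured with a small user-space model. Everything here is hypothetical scaffolding (`RegState`, `Dev`, `unregister`, `release_last_slave_and_destroy`). It only illustrates the idea of skipping the second unregister when the device is already being unregistered.

```python
# Illustrative model of the reg_state guard; not kernel code.
from enum import Enum, auto


class RegState(Enum):
    REGISTERED = auto()
    UNREGISTERING = auto()


class Dev:
    def __init__(self, name: str) -> None:
        self.name = name
        self.reg_state = RegState.REGISTERED
        self.unregister_calls = 0


def unregister(dev: Dev) -> None:
    dev.unregister_calls += 1
    dev.reg_state = RegState.UNREGISTERING


def release_last_slave_and_destroy(bond: Dev) -> None:
    """Losing the only non-Ethernet slave also tears down the bond."""
    if bond.reg_state != RegState.UNREGISTERING:  # the guard
        unregister(bond)


bond0 = Dev("bond0")
unregister(bond0)                      # namespace teardown queued the bond
release_last_slave_and_destroy(bond0)  # slave removal must not repeat it
print(bond0.unregister_calls)
```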
      
      Fixes: 9b5e383c ("net: Introduce unregister_netdevice_many()")
      Reported-by: syzbot+af23e7f3e0a7e10c8b67@syzkaller.appspotmail.com
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 15 Aug, 2020 2 commits
    • net: bonding: bond_main: Document 'proto' and rename 'new_active' parameters · 45a1553b
      Lee Jones committed
      Fixes the following W=1 kernel build warning(s):
      
       drivers/net/bonding/bond_main.c:329: warning: Function parameter or member 'proto' not described in 'bond_vlan_rx_add_vid'
       drivers/net/bonding/bond_main.c:362: warning: Function parameter or member 'proto' not described in 'bond_vlan_rx_kill_vid'
       drivers/net/bonding/bond_main.c:964: warning: Function parameter or member 'new_active' not described in 'bond_change_active_slave'
       drivers/net/bonding/bond_main.c:964: warning: Excess function parameter 'new' description in 'bond_change_active_slave'
      
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Thomas Davis <tadavis@lbl.gov>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: show saner speed for broadcast mode · 4ca0d9ac
      Jarod Wilson committed
      Broadcast mode bonds transmit a copy of all traffic simultaneously out of
      all interfaces, so the "speed" of the bond isn't really the aggregate of
      all interfaces, but rather, the speed of the slowest active interface.
      
      Also, the type of the speed field is u32, not unsigned long, so adjust
      that accordingly, as required to make min() function here without
      complaining about mismatching types.
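
As a rough user-space illustration of the reporting rule described above: broadcast mode reports the slowest active interface, and other modes report the aggregate. `bond_speed` is a hypothetical helper, speeds are in Mb/s, and returning None when no speed is known is an assumption of this sketch.

```python
# Illustrative model of bond speed reporting; not kernel code.
from typing import Iterable, Optional


def bond_speed(slave_speeds: Iterable[int], broadcast: bool) -> Optional[int]:
    """Speed in Mb/s: slowest slave for broadcast mode, otherwise the sum."""
    known = [s for s in slave_speeds if s > 0]  # ignore unknown speeds
    if not known:
        return None
    return min(known) if broadcast else sum(known)


print(bond_speed([1000, 10000], broadcast=True))
print(bond_speed([1000, 10000], broadcast=False))
```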
      
      Fixes: bb5b052f ("bond: add support to read speed and duplex via ethtool")
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 20 Jul, 2020 1 commit
    • bonding: check error value of register_netdevice() immediately · 544f287b
      Taehee Yoo committed
      If register_netdevice() fails, the net_device must not be used,
      because its fields are uninitialized or already freed, so the routine
      should stop immediately.
      However, bond_create() doesn't check the return value of
      register_netdevice() immediately. That results in a panic caused by
      using uninitialized or freed memory.
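
The pattern can be sketched in user space as follows. The functions reuse kernel names for readability, but they are hypothetical Python stand-ins, and the `inject_error` parameter is an invention of this sketch that plays the role of the error-injection module used in the test commands.

```python
# Illustrative model of "check the error before touching the device";
# not kernel code.
class NetDevice:
    def __init__(self, name: str) -> None:
        self.name = name
        self.freed = False
        self.carrier = True


def register_netdevice(dev: NetDevice, inject_error: int = 0) -> int:
    """Stand-in: returns 0 on success or a negative errno on failure."""
    return inject_error


def free_netdev(dev: NetDevice) -> None:
    dev.freed = True


def netif_carrier_off(dev: NetDevice) -> None:
    if dev.freed:
        raise RuntimeError("device used after free")
    dev.carrier = False


def bond_create(name: str, inject_error: int = 0) -> int:
    dev = NetDevice(name)
    res = register_netdevice(dev, inject_error)
    if res < 0:
        # Stop right away: nothing below may touch the device.
        free_netdev(dev)
        return res
    netif_carrier_off(dev)
    return 0


print(bond_create("bond0"))
print(bond_create("bond1", inject_error=-22))
```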
      
      Test commands:
          modprobe netdev-notifier-error-inject
          echo -22 > /sys/kernel/debug/notifier-error-inject/netdev/\
      actions/NETDEV_REGISTER/error
          modprobe bonding max_bonds=3
      
      Splat looks like:
      [  375.028492][  T193] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
      [  375.033207][  T193] CPU: 2 PID: 193 Comm: kworker/2:2 Not tainted 5.8.0-rc4+ #645
      [  375.036068][  T193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [  375.039673][  T193] Workqueue: events linkwatch_event
      [  375.041557][  T193] RIP: 0010:dev_activate+0x4a/0x340
      [  375.043381][  T193] Code: 40 a8 04 0f 85 db 00 00 00 8b 83 08 04 00 00 85 c0 0f 84 0d 01 00 00 31 d2 89 d0 48 8d 04 40 48 c1 e0 07 48 03 83 00 04 00 00 <48> 8b 48 10 f6 41 10 01 75 08 f0 80 a1 a0 01 00 00 fd 48 89 48 08
      [  375.050267][  T193] RSP: 0018:ffff9f8facfcfdd8 EFLAGS: 00010202
      [  375.052410][  T193] RAX: 6b6b6b6b6b6b6b6b RBX: ffff9f8fae6ea000 RCX: 0000000000000006
      [  375.055178][  T193] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9f8fae6ea000
      [  375.057762][  T193] RBP: ffff9f8fae6ea000 R08: 0000000000000000 R09: 0000000000000000
      [  375.059810][  T193] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9f8facfcfe08
      [  375.061892][  T193] R13: ffffffff883587e0 R14: 0000000000000000 R15: ffff9f8fae6ea580
      [  375.063931][  T193] FS:  0000000000000000(0000) GS:ffff9f8fbae00000(0000) knlGS:0000000000000000
      [  375.066239][  T193] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  375.067841][  T193] CR2: 00007f2f542167a0 CR3: 000000012cee6002 CR4: 00000000003606e0
      [  375.069657][  T193] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  375.071471][  T193] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  375.073269][  T193] Call Trace:
      [  375.074005][  T193]  linkwatch_do_dev+0x4d/0x50
      [  375.075052][  T193]  __linkwatch_run_queue+0x10b/0x200
      [  375.076244][  T193]  linkwatch_event+0x21/0x30
      [  375.077274][  T193]  process_one_work+0x252/0x600
      [  375.078379][  T193]  ? process_one_work+0x600/0x600
      [  375.079518][  T193]  worker_thread+0x3c/0x380
      [  375.080534][  T193]  ? process_one_work+0x600/0x600
      [  375.081668][  T193]  kthread+0x139/0x150
      [  375.082567][  T193]  ? kthread_park+0x90/0x90
      [  375.083567][  T193]  ret_from_fork+0x22/0x30
      
      Fixes: e826eafa ("bonding: Call netif_carrier_off after register_netdevice")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      544f287b
  25. 09 Jul, 2020 2 commits
    • J
      bonding: don't need RTNL for ipsec helpers · f548a476
      Jarod Wilson committed
      The bond_ipsec_* helpers don't need RTNL, and can potentially get called
      without it being held, so switch from rtnl_dereference() to
      rcu_dereference() to access bond struct data.
      
      Lightly tested with xfrm bonding, no problems found, should address the
      syzkaller bug referenced below.
      
      Reported-by: syzbot+582c98032903dcc04816@syzkaller.appspotmail.com
      CC: Huy Nguyen <huyn@mellanox.com>
      CC: Saeed Mahameed <saeedm@mellanox.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jakub Kicinski <kuba@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: netdev@vger.kernel.org
      CC: intel-wired-lan@lists.osuosl.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f548a476
    • J
      bonding: deal with xfrm state in all modes and add more error-checking · 5cd24cbe
      Jarod Wilson committed
      It's possible that device removal happens when the bond is in non-AB mode,
      and addition happens in AB mode, so bond_ipsec_del_sa() never gets called,
      which leaves security associations in an odd state if bond_ipsec_add_sa()
      then gets called after switching the bond into AB. Just call add and
      delete universally for all modes to keep things consistent.
      
      However, it's also possible that this code gets called when the system is
      shutting down, and the xfrm subsystem has already been disconnected from
      the bond device, so we need to do some error-checking and bail, lest we
      hit a null ptr deref.
      
      Fixes: a3b658cf ("bonding: allow xfrm offload setup post-module-load")
      CC: Huy Nguyen <huyn@mellanox.com>
      CC: Saeed Mahameed <saeedm@mellanox.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jakub Kicinski <kuba@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: netdev@vger.kernel.org
      CC: intel-wired-lan@lists.osuosl.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5cd24cbe
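The bail-out checks described in 5cd24cbe can be illustrated with a small user-space sketch. Every name below (the `sketch_*` types and functions) is hypothetical and only models the idea: before forwarding a delete to the underlying device, check each pointer in the chain and return early if any link is missing, as might happen during shutdown. It is not the kernel code.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct sketch_state;

struct sketch_xfrm_ops {
    void (*del_sa)(struct sketch_state *xs);
};

struct sketch_dev {
    const struct sketch_xfrm_ops *ops; /* NULL once xfrm is disconnected */
};

struct sketch_slave {
    struct sketch_dev *dev;
};

struct sketch_state {
    struct sketch_slave *active_slave; /* NULL if no slave is active */
    int deleted;                       /* set by the device callback */
};

/* Returns 0 if the delete was forwarded, -1 if we bailed out early. */
static int sketch_bond_del_sa(struct sketch_state *xs)
{
    if (!xs || !xs->active_slave)
        return -1;

    struct sketch_dev *dev = xs->active_slave->dev;

    /* Any missing link means there is nothing safe to call. */
    if (!dev || !dev->ops || !dev->ops->del_sa)
        return -1;

    dev->ops->del_sa(xs);
    return 0;
}

/* Example device callback used to observe that the call went through. */
static void sketch_dev_del_sa(struct sketch_state *xs)
{
    xs->deleted = 1;
}
```

With a fully wired chain, `sketch_bond_del_sa()` reaches the callback; clearing `ops` (modelling a disconnected xfrm layer) makes it return -1 without dereferencing anything.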
  26. 02 Jul, 2020 1 commit
    • J
      bonding: allow xfrm offload setup post-module-load · a3b658cf
      Jarod Wilson committed
      At the moment, bonding xfrm crypto offload can only be set up if the bonding
      module is loaded with active-backup mode already set. We need to be able to
      make this work with bonds set to AB after the bonding driver has already
      been loaded.
      
      So what's done here is:
      
      1) move #define BOND_XFRM_FEATURES to net/bonding.h so it can be used
      by both bond_main.c and bond_options.c
      2) set BOND_XFRM_FEATURES in bond_dev->hw_features universally, rather than
      only when loading in AB mode
      3) wire up xfrmdev_ops universally too
      4) disable BOND_XFRM_FEATURES in bond_dev->features if not AB
      5) exit early (non-AB case) from bond_ipsec_offload_ok, to prevent a
      performance hit from traversing into the underlying drivers
      6) toggle BOND_XFRM_FEATURES in bond_dev->wanted_features and call
      netdev_change_features() from bond_option_mode_set()
      
      In my local testing, I can change bonding modes back and forth on the fly,
      have hardware offload work when I'm in AB, and see no performance penalty
      to non-AB software encryption, despite having xfrm bits all wired up for
      all modes now.
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Reported-by: Huy Nguyen <huyn@mellanox.com>
      CC: Saeed Mahameed <saeedm@mellanox.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jakub Kicinski <kuba@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: netdev@vger.kernel.org
      CC: intel-wired-lan@lists.osuosl.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3b658cf
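The feature handling listed in a3b658cf can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the driver code: the `SKETCH_*` flags, the struct, and all function names are hypothetical, the three bits making up `SKETCH_XFRM_FEATURES` are placeholders, and `sketch_change_features()` only imitates the effect of `netdev_change_features()` by intersecting what the hardware offers with what is wanted.

```c
#include <stdint.h>

/* Hypothetical feature bits; placeholders for the real netdev flags. */
#define SKETCH_F_HW_ESP         (1u << 0)
#define SKETCH_F_HW_ESP_TX_CSUM (1u << 1)
#define SKETCH_F_GSO_ESP        (1u << 2)
#define SKETCH_F_OTHER          (1u << 3) /* some unrelated feature */
#define SKETCH_XFRM_FEATURES \
    (SKETCH_F_HW_ESP | SKETCH_F_HW_ESP_TX_CSUM | SKETCH_F_GSO_ESP)

enum sketch_mode { SKETCH_MODE_ROUNDROBIN, SKETCH_MODE_ACTIVEBACKUP };

struct sketch_bond_dev {
    enum sketch_mode mode;
    uint32_t hw_features;     /* what the device could offer */
    uint32_t wanted_features; /* what is currently requested */
    uint32_t features;        /* what is currently enabled */
};

/* Stand-in for netdev_change_features(): recompute the enabled set. */
static void sketch_change_features(struct sketch_bond_dev *dev)
{
    dev->features = dev->hw_features & dev->wanted_features;
}

/* Setup: advertise the xfrm bits in hw_features for every mode,
 * but only request them when the bond starts in active-backup. */
static void sketch_setup(struct sketch_bond_dev *dev, enum sketch_mode mode)
{
    dev->mode = mode;
    dev->hw_features = SKETCH_F_OTHER | SKETCH_XFRM_FEATURES;
    dev->wanted_features = SKETCH_F_OTHER;
    if (mode == SKETCH_MODE_ACTIVEBACKUP)
        dev->wanted_features |= SKETCH_XFRM_FEATURES;
    sketch_change_features(dev);
}

/* Mode change: toggle the xfrm bits in wanted_features, then recompute. */
static void sketch_mode_set(struct sketch_bond_dev *dev, enum sketch_mode mode)
{
    dev->mode = mode;
    if (mode == SKETCH_MODE_ACTIVEBACKUP)
        dev->wanted_features |= SKETCH_XFRM_FEATURES;
    else
        dev->wanted_features &= ~SKETCH_XFRM_FEATURES;
    sketch_change_features(dev);
}

/* Early exit for non-AB modes, so no lower-level driver is consulted. */
static int sketch_offload_ok(const struct sketch_bond_dev *dev)
{
    if (dev->mode != SKETCH_MODE_ACTIVEBACKUP)
        return 0;
    return (dev->features & SKETCH_XFRM_FEATURES) == SKETCH_XFRM_FEATURES;
}
```

In this model a bond set up in round-robin mode still advertises the xfrm bits in `hw_features`, so a later `sketch_mode_set(&dev, SKETCH_MODE_ACTIVEBACKUP)` can enable them without reloading anything, and switching back clears them again.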
  27. 27 Jun, 2020 1 commit
  28. 24 Jun, 2020 1 commit
    • J
      bonding/xfrm: use real_dev instead of slave_dev · bdfd2d1f
      Jarod Wilson committed
      Rather than requiring every hw crypto capable NIC driver to do a check for
      slave_dev being set, set real_dev in the xfrm layer and xso init time, and
      then override it in the bonding driver as needed. Then NIC drivers can
      always use real_dev, and at the same time, we eliminate the use of a
      variable name that probably shouldn't have been used in the first place,
      particularly given recent current events.
      
      CC: Boris Pismenny <borisp@mellanox.com>
      CC: Saeed Mahameed <saeedm@mellanox.com>
      CC: Leon Romanovsky <leon@kernel.org>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jakub Kicinski <kuba@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: netdev@vger.kernel.org
      Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bdfd2d1f
  29. 23 Jun, 2020 1 commit
    • J
      bonding: support hardware encryption offload to slaves · 18cb261a
      Jarod Wilson committed
      Currently, this support is limited to active-backup mode, as I'm not sure
      about the feasibility of mapping an xfrm_state's offload handle to
      multiple hardware devices simultaneously, and we rely on being able to
      pass some hints to both the xfrm and NIC driver about whether or not
      they're operating on a slave device.
      
      I've tested this atop an Intel x520 device (ixgbe) using libreswan in
      transport mode, successfully achieving ~4.3Gbps throughput with netperf
      (more or less identical to throughput on a bare NIC in this system),
      as well as successful failover and recovery mid-netperf.
      
      v2: just use CONFIG_XFRM_OFFLOAD for wrapping, isolate more code with it
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jakub Kicinski <kuba@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: netdev@vger.kernel.org
      CC: intel-wired-lan@lists.osuosl.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      18cb261a
  30. 10 Jun, 2020 1 commit
    • C
      net: change addr_list_lock back to static key · 845e0ebb
      Cong Wang committed
      The dynamic key update for addr_list_lock still causes troubles,
      for example the following race condition still exists:
      
      CPU 0:				CPU 1:
      (RCU read lock)			(RTNL lock)
      dev_mc_seq_show()		netdev_update_lockdep_key()
      				  -> lockdep_unregister_key()
       -> netif_addr_lock_bh()
      
      because lockdep doesn't provide an API to update it atomically.
      Therefore, we have to move it back to static keys and use subclass
      for nest locking like before.
      
      In commit 1a33e10e ("net: partially revert dynamic lockdep key
      changes"), I already reverted most parts of commit ab92d68f
      ("net: core: add generic lockdep keys").
      
      This patch reverts the rest and also part of commit f3b0a18b
      ("net: remove unnecessary variables and callback"). After this
      patch, addr_list_lock changes back to using static keys and
      subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
      not have to change back to ->ndo_get_lock_subclass().
      
      And hopefully this reduces some syzbot lockdep noises too.
      
      Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      845e0ebb
  31. 08 May, 2020 1 commit
    • E
      bonding: propagate transmit status · ae46f184
      Eric Dumazet committed
      Currently, bonding always returns NETDEV_TX_OK to its caller.
      
      It is worth trying to be more accurate: TCP, for instance,
      can use different recovery strategies if it gets a more
      precise status when a packet was dropped by the slave qdisc.
      
      This is especially important when host is under stress.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae46f184
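The difference described in ae46f184 can be shown with a small user-space model. This is a sketch only: the `sketch_*` names and the two status codes are hypothetical stand-ins, and the "slave" is reduced to a queue with a fixed number of free slots. It contrasts a bond transmit path that always reports success with one that hands the slave's status back to the caller.

```c
/* Hypothetical status codes modelling a queue's transmit result. */
enum sketch_xmit_status {
    SKETCH_XMIT_SUCCESS = 0,
    SKETCH_XMIT_DROP = 1
};

/* A slave reduced to a queue with a limited number of free slots. */
struct sketch_slave {
    int queue_space;
};

/* The slave accepts a packet if it has room, otherwise drops it. */
static int sketch_slave_xmit(struct sketch_slave *slave)
{
    if (slave->queue_space <= 0)
        return SKETCH_XMIT_DROP;
    slave->queue_space--;
    return SKETCH_XMIT_SUCCESS;
}

/* Old behaviour: the slave's status is discarded, the caller always
 * sees success and cannot tell that the packet was dropped. */
static int sketch_bond_xmit_old(struct sketch_slave *slave)
{
    (void)sketch_slave_xmit(slave);
    return SKETCH_XMIT_SUCCESS;
}

/* New behaviour: the slave's status is returned to the caller, which
 * can then react to the drop (for example by backing off). */
static int sketch_bond_xmit_new(struct sketch_slave *slave)
{
    return sketch_slave_xmit(slave);
}
```

With a slave that has one free slot, the second call to `sketch_bond_xmit_old()` still reports success even though the packet was dropped, while the second call to `sketch_bond_xmit_new()` reports the drop.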
  32. 05 May, 2020 2 commits