1. 13 Dec, 2022 1 commit
  2. 10 Nov, 2022 4 commits
  3. 27 Sep, 2022 1 commit
  4. 26 Jul, 2022 1 commit
  5. 21 Jun, 2022 1 commit
    • net/ns: put workqueue of cleanup_net sleep for a while when notify. · 758c9c45
      Zhengchao Shao committed
      hulk inclusion
      category: bugfix
      bugzilla: 186807 https://gitee.com/openeuler/kernel/issues/I5ATLD
      CVE: NA
      
      --------------------------------
      
      When we clean up a namespace, we have to notify every netdevice that
      the dev is down. If too many devices are registered, the notification
      takes too much CPU time and causes a CPU soft-lockup. The reproduction
      procedure is as follows:
      NIFS=50
      for ((i=0; i<$NIFS; i++))
      do
              ip netns add dummy-ns$i
              ip netns exec dummy-ns$i ip link set lo up
      done
      
      for ((j=0; j<$NIFS; j++))
      do
              for ((i=0; i<1000; i++))
              do
                      if=eth$j$i
                      ip netns exec dummy-ns$j ip link add $if type dummy
                      ip netns exec dummy-ns$j ip link set $if up
              done
      done
      
      for ((i=0; i<$NIFS; i++))
      do
              ip netns del dummy-ns$i
      done
      The test results in the following stack, so the cleanup work must
      sleep for a while when notifying device down/change.
      
      watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u8:5:288]
      Modules linked in:
      CPU: 0 PID: 288 Comm: kworker/u8:5 Tainted: G    B             5.10.0+ #5
      Hardware name: linux,dummy-virt (DT)
      Workqueue: netns cleanup_net
      pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
      pc : atomic_set include/asm-generic/atomic-instrumented.h:46 [inline]
      pc : __alloc_skb+0x268/0x450 net/core/skbuff.c:241
      lr : atomic_set include/asm-generic/atomic-instrumented.h:46 [inline]
      lr : __alloc_skb+0x268/0x450 net/core/skbuff.c:241
      sp : ffff000015607610
      x29: ffff000015607610 x28: 00000000ffffffff
      x27: 0000000000000001 x26: ffff0000cc9400e0
      x25: ffff0000c745c1be x24: 1fffe00002ac0ed0
      x23: 0000000000000000 x22: ffff0000cc9400c0
      x21: ffff0000c745c234 x20: ffff0000cc940000
      x19: ffff0000c745c140 x18: 0000000000000000
      x17: 0000000000000000 x16: 0000000000000000
      x15: 0000000000000000 x14: 1fffe00002ac0f00
      x13: 0000000000000000 x12: ffff80001992801d
      x11: 1fffe0001992801c x10: ffff80001992801c
      x9 : dfffa00000000000 x8 : ffff0000cc9400e3
      x7 : 0000000000000001 x6 : ffff80001992801c
      x5 : ffff0000cc9400e0 x4 : dfffa00000000000
      x3 : ffffa00011529a78 x2 : 0000000000000003
      x1 : 0000000000000000 x0 : ffff0000cc9400e0
      Call trace:
       atomic_set include/asm-generic/atomic-instrumented.h:46 [inline]
       __alloc_skb+0x268/0x450 net/core/skbuff.c:241
       alloc_skb include/linux/skbuff.h:1107 [inline]
       nlmsg_new include/net/netlink.h:958 [inline]
       rtmsg_ifa+0xf4/0x1e0 net/ipv4/devinet.c:1900
       __inet_del_ifa+0x328/0x650 net/ipv4/devinet.c:427
       inet_del_ifa net/ipv4/devinet.c:465 [inline]
       inetdev_destroy net/ipv4/devinet.c:318 [inline]
       inetdev_event+0x2ac/0xac0 net/ipv4/devinet.c:1599
       notifier_call_chain kernel/notifier.c:83 [inline]
       raw_notifier_call_chain+0x94/0xd0 kernel/notifier.c:410
       call_netdevice_notifiers_info+0x9c/0x14c net/core/dev.c:2047
       call_netdevice_notifiers_extack net/core/dev.c:2059 [inline]
       call_netdevice_notifiers net/core/dev.c:2073 [inline]
       rollback_registered_many+0x3d0/0x7dc net/core/dev.c:9558
       unregister_netdevice_many+0x40/0x1b0 net/core/dev.c:10779
       default_device_exit_batch+0x24c/0x2a0 net/core/dev.c:11262
       ops_exit_list+0xb4/0xd0 net/core/net_namespace.c:192
       cleanup_net+0x2b8/0x540 net/core/net_namespace.c:608
       process_one_work+0x3ec/0xa40 kernel/workqueue.c:2279
       worker_thread+0x110/0x8b0 kernel/workqueue.c:2425
       kthread+0x1ac/0x1fc kernel/kthread.c:313
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1034
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
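The idea described in the commit above (letting the cleanup work pause periodically while it notifies a large number of devices) can be modelled in user space. The Python sketch below is only an illustration and not the kernel change: the diff is not shown on this page, so `notify_all`, `batch` and `pause` are hypothetical names, and the batch size of 64 is an arbitrary assumption.

```python
import time


def notify_all(devices, notify, batch=64, pause=0.0, sleep=time.sleep):
    """Call notify(dev) for every device, pausing after each full batch.

    `devices` must be a list. Returns the number of pauses taken.
    All names and the batch size are illustrative assumptions.
    """
    pauses = 0
    for index, dev in enumerate(devices, start=1):
        notify(dev)
        # Yield after every `batch` notifications so that one long run
        # cannot monopolise the CPU; skip the pause after the last device.
        if index % batch == 0 and index < len(devices):
            sleep(pause)
            pauses += 1
    return pauses


seen = []
pauses = notify_all([f"eth{i}" for i in range(1000)], seen.append, batch=64)
```

With 1000 devices and a batch of 64, the loop pauses 15 times instead of running all notifications back to back.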
  6. 31 May, 2022 1 commit
  7. 27 Apr, 2022 1 commit
  8. 08 Mar, 2022 3 commits
  9. 14 Jan, 2022 1 commit
    • net: annotate data-races on txq->xmit_lock_owner · d7764824
      Eric Dumazet committed
      stable inclusion
      from stable-v5.10.84
      commit fa973bf5fd0fda6f0bf9a5d3d403078824dc27ac
      bugzilla: 186030 https://gitee.com/openeuler/kernel/issues/I4QV2F
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fa973bf5fd0fda6f0bf9a5d3d403078824dc27ac
      
      --------------------------------
      
      commit 7a10d8c8 upstream.
      
      syzbot found that __dev_queue_xmit() is reading txq->xmit_lock_owner
      without annotations.
      
      No serious issue there, let's document what is happening there.
      
      BUG: KCSAN: data-race in __dev_queue_xmit / __dev_queue_xmit
      
      write to 0xffff888139d09484 of 4 bytes by interrupt on cpu 0:
       __netif_tx_unlock include/linux/netdevice.h:4437 [inline]
       __dev_queue_xmit+0x948/0xf70 net/core/dev.c:4229
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_hh_output include/net/neighbour.h:511 [inline]
       neigh_output include/net/neighbour.h:525 [inline]
       ip6_finish_output2+0x995/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x3e/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      read to 0xffff888139d09484 of 4 bytes by interrupt on cpu 1:
       __dev_queue_xmit+0x5e3/0xf70 net/core/dev.c:4213
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_resolve_output+0x3db/0x410 net/core/neighbour.c:1523
       neigh_output include/net/neighbour.h:527 [inline]
       ip6_finish_output2+0x9be/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x8d/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
       kcsan_setup_watchpoint+0x94/0x420 kernel/kcsan/core.c:443
       folio_test_anon include/linux/page-flags.h:581 [inline]
       PageAnon include/linux/page-flags.h:586 [inline]
       zap_pte_range+0x5ac/0x10e0 mm/memory.c:1347
       zap_pmd_range mm/memory.c:1467 [inline]
       zap_pud_range mm/memory.c:1496 [inline]
       zap_p4d_range mm/memory.c:1517 [inline]
       unmap_page_range+0x2dc/0x3d0 mm/memory.c:1538
       unmap_single_vma+0x157/0x210 mm/memory.c:1583
       unmap_vmas+0xd0/0x180 mm/memory.c:1615
       exit_mmap+0x23d/0x470 mm/mmap.c:3170
       __mmput+0x27/0x1b0 kernel/fork.c:1113
       mmput+0x3d/0x50 kernel/fork.c:1134
       exit_mm+0xdb/0x170 kernel/exit.c:507
       do_exit+0x608/0x17a0 kernel/exit.c:819
       do_group_exit+0xce/0x180 kernel/exit.c:929
       get_signal+0xfc3/0x1550 kernel/signal.c:2852
       arch_do_signal_or_restart+0x8c/0x2e0 arch/x86/kernel/signal.c:868
       handle_signal_work kernel/entry/common.c:148 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
       exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0xffffffff
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 28712 Comm: syz-executor.0 Tainted: G        W         5.16.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211130170155.2331929-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  10. 06 Dec, 2021 2 commits
  11. 15 Nov, 2021 2 commits
  12. 15 Oct, 2021 4 commits
  13. 12 Oct, 2021 1 commit
  14. 17 Jul, 2021 1 commit
  15. 14 Jul, 2021 1 commit
  16. 15 Jun, 2021 2 commits
    • net: sched: fix tx action reschedule issue with stopped queue · fc1bd7af
      Yunsheng Lin committed
      stable inclusion
      from stable-5.10.42
      commit f9fc21e2b11eb861a903aec8009dc03d9202933a
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit dcad9ee9 ]
      
      The netdev queue might be stopped when the byte queue limit has been
      reached or the tx hw ring is full. net_tx_action() may still be
      rescheduled if STATE_MISSED is set, which consumes CPU unnecessarily
      without dequeuing and transmitting any skb because the
      netdev queue is stopped, see qdisc_run_end().
      
      This patch fixes it by checking the netdev queue state before
      calling qdisc_run() and clearing STATE_MISSED if the netdev queue is
      stopped during qdisc_run(). net_tx_action() is rescheduled
      again when the netdev queue is restarted, see netif_tx_wake_queue().
      
      As there is a time window between the netif_xmit_frozen_or_stopped()
      check and the clearing of STATE_MISSED, in which STATE_MISSED
      may be set by net_tx_action() scheduled by netif_tx_wake_queue(),
      set STATE_MISSED again if the netdev queue is restarted.
      
      Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: sched: fix tx action rescheduling issue during deactivation · 006771a8
      Yunsheng Lin committed
      stable inclusion
      from stable-5.10.42
      commit 2f23d5bcd9f89c239da83abd6270f5f0d9dd95bc
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 102b55ee ]
      
      Currently qdisc_run() checks the STATE_DEACTIVATED of a lockless
      qdisc before calling __qdisc_run(), which ultimately clears
      STATE_MISSED when all the skbs are dequeued. If STATE_DEACTIVATED
      is set before STATE_MISSED is cleared, net_tx_action() may be
      rescheduled at the end of qdisc_run_end(), see below:
      
      CPU0(net_tx_action)  CPU1(__dev_xmit_skb)  CPU2(dev_deactivate)
                .                   .                     .
                .            set STATE_MISSED             .
                .           __netif_schedule()            .
                .                   .           set STATE_DEACTIVATED
                .                   .                qdisc_reset()
                .                   .                     .
                .<---------------   .              synchronize_net()
      clear __QDISC_STATE_SCHED  |  .                     .
                .                |  .                     .
                .                |  .            some_qdisc_is_busy()
                .                |  .               return *false*
                .                |  .                     .
        test STATE_DEACTIVATED   |  .                     .
      __qdisc_run() *not* called |  .                     .
                .                |  .                     .
         test STATE_MISSED       |  .                     .
       __netif_schedule()--------|  .                     .
                .                   .                     .
                .                   .                     .
      
      __qdisc_run() is not called by net_tx_action() on CPU0 because
      CPU2 has set the STATE_DEACTIVATED flag during dev_deactivate(), and
      STATE_MISSED is only cleared in __qdisc_run(). __netif_schedule()
      is therefore called at the end of qdisc_run_end(), causing the tx
      action rescheduling problem.
      
      qdisc_run() called by net_tx_action() runs in softirq context,
      which should have the same semantics as the qdisc_run() called by
      __dev_xmit_skb() under rcu_read_lock_bh(). Since there is a
      synchronize_net() between the STATE_DEACTIVATED flag being set and
      qdisc_reset()/some_qdisc_is_busy() in dev_deactivate(), we can safely
      bail out for the deactivated lockless qdisc in net_tx_action(), and
      qdisc_reset() will reset all skbs not dequeued yet.
      
      So add the rcu_read_lock() explicitly to protect the qdisc_run()
      and do the STATE_DEACTIVATED checking in net_tx_action() before
      calling qdisc_run_begin(). Another option is to do the checking in
      the qdisc_run_end(), but it will add unnecessary overhead for
      non-tx_action case, because __dev_queue_xmit() will not see qdisc
      with STATE_DEACTIVATED after synchronize_net(), the qdisc with
      STATE_DEACTIVATED can only be seen by net_tx_action() because of
      __netif_schedule().
      
      The STATE_DEACTIVATED check in qdisc_run() is there to avoid a race
      between net_tx_action() and qdisc_reset(), see:
      commit d518d2ed ("net/sched: fix race between deactivation
      and dequeue for NOLOCK qdisc"). As the bailout added above for the
      deactivated lockless qdisc in net_tx_action() provides better
      protection against the race without calling qdisc_run() at all,
      remove the STATE_DEACTIVATED check in qdisc_run().
      
      After qdisc_reset(), there is no skb in qdisc to be dequeued, so
      clear the STATE_MISSED in dev_reset_queue() too.
      
      Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      V8: Clearing STATE_MISSED before calling __netif_schedule() has
          avoided the endless rescheduling problem, but there may still
          be an unnecessary rescheduling, so adjust the commit log.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
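The decisions described in the two Yunsheng Lin commits above can be traced with a small user-space state model. This is a hedged Python sketch, not kernel code: the `Qdisc` class, its boolean fields and the simplified `net_tx_action()` are hypothetical stand-ins for the STATE_MISSED / STATE_DEACTIVATED bits and the stopped-queue check, and the race window mentioned in the first commit (re-setting STATE_MISSED after a restart) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Qdisc:
    """Toy lockless qdisc; field names are illustrative, not kernel fields."""
    backlog: list = field(default_factory=list)
    sent: list = field(default_factory=list)
    missed: bool = False         # models STATE_MISSED
    deactivated: bool = False    # models STATE_DEACTIVATED
    queue_stopped: bool = False  # models a stopped netdev tx queue


def net_tx_action(q: Qdisc) -> bool:
    """Run the qdisc once; return True if it asks to be rescheduled."""
    if q.deactivated:
        # Bail out before running: qdisc_reset() drops what is left.
        return False
    # Stand-in for qdisc_run(): dequeue while the queue accepts packets.
    while q.backlog and not q.queue_stopped:
        q.sent.append(q.backlog.pop(0))
    if q.queue_stopped:
        # Nothing can be transmitted, so rescheduling would only burn CPU.
        # The wake-up path is expected to schedule the qdisc again.
        q.missed = False
        return False
    # Stand-in for qdisc_run_end(): a missed xmit wants one more pass.
    if q.missed:
        q.missed = False
        return True
    return False


def dev_reset_queue(q: Qdisc) -> None:
    """After a reset there is nothing to dequeue, so drop the MISSED mark."""
    q.backlog.clear()
    q.missed = False
```

In this model a deactivated or stopped qdisc never requests another pass, which is the behaviour the two commit messages argue for.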
  17. 03 Jun, 2021 1 commit
  18. 26 Apr, 2021 1 commit
  19. 19 Apr, 2021 2 commits
    • can: dev: Move device back to init netns on owning netns delete · b04a8995
      Martin Willi committed
      stable inclusion
      from stable-5.10.27
      commit 8dc08a2962c855f4a88923017445799474ff6446
      bugzilla: 51493
      
      --------------------------------
      
      commit 3a5ca857 upstream.
      
      When a non-initial netns is destroyed, the usual policy is to delete
      all virtual network interfaces contained, but move physical interfaces
      back to the initial netns. This keeps the physical interface visible
      on the system.
      
      CAN devices are somewhat special, as they define rtnl_link_ops even
      if they are physical devices. If a CAN interface is moved into a
      non-initial netns, destroying that netns lets the interface vanish
      instead of moving it back to the initial netns. default_device_exit()
      skips CAN interfaces due to having rtnl_link_ops set. Reproducer:
      
        ip netns add foo
        ip link set can0 netns foo
        ip netns delete foo
      
      WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
      CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
      Workqueue: netns cleanup_net
      [<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
      [<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
      [<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
      [<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
      [<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
      [<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
      [<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
      [<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
      [<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
      [<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)
      
      To properly restore physical CAN devices to the initial netns on owning
      netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
      For CAN devices setting this flag, default_device_exit() considers them
      non-virtual, applying the usual namespace move.
      
      The issue was introduced in the commit mentioned below, as at that time
      CAN devices did not have a dellink() operation.
      
      Fixes: e008b5fc ("net: Simplfy default_device_exit and improve batching.")
      Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
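The policy described in the commit above (virtual interfaces are deleted with their netns, physical ones are moved back to the initial netns, and a driver-set flag lets a device with link ops count as physical) can be written down as a small decision function. This Python sketch is a user-space model under assumptions: the commit text does not name the flag, so `netns_refund`, `LinkOps`, `NetDevice` and `on_netns_exit` are all illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkOps:
    """Stand-in for rtnl_link_ops; only the fields this model needs."""
    kind: str
    netns_refund: bool = False  # assumed flag name, set by physical-device drivers


@dataclass
class NetDevice:
    name: str
    link_ops: Optional[LinkOps] = None


def on_netns_exit(dev: NetDevice) -> str:
    """Decide what happens to `dev` when its owning netns is destroyed."""
    if dev.link_ops is not None and not dev.link_ops.netns_refund:
        # Treated as virtual: removed together with the namespace.
        return "delete"
    # No link ops, or link ops that opted in: treated as physical.
    return "move_to_init_netns"


can0 = NetDevice("can0", LinkOps("can", netns_refund=True))
dummy0 = NetDevice("dummy0", LinkOps("dummy"))
eth0 = NetDevice("eth0")
```

Without the flag, `can0` would take the "delete" branch simply because it has link ops, which is the behaviour the commit sets out to fix.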
    • net: check all name nodes in __dev_alloc_name · a11f1d2a
      Jiri Bohac committed
      stable inclusion
      from stable-5.10.27
      commit 943e1583bf8a5cbcedfc4a00d92d8aac9e7e436d
      bugzilla: 51493
      
      --------------------------------
      
      [ Upstream commit 6c015a22 ]
      
      __dev_alloc_name(), when supplied with a name containing '%d',
      will search for the first available device number to generate a
      unique device name.
      
      Since commit ff927412 ("net:
      introduce name_node struct to be used in hashlist") network
      devices may have alternate names.  __dev_alloc_name() does not take
      these alternate names into account, possibly generating a name
      that is already taken and failing with -ENFILE as a result.
      
      This demonstrates the bug:
      
          # rmmod dummy 2>/dev/null
          # ip link property add dev lo altname dummy0
          # modprobe dummy numdummies=1
          modprobe: ERROR: could not insert 'dummy': Too many open files in system
      
      Instead of creating a device named dummy1, modprobe fails.
      
      Fix this by checking all the names in the d->name_node list, not just d->name.
      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Fixes: ff927412 ("net: introduce name_node struct to be used in hashlist")
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
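The fix described in the commit above amounts to collecting the used indices from every name a device carries, not only its primary name, before picking the first free one. A minimal Python model of that search follows; it is a sketch, not the kernel routine, and `dev_alloc_name` together with the `(name, altnames)` tuple layout are assumptions made for illustration.

```python
def dev_alloc_name(pattern, devices):
    """Return the first free name for a pattern such as 'dummy%d'.

    `devices` is an iterable of (name, altnames) pairs. Both the primary
    name and every alternate name reserve their index.
    """
    prefix, suffix = pattern.split("%d", 1)
    used = set()
    for name, altnames in devices:
        for candidate in (name, *altnames):
            if not (candidate.startswith(prefix) and candidate.endswith(suffix)):
                continue
            digits = candidate[len(prefix):len(candidate) - len(suffix)]
            # Only canonical decimal indices count ("dummy0", not "dummy00").
            if digits.isdigit() and str(int(digits)) == digits:
                used.add(int(digits))
    index = 0
    while index in used:
        index += 1
    return prefix + str(index) + suffix


# The reproducer from the commit message: lo carries the altname dummy0.
name = dev_alloc_name("dummy%d", [("lo", ["dummy0"])])
```

With the altname taken into account, the model proposes `dummy1` rather than the already taken `dummy0`.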
  20. 09 Apr, 2021 1 commit
    • net: fix dev_ifsioc_locked() race condition · fb49a7bf
      Cong Wang committed
      stable inclusion
      from stable-5.10.21
      commit 1fc205d9e400f069ebf30d3faa6ec2bab2cbd7b4
      bugzilla: 50609
      
      --------------------------------
      
      commit 3b23a32a upstream.
      
      dev_ifsioc_locked() is called with only RCU read lock, so when
      there is a parallel writer changing the mac address, it could
      get a partially updated mac address, as shown below:
      
      Thread 1			Thread 2
      // eth_commit_mac_addr_change()
      memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
      				// dev_ifsioc_locked()
      				memcpy(ifr->ifr_hwaddr.sa_data,
      					dev->dev_addr,...);
      
      Close this race condition by guarding them with a RW semaphore,
      like netdev_get_name(). We can not use seqlock here as it does not
      allow blocking. The writers already take RTNL anyway, so this does
      not affect the slow path. To avoid bothering existing
      dev_set_mac_address() callers in drivers, introduce a new wrapper
      just for user-facing callers on ioctl and rtnetlink paths.
      
      Note, bonding also changes slave mac addresses but that requires
      a separate patch due to the complexity of bonding code.
      
      Fixes: 3710becf ("net: RCU locking for simple ioctl()")
      Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
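The race in the commit above is a reader copying the address while a writer is halfway through updating it. The following Python sketch models the remedy in user space under clear assumptions: Python's standard library has no reader/writer semaphore, so a plain `threading.Lock` stands in for the kernel RW semaphore, and `Device`, `set_mac` and `get_mac` are hypothetical names rather than kernel functions.

```python
import threading


class Device:
    """Toy device whose hardware address is updated byte by byte."""

    def __init__(self, addr: bytes):
        self._addr = bytearray(addr)
        # Stand-in for the RW semaphore mentioned in the commit message.
        self._lock = threading.Lock()

    def set_mac(self, addr: bytes) -> None:
        with self._lock:
            # Deliberately non-atomic copy, like memcpy() into dev_addr.
            for i, byte in enumerate(addr):
                self._addr[i] = byte

    def get_mac(self) -> bytes:
        with self._lock:
            # The reader can only observe a fully written address.
            return bytes(self._addr)


dev = Device(bytes(6))
dev.set_mac(bytes([0x02, 0x00, 0x00, 0x00, 0x00, 0x01]))
mac = dev.get_mac()
```

Because both paths take the same lock, a concurrent reader sees either the old or the new address, never a mixture of the two.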
  21. 09 Mar, 2021 2 commits
  22. 08 Feb, 2021 1 commit
  23. 09 Dec, 2020 1 commit
    • xdp: Remove the xdp_attachment_flags_ok() callback · 998f1729
      Toke Høiland-Jørgensen committed
      Since commit 7f0a8382 ("bpf, xdp: Maintain info on attached XDP BPF
      programs in net_device"), the XDP program attachment info is now maintained
      in the core code. This interacts badly with the xdp_attachment_flags_ok()
      check that prevents unloading an XDP program with different load flags than
      it was loaded with. In practice, two kinds of failures are seen:
      
      - An XDP program loaded without specifying a mode (and which then ends up
        in driver mode) cannot be unloaded if the program mode is specified on
        unload.
      
      - The dev_xdp_uninstall() hook always calls the driver callback with the
        mode set to the type of the program but an empty flags argument, which
        means the flags_ok() check prevents the program from being removed,
        leading to bpf prog reference leaks.
      
      The original reason this check was added was to avoid ambiguity when
      multiple programs were loaded. With the way the checks are done in the core
      now, this is quite simple to enforce in the core code, so let's add a check
      there and get rid of the xdp_attachment_flags_ok() callback entirely.
      
      Fixes: 7f0a8382 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
  24. 25 Nov, 2020 1 commit
  25. 25 Oct, 2020 1 commit
    • random32: add noise from network and scheduling activity · 3744741a
      Willy Tarreau committed
      With the removal of the interrupt perturbations in previous random32
      change (random32: make prandom_u32() output unpredictable), the PRNG
      has become 100% deterministic again. While SipHash is expected to be
      way more robust against brute force than the previous Tausworthe LFSR,
      there's still the risk that whoever has even one temporary access to
      the PRNG's internal state is able to predict all subsequent draws till
      the next reseed (roughly every minute). This may happen through a side
      channel attack or any data leak.
      
      This patch restores the spirit of commit f227e3ec ("random32: update
      the net random state on interrupt and activity") in that it will perturb
      the internal PRNG's state using externally collected noise, except that
      it will not pick that noise from the random pool's bits nor upon
      interrupt, but will rather combine a few elements along the Tx path
      that are collectively hard to predict, such as dev, skb and txq
      pointers, packet length and jiffies values. These ones are combined
      using a single round of SipHash into a single long variable that is
      mixed with the net_rand_state upon each invocation.
      
      The operation was inlined because it produces very small and efficient
      code, typically 3 xor, 2 add and 2 rol. The performance was measured
      to be the same (even very slightly better) than before the switch to
      SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
      (i40e), the connection rate dropped from 556k/s to 555k/s while the
      SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.
      
      Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
      Cc: George Spelvin <lkml@sdf.org>
      Cc: Amit Klein <aksecurity@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: tytso@mit.edu
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Marc Plumb <lkml.mplumb@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
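The commit above describes folding a few hard-to-predict Tx-path values through a single SipHash round into one long noise word. The Python sketch below shows what one standard 64-bit SipRound looks like and one possible way to feed it; it is not the kernel helper. `collect_noise`, its parameter names and the choice of which word to keep are assumptions, and the pointer values are replaced by plain integers.

```python
MASK = (1 << 64) - 1  # emulate 64-bit unsigned arithmetic


def rol64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits (0 < r < 64)."""
    return ((x << r) | (x >> (64 - r))) & MASK


def sipround(v0: int, v1: int, v2: int, v3: int):
    """One SipRound as defined by the SipHash reference."""
    v0 = (v0 + v1) & MASK; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32)
    v2 = (v2 + v3) & MASK; v3 = rol64(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK; v3 = rol64(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32)
    return v0, v1, v2, v3


def collect_noise(prev_noise, dev_id, skb_id, txq_id, pkt_len, jiffies):
    """Fold Tx-path values and the previous noise into one 64-bit word.

    Hypothetical helper: the argument mix is an assumption for illustration.
    """
    a = (dev_id ^ jiffies ^ prev_noise) & MASK
    b = skb_id & MASK
    c = txq_id & MASK
    d = pkt_len & MASK
    _, _, _, d = sipround(a, b, c, d)
    return d


noise = collect_noise(0, 0x1000, 0x2000, 0x3000, 1500, 42)
```

Feeding the returned word back in as `prev_noise` on the next call makes each sample depend on all earlier ones.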
  26. 19 Oct, 2020 1 commit
    • net: core: use list_del_init() instead of list_del() in netdev_run_todo() · 0e8b8d6a
      Taehee Yoo committed
      dev->unlink_list is reused unless dev is deleted.
      So, list_del() should not be used.
      Due to using list_del(), dev->unlink_list can't be reused so that
      dev->nested_level update logic doesn't work.
      In order to fix this bug, list_del_init() should be used instead
      of list_del().
      
      Test commands:
          ip link add bond0 type bond
          ip link add bond1 type bond
          ip link set bond0 master bond1
          ip link set bond0 nomaster
          ip link set bond1 master bond0
          ip link set bond1 nomaster
      
      Splat looks like:
      [  255.750458][ T1030] ============================================
      [  255.751967][ T1030] WARNING: possible recursive locking detected
      [  255.753435][ T1030] 5.9.0-rc8+ #772 Not tainted
      [  255.754553][ T1030] --------------------------------------------
      [  255.756047][ T1030] ip/1030 is trying to acquire lock:
      [  255.757304][ T1030] ffff88811782a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync_multiple+0xc2/0x150
      [  255.760056][ T1030]
      [  255.760056][ T1030] but task is already holding lock:
      [  255.761862][ T1030] ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding]
      [  255.764581][ T1030]
      [  255.764581][ T1030] other info that might help us debug this:
      [  255.766645][ T1030]  Possible unsafe locking scenario:
      [  255.766645][ T1030]
      [  255.768566][ T1030]        CPU0
      [  255.769415][ T1030]        ----
      [  255.770259][ T1030]   lock(&dev_addr_list_lock_key/1);
      [  255.771629][ T1030]   lock(&dev_addr_list_lock_key/1);
      [  255.772994][ T1030]
      [  255.772994][ T1030]  *** DEADLOCK ***
      [  255.772994][ T1030]
      [  255.775091][ T1030]  May be due to missing lock nesting notation
      [  255.775091][ T1030]
      [  255.777182][ T1030] 2 locks held by ip/1030:
      [  255.778299][ T1030]  #0: ffffffffb1f63250 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x2e4/0x8b0
      [  255.780600][ T1030]  #1: ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding]
      [  255.783411][ T1030]
      [  255.783411][ T1030] stack backtrace:
      [  255.784874][ T1030] CPU: 7 PID: 1030 Comm: ip Not tainted 5.9.0-rc8+ #772
      [  255.786595][ T1030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [  255.789030][ T1030] Call Trace:
      [  255.789850][ T1030]  dump_stack+0x99/0xd0
      [  255.790882][ T1030]  __lock_acquire.cold.71+0x166/0x3cc
      [  255.792285][ T1030]  ? register_lock_class+0x1a30/0x1a30
      [  255.793619][ T1030]  ? rcu_read_lock_sched_held+0x91/0xc0
      [  255.794963][ T1030]  ? rcu_read_lock_bh_held+0xa0/0xa0
      [  255.796246][ T1030]  lock_acquire+0x1b8/0x850
      [  255.797332][ T1030]  ? dev_mc_sync_multiple+0xc2/0x150
      [  255.798624][ T1030]  ? bond_enslave+0x3d4d/0x43e0 [bonding]
      [  255.800039][ T1030]  ? check_flags+0x50/0x50
      [  255.801143][ T1030]  ? lock_contended+0xd80/0xd80
      [  255.802341][ T1030]  _raw_spin_lock_nested+0x2e/0x70
      [  255.803592][ T1030]  ? dev_mc_sync_multiple+0xc2/0x150
      [  255.804897][ T1030]  dev_mc_sync_multiple+0xc2/0x150
      [  255.806168][ T1030]  bond_enslave+0x3d58/0x43e0 [bonding]
      [  255.807542][ T1030]  ? __lock_acquire+0xe53/0x51b0
      [  255.808824][ T1030]  ? bond_update_slave_arr+0xdc0/0xdc0 [bonding]
      [  255.810451][ T1030]  ? check_chain_key+0x236/0x5e0
      [  255.811742][ T1030]  ? mutex_is_locked+0x13/0x50
      [  255.812910][ T1030]  ? rtnl_is_locked+0x11/0x20
      [  255.814061][ T1030]  ? netdev_master_upper_dev_get+0xf/0x120
      [  255.815553][ T1030]  do_setlink+0x94c/0x3040
      [ ... ]
      
      Reported-by: syzbot+4a0f7bc34e3997a6c7df@syzkaller.appspotmail.com
      Fixes: 1fc70edb ("net: core: add nested_level variable in net_device")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Link: https://lore.kernel.org/r/20201015162606.9377-1-ap420073@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
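Why list_del() prevents reuse while list_del_init() allows it can be seen in a small model of a circular doubly linked list. This Python sketch mimics the kernel list helpers in user space; `Node` and `queue_for_update` are hypothetical, and the assumption that the entry is only re-queued when it looks empty is an illustration of the reuse problem, not a quote of the kernel code.

```python
class Node:
    """Circular doubly linked list node; a fresh node points at itself."""

    def __init__(self):
        self.next = self
        self.prev = self


def list_empty(head: Node) -> bool:
    return head.next is head


def list_add_tail(new: Node, head: Node) -> None:
    prev = head.prev
    new.prev, new.next = prev, head
    prev.next = new
    head.prev = new


def _unlink(entry: Node) -> None:
    entry.prev.next = entry.next
    entry.next.prev = entry.prev


def list_del(entry: Node) -> None:
    _unlink(entry)
    # The kernel poisons the pointers; None plays that role here.
    entry.next = entry.prev = None


def list_del_init(entry: Node) -> None:
    _unlink(entry)
    # Re-initialise so the entry looks empty and can be queued again.
    entry.next = entry.prev = entry


def queue_for_update(entry: Node, todo: Node) -> bool:
    """Queue `entry` on `todo` only if it is not already linked (assumed guard)."""
    if list_empty(entry):
        list_add_tail(entry, todo)
        return True
    return False


todo = Node()
reusable, stuck = Node(), Node()
queue_for_update(reusable, todo); list_del_init(reusable)
queue_for_update(stuck, todo); list_del(stuck)
```

After this sequence `reusable` can be queued again, while `stuck` no longer looks empty and is silently skipped.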
  27. 14 Oct, 2020 1 commit