1. 20 7月, 2022 1 次提交
  2. 09 4月, 2022 1 次提交
    • M
      net/sched: fix initialization order when updating chain 0 head · e65812fd
      Marcelo Ricardo Leitner 提交于
      Currently, when inserting a new filter that needs to sit at the head
      of chain 0, it will first update the heads pointer on all devices using
      the (shared) block, and only then complete the initialization of the new
      element so that it has a "next" element.
      
      This can lead to a situation that the chain 0 head is propagated to
      another CPU before the "next" initialization is done. When this race
      condition is triggered, packets being matched on that CPU will simply
      miss all other filters, and will flow through the stack as if there were
      no other filters installed. If the system is using OVS + TC, such
      packets will get handled by vswitchd via upcall, which results in much
      higher latency and reordering. For other applications it may result in
      packet drops.
      
      This is reproducible with a tc only setup, but it varies from system to
      system. It could be reproduced with a shared block amongst 10 veth
      tunnels, and an ingress filter mirroring packets to another veth.
      That's because using the last added veth tunnel to the shared block to
      do the actual traffic, it makes the race window bigger and easier to
      trigger.
      
      The fix is rather simple, to just initialize the next pointer of the new
      filter instance (tp) before propagating the head change.
      
      The fixes tag is pointing to the original code though this issue should
      only be observed when using it unlocked.
      
      Fixes: 2190d1d0 ("net: sched: introduce helpers to work with filter chains")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: NDavide Caratti <dcaratti@redhat.com>
      Link: https://lore.kernel.org/r/b97d5f4eaffeeb9d058155bcab63347527261abf.1649341369.git.marcelo.leitner@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e65812fd
  3. 08 4月, 2022 2 次提交
  4. 14 2月, 2022 1 次提交
    • E
      net_sched: add __rcu annotation to netdev->qdisc · 5891cd5e
      Eric Dumazet 提交于
      syzbot found a data-race [1] which lead me to add __rcu
      annotations to netdev->qdisc, and proper accessors
      to get LOCKDEP support.
      
      [1]
      BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu
      
      write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
       attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
       dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
       __dev_open+0x2e9/0x3a0 net/core/dev.c:1416
       __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
       rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
       __rtnl_newlink net/core/rtnetlink.c:3489 [inline]
       rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
       qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
       __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
       tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
       rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 470502de ("net: sched: unlock rules update API")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5891cd5e
  5. 05 2月, 2022 1 次提交
  6. 02 2月, 2022 1 次提交
    • E
      net: sched: fix use-after-free in tc_new_tfilter() · 04c2a47f
      Eric Dumazet 提交于
      Whenever tc_new_tfilter() jumps back to replay: label,
      we need to make sure @q and @chain local variables are cleared again,
      or risk use-after-free as in [1]
      
      For consistency, apply the same fix in tc_ctl_chain()
      
      BUG: KASAN: use-after-free in mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581
      Write of size 8 at addr ffff8880985c4b08 by task syz-executor.4/1945
      
      CPU: 0 PID: 1945 Comm: syz-executor.4 Not tainted 5.17.0-rc1-syzkaller-00495-gff58831f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
       __kasan_report mm/kasan/report.c:442 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
       mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581
       tcf_chain_head_change_item net/sched/cls_api.c:372 [inline]
       tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:386
       tcf_chain_tp_insert net/sched/cls_api.c:1657 [inline]
       tcf_chain_tp_insert_unique net/sched/cls_api.c:1707 [inline]
       tc_new_tfilter+0x1e67/0x2350 net/sched/cls_api.c:2086
       rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x331/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmmsg+0x195/0x470 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f2647172059
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f2645aa5168 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00007f2647285100 RCX: 00007f2647172059
      RDX: 040000000000009f RSI: 00000000200002c0 RDI: 0000000000000006
      RBP: 00007f26471cc08d R08: 0000000000000000 R09: 0000000000000000
      R10: 9e00000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fffb3f7f02f R14: 00007f2645aa5300 R15: 0000000000022000
       </TASK>
      
      Allocated by task 1944:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:45 [inline]
       set_alloc_info mm/kasan/common.c:436 [inline]
       ____kasan_kmalloc mm/kasan/common.c:515 [inline]
       ____kasan_kmalloc mm/kasan/common.c:474 [inline]
       __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:524
       kmalloc_node include/linux/slab.h:604 [inline]
       kzalloc_node include/linux/slab.h:726 [inline]
       qdisc_alloc+0xac/0xa10 net/sched/sch_generic.c:941
       qdisc_create.constprop.0+0xce/0x10f0 net/sched/sch_api.c:1211
       tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5592
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x331/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmmsg+0x195/0x470 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 3609:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:45
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free+0x130/0x160 mm/kasan/common.c:328
       kasan_slab_free include/linux/kasan.h:236 [inline]
       slab_free_hook mm/slub.c:1728 [inline]
       slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1754
       slab_free mm/slub.c:3509 [inline]
       kfree+0xcb/0x280 mm/slub.c:4562
       rcu_do_batch kernel/rcu/tree.c:2527 [inline]
       rcu_core+0x7b8/0x1540 kernel/rcu/tree.c:2778
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
       __call_rcu kernel/rcu/tree.c:3026 [inline]
       call_rcu+0xb1/0x740 kernel/rcu/tree.c:3106
       qdisc_put_unlocked+0x6f/0x90 net/sched/sch_generic.c:1109
       tcf_block_release+0x86/0x90 net/sched/cls_api.c:1238
       tc_new_tfilter+0xc0d/0x2350 net/sched/cls_api.c:2148
       rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x331/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmmsg+0x195/0x470 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff8880985c4800
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 776 bytes inside of
       1024-byte region [ffff8880985c4800, ffff8880985c4c00)
      The buggy address belongs to the page:
      page:ffffea0002617000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x985c0
      head:ffffea0002617000 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888010c41dc0
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 1941, ts 1038999441284, free_ts 1033444432829
       prep_new_page mm/page_alloc.c:2434 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389
       alloc_pages+0x1aa/0x310 mm/mempolicy.c:2271
       alloc_slab_page mm/slub.c:1799 [inline]
       allocate_slab mm/slub.c:1944 [inline]
       new_slab+0x28a/0x3b0 mm/slub.c:2004
       ___slab_alloc+0x87c/0xe90 mm/slub.c:3018
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3105
       slab_alloc_node mm/slub.c:3196 [inline]
       slab_alloc mm/slub.c:3238 [inline]
       __kmalloc+0x2fb/0x340 mm/slub.c:4420
       kmalloc include/linux/slab.h:586 [inline]
       kzalloc include/linux/slab.h:715 [inline]
       __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1335
       neigh_sysctl_register+0x2c8/0x5e0 net/core/neighbour.c:3787
       devinet_sysctl_register+0xb1/0x230 net/ipv4/devinet.c:2618
       inetdev_init+0x286/0x580 net/ipv4/devinet.c:278
       inetdev_event+0xa8a/0x15d0 net/ipv4/devinet.c:1532
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:84
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1919
       call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
       call_netdevice_notifiers net/core/dev.c:1945 [inline]
       register_netdevice+0x1073/0x1500 net/core/dev.c:9698
       veth_newlink+0x59c/0xa90 drivers/net/veth.c:1722
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1352 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404
       free_unref_page_prepare mm/page_alloc.c:3325 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3404
       release_pages+0x748/0x1220 mm/swap.c:956
       tlb_batch_pages_flush mm/mmu_gather.c:50 [inline]
       tlb_flush_mmu_free mm/mmu_gather.c:243 [inline]
       tlb_flush_mmu+0xe9/0x6b0 mm/mmu_gather.c:250
       zap_pte_range mm/memory.c:1441 [inline]
       zap_pmd_range mm/memory.c:1490 [inline]
       zap_pud_range mm/memory.c:1519 [inline]
       zap_p4d_range mm/memory.c:1540 [inline]
       unmap_page_range+0x1d1d/0x2a30 mm/memory.c:1561
       unmap_single_vma+0x198/0x310 mm/memory.c:1606
       unmap_vmas+0x16b/0x2f0 mm/memory.c:1638
       exit_mmap+0x201/0x670 mm/mmap.c:3178
       __mmput+0x122/0x4b0 kernel/fork.c:1114
       mmput+0x56/0x60 kernel/fork.c:1135
       exit_mm kernel/exit.c:507 [inline]
       do_exit+0xa3c/0x2a30 kernel/exit.c:793
       do_group_exit+0xd2/0x2f0 kernel/exit.c:935
       __do_sys_exit_group kernel/exit.c:946 [inline]
       __se_sys_exit_group kernel/exit.c:944 [inline]
       __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Memory state around the buggy address:
       ffff8880985c4a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880985c4a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8880985c4b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
       ffff8880985c4b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880985c4c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: 470502de ("net: sched: unlock rules update API")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220131172018.3704490-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      04c2a47f
  7. 10 1月, 2022 1 次提交
  8. 21 12月, 2021 1 次提交
  9. 19 12月, 2021 5 次提交
  10. 18 12月, 2021 2 次提交
  11. 14 12月, 2021 1 次提交
  12. 19 8月, 2021 1 次提交
  13. 11 8月, 2021 1 次提交
    • M
      net/sched: cls_api, reset flags on replay · a5397d68
      Mark Bloch 提交于
      tc_new_tfilter() can replay a request if it got EAGAIN. The cited commit
      didn't account for this when it converted TC action ->init() API
      to use flags instead of parameters. This can lead to passing stale flags
      down the call chain which results in trying to lock rtnl when it's
      already locked, deadlocking the entire system.
      
      Fix by making sure to reset flags on each replay.
      
      ============================================
      WARNING: possible recursive locking detected
      5.14.0-rc3-custom-49011-g3d2bbb4f104d #447 Not tainted
      --------------------------------------------
      tc/37605 is trying to acquire lock:
      ffffffff841df2f0 (rtnl_mutex){+.+.}-{3:3}, at: tc_setup_cb_add+0x14b/0x4d0
      
      but task is already holding lock:
      ffffffff841df2f0 (rtnl_mutex){+.+.}-{3:3}, at: tc_new_tfilter+0xb12/0x22e0
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
             CPU0
             ----
        lock(rtnl_mutex);
        lock(rtnl_mutex);
      
       *** DEADLOCK ***
       May be due to missing lock nesting notation
      1 lock held by tc/37605:
       #0: ffffffff841df2f0 (rtnl_mutex){+.+.}-{3:3}, at: tc_new_tfilter+0xb12/0x22e0
      
      stack backtrace:
      CPU: 0 PID: 37605 Comm: tc Not tainted 5.14.0-rc3-custom-49011-g3d2bbb4f104d #447
      Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017
      Call Trace:
       dump_stack_lvl+0x8b/0xb3
       __lock_acquire.cold+0x175/0x3cb
       lock_acquire+0x1a4/0x4f0
       __mutex_lock+0x136/0x10d0
       fl_hw_replace_filter+0x458/0x630 [cls_flower]
       fl_change+0x25f2/0x4a64 [cls_flower]
       tc_new_tfilter+0xa65/0x22e0
       rtnetlink_rcv_msg+0x86c/0xc60
       netlink_rcv_skb+0x14d/0x430
       netlink_unicast+0x539/0x7e0
       netlink_sendmsg+0x84d/0xd80
       ____sys_sendmsg+0x7ff/0x970
       ___sys_sendmsg+0xf8/0x170
       __sys_sendmsg+0xea/0x1b0
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f7b93b6c0a7
      Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48>
      RSP: 002b:00007ffe365b3818 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b93b6c0a7
      RDX: 0000000000000000 RSI: 00007ffe365b3880 RDI: 0000000000000003
      RBP: 00000000610a75f6 R08: 0000000000000001 R09: 0000000000000000
      R10: fffffffffffff3a9 R11: 0000000000000246 R12: 0000000000000001
      R13: 0000000000000000 R14: 00007ffe365b7b58 R15: 00000000004822c0
      
      Fixes: 695176bf ("net_sched: refactor TC action init API")
      Signed-off-by: NMark Bloch <mbloch@nvidia.com>
      Reviewed-by: NVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Tested-by: NIdo Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20210810034305.63997-1-mbloch@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      a5397d68
  14. 02 8月, 2021 1 次提交
    • C
      net_sched: refactor TC action init API · 695176bf
      Cong Wang 提交于
      TC action ->init() API has 10 parameters, it becomes harder
      to read. Some of them are just boolean and can be replaced
      by flags. Similarly for the internal API tcf_action_init()
      and tcf_exts_validate().
      
      This patch converts them to flags and fold them into
      the upper 16 bits of "flags", whose lower 16 bits are still
      reserved for user-space. More specifically, the following
      kernel flags are introduced:
      
      TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to
      distinguish whether it is compatible with policer.
      
      TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
      this action is bound to a filter.
      
      TCA_ACT_FLAGS_REPLACE  replaces 'ovr' in most contexts,
      means we are replacing an existing action.
      
      TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
      opposite meaning, because we still hold RTNL in most
      cases.
      
      The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
      untouched and still stored as before.
      
      I have tested this patch with tdc and I do not see any
      failure related to this patch.
      Tested-by: NVlad Buslov <vladbu@nvidia.com>
      Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <cong.wang@bytedance.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      695176bf
  15. 30 7月, 2021 1 次提交
  16. 22 7月, 2021 1 次提交
  17. 17 7月, 2021 1 次提交
  18. 26 5月, 2021 1 次提交
    • V
      net: zero-initialize tc skb extension on allocation · 9453d45e
      Vlad Buslov 提交于
      Function skb_ext_add() doesn't initialize created skb extension with any
      value and leaves it up to the user. However, since extension of type
      TC_SKB_EXT originally contained only single value tc_skb_ext->chain its
      users used to just assign the chain value without setting whole extension
      memory to zero first. This assumption changed when TC_SKB_EXT extension was
      extended with additional fields but not all users were updated to
      initialize the new fields which leads to use of uninitialized memory
      afterwards. UBSAN log:
      
      [  778.299821] UBSAN: invalid-load in net/openvswitch/flow.c:899:28
      [  778.301495] load of value 107 is not a valid value for type '_Bool'
      [  778.303215] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7+ #2
      [  778.304933] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  778.307901] Call Trace:
      [  778.308680]  <IRQ>
      [  778.309358]  dump_stack+0xbb/0x107
      [  778.310307]  ubsan_epilogue+0x5/0x40
      [  778.311167]  __ubsan_handle_load_invalid_value.cold+0x43/0x48
      [  778.312454]  ? memset+0x20/0x40
      [  778.313230]  ovs_flow_key_extract.cold+0xf/0x14 [openvswitch]
      [  778.314532]  ovs_vport_receive+0x19e/0x2e0 [openvswitch]
      [  778.315749]  ? ovs_vport_find_upcall_portid+0x330/0x330 [openvswitch]
      [  778.317188]  ? create_prof_cpu_mask+0x20/0x20
      [  778.318220]  ? arch_stack_walk+0x82/0xf0
      [  778.319153]  ? secondary_startup_64_no_verify+0xb0/0xbb
      [  778.320399]  ? stack_trace_save+0x91/0xc0
      [  778.321362]  ? stack_trace_consume_entry+0x160/0x160
      [  778.322517]  ? lock_release+0x52e/0x760
      [  778.323444]  netdev_frame_hook+0x323/0x610 [openvswitch]
      [  778.324668]  ? ovs_netdev_get_vport+0xe0/0xe0 [openvswitch]
      [  778.325950]  __netif_receive_skb_core+0x771/0x2db0
      [  778.327067]  ? lock_downgrade+0x6e0/0x6f0
      [  778.328021]  ? lock_acquire+0x565/0x720
      [  778.328940]  ? generic_xdp_tx+0x4f0/0x4f0
      [  778.329902]  ? inet_gro_receive+0x2a7/0x10a0
      [  778.330914]  ? lock_downgrade+0x6f0/0x6f0
      [  778.331867]  ? udp4_gro_receive+0x4c4/0x13e0
      [  778.332876]  ? lock_release+0x52e/0x760
      [  778.333808]  ? dev_gro_receive+0xcc8/0x2380
      [  778.334810]  ? lock_downgrade+0x6f0/0x6f0
      [  778.335769]  __netif_receive_skb_list_core+0x295/0x820
      [  778.336955]  ? process_backlog+0x780/0x780
      [  778.337941]  ? mlx5e_rep_tc_netdevice_event_unregister+0x20/0x20 [mlx5_core]
      [  778.339613]  ? seqcount_lockdep_reader_access.constprop.0+0xa7/0xc0
      [  778.341033]  ? kvm_clock_get_cycles+0x14/0x20
      [  778.342072]  netif_receive_skb_list_internal+0x5f5/0xcb0
      [  778.343288]  ? __kasan_kmalloc+0x7a/0x90
      [  778.344234]  ? mlx5e_handle_rx_cqe_mpwrq+0x9e0/0x9e0 [mlx5_core]
      [  778.345676]  ? mlx5e_xmit_xdp_frame_mpwqe+0x14d0/0x14d0 [mlx5_core]
      [  778.347140]  ? __netif_receive_skb_list_core+0x820/0x820
      [  778.348351]  ? mlx5e_post_rx_mpwqes+0xa6/0x25d0 [mlx5_core]
      [  778.349688]  ? napi_gro_flush+0x26c/0x3c0
      [  778.350641]  napi_complete_done+0x188/0x6b0
      [  778.351627]  mlx5e_napi_poll+0x373/0x1b80 [mlx5_core]
      [  778.352853]  __napi_poll+0x9f/0x510
      [  778.353704]  ? mlx5_flow_namespace_set_mode+0x260/0x260 [mlx5_core]
      [  778.355158]  net_rx_action+0x34c/0xa40
      [  778.356060]  ? napi_threaded_poll+0x3d0/0x3d0
      [  778.357083]  ? sched_clock_cpu+0x18/0x190
      [  778.358041]  ? __common_interrupt+0x8e/0x1a0
      [  778.359045]  __do_softirq+0x1ce/0x984
      [  778.359938]  __irq_exit_rcu+0x137/0x1d0
      [  778.360865]  irq_exit_rcu+0xa/0x20
      [  778.361708]  common_interrupt+0x80/0xa0
      [  778.362640]  </IRQ>
      [  778.363212]  asm_common_interrupt+0x1e/0x40
      [  778.364204] RIP: 0010:native_safe_halt+0xe/0x10
      [  778.365273] Code: 4f ff ff ff 4c 89 e7 e8 50 3f 40 fe e9 dc fe ff ff 48 89 df e8 43 3f 40 fe eb 90 cc e9 07 00 00 00 0f 00 2d 74 05 62 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 64 05 62 00 f4 c3 cc cc 0f 1f 44 00
      [  778.369355] RSP: 0018:ffffffff84407e48 EFLAGS: 00000246
      [  778.370570] RAX: ffff88842de46a80 RBX: ffffffff84425840 RCX: ffffffff83418468
      [  778.372143] RDX: 000000000026f1da RSI: 0000000000000004 RDI: ffffffff8343af5e
      [  778.373722] RBP: fffffbfff0884b08 R08: 0000000000000000 R09: ffff88842de46bcb
      [  778.375292] R10: ffffed1085bc8d79 R11: 0000000000000001 R12: 0000000000000000
      [  778.376860] R13: ffffffff851124a0 R14: 0000000000000000 R15: dffffc0000000000
      [  778.378491]  ? rcu_eqs_enter.constprop.0+0xb8/0xe0
      [  778.379606]  ? default_idle_call+0x5e/0xe0
      [  778.380578]  default_idle+0xa/0x10
      [  778.381406]  default_idle_call+0x96/0xe0
      [  778.382350]  do_idle+0x3d4/0x550
      [  778.383153]  ? arch_cpu_idle_exit+0x40/0x40
      [  778.384143]  cpu_startup_entry+0x19/0x20
      [  778.385078]  start_kernel+0x3c7/0x3e5
      [  778.385978]  secondary_startup_64_no_verify+0xb0/0xbb
      
      Fix the issue by providing new function tc_skb_ext_alloc() that allocates
      tc skb extension and initializes its memory to 0 before returning it to the
      caller. Change all existing users to use new API instead of calling
      skb_ext_add() directly.
      
      Fixes: 038ebb1a ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
      Fixes: d29334c1 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct")
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Acked-by: NCong Wang <cong.wang@bytedance.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9453d45e
  19. 20 5月, 2021 1 次提交
  20. 09 4月, 2021 2 次提交
    • V
      net: sched: fix err handler in tcf_action_init() · b3650bf7
      Vlad Buslov 提交于
      With recent changes that separated action module load from action
      initialization tcf_action_init() function error handling code was modified
      to manually release the loaded modules if loading/initialization of any
      further action in same batch failed. For the case when all modules
      successfully loaded and some of the actions were initialized before one of
      them failed in init handler. In this case for all previous actions the
      module will be released twice by the error handler: First time by the loop
      that manually calls module_put() for all ops, and second time by the action
      destroy code that puts the module after destroying the action.
      
      Reproduction:
      
      $ sudo tc actions add action simple sdata \"2\" index 2
      $ sudo tc actions add action simple sdata \"1\" index 1 \
                            action simple sdata \"2\" index 2
      RTNETLINK answers: File exists
      We have an error talking to the kernel
      $ sudo tc actions ls action simple
      total acts 1
      
              action order 0: Simple <"2">
               index 2 ref 1 bind 0
      $ sudo tc actions flush action simple
      $ sudo tc actions ls action simple
      $ sudo tc actions add action simple sdata \"2\" index 2
      Error: Failed to load TC action module.
      We have an error talking to the kernel
      $ lsmod | grep simple
      act_simple             20480  -1
      
      Fix the issue by modifying module reference counting handling in action
      initialization code:
      
      - Get module reference in tcf_idr_create() and put it in tcf_idr_release()
      instead of taking over the reference held by the caller.
      
      - Modify users of tcf_action_init_1() to always release the module
      reference which they obtain before calling init function instead of
      assuming that created action takes over the reference.
      
      - Finally, modify tcf_action_init_1() to not release the module reference
      when overwriting existing action as this is no longer necessary since both
      upper and lower layers obtain and manage their own module references
      independently.
      
      Fixes: d349f997 ("net_sched: fix RTNL deadlock again caused by request_module()")
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3650bf7
    • V
      net: sched: fix action overwrite reference counting · 87c750e8
      Vlad Buslov 提交于
      Action init code increments reference counter when it changes an action.
      This is the desired behavior for cls API which needs to obtain action
      reference for every classifier that points to action. However, act API just
      needs to change the action and releases the reference before returning.
      This sequence breaks when the requested action doesn't exist, which causes
      act API init code to create new action with specified index, but action is
      still released before returning and is deleted (unless it was referenced
      concurrently by cls API).
      
      Reproduction:
      
      $ sudo tc actions ls action gact
      $ sudo tc actions change action gact drop index 1
      $ sudo tc actions ls action gact
      
      Extend tcf_action_init() to accept 'init_res' array and initialize it with
      action->ops->init() result. In tcf_action_add() remove pointers to created
      actions from actions array before passing it to tcf_action_put_many().
      
      Fixes: cae422f3 ("net: sched: use reference counting action init")
      Reported-by: NKumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87c750e8
  21. 03 4月, 2021 1 次提交
  22. 17 3月, 2021 1 次提交
  23. 14 3月, 2021 1 次提交
  24. 17 2月, 2021 1 次提交
    • V
      net: sched: fix police ext initialization · 396d7f23
      Vlad Buslov 提交于
      When police action is created by cls API tcf_exts_validate() first
      conditional that calls tcf_action_init_1() directly, the action idr is not
      updated according to latest changes in action API that require caller to
      commit newly created action to idr with tcf_idr_insert_many(). This results
      such action not being accessible through act API and causes crash reported
      by syzbot:
      
      ==================================================================
      BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline]
      BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
      BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline]
      BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
      Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204
      
      CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       __kasan_report mm/kasan/report.c:400 [inline]
       kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
       check_memory_region_inline mm/kasan/generic.c:179 [inline]
       check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
       instrument_atomic_read include/linux/instrumented.h:71 [inline]
       atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
       __tcf_idr_release net/sched/act_api.c:178 [inline]
       tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
       tc_action_net_exit include/net/act_api.h:151 [inline]
       police_exit_net+0x168/0x360 net/sched/act_police.c:390
       ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
       cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
       process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
       kthread+0x3b1/0x4a0 kernel/kthread.c:292
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      ==================================================================
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G    B             5.11.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       panic+0x306/0x73d kernel/panic.c:231
       end_report+0x58/0x5e mm/kasan/report.c:100
       __kasan_report mm/kasan/report.c:403 [inline]
       kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413
       check_memory_region_inline mm/kasan/generic.c:179 [inline]
       check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
       instrument_atomic_read include/linux/instrumented.h:71 [inline]
       atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
       __tcf_idr_release net/sched/act_api.c:178 [inline]
       tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
       tc_action_net_exit include/net/act_api.h:151 [inline]
       police_exit_net+0x168/0x360 net/sched/act_police.c:390
       ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
       cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
       process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
       kthread+0x3b1/0x4a0 kernel/kthread.c:292
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      Kernel Offset: disabled
      
      Fix the issue by calling tcf_idr_insert_many() after successful action
      initialization.
      
      Fixes: 0fedc63f ("net_sched: commit action insertions together")
      Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      396d7f23
  25. 19 1月, 2021 1 次提交
    • C
      net_sched: fix RTNL deadlock again caused by request_module() · d349f997
      Cong Wang 提交于
      tcf_action_init_1() loads tc action modules automatically with
      request_module() after parsing the tc action names, and it drops RTNL
      lock and re-holds it before and after request_module(). This causes a
      lot of troubles, as discovered by syzbot, because we can be in the
      middle of batch initializations when we create an array of tc actions.
      
      One of the problem is deadlock:
      
      CPU 0					CPU 1
      rtnl_lock();
      for (...) {
        tcf_action_init_1();
          -> rtnl_unlock();
          -> request_module();
      				rtnl_lock();
      				for (...) {
      				  tcf_action_init_1();
      				    -> tcf_idr_check_alloc();
      				   // Insert one action into idr,
      				   // but it is not committed until
      				   // tcf_idr_insert_many(), then drop
      				   // the RTNL lock in the _next_
      				   // iteration
      				   -> rtnl_unlock();
          -> rtnl_lock();
          -> a_o->init();
            -> tcf_idr_check_alloc();
            // Now waiting for the same index
            // to be committed
      				    -> request_module();
      				    -> rtnl_lock()
      				    // Now waiting for RTNL lock
      				}
      				rtnl_unlock();
      }
      rtnl_unlock();
      
      This is not easy to solve, we can move the request_module() before
      this loop and pre-load all the modules we need for this netlink
      message and then do the rest initializations. So the loop breaks down
      to two now:
      
              for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                      struct tc_action_ops *a_o;
      
                      a_o = tc_action_load_ops(name, tb[i]...);
                      ops[i - 1] = a_o;
              }
      
              for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                      act = tcf_action_init_1(ops[i - 1]...);
              }
      
      Although this looks serious, it only has been reported by syzbot, so it
      seems hard to trigger this by humans. And given the size of this patch,
      I'd suggest to make it to net-next and not to backport to stable.
      
      This patch has been tested by syzbot and tested with tdc.py by me.
      
      Fixes: 0fedc63f ("net_sched: commit action insertions together")
      Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com
      Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <cong.wang@bytedance.com>
      Tested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      d349f997
  26. 02 12月, 2020 1 次提交
  27. 17 11月, 2020 2 次提交
  28. 31 10月, 2020 1 次提交
  29. 28 10月, 2020 1 次提交
    • L
      net: protect tcf_block_unbind with block lock · d6535dca
      Leon Romanovsky 提交于
      The tcf_block_unbind() expects that the caller will take block->cb_lock
      before calling it, however the code took RTNL lock and dropped cb_lock
      instead. This causes to the following kernel panic.
      
       WARNING: CPU: 1 PID: 13524 at net/sched/cls_api.c:1488 tcf_block_unbind+0x2db/0x420
       Modules linked in: mlx5_ib mlx5_core mlxfw ptp pps_core act_mirred act_tunnel_key cls_flower vxlan ip6_udp_tunnel udp_tunnel dummy sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm ib_uverbs ib_core overlay [last unloaded: mlxfw]
       CPU: 1 PID: 13524 Comm: test-ecmp-add-v Tainted: G        W         5.9.0+ #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:tcf_block_unbind+0x2db/0x420
       Code: ff 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8d bc 24 30 01 00 00 be ff ff ff ff e8 7d 7f 70 00 85 c0 0f 85 7b fd ff ff <0f> 0b e9 74 fd ff ff 48 c7 c7 dc 6a 24 84 e8 02 ec fe fe e9 55 fd
       RSP: 0018:ffff888117d17968 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff88812f713c00 RCX: 1ffffffff0848d5b
       RDX: 0000000000000001 RSI: ffff88814fbc8130 RDI: ffff888107f2b878
       RBP: 1ffff11022fa2f3f R08: 0000000000000000 R09: ffffffff84115a87
       R10: fffffbfff0822b50 R11: ffff888107f2b898 R12: ffff88814fbc8000
       R13: ffff88812f713c10 R14: ffff888117d17a38 R15: ffff88814fbc80c0
       FS:  00007f6593d36740(0000) GS:ffff8882a4f00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00005607a00758f8 CR3: 0000000131aea006 CR4: 0000000000170ea0
       Call Trace:
        tc_block_indr_cleanup+0x3e0/0x5a0
        ? tcf_block_unbind+0x420/0x420
        ? __mutex_unlock_slowpath+0xe7/0x610
        flow_indr_dev_unregister+0x5e2/0x930
        ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core]
        ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core]
        ? flow_indr_block_cb_alloc+0x3c0/0x3c0
        ? mlx5_db_free+0x37c/0x4b0 [mlx5_core]
        mlx5e_cleanup_rep_tx+0x8b/0xc0 [mlx5_core]
        mlx5e_detach_netdev+0xe5/0x120 [mlx5_core]
        mlx5e_vport_rep_unload+0x155/0x260 [mlx5_core]
        esw_offloads_disable+0x227/0x2b0 [mlx5_core]
        mlx5_eswitch_disable_locked.cold+0x38e/0x699 [mlx5_core]
        mlx5_eswitch_disable+0x94/0xf0 [mlx5_core]
        mlx5_device_disable_sriov+0x183/0x1f0 [mlx5_core]
        mlx5_core_sriov_configure+0xfd/0x230 [mlx5_core]
        sriov_numvfs_store+0x261/0x2f0
        ? sriov_drivers_autoprobe_store+0x110/0x110
        ? sysfs_file_ops+0x170/0x170
        ? sysfs_file_ops+0x117/0x170
        ? sysfs_file_ops+0x170/0x170
        kernfs_fop_write+0x1ff/0x3f0
        ? rcu_read_lock_any_held+0x6e/0x90
        vfs_write+0x1f3/0x620
        ksys_write+0xf9/0x1d0
        ? __x64_sys_read+0xb0/0xb0
        ? lockdep_hardirqs_on_prepare+0x273/0x3f0
        ? syscall_enter_from_user_mode+0x1d/0x50
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      <...>
      
       ---[ end trace bfdd028ada702879 ]---
      
      Fixes: 0fdcf78d ("net: use flow_indr_dev_setup_offload()")
      Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20201026123327.1141066-1-leon@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      d6535dca
  30. 21 10月, 2020 1 次提交
  31. 04 8月, 2020 1 次提交
    • W
      net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct · 038ebb1a
      wenxu 提交于
      When openvswitch conntrack offload with act_ct action. Fragment packets
      defrag in the ingress tc act_ct action and miss the next chain. Then the
      packet pass to the openvswitch datapath without the mru. The over
      mtu packet will be dropped in output action in openvswitch for over mtu.
      
      "kernel: net2: dropped over-mtu packet: 1528 > 1500"
      
      This patch add mru in the tc_skb_ext for adefrag and miss next chain
      situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru
      to the qdisc_skb_cb when the packet defrag. And When the chain miss,
      The mru is set to tc_skb_ext which can be got by ovs datapath.
      
      Fixes: b57dc7c1 ("net/sched: Introduce action ct")
      Signed-off-by: Nwenxu <wenxu@ucloud.cn>
      Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      038ebb1a
  32. 25 7月, 2020 1 次提交