1. 26 Dec, 2021 1 commit
    • X
      sctp: use call_rcu to free endpoint · 5ec7d18d
      Xin Long committed
      This patch is to delay the endpoint free by calling call_rcu() to fix
      another use-after-free issue in sctp_sock_dump():
      
        BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
        Call Trace:
          __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
          lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
          __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
          _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
          spin_lock_bh include/linux/spinlock.h:334 [inline]
          __lock_sock+0x203/0x350 net/core/sock.c:2253
          lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
          lock_sock include/net/sock.h:1492 [inline]
          sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
          sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091
          sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527
          __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049
          inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065
          netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
          __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
          netlink_dump_start include/linux/netlink.h:216 [inline]
          inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170
          __sock_diag_cmd net/core/sock_diag.c:232 [inline]
          sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
          netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
          sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
      
      This issue occurs when asoc is peeled off and the old sk is freed after
      getting it by asoc->base.sk and before calling lock_sock(sk).
      
      To prevent the sk free, as a holder of the sk, ep should be alive when
      calling lock_sock(). This patch uses call_rcu() and moves sock_put and
      ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to
      hold the ep under rcu_read_lock in sctp_transport_traverse_process().
      
      If sctp_endpoint_hold() returns true, it means this ep is still alive
      and we have held it and can continue to dump it; if it returns false,
      it means this ep is dead and can be freed after rcu_read_unlock, and
      we should skip it.
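
      The hold-or-skip rule above can be sketched in userspace C. This is a
      minimal sketch, assuming the hold is an "increment unless already zero"
      operation; struct endpoint, endpoint_hold() and endpoint_put() are
      hypothetical stand-ins, not the kernel's SCTP code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an endpoint; not the kernel's struct sctp_endpoint. */
struct endpoint {
    atomic_int refcnt;   /* 0 means the endpoint is already dead */
};

/* Take a reference only if the object is still alive.
 * Returns true when the count was non-zero and has been incremented. */
static bool endpoint_hold(struct endpoint *ep)
{
    int old = atomic_load(&ep->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&ep->refcnt, &old, old + 1))
            return true;
    }
    return false;   /* dead: the caller should skip this endpoint */
}

/* Drop a reference; returns true when this was the last one. */
static bool endpoint_put(struct endpoint *ep)
{
    return atomic_fetch_sub(&ep->refcnt, 1) == 1;
}
```

      A traversal would call endpoint_hold() inside the read-side critical
      section and move on to the next entry whenever it returns false.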
      
      In sctp_sock_dump(), after locking the sk, if this ep is different from
      tsp->asoc->ep, it means that during this dump the asoc was peeled off
      before calling lock_sock(), and the sk should be skipped; if this ep is
      the same as tsp->asoc->ep, it means no peeloff happened on this asoc,
      and due to lock_sock, no peeloff will happen either until release_sock.
      
      Note that delaying endpoint free won't delay the port release, as the
      port release happens in sctp_endpoint_destroy() before calling call_rcu().
      Also, freeing endpoint by call_rcu() makes it safe to access the sk by
      asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv().
      
      Thanks to Jones for bringing this issue up.
      
      v1->v2:
        - improve the changelog.
        - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.
      
      Reported-by: syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com
      Reported-by: Lee Jones <lee.jones@linaro.org>
      Fixes: d25adbeb ("sctp: fix an use-after-free issue in sctp_sock_dump")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Dec, 2021 3 commits
  3. 21 Dec, 2021 1 commit
    • E
      inet: fully convert sk->sk_rx_dst to RCU rules · 8f905c0e
      Eric Dumazet committed
      syzbot reported various issues around early demux,
      one being included in this changelog [1]
      
      sk->sk_rx_dst is using RCU protection without clearly
      documenting it.
      
      And following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
      are not following standard RCU rules.
      
      [a]    dst_release(dst);
      [b]    sk->sk_rx_dst = NULL;
      
      They look wrong because a delete operation of RCU protected
      pointer is supposed to clear the pointer before
      the call_rcu()/synchronize_rcu() guarding actual memory freeing.
      
      In some cases indeed, dst could be freed before [b] is done.
      
      We could cheat by clearing sk_rx_dst before calling
      dst_release(), but this seems the right time to stick
      to standard RCU annotations and debugging facilities.
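
      The ordering rule (clear the published pointer first, release the
      object second) can be sketched in userspace C. This is only a sketch
      under stated assumptions: struct dst, struct sock_model, dst_put() and
      sock_clear_rx_dst() are hypothetical names, and C11 atomics stand in
      for the kernel's RCU primitives.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct dst_entry or struct sock. */
struct dst {
    atomic_int refcnt;
};

struct sock_model {
    _Atomic(struct dst *) rx_dst;   /* pointer that readers may pick up */
};

static int released;   /* counts objects whose last reference was dropped */

static void dst_put(struct dst *d)
{
    /* The kernel would defer the actual free past a grace period. */
    if (d && atomic_fetch_sub(&d->refcnt, 1) == 1)
        released++;
}

/* Unpublish first, release second: once the exchange is done, no new
 * reader can find the object through the socket. */
static void sock_clear_rx_dst(struct sock_model *sk)
{
    struct dst *old = atomic_exchange(&sk->rx_dst, (struct dst *)NULL);

    dst_put(old);
}
```

      Doing the two steps in the opposite order is what opens the window
      described above, where a reader still sees a pointer to an object that
      may already be freed.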
      
      [1]
      BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
      BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
      Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
      
      CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
       dst_check include/net/dst.h:470 [inline]
       tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
       ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
      RIP: 0033:0x7f5e972bfd57
      Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
      RSP: 002b:00007fff8a413210 EFLAGS: 00000283
      RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
      RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
      RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
      R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
      R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
       </TASK>
      
      Allocated by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
       ip_route_input_rcu net/ipv4/route.c:2470 [inline]
       ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
       ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Freed by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:235 [inline]
       slab_free_hook mm/slub.c:1723 [inline]
       slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
       slab_free mm/slub.c:3513 [inline]
       kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
       dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
       rcu_do_batch kernel/rcu/tree.c:2506 [inline]
       rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
       __call_rcu kernel/rcu/tree.c:2985 [inline]
       call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
       dst_release net/core/dst.c:177 [inline]
       dst_release+0x79/0xe0 net/core/dst.c:167
       tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2768
       release_sock+0x54/0x1b0 net/core/sock.c:3300
       tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
       inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x289/0x3c0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write+0x429/0x660 fs/read_write.c:503
       vfs_write+0x7cd/0xae0 fs/read_write.c:590
       ksys_write+0x1ee/0x250 fs/read_write.c:643
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88807f1cb700
       which belongs to the cache ip_dst_cache of size 176
      The buggy address is located 58 bytes inside of
       176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
      The buggy address belongs to the page:
      page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
       prep_new_page mm/page_alloc.c:2418 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
       alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
       alloc_slab_page mm/slub.c:1793 [inline]
       allocate_slab mm/slub.c:1930 [inline]
       new_slab+0x32d/0x4a0 mm/slub.c:1993
       ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
       slab_alloc_node mm/slub.c:3200 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       __mkroute_output net/ipv4/route.c:2564 [inline]
       ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
       ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
       __ip_route_output_key include/net/route.h:126 [inline]
       ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
       ip_route_output_key include/net/route.h:142 [inline]
       geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
       geneve_xmit_skb drivers/net/geneve.c:899 [inline]
       geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
       __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
       netdev_start_xmit include/linux/netdevice.h:5008 [inline]
       xmit_one net/core/dev.c:3590 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
       __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1338 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
       free_unref_page_prepare mm/page_alloc.c:3309 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3388
       qlink_free mm/kasan/quarantine.c:146 [inline]
       qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
       kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
       __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
       __alloc_skb+0x215/0x340 net/core/skbuff.c:414
       alloc_skb include/linux/skbuff.h:1126 [inline]
       alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
       sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
       mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
       add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
       add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
       mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
       mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
       mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
      
      Memory state around the buggy address:
       ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
       ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
       ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 20 Dec, 2021 2 commits
  5. 18 Dec, 2021 5 commits
    • L
      ax25: NPD bug when detaching AX25 device · 1ade48d0
      Lin Ma committed
      The existing cleanup routine is not well synchronized with the
      syscall routines. When a device is detaching, the race below can
      occur.
      
      static int ax25_sendmsg(...) {
        ...
        lock_sock()
        ax25 = sk_to_ax25(sk);
        if (ax25->ax25_dev == NULL) // CHECK
        ...
        ax25_queue_xmit(skb, ax25->ax25_dev->dev); // USE
        ...
      }
      
      static void ax25_kill_by_device(...) {
        ...
        if (s->ax25_dev == ax25_dev) {
          s->ax25_dev = NULL;
          ...
      }
      
      Other syscall functions like ax25_getsockopt, ax25_getname, and
      ax25_info_show also suffer from similar races. To fix them, this patch
      introduces lock_sock() into ax25_kill_by_device in order to guarantee
      that the nullify action in the cleanup routine cannot proceed while
      another socket request is pending.
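
      The effect of taking the same lock on both sides can be sketched in
      userspace C. This is a toy model, assuming a spin-style lock in place
      of lock_sock(); struct sock_model, sock_send() and sock_detach() are
      hypothetical names, not the AX.25 code.

```c
#include <stdatomic.h>
#include <stddef.h>

struct device {
    int tx_count;
};

struct sock_model {
    atomic_flag lock;        /* stand-in for lock_sock()/release_sock() */
    struct device *dev;
};

static void sock_lock(struct sock_model *sk)
{
    while (atomic_flag_test_and_set(&sk->lock))
        ;   /* spin until the holder clears the flag */
}

static void sock_unlock(struct sock_model *sk)
{
    atomic_flag_clear(&sk->lock);
}

/* Syscall side: CHECK and USE sit inside one critical section. */
static int sock_send(struct sock_model *sk)
{
    int ret = -1;

    sock_lock(sk);
    if (sk->dev != NULL) {      /* CHECK */
        sk->dev->tx_count++;    /* USE */
        ret = 0;
    }
    sock_unlock(sk);
    return ret;
}

/* Cleanup side: takes the same lock, so it cannot clear dev between the
 * CHECK and the USE of a concurrent sender. */
static void sock_detach(struct sock_model *sk, struct device *dev)
{
    sock_lock(sk);
    if (sk->dev == dev)
        sk->dev = NULL;
    sock_unlock(sk);
}
```
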
      Signed-off-by: Hanjie Wu <nagi@zju.edu.cn>
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • H
      Revert "tipc: use consistent GFP flags" · f845fe58
      Hoang Le committed
      This reverts commit 86c3a3e9.
      
      The tipc_aead_init() function can be called from an interrupt routine.
      An allocation with the GFP_KERNEL flag might sleep, hence the following
      BUG is reported.
      
      [   17.657509] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:230
      [   17.660916] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/3
      [   17.664093] preempt_count: 302, expected: 0
      [   17.665619] RCU nest depth: 2, expected: 0
      [   17.667163] Preemption disabled at:
      [   17.667165] [<0000000000000000>] 0x0
      [   17.669753] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         5.16.0-rc4+ #1
      [   17.673006] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
      [   17.675540] Call Trace:
      [   17.676285]  <IRQ>
      [   17.676913]  dump_stack_lvl+0x34/0x44
      [   17.678033]  __might_resched.cold+0xd6/0x10f
      [   17.679311]  kmem_cache_alloc_trace+0x14d/0x220
      [   17.680663]  tipc_crypto_start+0x4a/0x2b0 [tipc]
      [   17.682146]  ? kmem_cache_alloc_trace+0xd3/0x220
      [   17.683545]  tipc_node_create+0x2f0/0x790 [tipc]
      [   17.684956]  tipc_node_check_dest+0x72/0x680 [tipc]
      [   17.686706]  ? ___cache_free+0x31/0x350
      [   17.688008]  ? skb_release_data+0x128/0x140
      [   17.689431]  tipc_disc_rcv+0x479/0x510 [tipc]
      [   17.690904]  tipc_rcv+0x71c/0x730 [tipc]
      [   17.692219]  ? __netif_receive_skb_core+0xb7/0xf60
      [   17.693856]  tipc_l2_rcv_msg+0x5e/0x90 [tipc]
      [   17.695333]  __netif_receive_skb_list_core+0x20b/0x260
      [   17.697072]  netif_receive_skb_list_internal+0x1bf/0x2e0
      [   17.698870]  ? dev_gro_receive+0x4c2/0x680
      [   17.700255]  napi_complete_done+0x6f/0x180
      [   17.701657]  virtnet_poll+0x29c/0x42e [virtio_net]
      [   17.703262]  __napi_poll+0x2c/0x170
      [   17.704429]  net_rx_action+0x22f/0x280
      [   17.705706]  __do_softirq+0xfd/0x30a
      [   17.706921]  common_interrupt+0xa4/0xc0
      [   17.708206]  </IRQ>
      [   17.708922]  <TASK>
      [   17.709651]  asm_common_interrupt+0x1e/0x40
      [   17.711078] RIP: 0010:default_idle+0x18/0x20
      
      Fixes: 86c3a3e9 ("tipc: use consistent GFP flags")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Link: https://lore.kernel.org/r/20211217030059.5947-1-hoang.h.le@dektech.com.au
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • P
      net: openvswitch: Fix matching zone id for invalid conns arriving from tc · 635d448a
      Paul Blakey committed
      Zone id is not restored if we passed ct and ct rejected the connection,
      as there is no ct info on the skb.
      
      Save the zone from tc skb cb to tc skb extension and pass it on to
      ovs, use that info to restore the zone id for invalid connections.
      
      Fixes: d29334c1 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • P
      net/sched: flow_dissector: Fix matching on zone id for invalid conns · 38495958
      Paul Blakey committed
      If ct rejects a flow, it removes the conntrack info from the skb.
      act_ct sets the post_ct variable so the dissector will see this case
      as an +tracked +invalid state, but the zone id is lost with the
      conntrack info.
      
      To restore the zone id on such cases, set the last executed zone,
      via the tc control block, when passing ct, and read it back in the
      dissector if there is no ct info on the skb (invalid connection).
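
      The fallback described above can be sketched as follows. This is a
      simplified model, not the flow dissector itself: struct skb_model, its
      fields and dissect_zone() are hypothetical names chosen for the
      example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ct_info {
    uint16_t zone;
};

struct skb_model {
    const struct ct_info *ct;   /* NULL when conntrack rejected the flow */
    bool post_ct;               /* set once the packet has passed ct */
    uint16_t cb_zone;           /* last executed zone saved in the control block */
};

/* Pick the zone to match on. Returns false for packets that never
 * passed conntrack, in which case *zone is left untouched. */
static bool dissect_zone(const struct skb_model *skb, uint16_t *zone)
{
    if (skb->ct) {              /* valid connection: zone comes from ct info */
        *zone = skb->ct->zone;
        return true;
    }
    if (skb->post_ct) {         /* invalid connection: restore the saved zone */
        *zone = skb->cb_zone;
        return true;
    }
    return false;
}
```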
      
      Fixes: 7baf2429 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • P
      net/sched: Extend qdisc control block with tc control block · ec624fe7
      Paul Blakey committed
      BPF layer extends the qdisc control block via struct bpf_skb_data_end
      and because of that there is no more room to add variables to the
      qdisc layer control block without going over the skb->cb size.
      
      Extend the qdisc control block with a tc control block,
      and move all tc related variables to there as a pre-step for
      extending the tc control block with additional members.
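
      The layering can be sketched with plain structs. This is an
      illustration only: the member names and sizes below are invented for
      the example (the real control blocks differ), and the 48-byte limit is
      the size of the skb->cb scratch area.

```c
#include <stddef.h>
#include <stdint.h>

#define SKB_CB_SIZE 48   /* size of the skb->cb scratch area */

/* Hypothetical qdisc-layer block; the real one has different members. */
struct qdisc_cb_model {
    uint32_t pkt_len;
    uint16_t flags;
    unsigned char data[20];
};

/* The tc-layer block wraps the qdisc one. Keeping the qdisc block as the
 * first member means both views start at the same address. */
struct tc_cb_model {
    struct qdisc_cb_model qdisc_cb;
    uint16_t mru;
    uint16_t zone;
};

/* Compile-time checks: same start address, and still within the cb area. */
_Static_assert(offsetof(struct tc_cb_model, qdisc_cb) == 0,
               "qdisc block must be the first member");
_Static_assert(sizeof(struct tc_cb_model) <= SKB_CB_SIZE,
               "tc control block must fit in skb->cb");

/* Get the qdisc-layer view of a tc control block. */
static struct qdisc_cb_model *tc_to_qdisc(struct tc_cb_model *tc)
{
    return &tc->qdisc_cb;
}
```
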
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 17 Dec, 2021 2 commits
    • E
      sit: do not call ipip6_dev_free() from sit_init_net() · e28587cc
      Eric Dumazet committed
      ipip6_dev_free is sit dev->priv_destructor, already called
      by register_netdevice() if something goes wrong.
      
      Alternative would be to make ipip6_dev_free() robust against
      multiple invocations, but other drivers do not implement this
      strategy.
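
      The ownership rule can be sketched with a toy registration function.
      This is a model under stated assumptions: struct dev_model,
      register_dev() and init_dev() are hypothetical, and a plain counter
      stands in for the resources that the destructor releases.

```c
#include <stdbool.h>
#include <stddef.h>

struct dev_model {
    int cache_refs;                               /* released by the destructor */
    void (*priv_destructor)(struct dev_model *);
};

static void dev_free(struct dev_model *dev)
{
    dev->cache_refs--;   /* a second call would underflow this count */
}

/* Model of registration: on failure, the destructor has already run by
 * the time the error is returned to the caller. */
static int register_dev(struct dev_model *dev, bool fail)
{
    if (fail) {
        if (dev->priv_destructor)
            dev->priv_destructor(dev);
        return -1;
    }
    return 0;
}

/* Caller's init path: on error it only propagates the code and must not
 * invoke the destructor again. */
static int init_dev(struct dev_model *dev, bool fail)
{
    dev->cache_refs = 1;
    dev->priv_destructor = dev_free;
    return register_dev(dev, fail);
}
```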
      
      syzbot reported:
      
      dst_release underflow
      WARNING: CPU: 0 PID: 5059 at net/core/dst.c:173 dst_release+0xd8/0xe0 net/core/dst.c:173
      Modules linked in:
      CPU: 1 PID: 5059 Comm: syz-executor.4 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:dst_release+0xd8/0xe0 net/core/dst.c:173
      Code: 4c 89 f2 89 d9 31 c0 5b 41 5e 5d e9 da d5 44 f9 e8 1d 90 5f f9 c6 05 87 48 c6 05 01 48 c7 c7 80 44 99 8b 31 c0 e8 e8 67 29 f9 <0f> 0b eb 85 0f 1f 40 00 53 48 89 fb e8 f7 8f 5f f9 48 83 c3 a8 48
      RSP: 0018:ffffc9000aa5faa0 EFLAGS: 00010246
      RAX: d6894a925dd15a00 RBX: 00000000ffffffff RCX: 0000000000040000
      RDX: ffffc90005e19000 RSI: 000000000003ffff RDI: 0000000000040000
      RBP: 0000000000000000 R08: ffffffff816a1f42 R09: ffffed1017344f2c
      R10: ffffed1017344f2c R11: 0000000000000000 R12: 0000607f462b1358
      R13: 1ffffffff1bfd305 R14: ffffe8ffffcb1358 R15: dffffc0000000000
      FS:  00007f66c71a2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f88aaed5058 CR3: 0000000023e0f000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       dst_cache_destroy+0x107/0x1e0 net/core/dst_cache.c:160
       ipip6_dev_free net/ipv6/sit.c:1414 [inline]
       sit_init_net+0x229/0x550 net/ipv6/sit.c:1936
       ops_init+0x313/0x430 net/core/net_namespace.c:140
       setup_net+0x35b/0x9d0 net/core/net_namespace.c:326
       copy_net_ns+0x359/0x5c0 net/core/net_namespace.c:470
       create_new_namespaces+0x4ce/0xa00 kernel/nsproxy.c:110
       unshare_nsproxy_namespaces+0x11e/0x180 kernel/nsproxy.c:226
       ksys_unshare+0x57d/0xb50 kernel/fork.c:3075
       __do_sys_unshare kernel/fork.c:3146 [inline]
       __se_sys_unshare kernel/fork.c:3144 [inline]
       __x64_sys_unshare+0x34/0x40 kernel/fork.c:3144
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f66c882ce99
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f66c71a2168 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
      RAX: ffffffffffffffda RBX: 00007f66c893ff60 RCX: 00007f66c882ce99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000048040200
      RBP: 00007f66c8886ff1 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fff6634832f R14: 00007f66c71a2300 R15: 0000000000022000
       </TASK>
      
      Fixes: cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211216111741.1387540-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • D
      net/smc: Prevent smc_release() from long blocking · 5c15b312
      D. Wythe committed
      In an nginx/wrk benchmark, the client hangs with high probability in
      a case like the following (the client takes several minutes to exit):
      
      server: smc_run nginx
      
      client: smc_run wrk -c 10000 -t 1 http://server
      
      Client hangs with the following backtrace:
      
      0 [ffffa7ce80f3bbf8] __schedule at ffffffff9f9e0d5f
      1 [ffffa7ce80f3bc88] schedule at ffffffff9f9e10e6
      2 [ffffa7ce80f3bca0] schedule_timeout at ffffffff9f9e3f3c
      3 [ffffa7ce80f3bd20] wait_for_common at ffffffff9f9e19de
      4 [ffffa7ce80f3bd80] __flush_work at ffffffff9f0fe013
      5 [ffffa7ce80f3bdf0] smc_release at ffffffffc0697d24 [smc]
      6 [ffffa7ce80f3be20] __sock_release at ffffffff9f802e2d
      7 [ffffa7ce80f3be40] sock_close at ffffffff9f802eb1
      8 [ffffa7ce80f3be48] __fput at ffffffff9f334f93
      9 [ffffa7ce80f3be78] task_work_run at ffffffff9f101ff5
      10 [ffffa7ce80f3bea0] do_exit at ffffffff9f0e5012
      11 [ffffa7ce80f3bf10] do_group_exit at ffffffff9f0e592a
      12 [ffffa7ce80f3bf38] __x64_sys_exit_group at ffffffff9f0e5994
      13 [ffffa7ce80f3bf40] do_syscall_64 at ffffffff9f9d4373
      14 [ffffa7ce80f3bf50] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c
      
      This issue is due to flush_work(), which is used to wait for
      smc_connect_work() to finish in smc_release(). Once lots of
      smc_connect_work() items are pending, or all executing work is dangling,
      smc_release() has to block until a worker becomes free, which
      is equivalent to waiting for another smc_connect_work() to finish.
      
      To fix this, there are two changes:
      
      1. For an idle smc_connect_work(), cancel it from the workqueue; for
         an executing smc_connect_work(), wait for it to finish. For that
         purpose, replace flush_work() with cancel_work_sync().
      
      2. Since smc_connect() holds a reference for passive closing, if
         smc_connect_work() has been cancelled, release the reference.
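
      The two changes can be sketched as a small state model. This is a
      single-threaded illustration, not the SMC code: struct conn_model,
      cancel_work() and conn_release() are hypothetical, and the model only
      captures the rule that a cancelled work item never drops the reference
      it was given.

```c
#include <stdbool.h>

enum work_state { WORK_IDLE, WORK_PENDING, WORK_DONE };

struct conn_model {
    enum work_state connect_work;
    int refcnt;
};

/* Model of a cancel operation: returns true if the work was still queued
 * and has been removed before it ran. A work item that already finished
 * is left alone and false is returned. */
static bool cancel_work(struct conn_model *c)
{
    if (c->connect_work == WORK_PENDING) {
        c->connect_work = WORK_IDLE;
        return true;
    }
    return false;
}

static void conn_release(struct conn_model *c)
{
    /* The queued work owned a reference; if it never ran, drop it here. */
    if (cancel_work(c))
        c->refcnt--;
    c->refcnt--;   /* the releaser's own reference */
}
```

      In this model a release no longer waits behind unrelated queued work:
      a pending item is simply removed, and only the bookkeeping for its
      reference remains.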
      
      Fixes: 24ac3a08 ("net/smc: rebuild nonblocking connect")
      Reported-by: Tony Lu <tonylu@linux.alibaba.com>
      Tested-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. 16 Dec, 2021 5 commits
    • F
      netfilter: ctnetlink: remove expired entries first · 76f12e63
      Florian Westphal committed
      When dumping conntrack table to userspace via ctnetlink, check if the ct has
      already expired before doing any of the 'skip' checks.
      
      This expires dead entries faster.
      /proc handler also removes outdated entries first.
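
      The reordering can be sketched as a dump loop. This is a simplified
      model with hypothetical names (struct entry, dump()); the point is
      only that the expiry check comes before any of the skip checks.

```c
#include <stdbool.h>
#include <stddef.h>

struct entry {
    bool expired;   /* timeout has passed */
    bool skip;      /* filtered out of this dump for any other reason */
    bool removed;   /* already reaped */
};

/* One dump pass; returns the number of entries emitted. Expired entries
 * are reaped even when a skip check would have filtered them out. */
static size_t dump(struct entry *tbl, size_t n)
{
    size_t emitted = 0;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].removed)
            continue;
        if (tbl[i].expired) {       /* checked first */
            tbl[i].removed = true;
            continue;
        }
        if (tbl[i].skip)            /* skip checks come afterwards */
            continue;
        emitted++;
    }
    return emitted;
}
```
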
      Reported-by: Vitaly Zuevsky <vzuevsky@ns1.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • G
      net: Fix double 0x prefix print in SKB dump · 8a03ef67
      Gal Pressman committed
      When printing netdev features %pNF already takes care of the 0x prefix,
      remove the explicit one.
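
      The doubled prefix is easy to reproduce in userspace. %pNF is a
      kernel-only format specifier, so this sketch uses "%#llx" as a
      stand-in because it also emits its own "0x"; format_features() is a
      hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a feature mask. "%#llx" stands in for %pNF here: both add
 * their own "0x", so an explicit "0x" in front doubles the prefix.
 * Returns the formatted length, as snprintf() does. */
static int format_features(char *buf, size_t len,
                           unsigned long long features, int explicit_prefix)
{
    if (explicit_prefix)
        return snprintf(buf, len, "features=0x%#llx", features);
    return snprintf(buf, len, "features=%#llx", features);
}
```

      With features set to 0x10, the first form yields "features=0x0x10"
      and the second yields "features=0x10".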
      
      Fixes: 6413139d ("skbuff: increase verbosity when dumping skb data")
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • W
      net/packet: rx_owner_map depends on pg_vec · ec6af094
      Willem de Bruijn 提交于
      Packet sockets may switch ring versions. Avoid misinterpreting state
      between versions, whose fields share a union. rx_owner_map is only
      allocated with a packet ring (pg_vec) and both are swapped together.
      If pg_vec is NULL, meaning no packet ring was allocated, then neither
      was rx_owner_map. And the field may be old state from a tpacket_v3.
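
      The dependency can be sketched with a small struct. This is a model
      only: struct ring_model and ring_owner_map() are hypothetical, and the
      union members are simplified placeholders for the per-version state.

```c
#include <stddef.h>

struct ring_model {
    void **pg_vec;                       /* NULL when no ring is allocated */
    union {
        unsigned long *rx_owner_map;     /* state used by one ring version */
        struct {
            unsigned long blk_sizeof_priv;
        } v3;                            /* state used by another version */
    };
};

/* Only trust rx_owner_map when a ring exists. Without pg_vec, the union
 * may still hold stale bytes written through the v3 member. */
static unsigned long *ring_owner_map(const struct ring_model *r)
{
    return r->pg_vec ? r->rx_owner_map : NULL;
}
```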
      
      Fixes: 61fad681 ("net/packet: tpacket_rcv: avoid a producer race condition")
      Reported-by: Syzbot <syzbot+1ac0994a0a0c55151121@syzkaller.appspotmail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211215143937.106178-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • I
      netfilter: fix regression in looped (broad|multi)cast's MAC handling · ebb966d3
      Ignacy Gawędzki 提交于
      In commit 5648b5e1 ("netfilter: nfnetlink_queue: fix OOB when mac
      header was cleared"), the test for non-empty MAC header introduced in
      commit 2c38de4c ("netfilter: fix looped (broad|multi)cast's MAC
      handling") has been replaced with a test for a set MAC header.
      
      This breaks the case when the MAC header has been reset (using
      skb_reset_mac_header), as is the case with looped-back multicast
      packets.  As a result, the packets ending up in NFQUEUE get a bogus
      hwaddr interpreted from the first bytes of the IP header.
      
      This patch adds a test for a non-empty MAC header in addition to the
      test for a set MAC header.  The same two tests are also implemented in
      nfnetlink_log.c, where the initial code of commit 2c38de4c
      ("netfilter: fix looped (broad|multi)cast's MAC handling") has not been
      touched, but where supposedly the same situation may happen.
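
      The combined test can be sketched with header offsets. This is a
      simplified model: struct skb_model, MAC_UNSET and the helper names are
      assumptions made for the example, with the header length taken as the
      distance between the MAC and network header offsets.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAC_UNSET 0xFFFFu   /* sentinel offset meaning "never set" */

struct skb_model {
    uint16_t mac_header;       /* offset of the MAC header */
    uint16_t network_header;   /* offset of the network header */
};

static bool mac_header_was_set(const struct skb_model *skb)
{
    return skb->mac_header != MAC_UNSET;
}

static unsigned int mac_header_len(const struct skb_model *skb)
{
    return (unsigned int)(skb->network_header - skb->mac_header);
}

/* Both tests are needed: a reset MAC header counts as "set" but has
 * zero length, so reading an address from it would hit the IP header. */
static bool has_usable_mac(const struct skb_model *skb)
{
    return mac_header_was_set(skb) && mac_header_len(skb) != 0;
}
```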
      
      Fixes: 5648b5e1 ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared")
      Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • E
      netfilter: nf_tables: fix use-after-free in nft_set_catchall_destroy() · 0f7d9b31
      Eric Dumazet committed
      We need to use list_for_each_entry_safe() iterator
      because we can not access @catchall after kfree_rcu() call.
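
      The "safe" iteration idea can be sketched on a plain linked list.
      This is a userspace illustration with hypothetical names (struct node,
      push(), destroy_all()); free() stands in for kfree_rcu(), and the
      loop saves the next pointer by hand, which is what the _safe iterator
      does for the caller.

```c
#include <stdlib.h>

struct node {
    struct node *next;
};

/* Prepend a new node; returns the new head (or the old one if the
 * allocation fails). */
static struct node *push(struct node *head)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return head;
    n->next = head;
    return n;
}

/* Free every node and return how many were freed. The next pointer is
 * read before the current node is released, never after. */
static int destroy_all(struct node **head)
{
    struct node *cur = *head;
    struct node *next;
    int freed = 0;

    while (cur) {
        next = cur->next;   /* save before the node goes away */
        free(cur);
        freed++;
        cur = next;
    }
    *head = NULL;
    return freed;
}
```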
      
      syzbot reported:
      
      BUG: KASAN: use-after-free in nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline]
      BUG: KASAN: use-after-free in nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline]
      BUG: KASAN: use-after-free in nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493
      Read of size 8 at addr ffff8880716e5b80 by task syz-executor.3/8871
      
      CPU: 1 PID: 8871 Comm: syz-executor.3 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x2ed mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
       nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline]
       nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline]
       nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493
       __nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626
       nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
       blocking_notifier_call_chain kernel/notifier.c:318 [inline]
       blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306
       netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788
       __sock_release+0xcd/0x280 net/socket.c:649
       sock_close+0x18/0x20 net/socket.c:1314
       __fput+0x286/0x9f0 fs/file_table.c:280
       task_work_run+0xdd/0x1a0 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
       do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f75fbf28adb
      Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 63 fc ff ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 fc ff ff 8b 44
      RSP: 002b:00007ffd8da7ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f75fbf28adb
      RDX: 00007f75fc08e828 RSI: ffffffffffffffff RDI: 0000000000000003
      RBP: 00007f75fc08a960 R08: 0000000000000000 R09: 00007f75fc08e830
      R10: 00007ffd8da7ed10 R11: 0000000000000293 R12: 00000000002067c3
      R13: 00007ffd8da7ed10 R14: 00007f75fc088f60 R15: 0000000000000032
       </TASK>
      
      Allocated by task 8886:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       ____kasan_kmalloc mm/kasan/common.c:513 [inline]
       ____kasan_kmalloc mm/kasan/common.c:472 [inline]
       __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:522
       kasan_kmalloc include/linux/kasan.h:269 [inline]
       kmem_cache_alloc_trace+0x1ea/0x4a0 mm/slab.c:3575
       kmalloc include/linux/slab.h:590 [inline]
       nft_setelem_catchall_insert net/netfilter/nf_tables_api.c:5544 [inline]
       nft_setelem_insert net/netfilter/nf_tables_api.c:5562 [inline]
       nft_add_set_elem+0x232e/0x2f40 net/netfilter/nf_tables_api.c:5936
       nf_tables_newsetelem+0x6ff/0xbb0 net/netfilter/nf_tables_api.c:6032
       nfnetlink_rcv_batch+0x1710/0x25f0 net/netfilter/nfnetlink.c:513
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
       nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:652
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 15335:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xd1/0x110 mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:235 [inline]
       __cache_free mm/slab.c:3445 [inline]
       kmem_cache_free_bulk+0x67/0x1e0 mm/slab.c:3766
       kfree_bulk include/linux/slab.h:446 [inline]
       kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3273
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
       kthread+0x405/0x4f0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xb5/0xe0 mm/kasan/generic.c:348
       kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3550
       nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4489 [inline]
       nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline]
       nft_set_destroy+0x34a/0x4f0 net/netfilter/nf_tables_api.c:4493
       __nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626
       nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
       blocking_notifier_call_chain kernel/notifier.c:318 [inline]
       blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306
       netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788
       __sock_release+0xcd/0x280 net/socket.c:649
       sock_close+0x18/0x20 net/socket.c:1314
       __fput+0x286/0x9f0 fs/file_table.c:280
       task_work_run+0xdd/0x1a0 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
       do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff8880716e5b80
       which belongs to the cache kmalloc-64 of size 64
      The buggy address is located 0 bytes inside of
       64-byte region [ffff8880716e5b80, ffff8880716e5bc0)
      The buggy address belongs to the page:
      page:ffffea0001c5b940 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8880716e5c00 pfn:0x716e5
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 ffffea0000911848 ffffea00007c4d48 ffff888010c40200
      raw: ffff8880716e5c00 ffff8880716e5000 000000010000001e 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x242040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE), pid 3638, ts 211086074437, free_ts 211031029429
       prep_new_page mm/page_alloc.c:2418 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
       __alloc_pages_node include/linux/gfp.h:570 [inline]
       kmem_getpages mm/slab.c:1377 [inline]
       cache_grow_begin+0x75/0x470 mm/slab.c:2593
       cache_alloc_refill+0x27f/0x380 mm/slab.c:2965
       ____cache_alloc mm/slab.c:3048 [inline]
       ____cache_alloc mm/slab.c:3031 [inline]
       __do_cache_alloc mm/slab.c:3275 [inline]
       slab_alloc mm/slab.c:3316 [inline]
       __do_kmalloc mm/slab.c:3700 [inline]
       __kmalloc+0x3b3/0x4d0 mm/slab.c:3711
       kmalloc include/linux/slab.h:595 [inline]
       kzalloc include/linux/slab.h:724 [inline]
       tomoyo_get_name+0x234/0x480 security/tomoyo/memory.c:173
       tomoyo_parse_name_union+0xbc/0x160 security/tomoyo/util.c:260
       tomoyo_update_path_number_acl security/tomoyo/file.c:687 [inline]
       tomoyo_write_file+0x629/0x7f0 security/tomoyo/file.c:1034
       tomoyo_write_domain2+0x116/0x1d0 security/tomoyo/common.c:1152
       tomoyo_add_entry security/tomoyo/common.c:2042 [inline]
       tomoyo_supervisor+0xbc7/0xf00 security/tomoyo/common.c:2103
       tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
       tomoyo_path_number_perm+0x419/0x590 security/tomoyo/file.c:734
       security_file_ioctl+0x50/0xb0 security/security.c:1541
       __do_sys_ioctl fs/ioctl.c:868 [inline]
       __se_sys_ioctl fs/ioctl.c:860 [inline]
       __x64_sys_ioctl+0xb3/0x200 fs/ioctl.c:860
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1338 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
       free_unref_page_prepare mm/page_alloc.c:3309 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3388
       slab_destroy mm/slab.c:1627 [inline]
       slabs_destroy+0x89/0xc0 mm/slab.c:1647
       cache_flusharray mm/slab.c:3418 [inline]
       ___cache_free+0x4cc/0x610 mm/slab.c:3480
       qlink_free mm/kasan/quarantine.c:146 [inline]
       qlist_free_all+0x4e/0x110 mm/kasan/quarantine.c:165
       kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
       __kasan_slab_alloc+0x97/0xb0 mm/kasan/common.c:444
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slab.c:3261 [inline]
       kmem_cache_alloc_node+0x2ea/0x590 mm/slab.c:3599
       __alloc_skb+0x215/0x340 net/core/skbuff.c:414
       alloc_skb include/linux/skbuff.h:1126 [inline]
       nlmsg_new include/net/netlink.h:953 [inline]
       rtmsg_ifinfo_build_skb+0x72/0x1a0 net/core/rtnetlink.c:3808
       rtmsg_ifinfo_event net/core/rtnetlink.c:3844 [inline]
       rtmsg_ifinfo_event net/core/rtnetlink.c:3835 [inline]
       rtmsg_ifinfo+0x83/0x120 net/core/rtnetlink.c:3853
       netdev_state_change net/core/dev.c:1395 [inline]
       netdev_state_change+0x114/0x130 net/core/dev.c:1386
       linkwatch_do_dev+0x10e/0x150 net/core/link_watch.c:167
       __linkwatch_run_queue+0x233/0x6a0 net/core/link_watch.c:213
       linkwatch_event+0x4a/0x60 net/core/link_watch.c:252
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
      
      Memory state around the buggy address:
       ffff8880716e5a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8880716e5b00: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
      >ffff8880716e5b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                         ^
       ffff8880716e5c00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8880716e5c80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
      
      Fixes: aaa31047 ("netfilter: nftables: add catch-all set element support")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  8. 15 Dec, 2021 3 commits
    • M
      mptcp: fix deadlock in __mptcp_push_pending() · 3d79e375
      Maxim Galaganov committed
      __mptcp_push_pending() may call mptcp_flush_join_list() with the subflow
      socket lock held. If such a call hits mptcp_sockopt_sync_all(), then
      __mptcp_sockopt_sync() could subsequently try to take the subflow
      socket lock itself, causing a deadlock.
      
      sysrq: Show Blocked State
      task:ss-server       state:D stack:    0 pid:  938 ppid:     1 flags:0x00000000
      Call Trace:
       <TASK>
       __schedule+0x2d6/0x10c0
       ? __mod_memcg_state+0x4d/0x70
       ? csum_partial+0xd/0x20
       ? _raw_spin_lock_irqsave+0x26/0x50
       schedule+0x4e/0xc0
       __lock_sock+0x69/0x90
       ? do_wait_intr_irq+0xa0/0xa0
       __lock_sock_fast+0x35/0x50
       mptcp_sockopt_sync_all+0x38/0xc0
       __mptcp_push_pending+0x105/0x200
       mptcp_sendmsg+0x466/0x490
       sock_sendmsg+0x57/0x60
       __sys_sendto+0xf0/0x160
       ? do_wait_intr_irq+0xa0/0xa0
       ? fpregs_restore_userregs+0x12/0xd0
       __x64_sys_sendto+0x20/0x30
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f9ba546c2d0
      RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0
      RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234
      RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060
      R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8
       </TASK>
      
      Fix the issue by using __mptcp_flush_join_list() instead of plain
      mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by
      Florian. The sockopt sync will be deferred to the workqueue.
      
      Fixes: 1b3e7ede ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/244
      Suggested-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Maxim Galaganov <max@internet.ru>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • F
      mptcp: clear 'kern' flag from fallback sockets · d6692b3b
      Florian Westphal committed
      The mptcp ULP extension relies on sk->sk_sock_kern being set correctly:
      It prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from
      working for plain tcp sockets (any userspace-exposed socket).
      
      But in case of fallback, accept() can return a plain tcp sk.
      In such a case, sk is still tagged as 'kernel' and setsockopt will work.
      
      This will crash the kernel, because the subflow extension has a NULL
      ctx->conn mptcp socket:
      
      BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
      Call Trace:
       tcp_data_ready+0xf8/0x370
       [..]
      
      Fixes: cf7da0d6 ("mptcp: Create SUBFLOW socket for incoming connections")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • F
      mptcp: remove tcp ulp setsockopt support · 404cd9a2
      Florian Westphal committed
      TCP_ULP setsockopt cannot be used for mptcp because it's already
      used internally to plumb subflow (tcp) sockets to the mptcp layer.
      
      syzbot managed to trigger a crash for mptcp connections that are
      in fallback mode:
      
      KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
      CPU: 1 PID: 1083 Comm: syz-executor.3 Not tainted 5.16.0-rc2-syzkaller #0
      RIP: 0010:tls_build_proto net/tls/tls_main.c:776 [inline]
      [..]
       __tcp_set_ulp net/ipv4/tcp_ulp.c:139 [inline]
       tcp_set_ulp+0x428/0x4c0 net/ipv4/tcp_ulp.c:160
       do_tcp_setsockopt+0x455/0x37c0 net/ipv4/tcp.c:3391
       mptcp_setsockopt+0x1b47/0x2400 net/mptcp/sockopt.c:638
      
      Remove support for TCP_ULP setsockopt.
      
      Fixes: d9e4c129 ("mptcp: only admit explicitly supported sockopt")
      Reported-by: syzbot+1fd9b69cde42967d1add@syzkaller.appspotmail.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  9. 14 Dec, 2021 14 commits
  10. 13 Dec, 2021 1 commit
    • D
      net/sched: sch_ets: don't remove idle classes from the round-robin list · c062f2a0
      Davide Caratti committed
      Shuang reported that the following script:
      
       1) tc qdisc add dev ddd0 handle 10: parent 1: ets bands 8 strict 4 priomap 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
       2) mausezahn ddd0  -A 10.10.10.1 -B 10.10.10.2 -c 0 -a own -b 00:c1:a0:c1:a0:00 -t udp &
       3) tc qdisc change dev ddd0 handle 10: ets bands 4 strict 2 quanta 2500 2500 priomap 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      
      crashes systematically when line 2) is commented out:
      
       list_del corruption, ffff8e028404bd30->next is LIST_POISON1 (dead000000000100)
       ------------[ cut here ]------------
       kernel BUG at lib/list_debug.c:47!
       invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
       CPU: 0 PID: 954 Comm: tc Not tainted 5.16.0-rc4+ #478
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x47
       Code: fe ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 08 42 1b 87 e8 1d c5 fe ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 98 42 1b 87 e8 09 c5 fe ff <0f> 0b 48 c7 c7 48 43 1b 87 e8 fb c4 fe ff 0f 0b 48 89 f2 48 89 fe
       RSP: 0018:ffffae46807a3888 EFLAGS: 00010246
       RAX: 000000000000004e RBX: 0000000000000007 RCX: 0000000000000202
       RDX: 0000000000000000 RSI: ffffffff871ac536 RDI: 00000000ffffffff
       RBP: ffffae46807a3a10 R08: 0000000000000000 R09: c0000000ffff7fff
       R10: 0000000000000001 R11: ffffae46807a36a8 R12: ffff8e028404b800
       R13: ffff8e028404bd30 R14: dead000000000100 R15: ffff8e02fafa2400
       FS:  00007efdc92e4480(0000) GS:ffff8e02fb600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000682f48 CR3: 00000001058be000 CR4: 0000000000350ef0
       Call Trace:
        <TASK>
        ets_qdisc_change+0x58b/0xa70 [sch_ets]
        tc_modify_qdisc+0x323/0x880
        rtnetlink_rcv_msg+0x169/0x4a0
        netlink_rcv_skb+0x50/0x100
        netlink_unicast+0x1a5/0x280
        netlink_sendmsg+0x257/0x4d0
        sock_sendmsg+0x5b/0x60
        ____sys_sendmsg+0x1f2/0x260
        ___sys_sendmsg+0x7c/0xc0
        __sys_sendmsg+0x57/0xa0
        do_syscall_64+0x3a/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7efdc8031338
       Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
       RSP: 002b:00007ffdf1ce9828 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000061b37a97 RCX: 00007efdc8031338
       RDX: 0000000000000000 RSI: 00007ffdf1ce9890 RDI: 0000000000000003
       RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000078a940
       R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
       R13: 0000000000688880 R14: 0000000000000000 R15: 0000000000000000
        </TASK>
       Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev pcspkr i2c_i801 virtio_balloon i2c_smbus lpc_ich ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel serio_raw ghash_clmulni_intel ahci libahci libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: sch_ets]
       ---[ end trace f35878d1912655c2 ]---
       RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x47
       Code: fe ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 08 42 1b 87 e8 1d c5 fe ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 98 42 1b 87 e8 09 c5 fe ff <0f> 0b 48 c7 c7 48 43 1b 87 e8 fb c4 fe ff 0f 0b 48 89 f2 48 89 fe
       RSP: 0018:ffffae46807a3888 EFLAGS: 00010246
       RAX: 000000000000004e RBX: 0000000000000007 RCX: 0000000000000202
       RDX: 0000000000000000 RSI: ffffffff871ac536 RDI: 00000000ffffffff
       RBP: ffffae46807a3a10 R08: 0000000000000000 R09: c0000000ffff7fff
       R10: 0000000000000001 R11: ffffae46807a36a8 R12: ffff8e028404b800
       R13: ffff8e028404bd30 R14: dead000000000100 R15: ffff8e02fafa2400
       FS:  00007efdc92e4480(0000) GS:ffff8e02fb600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000682f48 CR3: 00000001058be000 CR4: 0000000000350ef0
       Kernel panic - not syncing: Fatal exception in interrupt
       Kernel Offset: 0x4e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      We can remove 'q->classes[i].alist' only if DRR class 'i' was part of the
      active list. In the ETS scheduler, DRR classes belong to that list only if
      the queue length is greater than zero, so we need to test for a non-zero
      value of 'q->classes[i].qdisc->q.qlen' before removing from the list,
      similarly to what is done elsewhere in the ETS code.
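
The rule above (unlink a class only if it is actually on the active list) can be sketched in userspace C. Everything here is a hypothetical model: `struct node` imitates a doubly linked list head, `struct band` imitates a DRR class, and `band_retire()` is an invented name, not a function from sch_ets.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct node { struct node *prev, *next; };

static void node_init(struct node *n) { n->prev = n->next = n; }

static void node_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void node_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL; /* "poison": a second delete would crash */
}

/* Hypothetical DRR class: on the active list only while qlen > 0. */
struct band {
    struct node alist;
    unsigned int qlen;
};

/* Queue one packet; the band joins the active list on its first packet. */
void band_enqueue(struct band *b, struct node *active)
{
    if (b->qlen++ == 0)
        node_add_tail(&b->alist, active);
}

/* Retire a band. Returns 1 if it was unlinked, 0 if it was idle and
 * therefore never on the active list. */
int band_retire(struct band *b)
{
    if (b->qlen == 0)
        return 0; /* idle: alist is not linked, do not touch it */
    node_del(&b->alist);
    b->qlen = 0;
    return 1;
}

/* Count the entries currently on the active list. */
size_t active_count(const struct node *active)
{
    size_t n = 0;

    for (const struct node *p = active->next; p != active; p = p->next)
        n++;
    return n;
}
```

Without the `qlen == 0` check, retiring an idle band would call node_del() on an entry that is not linked, which is the userspace analogue of the list_del corruption shown in the trace.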
      
      Fixes: de6d2592 ("net/sched: sch_ets: don't peek at classes beyond 'nbands'")
      Reported-by: Shuang Li <shuali@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 11 Dec, 2021 3 commits
    • E
      inet_diag: fix kernel-infoleak for UDP sockets · 71ddeac8
      Eric Dumazet committed
      KMSAN reported a kernel-infoleak [1] that can be exploited
      by unprivileged users.
      
      After analysis it turned out UDP was not initializing
      r->idiag_expires. Other users of inet_sk_diag_fill()
      might make the same mistake in the future, so fix this
      in inet_sk_diag_fill().
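
The design choice (initialize in the common fill routine rather than trusting each protocol handler) can be sketched as follows. This is a userspace illustration under stated assumptions, not the actual patch: `struct diag_msg`, `diag_fill()` and the two handlers are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reply record that is later copied to user space. */
struct diag_msg {
    uint8_t  timer;
    uint8_t  retrans;
    uint32_t expires;
};

typedef void (*proto_fill_fn)(struct diag_msg *msg);

/* Common fill routine: give every field a defined value first, then let
 * the protocol handler overwrite whatever it actually knows about. */
void diag_fill(struct diag_msg *msg, proto_fill_fn proto_fill)
{
    msg->timer = 0;
    msg->retrans = 0;
    msg->expires = 0;
    if (proto_fill)
        proto_fill(msg);
}

/* Handler that sets all timer-related fields. */
void tcp_like_fill(struct diag_msg *msg)
{
    msg->timer = 1;
    msg->retrans = 2;
    msg->expires = 200;
}

/* Handler that has no timers and therefore sets nothing. */
void udp_like_fill(struct diag_msg *msg)
{
    (void)msg;
}
```

With the defaults in diag_fill(), a handler that forgets a field (as udp_like_fill() does here) still produces a fully defined `expires` value instead of whatever was left in the buffer.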
      
      [1]
      BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline]
      BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:156 [inline]
      BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x69d/0x25c0 lib/iov_iter.c:670
       instrument_copy_to_user include/linux/instrumented.h:121 [inline]
       copyout lib/iov_iter.c:156 [inline]
       _copy_to_iter+0x69d/0x25c0 lib/iov_iter.c:670
       copy_to_iter include/linux/uio.h:155 [inline]
       simple_copy_to_iter+0xf3/0x140 net/core/datagram.c:519
       __skb_datagram_iter+0x2cb/0x1280 net/core/datagram.c:425
       skb_copy_datagram_iter+0xdc/0x270 net/core/datagram.c:533
       skb_copy_datagram_msg include/linux/skbuff.h:3657 [inline]
       netlink_recvmsg+0x660/0x1c60 net/netlink/af_netlink.c:1974
       sock_recvmsg_nosec net/socket.c:944 [inline]
       sock_recvmsg net/socket.c:962 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1035
       call_read_iter include/linux/fs.h:2156 [inline]
       new_sync_read fs/read_write.c:400 [inline]
       vfs_read+0x1631/0x1980 fs/read_write.c:481
       ksys_read+0x28c/0x520 fs/read_write.c:619
       __do_sys_read fs/read_write.c:629 [inline]
       __se_sys_read fs/read_write.c:627 [inline]
       __x64_sys_read+0xdb/0x120 fs/read_write.c:627
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:524 [inline]
       slab_alloc_node mm/slub.c:3251 [inline]
       __kmalloc_node_track_caller+0xe0c/0x1510 mm/slub.c:4974
       kmalloc_reserve net/core/skbuff.c:354 [inline]
       __alloc_skb+0x545/0xf90 net/core/skbuff.c:426
       alloc_skb include/linux/skbuff.h:1126 [inline]
       netlink_dump+0x3d5/0x16a0 net/netlink/af_netlink.c:2245
       __netlink_dump_start+0xd1c/0xee0 net/netlink/af_netlink.c:2370
       netlink_dump_start include/linux/netlink.h:254 [inline]
       inet_diag_handler_cmd+0x2e7/0x400 net/ipv4/inet_diag.c:1343
       sock_diag_rcv_msg+0x24a/0x620
       netlink_rcv_skb+0x447/0x800 net/netlink/af_netlink.c:2491
       sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:276
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x1095/0x1360 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x16f3/0x1870 net/netlink/af_netlink.c:1916
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       sock_write_iter+0x594/0x690 net/socket.c:1057
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_write+0x52c/0x1500 fs/read_write.c:851
       vfs_writev fs/read_write.c:924 [inline]
       do_writev+0x63f/0xe30 fs/read_write.c:967
       __do_sys_writev fs/read_write.c:1040 [inline]
       __se_sys_writev fs/read_write.c:1037 [inline]
       __x64_sys_writev+0xe5/0x120 fs/read_write.c:1037
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Bytes 68-71 of 312 are uninitialized
      Memory access of size 312 starts at ffff88812ab54000
      Data copied to user address 0000000020001440
      
      CPU: 1 PID: 6365 Comm: syz-executor801 Not tainted 5.16.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 3c4d05c8 ("inet_diag: Introduce the inet socket dumping routine")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211209185058.53917-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • H
      phonet: refcount leak in pep_sock_accep · bcd0f933
      Hangyu Hua committed
      sock_hold(sk) is invoked in pep_sock_accept(), but __sock_put(sk) is not
      invoked in the subsequent failure branch (pep_accept_conn() != 0).
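
The balancing rule described above can be sketched with a plain counter. This is a hypothetical userspace model: `struct obj`, `obj_hold()`, `obj_put()` and `obj_accept()` are invented names, and the `conn_err` parameter simulates the result of pep_accept_conn().

```c
/* Hypothetical reference-counted object standing in for a socket. */
struct obj {
    int refcnt;
};

static void obj_hold(struct obj *o) { o->refcnt++; }
static void obj_put(struct obj *o)  { o->refcnt--; }

/* Simulated accept path. On success the extra reference is kept on
 * behalf of the new connection; on failure it must be dropped again,
 * otherwise the listener can never reach a zero count. */
int obj_accept(struct obj *listener, int conn_err)
{
    obj_hold(listener);
    if (conn_err != 0) {
        obj_put(listener); /* balance the hold on the failure path */
        return conn_err;
    }
    return 0;
}
```

After a failed call the count is back at its starting value; after a successful call it is one higher.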
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Link: https://lore.kernel.org/r/20211209082839.33985-1-hbh25y@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • E
      sch_cake: do not call cake_destroy() from cake_init() · ab443c53
      Eric Dumazet committed
      Qdiscs are not supposed to call their own destroy() method
      from init(), because the core stack already does that.
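
The ownership rule above can be sketched in userspace C. The names `struct sched`, `sched_init()`, `sched_destroy()` and `sched_create()` are hypothetical; `sched_create()` plays the role of the core code that calls destroy when init fails, and the `fail` parameter simulates a configuration error.

```c
#include <stdlib.h>

/* Hypothetical scheduler instance. */
struct sched {
    int *table;
    int destroy_calls; /* counts how often destroy ran, for illustration */
};

/* Release resources; written so that it runs exactly once per failed
 * or finished instance. */
void sched_destroy(struct sched *s)
{
    free(s->table);
    s->table = NULL;
    s->destroy_calls++;
}

/* Initialize the instance. On error this returns WITHOUT cleaning up,
 * because the caller is responsible for calling sched_destroy(). */
int sched_init(struct sched *s, int fail)
{
    s->table = malloc(16 * sizeof(int));
    if (!s->table)
        return -1;
    if (fail)
        return -1; /* no sched_destroy() here: the caller does it */
    return 0;
}

/* Stand-in for the core code: it owns the error path. */
int sched_create(struct sched *s, int fail)
{
    int err;

    s->table = NULL;
    s->destroy_calls = 0;
    err = sched_init(s, fail);
    if (err)
        sched_destroy(s);
    return err;
}
```

If sched_init() also called sched_destroy() on its error path, destroy would run twice for one failed instance, which in the kernel case means tearing down state that has already been released.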
      
      syzbot was able to trigger a use-after-free:
      
      DEBUG_LOCKS_WARN_ON(lock->magic != lock)
      WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock_common kernel/locking/mutex.c:586 [inline]
      WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740
      Modules linked in:
      CPU: 0 PID: 21902 Comm: syz-executor189 Not tainted 5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__mutex_lock_common kernel/locking/mutex.c:586 [inline]
      RIP: 0010:__mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740
      Code: 08 84 d2 0f 85 19 08 00 00 8b 05 97 38 4b 04 85 c0 0f 85 27 f7 ff ff 48 c7 c6 20 00 ac 89 48 c7 c7 a0 fe ab 89 e8 bf 76 ba ff <0f> 0b e9 0d f7 ff ff 48 8b 44 24 40 48 8d b8 c8 08 00 00 48 89 f8
      RSP: 0018:ffffc9000627f290 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff88802315d700 RSI: ffffffff815f1db8 RDI: fffff52000c4fe44
      RBP: ffff88818f28e000 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff815ebb5e R11: 0000000000000000 R12: 0000000000000000
      R13: dffffc0000000000 R14: ffffc9000627f458 R15: 0000000093c30000
      FS:  0000555556abc400(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fda689c3303 CR3: 000000001cfbb000 CR4: 0000000000350ef0
      Call Trace:
       <TASK>
       tcf_chain0_head_change_cb_del+0x2e/0x3d0 net/sched/cls_api.c:810
       tcf_block_put_ext net/sched/cls_api.c:1381 [inline]
       tcf_block_put_ext net/sched/cls_api.c:1376 [inline]
       tcf_block_put+0xbc/0x130 net/sched/cls_api.c:1394
       cake_destroy+0x3f/0x80 net/sched/sch_cake.c:2695
       qdisc_create.constprop.0+0x9da/0x10f0 net/sched/sch_api.c:1293
       tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f1bb06badb9
      Code: Unable to access opcode bytes at RIP 0x7f1bb06bad8f.
      RSP: 002b:00007fff3012a658 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1bb06badb9
      RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000003
      R10: 0000000000000003 R11: 0000000000000246 R12: 00007fff3012a688
      R13: 00007fff3012a6a0 R14: 00007fff3012a6e0 R15: 00000000000013c2
       </TASK>
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Link: https://lore.kernel.org/r/20211210142046.698336-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>