1. 17 12月, 2021 2 次提交
    • E
      sit: do not call ipip6_dev_free() from sit_init_net() · e28587cc
      Eric Dumazet 提交于
      ipip6_dev_free is sit dev->priv_destructor, already called
      by register_netdevice() if something goes wrong.
      
      Alternative would be to make ipip6_dev_free() robust against
      multiple invocations, but other drivers do not implement this
      strategy.
      
      syzbot reported:
      
      dst_release underflow
      WARNING: CPU: 0 PID: 5059 at net/core/dst.c:173 dst_release+0xd8/0xe0 net/core/dst.c:173
      Modules linked in:
      CPU: 1 PID: 5059 Comm: syz-executor.4 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:dst_release+0xd8/0xe0 net/core/dst.c:173
      Code: 4c 89 f2 89 d9 31 c0 5b 41 5e 5d e9 da d5 44 f9 e8 1d 90 5f f9 c6 05 87 48 c6 05 01 48 c7 c7 80 44 99 8b 31 c0 e8 e8 67 29 f9 <0f> 0b eb 85 0f 1f 40 00 53 48 89 fb e8 f7 8f 5f f9 48 83 c3 a8 48
      RSP: 0018:ffffc9000aa5faa0 EFLAGS: 00010246
      RAX: d6894a925dd15a00 RBX: 00000000ffffffff RCX: 0000000000040000
      RDX: ffffc90005e19000 RSI: 000000000003ffff RDI: 0000000000040000
      RBP: 0000000000000000 R08: ffffffff816a1f42 R09: ffffed1017344f2c
      R10: ffffed1017344f2c R11: 0000000000000000 R12: 0000607f462b1358
      R13: 1ffffffff1bfd305 R14: ffffe8ffffcb1358 R15: dffffc0000000000
      FS:  00007f66c71a2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f88aaed5058 CR3: 0000000023e0f000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       dst_cache_destroy+0x107/0x1e0 net/core/dst_cache.c:160
       ipip6_dev_free net/ipv6/sit.c:1414 [inline]
       sit_init_net+0x229/0x550 net/ipv6/sit.c:1936
       ops_init+0x313/0x430 net/core/net_namespace.c:140
       setup_net+0x35b/0x9d0 net/core/net_namespace.c:326
       copy_net_ns+0x359/0x5c0 net/core/net_namespace.c:470
       create_new_namespaces+0x4ce/0xa00 kernel/nsproxy.c:110
       unshare_nsproxy_namespaces+0x11e/0x180 kernel/nsproxy.c:226
       ksys_unshare+0x57d/0xb50 kernel/fork.c:3075
       __do_sys_unshare kernel/fork.c:3146 [inline]
       __se_sys_unshare kernel/fork.c:3144 [inline]
       __x64_sys_unshare+0x34/0x40 kernel/fork.c:3144
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f66c882ce99
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f66c71a2168 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
      RAX: ffffffffffffffda RBX: 00007f66c893ff60 RCX: 00007f66c882ce99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000048040200
      RBP: 00007f66c8886ff1 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fff6634832f R14: 00007f66c71a2300 R15: 0000000000022000
       </TASK>
      
      Fixes: cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211216111741.1387540-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e28587cc
    • D
      net/smc: Prevent smc_release() from long blocking · 5c15b312
      D. Wythe 提交于
      In nginx/wrk benchmark, there's a hung problem with high probability
      on case likes that: (client will last several minutes to exit)
      
      server: smc_run nginx
      
      client: smc_run wrk -c 10000 -t 1 http://server
      
      Client hangs with the following backtrace:
      
      0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f
      1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6
      2 [ffffa7ce8Of3bcaO] schedule_timeout at ffffffff9f9e3f3c
      3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de
      4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3
      5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc]
      6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d
      7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl
      8 [ffffa7ce8øf3be48] __fput at ffffffff9f334f93
      9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5
      10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2
      11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a
      12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994
      13 [ffffa7ce8Of3bf4O] do_syscall_64 at ffffffff9f9d4373
      14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c
      
      This issue dues to flush_work(), which is used to wait for
      smc_connect_work() to finish in smc_release(). Once lots of
      smc_connect_work() was pending or all executing work dangling,
      smc_release() has to block until one worker comes to free, which
      is equivalent to wait another smc_connnect_work() to finish.
      
      In order to fix this, There are two changes:
      
      1. For those idle smc_connect_work(), cancel it from the workqueue; for
         executing smc_connect_work(), waiting for it to finish. For that
         purpose, replace flush_work() with cancel_work_sync().
      
      2. Since smc_connect() hold a reference for passive closing, if
         smc_connect_work() has been cancelled, release the reference.
      
      Fixes: 24ac3a08 ("net/smc: rebuild nonblocking connect")
      Reported-by: NTony Lu <tonylu@linux.alibaba.com>
      Tested-by: NDust Li <dust.li@linux.alibaba.com>
      Reviewed-by: NDust Li <dust.li@linux.alibaba.com>
      Reviewed-by: NTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: ND. Wythe <alibuda@linux.alibaba.com>
      Acked-by: NKarsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      5c15b312
  2. 16 12月, 2021 2 次提交
  3. 15 12月, 2021 3 次提交
    • M
      mptcp: fix deadlock in __mptcp_push_pending() · 3d79e375
      Maxim Galaganov 提交于
      __mptcp_push_pending() may call mptcp_flush_join_list() with subflow
      socket lock held. If such call hits mptcp_sockopt_sync_all() then
      subsequently __mptcp_sockopt_sync() could try to lock the subflow
      socket for itself, causing a deadlock.
      
      sysrq: Show Blocked State
      task:ss-server       state:D stack:    0 pid:  938 ppid:     1 flags:0x00000000
      Call Trace:
       <TASK>
       __schedule+0x2d6/0x10c0
       ? __mod_memcg_state+0x4d/0x70
       ? csum_partial+0xd/0x20
       ? _raw_spin_lock_irqsave+0x26/0x50
       schedule+0x4e/0xc0
       __lock_sock+0x69/0x90
       ? do_wait_intr_irq+0xa0/0xa0
       __lock_sock_fast+0x35/0x50
       mptcp_sockopt_sync_all+0x38/0xc0
       __mptcp_push_pending+0x105/0x200
       mptcp_sendmsg+0x466/0x490
       sock_sendmsg+0x57/0x60
       __sys_sendto+0xf0/0x160
       ? do_wait_intr_irq+0xa0/0xa0
       ? fpregs_restore_userregs+0x12/0xd0
       __x64_sys_sendto+0x20/0x30
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f9ba546c2d0
      RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0
      RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234
      RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060
      R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8
       </TASK>
      
      Fix the issue by using __mptcp_flush_join_list() instead of plain
      mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by
      Florian. The sockopt sync will be deferred to the workqueue.
      
      Fixes: 1b3e7ede ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/244Suggested-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NMaxim Galaganov <max@internet.ru>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      3d79e375
    • F
      mptcp: clear 'kern' flag from fallback sockets · d6692b3b
      Florian Westphal 提交于
      The mptcp ULP extension relies on sk->sk_sock_kern being set correctly:
      It prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from
      working for plain tcp sockets (any userspace-exposed socket).
      
      But in case of fallback, accept() can return a plain tcp sk.
      In such case, sk is still tagged as 'kernel' and setsockopt will work.
      
      This will crash the kernel, The subflow extension has a NULL ctx->conn
      mptcp socket:
      
      BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
      Call Trace:
       tcp_data_ready+0xf8/0x370
       [..]
      
      Fixes: cf7da0d6 ("mptcp: Create SUBFLOW socket for incoming connections")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      d6692b3b
    • F
      mptcp: remove tcp ulp setsockopt support · 404cd9a2
      Florian Westphal 提交于
      TCP_ULP setsockopt cannot be used for mptcp because its already
      used internally to plumb subflow (tcp) sockets to the mptcp layer.
      
      syzbot managed to trigger a crash for mptcp connections that are
      in fallback mode:
      
      KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
      CPU: 1 PID: 1083 Comm: syz-executor.3 Not tainted 5.16.0-rc2-syzkaller #0
      RIP: 0010:tls_build_proto net/tls/tls_main.c:776 [inline]
      [..]
       __tcp_set_ulp net/ipv4/tcp_ulp.c:139 [inline]
       tcp_set_ulp+0x428/0x4c0 net/ipv4/tcp_ulp.c:160
       do_tcp_setsockopt+0x455/0x37c0 net/ipv4/tcp.c:3391
       mptcp_setsockopt+0x1b47/0x2400 net/mptcp/sockopt.c:638
      
      Remove support for TCP_ULP setsockopt.
      
      Fixes: d9e4c129 ("mptcp: only admit explicitly supported sockopt")
      Reported-by: syzbot+1fd9b69cde42967d1add@syzkaller.appspotmail.com
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      404cd9a2
  4. 14 12月, 2021 13 次提交
  5. 13 12月, 2021 1 次提交
    • D
      net/sched: sch_ets: don't remove idle classes from the round-robin list · c062f2a0
      Davide Caratti 提交于
      Shuang reported that the following script:
      
       1) tc qdisc add dev ddd0 handle 10: parent 1: ets bands 8 strict 4 priomap 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
       2) mausezahn ddd0  -A 10.10.10.1 -B 10.10.10.2 -c 0 -a own -b 00:c1:a0:c1:a0:00 -t udp &
       3) tc qdisc change dev ddd0 handle 10: ets bands 4 strict 2 quanta 2500 2500 priomap 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      
      crashes systematically when line 2) is commented:
      
       list_del corruption, ffff8e028404bd30->next is LIST_POISON1 (dead000000000100)
       ------------[ cut here ]------------
       kernel BUG at lib/list_debug.c:47!
       invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
       CPU: 0 PID: 954 Comm: tc Not tainted 5.16.0-rc4+ #478
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x47
       Code: fe ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 08 42 1b 87 e8 1d c5 fe ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 98 42 1b 87 e8 09 c5 fe ff <0f> 0b 48 c7 c7 48 43 1b 87 e8 fb c4 fe ff 0f 0b 48 89 f2 48 89 fe
       RSP: 0018:ffffae46807a3888 EFLAGS: 00010246
       RAX: 000000000000004e RBX: 0000000000000007 RCX: 0000000000000202
       RDX: 0000000000000000 RSI: ffffffff871ac536 RDI: 00000000ffffffff
       RBP: ffffae46807a3a10 R08: 0000000000000000 R09: c0000000ffff7fff
       R10: 0000000000000001 R11: ffffae46807a36a8 R12: ffff8e028404b800
       R13: ffff8e028404bd30 R14: dead000000000100 R15: ffff8e02fafa2400
       FS:  00007efdc92e4480(0000) GS:ffff8e02fb600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000682f48 CR3: 00000001058be000 CR4: 0000000000350ef0
       Call Trace:
        <TASK>
        ets_qdisc_change+0x58b/0xa70 [sch_ets]
        tc_modify_qdisc+0x323/0x880
        rtnetlink_rcv_msg+0x169/0x4a0
        netlink_rcv_skb+0x50/0x100
        netlink_unicast+0x1a5/0x280
        netlink_sendmsg+0x257/0x4d0
        sock_sendmsg+0x5b/0x60
        ____sys_sendmsg+0x1f2/0x260
        ___sys_sendmsg+0x7c/0xc0
        __sys_sendmsg+0x57/0xa0
        do_syscall_64+0x3a/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7efdc8031338
       Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
       RSP: 002b:00007ffdf1ce9828 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000061b37a97 RCX: 00007efdc8031338
       RDX: 0000000000000000 RSI: 00007ffdf1ce9890 RDI: 0000000000000003
       RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000078a940
       R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
       R13: 0000000000688880 R14: 0000000000000000 R15: 0000000000000000
        </TASK>
       Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev pcspkr i2c_i801 virtio_balloon i2c_smbus lpc_ich ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel serio_raw ghash_clmulni_intel ahci libahci libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: sch_ets]
       ---[ end trace f35878d1912655c2 ]---
       RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x47
       Code: fe ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 08 42 1b 87 e8 1d c5 fe ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 98 42 1b 87 e8 09 c5 fe ff <0f> 0b 48 c7 c7 48 43 1b 87 e8 fb c4 fe ff 0f 0b 48 89 f2 48 89 fe
       RSP: 0018:ffffae46807a3888 EFLAGS: 00010246
       RAX: 000000000000004e RBX: 0000000000000007 RCX: 0000000000000202
       RDX: 0000000000000000 RSI: ffffffff871ac536 RDI: 00000000ffffffff
       RBP: ffffae46807a3a10 R08: 0000000000000000 R09: c0000000ffff7fff
       R10: 0000000000000001 R11: ffffae46807a36a8 R12: ffff8e028404b800
       R13: ffff8e028404bd30 R14: dead000000000100 R15: ffff8e02fafa2400
       FS:  00007efdc92e4480(0000) GS:ffff8e02fb600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000682f48 CR3: 00000001058be000 CR4: 0000000000350ef0
       Kernel panic - not syncing: Fatal exception in interrupt
       Kernel Offset: 0x4e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      we can remove 'q->classes[i].alist' only if DRR class 'i' was part of the
      active list. In the ETS scheduler DRR classes belong to that list only if
      the queue length is greater than zero: we need to test for non-zero value
      of 'q->classes[i].qdisc->q.qlen' before removing from the list, similarly
      to what has been done elsewhere in the ETS code.
      
      Fixes: de6d2592 ("net/sched: sch_ets: don't peek at classes beyond 'nbands'")
      Reported-by: NShuang Li <shuali@redhat.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c062f2a0
  6. 11 12月, 2021 3 次提交
    • E
      inet_diag: fix kernel-infoleak for UDP sockets · 71ddeac8
      Eric Dumazet 提交于
      KMSAN reported a kernel-infoleak [1], that can exploited
      by unpriv users.
      
      After analysis it turned out UDP was not initializing
      r->idiag_expires. Other users of inet_sk_diag_fill()
      might make the same mistake in the future, so fix this
      in inet_sk_diag_fill().
      
      [1]
      BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline]
      BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:156 [inline]
      BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x69d/0x25c0 lib/iov_iter.c:670
       instrument_copy_to_user include/linux/instrumented.h:121 [inline]
       copyout lib/iov_iter.c:156 [inline]
       _copy_to_iter+0x69d/0x25c0 lib/iov_iter.c:670
       copy_to_iter include/linux/uio.h:155 [inline]
       simple_copy_to_iter+0xf3/0x140 net/core/datagram.c:519
       __skb_datagram_iter+0x2cb/0x1280 net/core/datagram.c:425
       skb_copy_datagram_iter+0xdc/0x270 net/core/datagram.c:533
       skb_copy_datagram_msg include/linux/skbuff.h:3657 [inline]
       netlink_recvmsg+0x660/0x1c60 net/netlink/af_netlink.c:1974
       sock_recvmsg_nosec net/socket.c:944 [inline]
       sock_recvmsg net/socket.c:962 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1035
       call_read_iter include/linux/fs.h:2156 [inline]
       new_sync_read fs/read_write.c:400 [inline]
       vfs_read+0x1631/0x1980 fs/read_write.c:481
       ksys_read+0x28c/0x520 fs/read_write.c:619
       __do_sys_read fs/read_write.c:629 [inline]
       __se_sys_read fs/read_write.c:627 [inline]
       __x64_sys_read+0xdb/0x120 fs/read_write.c:627
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:524 [inline]
       slab_alloc_node mm/slub.c:3251 [inline]
       __kmalloc_node_track_caller+0xe0c/0x1510 mm/slub.c:4974
       kmalloc_reserve net/core/skbuff.c:354 [inline]
       __alloc_skb+0x545/0xf90 net/core/skbuff.c:426
       alloc_skb include/linux/skbuff.h:1126 [inline]
       netlink_dump+0x3d5/0x16a0 net/netlink/af_netlink.c:2245
       __netlink_dump_start+0xd1c/0xee0 net/netlink/af_netlink.c:2370
       netlink_dump_start include/linux/netlink.h:254 [inline]
       inet_diag_handler_cmd+0x2e7/0x400 net/ipv4/inet_diag.c:1343
       sock_diag_rcv_msg+0x24a/0x620
       netlink_rcv_skb+0x447/0x800 net/netlink/af_netlink.c:2491
       sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:276
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x1095/0x1360 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x16f3/0x1870 net/netlink/af_netlink.c:1916
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       sock_write_iter+0x594/0x690 net/socket.c:1057
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_write+0x52c/0x1500 fs/read_write.c:851
       vfs_writev fs/read_write.c:924 [inline]
       do_writev+0x63f/0xe30 fs/read_write.c:967
       __do_sys_writev fs/read_write.c:1040 [inline]
       __se_sys_writev fs/read_write.c:1037 [inline]
       __x64_sys_writev+0xe5/0x120 fs/read_write.c:1037
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Bytes 68-71 of 312 are uninitialized
      Memory access of size 312 starts at ffff88812ab54000
      Data copied to user address 0000000020001440
      
      CPU: 1 PID: 6365 Comm: syz-executor801 Not tainted 5.16.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 3c4d05c8 ("inet_diag: Introduce the inet socket dumping routine")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211209185058.53917-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      71ddeac8
    • H
      phonet: refcount leak in pep_sock_accep · bcd0f933
      Hangyu Hua 提交于
      sock_hold(sk) is invoked in pep_sock_accept(), but __sock_put(sk) is not
      invoked in subsequent failure branches(pep_accept_conn() != 0).
      Signed-off-by: NHangyu Hua <hbh25y@gmail.com>
      Link: https://lore.kernel.org/r/20211209082839.33985-1-hbh25y@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      bcd0f933
    • E
      sch_cake: do not call cake_destroy() from cake_init() · ab443c53
      Eric Dumazet 提交于
      qdiscs are not supposed to call their own destroy() method
      from init(), because core stack already does that.
      
      syzbot was able to trigger use after free:
      
      DEBUG_LOCKS_WARN_ON(lock->magic != lock)
      WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock_common kernel/locking/mutex.c:586 [inline]
      WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740
      Modules linked in:
      CPU: 0 PID: 21902 Comm: syz-executor189 Not tainted 5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__mutex_lock_common kernel/locking/mutex.c:586 [inline]
      RIP: 0010:__mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740
      Code: 08 84 d2 0f 85 19 08 00 00 8b 05 97 38 4b 04 85 c0 0f 85 27 f7 ff ff 48 c7 c6 20 00 ac 89 48 c7 c7 a0 fe ab 89 e8 bf 76 ba ff <0f> 0b e9 0d f7 ff ff 48 8b 44 24 40 48 8d b8 c8 08 00 00 48 89 f8
      RSP: 0018:ffffc9000627f290 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff88802315d700 RSI: ffffffff815f1db8 RDI: fffff52000c4fe44
      RBP: ffff88818f28e000 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff815ebb5e R11: 0000000000000000 R12: 0000000000000000
      R13: dffffc0000000000 R14: ffffc9000627f458 R15: 0000000093c30000
      FS:  0000555556abc400(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fda689c3303 CR3: 000000001cfbb000 CR4: 0000000000350ef0
      Call Trace:
       <TASK>
       tcf_chain0_head_change_cb_del+0x2e/0x3d0 net/sched/cls_api.c:810
       tcf_block_put_ext net/sched/cls_api.c:1381 [inline]
       tcf_block_put_ext net/sched/cls_api.c:1376 [inline]
       tcf_block_put+0xbc/0x130 net/sched/cls_api.c:1394
       cake_destroy+0x3f/0x80 net/sched/sch_cake.c:2695
       qdisc_create.constprop.0+0x9da/0x10f0 net/sched/sch_api.c:1293
       tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f1bb06badb9
      Code: Unable to access opcode bytes at RIP 0x7f1bb06bad8f.
      RSP: 002b:00007fff3012a658 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1bb06badb9
      RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000003
      R10: 0000000000000003 R11: 0000000000000246 R12: 00007fff3012a688
      R13: 00007fff3012a6a0 R14: 00007fff3012a6e0 R15: 00000000000013c2
       </TASK>
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NToke Høiland-Jørgensen <toke@toke.dk>
      Link: https://lore.kernel.org/r/20211210142046.698336-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      ab443c53
  7. 10 12月, 2021 1 次提交
    • E
      net/sched: fq_pie: prevent dismantle issue · 61c24026
      Eric Dumazet 提交于
      For some reason, fq_pie_destroy() did not copy
      working code from pie_destroy() and other qdiscs,
      thus causing elusive bug.
      
      Before calling del_timer_sync(&q->adapt_timer),
      we need to ensure timer will not rearm itself.
      
      rcu: INFO: rcu_preempt self-detected stall on CPU
      rcu:    0-....: (4416 ticks this GP) idle=60d/1/0x4000000000000000 softirq=10433/10434 fqs=2579
              (t=10501 jiffies g=13085 q=3989)
      NMI backtrace for cpu 0
      CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
       nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
       trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
       rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
       print_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
       check_cpu_stall kernel/rcu/tree_stall.h:711 [inline]
       rcu_pending kernel/rcu/tree.c:3878 [inline]
       rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597
       update_process_times+0x16d/0x200 kernel/time/timer.c:1785
       tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
       tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
       __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
       __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
       hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
       local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
       __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
       sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
       </IRQ>
       <TASK>
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
      RIP: 0010:write_comp_data kernel/kcov.c:221 [inline]
      RIP: 0010:__sanitizer_cov_trace_const_cmp1+0x1d/0x80 kernel/kcov.c:273
      Code: 54 c8 20 48 89 10 c3 66 0f 1f 44 00 00 53 41 89 fb 41 89 f1 bf 03 00 00 00 65 48 8b 0c 25 40 70 02 00 48 89 ce 4c 8b 54 24 08 <e8> 4e f7 ff ff 84 c0 74 51 48 8b 81 88 15 00 00 44 8b 81 84 15 00
      RSP: 0018:ffffc90000d27b28 EFLAGS: 00000246
      RAX: 0000000000000000 RBX: ffff888064bf1bf0 RCX: ffff888011928000
      RDX: ffff888011928000 RSI: ffff888011928000 RDI: 0000000000000003
      RBP: ffff888064bf1c28 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff875d8295 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff8880783dd300 R14: 0000000000000000 R15: 0000000000000000
       pie_calculate_probability+0x405/0x7c0 net/sched/sch_pie.c:418
       fq_pie_timer+0x170/0x2a0 net/sched/sch_fq_pie.c:383
       call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421
       expire_timers kernel/time/timer.c:1466 [inline]
       __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734
       __run_timers kernel/time/timer.c:1715 [inline]
       run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       run_ksoftirqd kernel/softirq.c:921 [inline]
       run_ksoftirqd+0x2d/0x60 kernel/softirq.c:913
       smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164
       kthread+0x405/0x4f0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
       </TASK>
      
      Fixes: ec97ecf1 ("net: sched: add Flow Queue PIE packet scheduler")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
      Cc: Sachin D. Patil <sdp.sachin@gmail.com>
      Cc: V. Saicharan <vsaicharan1998@gmail.com>
      Cc: Mohit Bhasi <mohitbhasi1998@gmail.com>
      Cc: Leslie Monis <lesliemonis@gmail.com>
      Cc: Gautam Ramakrishnan <gautamramk@gmail.com>
      Link: https://lore.kernel.org/r/20211209084937.3500020-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      61c24026
  8. 09 12月, 2021 5 次提交
    • A
      seg6: fix the iif in the IPv6 socket control block · ae68d933
      Andrea Mayer 提交于
      When an IPv4 packet is received, the ip_rcv_core(...) sets the receiving
      interface index into the IPv4 socket control block (v5.16-rc4,
      net/ipv4/ip_input.c line 510):
      
          IPCB(skb)->iif = skb->skb_iif;
      
      If that IPv4 packet is meant to be encapsulated in an outer IPv6+SRH
      header, the seg6_do_srh_encap(...) performs the required encapsulation.
      In this case, the seg6_do_srh_encap function clears the IPv6 socket control
      block (v5.16-rc4 net/ipv6/seg6_iptunnel.c line 163):
      
          memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
      
      The memset(...) was introduced in commit ef489749 ("ipv6: sr: clear
      IP6CB(skb) on SRH ip4ip6 encapsulation") a long time ago (2019-01-29).
      
      Since the IPv6 socket control block and the IPv4 socket control block share
      the same memory area (skb->cb), the receiving interface index info is lost
      (IP6CB(skb)->iif is set to zero).
      
      As a side effect, that condition triggers a NULL pointer dereference if
      commit 0857d6f8 ("ipv6: When forwarding count rx stats on the orig
      netdev") is applied.
      
      To fix that issue, we set the IP6CB(skb)->iif with the index of the
      receiving interface once again.
      
      Fixes: ef489749 ("ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation")
      Signed-off-by: NAndrea Mayer <andrea.mayer@uniroma2.it>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20211208195409.12169-1-andrea.mayer@uniroma2.itSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      ae68d933
    • K
      nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done · 4cd8371a
      Krzysztof Kozlowski 提交于
      The done() netlink callback nfc_genl_dump_ses_done() should check if
      received argument is non-NULL, because its allocation could fail earlier
      in dumpit() (nfc_genl_dump_ses()).
      
      Fixes: ac22ac46 ("NFC: Add a GET_SE netlink API")
      Signed-off-by: NKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Link: https://lore.kernel.org/r/20211209081307.57337-1-krzysztof.kozlowski@canonical.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      4cd8371a
    • T
      nfc: fix segfault in nfc_genl_dump_devices_done · fd79a0cb
      Tadeusz Struk 提交于
      When kmalloc in nfc_genl_dump_devices() fails then
      nfc_genl_dump_devices_done() segfaults as below
      
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      CPU: 0 PID: 25 Comm: kworker/0:1 Not tainted 5.16.0-rc4-01180-g2a987e65-dirty #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-6.fc35 04/01/2014
      Workqueue: events netlink_sock_destruct_work
      RIP: 0010:klist_iter_exit+0x26/0x80
      Call Trace:
      <TASK>
      class_dev_iter_exit+0x15/0x20
      nfc_genl_dump_devices_done+0x3b/0x50
      genl_lock_done+0x84/0xd0
      netlink_sock_destruct+0x8f/0x270
      __sk_destruct+0x64/0x3b0
      sk_destruct+0xa8/0xd0
      __sk_free+0x2e8/0x3d0
      sk_free+0x51/0x90
      netlink_sock_destruct_work+0x1c/0x20
      process_one_work+0x411/0x710
      worker_thread+0x6fd/0xa80
      
      Link: https://syzkaller.appspot.com/bug?id=fc0fa5a53db9edd261d56e74325419faf18bd0df
      Reported-by: syzbot+f9f76f4a0766420b4a02@syzkaller.appspotmail.com
      Signed-off-by: NTadeusz Struk <tadeusz.struk@linaro.org>
      Reviewed-by: NKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Link: https://lore.kernel.org/r/20211208182742.340542-1-tadeusz.struk@linaro.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      fd79a0cb
    • J
      udp: using datalen to cap max gso segments · 158390e4
      Jianguo Wu 提交于
      The max number of UDP gso segments is intended to cap to UDP_MAX_SEGMENTS,
      this is checked in udp_send_skb():
      
          if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS) {
              kfree_skb(skb);
              return -EINVAL;
          }
      
      skb->len contains network and transport header len here, we should use
      only data len instead.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: NJianguo Wu <wujianguo@chinatelecom.cn>
      Reviewed-by: NWillem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/900742e5-81fb-30dc-6e0b-375c6cdd7982@163.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      158390e4
    • E
      net, neigh: clear whole pneigh_entry at alloc time · e195e9b5
      Eric Dumazet 提交于
      Commit 2c611ad9 ("net, neigh: Extend neigh->flags to 32 bit
      to allow for extensions") enables a new KMSAM warning [1]
      
      I think the bug is actually older, because the following intruction
      only occurred if ndm->ndm_flags had NTF_PROXY set.
      
      	pn->flags = ndm->ndm_flags;
      
      Let's clear all pneigh_entry fields at alloc time.
      
      [1]
      BUG: KMSAN: uninit-value in pneigh_fill_info+0x986/0xb30 net/core/neighbour.c:2593
       pneigh_fill_info+0x986/0xb30 net/core/neighbour.c:2593
       pneigh_dump_table net/core/neighbour.c:2715 [inline]
       neigh_dump_info+0x1e3f/0x2c60 net/core/neighbour.c:2832
       netlink_dump+0xaca/0x16a0 net/netlink/af_netlink.c:2265
       __netlink_dump_start+0xd1c/0xee0 net/netlink/af_netlink.c:2370
       netlink_dump_start include/linux/netlink.h:254 [inline]
       rtnetlink_rcv_msg+0x181b/0x18c0 net/core/rtnetlink.c:5534
       netlink_rcv_skb+0x447/0x800 net/netlink/af_netlink.c:2491
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5589
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x1095/0x1360 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x16f3/0x1870 net/netlink/af_netlink.c:1916
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       sock_write_iter+0x594/0x690 net/socket.c:1057
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0x1318/0x2030 fs/read_write.c:590
       ksys_write+0x28c/0x520 fs/read_write.c:643
       __do_sys_write fs/read_write.c:655 [inline]
       __se_sys_write fs/read_write.c:652 [inline]
       __x64_sys_write+0xdb/0x120 fs/read_write.c:652
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:524 [inline]
       slab_alloc_node mm/slub.c:3251 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       __kmalloc+0xc3c/0x12d0 mm/slub.c:4437
       kmalloc include/linux/slab.h:595 [inline]
       pneigh_lookup+0x60f/0xd70 net/core/neighbour.c:766
       arp_req_set_public net/ipv4/arp.c:1016 [inline]
       arp_req_set+0x430/0x10a0 net/ipv4/arp.c:1032
       arp_ioctl+0x8d4/0xb60 net/ipv4/arp.c:1232
       inet_ioctl+0x4ef/0x820 net/ipv4/af_inet.c:947
       sock_do_ioctl net/socket.c:1118 [inline]
       sock_ioctl+0xa3f/0x13e0 net/socket.c:1235
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:874 [inline]
       __se_sys_ioctl+0x2df/0x4a0 fs/ioctl.c:860
       __x64_sys_ioctl+0xd8/0x110 fs/ioctl.c:860
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      CPU: 1 PID: 20001 Comm: syz-executor.0 Not tainted 5.16.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 62dd9318 ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20211206165329.1049835-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e195e9b5
  9. 08 12月, 2021 3 次提交
    • E
      netfilter: conntrack: annotate data-races around ct->timeout · 802a7dc5
      Eric Dumazet 提交于
      (struct nf_conn)->timeout can be read/written locklessly,
      add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing.
      
      BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get
      
      write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0:
       __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563
       init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635
       resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746
       nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
       ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
       nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
       nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
       nf_hook include/linux/netfilter.h:262 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
       inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
       tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
       tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
       __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
       tcp_push_pending_frames include/net/tcp.h:1897 [inline]
       tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452
       tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947
       tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0xf2/0x270 net/core/sock.c:2768
       release_sock+0x40/0x110 net/core/sock.c:3300
       sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145
       tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402
       tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440
       inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       __sys_sendto+0x21e/0x2c0 net/socket.c:2036
       __do_sys_sendto net/socket.c:2048 [inline]
       __se_sys_sendto net/socket.c:2044 [inline]
       __x64_sys_sendto+0x74/0x90 net/socket.c:2044
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1:
       nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline]
       ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline]
       __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807
       resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734
       nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
       ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
       nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
       nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
       nf_hook include/linux/netfilter.h:262 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
       inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
       __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956
       tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962
       __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478
       tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline]
       tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948
       tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0xf2/0x270 net/core/sock.c:2768
       release_sock+0x40/0x110 net/core/sock.c:3300
       tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114
       inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
       rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118
       rds_send_xmit+0xbed/0x1500 net/rds/send.c:367
       rds_send_worker+0x43/0x200 net/rds/threads.c:200
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       worker_thread+0x616/0xa70 kernel/workqueue.c:2445
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00027cc2 -> 0x00000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G        W         5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: krdsd rds_send_worker
      
      Note: I chose an arbitrary commit for the Fixes: tag,
      because I do not think we need to backport this fix to very old kernels.
      
      Fixes: e37542ba ("netfilter: conntrack: avoid possible false sharing")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      802a7dc5
    • P
      netfilter: nft_exthdr: break evaluation if setting TCP option fails · 962e5a40
      Pablo Neira Ayuso 提交于
      Break rule evaluation on malformed TCP options.
      
      Fixes: 99d1712b ("netfilter: exthdr: tcp option set support")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      962e5a40
    • S
      nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groups · b7e945e2
      Stefano Brivio 提交于
      The sixth byte of packet data has to be looked up in the sixth group,
      not in the seventh one, even if we load the bucket data into ymm6
      (and not ymm5, for convenience of tracking stalls).
      
      Without this fix, matching on a MAC address as first field of a set,
      if 8-bit groups are selected (due to a small set size) would fail,
      that is, the given MAC address would never match.
      Reported-by: NNikita Yushchenko <nikita.yushchenko@virtuozzo.com>
      Cc: <stable@vger.kernel.org> # 5.6.x
      Fixes: 7400b063 ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Tested-By: NNikita Yushchenko <nikita.yushchenko@virtuozzo.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b7e945e2
  10. 07 12月, 2021 2 次提交
  11. 03 12月, 2021 2 次提交
    • E
      tcp: fix another uninit-value (sk_rx_queue_mapping) · 03cfda4f
      Eric Dumazet 提交于
      KMSAN is still not happy [1].
      
      I missed that passive connections do not inherit their
      sk_rx_queue_mapping values from the request socket,
      but instead tcp_child_process() is calling
      sk_mark_napi_id(child, skb)
      
      We have many sk_mark_napi_id() callers, so I am providing
      a new helper, forcing the setting sk_rx_queue_mapping
      and sk_napi_id.
      
      Note that we had no KMSAN report for sk_napi_id because
      passive connections got a copy of this field from the listener.
      sk_rx_queue_mapping in the other hand is inside the
      sk_dontcopy_begin/sk_dontcopy_end so sk_clone_lock()
      leaves this field uninitialized.
      
      We might remove dead code populating req->sk_rx_queue_mapping
      in the future.
      
      [1]
      
      BUG: KMSAN: uninit-value in __sk_rx_queue_set include/net/sock.h:1924 [inline]
      BUG: KMSAN: uninit-value in sk_rx_queue_update include/net/sock.h:1938 [inline]
      BUG: KMSAN: uninit-value in sk_mark_napi_id include/net/busy_poll.h:136 [inline]
      BUG: KMSAN: uninit-value in tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       __sk_rx_queue_set include/net/sock.h:1924 [inline]
       sk_rx_queue_update include/net/sock.h:1938 [inline]
       sk_mark_napi_id include/net/busy_poll.h:136 [inline]
       tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       tcp_v4_rcv+0x3d83/0x4ed0 net/ipv4/tcp_ipv4.c:2066
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
       run_ksoftirqd+0x33/0x50 kernel/softirq.c:920
       smpboot_thread_fn+0x616/0xbf0 kernel/smpboot.c:164
       kthread+0x721/0x850 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      Uninit was created at:
       __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
       alloc_pages+0x8a5/0xb80
       alloc_slab_page mm/slub.c:1810 [inline]
       allocate_slab+0x287/0x1c20 mm/slub.c:1947
       new_slab mm/slub.c:2010 [inline]
       ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
       __slab_alloc mm/slub.c:3126 [inline]
       slab_alloc_node mm/slub.c:3217 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
       sk_prot_alloc+0xeb/0x570 net/core/sock.c:1914
       sk_clone_lock+0xd6/0x1940 net/core/sock.c:2118
       inet_csk_clone_lock+0x8d/0x6a0 net/ipv4/inet_connection_sock.c:956
       tcp_create_openreq_child+0xb1/0x1ef0 net/ipv4/tcp_minisocks.c:453
       tcp_v4_syn_recv_sock+0x268/0x2710 net/ipv4/tcp_ipv4.c:1563
       tcp_check_req+0x207c/0x2a30 net/ipv4/tcp_minisocks.c:765
       tcp_v4_rcv+0x36f5/0x4ed0 net/ipv4/tcp_ipv4.c:2047
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
      
      Fixes: 342159ee ("net: avoid dirtying sk->sk_rx_queue_mapping")
      Fixes: a37a0ee4 ("net: avoid uninit-value from tcp_conn_request")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Tested-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03cfda4f
    • E
      inet: use #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING consistently · a9418924
      Eric Dumazet 提交于
      Since commit 4e1beecc ("net/sock: Add kernel config
      SOCK_RX_QUEUE_MAPPING"),
      sk_rx_queue_mapping access is guarded by CONFIG_SOCK_RX_QUEUE_MAPPING.
      
      Fixes: 54b92e84 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Tariq Toukan <tariqt@nvidia.com>
      Acked-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9418924
  12. 02 12月, 2021 3 次提交