1. 25 Feb, 2022 2 commits
    • L
      Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks · 29fb6083
      Luiz Augusto von Dentz authored
      Since bt_skb_sendmmsg can be used with the likes of SOCK_STREAM it
      shall return the partial chunks it could allocate instead of freeing
      everything, as otherwise it can cause problems such as the one reported
      in the link below.
      
      Fixes: 81be03e0 ("Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg")
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Link: https://lore.kernel.org/r/d7206e12-1b99-c3be-84f4-df22af427ef5@molgen.mpg.de
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215594
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> (Nokia N9 (MeeGo/Harmattan))
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      29fb6083
    • P
      openvswitch: Fix setting ipv6 fields causing hw csum failure · d9b5ae5c
      Paul Blakey authored
      Ipv6 ttl, label and tos fields are modified without first
      pulling/pushing the ipv6 header, which would have updated
      the hw csum (if available). This might cause a csum validation
      failure when sending the packet to the stack, as can be seen in
      the trace below.
      
      Fix this by updating skb->csum if available.
      
      Trace resulting from an ipv6 ttl decrement followed by sending the packet
      to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
      [295241.900063] s_pf0vf2: hw csum failure
      [295241.923191] Call Trace:
      [295241.925728]  <IRQ>
      [295241.927836]  dump_stack+0x5c/0x80
      [295241.931240]  __skb_checksum_complete+0xac/0xc0
      [295241.935778]  nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
      [295241.953030]  nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
      [295241.958344]  __ovs_ct_lookup+0xac/0x860 [openvswitch]
      [295241.968532]  ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
      [295241.979167]  do_execute_actions+0x54a/0xaa0 [openvswitch]
      [295242.001482]  ovs_execute_actions+0x48/0x100 [openvswitch]
      [295242.006966]  ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
      [295242.012626]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
      [295242.028763]  netdev_frame_hook+0xc0/0x180 [openvswitch]
      [295242.034074]  __netif_receive_skb_core+0x2ca/0xcb0
      [295242.047498]  netif_receive_skb_internal+0x3e/0xc0
      [295242.052291]  napi_gro_receive+0xba/0xe0
      [295242.056231]  mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
      [295242.062513]  mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
      [295242.067669]  mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
      [295242.077958]  net_rx_action+0x149/0x3b0
      [295242.086762]  __do_softirq+0xd7/0x2d6
      [295242.090427]  irq_exit+0xf7/0x100
      [295242.093748]  do_IRQ+0x7f/0xd0
      [295242.096806]  common_interrupt+0xf/0xf
      [295242.100559]  </IRQ>
      [295242.102750] RIP: 0033:0x7f9022e88cbd
      [295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
      [295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000
      [295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30
      [295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340
      [295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000
      [295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40
      
      Fixes: 3fdbd1ce ("openvswitch: add ipv6 'set' action")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d9b5ae5c
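The idea behind "updating skb->csum if available" can be illustrated with an incremental ones'-complement update: subtract the old header word from the running sum and add the new one, instead of recomputing the whole sum. The following is a minimal user-space sketch under that assumption. The names `csum_add32`, `csum_sub32`, `csum_words` and `csum_replace_word` are hypothetical and are not the kernel helpers used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement addition of two 32-bit partial sums (end-around carry). */
static inline uint32_t csum_add32(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Ones'-complement subtraction: add the bitwise complement. */
static inline uint32_t csum_sub32(uint32_t csum, uint32_t addend)
{
	return csum_add32(csum, ~addend);
}

/* Full (slow) sum over an array of 16-bit words. */
static uint32_t csum_words(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	assert(w != NULL);
	for (size_t i = 0; i < n; i++)
		sum = csum_add32(sum, w[i]);
	return sum;
}

/* Incremental update: remove the old word, then add the new one. */
static uint32_t csum_replace_word(uint32_t csum, uint16_t old_w, uint16_t new_w)
{
	return csum_add32(csum_sub32(csum, old_w), new_w);
}
```

For example, changing a word that carries the hop limit from 0x0640 to 0x063f and calling `csum_replace_word(sum, 0x0640, 0x063f)` yields the same value as summing the modified words again.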
  2. 20 Feb, 2022 2 commits
    • P
      netfilter: nf_tables_offload: incorrect flow offload action array size · b1a5983f
      Pablo Neira Ayuso authored
      The immediate verdict expression needs to allocate one slot in the flow
      offload action array; the immediate data expression, however, does not.
      
      The fwd and dup expressions also need to allocate one slot, but this is missing.
      
      Add a new offload_action interface to report if this expression needs to
      allocate one slot in the flow offload action array.
      
      Fixes: be2861dc ("netfilter: nft_{fwd,dup}_netdev: add offload support")
      Reported-and-tested-by: Nick Gregory <Nick.Gregory@Sophos.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      b1a5983f
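The offload_action interface described in this commit lets each expression report whether it needs a slot in the action array, so the array can be sized by asking every expression in a rule. Below is a toy user-space model of that counting step. All `toy_*` types and functions are hypothetical illustrations and do not reproduce the nf_tables structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_expr;

/* Per-expression-type operations; offload_action may be NULL. */
struct toy_expr_ops {
	bool (*offload_action)(const struct toy_expr *expr);
};

struct toy_expr {
	const struct toy_expr_ops *ops;
	bool is_verdict;	/* only meaningful for the immediate expression */
};

/* immediate: only a verdict needs a slot, plain data does not. */
static bool toy_immediate_offload_action(const struct toy_expr *expr)
{
	return expr->is_verdict;
}

/* fwd/dup: always need one slot. */
static bool toy_fwd_offload_action(const struct toy_expr *expr)
{
	(void)expr;
	return true;
}

static const struct toy_expr_ops toy_immediate_ops = {
	.offload_action = toy_immediate_offload_action,
};

static const struct toy_expr_ops toy_fwd_ops = {
	.offload_action = toy_fwd_offload_action,
};

/* Count how many action slots a rule needs before allocating the array. */
static size_t toy_count_action_slots(const struct toy_expr *exprs, size_t n)
{
	size_t slots = 0;

	assert(exprs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct toy_expr_ops *ops = exprs[i].ops;

		if (ops->offload_action && ops->offload_action(&exprs[i]))
			slots++;
	}
	return slots;
}
```

In this model a rule made of an immediate data expression, a fwd expression and an immediate verdict needs two slots.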
    • C
      net: Force inlining of checksum functions in net/checksum.h · 5486f5bf
      Christophe Leroy authored
      All functions defined as static inline in net/checksum.h are
      meant to be inlined for performance reasons.
      
      But since commit ac7c3e4f ("compiler: enable
      CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to
      uninline functions when it wants.
      
      Fair enough in the general case, but for tiny performance critical
      checksum helpers that's counter-productive.
      
      The problem mainly arises when selecting CONFIG_CC_OPTIMIZE_FOR_SIZE.
      Because those helpers are 'static inline' in header files, you suddenly
      find them duplicated many times in the resulting vmlinux.
      
      Here is a typical example when building powerpc pmac32_defconfig
      with CONFIG_CC_OPTIMIZE_FOR_SIZE. csum_sub() appears 4 times:
      
      	c04a23cc <csum_sub>:
      	c04a23cc:	7c 84 20 f8 	not     r4,r4
      	c04a23d0:	7c 63 20 14 	addc    r3,r3,r4
      	c04a23d4:	7c 63 01 94 	addze   r3,r3
      	c04a23d8:	4e 80 00 20 	blr
      		...
      	c04a2ce8:	4b ff f6 e5 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d2c:	4b ff f6 a1 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d54:	4b ff f6 79 	bl      c04a23cc <csum_sub>
      		...
      	c04a754c <csum_sub>:
      	c04a754c:	7c 84 20 f8 	not     r4,r4
      	c04a7550:	7c 63 20 14 	addc    r3,r3,r4
      	c04a7554:	7c 63 01 94 	addze   r3,r3
      	c04a7558:	4e 80 00 20 	blr
      		...
      	c04ac930:	4b ff ac 1d 	bl      c04a754c <csum_sub>
      		...
      	c04ad264:	4b ff a2 e9 	bl      c04a754c <csum_sub>
      		...
      	c04e3b08 <csum_sub>:
      	c04e3b08:	7c 84 20 f8 	not     r4,r4
      	c04e3b0c:	7c 63 20 14 	addc    r3,r3,r4
      	c04e3b10:	7c 63 01 94 	addze   r3,r3
      	c04e3b14:	4e 80 00 20 	blr
      		...
      	c04e5788:	4b ff e3 81 	bl      c04e3b08 <csum_sub>
      		...
      	c04e65c8:	4b ff d5 41 	bl      c04e3b08 <csum_sub>
      		...
      	c0512d34 <csum_sub>:
      	c0512d34:	7c 84 20 f8 	not     r4,r4
      	c0512d38:	7c 63 20 14 	addc    r3,r3,r4
      	c0512d3c:	7c 63 01 94 	addze   r3,r3
      	c0512d40:	4e 80 00 20 	blr
      		...
      	c0512dfc:	4b ff ff 39 	bl      c0512d34 <csum_sub>
      		...
      	c05138bc:	4b ff f4 79 	bl      c0512d34 <csum_sub>
      		...
      
      Restore the expected behaviour by using __always_inline for all
      functions defined in net/checksum.h
      
      vmlinux size is even reduced by 256 bytes with this patch:
      
      	   text	   data	    bss	    dec	    hex	filename
      	6980022	2515362	 194384	9689768	 93daa8	vmlinux.before
      	6979862	2515266	 194384	9689512	 93d9a8	vmlinux.now
      
      Fixes: ac7c3e4f ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5486f5bf
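The three-instruction body in the disassembly above (not, addc, addze) is a ones'-complement subtraction, which is small enough that a call costs more than the work itself. A user-space sketch of forcing such helpers inline follows. It assumes a GCC or Clang compiler; `ALWAYS_INLINE` is a local stand-in for the kernel's `__always_inline`, and the function bodies are a simplified model rather than a copy of net/checksum.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC/Clang attribute syntax. A plain 'inline' is only a hint. */
#define ALWAYS_INLINE inline __attribute__((__always_inline__))

/* Ones'-complement add: fold the carry back into the low bit. */
static ALWAYS_INLINE uint32_t csum_add(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Ones'-complement subtract: complement (not), add with carry (addc, addze). */
static ALWAYS_INLINE uint32_t csum_sub(uint32_t csum, uint32_t addend)
{
	return csum_add(csum, ~addend);
}
```

With the attribute in place the compiler may not emit an out-of-line copy per translation unit, even when optimizing for size.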
  3. 18 Feb, 2022 2 commits
    • E
      net-timestamp: convert sk->sk_tskey to atomic_t · a1cdec57
      Eric Dumazet authored
      UDP sendmsg() can be lockless, which causes all kinds
      of data races.
      
      This patch converts sk->sk_tskey to remove one of these races.
      
      BUG: KCSAN: data-race in __ip_append_data / __ip_append_data
      
      read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
       __ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
       __ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000054d -> 0x0000054e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 09c2d251 ("net-timestamp: add key to disambiguate concurrent datagrams")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1cdec57
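The KCSAN report above shows two senders reading and writing the same counter at the same offset. Converting the counter to an atomic type makes "take the current key and advance it" a single indivisible step. A minimal sketch with C11 atomics follows. `struct toy_sock` and `toy_next_tskey` are hypothetical names, and the kernel uses its own `atomic_t` API rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the socket field that holds the timestamp key. */
struct toy_sock {
	atomic_uint tskey;
};

/*
 * Return the key for the next datagram and advance the counter in one
 * atomic read-modify-write, so concurrent callers never get the same key.
 */
static uint32_t toy_next_tskey(struct toy_sock *sk)
{
	assert(sk != NULL);
	return atomic_fetch_add_explicit(&sk->tskey, 1, memory_order_relaxed);
}
```

Relaxed ordering is used here on the assumption that only the uniqueness of the key matters, not its ordering against other memory accesses.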
    • E
      ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt · d95d6320
      Eric Dumazet authored
      Because fib6_info_hw_flags_set() is called without any synchronization,
      all accesses to f6i->offload, f6i->trap and f6i->offload_failed
      need some basic protection like READ_ONCE()/WRITE_ONCE().
      
      BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt
      
      read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
       fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
       fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
       fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
       __ip6_del_rt net/ipv6/route.c:3876 [inline]
       ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
       __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
       ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
       __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
       ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
       inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
       __sock_release net/socket.c:650 [inline]
       sock_close+0x6c/0x150 net/socket.c:1318
       __fput+0x295/0x520 fs/file_table.c:280
       ____fput+0x11/0x20 fs/file_table.c:313
       task_work_run+0x8e/0x110 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
       fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
       nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
       nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
       nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
       nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
       nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x22 -> 0x2a
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 0c5fcf9e ("IPv6: Add "offload failed" indication to routes")
      Fixes: bb3c4ab9 ("ipv6: Add "offload" and "trap" indications to routes")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Amit Cohen <amcohen@nvidia.com>
      Cc: Ido Schimmel <idosch@nvidia.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d95d6320
  4. 17 Feb, 2022 1 commit
  5. 15 Feb, 2022 1 commit
    • E
      bonding: fix data-races around agg_select_timer · 9ceaf6f7
      Eric Dumazet authored
      syzbot reported that two threads might write over agg_select_timer
      at the same time. Make agg_select_timer atomic to fix the races.
      
      BUG: KCSAN: data-race in bond_3ad_initiate_agg_selection / bond_3ad_state_machine_handler
      
      read to 0xffff8881242aea90 of 4 bytes by task 1846 on cpu 1:
       bond_3ad_state_machine_handler+0x99/0x2810 drivers/net/bonding/bond_3ad.c:2317
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      write to 0xffff8881242aea90 of 4 bytes by task 25910 on cpu 0:
       bond_3ad_initiate_agg_selection+0x18/0x30 drivers/net/bonding/bond_3ad.c:1998
       bond_open+0x658/0x6f0 drivers/net/bonding/bond_main.c:3967
       __dev_open+0x274/0x3a0 net/core/dev.c:1407
       dev_open+0x54/0x190 net/core/dev.c:1443
       bond_enslave+0xcef/0x3000 drivers/net/bonding/bond_main.c:1937
       do_set_master net/core/rtnetlink.c:2532 [inline]
       do_setlink+0x94f/0x2500 net/core/rtnetlink.c:2736
       __rtnl_newlink net/core/rtnetlink.c:3414 [inline]
       rtnl_newlink+0xfeb/0x13e0 net/core/rtnetlink.c:3529
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000050 -> 0x0000004f
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 25910 Comm: syz-executor.1 Tainted: G        W         5.17.0-rc4-syzkaller-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9ceaf6f7
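The race in this report is between one thread that re-arms the timer and another that counts it down. With an atomic timer, the countdown can be written as a compare-and-swap loop that never loses a concurrent re-arm. The sketch below uses C11 atomics in user space. `toy_timer_advance` is a hypothetical helper and is not presented as the code of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decrement the timer if it is running. Returns true only for the tick
 * that brings it to zero (the timer has just expired).
 */
static bool toy_timer_advance(atomic_int *timer)
{
	int val, nval;

	assert(timer != NULL);
	val = atomic_load(timer);
	do {
		if (val == 0)
			return false;	/* timer not running */
		nval = val - 1;
		/* On failure, val is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(timer, &val, nval));

	return nval == 0;
}
```

If another thread stores a new value between the load and the exchange, the exchange fails and the loop retries against the fresh value instead of overwriting it.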
  6. 14 Feb, 2022 2 commits
    • V
      net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN · a2614140
      Vladimir Oltean authored
      mv88e6xxx is special among DSA drivers in that it requires the VTU to
      contain the VID of the FDB entry it modifies in
      mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP.
      
      Due to races, this requirement is not always satisfied, even if external
      code does everything right (first deletes the FDB entries, then the
      VLAN), because DSA has committed FDB entries to hardware asynchronously since
      commit c9eb3e0f ("net: dsa: Add support for learning FDB through
      notification").
      
      Therefore, the mv88e6xxx driver must close this race condition by
      itself, by asking DSA to flush the switchdev workqueue of any FDB
      deletions in progress, prior to exiting a VLAN.
      
      Fixes: c9eb3e0f ("net: dsa: Add support for learning FDB through notification")
      Reported-by: Rafael Richter <rafael.richter@gin.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a2614140
    • I
      ipv6: mcast: use rcu-safe version of ipv6_get_lladdr() · 26394fc1
      Ignat Korchagin authored
      Some time ago 8965779d ("ipv6,mcast: always hold idev->lock before mca_lock")
      switched ipv6_get_lladdr() to __ipv6_get_lladdr(), which is the rcu-unsafe
      version. That was OK, because idev->lock was held for these codepaths.
      
      In 88e2ca30 ("mld: convert ifmcaddr6 to RCU") these external locks were
      removed, so we probably need to restore the original rcu-safe call.
      
      Otherwise, a machine occasionally crashes or stalls with the following
      in dmesg:
      
      [ 3405.966610][T230589] general protection fault, probably for non-canonical address 0xdead00000000008c: 0000 [#1] SMP NOPTI
      [ 3405.982083][T230589] CPU: 44 PID: 230589 Comm: kworker/44:3 Tainted: G           O      5.15.19-cloudflare-2022.2.1 #1
      [ 3405.998061][T230589] Hardware name: SUPA-COOL-SERV
      [ 3406.009552][T230589] Workqueue: mld mld_ifc_work
      [ 3406.017224][T230589] RIP: 0010:__ipv6_get_lladdr+0x34/0x60
      [ 3406.025780][T230589] Code: 57 10 48 83 c7 08 48 89 e5 48 39 d7 74 3e 48 8d 82 38 ff ff ff eb 13 48 8b 90 d0 00 00 00 48 8d 82 38 ff ff ff 48 39 d7 74 22 <66> 83 78 32 20 77 1b 75 e4 89 ca 23 50 2c 75 dd 48 8b 50 08 48 8b
      [ 3406.055748][T230589] RSP: 0018:ffff94e4b3fc3d10 EFLAGS: 00010202
      [ 3406.065617][T230589] RAX: dead00000000005a RBX: ffff94e4b3fc3d30 RCX: 0000000000000040
      [ 3406.077477][T230589] RDX: dead000000000122 RSI: ffff94e4b3fc3d30 RDI: ffff8c3a31431008
      [ 3406.089389][T230589] RBP: ffff94e4b3fc3d10 R08: 0000000000000000 R09: 0000000000000000
      [ 3406.101445][T230589] R10: ffff8c3a31430000 R11: 000000000000000b R12: ffff8c2c37887100
      [ 3406.113553][T230589] R13: ffff8c3a39537000 R14: 00000000000005dc R15: ffff8c3a31431000
      [ 3406.125730][T230589] FS:  0000000000000000(0000) GS:ffff8c3b9fc80000(0000) knlGS:0000000000000000
      [ 3406.138992][T230589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3406.149895][T230589] CR2: 00007f0dfea1db60 CR3: 000000387b5f2000 CR4: 0000000000350ee0
      [ 3406.162421][T230589] Call Trace:
      [ 3406.170235][T230589]  <TASK>
      [ 3406.177736][T230589]  mld_newpack+0xfe/0x1a0
      [ 3406.186686][T230589]  add_grhead+0x87/0xa0
      [ 3406.195498][T230589]  add_grec+0x485/0x4e0
      [ 3406.204310][T230589]  ? newidle_balance+0x126/0x3f0
      [ 3406.214024][T230589]  mld_ifc_work+0x15d/0x450
      [ 3406.223279][T230589]  process_one_work+0x1e6/0x380
      [ 3406.232982][T230589]  worker_thread+0x50/0x3a0
      [ 3406.242371][T230589]  ? rescuer_thread+0x360/0x360
      [ 3406.252175][T230589]  kthread+0x127/0x150
      [ 3406.261197][T230589]  ? set_kthread_struct+0x40/0x40
      [ 3406.271287][T230589]  ret_from_fork+0x22/0x30
      [ 3406.280812][T230589]  </TASK>
      [ 3406.288937][T230589] Modules linked in: ... [last unloaded: kheaders]
      [ 3406.476714][T230589] ---[ end trace 3525a7655f2f3b9e ]---
      
      Fixes: 88e2ca30 ("mld: convert ifmcaddr6 to RCU")
      Reported-by: David Pinilla Caparros <dpini@cloudflare.com>
      Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26394fc1
  7. 09 Feb, 2022 2 commits
  8. 04 Feb, 2022 1 commit
  9. 03 Feb, 2022 1 commit
    • D
      net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work · 4a81f6da
      Daniel Borkmann authored
      syzkaller was able to trigger a deadlock for NTF_MANAGED entries [0]:
      
        kworker/0:16/14617 is trying to acquire lock:
        ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: ___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652
        [...]
        but task is already holding lock:
        ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: neigh_managed_work+0x35/0x250 net/core/neighbour.c:1572
      
      The neighbor entry turned to NUD_FAILED state, where __neigh_event_send()
      triggered an immediate probe as per commit cd28ca0a ("neigh: reduce
      arp latency") via neigh_probe() while the table lock was held.
      
      One option to fix this situation is to defer the neigh_probe() back to
      the neigh_timer_handler(), as was done before cd28ca0a. For the case
      of NTF_MANAGED, this deferral is acceptable given that it only happens in
      the actual failure state, while the regular / expected state is NUD_VALID
      with the entry already present.
      
      The fix adds a parameter to __neigh_event_send() in order to communicate
      whether immediate probe is allowed or disallowed. Existing call-sites
      of neigh_event_send() default as-is to immediate probe. However, the
      neigh_managed_work() disables it via use of neigh_event_send_probe().
      
      [0] <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
        print_deadlock_bug kernel/locking/lockdep.c:2956 [inline]
        check_deadlock kernel/locking/lockdep.c:2999 [inline]
        validate_chain kernel/locking/lockdep.c:3788 [inline]
        __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5027
        lock_acquire kernel/locking/lockdep.c:5639 [inline]
        lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
        __raw_write_lock_bh include/linux/rwlock_api_smp.h:202 [inline]
        _raw_write_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:334
        ___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652
        ip6_finish_output2+0x1070/0x14f0 net/ipv6/ip6_output.c:123
        __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
        __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170
        ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
        NF_HOOK_COND include/linux/netfilter.h:296 [inline]
        ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
        dst_output include/net/dst.h:451 [inline]
        NF_HOOK include/linux/netfilter.h:307 [inline]
        ndisc_send_skb+0xa99/0x17f0 net/ipv6/ndisc.c:508
        ndisc_send_ns+0x3a9/0x840 net/ipv6/ndisc.c:650
        ndisc_solicit+0x2cd/0x4f0 net/ipv6/ndisc.c:742
        neigh_probe+0xc2/0x110 net/core/neighbour.c:1040
        __neigh_event_send+0x37d/0x1570 net/core/neighbour.c:1201
        neigh_event_send include/net/neighbour.h:470 [inline]
        neigh_managed_work+0x162/0x250 net/core/neighbour.c:1574
        process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
        worker_thread+0x657/0x1110 kernel/workqueue.c:2454
        kthread+0x2e9/0x3a0 kernel/kthread.c:377
        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
        </TASK>
      
      Fixes: 7482e384 ("net, neigh: Add NTF_MANAGED flag for managed neighbor entries")
      Reported-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Tested-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220201193942.5055-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4a81f6da
  10. 28 Jan, 2022 3 commits
  11. 27 Jan, 2022 1 commit
  12. 24 Jan, 2022 1 commit
  13. 21 Jan, 2022 2 commits
    • E
      ipv6: annotate accesses to fn->fn_sernum · aafc2e32
      Eric Dumazet authored
      struct fib6_node's fn_sernum field can be
      read while other threads change it.
      
      Add READ_ONCE()/WRITE_ONCE() annotations.
      
      Do not change existing smp barriers in fib6_get_cookie_safe()
      and __fib6_update_sernum_upto_root()
      
      syzbot reported:
      
      BUG: KCSAN: data-race in fib6_clean_node / inet6_csk_route_socket
      
      write to 0xffff88813df62e2c of 4 bytes by task 1920 on cpu 1:
       fib6_clean_node+0xc2/0x260 net/ipv6/ip6_fib.c:2178
       fib6_walk_continue+0x38e/0x430 net/ipv6/ip6_fib.c:2112
       fib6_walk net/ipv6/ip6_fib.c:2160 [inline]
       fib6_clean_tree net/ipv6/ip6_fib.c:2240 [inline]
       __fib6_clean_all+0x1a9/0x2e0 net/ipv6/ip6_fib.c:2256
       fib6_flush_trees+0x6c/0x80 net/ipv6/ip6_fib.c:2281
       rt_genid_bump_ipv6 include/net/net_namespace.h:488 [inline]
       addrconf_dad_completed+0x57f/0x870 net/ipv6/addrconf.c:4230
       addrconf_dad_work+0x908/0x1170
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:359
       ret_from_fork+0x1f/0x30
      
      read to 0xffff88813df62e2c of 4 bytes by task 15701 on cpu 0:
       fib6_get_cookie_safe include/net/ip6_fib.h:285 [inline]
       rt6_get_cookie include/net/ip6_fib.h:306 [inline]
       ip6_dst_store include/net/ip6_route.h:234 [inline]
       inet6_csk_route_socket+0x352/0x3c0 net/ipv6/inet6_connection_sock.c:109
       inet6_csk_xmit+0x91/0x1e0 net/ipv6/inet6_connection_sock.c:121
       __tcp_transmit_skb+0x1323/0x1840 net/ipv4/tcp_output.c:1402
       tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
       tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
       __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
       tcp_push+0x2d9/0x2f0 net/ipv4/tcp.c:725
       mptcp_push_release net/mptcp/protocol.c:1491 [inline]
       __mptcp_push_pending+0x46c/0x490 net/mptcp/protocol.c:1578
       mptcp_sendmsg+0x9ec/0xa50 net/mptcp/protocol.c:1764
       inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:643
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       kernel_sendmsg+0x97/0xd0 net/socket.c:745
       sock_no_sendpage+0x84/0xb0 net/core/sock.c:3086
       inet_sendpage+0x9d/0xc0 net/ipv4/af_inet.c:834
       kernel_sendpage+0x187/0x200 net/socket.c:3492
       sock_sendpage+0x5a/0x70 net/socket.c:1007
       pipe_to_sendpage+0x128/0x160 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x207/0x500 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0x94/0xd0 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       direct_splice_actor+0x80/0xa0 fs/splice.c:936
       splice_direct_to_actor+0x345/0x650 fs/splice.c:891
       do_splice_direct+0x106/0x190 fs/splice.c:979
       do_sendfile+0x675/0xc40 fs/read_write.c:1245
       __do_sys_sendfile64 fs/read_write.c:1310 [inline]
       __se_sys_sendfile64 fs/read_write.c:1296 [inline]
       __x64_sys_sendfile64+0x102/0x140 fs/read_write.c:1296
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000026f -> 0x00000271
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      The Fixes tag I chose is probably arbitrary; I do not think
      we need to backport this patch to older kernels.
      
      Fixes: c5cff856 ("ipv6: add rcu grace period before freeing fib6_node")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220120174112.1126644-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      aafc2e32
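The READ_ONCE()/WRITE_ONCE() annotations mentioned in this commit tell the compiler that a field is accessed concurrently, so each access must be performed exactly once and must not be torn, merged or re-read. A simplified user-space model follows. The `TOY_*` macros are stand-ins built on a volatile cast (assuming the GCC/Clang `__typeof__` extension), and `struct toy_fib6_node` is a hypothetical reduction of the real structure.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-ins: a volatile access forces one real load or store.
 * They provide no ordering; existing barriers are still needed for that.
 */
#define TOY_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

struct toy_fib6_node {
	int fn_sernum;
};

/* Writer side: publish a new serial number with a single store. */
static void toy_update_sernum(struct toy_fib6_node *fn, int sernum)
{
	assert(fn != NULL);
	TOY_WRITE_ONCE(fn->fn_sernum, sernum);
}

/* Reader side: sample the serial number with a single load. */
static int toy_get_cookie(const struct toy_fib6_node *fn)
{
	assert(fn != NULL);
	return TOY_READ_ONCE(fn->fn_sernum);
}
```

The annotation documents the intentional race for tools such as KCSAN and leaves the surrounding barriers untouched, which matches the commit's note about not changing the existing smp barriers.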
    • G
      tcp: Add a stub for sk_defer_free_flush() · 48cec899
      Gal Pressman authored
      When compiling the kernel with CONFIG_INET disabled,
      sk_defer_free_flush() should be defined as a nop.
      
      This resolves the following compilation error:
        ld: net/core/sock.o: in function `sk_defer_free_flush':
        ./include/net/tcp.h:1378: undefined reference to `__sk_defer_free_flush'
      
      Fixes: 79074a72 ("net: Flush deferred skb free on socket destroy")
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220120123440.9088-1-gal@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      48cec899
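The link error above is the usual result of an inline wrapper that calls a function compiled only when an option is enabled. The common remedy is to provide an empty inline stub for the disabled case. A user-space sketch of the pattern follows. `TOY_CONFIG_INET`, `struct toy_sock` and the `toy_*` functions are hypothetical and only model the shape of the fix.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock {
	int deferred;	/* number of buffers waiting to be freed */
};

#ifdef TOY_CONFIG_INET
/* Provided by code that is only built when the option is enabled. */
void toy_defer_free_flush_slow(struct toy_sock *sk);

static inline void toy_defer_free_flush(struct toy_sock *sk)
{
	if (sk->deferred)
		toy_defer_free_flush_slow(sk);
}
#else
/* Option disabled: nothing can be deferred, so the helper is a no-op. */
static inline void toy_defer_free_flush(struct toy_sock *sk)
{
	(void)sk;
}
#endif
```

Callers can then use `toy_defer_free_flush()` unconditionally, and the build links in both configurations.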
  14. 14 Jan, 2022 1 commit
    • K
      net_sched: restore "mpu xxx" handling · fb80445c
      Kevin Bracey authored
      commit 56b765b7 ("htb: improved accuracy at high rates") broke
      "overhead X", "linklayer atm" and "mpu X" attributes.
      
      "overhead X" and "linklayer atm" have already been fixed. This restores
      the "mpu X" handling, as might be used by DOCSIS or Ethernet shaping:
      
          tc class add ... htb rate X overhead 4 mpu 64
      
      The code being fixed is used by htb, tbf and act_police. Cake has its
      own mpu handling. qdisc_calculate_pkt_len still uses the size table
      containing values adjusted for mpu by user space.
      
      iproute2 tc has always passed mpu into the kernel via a tc_ratespec
      structure, but the kernel never directly acted on it, merely stored it
      so that it could be read back by `tc class show`.
      
      Rather, tc would generate length-to-time tables that included the mpu
      (and linklayer) in their construction, and the kernel used those tables.
      
      Since v3.7, the tables were no longer used. Along with "mpu", this also
      broke "overhead" and "linklayer" which were fixed in 01cb71d2
      ("net_sched: restore "overhead xxx" handling", v3.10) and 8a8e3d84
      ("net_sched: restore "linklayer atm" handling", v3.11).
      
      "overhead" was fixed by simply restoring use of tc_ratespec::overhead -
      this had originally been used by the kernel but was initially omitted
      from the new non-table-based calculations.
      
      "linklayer" had been handled in the table like "mpu", but the mode was
      not originally passed in tc_ratespec. The new implementation was made to
      handle it by getting new versions of tc to pass the mode in an extended
      tc_ratespec, and for older versions of tc the table contents were analysed
      at load time to deduce linklayer.
      
      As "mpu" has always been given to the kernel in tc_ratespec,
      accompanying the mpu-based table, we can restore system functionality
      with no userspace change by making the kernel act on the tc_ratespec
      value.
      
      Fixes: 56b765b7 ("htb: improved accuracy at high rates")
      Signed-off-by: Kevin Bracey <kevin@bracey.fi>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Vimalkumar <j.vimal@gmail.com>
      Link: https://lore.kernel.org/r/20220112170210.1014351-1-kevin@bracey.fi
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fb80445c
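Acting on the tc_ratespec value, as this commit describes, amounts to adjusting the packet length before converting it to transmission time. The sketch below shows only that length adjustment for the `overhead 4 mpu 64` example from the message. `struct toy_ratespec` and `toy_effective_len` are hypothetical, and applying the overhead before the mpu floor is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the rate specification passed in by tc. */
struct toy_ratespec {
	uint16_t overhead;	/* bytes added to every packet */
	uint16_t mpu;		/* minimum packet unit: floor for the length */
};

/* Length used for rate accounting: add overhead, then enforce the floor. */
static uint32_t toy_effective_len(const struct toy_ratespec *r, uint32_t len)
{
	assert(r != NULL);
	len += r->overhead;
	if (len < r->mpu)
		len = r->mpu;
	return len;
}
```

With overhead 4 and mpu 64, a 40-byte packet is charged as 64 bytes, while a 100-byte packet is charged as 104 bytes.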
  15. 13 Jan, 2022 1 commit
  16. 12 Jan, 2022 1 commit
  17. 10 Jan, 2022 9 commits
  18. 07 Jan, 2022 1 commit
    • M
      net: bpf: Handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() · 91a760b2
      Menglong Dong authored
      The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in
      __inet_bind() is not handled properly. When the return value
      is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and
      exit:
      
      	err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
      	if (err) {
      		inet->inet_saddr = inet->inet_rcv_saddr = 0;
      		goto out_release_sock;
      	}
      
      Let's take UDP as an example and see what happens. A UDP
      socket is added to 'udp_prot.h.udp_table->hash' and
      'udp_prot.h.udp_table->hash2' after sk->sk_prot->get_port()
      is called successfully. If 'inet->inet_rcv_saddr' is specified here,
      then 'sk' will be in an 'hslot2' of 'hash2' that it doesn't belong
      to (because inet_saddr is changed to 0), and received UDP packets
      will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
      specified here, the sock will work fine and receive packets
      properly, which is weird, as 'bind()' has already failed.
      
      To undo the get_port() operation, introduce the 'put_port' field
      for 'struct proto'. For the TCP proto, it is inet_put_port(); for the UDP
      proto, it is udp_lib_unhash(); for the icmp proto, it is
      ping_unhash().
      
      Therefore, after sys_bind() fails because of
      BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), the socket will be unbound, which
      means that it can try to bind to another port.
      
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220106132022.3470772-2-imagedong@tencent.com
      91a760b2
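The put_port field introduced by this commit gives the bind path a way to roll back a successful get_port() when a later step fails. A toy user-space model of that error path follows. Every `toy_*` name is hypothetical, and the `post_bind_err` parameter merely simulates a non-zero return from the post-bind hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sock;

/* Per-protocol operations: put_port undoes what get_port did. */
struct toy_proto {
	int  (*get_port)(struct toy_sock *sk, unsigned short port);
	void (*put_port)(struct toy_sock *sk);
};

struct toy_sock {
	const struct toy_proto *prot;
	unsigned short port;
	bool hashed;	/* true while the socket sits in the lookup table */
};

static int toy_get_port(struct toy_sock *sk, unsigned short port)
{
	sk->port = port;
	sk->hashed = true;
	return 0;
}

static void toy_put_port(struct toy_sock *sk)
{
	sk->hashed = false;
	sk->port = 0;
}

static const struct toy_proto toy_udp = {
	.get_port = toy_get_port,
	.put_port = toy_put_port,
};

/* Bind, then run the (simulated) post-bind hook; roll back on failure. */
static int toy_bind(struct toy_sock *sk, unsigned short port, int post_bind_err)
{
	int err;

	assert(sk != NULL && sk->prot != NULL);
	err = sk->prot->get_port(sk, port);
	if (err)
		return err;

	if (post_bind_err) {
		if (sk->prot->put_port)
			sk->prot->put_port(sk);	/* leave no half-bound socket */
		return post_bind_err;
	}
	return 0;
}
```

After a failed `toy_bind()` the socket is no longer hashed, so a later bind to another port starts from a clean state.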
  19. 06 Jan, 2022 6 commits