  1. 25 Feb, 2022: 2 commits
    • Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks · 29fb6083
      Committed by Luiz Augusto von Dentz
      Since bt_skb_sendmmsg can be used with the likes of SOCK_STREAM, it
      shall return the partial chunks it could allocate instead of freeing
      everything; otherwise it can cause problems like the one reported
      in the links below.
      
      Fixes: 81be03e0 ("Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg")
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Link: https://lore.kernel.org/r/d7206e12-1b99-c3be-84f4-df22af427ef5@molgen.mpg.de
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215594
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> (Nokia N9 (MeeGo/Harmattan))
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
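
      A user-space sketch of the behaviour the fix asks for: split a buffer
      into MTU-sized chunks and, if an allocation fails part-way, hand back
      the chunks already built so a stream socket can report a partial send,
      returning NULL only when nothing could be allocated. struct chunk,
      chunk_alloc() (with its hypothetical fail_after failure-injection knob)
      and send_chunks() are assumptions for illustration; this is not
      bt_skb_sendmmsg()'s real signature, which works on sk_buffs.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical stand-in for one skb-sized chunk. */
struct chunk {
    struct chunk *next;
    size_t len;
    char data[];
};

/* Hypothetical failure injection: -1 never fails, N fails after N allocs. */
static int fail_after = -1;

static struct chunk *chunk_alloc(size_t len)
{
    if (fail_after == 0)
        return NULL;
    if (fail_after > 0)
        fail_after--;
    struct chunk *c = malloc(sizeof(*c) + len);
    if (c) {
        c->next = NULL;
        c->len = len;
    }
    return c;
}

/* Split src into chunks of at most mtu bytes. On allocation failure keep
 * (and return) the partial list instead of freeing everything; *sent says
 * how many bytes made it. NULL means not even one chunk was allocated. */
static struct chunk *send_chunks(const char *src, size_t len, size_t mtu,
                                 size_t *sent)
{
    struct chunk *head = NULL, **tail = &head;

    assert(mtu > 0);
    *sent = 0;
    while (*sent < len) {
        size_t n = len - *sent < mtu ? len - *sent : mtu;
        struct chunk *c = chunk_alloc(n);
        if (!c)
            break;              /* partial result, not an error */
        memcpy(c->data, src + *sent, n);
        *tail = c;
        tail = &c->next;
        *sent += n;
    }
    return head;
}

static void free_chunks(struct chunk *c)
{
    while (c) {
        struct chunk *next = c->next;
        free(c);
        c = next;
    }
}
```

      With fail_after = 2 and a 10-byte message at an MTU of 4, the caller
      gets two chunks (8 bytes) back rather than an error that discards them.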
    • openvswitch: Fix setting ipv6 fields causing hw csum failure · d9b5ae5c
      Committed by Paul Blakey
      The IPv6 ttl, label and tos fields are modified without first
      pulling/pushing the IPv6 header, which would have updated
      the hw csum (if available). This might cause csum validation
      to fail when sending the packet to the stack, as can be seen in
      the trace below.
      
      Fix this by updating skb->csum if available.
      
      Trace resulting from an IPv6 ttl decrement followed by sending the
      packet to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
      [295241.900063] s_pf0vf2: hw csum failure
      [295241.923191] Call Trace:
      [295241.925728]  <IRQ>
      [295241.927836]  dump_stack+0x5c/0x80
      [295241.931240]  __skb_checksum_complete+0xac/0xc0
      [295241.935778]  nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
      [295241.953030]  nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
      [295241.958344]  __ovs_ct_lookup+0xac/0x860 [openvswitch]
      [295241.968532]  ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
      [295241.979167]  do_execute_actions+0x54a/0xaa0 [openvswitch]
      [295242.001482]  ovs_execute_actions+0x48/0x100 [openvswitch]
      [295242.006966]  ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
      [295242.012626]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
      [295242.028763]  netdev_frame_hook+0xc0/0x180 [openvswitch]
      [295242.034074]  __netif_receive_skb_core+0x2ca/0xcb0
      [295242.047498]  netif_receive_skb_internal+0x3e/0xc0
      [295242.052291]  napi_gro_receive+0xba/0xe0
      [295242.056231]  mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
      [295242.062513]  mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
      [295242.067669]  mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
      [295242.077958]  net_rx_action+0x149/0x3b0
      [295242.086762]  __do_softirq+0xd7/0x2d6
      [295242.090427]  irq_exit+0xf7/0x100
      [295242.093748]  do_IRQ+0x7f/0xd0
      [295242.096806]  common_interrupt+0xf/0xf
      [295242.100559]  </IRQ>
      [295242.102750] RIP: 0033:0x7f9022e88cbd
      [295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
      [295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000
      [295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30
      [295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340
      [295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000
      [295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40
      
      Fixes: 3fdbd1ce ("openvswitch: add ipv6 'set' action")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
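
      A simplified user-space model of keeping a checksum in step with a
      header rewrite. It treats the checksum as a ones' complement sum over
      big-endian 16-bit words, folded to 16 bits for simplicity (a real
      CHECKSUM_COMPLETE skb->csum is an unfolded 32-bit value), and updates it
      incrementally in the RFC 1624 style when the word holding the hop limit
      changes. csum_buf(), csum_replace16() and the 8-byte header prefix used
      below are assumptions for illustration, not the kernel's csum helpers
      or the patch itself.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Fold a 32-bit accumulator to 16 bits with end-around carry. */
static uint16_t csum_fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full recomputation: ones' complement sum of big-endian 16-bit words. */
static uint16_t csum_buf(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    return csum_fold16(sum);
}

/* Incremental update when one 16-bit word changes: add the complement of
 * the old word and add the new word. Agrees with a full recomputation
 * except for the ones' complement +0/-0 ambiguity. */
static uint16_t csum_replace16(uint16_t sum, uint16_t old_word,
                               uint16_t new_word)
{
    uint32_t s = (uint32_t)sum + (uint16_t)~old_word + new_word;
    return csum_fold16(s);
}
```

      Decrementing the hop limit from 64 to 63 (byte 7 of the IPv6 header)
      changes the word 0x0640 to 0x063f; the incremental update then gives
      the same result as recomputing over the modified header.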
  2. 20 Feb, 2022: 2 commits
    • netfilter: nf_tables_offload: incorrect flow offload action array size · b1a5983f
      Committed by Pablo Neira Ayuso
      The immediate verdict expression needs to allocate one slot in the flow
      offload action array; however, the immediate data expression does not
      need to do so.

      The fwd and dup expressions also need to allocate one slot, but this
      is missing.

      Add a new offload_action interface to report whether an expression
      needs to allocate one slot in the flow offload action array.
      
      Fixes: be2861dc ("netfilter: nft_{fwd,dup}_netdev: add offload support")
      Reported-and-tested-by: Nick Gregory <Nick.Gregory@Sophos.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
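
      A sketch of how an optional per-expression offload_action callback can
      size the action array: expressions without the callback need no slot,
      immediate needs one only when it carries a verdict, and fwd/dup always
      need one. The cut-down struct expr / struct expr_ops types and all
      helper names are hypothetical simplifications, not the nf_tables
      structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

struct expr;

/* Hypothetical, cut-down expression operations. */
struct expr_ops {
    const char *name;
    /* Optional: does this expression need a slot in the flow offload
     * action array? NULL means it does not. */
    bool (*offload_action)(const struct expr *expr);
};

struct expr {
    const struct expr_ops *ops;
    bool is_verdict;            /* only meaningful for "immediate" */
};

static bool immediate_offload_action(const struct expr *expr)
{
    return expr->is_verdict;    /* verdict: yes, plain data: no */
}

static bool always_offload_action(const struct expr *expr)
{
    (void)expr;
    return true;                /* fwd and dup always emit an action */
}

static const struct expr_ops immediate_ops = { "immediate", immediate_offload_action };
static const struct expr_ops fwd_ops = { "fwd", always_offload_action };
static const struct expr_ops dup_ops = { "dup", always_offload_action };
static const struct expr_ops cmp_ops = { "cmp", NULL };

/* Count the slots to allocate by asking each expression. */
static size_t count_offload_actions(const struct expr *exprs, size_t n)
{
    size_t num = 0;
    for (size_t i = 0; i < n; i++)
        if (exprs[i].ops->offload_action &&
            exprs[i].ops->offload_action(&exprs[i]))
            num++;
    return num;
}
```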
    • net: Force inlining of checksum functions in net/checksum.h · 5486f5bf
      Committed by Christophe Leroy
      All functions defined as static inline in net/checksum.h are
      meant to be inlined for performance reasons.
      
      But since commit ac7c3e4f ("compiler: enable
      CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to
      uninline functions when it wants.
      
      Fair enough in the general case, but for tiny performance-critical
      checksum helpers that's counter-productive.

      The problem mainly arises when selecting CONFIG_CC_OPTIMIZE_FOR_SIZE:
      because those helpers are 'static inline' in header files, you
      suddenly find them duplicated many times in the resulting vmlinux.

      Here is a typical example when building powerpc pmac32_defconfig
      with CONFIG_CC_OPTIMIZE_FOR_SIZE. csum_sub() appears 4 times:
      
      	c04a23cc <csum_sub>:
      	c04a23cc:	7c 84 20 f8 	not     r4,r4
      	c04a23d0:	7c 63 20 14 	addc    r3,r3,r4
      	c04a23d4:	7c 63 01 94 	addze   r3,r3
      	c04a23d8:	4e 80 00 20 	blr
      		...
      	c04a2ce8:	4b ff f6 e5 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d2c:	4b ff f6 a1 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d54:	4b ff f6 79 	bl      c04a23cc <csum_sub>
      		...
      	c04a754c <csum_sub>:
      	c04a754c:	7c 84 20 f8 	not     r4,r4
      	c04a7550:	7c 63 20 14 	addc    r3,r3,r4
      	c04a7554:	7c 63 01 94 	addze   r3,r3
      	c04a7558:	4e 80 00 20 	blr
      		...
      	c04ac930:	4b ff ac 1d 	bl      c04a754c <csum_sub>
      		...
      	c04ad264:	4b ff a2 e9 	bl      c04a754c <csum_sub>
      		...
      	c04e3b08 <csum_sub>:
      	c04e3b08:	7c 84 20 f8 	not     r4,r4
      	c04e3b0c:	7c 63 20 14 	addc    r3,r3,r4
      	c04e3b10:	7c 63 01 94 	addze   r3,r3
      	c04e3b14:	4e 80 00 20 	blr
      		...
      	c04e5788:	4b ff e3 81 	bl      c04e3b08 <csum_sub>
      		...
      	c04e65c8:	4b ff d5 41 	bl      c04e3b08 <csum_sub>
      		...
      	c0512d34 <csum_sub>:
      	c0512d34:	7c 84 20 f8 	not     r4,r4
      	c0512d38:	7c 63 20 14 	addc    r3,r3,r4
      	c0512d3c:	7c 63 01 94 	addze   r3,r3
      	c0512d40:	4e 80 00 20 	blr
      		...
      	c0512dfc:	4b ff ff 39 	bl      c0512d34 <csum_sub>
      		...
      	c05138bc:	4b ff f4 79 	bl      c0512d34 <csum_sub>
      		...
      
      Restore the expected behaviour by using __always_inline for all
      functions defined in net/checksum.h.

      With this patch, the vmlinux size is even reduced by 256 bytes:
      
      	   text	   data	    bss	    dec	    hex	filename
      	6980022	2515362	 194384	9689768	 93daa8	vmlinux.before
      	6979862	2515266	 194384	9689512	 93d9a8	vmlinux.now
      
      Fixes: ac7c3e4f ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
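
      The change amounts to marking the helpers __always_inline. A user-space
      sketch of two of them, modelled on the generic csum_add()/csum_sub()
      logic with uint32_t standing in for __wsum. The fallback
      __always_inline definition assumes GCC or Clang and exists only so the
      snippet compiles outside the kernel.

```c
#include <stdint.h>
#include <assert.h>

/* Kernel code gets this from its compiler headers; user-space fallback. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* 32-bit ones' complement add: wrap-around carry is added back in. */
static __always_inline uint32_t csum_add(uint32_t csum, uint32_t addend)
{
    uint32_t res = csum + addend;
    return res + (res < addend);
}

/* Subtraction is addition of the complement; this is the not/addc/addze
 * sequence in the disassembly above, now expanded at each call site. */
static __always_inline uint32_t csum_sub(uint32_t csum, uint32_t addend)
{
    return csum_add(csum, ~addend);
}
```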
  3. 18 Feb, 2022: 2 commits
    • net-timestamp: convert sk->sk_tskey to atomic_t · a1cdec57
      Committed by Eric Dumazet
      UDP sendmsg() can be lockless, which causes all kinds
      of data races.

      This patch converts sk->sk_tskey to atomic_t to remove one of
      these races.
      
      BUG: KCSAN: data-race in __ip_append_data / __ip_append_data
      
      read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
       __ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
       __ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000054d -> 0x0000054e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 09c2d251 ("net-timestamp: add key to disambiguate concurrent datagrams")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
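
      A minimal C11 model of the idea: the key is read and advanced with a
      single atomic read-modify-write, so two lockless senders can never
      both read 0x54d as in the KCSAN report. struct fake_sock and
      tskey_next() are hypothetical; the kernel uses its own atomic_t API
      on struct sock, which this sketch does not reproduce.

```c
#include <stdatomic.h>
#include <assert.h>

/* Hypothetical stand-in for the socket's timestamp key counter. */
struct fake_sock {
    atomic_uint tskey;
};

/* Before: "k = sk->tskey; sk->tskey = k + 1;" is a separate load and
 * store, so concurrent callers can get the same key.
 * After: one atomic fetch-and-add hands out each key exactly once. */
static unsigned int tskey_next(struct fake_sock *sk)
{
    return atomic_fetch_add(&sk->tskey, 1);
}
```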
    • ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt · d95d6320
      Committed by Eric Dumazet
      Because fib6_info_hw_flags_set() is called without any synchronization,
      all accesses to f6i->offload, f6i->trap and f6i->offload_failed
      need some basic protection such as READ_ONCE()/WRITE_ONCE().
      
      BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt
      
      read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
       fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
       fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
       fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
       __ip6_del_rt net/ipv6/route.c:3876 [inline]
       ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
       __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
       ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
       __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
       ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
       inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
       __sock_release net/socket.c:650 [inline]
       sock_close+0x6c/0x150 net/socket.c:1318
       __fput+0x295/0x520 fs/file_table.c:280
       ____fput+0x11/0x20 fs/file_table.c:313
       task_work_run+0x8e/0x110 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
       fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
       nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
       nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
       nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
       nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
       nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x22 -> 0x2a
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 0c5fcf9e ("IPv6: Add "offload failed" indication to routes")
      Fixes: bb3c4ab9 ("ipv6: Add "offload" and "trap" indications to routes")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Amit Cohen <amcohen@nvidia.com>
      Cc: Ido Schimmel <idosch@nvidia.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
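
      A user-space sketch of this kind of annotation. The READ_ONCE() and
      WRITE_ONCE() macros here are simplified versions built on volatile
      accesses (the kernel's definitions do more), and struct fake_fib6_info
      with its helpers is hypothetical. Each flag sits in its own byte
      because a bit-field cannot be passed to such macros: C does not allow
      taking the address of a bit-field.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Simplified: a volatile access forces exactly one load or store and
 * stops the compiler from tearing, merging or re-reading it. It is not a
 * memory barrier. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical, cut-down route info with byte-sized flags. */
struct fake_fib6_info {
    uint8_t offload;
    uint8_t trap;
    uint8_t offload_failed;
};

/* Writer, called without any lock (like the driver's work item). */
static void hw_flags_set(struct fake_fib6_info *f6i, bool offload,
                         bool trap, bool offload_failed)
{
    WRITE_ONCE(f6i->offload, offload);
    WRITE_ONCE(f6i->trap, trap);
    WRITE_ONCE(f6i->offload_failed, offload_failed);
}

/* Concurrent reader. */
static bool is_offloaded(const struct fake_fib6_info *f6i)
{
    return READ_ONCE(f6i->offload);
}
```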
  4. 17 Feb, 2022: 1 commit
  5. 15 Feb, 2022: 1 commit
    • bonding: fix data-races around agg_select_timer · 9ceaf6f7
      Committed by Eric Dumazet
      syzbot reported that two threads might write over agg_select_timer
      at the same time. Make agg_select_timer atomic to fix the races.
      
      BUG: KCSAN: data-race in bond_3ad_initiate_agg_selection / bond_3ad_state_machine_handler
      
      read to 0xffff8881242aea90 of 4 bytes by task 1846 on cpu 1:
       bond_3ad_state_machine_handler+0x99/0x2810 drivers/net/bonding/bond_3ad.c:2317
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      write to 0xffff8881242aea90 of 4 bytes by task 25910 on cpu 0:
       bond_3ad_initiate_agg_selection+0x18/0x30 drivers/net/bonding/bond_3ad.c:1998
       bond_open+0x658/0x6f0 drivers/net/bonding/bond_main.c:3967
       __dev_open+0x274/0x3a0 net/core/dev.c:1407
       dev_open+0x54/0x190 net/core/dev.c:1443
       bond_enslave+0xcef/0x3000 drivers/net/bonding/bond_main.c:1937
       do_set_master net/core/rtnetlink.c:2532 [inline]
       do_setlink+0x94f/0x2500 net/core/rtnetlink.c:2736
       __rtnl_newlink net/core/rtnetlink.c:3414 [inline]
       rtnl_newlink+0xfeb/0x13e0 net/core/rtnetlink.c:3529
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000050 -> 0x0000004f
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 25910 Comm: syz-executor.1 Tainted: G        W         5.17.0-rc4-syzkaller-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
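
      A C11 sketch of an atomic countdown timer of the kind described: arming
      is a single atomic store, and each state-machine tick decrements only
      while the counter is positive, via a compare-and-swap loop, so exactly
      one tick observes expiry. The helper names are hypothetical and this is
      not the bonding driver's code, which uses the kernel's atomic_t API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

/* Arm (or re-arm) the timer with one atomic store. */
static void agg_timer_arm(atomic_int *timer, int ticks)
{
    atomic_store(timer, ticks);
}

/* One tick: decrement only while positive. Returns true exactly once,
 * on the tick that moves the counter from 1 to 0. */
static bool agg_timer_tick(atomic_int *timer)
{
    int cur = atomic_load(timer);
    while (cur > 0) {
        /* On failure cur is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(timer, &cur, cur - 1))
            return cur == 1;
    }
    return false;
}
```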
  6. 14 Feb, 2022: 3 commits
    • net_sched: add __rcu annotation to netdev->qdisc · 5891cd5e
      Committed by Eric Dumazet
      syzbot found a data race [1], which led me to add __rcu
      annotations to netdev->qdisc, along with proper accessors
      to get LOCKDEP support.
      
      [1]
      BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu
      
      write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
       attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
       dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
       __dev_open+0x2e9/0x3a0 net/core/dev.c:1416
       __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
       rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
       __rtnl_newlink net/core/rtnetlink.c:3489 [inline]
       rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
       qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
       __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
       tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
       rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 470502de ("net: sched: unlock rules update API")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
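
      A C11 sketch of the publish/subscribe pattern that __rcu-annotated
      pointers and accessors such as rcu_assign_pointer()/rcu_dereference()
      are meant to enforce: the writer publishes with release ordering, and
      readers load the pointer exactly once with acquire ordering. The types
      and helpers are hypothetical, and the sketch covers neither RCU grace
      periods (freeing the old qdisc) nor the sparse/LOCKDEP checking the
      real annotations enable.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical, cut-down types. */
struct qdisc {
    unsigned int handle;
};

struct netdev {
    /* Kernel-style: struct Qdisc __rcu *qdisc. Here an _Atomic pointer
     * makes every access an explicit atomic operation. */
    struct qdisc *_Atomic qdisc;
};

/* Writer: release ordering makes the qdisc's initialised fields visible
 * before the new pointer is. */
static void netdev_set_qdisc(struct netdev *dev, struct qdisc *q)
{
    atomic_store_explicit(&dev->qdisc, q, memory_order_release);
}

/* Reader: one acquire load, so it sees the old or the new qdisc,
 * never a torn or re-read pointer. */
static struct qdisc *netdev_get_qdisc(struct netdev *dev)
{
    return atomic_load_explicit(&dev->qdisc, memory_order_acquire);
}
```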
    • net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN · a2614140
      Committed by Vladimir Oltean
      mv88e6xxx is special among DSA drivers in that it requires the VTU to
      contain the VID of the FDB entry it modifies in
      mv88e6xxx_port_db_load_purge(); otherwise it returns -EOPNOTSUPP.

      Due to races, this is sometimes not satisfied even if external
      code does everything right (first deletes the FDB entries, then the
      VLAN), because DSA has committed FDB entries to hardware
      asynchronously since commit c9eb3e0f ("net: dsa: Add support for
      learning FDB through notification").

      Therefore, the mv88e6xxx driver must close this race condition by
      itself, by asking DSA to flush the switchdev workqueue of any FDB
      deletions in progress before removing a VLAN.
      
      Fixes: c9eb3e0f ("net: dsa: Add support for learning FDB through notification")
      Reported-by: Rafael Richter <rafael.richter@gin.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
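
      A single-threaded toy model of the ordering the fix enforces: FDB
      deletions are queued as deferred work, and VLAN removal drains that
      queue before taking the VID out of the VTU, so no queued deletion can
      hit a missing VID and fail with -EOPNOTSUPP. Everything here
      (struct fake_switch, fdb_del_async(), fake_flush_workqueue(),
      vlan_del()) is hypothetical; it is not the DSA or mv88e6xxx API.

```c
#include <errno.h>
#include <stdbool.h>
#include <assert.h>

#define MAX_VID  16
#define MAX_WORK 8

/* Hypothetical switch state plus a queue of deferred FDB deletions. */
struct fake_switch {
    bool vtu[MAX_VID];          /* VIDs present in the VTU */
    int fdb_entries[MAX_VID];   /* FDB entries per VID */
    int pending[MAX_WORK];      /* VIDs of queued FDB deletions */
    int npending;
    int errors;                 /* deferred deletions that failed */
};

/* Hardware access: fails unless the VID is still in the VTU. */
static int hw_fdb_del(struct fake_switch *sw, int vid)
{
    if (!sw->vtu[vid])
        return -EOPNOTSUPP;
    sw->fdb_entries[vid]--;
    return 0;
}

/* Deletion is only queued here; it reaches hardware later. */
static void fdb_del_async(struct fake_switch *sw, int vid)
{
    assert(sw->npending < MAX_WORK);
    sw->pending[sw->npending++] = vid;
}

static void fake_flush_workqueue(struct fake_switch *sw)
{
    for (int i = 0; i < sw->npending; i++)
        if (hw_fdb_del(sw, sw->pending[i]))
            sw->errors++;
    sw->npending = 0;
}

/* Flush pending FDB work first, then remove the VID. */
static void vlan_del(struct fake_switch *sw, int vid)
{
    fake_flush_workqueue(sw);
    sw->vtu[vid] = false;
}
```

      Swapping the two statements in vlan_del() reproduces the race in this
      model: the queued deletion then runs against a VID that is gone.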
    • ipv6: mcast: use rcu-safe version of ipv6_get_lladdr() · 26394fc1
      Committed by Ignat Korchagin
      Some time ago, 8965779d ("ipv6,mcast: always hold idev->lock before mca_lock")
      switched ipv6_get_lladdr() to __ipv6_get_lladdr(), which is the RCU-unsafe
      version. That was OK, because idev->lock was held for these code paths.

      In 88e2ca30 ("mld: convert ifmcaddr6 to RCU") these external locks were
      removed, so we probably need to restore the original RCU-safe call.

      Otherwise, a machine occasionally crashes or stalls with the following
      in dmesg:
      
      [ 3405.966610][T230589] general protection fault, probably for non-canonical address 0xdead00000000008c: 0000 [#1] SMP NOPTI
      [ 3405.982083][T230589] CPU: 44 PID: 230589 Comm: kworker/44:3 Tainted: G           O      5.15.19-cloudflare-2022.2.1 #1
      [ 3405.998061][T230589] Hardware name: SUPA-COOL-SERV
      [ 3406.009552][T230589] Workqueue: mld mld_ifc_work
      [ 3406.017224][T230589] RIP: 0010:__ipv6_get_lladdr+0x34/0x60
      [ 3406.025780][T230589] Code: 57 10 48 83 c7 08 48 89 e5 48 39 d7 74 3e 48 8d 82 38 ff ff ff eb 13 48 8b 90 d0 00 00 00 48 8d 82 38 ff ff ff 48 39 d7 74 22 <66> 83 78 32 20 77 1b 75 e4 89 ca 23 50 2c 75 dd 48 8b 50 08 48 8b
      [ 3406.055748][T230589] RSP: 0018:ffff94e4b3fc3d10 EFLAGS: 00010202
      [ 3406.065617][T230589] RAX: dead00000000005a RBX: ffff94e4b3fc3d30 RCX: 0000000000000040
      [ 3406.077477][T230589] RDX: dead000000000122 RSI: ffff94e4b3fc3d30 RDI: ffff8c3a31431008
      [ 3406.089389][T230589] RBP: ffff94e4b3fc3d10 R08: 0000000000000000 R09: 0000000000000000
      [ 3406.101445][T230589] R10: ffff8c3a31430000 R11: 000000000000000b R12: ffff8c2c37887100
      [ 3406.113553][T230589] R13: ffff8c3a39537000 R14: 00000000000005dc R15: ffff8c3a31431000
      [ 3406.125730][T230589] FS:  0000000000000000(0000) GS:ffff8c3b9fc80000(0000) knlGS:0000000000000000
      [ 3406.138992][T230589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3406.149895][T230589] CR2: 00007f0dfea1db60 CR3: 000000387b5f2000 CR4: 0000000000350ee0
      [ 3406.162421][T230589] Call Trace:
      [ 3406.170235][T230589]  <TASK>
      [ 3406.177736][T230589]  mld_newpack+0xfe/0x1a0
      [ 3406.186686][T230589]  add_grhead+0x87/0xa0
      [ 3406.195498][T230589]  add_grec+0x485/0x4e0
      [ 3406.204310][T230589]  ? newidle_balance+0x126/0x3f0
      [ 3406.214024][T230589]  mld_ifc_work+0x15d/0x450
      [ 3406.223279][T230589]  process_one_work+0x1e6/0x380
      [ 3406.232982][T230589]  worker_thread+0x50/0x3a0
      [ 3406.242371][T230589]  ? rescuer_thread+0x360/0x360
      [ 3406.252175][T230589]  kthread+0x127/0x150
      [ 3406.261197][T230589]  ? set_kthread_struct+0x40/0x40
      [ 3406.271287][T230589]  ret_from_fork+0x22/0x30
      [ 3406.280812][T230589]  </TASK>
      [ 3406.288937][T230589] Modules linked in: ... [last unloaded: kheaders]
      [ 3406.476714][T230589] ---[ end trace 3525a7655f2f3b9e ]---
      
      Fixes: 88e2ca30 ("mld: convert ifmcaddr6 to RCU")
      Reported-by: David Pinilla Caparros <dpini@cloudflare.com>
      Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 12 Feb, 2022: 4 commits
    • kfence: make test case compatible with run time set sample interval · 8913c610
      Committed by Peng Liu
      The parameter kfence_sample_interval can be set via a boot parameter or
      later from a shell command, which is convenient for automated tests and
      KFENCE parameter optimization. However, the KFENCE test case only uses
      the compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, so it does not run the
      way users intend. Export kfence_sample_interval so that the KFENCE test
      case can use the sample interval set at run time.
      
      Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com
      Signed-off-by: Peng Liu <liupeng256@huawei.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Christian König <christian.koenig@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
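
      A minimal sketch of the idea: the test derives its timeout from the
      run-time variable rather than from the Kconfig macro, so an interval
      overridden at boot or later is honoured. FAKE_CONFIG_SAMPLE_INTERVAL,
      sample_interval and alloc_timeout_ms() are hypothetical stand-ins, and
      the scaling factor passed in is arbitrary.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_KFENCE_SAMPLE_INTERVAL (ms). */
#define FAKE_CONFIG_SAMPLE_INTERVAL 100UL

/* Run-time value: starts at the compile-time default but may be changed
 * later, like the kfence sample_interval parameter. */
unsigned long sample_interval = FAKE_CONFIG_SAMPLE_INTERVAL;

/* Hypothetical test helper: how long to wait for a sampled allocation.
 * Using the variable (not the macro) tracks the interval in effect. */
static unsigned long alloc_timeout_ms(unsigned long factor)
{
    return factor * sample_interval;
}
```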
    • mm: memcg: synchronize objcg lists with a dedicated spinlock · 0764db9b
      Committed by Roman Gushchin
      Alexander reported a circular lock dependency revealed by the mmap1 ltp
      test:
      
        LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
                WARNING: possible circular locking dependency detected
                5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
                ------------------------------------------------------
                mmap1/202299 is trying to acquire lock:
                00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
                but task is already holding lock:
                00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                which lock already depends on the new lock.
                the existing dependency chain (in reverse order) is:
                -> #1 (&sighand->siglock){-.-.}-{2:2}:
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       __lock_task_sighand+0x90/0x190
                       cgroup_freeze_task+0x2e/0x90
                       cgroup_migrate_execute+0x11c/0x608
                       cgroup_update_dfl_csses+0x246/0x270
                       cgroup_subtree_control_write+0x238/0x518
                       kernfs_fop_write_iter+0x13e/0x1e0
                       new_sync_write+0x100/0x190
                       vfs_write+0x22c/0x2d8
                       ksys_write+0x6c/0xf8
                       __do_syscall+0x1da/0x208
                       system_call+0x82/0xb0
                -> #0 (css_set_lock){..-.}-{2:2}:
                       check_prev_add+0xe0/0xed8
                       validate_chain+0x736/0xb20
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       obj_cgroup_release+0x4a/0xe0
                       percpu_ref_put_many.constprop.0+0x150/0x168
                       drain_obj_stock+0x94/0xe8
                       refill_obj_stock+0x94/0x278
                       obj_cgroup_charge+0x164/0x1d8
                       kmem_cache_alloc+0xac/0x528
                       __sigqueue_alloc+0x150/0x308
                       __send_signal+0x260/0x550
                       send_signal+0x7e/0x348
                       force_sig_info_to_task+0x104/0x180
                       force_sig_fault+0x48/0x58
                       __do_pgm_check+0x120/0x1f0
                       pgm_check_handler+0x11e/0x180
                other info that might help us debug this:
                 Possible unsafe locking scenario:
                       CPU0                    CPU1
                       ----                    ----
                  lock(&sighand->siglock);
                                               lock(css_set_lock);
                                               lock(&sighand->siglock);
                  lock(css_set_lock);
                 *** DEADLOCK ***
                2 locks held by mmap1/202299:
                 #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                 #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
                stack backtrace:
                CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
                Hardware name: IBM 3906 M04 704 (LPAR)
                Call Trace:
                  dump_stack_lvl+0x76/0x98
                  check_noncircular+0x136/0x158
                  check_prev_add+0xe0/0xed8
                  validate_chain+0x736/0xb20
                  __lock_acquire+0x604/0xbd8
                  lock_acquire.part.0+0xe2/0x238
                  lock_acquire+0xb0/0x200
                  _raw_spin_lock_irqsave+0x6a/0xd8
                  obj_cgroup_release+0x4a/0xe0
                  percpu_ref_put_many.constprop.0+0x150/0x168
                  drain_obj_stock+0x94/0xe8
                  refill_obj_stock+0x94/0x278
                  obj_cgroup_charge+0x164/0x1d8
                  kmem_cache_alloc+0xac/0x528
                  __sigqueue_alloc+0x150/0x308
                  __send_signal+0x260/0x550
                  send_signal+0x7e/0x348
                  force_sig_info_to_task+0x104/0x180
                  force_sig_fault+0x48/0x58
                  __do_pgm_check+0x120/0x1f0
                  pgm_check_handler+0x11e/0x180
                INFO: lockdep is turned off.
      
      In this example, a slab allocation from __send_signal() caused a
      percpu objcg stock to be refilled and drained, which resulted in the
      release of another, unrelated objcg. The objcg release path requires
      taking css_set_lock, which is used to synchronize objcg lists.

      This can create a circular dependency with the sighand lock, which the
      freezer code takes while holding css_set_lock (to freeze a task).

      In general, it seems that using css_set_lock to synchronize objcg lists
      makes any slab allocation or deallocation risky while css_set_lock, or
      any lock nested under it, is held.

      To fix the problem and make the code more robust, let's stop using
      css_set_lock to synchronize objcg lists and use a new dedicated spinlock
      instead.
      
      Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
      Fixes: bf4f0599 ("mm: memcg/slab: obj_cgroup API")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
      Tested-by: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0764db9b
    • Y
      bpf: Fix a bpf_timer initialization issue · 5eaed6ee
      Yonghong Song committed
      The patch in [1] intends to fix a bpf_timer-related issue,
      but the fix caused the existing 'timer' selftest to fail with
      a hang or random errors. After some debugging, I found
      an issue with check_and_init_map_value() in hashtab.c.
      More specifically, hashtab.c has the code
        l_new = bpf_map_kmalloc_node(&htab->map, ...)
        check_and_init_map_value(&htab->map, l_new...)
      Note that bpf_map_kmalloc_node() does not initialize the memory,
      so l_new contains random values.
      
      The function check_and_init_map_value() intends to zero the
      bpf_spin_lock and bpf_timer if they exist in the map.
      But I found that bpf_spin_lock is zeroed while bpf_timer is not.
      With [1], a later copy_map_value() skips copying of
      bpf_spin_lock and bpf_timer, so the non-zero bpf_timer caused
      random failures in the 'timer' selftest.
      Without [1], when both bpf_spin_lock and bpf_timer are present,
      bpf_timer ends up zeroed, so the 'timer' selftest passes.
      
      Why does check_and_init_map_value() zero bpf_spin_lock properly
      but not bpf_timer? The bpf uapi header has:
        struct bpf_spin_lock {
              __u32   val;
        };
        struct bpf_timer {
              __u64 :64;
              __u64 :64;
        } __attribute__((aligned(8)));
      
      The initialization code:
        *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
            (struct bpf_spin_lock){};
        *(struct bpf_timer *)(dst + map->timer_off) =
            (struct bpf_timer){};
      It appears the compiler has no obligation to initialize unnamed
      (anonymous) fields. For example, compile the following with clang
      for the bpf target:
        $ cat t.c
        struct bpf_timer {
              unsigned long long :64;
        };
        struct bpf_timer2 {
              unsigned long long a;
        };
      
        void test(struct bpf_timer *t) {
          *t = (struct bpf_timer){};
        }
        void test2(struct bpf_timer2 *t) {
          *t = (struct bpf_timer2){};
        }
        $ clang -target bpf -O2 -c -g t.c
        $ llvm-objdump -d t.o
         ...
         0000000000000000 <test>:
             0:       95 00 00 00 00 00 00 00 exit
         0000000000000008 <test2>:
             1:       b7 02 00 00 00 00 00 00 r2 = 0
             2:       7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
             3:       95 00 00 00 00 00 00 00 exit
      
      gcc 11.2 does not have the above issue, but the C standard permits it. From
        INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
        Programming languages — C
        http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
        page 157:
        Except where explicitly stated otherwise, for the purposes of
        this subclause unnamed members of objects of structure and union
        type do not participate in initialization. Unnamed members of
        structure objects have indeterminate value even after initialization.
      
      To fix the problem, let us use memset for the bpf_timer case in
      check_and_init_map_value(). For consistency, memset is also
      used for the bpf_spin_lock case.
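
      A userspace sketch of the memset-based initialization described above.
      The two struct layouts are copied from the uapi excerpt (so this needs
      GCC or Clang, like the header itself); struct model_map and
      model_check_and_init_map_value() are hypothetical simplifications, with
      a negative offset standing for "field absent", and are not the actual
      kernel/bpf code.

```c
#include <stdint.h>
#include <string.h>

/* Layouts as in the uapi excerpt above (GCC/Clang extensions). */
struct bpf_spin_lock {
	uint32_t val;
};
struct bpf_timer {
	uint64_t :64;
	uint64_t :64;
} __attribute__((aligned(8)));

/* Hypothetical: offsets of the special fields, -1 if absent. */
struct model_map {
	int spin_lock_off;
	int timer_off;
};

/* memset() clears every byte of each field, including storage that
 * belongs only to unnamed bit-fields, which an assignment from an
 * empty compound literal is not required to touch. */
static void model_check_and_init_map_value(const struct model_map *map, void *dst)
{
	if (map->spin_lock_off >= 0)
		memset((char *)dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
	if (map->timer_off >= 0)
		memset((char *)dst + map->timer_off, 0, sizeof(struct bpf_timer));
}
```

      Bytes outside the two fields are left untouched, so the rest of the
      (uninitialized) value can still be filled in by a later copy.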
      
        [1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/
      
      Fixes: 68134668 ("bpf: Add map side support for bpf timers.")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220211194953.3142152-1-yhs@fb.com
      5eaed6ee
    • K
      bpf: Fix crash due to incorrect copy_map_value · a8abb0c3
      Kumar Kartikeya Dwivedi committed
      When both bpf_spin_lock and bpf_timer are present in a BPF map value,
      copy_map_value() needs to skip both objects when copying a value into and
      out of the map. However, the current code does not set both s_off and
      t_off in copy_map_value(), which leads to a crash when, e.g., bpf_spin_lock
      is placed in a map value together with bpf_timer, because a
      bpf_map_update_elem() call can then overwrite the timer object.
      
      Without this fix, such an overwrite can produce the following
      splat:
      
      [root@(none) bpf]# ./test_progs -t timer_crash
      [   15.930339] bpf_testmod: loading out-of-tree module taints kernel.
      [   16.037849] ==================================================================
      [   16.038458] BUG: KASAN: user-memory-access in __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.038944] Write of size 8 at addr 0000000000043ec0 by task test_progs/325
      [   16.039399]
      [   16.039514] CPU: 0 PID: 325 Comm: test_progs Tainted: G           OE     5.16.0+ #278
      [   16.039983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
      [   16.040485] Call Trace:
      [   16.040645]  <TASK>
      [   16.040805]  dump_stack_lvl+0x59/0x73
      [   16.041069]  ? __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.041427]  kasan_report.cold+0x116/0x11b
      [   16.041673]  ? __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.042040]  __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.042328]  ? memcpy+0x39/0x60
      [   16.042552]  ? pv_hash+0xd0/0xd0
      [   16.042785]  ? lockdep_hardirqs_off+0x95/0xd0
      [   16.043079]  __bpf_spin_lock_irqsave+0xdf/0xf0
      [   16.043366]  ? bpf_get_current_comm+0x50/0x50
      [   16.043608]  ? jhash+0x11a/0x270
      [   16.043848]  bpf_timer_cancel+0x34/0xe0
      [   16.044119]  bpf_prog_c4ea1c0f7449940d_sys_enter+0x7c/0x81
      [   16.044500]  bpf_trampoline_6442477838_0+0x36/0x1000
      [   16.044836]  __x64_sys_nanosleep+0x5/0x140
      [   16.045119]  do_syscall_64+0x59/0x80
      [   16.045377]  ? lock_is_held_type+0xe4/0x140
      [   16.045670]  ? irqentry_exit_to_user_mode+0xa/0x40
      [   16.046001]  ? mark_held_locks+0x24/0x90
      [   16.046287]  ? asm_exc_page_fault+0x1e/0x30
      [   16.046569]  ? asm_exc_page_fault+0x8/0x30
      [   16.046851]  ? lockdep_hardirqs_on+0x7e/0x100
      [   16.047137]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   16.047405] RIP: 0033:0x7f9e4831718d
      [   16.047602] Code: b4 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 6c 0c 00 f7 d8 64 89 01 48
      [   16.048764] RSP: 002b:00007fff488086b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000023
      [   16.049275] RAX: ffffffffffffffda RBX: 00007f9e48683740 RCX: 00007f9e4831718d
      [   16.049747] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fff488086d0
      [   16.050225] RBP: 00007fff488086f0 R08: 00007fff488085d7 R09: 00007f9e4cb594a0
      [   16.050648] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f9e484cde30
      [   16.051124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [   16.051608]  </TASK>
      [   16.051762] ==================================================================
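
      A simplified sketch of what copy_map_value() has to do once both fields
      are present: copy everything except the two special regions, in
      whichever order they appear in the value. struct model_map and
      model_copy_map_value() are hypothetical; this is not the code from the
      fix, which differs in detail.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical map description: up to two regions (e.g. a bpf_spin_lock
 * and a bpf_timer) that must not be copied. Size 0 means "absent".
 * Regions are assumed not to overlap and to lie within value_size. */
struct model_map {
	uint32_t value_size;
	uint32_t s_off, s_sz;   /* first special field */
	uint32_t t_off, t_sz;   /* second special field */
};

/* Copy src to dst, skipping both special regions. */
static void model_copy_map_value(const struct model_map *map, void *dst, const void *src)
{
	uint32_t a_off = map->s_off, a_sz = map->s_sz;
	uint32_t b_off = map->t_off, b_sz = map->t_sz;
	uint32_t pos = 0;

	/* Order the regions by offset; an absent region sorts last. */
	if (!a_sz || (b_sz && b_off < a_off)) {
		uint32_t t;
		t = a_off; a_off = b_off; b_off = t;
		t = a_sz;  a_sz = b_sz;   b_sz = t;
	}
	if (a_sz) {
		memcpy((char *)dst + pos, (const char *)src + pos, a_off - pos);
		pos = a_off + a_sz;
	}
	if (b_sz) {
		memcpy((char *)dst + pos, (const char *)src + pos, b_off - pos);
		pos = b_off + b_sz;
	}
	/* Tail after the last skipped region (or the whole value). */
	memcpy((char *)dst + pos, (const char *)src + pos, map->value_size - pos);
}
```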
      
      Fixes: 68134668 ("bpf: Add map side support for bpf timers.")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com
      a8abb0c3
  8. 09 Feb 2022: 3 commits
  9. 08 Feb 2022: 2 commits
    • R
      PM: s2idle: ACPI: Fix wakeup interrupts handling · cb1f65c1
      Rafael J. Wysocki committed
      After commit e3728b50 ("ACPI: PM: s2idle: Avoid possible race
      related to the EC GPE"), wakeup interrupts occurring immediately after
      the one discarded by acpi_s2idle_wake() may be missed.  Moreover, if
      the SCI triggers again immediately after the rearming in
      acpi_s2idle_wake(), that wakeup may be missed too.
      
      The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup()
      when pm_wakeup_irq is 0, but after the interrupt that caused
      acpi_s2idle_wake() to run, pm_wakeup_irq stays nonzero until it is
      cleared by the pm_wakeup_clear() call in s2idle_loop().  However,
      wakeup interrupts may occur in that time frame, and if that
      happens, they will be missed.
      
      To address that issue, first move the clearing of pm_wakeup_irq to
      the point at which it is known that the interrupt causing
      acpi_s2idle_wake() to run will be discarded, before rearming the SCI
      for wakeup.  Moreover, because that only reduces the size of the
      time window in which the issue may manifest itself, allow
      pm_system_irq_wakeup() to register two wakeup interrupts in
      a row and, when discarding the first one, replace it with the second
      one.  [Of course, this assumes that only one wakeup interrupt can be
      discarded in one go, but currently that is the case and I am not
      aware of any plans to change that.]
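
      A single-threaded model of the two-slot scheme described above, loosely
      following the commit message rather than the actual drivers/base/power
      code: locking is omitted and every model_* name is hypothetical.

```c
/* Slot 0 holds the current wakeup IRQ, slot 1 a second one that
 * arrived before slot 0 was cleared. 0 means "empty". */
static unsigned int model_wakeup_irq[2];
static unsigned int model_wakeups_signalled;

static void model_system_wakeup(void)
{
	model_wakeups_signalled++;
}

/* Register a wakeup IRQ; only the first two in a row are recorded,
 * and each recorded one triggers a system wakeup. */
static void model_irq_wakeup(unsigned int irq)
{
	if (model_wakeup_irq[0] == 0)
		model_wakeup_irq[0] = irq;
	else if (model_wakeup_irq[1] == 0)
		model_wakeup_irq[1] = irq;
	else
		return;                 /* both slots busy: nothing new recorded */
	model_system_wakeup();
}

/* Clear the record. If 'irq' is the one being discarded, promote the
 * second IRQ (if any) so that its wakeup is not lost. */
static void model_wakeup_clear(unsigned int irq)
{
	if (irq && model_wakeup_irq[0] == irq)
		model_wakeup_irq[0] = model_wakeup_irq[1];
	else
		model_wakeup_irq[0] = 0;
	model_wakeup_irq[1] = 0;
}
```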
      
      Fixes: e3728b50 ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE")
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      cb1f65c1
    • M
      Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64) · 6bf625a4
      Michael Kelley committed
      Using DMA_BIT_MASK(64) as an initializer for a global variable
      causes problems with Clang 12.0.1. The compiler doesn't understand
      that the value 64 is excluded from the shift at compile time, resulting
      in a build error.
      
      While this is a compiler problem, avoid the issue by setting up
      the dma_mask memory as part of struct hv_device, and initialize
      it using dma_set_mask().
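
      A userspace sketch of the pattern described above, assuming simplified
      stand-ins for struct device and struct hv_device and a model of
      dma_set_mask() (all model_* pieces are hypothetical). The mask storage
      lives in the per-device structure and is filled in at run time, so
      DMA_BIT_MASK(64) never appears in a static initializer. The macro is
      written in the same form as the kernel's.

```c
#include <errno.h>
#include <stdint.h>

/* n == 64 is special-cased so that 1ULL << 64 is never evaluated. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_device {
	uint64_t *dma_mask;       /* points at storage owned elsewhere */
};

struct model_hv_device {
	struct model_device device;
	uint64_t dma_mask;        /* per-device storage, no global initializer */
};

/* Model of dma_set_mask(): fails if the device has no mask storage. */
static int model_dma_set_mask(struct model_device *dev, uint64_t mask)
{
	if (!dev->dma_mask)
		return -EIO;
	*dev->dma_mask = mask;
	return 0;
}

static int model_hv_setup_dma(struct model_hv_device *hv)
{
	hv->device.dma_mask = &hv->dma_mask;
	return model_dma_set_mask(&hv->device, DMA_BIT_MASK(64));
}
```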

      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Reported-by: Vitaly Chikunov <vt@altlinux.org>
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Fixes: 743b237c ("scsi: storvsc: Add Isolation VM support for storvsc driver")
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/r/1644176216-12531-1-git-send-email-mikelley@microsoft.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
      6bf625a4
  10. 07 Feb 2022: 1 commit
    • D
      ata: libata-core: Fix ata_dev_config_cpr() · fda17afc
      Damien Le Moal committed
      The concurrent positioning ranges log page 47h is a general purpose log
      page and not a subpage of the identify device log. Using
      ata_identify_page_supported() to test for concurrent positioning ranges
      support is thus wrong; ata_log_supported() must be used instead.
      
      Furthermore, unlike other advanced ATA features (e.g. NCQ priority),
      accesses to the concurrent positioning ranges log page are not gated by
      a feature bit from the device IDENTIFY data. Since many older drives
      react badly to the READ LOG EXT and/or READ LOG DMA EXT commands issued
      to read device log pages, avoid problems with older drives by limiting
      the concurrent positioning ranges support detection to drives
      implementing at least the ACS-4 ATA standard (major version 11). This
      additional condition effectively turns ata_dev_config_cpr() into a nop
      for older drives, avoiding problems in the field.
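
      A sketch of the detection order described above, using a hypothetical
      in-memory device model: the ACS-4 check (major version 11) comes first
      and needs no device access, so older drives never receive a READ LOG
      command for this purpose; only then is the general purpose log
      directory consulted. Field and function names are illustrative, not
      libata's.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_LOG_CPR 0x47   /* concurrent positioning ranges log page */

/* Hypothetical device model. log_dir[] mimics the general purpose log
 * directory: number of pages per log address, 0 = not supported. */
struct model_ata_dev {
	unsigned int major_version;   /* e.g. 11 for ACS-4 */
	uint16_t log_dir[256];
	unsigned int log_dir_reads;   /* counts simulated READ LOG commands */
};

static bool model_log_supported(struct model_ata_dev *dev, uint8_t log)
{
	dev->log_dir_reads++;         /* a real device would be queried here */
	return dev->log_dir[log] != 0;
}

static bool model_cpr_supported(struct model_ata_dev *dev)
{
	if (dev->major_version < 11)
		return false;         /* older drive: do not touch its logs */
	return model_log_supported(dev, MODEL_LOG_CPR);
}
```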
      
      Fixes: fe22e1c2 ("libata: support concurrent positioning ranges log")
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215519
      Cc: stable@vger.kernel.org
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Tested-by: Abderraouf Adjal <adjal.arf@gmail.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      fda17afc
  11. 05 Feb 2022: 3 commits
  12. 04 Feb 2022: 4 commits
    • A
      ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage · ac9f0c81
      Anton Lundin committed
      06f6c4c6 ("ata: libata: add missing ata_identify_page_supported() calls")
      introduced additional calls to ata_identify_page_supported(), thus also
      adding indirectly accesses to the device log directory log page through
      ata_log_supported(). Reading this log page causes SATADOM-ML 3ME devices
      to lock up.
      
      Introduce the horkage flag ATA_HORKAGE_NO_LOG_DIR to prevent accesses to
      the log directory in ata_log_supported() and add a blacklist entry
      with this flag for "SATADOM-ML 3ME" devices.
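
      A sketch of how such a horkage flag short-circuits the log directory
      lookup. The flag value, the blacklist table and all model_* names are
      hypothetical; only the device model string comes from the text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_HORKAGE_NO_LOG_DIR (1u << 0)   /* illustrative bit value */

struct model_blacklist_entry {
	const char *model;
	unsigned int horkage;
};

static const struct model_blacklist_entry model_blacklist[] = {
	{ "SATADOM-ML 3ME", MODEL_HORKAGE_NO_LOG_DIR },
};

/* Look up quirk flags by device model string; 0 if not listed. */
static unsigned int model_lookup_horkage(const char *model)
{
	for (size_t i = 0; i < sizeof(model_blacklist) / sizeof(model_blacklist[0]); i++)
		if (strcmp(model_blacklist[i].model, model) == 0)
			return model_blacklist[i].horkage;
	return 0;
}

struct model_dev {
	unsigned int horkage;
	uint16_t log_dir[256];        /* what reading the log directory returns */
	unsigned int log_dir_reads;   /* how often the directory was read */
};

static bool model_log_supported(struct model_dev *dev, uint8_t log)
{
	/* Quirked device: never read the log directory, report "unsupported". */
	if (dev->horkage & MODEL_HORKAGE_NO_LOG_DIR)
		return false;
	dev->log_dir_reads++;
	return dev->log_dir[log] != 0;
}
```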
      
      Fixes: 636f6e2a ("libata: add horkage for missing Identify Device log")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Anton Lundin <glance@acc.umu.se>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      ac9f0c81
    • F
      netfilter: ctnetlink: disable helper autoassign · d1ca60ef
      Florian Westphal committed
      When userspace, e.g. conntrackd, inserts an entry with a specified helper,
      it's possible that the helper is lost immediately after it's added:
      
      ctnetlink_create_conntrack
        -> nf_ct_helper_ext_add + assign helper
          -> ctnetlink_setup_nat
            -> ctnetlink_parse_nat_setup
               -> parse_nat_setup -> nfnetlink_parse_nat_setup
                                     -> nf_nat_setup_info
                                       -> nf_conntrack_alter_reply
                                         -> __nf_ct_try_assign_helper
      
      ... and __nf_ct_try_assign_helper will zero the helper again.
      
      Set the IPS_HELPER bit to bypass the auto-assign logic, which is unwanted
      here, just like when the helper is assigned via the ruleset.
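
      A minimal model of the bypass described above: an explicit assignment
      also sets a status bit, and the auto-assign step returns early when that
      bit is set. The bit value and all model_* names are illustrative, not
      the nf_conntrack definitions.

```c
#include <stddef.h>

#define MODEL_IPS_HELPER (1ul << 0)   /* illustrative status bit */

struct model_helper {
	const char *name;
};

struct model_conn {
	unsigned long status;
	const struct model_helper *helper;
};

/* Explicit assignment (e.g. requested by userspace): mark it. */
static void model_assign_helper(struct model_conn *ct, const struct model_helper *h)
{
	ct->helper = h;
	ct->status |= MODEL_IPS_HELPER;
}

/* Auto-assign step: without the bit it overwrites the helper with
 * whatever its own lookup found, possibly NULL. */
static void model_try_assign_helper(struct model_conn *ct,
				    const struct model_helper *lookup_result)
{
	if (ct->status & MODEL_IPS_HELPER)
		return;               /* explicitly chosen helper: keep it */
	ct->helper = lookup_result;
}
```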
      
      Drop the old 'not strictly necessary' comment; it referred to the use of
      rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER().
      
      NB: The Fixes tag is intentionally incorrect: this extends the referenced
      commit, but this change won't build without IPS_HELPER introduced there.
      
      Fixes: 6714cf54 ("netfilter: nf_conntrack: fix explicit helper attachment and NAT")
      Reported-by: Pham Thanh Tuyen <phamtyn@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      d1ca60ef
    • D
      ax25: fix reference count leaks of ax25_dev · 87563a04
      Duoming Zhou committed
      The previous commit d01ffb9e ("ax25: add refcount in ax25_dev
      to avoid UAF bugs") introduced a refcount into ax25_dev, but there
      are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(),
      ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().
      
      This patch uses ax25_dev_put() and adjusts the position of
      ax25_addr_ax25dev() to fix reference count leaks of ax25_dev.
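
      The general pattern behind such fixes, shown on a hypothetical
      refcounted object (this is not the ax25 code): validate arguments before
      taking the reference, and route every exit taken after the lookup
      through a single put.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted device. */
struct model_refdev {
	int refcnt;
	int config;
};

static struct model_refdev *model_dev_get(struct model_refdev *d)
{
	if (d)
		d->refcnt++;
	return d;
}

static void model_dev_put(struct model_refdev *d)
{
	if (d)
		d->refcnt--;
}

static int model_ioctl(struct model_refdev *entry, int arg)
{
	struct model_refdev *dev;
	int ret = 0;

	/* Validate before taking a reference: this return cannot leak. */
	if (arg < 0)
		return -EINVAL;

	dev = model_dev_get(entry);
	if (!dev)
		return -ENODEV;

	if (arg > 255) {
		ret = -ERANGE;        /* error after the get: still must put */
		goto out_put;
	}
	dev->config = arg;
out_put:
	model_dev_put(dev);
	return ret;
}
```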
      
      Fixes: d01ffb9e ("ax25: add refcount in ax25_dev to avoid UAF bugs")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
      Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      87563a04
    • I
      Revert "module, async: async_synchronize_full() on module init iff async is used" · 67d6212a
      Igor Pylypiv committed
      This reverts commit 774a1221.
      
      We need to finish all async code before the module init sequence is
      done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
      thread that called async_schedule().  Then the PF_USED_ASYNC flag was
      used to determine whether or not async_synchronize_full() needs to be
      invoked.  This works when the modprobe thread calls async_schedule(),
      but it does not work if the module dispatches init code to a worker
      thread, which then calls async_schedule().
      
      For example, PCI driver probing is invoked from a worker thread running
      on the NUMA node to which the device is attached:
      
      	if (cpu < nr_cpu_ids)
      		error = work_on_cpu(cpu, local_pci_probe, &ddi);
      	else
      		error = local_pci_probe(&ddi);
      
      We end up in a situation where a worker thread gets the PF_USED_ASYNC
      flag set instead of the modprobe thread.  As a result,
      async_synchronize_full() is not invoked and modprobe completes without
      waiting for the async code to finish.
      
      The issue was discovered while loading the pm80xx driver
      (with scsi_mod.scan=async):
      
      modprobe pm80xx                      worker
      ...
        do_init_module()
        ...
          pci_call_probe()
            work_on_cpu(local_pci_probe)
                                           local_pci_probe()
                                             pm8001_pci_probe()
                                               scsi_scan_host()
                                                 async_schedule()
                                                 worker->flags |= PF_USED_ASYNC;
                                           ...
            < return from worker >
        ...
        if (current->flags & PF_USED_ASYNC) <--- false
        	async_synchronize_full();
      
      Commit 21c3c5d2 ("block: don't request module during elevator init")
      fixed the deadlock issue which the reverted commit 774a1221
      ("module, async: async_synchronize_full() on module init iff async is
      used") tried to fix.
      
      Since commit 0fdff3ec ("async, kmod: warn on synchronous
      request_module() from async workers") synchronous module loading from
      async is not allowed.
      
      Given that the original deadlock issue is fixed and it is no longer
      allowed to call synchronous request_module() from async, we can remove
      the PF_USED_ASYNC flag to make module init consistently invoke
      async_synchronize_full() unless async module probe is requested.
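
      A small model of the two decision rules, with hypothetical names. Each
      task carries its own flags word, so a flag set by the worker that called
      async_schedule() is invisible to the modprobe task, as in the diagram
      above; the post-revert rule depends only on whether async probing was
      requested.

```c
#include <stdbool.h>

#define MODEL_PF_USED_ASYNC 0x1u      /* illustrative value */

struct model_task {
	unsigned int flags;           /* per-task flags word */
};

struct model_module {
	bool async_probe_requested;
};

static unsigned int model_sync_calls;

/* Old behaviour: mark the task that called async_schedule(). */
static void model_async_schedule(struct model_task *caller)
{
	caller->flags |= MODEL_PF_USED_ASYNC;
}

static void model_async_synchronize_full(void)
{
	model_sync_calls++;
}

/* Old rule: wait only if the task running module init used async. */
static void model_init_tail_old(const struct model_task *init_task)
{
	if (init_task->flags & MODEL_PF_USED_ASYNC)
		model_async_synchronize_full();
}

/* Rule after the revert: wait unless async probing was requested. */
static void model_init_tail_new(const struct model_module *mod)
{
	if (!mod->async_probe_requested)
		model_async_synchronize_full();
}
```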

      Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
      Reviewed-by: Changyuan Lyu <changyuanl@google.com>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      67d6212a
  13. 03 Feb 2022: 9 commits
  14. 02 Feb 2022: 3 commits
    • T
      NFS: Avoid duplicate uncached readdir calls on eof · e1d2699b
      Trond Myklebust committed
      If we've reached the end of the directory, then cache that information
      in the context so that we don't need to do an uncached readdir in order
      to rediscover that fact.
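
      A sketch of the caching idea with a hypothetical directory context (not
      the fs/nfs data structures): the first readdir that reaches the end
      records it, and later calls return immediately instead of issuing
      another uncached readdir.

```c
#include <stdbool.h>

/* Hypothetical directory context. */
struct model_dir_ctx {
	bool eof;                     /* end of directory already observed */
	unsigned int uncached_calls;  /* simulated round trips to the server */
};

/* Stand-in for an uncached readdir: 'server_entries' is what the
 * server would return; 0 means the end of the directory. */
static int model_uncached_readdir(struct model_dir_ctx *ctx, int server_entries)
{
	ctx->uncached_calls++;
	if (server_entries == 0)
		ctx->eof = true;      /* cache the fact for later calls */
	return server_entries;
}

static int model_readdir(struct model_dir_ctx *ctx, int server_entries)
{
	if (ctx->eof)
		return 0;             /* already known: skip the round trip */
	return model_uncached_readdir(ctx, server_entries);
}
```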
      
      Fixes: 794092c5 ("NFS: Do uncached readdir when we're seeking a cookie in an empty page cache")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      e1d2699b
    • D
      Partially revert "net/smc: Add netlink net namespace support" · c86d8613
      Dmitry V. Levin committed
      The change of sizeof(struct smc_diag_linkinfo) by commit 79d39fc5
      ("net/smc: Add netlink net namespace support") introduced an ABI
      regression: since struct smc_diag_lgrinfo contains an object of
      type "struct smc_diag_linkinfo", offset of all subsequent members
      of struct smc_diag_lgrinfo was changed by that change.
      
      As a result, applications compiled with the old version
      of struct smc_diag_linkinfo will receive garbage in
      struct smc_diag_lgrinfo.role if the kernel implements
      this new version of struct smc_diag_linkinfo.
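
      A sketch of why growing an embedded struct is an ABI break, using
      hypothetical simplified layouts rather than the real smc_diag
      definitions: appending a member to the inner struct moves every later
      member of the outer struct, so a binary built against the old layout
      reads role from the wrong offset.

```c
#include <stddef.h>
#include <stdint.h>

struct model_linkinfo_old {
	uint8_t link_id;
	uint8_t ibname[64];
	uint8_t ibport;
};

struct model_linkinfo_new {
	uint8_t link_id;
	uint8_t ibname[64];
	uint8_t ibport;
	uint32_t extra;               /* hypothetical appended member */
};

struct model_lgrinfo_old {
	struct model_linkinfo_old lnk[1];
	uint8_t role;                 /* offset = sizeof(old linkinfo) */
};

struct model_lgrinfo_new {
	struct model_linkinfo_new lnk[1];
	uint8_t role;                 /* pushed back by the larger lnk[0] */
};
```

      With these layouts, an old binary reads role at the offset where the
      enlarged lnk[0] now lives.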
      
      Fix this regression by reverting the part of commit 79d39fc5 that
      changes struct smc_diag_linkinfo.  After all, there is SMC_GEN_NETLINK
      interface which is good enough, so there is probably no need to touch
      the smc_diag ABI in the first place.
      
      Fixes: 79d39fc5 ("net/smc: Add netlink net namespace support")
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220202030904.GA9742@altlinux.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c86d8613
    • H
      Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)" · 1148836f
      Helge Deller committed
      This reverts commit b3ec8cdf.
      
      Revert the second of the two commits which disabled scrolling acceleration
      in fbcon/fbdev.  It introduced a regression for fbdev-supported graphic
      cards because of the performance penalty of doing screen scrolling in
      software instead of using the existing graphic card 2D hardware
      acceleration.
      
      Console scrolling acceleration was disabled by dropping code which
      checked the driver's hardware capabilities at runtime for the
      FBINFO_HWACCEL_COPYAREA or FBINFO_HWACCEL_FILLRECT flags and, if set,
      enabled scrollmode SCROLL_MOVE, which uses hardware acceleration to move
      screen contents.  After dropping those checks, scrollmode was hard-wired
      to SCROLL_REDRAW instead, which forces all graphic cards to redraw every
      character at the new screen position when scrolling.
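
      A deliberately simplified model of the selection logic as described
      above; the real fbcon code weighs more conditions, such as panning and
      wrapping support. Flag values and model_* names are illustrative only.

```c
/* Illustrative flag values for this sketch only. */
#define MODEL_HWACCEL_COPYAREA 0x0100u
#define MODEL_HWACCEL_FILLRECT 0x0200u

enum model_scrollmode { MODEL_SCROLL_MOVE, MODEL_SCROLL_REDRAW };

/* Runtime capability check: use SCROLL_MOVE when the driver
 * advertises 2D copy or fill acceleration. */
static enum model_scrollmode model_pick_scrollmode(unsigned int flags)
{
	if (flags & (MODEL_HWACCEL_COPYAREA | MODEL_HWACCEL_FILLRECT))
		return MODEL_SCROLL_MOVE;
	return MODEL_SCROLL_REDRAW;
}

/* Behaviour after the reverted commit: the flags are ignored. */
static enum model_scrollmode model_pick_scrollmode_hardwired(unsigned int flags)
{
	(void)flags;
	return MODEL_SCROLL_REDRAW;
}
```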
      
      This change effectively disabled all hardware-based scrolling acceleration for
      ALL drivers, because all kinds of 2D hardware acceleration (bitblt,
      fillrect) in the drivers are no longer used.
      
      The original commit message mentions that only 3 DRM drivers (nouveau, omapdrm
      and gma500) used hardware acceleration in the past and thus code for checking
      and using scrolling acceleration is obsolete.
      
      This statement is NOT TRUE, because besides the DRM drivers there are around 35
      other fbdev drivers which depend on fbdev/fbcon and still provide hardware
      acceleration for fbdev/fbcon.
      
      The original commit message also states that syzbot found lots of bugs in fbcon
      and thus it's "often the solution to just delete code and remove features".
      This is true, and the bugs - which actually affected all users of fbcon,
      including DRM - were fixed, or code was dropped like e.g. the support for
      software scrollback in vgacon (commit 973c096f).
      
      So to further analyze which bugs were found by syzbot, I've looked through all
      patches in drivers/video which were tagged with syzbot or syzkaller back to
      the year 2005. The vast majority fixed the reported issues on a higher level, e.g.
      when the screen is to be resized, or when the font size is to be changed. The few
      which touched driver code fixed a real driver bug, e.g. by adding a check.
      
      But NONE of those patches touched code of either the SCROLL_MOVE or the
      SCROLL_REDRAW case.
      
      That means there was no real reason why SCROLL_MOVE had to be ripped out and
      just SCROLL_REDRAW had to be used instead. The only reason I can imagine so far
      was that SCROLL_MOVE wasn't used by DRM and as such it was assumed that it
      could go away. That argument completely missed the fact that SCROLL_MOVE is
      still heavily used by fbdev (non-DRM) drivers.
      
      Some people mention that using memcpy() instead of the hardware acceleration is
      pretty much the same speed. But that's not true, at least not for older graphic
      cards and machines, where we see speed decreases by a factor of 10 or more, and
      thus this change makes console responsiveness far worse than before.
      
      That's why the original commit is to be reverted. By reverting we
      reintroduce hardware-based scrolling acceleration and fix the
      performance regression for fbdev drivers.
      
      There isn't any impact on DRM when reverting those patches.

      Signed-off-by: Helge Deller <deller@gmx.de>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Sven Schnelle <svens@stackframe.org>
      Cc: stable@vger.kernel.org # v5.16+
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220202135531.92183-2-deller@gmx.de
      1148836f