1. 21 2月, 2022 1 次提交
  2. 19 2月, 2022 1 次提交
  3. 18 2月, 2022 1 次提交
    • E
      ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt · d95d6320
      Eric Dumazet 提交于
      Because fib6_info_hw_flags_set() is called without any synchronization,
      all accesses to gi6->offload, fi->trap and fi->offload_failed
      need some basic protection like READ_ONCE()/WRITE_ONCE().
      
      BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt
      
      read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
       fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
       fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
       fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
       __ip6_del_rt net/ipv6/route.c:3876 [inline]
       ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
       __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
       ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
       __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
       ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
       inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
       __sock_release net/socket.c:650 [inline]
       sock_close+0x6c/0x150 net/socket.c:1318
       __fput+0x295/0x520 fs/file_table.c:280
       ____fput+0x11/0x20 fs/file_table.c:313
       task_work_run+0x8e/0x110 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
       fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
       nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
       nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
       nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
       nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
       nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x22 -> 0x2a
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 0c5fcf9e ("IPv6: Add "offload failed" indication to routes")
      Fixes: bb3c4ab9 ("ipv6: Add "offload" and "trap" indications to routes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Amit Cohen <amcohen@nvidia.com>
      Cc: Ido Schimmel <idosch@nvidia.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      d95d6320
  4. 17 2月, 2022 2 次提交
  5. 15 2月, 2022 1 次提交
    • E
      bonding: fix data-races around agg_select_timer · 9ceaf6f7
      Eric Dumazet 提交于
      syzbot reported that two threads might write over agg_select_timer
      at the same time. Make agg_select_timer atomic to fix the races.
      
      BUG: KCSAN: data-race in bond_3ad_initiate_agg_selection / bond_3ad_state_machine_handler
      
      read to 0xffff8881242aea90 of 4 bytes by task 1846 on cpu 1:
       bond_3ad_state_machine_handler+0x99/0x2810 drivers/net/bonding/bond_3ad.c:2317
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      write to 0xffff8881242aea90 of 4 bytes by task 25910 on cpu 0:
       bond_3ad_initiate_agg_selection+0x18/0x30 drivers/net/bonding/bond_3ad.c:1998
       bond_open+0x658/0x6f0 drivers/net/bonding/bond_main.c:3967
       __dev_open+0x274/0x3a0 net/core/dev.c:1407
       dev_open+0x54/0x190 net/core/dev.c:1443
       bond_enslave+0xcef/0x3000 drivers/net/bonding/bond_main.c:1937
       do_set_master net/core/rtnetlink.c:2532 [inline]
       do_setlink+0x94f/0x2500 net/core/rtnetlink.c:2736
       __rtnl_newlink net/core/rtnetlink.c:3414 [inline]
       rtnl_newlink+0xfeb/0x13e0 net/core/rtnetlink.c:3529
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000050 -> 0x0000004f
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 25910 Comm: syz-executor.1 Tainted: G        W         5.17.0-rc4-syzkaller-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ceaf6f7
  6. 14 2月, 2022 3 次提交
    • E
      net_sched: add __rcu annotation to netdev->qdisc · 5891cd5e
      Eric Dumazet 提交于
      syzbot found a data-race [1] which lead me to add __rcu
      annotations to netdev->qdisc, and proper accessors
      to get LOCKDEP support.
      
      [1]
      BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu
      
      write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
       attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
       dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
       __dev_open+0x2e9/0x3a0 net/core/dev.c:1416
       __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
       rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
       __rtnl_newlink net/core/rtnetlink.c:3489 [inline]
       rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
       qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
       __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
       tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
       rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 470502de ("net: sched: unlock rules update API")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5891cd5e
    • V
      net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN · a2614140
      Vladimir Oltean 提交于
      mv88e6xxx is special among DSA drivers in that it requires the VTU to
      contain the VID of the FDB entry it modifies in
      mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP.
      
      Sometimes due to races this is not always satisfied even if external
      code does everything right (first deletes the FDB entries, then the
      VLAN), because DSA commits to hardware FDB entries asynchronously since
      commit c9eb3e0f ("net: dsa: Add support for learning FDB through
      notification").
      
      Therefore, the mv88e6xxx driver must close this race condition by
      itself, by asking DSA to flush the switchdev workqueue of any FDB
      deletions in progress, prior to exiting a VLAN.
      
      Fixes: c9eb3e0f ("net: dsa: Add support for learning FDB through notification")
      Reported-by: NRafael Richter <rafael.richter@gin.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2614140
    • I
      ipv6: mcast: use rcu-safe version of ipv6_get_lladdr() · 26394fc1
      Ignat Korchagin 提交于
      Some time ago 8965779d ("ipv6,mcast: always hold idev->lock before mca_lock")
      switched ipv6_get_lladdr() to __ipv6_get_lladdr(), which is rcu-unsafe
      version. That was OK, because idev->lock was held for these codepaths.
      
      In 88e2ca30 ("mld: convert ifmcaddr6 to RCU") these external locks were
      removed, so we probably need to restore the original rcu-safe call.
      
      Otherwise, we occasionally get a machine crashed/stalled with the following
      in dmesg:
      
      [ 3405.966610][T230589] general protection fault, probably for non-canonical address 0xdead00000000008c: 0000 [#1] SMP NOPTI
      [ 3405.982083][T230589] CPU: 44 PID: 230589 Comm: kworker/44:3 Tainted: G           O      5.15.19-cloudflare-2022.2.1 #1
      [ 3405.998061][T230589] Hardware name: SUPA-COOL-SERV
      [ 3406.009552][T230589] Workqueue: mld mld_ifc_work
      [ 3406.017224][T230589] RIP: 0010:__ipv6_get_lladdr+0x34/0x60
      [ 3406.025780][T230589] Code: 57 10 48 83 c7 08 48 89 e5 48 39 d7 74 3e 48 8d 82 38 ff ff ff eb 13 48 8b 90 d0 00 00 00 48 8d 82 38 ff ff ff 48 39 d7 74 22 <66> 83 78 32 20 77 1b 75 e4 89 ca 23 50 2c 75 dd 48 8b 50 08 48 8b
      [ 3406.055748][T230589] RSP: 0018:ffff94e4b3fc3d10 EFLAGS: 00010202
      [ 3406.065617][T230589] RAX: dead00000000005a RBX: ffff94e4b3fc3d30 RCX: 0000000000000040
      [ 3406.077477][T230589] RDX: dead000000000122 RSI: ffff94e4b3fc3d30 RDI: ffff8c3a31431008
      [ 3406.089389][T230589] RBP: ffff94e4b3fc3d10 R08: 0000000000000000 R09: 0000000000000000
      [ 3406.101445][T230589] R10: ffff8c3a31430000 R11: 000000000000000b R12: ffff8c2c37887100
      [ 3406.113553][T230589] R13: ffff8c3a39537000 R14: 00000000000005dc R15: ffff8c3a31431000
      [ 3406.125730][T230589] FS:  0000000000000000(0000) GS:ffff8c3b9fc80000(0000) knlGS:0000000000000000
      [ 3406.138992][T230589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3406.149895][T230589] CR2: 00007f0dfea1db60 CR3: 000000387b5f2000 CR4: 0000000000350ee0
      [ 3406.162421][T230589] Call Trace:
      [ 3406.170235][T230589]  <TASK>
      [ 3406.177736][T230589]  mld_newpack+0xfe/0x1a0
      [ 3406.186686][T230589]  add_grhead+0x87/0xa0
      [ 3406.195498][T230589]  add_grec+0x485/0x4e0
      [ 3406.204310][T230589]  ? newidle_balance+0x126/0x3f0
      [ 3406.214024][T230589]  mld_ifc_work+0x15d/0x450
      [ 3406.223279][T230589]  process_one_work+0x1e6/0x380
      [ 3406.232982][T230589]  worker_thread+0x50/0x3a0
      [ 3406.242371][T230589]  ? rescuer_thread+0x360/0x360
      [ 3406.252175][T230589]  kthread+0x127/0x150
      [ 3406.261197][T230589]  ? set_kthread_struct+0x40/0x40
      [ 3406.271287][T230589]  ret_from_fork+0x22/0x30
      [ 3406.280812][T230589]  </TASK>
      [ 3406.288937][T230589] Modules linked in: ... [last unloaded: kheaders]
      [ 3406.476714][T230589] ---[ end trace 3525a7655f2f3b9e ]---
      
      Fixes: 88e2ca30 ("mld: convert ifmcaddr6 to RCU")
      Reported-by: NDavid Pinilla Caparros <dpini@cloudflare.com>
      Signed-off-by: NIgnat Korchagin <ignat@cloudflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26394fc1
  7. 12 2月, 2022 2 次提交
    • P
      kfence: make test case compatible with run time set sample interval · 8913c610
      Peng Liu 提交于
      The parameter kfence_sample_interval can be set via boot parameter and
      late shell command, which is convenient for automated tests and KFENCE
      parameter optimization.  However, KFENCE test case just uses
      compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, which will make KFENCE test
      case not run as users desired.  Export kfence_sample_interval, so that
      KFENCE test case can use run-time-set sample interval.
      
      Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.comSigned-off-by: NPeng Liu <liupeng256@huawei.com>
      Reviewed-by: NMarco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Christian Knig <christian.koenig@amd.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8913c610
    • R
      mm: memcg: synchronize objcg lists with a dedicated spinlock · 0764db9b
      Roman Gushchin 提交于
      Alexander reported a circular lock dependency revealed by the mmap1 ltp
      test:
      
        LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
                WARNING: possible circular locking dependency detected
                5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
                ------------------------------------------------------
                mmap1/202299 is trying to acquire lock:
                00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
                but task is already holding lock:
                00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                which lock already depends on the new lock.
                the existing dependency chain (in reverse order) is:
                -> #1 (&sighand->siglock){-.-.}-{2:2}:
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       __lock_task_sighand+0x90/0x190
                       cgroup_freeze_task+0x2e/0x90
                       cgroup_migrate_execute+0x11c/0x608
                       cgroup_update_dfl_csses+0x246/0x270
                       cgroup_subtree_control_write+0x238/0x518
                       kernfs_fop_write_iter+0x13e/0x1e0
                       new_sync_write+0x100/0x190
                       vfs_write+0x22c/0x2d8
                       ksys_write+0x6c/0xf8
                       __do_syscall+0x1da/0x208
                       system_call+0x82/0xb0
                -> #0 (css_set_lock){..-.}-{2:2}:
                       check_prev_add+0xe0/0xed8
                       validate_chain+0x736/0xb20
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       obj_cgroup_release+0x4a/0xe0
                       percpu_ref_put_many.constprop.0+0x150/0x168
                       drain_obj_stock+0x94/0xe8
                       refill_obj_stock+0x94/0x278
                       obj_cgroup_charge+0x164/0x1d8
                       kmem_cache_alloc+0xac/0x528
                       __sigqueue_alloc+0x150/0x308
                       __send_signal+0x260/0x550
                       send_signal+0x7e/0x348
                       force_sig_info_to_task+0x104/0x180
                       force_sig_fault+0x48/0x58
                       __do_pgm_check+0x120/0x1f0
                       pgm_check_handler+0x11e/0x180
                other info that might help us debug this:
                 Possible unsafe locking scenario:
                       CPU0                    CPU1
                       ----                    ----
                  lock(&sighand->siglock);
                                               lock(css_set_lock);
                                               lock(&sighand->siglock);
                  lock(css_set_lock);
                 *** DEADLOCK ***
                2 locks held by mmap1/202299:
                 #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                 #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
                stack backtrace:
                CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
                Hardware name: IBM 3906 M04 704 (LPAR)
                Call Trace:
                  dump_stack_lvl+0x76/0x98
                  check_noncircular+0x136/0x158
                  check_prev_add+0xe0/0xed8
                  validate_chain+0x736/0xb20
                  __lock_acquire+0x604/0xbd8
                  lock_acquire.part.0+0xe2/0x238
                  lock_acquire+0xb0/0x200
                  _raw_spin_lock_irqsave+0x6a/0xd8
                  obj_cgroup_release+0x4a/0xe0
                  percpu_ref_put_many.constprop.0+0x150/0x168
                  drain_obj_stock+0x94/0xe8
                  refill_obj_stock+0x94/0x278
                  obj_cgroup_charge+0x164/0x1d8
                  kmem_cache_alloc+0xac/0x528
                  __sigqueue_alloc+0x150/0x308
                  __send_signal+0x260/0x550
                  send_signal+0x7e/0x348
                  force_sig_info_to_task+0x104/0x180
                  force_sig_fault+0x48/0x58
                  __do_pgm_check+0x120/0x1f0
                  pgm_check_handler+0x11e/0x180
                INFO: lockdep is turned off.
      
      In this example a slab allocation from __send_signal() caused a
      refilling and draining of a percpu objcg stock, resulted in a releasing
      of another non-related objcg.  Objcg release path requires taking the
      css_set_lock, which is used to synchronize objcg lists.
      
      This can create a circular dependency with the sighandler lock, which is
      taken with the locked css_set_lock by the freezer code (to freeze a
      task).
      
      In general it seems that using css_set_lock to synchronize objcg lists
      makes any slab allocations and deallocation with the locked css_set_lock
      and any intervened locks risky.
      
      To fix the problem and make the code more robust let's stop using
      css_set_lock to synchronize objcg lists and use a new dedicated spinlock
      instead.
      
      Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
      Fixes: bf4f0599 ("mm: memcg/slab: obj_cgroup API")
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Reported-by: NAlexander Egorenkov <egorenar@linux.ibm.com>
      Tested-by: NAlexander Egorenkov <egorenar@linux.ibm.com>
      Reviewed-by: NWaiman Long <longman@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Reviewed-by: NJeremy Linton <jeremy.linton@arm.com>
      Tested-by: NJeremy Linton <jeremy.linton@arm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0764db9b
  8. 09 2月, 2022 3 次提交
  9. 08 2月, 2022 2 次提交
    • R
      PM: s2idle: ACPI: Fix wakeup interrupts handling · cb1f65c1
      Rafael J. Wysocki 提交于
      After commit e3728b50 ("ACPI: PM: s2idle: Avoid possible race
      related to the EC GPE") wakeup interrupts occurring immediately after
      the one discarded by acpi_s2idle_wake() may be missed.  Moreover, if
      the SCI triggers again immediately after the rearming in
      acpi_s2idle_wake(), that wakeup may be missed too.
      
      The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup()
      when pm_wakeup_irq is 0, but that's not the case any more after the
      interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is
      cleared by the pm_wakeup_clear() call in s2idle_loop().  However,
      there may be wakeup interrupts occurring in that time frame and if
      that happens, they will be missed.
      
      To address that issue first move the clearing of pm_wakeup_irq to
      the point at which it is known that the interrupt causing
      acpi_s2idle_wake() to tun will be discarded, before rearming the SCI
      for wakeup.  Moreover, because that only reduces the size of the
      time window in which the issue may manifest itself, allow
      pm_system_irq_wakeup() to register two second wakeup interrupts in
      a row and, when discarding the first one, replace it with the second
      one.  [Of course, this assumes that only one wakeup interrupt can be
      discarded in one go, but currently that is the case and I am not
      aware of any plans to change that.]
      
      Fixes: e3728b50 ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE")
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      cb1f65c1
    • M
      Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64) · 6bf625a4
      Michael Kelley 提交于
      Using DMA_BIT_MASK(64) as an initializer for a global variable
      causes problems with Clang 12.0.1. The compiler doesn't understand
      that value 64 is excluded from the shift at compile time, resulting
      in a build error.
      
      While this is a compiler problem, avoid the issue by setting up
      the dma_mask memory as part of struct hv_device, and initialize
      it using dma_set_mask().
      Reported-by: NNathan Chancellor <nathan@kernel.org>
      Reported-by: NVitaly Chikunov <vt@altlinux.org>
      Reported-by: NJakub Kicinski <kuba@kernel.org>
      Fixes: 743b237c ("scsi: storvsc: Add Isolation VM support for storvsc driver")
      Signed-off-by: NMichael Kelley <mikelley@microsoft.com>
      Reviewed-by: NNathan Chancellor <nathan@kernel.org>
      Tested-by: NNathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/r/1644176216-12531-1-git-send-email-mikelley@microsoft.comSigned-off-by: NWei Liu <wei.liu@kernel.org>
      6bf625a4
  10. 07 2月, 2022 1 次提交
    • D
      ata: libata-core: Fix ata_dev_config_cpr() · fda17afc
      Damien Le Moal 提交于
      The concurrent positioning ranges log page 47h is a general purpose log
      page and not a subpage of the indentify device log. Using
      ata_identify_page_supported() to test for concurrent positioning ranges
      support is thus wrong. ata_log_supported() must be used.
      
      Furthermore, unlike other advanced ATA features (e.g. NCQ priority),
      accesses to the concurrent positioning ranges log page are not gated by
      a feature bit from the device IDENTIFY data. Since many older drives
      react badly to the READ LOG EXT and/or READ LOG DMA EXT commands isued
      to read device log pages, avoid problems with older drives by limiting
      the concurrent positioning ranges support detection to drives
      implementing at least the ACS-4 ATA standard (major version 11). This
      additional condition effectively turns ata_dev_config_cpr() into a nop
      for older drives, avoiding problems in the field.
      
      Fixes: fe22e1c2 ("libata: support concurrent positioning ranges log")
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215519
      Cc: stable@vger.kernel.org
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Tested-by: NAbderraouf Adjal <adjal.arf@gmail.com>
      Signed-off-by: NDamien Le Moal <damien.lemoal@opensource.wdc.com>
      fda17afc
  11. 05 2月, 2022 3 次提交
  12. 04 2月, 2022 4 次提交
    • A
      ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage · ac9f0c81
      Anton Lundin 提交于
      06f6c4c6 ("ata: libata: add missing ata_identify_page_supported() calls")
      introduced additional calls to ata_identify_page_supported(), thus also
      adding indirectly accesses to the device log directory log page through
      ata_log_supported(). Reading this log page causes SATADOM-ML 3ME devices
      to lock up.
      
      Introduce the horkage flag ATA_HORKAGE_NO_LOG_DIR to prevent accesses to
      the log directory in ata_log_supported() and add a blacklist entry
      with this flag for "SATADOM-ML 3ME" devices.
      
      Fixes: 636f6e2a ("libata: add horkage for missing Identify Device log")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: NAnton Lundin <glance@acc.umu.se>
      Signed-off-by: NDamien Le Moal <damien.lemoal@opensource.wdc.com>
      ac9f0c81
    • F
      netfilter: ctnetlink: disable helper autoassign · d1ca60ef
      Florian Westphal 提交于
      When userspace, e.g. conntrackd, inserts an entry with a specified helper,
      its possible that the helper is lost immediately after its added:
      
      ctnetlink_create_conntrack
        -> nf_ct_helper_ext_add + assign helper
          -> ctnetlink_setup_nat
            -> ctnetlink_parse_nat_setup
               -> parse_nat_setup -> nfnetlink_parse_nat_setup
      	                       -> nf_nat_setup_info
                                       -> nf_conntrack_alter_reply
                                         -> __nf_ct_try_assign_helper
      
      ... and __nf_ct_try_assign_helper will zero the helper again.
      
      Set IPS_HELPER bit to bypass auto-assign logic, its unwanted, just like
      when helper is assigned via ruleset.
      
      Dropped old 'not strictly necessary' comment, it referred to use of
      rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER().
      
      NB: Fixes tag intentionally incorrect, this extends the referenced commit,
      but this change won't build without IPS_HELPER introduced there.
      
      Fixes: 6714cf54 ("netfilter: nf_conntrack: fix explicit helper attachment and NAT")
      Reported-by: NPham Thanh Tuyen <phamtyn@gmail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d1ca60ef
    • D
      ax25: fix reference count leaks of ax25_dev · 87563a04
      Duoming Zhou 提交于
      The previous commit d01ffb9e ("ax25: add refcount in ax25_dev
      to avoid UAF bugs") introduces refcount into ax25_dev, but there
      are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(),
      ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().
      
      This patch uses ax25_dev_put() and adjusts the position of
      ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev.
      
      Fixes: d01ffb9e ("ax25: add refcount in ax25_dev to avoid UAF bugs")
      Signed-off-by: NDuoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: NDan Carpenter <dan.carpenter@oracle.com>
      Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      87563a04
    • I
      Revert "module, async: async_synchronize_full() on module init iff async is used" · 67d6212a
      Igor Pylypiv 提交于
      This reverts commit 774a1221.
      
      We need to finish all async code before the module init sequence is
      done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
      thread that called async_schedule().  Then the PF_USED_ASYNC flag was
      used to determine whether or not async_synchronize_full() needs to be
      invoked.  This works when modprobe thread is calling async_schedule(),
      but it does not work if module dispatches init code to a worker thread
      which then calls async_schedule().
      
      For example, PCI driver probing is invoked from a worker thread based on
      a node where device is attached:
      
      	if (cpu < nr_cpu_ids)
      		error = work_on_cpu(cpu, local_pci_probe, &ddi);
      	else
      		error = local_pci_probe(&ddi);
      
      We end up in a situation where a worker thread gets the PF_USED_ASYNC
      flag set instead of the modprobe thread.  As a result,
      async_synchronize_full() is not invoked and modprobe completes without
      waiting for the async code to finish.
      
      The issue was discovered while loading the pm80xx driver:
      (scsi_mod.scan=async)
      
      modprobe pm80xx                      worker
      ...
        do_init_module()
        ...
          pci_call_probe()
            work_on_cpu(local_pci_probe)
                                           local_pci_probe()
                                             pm8001_pci_probe()
                                               scsi_scan_host()
                                                 async_schedule()
                                                 worker->flags |= PF_USED_ASYNC;
                                           ...
            < return from worker >
        ...
        if (current->flags & PF_USED_ASYNC) <--- false
        	async_synchronize_full();
      
      Commit 21c3c5d2 ("block: don't request module during elevator init")
      fixed the deadlock issue which the reverted commit 774a1221
      ("module, async: async_synchronize_full() on module init iff async is
      used") tried to fix.
      
      Since commit 0fdff3ec ("async, kmod: warn on synchronous
      request_module() from async workers") synchronous module loading from
      async is not allowed.
      
      Given that the original deadlock issue is fixed and it is no longer
      allowed to call synchronous request_module() from async we can remove
      PF_USED_ASYNC flag to make module init consistently invoke
      async_synchronize_full() unless async module probe is requested.
      Signed-off-by: NIgor Pylypiv <ipylypiv@google.com>
      Reviewed-by: NChangyuan Lyu <changyuanl@google.com>
      Reviewed-by: NLuis Chamberlain <mcgrof@kernel.org>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      67d6212a
  13. 03 2月, 2022 9 次提交
  14. 02 2月, 2022 6 次提交
    • T
      NFS: Avoid duplicate uncached readdir calls on eof · e1d2699b
      Trond Myklebust 提交于
      If we've reached the end of the directory, then cache that information
      in the context so that we don't need to do an uncached readdir in order
      to rediscover that fact.
      
      Fixes: 794092c5 ("NFS: Do uncached readdir when we're seeking a cookie in an empty page cache")
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      e1d2699b
    • D
      Partially revert "net/smc: Add netlink net namespace support" · c86d8613
      Dmitry V. Levin 提交于
      The change of sizeof(struct smc_diag_linkinfo) by commit 79d39fc5
      ("net/smc: Add netlink net namespace support") introduced an ABI
      regression: since struct smc_diag_lgrinfo contains an object of
      type "struct smc_diag_linkinfo", offset of all subsequent members
      of struct smc_diag_lgrinfo was changed by that change.
      
      As result, applications compiled with the old version
      of struct smc_diag_linkinfo will receive garbage in
      struct smc_diag_lgrinfo.role if the kernel implements
      this new version of struct smc_diag_linkinfo.
      
      Fix this regression by reverting the part of commit 79d39fc5 that
      changes struct smc_diag_linkinfo.  After all, there is SMC_GEN_NETLINK
      interface which is good enough, so there is probably no need to touch
      the smc_diag ABI in the first place.
      
      Fixes: 79d39fc5 ("net/smc: Add netlink net namespace support")
      Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
      Reviewed-by: NKarsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220202030904.GA9742@altlinux.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      c86d8613
    • H
      Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)" · 1148836f
      Helge Deller 提交于
      This reverts commit b3ec8cdf.
      
      Revert the second (of 2) commits which disabled scrolling acceleration
      in fbcon/fbdev.  It introduced a regression for fbdev-supported graphic
      cards because of the performance penalty by doing screen scrolling by
      software instead of using the existing graphic card 2D hardware
      acceleration.
      
      Console scrolling acceleration was disabled by dropping code which
      checked at runtime the driver hardware capabilities for the
      BINFO_HWACCEL_COPYAREA or FBINFO_HWACCEL_FILLRECT flags and if set, it
      enabled scrollmode SCROLL_MOVE which uses hardware acceleration to move
      screen contents.  After dropping those checks scrollmode was hard-wired
      to SCROLL_REDRAW instead, which forces all graphic cards to redraw every
      character at the new screen position when scrolling.
      
      This change effectively disabled all hardware-based scrolling acceleration for
      ALL drivers, because now all kind of 2D hardware acceleration (bitblt,
      fillrect) in the drivers isn't used any longer.
      
      The original commit message mentions that only 3 DRM drivers (nouveau, omapdrm
      and gma500) used hardware acceleration in the past and thus code for checking
      and using scrolling acceleration is obsolete.
      
      This statement is NOT TRUE, because beside the DRM drivers there are around 35
      other fbdev drivers which depend on fbdev/fbcon and still provide hardware
      acceleration for fbdev/fbcon.
      
      The original commit message also states that syzbot found lots of bugs in fbcon
      and thus it's "often the solution to just delete code and remove features".
      This is true, and the bugs - which actually affected all users of fbcon,
      including DRM - were fixed, or code was dropped like e.g. the support for
      software scrollback in vgacon (commit 973c096f).
      
      So to further analyze which bugs were found by syzbot, I've looked through all
      patches in drivers/video which were tagged with syzbot or syzkaller back to
      year 2005. The vast majority fixed the reported issues on a higher level, e.g.
      when screen is to be resized, or when font size is to be changed. The few ones
      which touched driver code fixed a real driver bug, e.g. by adding a check.
      
      But NONE of those patches touched code of either the SCROLL_MOVE or the
      SCROLL_REDRAW case.
      
      That means, there was no real reason why SCROLL_MOVE had to be ripped-out and
      just SCROLL_REDRAW had to be used instead. The only reason I can imagine so far
      was that SCROLL_MOVE wasn't used by DRM and as such it was assumed that it
      could go away. That argument completely missed the fact that SCROLL_MOVE is
      still heavily used by fbdev (non-DRM) drivers.
      
      Some people mention that using memcpy() instead of the hardware acceleration is
      pretty much the same speed. But that's not true, at least not for older graphic
      cards and machines where we see speed decreases by factor 10 and more and thus
      this change leads to console responsiveness way worse than before.
      
      That's why the original commit is to be reverted. By reverting we
      reintroduce hardware-based scrolling acceleration and fix the
      performance regression for fbdev drivers.
      
      There isn't any impact on DRM when reverting those patches.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NSven Schnelle <svens@stackframe.org>
      Cc: stable@vger.kernel.org # v5.16+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220202135531.92183-2-deller@gmx.de
      1148836f
    • M
      perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures · ddecd228
      Marco Elver 提交于
      Due to the alignment requirements of siginfo_t, as described in
      3ddb3fd8 ("signal, perf: Fix siginfo_t by avoiding u64 on 32-bit
      architectures"), siginfo_t::si_perf_data is limited to an unsigned long.
      
      However, perf_event_attr::sig_data is an u64, to avoid having to deal
      with compat conversions. Due to being an u64, it may not immediately be
      clear to users that sig_data is truncated on 32 bit architectures.
      
      Add a comment to explicitly point this out, and hopefully help some
      users save time by not having to deduce themselves what's happening.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NMarco Elver <elver@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NDmitry Vyukov <dvyukov@google.com>
      Link: https://lore.kernel.org/r/20220131103407.1971678-3-elver@google.com
      ddecd228
    • K
      net/mlx5e: Use struct_group() for memcpy() region · 6d5c900e
      Kees Cook 提交于
      In preparation for FORTIFY_SOURCE performing compile-time and run-time
      field bounds checking for memcpy(), memmove(), and memset(), avoid
      intentionally writing across neighboring fields.
      
      Use struct_group() in struct vlan_ethhdr around members h_dest and
      h_source, so they can be referenced together. This will allow memcpy()
      and sizeof() to more easily reason about sizes, improve readability,
      and avoid future warnings about writing beyond the end of h_dest.
      
      "pahole" shows no size nor member offset changes to struct vlan_ethhdr.
      "objdump -d" shows no object code changes.
      
      Fixes: 34802a42 ("net/mlx5e: Do not modify the TX SKB")
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      6d5c900e
    • D
      netfs, cachefiles: Add a method to query presence of data in the cache · bee9f655
      David Howells 提交于
      Add a netfs_cache_ops method by which a network filesystem can ask the
      cache about what data it has available and where so that it can make a
      multipage read more efficient.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: linux-cachefs@redhat.com
      Acked-by: NJeff Layton <jlayton@kernel.org>
      Reviewed-by: NRohith Surabattula <rohiths@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      bee9f655
  15. 01 2月, 2022 1 次提交
    • M
      kvm: add guest_state_{enter,exit}_irqoff() · ef9989af
      Mark Rutland 提交于
      When transitioning to/from guest mode, it is necessary to inform
      lockdep, tracing, and RCU in a specific order, similar to the
      requirements for transitions to/from user mode. Additionally, it is
      necessary to perform vtime accounting for a window around running the
      guest, with RCU enabled, such that timer interrupts taken from the guest
      can be accounted as guest time.
      
      Most architectures don't handle all the necessary pieces, and a have a
      number of common bugs, including unsafe usage of RCU during the window
      between guest_enter() and guest_exit().
      
      On x86, this was dealt with across commits:
      
        87fa7f3e ("x86/kvm: Move context tracking where it belongs")
        0642391e ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
        9fc975e9 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
        3ebccdf3 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
        135961e0 ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
        16045714 ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
        bc908e09 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
      
      ... but those fixes are specific to x86, and as the resulting logic
      (while correct) is split across generic helper functions and
      x86-specific helper functions, it is difficult to see that the
      entry/exit accounting is balanced.
      
      This patch adds generic helpers which architectures can use to handle
      guest entry/exit consistently and correctly. The guest_{enter,exit}()
      helpers are split into guest_timing_{enter,exit}() to perform vtime
      accounting, and guest_context_{enter,exit}() to perform the necessary
      context tracking and RCU management. The existing guest_{enter,exit}()
      heleprs are left as wrappers of these.
      
      Atop this, new guest_state_enter_irqoff() and guest_state_exit_irqoff()
      helpers are added to handle the ordering of lockdep, tracing, and RCU
      manageent. These are inteneded to mirror exit_to_user_mode() and
      enter_from_user_mode().
      
      Subsequent patches will migrate architectures over to the new helpers,
      following a sequence:
      
      	guest_timing_enter_irqoff();
      
      	guest_state_enter_irqoff();
      	< run the vcpu >
      	guest_state_exit_irqoff();
      
      	< take any pending IRQs >
      
      	guest_timing_exit_irqoff();
      
      This sequences handles all of the above correctly, and more clearly
      balances the entry and exit portions, making it easier to understand.
      
      The existing helpers are marked as deprecated, and will be removed once
      all architectures have been converted.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NMarc Zyngier <maz@kernel.org>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NNicolas Saenz Julienne <nsaenzju@redhat.com>
      Message-Id: <20220201132926.3301912-2-mark.rutland@arm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ef9989af