1. 25 Jul, 2021 1 commit
    • L
      Merge branch 'akpm' (patches from Andrew) · bca1d4de
      Linus Torvalds authored
      Merge misc mm fixes from Andrew Morton:
       "15 patches.
      
        VM subsystems affected by this patch series: userfaultfd, kfence,
        highmem, pagealloc, memblock, pagecache, secretmem, pagemap, and
        hugetlbfs"
      
      * akpm:
        hugetlbfs: fix mount mode command line processing
        mm: fix the deadlock in finish_fault()
        mm: mmap_lock: fix disabling preemption directly
        mm/secretmem: wire up ->set_page_dirty
        writeback, cgroup: do not reparent dax inodes
        writeback, cgroup: remove wb from offline list before releasing refcnt
        memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
        mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction
        mm: use kmap_local_page in memzero_page
        mm: call flush_dcache_page() in memcpy_to_page() and memzero_page()
        kfence: skip all GFP_ZONEMASK allocations
        kfence: move the size check to the beginning of __kfence_alloc()
        kfence: defer kfence_test_init to ensure that kunit debugfs is created
        selftest: use mmap instead of posix_memalign to allocate memory
        userfaultfd: do not untag user pointers
      bca1d4de
  2. 24 Jul, 2021 24 commits
    • M
      hugetlbfs: fix mount mode command line processing · e0f7e2b2
      Mike Kravetz authored
      In commit 32021982 ("hugetlbfs: Convert to fs_context") processing
      of the mount mode string was changed from match_octal() to fsparam_u32.
      
      This changed existing behavior as match_octal does not require octal
      values to have a '0' prefix, but fsparam_u32 does.
      
      Use fsparam_u32oct which provides the same behavior as match_octal.
      
      Link: https://lkml.kernel.org/r/20210721183326.102716-1-mike.kravetz@oracle.com
      Fixes: 32021982 ("hugetlbfs: Convert to fs_context")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Dennis Camera <bugs+kernel.org@dtnr.ch>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e0f7e2b2
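
The behavior change described in the hugetlbfs commit above can be illustrated with a small sketch. It assumes that a plain u32 parameter is parsed with automatic base detection (a leading "0" selects octal, a leading "0x" selects hex, anything else is decimal), while the octal variant always uses base 8. The function names are hypothetical stand-ins for illustration, not kernel APIs.

```python
def parse_u32_auto(value: str) -> int:
    """Hypothetical stand-in for fsparam_u32-style parsing: base from prefix."""
    text = value.lower()
    if text.startswith("0x"):
        return int(text[2:], 16)
    if text.startswith("0") and len(text) > 1:
        return int(text[1:], 8)
    return int(text, 10)


def parse_u32_octal(value: str) -> int:
    """Hypothetical stand-in for match_octal / fsparam_u32oct: always base 8."""
    return int(value, 8)


if __name__ == "__main__":
    # Compare how the same mount option string is interpreted by each parser.
    for mode in ("777", "0777"):
        print(mode, oct(parse_u32_auto(mode)), oct(parse_u32_octal(mode)))
```

Under these assumptions, a mount option such as mode=777 only yields the intended permission bits with the always-octal parser, which is the regression the commit describes.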
    • Q
      mm: fix the deadlock in finish_fault() · e4dc3489
      Qi Zheng authored
      Commit 63f3655f ("mm, memcg: fix reclaim deadlock with writeback")
      fixed the following ABBA deadlock by pre-allocating the pte page table
      without holding the page lock.
      
                                              lock_page(A)
                                              SetPageWriteback(A)
                                              unlock_page(A)
        lock_page(B)
                                              lock_page(B)
        pte_alloc_one
          shrink_page_list
            wait_on_page_writeback(A)
                                              SetPageWriteback(B)
                                              unlock_page(B)
      
                                              # flush A, B to clear the writeback
      
      Commit f9ce0be7 ("mm: Cleanup faultaround and finish_fault()
      codepaths") reworked the relevant code but ignored this race.  This will
      cause the deadlock above to appear again, so fix it.
      
      Link: https://lkml.kernel.org/r/20210721074849.57004-1-zhengqi.arch@bytedance.com
      Fixes: f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths")
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4dc3489
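
The ordering rule behind the finish_fault() fix can be sketched as a toy model. It is not kernel code: the class and function names are hypothetical, and the "allocation may wait on writeback" hazard is reduced to a check that raises when a page lock is held.

```python
class FaultContext:
    """Toy model of a page fault: tracks whether a page lock is held."""

    def __init__(self):
        self.page_locked = False
        self.prealloc_pte = None

    def pte_alloc_one(self):
        # In this model, allocation may enter reclaim and wait on writeback,
        # which is only safe while no page lock is held.
        if self.page_locked:
            raise RuntimeError("allocation under page lock may deadlock")
        return object()


def fault_alloc_under_lock(ctx):
    """Problematic order: take the page lock, then allocate."""
    ctx.page_locked = True              # models lock_page(B)
    try:
        return ctx.pte_alloc_one()      # may wait on writeback of page A
    finally:
        ctx.page_locked = False


def fault_prealloc(ctx):
    """Safe order: allocate first, then take the page lock."""
    ctx.prealloc_pte = ctx.pte_alloc_one()
    ctx.page_locked = True              # models lock_page(B)
    try:
        return ctx.prealloc_pte         # reuse the preallocated page table
    finally:
        ctx.page_locked = False
```

The point of the sketch is only the ordering: anything that can block on writeback happens before the page lock is taken.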
    • M
      mm: mmap_lock: fix disabling preemption directly · e904c2cc
      Muchun Song authored
      Commit 832b5072 ("mm: mmap_lock: use local locks instead of
      disabling preemption") fixed a bug by using local locks.
      
      But commit d01079f3 ("mm/mmap_lock: remove dead code for
      !CONFIG_TRACING configurations") changed those lines back to the
      original version.
      
      I guess it was introduced by fixing conflicts.
      
      Link: https://lkml.kernel.org/r/20210720074228.76342-1-songmuchun@bytedance.com
      Fixes: d01079f3 ("mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e904c2cc
    • M
      mm/secretmem: wire up ->set_page_dirty · af642374
      Mike Rapoport authored
      Make secretmem up to date with the changes done in commit 0af57378
      ("mm: require ->set_page_dirty to be explicitly wired up") so that
      an unconditional call to this method won't cause crashes.
      
      Link: https://lkml.kernel.org/r/20210716063933.31633-1-rppt@kernel.org
      Fixes: 0af57378 ("mm: require ->set_page_dirty to be explicitly wired up")
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      af642374
    • R
      writeback, cgroup: do not reparent dax inodes · 593311e8
      Roman Gushchin authored
      The inode switching code is not suited for dax inodes.  An attempt to
      switch a dax inode to a parent writeback structure (as a part of a
      writeback cleanup procedure) results in a panic like this:
      
        run fstests generic/270 at 2021-07-15 05:54:02
        XFS (pmem0p2): EXPERIMENTAL big timestamp feature in use.  Use at your own risk!
        XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
        XFS (pmem0p2): EXPERIMENTAL inode btree counters feature in use. Use at your own risk!
        XFS (pmem0p2): Mounting V5 Filesystem
        XFS (pmem0p2): Ending clean mount
        XFS (pmem0p2): Quotacheck needed: Please wait.
        XFS (pmem0p2): Quotacheck: Done.
        XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
        XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
        XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
        BUG: unable to handle page fault for address: 0000000005b0f669
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP PTI
        CPU: 13 PID: 10479 Comm: kworker/13:16 Not tainted 5.14.0-rc1-master-8096acd7+ #8
        Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/13/2016
        Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
        RIP: 0010:inode_do_switch_wbs+0xaf/0x470
        Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85
        RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002
        RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0
        RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff
        RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228
        R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130
        R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0
        FS:  0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0
        Call Trace:
         inode_switch_wbs_work_fn+0xb6/0x2a0
         process_one_work+0x1e6/0x380
         worker_thread+0x53/0x3d0
         kthread+0x10f/0x130
         ret_from_fork+0x22/0x30
        Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink bridge stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm mgag200 i2c_algo_bit iTCO_wdt irqbypass drm_kms_helper iTCO_vendor_support acpi_ipmi rapl syscopyarea sysfillrect intel_cstate ipmi_si sysimgblt ioatdma dax_pmem_compat fb_sys_fops ipmi_devintf device_dax i2c_i801 pcspkr intel_uncore hpilo nd_pmem cec dax_pmem_core dca i2c_smbus acpi_tad lpc_ich ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel tg3 ghash_clmulni_intel serio_raw hpsa hpwdt scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod
        CR2: 0000000005b0f669
        ---[ end trace ed2105faff8384f3 ]---
        RIP: 0010:inode_do_switch_wbs+0xaf/0x470
        Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85
        RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002
        RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0
        RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff
        RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228
        R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130
        R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0
        FS:  0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0
        Kernel panic - not syncing: Fatal exception
        Kernel Offset: 0x15200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
        ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      The crash happens on an attempt to iterate over attached pagecache pages
      and check the dirty flag: a dax inode's xarray contains pfn's instead of
      generic struct page pointers.
      
      This happens for DAX and not for other kinds of non-page entries in the
      inodes because it's a tagged iteration, and shadow/swap entries are
      never tagged; only DAX entries get tagged.
      
      Fix the problem by bailing out (with the false return value) of
      inode_prepare_wbs_switch() if a dax inode is passed.
      
      [willy@infradead.org: changelog addition]
      
      Link: https://lkml.kernel.org/r/20210719171350.3876830-1-guro@fb.com
      Fixes: c22d70a1 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
      Reported-by: Darrick J. Wong <djwong@kernel.org>
      Tested-by: Darrick J. Wong <djwong@kernel.org>
      Tested-by: Murphy Zhou <jencce.kernel@gmail.com>
      Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      593311e8
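
The early bail-out for dax inodes can be sketched as follows. This is a simplified model, not the kernel implementation: the Inode class, its fields, and both function names are hypothetical, and tagged entries are modelled as dictionaries for regular pages and bare integers (pfns) for dax.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    is_dax: bool = False
    # Regular inodes hold page-like dicts; dax inodes hold raw pfn integers.
    tagged_entries: list = field(default_factory=list)


def inode_prepare_switch(inode: Inode) -> bool:
    """Return False when the inode must not be switched to another wb."""
    if inode.is_dax:
        # Tagged entries of a dax inode are pfns, not pages, so the
        # dirty-page walk must never run for it.
        return False
    return True


def count_dirty_pages(inode: Inode) -> int:
    """Walk tagged entries and count dirty pages, skipping dax inodes."""
    if not inode_prepare_switch(inode):
        return 0
    return sum(1 for page in inode.tagged_entries if page["dirty"])
```

Without the check, the walk would treat an integer pfn as a page object, which is the model's analogue of the page fault in the trace above.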
    • R
      writeback, cgroup: remove wb from offline list before releasing refcnt · b43a9e76
      Roman Gushchin authored
      Boyang reported that the commit c22d70a1 ("writeback, cgroup:
      release dying cgwbs by switching attached inodes") causes the kernel to
      crash while running xfstests generic/256 on ext4 on aarch64 and ppc64le.
      
        run fstests generic/256 at 2021-07-12 05:41:40
        EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: . Quota mode: none.
        Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
        Mem abort info:
           ESR = 0x96000005
           EC = 0x25: DABT (current EL), IL = 32 bits
           SET = 0, FnV = 0
           EA = 0, S1PTW = 0
           FSC = 0x05: level 1 translation fault
        Data abort info:
           ISV = 0, ISS = 0x00000005
           CM = 0, WnR = 0
        user pgtable: 64k pages, 48-bit VAs, pgdp=00000000b0502000
        [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
        Internal error: Oops: 96000005 [#1] SMP
        Modules linked in: dm_flakey dm_snapshot dm_bufio dm_zero dm_mod loop tls rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc ext4 vfat fat mbcache jbd2 drm fuse xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_blk virtio_net net_failover virtio_console failover virtio_mmio aes_neon_bs [last unloaded: scsi_debug]
        CPU: 0 PID: 408468 Comm: kworker/u8:5 Tainted: G X --------- ---  5.14.0-0.rc1.15.bx.el9.aarch64 #1
        Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
        Workqueue: events_unbound cleanup_offline_cgwbs_workfn
        pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO BTYPE=--)
        pc : cleanup_offline_cgwbs_workfn+0x320/0x394
        lr : cleanup_offline_cgwbs_workfn+0xe0/0x394
        sp : ffff80001554fd10
        x29: ffff80001554fd10 x28: 0000000000000000 x27: 0000000000000001
        x26: 0000000000000000 x25: 00000000000000e0 x24: ffffd2a2fbe671a8
        x23: ffff80001554fd88 x22: ffffd2a2fbe67198 x21: ffffd2a2fc25a730
        x20: ffff210412bc3000 x19: ffff210412bc3280 x18: 0000000000000000
        x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
        x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000040
        x11: ffff210481572238 x10: ffff21048157223a x9 : ffffd2a2fa276c60
        x8 : ffff210484106b60 x7 : 0000000000000000 x6 : 000000000007d18a
        x5 : ffff210416a86400 x4 : ffff210412bc0280 x3 : 0000000000000000
        x2 : ffff80001554fd88 x1 : ffff210412bc0280 x0 : 0000000000000003
        Call trace:
           cleanup_offline_cgwbs_workfn+0x320/0x394
           process_one_work+0x1f4/0x4b0
           worker_thread+0x184/0x540
           kthread+0x114/0x120
           ret_from_fork+0x10/0x18
        Code: d63f0020 97f99963 17ffffa6 f8588263 (f9400061)
        ---[ end trace e250fe289272792a ]---
        Kernel panic - not syncing: Oops: Fatal exception
        SMP: stopping secondary CPUs
        SMP: failed to stop secondary CPUs 0-2
        Kernel Offset: 0x52a2e9fa0000 from 0xffff800010000000
        PHYS_OFFSET: 0xfff0defca0000000
        CPU features: 0x00200251,23200840
        Memory Limit: none
        ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
      
      The problem happens when cgwb_release_workfn() races with
      cleanup_offline_cgwbs_workfn(): wb_tryget() in
      cleanup_offline_cgwbs_workfn() can be called after percpu_ref_exit() in
      cgwb_release_workfn(), which is basically a use-after-free error.
      
      Fix the problem by removing the writeback structure from the
      offline list before releasing the percpu reference counter.  This
      guarantees that cleanup_offline_cgwbs_workfn() will neither see nor
      access writeback structures which are about to be released.
      
      Link: https://lkml.kernel.org/r/20210716201039.3762203-1-guro@fb.com
      Fixes: c22d70a1 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Boyang Xue <bxue@redhat.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Tested-by: Darrick J. Wong <djwong@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Murphy Zhou <jencce.kernel@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b43a9e76
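
The race between the release worker and the cleanup worker can be replayed deterministically in a small model. The names below are hypothetical, the percpu reference counter is reduced to a boolean, and a generator is used to pause the release path between its two steps so that the cleanup path runs inside the window.

```python
class Writeback:
    def __init__(self, name):
        self.name = name
        self.ref_alive = True   # models the percpu reference counter

    def tryget(self):
        if not self.ref_alive:
            raise RuntimeError("use-after-free: refcount already exited")
        return True


def release_steps(wb, offline_list, fixed):
    """Yield after each step so another worker can run in between."""
    if fixed:
        offline_list.remove(wb)   # unlink first
        yield
        wb.ref_alive = False      # then exit the reference counter
        yield
    else:
        wb.ref_alive = False      # counter exited while still on the list
        yield
        offline_list.remove(wb)
        yield


def cleanup_offline(offline_list):
    """Models the cleanup worker: tryget every wb still on the list."""
    return [wb.name for wb in list(offline_list) if wb.tryget()]


def run_race(fixed):
    wb = Writeback("wb0")
    offline = [wb]
    steps = release_steps(wb, offline, fixed)
    next(steps)                        # release worker: first step
    seen = cleanup_offline(offline)    # cleanup worker runs in the window
    next(steps)                        # release worker: second step
    return seen
```

With the fixed ordering the cleanup worker finds an empty list; with the original ordering it touches a structure whose counter is already gone.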
    • M
      memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions · 79e482e9
      Mike Rapoport authored
      Commit b10d6bca ("arch, drivers: replace for_each_membock() with
      for_each_mem_range()") didn't take into account that when there is
      movable_node parameter in the kernel command line, for_each_mem_range()
      would skip ranges marked with MEMBLOCK_HOTPLUG.
      
      The page table setup code in POWER uses for_each_mem_range() to create
      the linear mapping of the physical memory and since the regions marked
      as MEMBLOCK_HOTPLUG are skipped, they never make it to the linear map.
      
      A later access to the memory in those ranges will fail:
      
        BUG: Unable to handle kernel data access on write at 0xc000000400000000
        Faulting instruction address: 0xc00000000008a3c0
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 0 PID: 53 Comm: kworker/u2:0 Not tainted 5.13.0 #7
        NIP:  c00000000008a3c0 LR: c0000000003c1ed8 CTR: 0000000000000040
        REGS: c000000008a57770 TRAP: 0300   Not tainted  (5.13.0)
        MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 84222202  XER: 20040000
        CFAR: c0000000003c1ed4 DAR: c000000400000000 DSISR: 42000000 IRQMASK: 0
        GPR00: c0000000003c1ed8 c000000008a57a10 c0000000019da700 c000000400000000
        GPR04: 0000000000000280 0000000000000180 0000000000000400 0000000000000200
        GPR08: 0000000000000100 0000000000000080 0000000000000040 0000000000000300
        GPR12: 0000000000000380 c000000001bc0000 c0000000001660c8 c000000006337e00
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000040000000 0000000020000000 c000000001a81990 c000000008c30000
        GPR24: c000000008c20000 c000000001a81998 000fffffffff0000 c000000001a819a0
        GPR28: c000000001a81908 c00c000001000000 c000000008c40000 c000000008a64680
        NIP clear_user_page+0x50/0x80
        LR __handle_mm_fault+0xc88/0x1910
        Call Trace:
          __handle_mm_fault+0xc44/0x1910 (unreliable)
          handle_mm_fault+0x130/0x2a0
          __get_user_pages+0x248/0x610
          __get_user_pages_remote+0x12c/0x3e0
          get_arg_page+0x54/0xf0
          copy_string_kernel+0x11c/0x210
          kernel_execve+0x16c/0x220
          call_usermodehelper_exec_async+0x1b0/0x2f0
          ret_from_kernel_thread+0x5c/0x70
        Instruction dump:
        79280fa4 79271764 79261f24 794ae8e2 7ca94214 7d683a14 7c893a14 7d893050
        7d4903a6 60000000 60000000 60000000 <7c001fec> 7c091fec 7c081fec 7c051fec
        ---[ end trace 490b8c67e6075e09 ]---
      
      Making for_each_mem_range() include MEMBLOCK_HOTPLUG regions in the
      traversal fixes this issue.
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976100
      Link: https://lkml.kernel.org/r/20210712071132.20902-1-rppt@kernel.org
      Fixes: b10d6bca ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Tested-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>	[5.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79e482e9
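
The skipping behavior described in the memblock commit can be modelled with a short sketch. The flag values, the region tuples, and the iter_mem_ranges() helper are illustrative assumptions; the sketch only mirrors the rule that hotpluggable regions are skipped when movable_node is set unless the caller asks for them.

```python
# Illustrative flag values for this sketch only.
MEMBLOCK_NONE = 0x0
MEMBLOCK_HOTPLUG = 0x1


def iter_mem_ranges(regions, flags, movable_node):
    """Yield (start, end) for each region the traversal should visit.

    regions is a list of (start, end, region_flags) tuples.
    """
    for start, end, region_flags in regions:
        hotplug = bool(region_flags & MEMBLOCK_HOTPLUG)
        if movable_node and hotplug and not (flags & MEMBLOCK_HOTPLUG):
            continue  # region is silently dropped from the traversal
        yield (start, end)


if __name__ == "__main__":
    regions = [
        (0x0000, 0x1000, MEMBLOCK_NONE),
        (0x1000, 0x2000, MEMBLOCK_HOTPLUG),
    ]
    print(list(iter_mem_ranges(regions, MEMBLOCK_NONE, movable_node=True)))
    print(list(iter_mem_ranges(regions, MEMBLOCK_HOTPLUG, movable_node=True)))
```

In this model, a caller that builds a linear mapping from the first traversal never maps the second region, which corresponds to the faulting access shown in the trace.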
    • S
      mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction · 69e5d322
      Sergei Trofimovich authored
      To reproduce the failure we need the following system:
      
       - kernel command: page_poison=1 init_on_free=0 init_on_alloc=0
      
       - kernel config:
          * CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
          * CONFIG_INIT_ON_FREE_DEFAULT_ON=y
          * CONFIG_PAGE_POISONING=y
      
      Resulting in:
      
          0000000085629bdd: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          0000000022861832: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00000000c597f5b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          CPU: 11 PID: 15195 Comm: bash Kdump: loaded Tainted: G     U     O      5.13.1-gentoo-x86_64 #1
          Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2801 01/13/2021
          Call Trace:
           dump_stack+0x64/0x7c
           __kernel_unpoison_pages.cold+0x48/0x84
           post_alloc_hook+0x60/0xa0
           get_page_from_freelist+0xdb8/0x1000
           __alloc_pages+0x163/0x2b0
           __get_free_pages+0xc/0x30
           pgd_alloc+0x2e/0x1a0
           mm_init+0x185/0x270
           dup_mm+0x6b/0x4f0
           copy_process+0x190d/0x1b10
           kernel_clone+0xba/0x3b0
           __do_sys_clone+0x8f/0xb0
           do_syscall_64+0x68/0x80
           entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Before commit 51cba1eb ("init_on_alloc: Optimize static branches")
      init_on_alloc never enabled the static branch by default.  It could only be
      enabled explicitly by init_mem_debugging_and_hardening().
      
      But after commit 51cba1eb, a static branch could already be enabled
      by default.  There was no code to ever disable it.  That caused
      page_poison=1 / init_on_free=1 conflict.
      
      This change extends init_mem_debugging_and_hardening() to also disable
      the static branches when they should be off.
      
      Link: https://lkml.kernel.org/r/20210714031935.4094114-1-keescook@chromium.org
      Link: https://lore.kernel.org/r/20210712215816.1512739-1-slyfox@gentoo.org
      Fixes: 51cba1eb ("init_on_alloc: Optimize static branches")
      Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Co-developed-by: Kees Cook <keescook@chromium.org>
      Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
      Reported-by: <bowsingbetee@pm.me>
      Tested-by: <bowsingbetee@protonmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69e5d322
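
The "enabled by default, never disabled" problem can be shown with a small model of the boot-time decision. All names are hypothetical, the static branch is reduced to a boolean, and the sketch assumes that page poisoning takes precedence over init_on_alloc when both are requested.

```python
from typing import Optional


def wanted_state(config_default_on: bool, cmdline: Optional[bool],
                 page_poison: bool) -> bool:
    """Decide whether init_on_alloc should end up enabled."""
    want = config_default_on if cmdline is None else cmdline
    if page_poison:
        want = False  # assumption: page poisoning wins over init_on_alloc
    return want


def branch_before_fix(config_default_on, cmdline, page_poison):
    """The branch starts at its Kconfig default and is only ever enabled."""
    enabled = config_default_on
    if wanted_state(config_default_on, cmdline, page_poison):
        enabled = True
    return enabled


def branch_after_fix(config_default_on, cmdline, page_poison):
    """The branch is explicitly set in both directions."""
    return wanted_state(config_default_on, cmdline, page_poison)
```

With the configuration from the reproducer (default on in Kconfig, init_on_alloc=0 and page_poison=1 on the command line), the first function leaves the branch enabled while the second turns it off.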
    • C
      mm: use kmap_local_page in memzero_page · d9a42b53
      Christoph Hellwig authored
      The commit introducing the global memzero_page explicitly mentions
      switching to kmap_local_page in its commit log but doesn't actually
      do that.
      
      Link: https://lkml.kernel.org/r/20210713055231.137602-3-hch@lst.de
      Fixes: 28961998 ("iov_iter: lift memzero_page() to highmem.h")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d9a42b53
    • C
      mm: call flush_dcache_page() in memcpy_to_page() and memzero_page() · 8dad53a1
      Christoph Hellwig authored
      memcpy_to_page and memzero_page can write to arbitrary pages, which
      could be in the page cache or in high memory, so call
      flush_dcache_page() to flush the dcache.
      
      This is a problem when using these helpers on dcache-challenged
      architectures.  Right now there are just a few users, and chances are no
      one has used the PC floppy driver, the aha1542 driver for an ISA SCSI HBA,
      or a few advanced and optional btrfs and ext4 features on those platforms
      since the conversion.
      
      Link: https://lkml.kernel.org/r/20210713055231.137602-2-hch@lst.de
      Fixes: bb90d4bc ("mm/highmem: Lift memcpy_[to|from]_page to core")
      Fixes: 28961998 ("iov_iter: lift memzero_page() to highmem.h")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8dad53a1
    • A
      kfence: skip all GFP_ZONEMASK allocations · 236e9f15
      Alexander Potapenko authored
      Allocation requests outside ZONE_NORMAL (MOVABLE, HIGHMEM or DMA) cannot
      be fulfilled by KFENCE, because KFENCE memory pool is located in a zone
      different from the requested one.
      
      Because callers of kmem_cache_alloc() may actually rely on the
      allocation to reside in the requested zone (e.g.  memory allocations
      done with __GFP_DMA must be DMAable), skip all allocations done with
      GFP_ZONEMASK and/or respective SLAB flags (SLAB_CACHE_DMA and
      SLAB_CACHE_DMA32).
      
      Link: https://lkml.kernel.org/r/20210714092222.1890268-2-glider@google.com
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Acked-by: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: <stable@vger.kernel.org>	[5.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      236e9f15
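
The skip condition from the KFENCE zone commit can be written as a simple predicate. The bit values below are made up for the sketch and do not match the kernel headers; only the flag names come from the commit message, and kfence_should_skip() is a hypothetical helper.

```python
# Illustrative bit values only; the real definitions live in kernel headers.
GFP_DMA = 0x01
GFP_HIGHMEM = 0x02
GFP_DMA32 = 0x04
GFP_MOVABLE = 0x08
GFP_ZONEMASK = GFP_DMA | GFP_HIGHMEM | GFP_DMA32 | GFP_MOVABLE

SLAB_CACHE_DMA = 0x4000
SLAB_CACHE_DMA32 = 0x8000


def kfence_should_skip(gfp_flags: int, slab_flags: int) -> bool:
    """Return True when the request targets a zone KFENCE cannot serve."""
    if gfp_flags & GFP_ZONEMASK:
        return True
    if slab_flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32):
        return True
    return False
```

Requests for which the predicate is true would fall through to the regular allocator, so callers that rely on the requested zone still get memory from it.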
    • A
      kfence: move the size check to the beginning of __kfence_alloc() · 235a85cb
      Alexander Potapenko authored
      Check the allocation size before toggling kfence_allocation_gate.
      
      This way allocations that can't be served by KFENCE will not result in
      waiting for another CONFIG_KFENCE_SAMPLE_INTERVAL without allocating
      anything.
      
      Link: https://lkml.kernel.org/r/20210714092222.1890268-1-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Suggested-by: Marco Elver <elver@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[5.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      235a85cb
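
Why the position of the size check matters can be seen in a toy model of the allocation gate. The class and method names are hypothetical, the gate is a boolean that stays closed for the rest of the sample interval, and the sketch assumes KFENCE only serves objects up to one page.

```python
PAGE_SIZE = 4096  # assumed upper bound for objects served by the pool


class KfencePool:
    def __init__(self):
        self.gate_open = True   # one allocation allowed per sample interval
        self.served = 0

    def alloc_size_check_late(self, size):
        """Original order: consume the gate, then reject oversized requests."""
        if not self.gate_open:
            return None
        self.gate_open = False
        if size > PAGE_SIZE:
            return None         # the sample interval is wasted
        self.served += 1
        return "guarded object"

    def alloc_size_check_first(self, size):
        """Fixed order: reject oversized requests before touching the gate."""
        if size > PAGE_SIZE:
            return None
        if not self.gate_open:
            return None
        self.gate_open = False
        self.served += 1
        return "guarded object"
```

In the late-check variant an oversized request closes the gate without allocating anything, so the next eligible request in the same interval is turned away as well.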
    • W
      kfence: defer kfence_test_init to ensure that kunit debugfs is created · 32ae8a06
      Weizhao Ouyang authored
      kfence_test_init and kunit_init both use the same late_initcall level.
      If kfence_test_init is linked ahead of kunit_init, kfence_test_init
      gets a NULL debugfs_rootdir as its parent dentry.  kfence_test_init and
      kfence_debugfs_init then both create a debugfs node named "kfence" under
      debugfs_mount->mnt_root, which fails with EEXIST and prints "debugfs:
      Directory 'kfence' with parent '/' already present!".  So
      kfence_test_init should be deferred.
      
      Link: https://lkml.kernel.org/r/20210714113140.2949995-1-o451686892@gmail.com
      Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
      Tested-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      32ae8a06
    • P
      selftest: use mmap instead of posix_memalign to allocate memory · 0db282ba
      Peter Collingbourne authored
      This test passes pointers obtained from anon_allocate_area to the
      userfaultfd and mremap APIs.  This causes a problem if the system
      allocator returns tagged pointers because with the tagged address ABI
      the kernel rejects tagged addresses passed to these APIs, which would
      end up causing the test to fail.  To make this test compatible with such
      system allocators, stop using the system allocator to allocate memory in
      anon_allocate_area, and instead just use mmap.
      
      Link: https://lkml.kernel.org/r/20210714195437.118982-3-pcc@google.com
      Link: https://linux-review.googlesource.com/id/Icac91064fcd923f77a83e8e133f8631c5b8fc241
      Fixes: c47174fc ("userfaultfd: selftest")
      Co-developed-by: Lokesh Gidra <lokeshgidra@google.com>
      Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
      Signed-off-by: Peter Collingbourne <pcc@google.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Alistair Delva <adelva@google.com>
      Cc: William McVicker <willmcvicker@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Mitch Phillips <mitchp@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: <stable@vger.kernel.org>	[5.4]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0db282ba
    • P
      userfaultfd: do not untag user pointers · e71e2ace
      Peter Collingbourne authored
      Patch series "userfaultfd: do not untag user pointers", v5.
      
      If a user program uses userfaultfd on ranges of heap memory, it may end
      up passing a tagged pointer to the kernel in the range.start field of
      the UFFDIO_REGISTER ioctl.  This can happen when using an MTE-capable
      allocator, or on Android if using the Tagged Pointers feature for MTE
      readiness [1].
      
      When a fault subsequently occurs, the tag is stripped from the fault
      address returned to the application in the fault.address field of struct
      uffd_msg.  However, from the application's perspective, the tagged
      address *is* the memory address, so if the application is unaware of
      memory tags, it may get confused by receiving an address that is, from
      its point of view, outside of the bounds of the allocation.  We observed
      this behavior in the kselftest for userfaultfd [2] but other
      applications could have the same problem.
      
      Address this by not untagging pointers passed to the userfaultfd ioctls.
      Instead, let the system call fail.  Also change the kselftest to use
      mmap so that it doesn't encounter this problem.
      
      [1] https://source.android.com/devices/tech/debug/tagged-pointers
      [2] tools/testing/selftests/vm/userfaultfd.c
      
      This patch (of 2):
      
      Do not untag pointers passed to the userfaultfd ioctls.  Instead, let
      the system call fail.  This will provide an early indication of problems
      with tag-unaware userspace code instead of letting the code get confused
      later, and is consistent with how we decided to handle brk/mmap/mremap
      in commit dcde2373 ("mm: Avoid creating virtual address aliases in
      brk()/mmap()/mremap()"), as well as being consistent with the existing
      tagged address ABI documentation relating to how ioctl arguments are
      handled.
      
      The code change is a revert of commit 7d032574 ("userfaultfd: untag
      user pointers") plus some fixups to some additional calls to
      validate_range that have appeared since then.
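
      As a rough illustration of what a tag-aware application might do once
      the ioctls reject tagged pointers, the sketch below strips the tag
      before registration and ignores it when bounds-checking a reported
      fault address.  It assumes the arm64 layout with the tag in the top
      byte; the helper names are hypothetical and are not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: arm64 top-byte-ignore layout, with the tag in bits 63..56. */
#define TAG_SHIFT	56
#define TAG_MASK	((uint64_t)0xff << TAG_SHIFT)

/* Strip the tag so the address can be passed in range.start (hypothetical helper). */
static inline uint64_t untag_addr(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/* Re-apply the allocation's tag to an untagged address reported by the kernel. */
static inline uint64_t retag_addr(uint64_t untagged, uint64_t tagged_base)
{
	assert((untagged & TAG_MASK) == 0);
	return untagged | (tagged_base & TAG_MASK);
}

/* Bounds check that ignores the tag on both sides; unsigned wrap-around
 * makes addresses below the start compare as out of range. */
static inline int fault_in_range(uint64_t fault_addr, uint64_t tagged_start,
				 uint64_t len)
{
	return untag_addr(fault_addr) - untag_addr(tagged_start) < len;
}
```

      With helpers like these, an application would pass
      untag_addr(ptr) to UFFDIO_REGISTER and use retag_addr() if it needs to
      map fault.address back into its own tagged view of the allocation.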
      
      [1] https://source.android.com/devices/tech/debug/tagged-pointers
      [2] tools/testing/selftests/vm/userfaultfd.c
      
      Link: https://lkml.kernel.org/r/20210714195437.118982-1-pcc@google.com
      Link: https://lkml.kernel.org/r/20210714195437.118982-2-pcc@google.com
      Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0a25501b
      Fixes: 63f0c603 ("arm64: Introduce prctl() options to control the tagged user addresses ABI")
      Signed-off-by: Peter Collingbourne <pcc@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Alistair Delva <adelva@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mitch Phillips <mitchp@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: William McVicker <willmcvicker@google.com>
      Cc: <stable@vger.kernel.org>	[5.4]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e71e2ace
    • L
      Merge tag 'for-5.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · f0fddcec
      Linus Torvalds committed
      Pull btrfs fixes from David Sterba:
       "A few fixes and one patch to help some block layer API cleanups:
      
         - skip missing device when running fstrim
      
         - fix unpersisted i_size on fsync after expanding truncate
      
         - fix lock inversion problem when doing qgroup extent tracing
      
         - replace bdgrab/bdput usage, replace gendisk by block_device"
      
      * tag 'for-5.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: store a block_device in struct btrfs_ordered_extent
        btrfs: fix lock inversion problem when doing qgroup extent tracing
        btrfs: check for missing device in btrfs_trim_fs
        btrfs: fix unpersisted i_size on fsync after expanding truncate
      f0fddcec
    • L
      Merge tag 'ceph-for-5.14-rc3' of git://github.com/ceph/ceph-client · 704f4cba
      Linus Torvalds committed
      Pull ceph fixes from Ilya Dryomov:
       "A subtle deadlock on lock_rwsem (marked for stable) and rbd fixes for
        a -rc1 regression.
      
        Also included a rare WARN condition tweak"
      
      * tag 'ceph-for-5.14-rc3' of git://github.com/ceph/ceph-client:
        rbd: resurrect setting of disk->private_data in rbd_init_disk()
        ceph: don't WARN if we're still opening a session to an MDS
        rbd: don't hold lock_rwsem while running_list is being drained
        rbd: always kick acquire on "acquired" and "released" notifications
      704f4cba
    • L
      Merge tag 'trace-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 05daae0f
      Linus Torvalds committed
      Pull tracing fixes from Steven Rostedt:
      
       - Fix deadloop in ring buffer because of using stale "read" variable
      
       - Fix synthetic event use of field_pos as boolean and not an index
      
       - Fixed histogram special var "cpu" overriding event fields called
         "cpu"
      
       - Cleaned up error prone logic in alloc_synth_event()
      
       - Removed call to synchronize_rcu_tasks_rude() when not needed
      
       - Removed redundant initialization of a local variable "ret"
      
       - Fixed kernel crash when updating tracepoint callbacks of different
         priorities.
      
      * tag 'trace-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracepoints: Update static_call before tp_funcs when adding a tracepoint
        ftrace: Remove redundant initialization of variable ret
        ftrace: Avoid synchronize_rcu_tasks_rude() call when not necessary
        tracing: Clean up alloc_synth_event()
        tracing/histogram: Rename "cpu" to "common_cpu"
        tracing: Synthetic event field_pos is an index not a boolean
        tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
      05daae0f
    • L
      Merge tag 'm68k-for-v5.14-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · 1af09ed5
      Linus Torvalds committed
      Pull m68k fix from Geert Uytterhoeven:
      
       - Fix a Mac defconfig regression due to the IDE -> ATA switch
      
      * tag 'm68k-for-v5.14-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k: MAC should select HAVE_PATA_PLATFORM
      1af09ed5
    • L
      Merge tag 'acpi-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ec6badfb
      Linus Torvalds committed
      Pull ACPI fixes from Rafael Wysocki:
       "These fix a recently broken Kconfig dependency and ACPI device
        reference counting in an iterator macro.
      
        Specifics:
      
         - Fix recently broken Kconfig dependency for the ACPI table override
           via built-in initrd (Robert Richter)
      
         - Fix ACPI device reference counting in the for_each_acpi_dev_match()
           helper macro to avoid use-after-free (Andy Shevchenko)"
      
      * tag 'acpi-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: utils: Fix reference counting in for_each_acpi_dev_match()
        ACPI: Kconfig: Fix table override from built-in initrd
      ec6badfb
    • L
      Merge tag 'driver-core-5.14-rc3' of... · 1d597682
      Linus Torvalds committed
      Merge tag 'driver-core-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fixes from Greg KH:
       "Here are two small driver core fixes to resolve some reported problems
        for 5.14-rc3. They include:
      
         - aux bus memory leak fix
      
         - unneeded warning message removed when removing a device link.
      
        Both have been in linux-next with no reported problems"
      
      * tag 'driver-core-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        driver core: Prevent warning when removing a device link from unregistered consumer
        driver core: auxiliary bus: Fix memory leak when driver_register() fail
      1d597682
    • L
      Merge tag 'char-misc-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 8072911b
      Linus Torvalds committed
      Pull char/misc fixes from Greg KH:
       "Here are some small char/misc driver fixes for 5.14-rc3.
      
        Included in here are:
      
         - MAINTAINERS file updates for two changes in different driver
           subsystems
      
         - mhi bus bugfixes
      
         - nds32 bugfix that resolves a reported problem
      
        All have been in linux-next with no reported problems"
      
      * tag 'char-misc-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        nds32: fix up stack guard gap
        MAINTAINERS: Change ACRN HSM driver maintainer
        MAINTAINERS: Update for VMCI driver
        bus: mhi: pci_generic: Fix inbound IPCR channel
        bus: mhi: core: Validate channel ID when processing command completions
        bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean
      8072911b
    • L
      Merge tag 'usb-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 74738c55
      Linus Torvalds committed
      Pull USB fixes from Greg KH:
       "Here are some USB fixes for 5.14-rc3 to resolve a bunch of tiny
        problems reported. Included in here are:
      
         - dtsi revert to resolve a problem which broke android systems that
           relied on the dts name to find the USB controller device.
      
           People are still working out the "real" solution for this, but for
           now the revert is needed.
      
         - core USB fix for pipe calculation found by syzbot
      
         - typec fixes
      
         - gadget driver fixes
      
         - new usb-serial device ids
      
         - new USB quirks
      
         - xhci fixes
      
         - usb hub fixes for power management issues reported
      
         - other tiny fixes
      
        All have been in linux-next with no reported problems"
      
      * tag 'usb-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (27 commits)
        USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
        Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem"
        usb: cdc-wdm: fix build error when CONFIG_WWAN_CORE is not set
        Revert "arm64: dts: qcom: Harmonize DWC USB3 DT nodes name"
        usb: dwc2: gadget: Fix sending zero length packet in DDMA mode.
        usb: dwc2: Skip clock gating on Samsung SoCs
        usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop()
        usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode.
        usb: phy: Fix page fault from usb_phy_uevent
        usb: xhci: avoid renesas_usb_fw.mem when it's unusable
        usb: gadget: u_serial: remove WARN_ON on null port
        usb: dwc3: avoid NULL access of usb_gadget_driver
        usb: max-3421: Prevent corruption of freed memory
        usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe
        MAINTAINERS: repair reference in USB IP DRIVER FOR HISILICON KIRIN 970
        usb: typec: stusb160x: Don't block probing of consumer of "connector" nodes
        usb: typec: stusb160x: register role switch before interrupt registration
        USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS
        usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI
        usb: hub: Disable USB 3 device initiated lpm if exit latency is too high
        ...
      74738c55
    • L
      Merge tag 'sound-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · e7562a00
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes, mostly covering device-specific
        regressions and bugs over ASoC, HD-audio and USB-audio, while
        the ALSA PCM core received a few additional fixes for the
        possible (new and old) regressions"
      
      * tag 'sound-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
        ALSA: usb-audio: Add registration quirk for JBL Quantum headsets
        ALSA: hda/hdmi: Add quirk to force pin connectivity on NUC10
        ALSA: pcm: Fix mmap without buffer preallocation
        ALSA: pcm: Fix mmap capability check
        ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID
        ASoC: ti: j721e-evm: Check for not initialized parent_clk_id
        ASoC: ti: j721e-evm: Fix unbalanced domain activity tracking during startup
        ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine
        ALSA: hdmi: Expose all pins on MSI MS-7C94 board
        ALSA: sb: Fix potential ABBA deadlock in CSP driver
        ASoC: rt5682: Fix the issue of garbled recording after powerd_dbus_suspend
        ASoC: amd: reverse stop sequence for stoneyridge platform
        ASoC: soc-pcm: add a flag to reverse the stop sequence
        ASoC: codecs: wcd938x: setup irq during component bind
        ASoC: dt-bindings: renesas: rsnd: Fix incorrect 'port' regex schema
        ALSA: usb-audio: Add missing proc text entry for BESPOKEN type
        ASoC: codecs: wcd938x: make sdw dependency explicit in Kconfig
        ASoC: SOF: Intel: Update ADL descriptor to use ACPI power states
        ASoC: rt5631: Fix regcache sync errors on resume
        ALSA: pcm: Call substream ack() method upon compat mmap commit
        ...
      e7562a00
  3. 23 Jul, 2021 15 commits
    • R
      Merge branch 'acpi-utils' · 0b8a53a8
      Rafael J. Wysocki committed
      * acpi-utils:
        ACPI: utils: Fix reference counting in for_each_acpi_dev_match()
      0b8a53a8
    • S
      tracepoints: Update static_call before tp_funcs when adding a tracepoint · 352384d5
      Steven Rostedt (VMware) committed
      Because of the significant overhead that retpolines pose on indirect
      calls, the tracepoint code was updated to use the new "static_calls" that
      can modify the running code to directly call a function instead of using
      an indirect caller, and this function can be changed at runtime.
      
      In the tracepoint code that calls all the registered callbacks that are
      attached to a tracepoint, the following is done:
      
      	it_func_ptr = rcu_dereference_raw((&__tracepoint_##name)->funcs);
      	if (it_func_ptr) {
      		__data = (it_func_ptr)->data;
      		static_call(tp_func_##name)(__data, args);
      	}
      
      If there's just a single callback, the static_call is updated to just call
      that callback directly. Once another handler is added, then the static
      caller is updated to call the iterator, that simply loops over all the
      funcs in the array and calls each of the callbacks like the old method
      using indirect calling.
      
      The issue was discovered with a race between updating the funcs array and
      updating the static_call. The funcs array was updated first and then the
      static_call was updated. This is not an issue as long as the first element
      in the old array is the same as the first element in the new array. But
      that assumption is incorrect, because callbacks also have a priority
      field, and if there's a callback added that has a higher priority than the
      callback on the old array, then it will become the first callback in the
      new array. This means that it is possible to call the old callback with
      the new callback data element, which can cause a kernel panic.
      
      	static_call = callback1()
      	funcs[] = {callback1,data1};
      	callback2 has higher priority than callback1
      
      	CPU 1				CPU 2
      	-----				-----
      
         new_funcs = {callback2,data2},
                     {callback1,data1}
      
         rcu_assign_pointer(tp->funcs, new_funcs);
      
        /*
         * Now tp->funcs has the new array
         * but the static_call still calls callback1
         */
      
      				it_func_ptr = tp->funcs [ new_funcs ]
      				data = it_func_ptr->data [ data2 ]
      				static_call(callback1, data);
      
      				/* Now callback1 is called with
      				 * callback2's data */
      
      				[ KERNEL PANIC ]
      
         update_static_call(iterator);
      
      To prevent this from happening, always switch the static_call to the
      iterator before assigning the tp->funcs to the new array. The iterator will
      always properly match the callback with its data.
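
      The ordering argument can be modelled in plain userspace C.  The
      sketch below is a single-threaded model, not the kernel code: the
      struct, the probe functions and the fire() helper are hypothetical
      stand-ins, and the concurrent reader on CPU 2 is simulated by firing
      the tracepoint in the window between the two update steps.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*probe_fn)(void *data);

struct tp_func {
	probe_fn func;
	void *data;
};

/* Hypothetical model of a tracepoint: the funcs array plus the static_call
 * target.  direct == NULL stands for "static_call points at the iterator". */
struct tracepoint_model {
	const struct tp_func *funcs;
	probe_fn direct;
};

static int data1, data2;

/* Each probe returns 1 only when it receives its own data element. */
static int callback1(void *data) { return data == &data1; }
static int callback2(void *data) { return data == &data2; }

/* What a CPU hitting the tracepoint does in this model. */
static int fire(const struct tracepoint_model *tp)
{
	const struct tp_func *it = tp->funcs;
	int ok = 1;

	if (tp->direct)
		return tp->direct(it->data);	/* data of the first element */
	for (; it->func; it++)
		ok &= it->func(it->data);	/* iterator pairs func and data */
	return ok;
}

/* Old order: publish the array, then switch the static_call. */
static int add_probe_buggy(struct tracepoint_model *tp,
			   const struct tp_func *new_funcs)
{
	int window_ok;

	tp->funcs = new_funcs;
	window_ok = fire(tp);		/* simulated reader in the race window */
	tp->direct = NULL;
	return window_ok;
}

/* Fixed order: switch to the iterator first, then publish the array. */
static int add_probe_fixed(struct tracepoint_model *tp,
			   const struct tp_func *new_funcs)
{
	int window_ok;

	tp->direct = NULL;
	window_ok = fire(tp);		/* iterator over the old array */
	tp->funcs = new_funcs;
	return window_ok;
}

/* Returns 1 if the simulated reader saw matching func/data pairs. */
static int demo_add_higher_priority_probe(int fixed_order)
{
	const struct tp_func old_funcs[] = {
		{ callback1, &data1 }, { NULL, NULL }
	};
	const struct tp_func new_funcs[] = {
		{ callback2, &data2 }, { callback1, &data1 }, { NULL, NULL }
	};
	struct tracepoint_model tp = { old_funcs, callback1 };

	assert(fire(&tp));	/* the starting state is consistent */
	return fixed_order ? add_probe_fixed(&tp, new_funcs)
			   : add_probe_buggy(&tp, new_funcs);
}
```

      In this model demo_add_higher_priority_probe(0) reports a mismatch
      (callback1 is handed data2), while demo_add_higher_priority_probe(1)
      does not.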
      
      To trigger this bug:
      
        In one terminal:
      
          while :; do hackbench 50; done
      
        In another terminal:
      
          echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
          while :; do
              echo 1 > /sys/kernel/tracing/set_event_pid;
              sleep 0.5
              echo 0 > /sys/kernel/tracing/set_event_pid;
              sleep 0.5
          done
      
      And it doesn't take long to crash. This is because the set_event_pid adds
      a callback to the sched_waking tracepoint with a high priority, which will
      be called before the sched_waking trace event callback is called.
      
      Note, the removal to a single callback updates the array first, before
      changing the static_call to single callback, which is the proper order as
      the first element in the array is the same as what the static_call is
      being changed to.
      
      Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/
      
      Cc: stable@vger.kernel.org
      Fixes: d25e37d8 ("tracepoint: Optimize using static_call()")
      Reported-by: Stefan Metzmacher <metze@samba.org>
      Tested-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      352384d5
    • C
      ftrace: Remove redundant initialization of variable ret · 3b1a8f45
      Colin Ian King committed
      The variable ret is being initialized with a value that is never
      read; it is updated later on. The assignment is redundant and
      can be removed.
      
      Link: https://lkml.kernel.org/r/20210721120915.122278-1-colin.king@canonical.com
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      3b1a8f45
    • N
      ftrace: Avoid synchronize_rcu_tasks_rude() call when not necessary · 68e83498
      Nicolas Saenz Julienne committed
      synchronize_rcu_tasks_rude() triggers IPIs and forces rescheduling on
      all CPUs. It is a costly operation and, when targeting nohz_full CPUs,
      very disruptive (hence the name). So avoid calling it when 'old_hash'
      doesn't need to be freed.
      
      Link: https://lkml.kernel.org/r/20210721114726.1545103-1-nsaenzju@redhat.com
      
      Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      68e83498
    • S
      tracing: Clean up alloc_synth_event() · 9528c195
      Steven Rostedt (VMware) committed
      alloc_synth_event() currently has the following code to initialize the
      event fields and dynamic_fields:
      
      	for (i = 0, j = 0; i < n_fields; i++) {
      		event->fields[i] = fields[i];
      
      		if (fields[i]->is_dynamic) {
      			event->dynamic_fields[j] = fields[i];
      			event->dynamic_fields[j]->field_pos = i;
      			event->dynamic_fields[j++] = fields[i];
      			event->n_dynamic_fields++;
      		}
      	}
      
      1) It would make more sense to have all fields keep track of their
         field_pos.
      
      2) event->dynamic_fields[j] is assigned twice for no reason.
      
      3) We can move updating event->n_dynamic_fields outside the loop, and just
         assign it to j.
      
      This combination makes the code much cleaner.
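
      Applying the three points above gives a loop along the following
      lines.  This is a self-contained sketch rather than the kernel source:
      the struct definitions are pared down to the members the loop touches,
      and MAX_FIELDS, init_event_fields() and the demo function are
      hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FIELDS 8	/* hypothetical bound for this sketch */

struct synth_field {
	bool is_dynamic;
	unsigned int field_pos;
};

struct synth_event {
	struct synth_field *fields[MAX_FIELDS];
	struct synth_field *dynamic_fields[MAX_FIELDS];
	unsigned int n_fields;
	unsigned int n_dynamic_fields;
};

static void init_event_fields(struct synth_event *event,
			      struct synth_field **fields,
			      unsigned int n_fields)
{
	unsigned int i, j;

	assert(n_fields <= MAX_FIELDS);

	for (i = 0, j = 0; i < n_fields; i++) {
		/* 1) every field records its own position */
		fields[i]->field_pos = i;
		event->fields[i] = fields[i];

		/* 2) a single assignment per dynamic field */
		if (fields[i]->is_dynamic)
			event->dynamic_fields[j++] = fields[i];
	}
	event->n_fields = n_fields;
	/* 3) the count is simply the final value of j */
	event->n_dynamic_fields = j;
}

/* Usage: two fixed-size fields followed by one dynamic string field. */
static unsigned int demo_last_dynamic_pos(void)
{
	struct synth_field pid = { .is_dynamic = false };
	struct synth_field delta = { .is_dynamic = false };
	struct synth_field wake_comm = { .is_dynamic = true };
	struct synth_field *fields[] = { &pid, &delta, &wake_comm };
	struct synth_event event = { .n_fields = 0 };

	init_event_fields(&event, fields, 3);
	assert(event.n_dynamic_fields == 1);
	return event.dynamic_fields[0]->field_pos;
}
```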
      
      Link: https://lkml.kernel.org/r/20210721195341.29bb0f77@oasis.local.home
      
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      9528c195
    • S
      tracing/histogram: Rename "cpu" to "common_cpu" · 1e3bac71
      Steven Rostedt (VMware) committed
      Currently the histogram logic allows the user to write "cpu" in as an
      event field, and it will record the CPU that the event happened on.
      
      The problem with this is that there are a lot of events that have "cpu"
      as a real field, and using "cpu" as the CPU it ran on makes it
      impossible to run histograms on the "cpu" field of events.
      
      For example, if I want to have a histogram on the count of the
      workqueue_queue_work event on its cpu field, running:
      
       ># echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger
      
      Gives a misleading and wrong result.
      
      Change the command to "common_cpu" as no event should have "common_*"
      fields as that's a reserved name for fields used by all events. And
      this makes sense here as common_cpu would be a field used by all events.
      
      Now we can even do:
      
       ># echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger
       ># cat events/workqueue/workqueue_queue_work/hist
       # event histogram
       #
       # trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active]
       #
      
       { common_cpu:          0, cpu:          2 } hitcount:          1
       { common_cpu:          0, cpu:          4 } hitcount:          1
       { common_cpu:          7, cpu:          7 } hitcount:          1
       { common_cpu:          0, cpu:          7 } hitcount:          1
       { common_cpu:          0, cpu:          1 } hitcount:          1
       { common_cpu:          0, cpu:          6 } hitcount:          2
       { common_cpu:          0, cpu:          5 } hitcount:          2
       { common_cpu:          1, cpu:          1 } hitcount:          4
       { common_cpu:          6, cpu:          6 } hitcount:          4
       { common_cpu:          5, cpu:          5 } hitcount:         14
       { common_cpu:          4, cpu:          4 } hitcount:         26
       { common_cpu:          0, cpu:          0 } hitcount:         39
       { common_cpu:          2, cpu:          2 } hitcount:        184
      
      Now for backward compatibility, I added a trick. If "cpu" is used, and
      the field is not found, it will fall back to "common_cpu" and work as
      it did before. This way, it will still work for old programs that use
      "cpu" to get the actual CPU, but if the event has a "cpu" as a field, it
      will get that event's "cpu" field, which is probably what it wants
      anyway.
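
      The fallback rule can be sketched as a small lookup function.  This is
      an illustration of the rule described above, not the histogram code
      itself: the event_model struct, resolve_hist_key() and the field lists
      are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down description of an event and its own fields. */
struct event_model {
	const char *name;
	const char *const *fields;
	size_t n_fields;
};

/*
 * Resolve a histogram key: the event's own field wins; "cpu" falls back to
 * "common_cpu" only when the event has no field of that name.
 * Returns NULL when the key cannot be resolved.
 */
static const char *resolve_hist_key(const struct event_model *ev,
				    const char *key)
{
	size_t i;

	assert(ev && key);

	for (i = 0; i < ev->n_fields; i++)
		if (strcmp(ev->fields[i], key) == 0)
			return ev->fields[i];

	if (strcmp(key, "cpu") == 0 || strcmp(key, "common_cpu") == 0)
		return "common_cpu";

	return NULL;
}

/* Illustrative field lists; only the presence or absence of "cpu" matters. */
static const char *const wq_fields[] = { "work", "function", "req_cpu", "cpu" };
static const struct event_model workqueue_queue_work = {
	"workqueue_queue_work", wq_fields,
	sizeof(wq_fields) / sizeof(wq_fields[0])
};

static const char *const waking_fields[] = { "comm", "pid", "prio", "target_cpu" };
static const struct event_model sched_waking = {
	"sched_waking", waking_fields,
	sizeof(waking_fields) / sizeof(waking_fields[0])
};
```

      Here "cpu" resolves to the event field for workqueue_queue_work and to
      "common_cpu" for sched_waking, which has no "cpu" field in this model.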
      
      I updated the tracefs/README to include documentation about both the
      common_timestamp and the common_cpu. This way, if that text is present in
      the README, then an application can know that common_cpu is supported over
      just plain "cpu".
      
      Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home
      
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Fixes: 8b7622bf ("tracing: Add cpu field for hist triggers")
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      1e3bac71
    • S
      tracing: Synthetic event field_pos is an index not a boolean · 3b13911a
      Steven Rostedt (VMware) committed
      Performing the following:
      
       ># echo 'wakeup_lat s32 pid; u64 delta; char wake_comm[]' > synthetic_events
       ># echo 'hist:keys=pid:__arg__1=common_timestamp.usecs' > events/sched/sched_waking/trigger
       ># echo 'hist:keys=next_pid:pid=next_pid,delta=common_timestamp.usecs-$__arg__1:onmatch(sched.sched_waking).trace(wakeup_lat,$pid,$delta,prev_comm)'\
            > events/sched/sched_switch/trigger
       ># echo 1 > events/synthetic/enable
      
      Crashed the kernel:
      
       BUG: kernel NULL pointer dereference, address: 000000000000001b
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP
       CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.13.0-rc5-test+ #104
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       RIP: 0010:strlen+0x0/0x20
       Code: f6 82 80 2b 0b bc 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2b 0b bc
        20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10
        48 89 f8 48 83 c0 01 80 38 9 f8 c3 31
       RSP: 0018:ffffaa75000d79d0 EFLAGS: 00010046
       RAX: 0000000000000002 RBX: ffff9cdb55575270 RCX: 0000000000000000
       RDX: ffff9cdb58c7a320 RSI: ffffaa75000d7b40 RDI: 000000000000001b
       RBP: ffffaa75000d7b40 R08: ffff9cdb40a4f010 R09: ffffaa75000d7ab8
       R10: ffff9cdb4398c700 R11: 0000000000000008 R12: ffff9cdb58c7a320
       R13: ffff9cdb55575270 R14: ffff9cdb58c7a000 R15: 0000000000000018
       FS:  0000000000000000(0000) GS:ffff9cdb5aa00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000000000000001b CR3: 00000000c0612006 CR4: 00000000001706e0
       Call Trace:
        trace_event_raw_event_synth+0x90/0x1d0
        action_trace+0x5b/0x70
        event_hist_trigger+0x4bd/0x4e0
        ? cpumask_next_and+0x20/0x30
        ? update_sd_lb_stats.constprop.0+0xf6/0x840
        ? __lock_acquire.constprop.0+0x125/0x550
        ? find_held_lock+0x32/0x90
        ? sched_clock_cpu+0xe/0xd0
        ? lock_release+0x155/0x440
        ? update_load_avg+0x8c/0x6f0
        ? enqueue_entity+0x18a/0x920
        ? __rb_reserve_next+0xe5/0x460
        ? ring_buffer_lock_reserve+0x12a/0x3f0
        event_triggers_call+0x52/0xe0
        trace_event_buffer_commit+0x1ae/0x240
        trace_event_raw_event_sched_switch+0x114/0x170
        __traceiter_sched_switch+0x39/0x50
        __schedule+0x431/0xb00
        schedule_idle+0x28/0x40
        do_idle+0x198/0x2e0
        cpu_startup_entry+0x19/0x20
        secondary_startup_64_no_verify+0xc2/0xcb
      
      The reason is that the dynamic events array keeps track of the field
      position of the fields array, via the field_pos variable in the
      synth_field structure. Unfortunately, that field is a boolean for some
      reason, which means any field_pos greater than 1 will be a bug (in this
      case it was 2).
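
      The truncation is easy to reproduce in isolation.  The two structs
      below are hypothetical one-member stand-ins for synth_field before and
      after the fix; in C, any non-zero value stored into a bool becomes 1,
      so position 2 is read back as position 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the relevant member only (hypothetical names). */
struct synth_field_buggy { bool field_pos; };
struct synth_field_fixed { unsigned int field_pos; };

/* Store an index in a bool member and read it back. */
static unsigned int stored_pos_buggy(unsigned int i)
{
	struct synth_field_buggy f;

	f.field_pos = i;	/* any non-zero index collapses to 1 */
	return f.field_pos;
}

/* Store an index in an integer member and read it back. */
static unsigned int stored_pos_fixed(unsigned int i)
{
	struct synth_field_fixed f;

	f.field_pos = i;
	assert(f.field_pos == i);	/* an index type round-trips */
	return f.field_pos;
}
```

      In the reproducer above, wake_comm is the third field (position 2), so
      with the bool member the lookup would land on position 1 instead.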
      
      Link: https://lkml.kernel.org/r/20210721191008.638bce34@oasis.local.home
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Fixes: bd82631d ("tracing: Add support for dynamic strings to synthetic events")
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      3b13911a
    • L
      Merge tag 'drm-fixes-2021-07-23' of git://anongit.freedesktop.org/drm/drm · 8baef638
      Linus Torvalds committed
      Pull drm fixes from Dave Airlie:
       "Regular fixes - a bunch of amdgpu fixes are the main thing, mostly for
        the new gpus. There are also some i915 reverts for older changes that
        were having some unwanted side effects. One nouveau fix for a reported
        regression, and otherwise just some misc fixes.
      
        core:
         - fix for non-drm ioctls on drm fd
      
        panel:
         - avoid double free
      
        ttm:
         - refcounting fix
         - NULL checks
      
        amdgpu:
         - Yellow Carp updates
         - Add some Yellow Carp DIDs
         - Beige Goby updates
         - CIK 10bit 4K regression fix
         - GFX10 golden settings updates
         - eDP panel regression fix
         - Misc display fixes
         - Aldebaran fix
         - fix COW checks
      
        nouveau:
         - init BO GEM fields
      
        i915:
         - revert async command parsing
         - revert fence error propagation
         - GVT fix for shadow ppgtt
      
        vc4:
         - fix interrupt handling"
      
      * tag 'drm-fixes-2021-07-23' of git://anongit.freedesktop.org/drm/drm: (34 commits)
        drm/panel: raspberrypi-touchscreen: Prevent double-free
        drm/amdgpu - Corrected the video codecs array name for yellow carp
        drm/amd/display: Fix ASSR regression on embedded panels
        drm/amdgpu: add yellow carp pci id (v2)
        drm/amdgpu: update yellow carp external rev_id handling
        drm/amd/pm: Support board calibration on aldebaran
        drm/amd/display: change zstate allow msg condition
        drm/amd/display: Populate dtbclk entries for dcn3.02/3.03
        drm/amd/display: Line Buffer changes
        drm/amd/display: Remove MALL function from DCN3.1
        drm/amd/display: Only set default brightness for OLED
        drm/amd/display: Update bounding box for DCN3.1
        drm/amd/display: Query VCO frequency from register for DCN3.1
        drm/amd/display: Populate socclk entries for dcn3.02/3.03
        drm/amd/display: Fix max vstartup calculation for modes with borders
        drm/amd/display: implement workaround for riommu related hang
        drm/amd/display: Fix comparison error in dcn21 DML
        drm/i915: Correct the docs for intel_engine_cmd_parser
        drm/ttm: add missing NULL checks
        drm/ttm: Force re-init if ttm_global_init() fails
        ...
      8baef638
    • L
      Merge tag 'fallthrough-fixes-clang-5.14-rc3' of... · e08100fe
      Linus Torvalds committed
      Merge tag 'fallthrough-fixes-clang-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
      
      Pull fallthrough fix from Gustavo Silva:
       "Fix a fall-through warning when building with -Wimplicit-fallthrough
        on PowerPC"
      
      * tag 'fallthrough-fixes-clang-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
        powerpc/pasemi: Fix fall-through warning for Clang
      e08100fe
    • D
      Merge tag 'drm-misc-fixes-2021-07-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 2e41a669
      Dave Airlie committed
      Short summary of fixes pull:
      
       * Return -ENOTTY for non-DRM ioctls
       * amdgpu: Fix COW checks
       * nouveau: init BO GEM fields
       * panel: Avoid double free
       * ttm: Fix refcounting in ttm_global_init(); NULL checks
       * vc4: Fix interrupt handling
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/YPlbkmH6S4VAHP9j@linux-uq9g.fritz.box
      2e41a669
    • D
      Merge tag 'drm-intel-fixes-2021-07-22' of... · 36ebaeb4
      Dave Airlie committed
      Merge tag 'drm-intel-fixes-2021-07-22' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      Couple reverts from Jason getting rid of asynchronous command parsing
      and fence error propagation and a GVT fix of shadow ppgtt invalidation
      with proper D3 state tracking from Colin.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/YPl1sIyruD0U5Orl@intel.com
      36ebaeb4
    • L
      Merge tag 'array-bounds-fixes-5.14-rc3' of... · 9bead1b5
      Linus Torvalds committed
      Merge tag 'array-bounds-fixes-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
      
      Pull array bounds warning fix from Gustavo Silva:
       "Fix a couple of out-of-bounds warnings in the media subsystem.
      
        This is part of the ongoing efforts to globally enable -Warray-bounds"
      
      * tag 'array-bounds-fixes-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
        media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
      9bead1b5
    • G
      Merge tag 'usb-serial-5.14-rc3' of... · 1d1b97d5
      Greg Kroah-Hartman committed
      Merge tag 'usb-serial-5.14-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
      
      Johan writes:
      
      USB-serial fixes for 5.14-rc3
      
      Here are some new device ids and a device-id comment fix.
      
      All have been in linux-next with no reported issues.
      
      * tag 'usb-serial-5.14-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
        USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
        USB: serial: cp210x: fix comments for GE CS1000
        USB: serial: option: add support for u-blox LARA-R6 family
      1d1b97d5
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 9f42f674
      Linus Torvalds committed
      Pull arm64 fixes from Will Deacon:
       "A pair of arm64 fixes for -rc3. The straightforward one is a fix to
        our firmware calling stub, which accidentally started corrupting the
        link register on machines with SVE. Since these machines don't really
        exist yet, it wasn't spotted in -next.
      
        The other fix is a revert-and-a-bit of a patch originally intended to
        allow PTE-level huge mappings for the VMAP area on 32-bit PPC 8xx. A
        side-effect of this change was that our pXd_set_huge() implementations
        could be replaced with generic dummy functions depending on the levels
        of page-table being used, which in turn broke the boot if we fail to
        create the linear mapping as a result of using these functions to
        operate on the pgd. Huge thanks to Michael Ellerman for modifying the
        revert so as not to regress PPC 8xx in terms of functionality.
      
        Anyway, that's the background and it's also available in the commit
        message along with Link tags pointing at all of the fun.
      
        Summary:
      
         - Fix hang when issuing SMC on SVE-capable system due to
           clobbered LR
      
         - Fix boot failure due to missing block mappings with folded
           page-table"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        Revert "mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge"
        arm64: smccc: Save lr before calling __arm_smccc_sve_check()
      9f42f674
    • L
      Merge tag 'hyperv-fixes-signed-20210722' of... · 7c14e4d6
      Linus Torvalds committed
      Merge tag 'hyperv-fixes-signed-20210722' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv fixes from Wei Liu:
      
       - bug fix from Haiyang for vmbus CPU assignment
      
       - revert of a bogus patch that went into 5.14-rc1
      
      * tag 'hyperv-fixes-signed-20210722' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        Revert "x86/hyperv: fix logical processor creation"
        Drivers: hv: vmbus: Fix duplicate CPU assignments within a device
      7c14e4d6