1. 09 Sep, 2021 40 commits
    • parisc: Reduce sigreturn trampoline to 3 instructions · e4f2006f
      Helge Deller committed
      We can move the INSN_LDI_R20 instruction into the branch delay slot.
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Check user signal stack trampoline is inside TASK_SIZE · 3e4a1aff
      Helge Deller committed
      Add some additional checks to ensure the signal stack is inside
      userspace bounds.
      Signed-off-by: Helge Deller <deller@gmx.de>
    • ea4b3fca
    • parisc: Drop strnlen_user() in favour of generic version · 1260dea6
      Helge Deller committed
      As suggested by Arnd Bergmann, drop the parisc version of
      strnlen_user() and switch to the generic version.
      Suggested-by: Arnd Bergmann <arnd@kernel.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Add missing FORCE prerequisite in Makefile · 3da6379a
      Helge Deller committed
      Signed-off-by: Helge Deller <deller@gmx.de>
    • Merge branches 'akpm' and 'akpm-hotfixes' (patches from Andrew) · a3fa7a10
      Linus Torvalds committed
      Merge yet more updates and hotfixes from Andrew Morton:
       "Post-linux-next material, based upon latest upstream to catch the
        now-merged dependencies:
      
         - 10 patches.
      
           Subsystems affected by this patch series: mm (vmstat and migration)
           and compat.
      
        And bunch of hotfixes, mostly cc:stable:
      
         - 8 patches.
      
           Subsystems affected by this patch series: mm (hmm, hugetlb, vmscan,
           pagealloc, pagemap, kmemleak, mempolicy, and memblock)"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        arch: remove compat_alloc_user_space
        compat: remove some compat entry points
        mm: simplify compat numa syscalls
        mm: simplify compat_sys_move_pages
        kexec: avoid compat_alloc_user_space
        kexec: move locking into do_kexec_load
        mm: migrate: change to use bool type for 'page_was_mapped'
        mm: migrate: fix the incorrect function name in comments
        mm: migrate: introduce a local variable to get the number of pages
        mm/vmstat: protect per cpu variables with preempt disable on RT
      
      * emailed hotfixes from Andrew Morton <akpm@linux-foundation.org>:
        nds32/setup: remove unused memblock_region variable in setup_memory()
        mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task
        mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp
        mmap_lock: change trace and locking order
        mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype
        mm,vmscan: fix divide by zero in get_scan_count
        mm/hugetlb: initialize hugetlb_usage in mm_init
        mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
    • nds32/setup: remove unused memblock_region variable in setup_memory() · ddb13122
      Mike Rapoport committed
      kernel test robot reports unused variable warning:
      
         arch/nds32/kernel/setup.c:247:26: warning: Unused variable: region
         [unusedVariable]
          struct memblock_region *region;
                                  ^
      
      Remove the unused variable.
      
      Link: https://lkml.kernel.org/r/20210712125218.28951-1-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task · 276aeee1
      yanghui committed
      Servers hit the panic below:
      
        Kernel version:5.4.56
        BUG: unable to handle page fault for address: 0000000000002c48
        RIP: 0010:__next_zones_zonelist+0x1d/0x40
        Call Trace:
          __alloc_pages_nodemask+0x277/0x310
          alloc_page_interleave+0x13/0x70
          handle_mm_fault+0xf99/0x1390
          __do_page_fault+0x288/0x500
          do_page_fault+0x30/0x110
          page_fault+0x3e/0x50
      
      The reason for the panic is that MAX_NUMNODES is passed in the third
      parameter in __alloc_pages_nodemask(preferred_nid).  So access to
      zonelist->zoneref->zone_idx in __next_zones_zonelist will cause a panic.
      
      In offset_il_node(), first_node() returns nid from pol->v.nodes; after
      this, other threads may change pol->v.nodes before next_node() runs.  This
      race condition can make next_node() return MAX_NUMNODES.  So put
      pol->nodes in a local variable.
      
      The race condition is between offset_il_node and cpuset_change_task_nodemask:
      
        CPU0:                                     CPU1:
        alloc_pages_vma()
          interleave_nid(pol,)
            offset_il_node(pol,)
              first_node(pol->v.nodes)            cpuset_change_task_nodemask
                              //nodes==0xc          mpol_rebind_task
                                                      mpol_rebind_policy
                                                        mpol_rebind_nodemask(pol,nodes)
                              //nodes==0x3
              next_node(nid, pol->v.nodes)//return MAX_NUMNODES
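
      The race in the diagram comes from reading the shared node mask twice.
      A minimal user-space sketch of the "local variable" idea follows. It
      assumes a 64-bit integer as a stand-in for the kernel's node mask, and
      the names MAX_NODES, find_node_from() and pick_interleave_node() are
      hypothetical; this is an illustration, not the kernel code.

```c
#include <stdint.h>

#define MAX_NODES 64 /* stand-in for MAX_NUMNODES; the value is an assumption */

/* Return the lowest set bit at or above 'start', or MAX_NODES if none. */
static int find_node_from(uint64_t mask, int start)
{
    for (int n = start; n < MAX_NODES; n++)
        if (mask & (UINT64_C(1) << n))
            return n;
    return MAX_NODES;
}

/*
 * Pick the (target % weight)-th set node from a mask that another thread
 * may rewrite. The mask is read once into 'nodes', and every later step
 * uses that snapshot, so the first-node and next-node lookups always see
 * the same set of bits. (A real concurrent program would need a proper
 * atomic load here; the plain read keeps the sketch short.)
 */
static int pick_interleave_node(const volatile uint64_t *shared,
                                unsigned int target)
{
    uint64_t nodes = *shared; /* single read of the shared mask */
    unsigned int weight = 0;
    int nid;

    for (int n = 0; n < MAX_NODES; n++)
        weight += (unsigned int)((nodes >> n) & 1);
    if (weight == 0)
        return -1; /* empty mask: nothing to choose from */

    target %= weight;
    nid = find_node_from(nodes, 0);
    for (unsigned int i = 0; i < target; i++)
        nid = find_node_from(nodes, nid + 1);
    return nid;
}
```

      Mixing two different masks reproduces the failure mode from the diagram:
      find_node_from(0x3, find_node_from(0xc, 0) + 1) yields MAX_NODES, while
      the snapshot version only ever returns a node that is set in the mask it
      read.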
      
      Link: https://lkml.kernel.org/r/20210906034658.48721-1-yanghui.def@bytedance.com
      Signed-off-by: yanghui <yanghui.def@bytedance.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp · 79d37050
      Naohiro Aota committed
      In a memory pressure situation, I'm seeing the lockdep WARNING below.
      Actually, this is similar to a known false positive which is already
      addressed by commit 6dcde60e ("xfs: more lockdep whackamole with
      kmem_alloc*").
      
      This warning still persists because it's not from kmalloc() itself but
      from an allocation for a kmemleak object.  While kmalloc() itself
      suppresses the warning with __GFP_NOLOCKDEP, gfp_kmemleak_mask() drops
      the flag for kmemleak's allocation.
      
      Allow __GFP_NOLOCKDEP to be passed to kmemleak's allocation, so that the
      warning for it is also suppressed.
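
      The flag filtering described above can be sketched as a simple bit mask.
      The flag values and the names SK_GFP_* and tracker_mask_old/new() below
      are hypothetical and chosen only for illustration; the kernel's real
      gfp_kmemleak_mask() and flag definitions are not reproduced here.

```c
typedef unsigned int gfp_sketch_t;

/* Hypothetical bit values; they do not match the kernel's definitions. */
#define SK_GFP_KERNEL    0x01u
#define SK_GFP_ATOMIC    0x02u
#define SK_GFP_NOLOCKDEP 0x04u
#define SK_GFP_OTHER     0x08u /* some flag the tracker never forwards */

/* Before: the caller's "no lockdep" request is filtered out. */
static gfp_sketch_t tracker_mask_old(gfp_sketch_t gfp)
{
    return gfp & (SK_GFP_KERNEL | SK_GFP_ATOMIC);
}

/* After: the "no lockdep" bit is allowed through to the tracker's allocation. */
static gfp_sketch_t tracker_mask_new(gfp_sketch_t gfp)
{
    return gfp & (SK_GFP_KERNEL | SK_GFP_ATOMIC | SK_GFP_NOLOCKDEP);
}
```

      With the wider mask, an allocation that asked for lockdep suppression
      keeps that request for the tracking object as well, while unrelated
      flags are still dropped.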
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.14.0-rc7-BTRFS-ZNS+ #37 Not tainted
        ------------------------------------------------------
        kswapd0/288 is trying to acquire lock:
        ffff88825ab45df0 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x8a/0x250
      
        but task is already holding lock:
        ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (fs_reclaim){+.+.}-{0:0}:
               fs_reclaim_acquire+0x112/0x160
               kmem_cache_alloc+0x48/0x400
               create_object.isra.0+0x42/0xb10
               kmemleak_alloc+0x48/0x80
               __kmalloc+0x228/0x440
               kmem_alloc+0xd3/0x2b0
               kmem_alloc_large+0x5a/0x1c0
               xfs_attr_copy_value+0x112/0x190
               xfs_attr_shortform_getvalue+0x1fc/0x300
               xfs_attr_get_ilocked+0x125/0x170
               xfs_attr_get+0x329/0x450
               xfs_get_acl+0x18d/0x430
               get_acl.part.0+0xb6/0x1e0
               posix_acl_xattr_get+0x13a/0x230
               vfs_getxattr+0x21d/0x270
               getxattr+0x126/0x310
               __x64_sys_fgetxattr+0x1a6/0x2a0
               do_syscall_64+0x3b/0x90
               entry_SYSCALL_64_after_hwframe+0x44/0xae
      
        -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
               __lock_acquire+0x2c0f/0x5a00
               lock_acquire+0x1a1/0x4b0
               down_read_nested+0x50/0x90
               xfs_ilock+0x8a/0x250
               xfs_can_free_eofblocks+0x34f/0x570
               xfs_inactive+0x411/0x520
               xfs_fs_destroy_inode+0x2c8/0x710
               destroy_inode+0xc5/0x1a0
               evict+0x444/0x620
               dispose_list+0xfe/0x1c0
               prune_icache_sb+0xdc/0x160
               super_cache_scan+0x31e/0x510
               do_shrink_slab+0x337/0x8e0
               shrink_slab+0x362/0x5c0
               shrink_node+0x7a7/0x1a40
               balance_pgdat+0x64e/0xfe0
               kswapd+0x590/0xa80
               kthread+0x38c/0x460
               ret_from_fork+0x22/0x30
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
               CPU0                    CPU1
               ----                    ----
          lock(fs_reclaim);
                                       lock(&xfs_nondir_ilock_class);
                                       lock(fs_reclaim);
          lock(&xfs_nondir_ilock_class);
      
         *** DEADLOCK ***
        3 locks held by kswapd0/288:
         #0: ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
         #1: ffffffff848a08d8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x269/0x5c0
         #2: ffff8881a7a820e8 (&type->s_umount_key#60){++++}-{3:3}, at: super_cache_scan+0x5a/0x510
      
      Link: https://lkml.kernel.org/r/20210907055659.3182992-1-naohiro.aota@wdc.com
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "Darrick J . Wong" <djwong@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mmap_lock: change trace and locking order · 10994316
      Liam Howlett committed
      Print to the trace log before releasing the lock to avoid racing with
      other trace log printers of the same lock type.
      
      Link: https://lkml.kernel.org/r/20210903022041.1843024-1-Liam.Howlett@oracle.com
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michel Lespinasse <walken.cr@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype · 053cfda1
      Miaohe Lin committed
      If the page was not prepared for freeing as an unref page, its pcp
      migratetype is unset.  We then get garbage from
      get_pcppage_migratetype() and may call list_del(&page->lru) again after
      the page has already been removed from the list, which triggers a
      data-corruption warning.
      
      Link: https://lkml.kernel.org/r/20210902115447.57050-1-linmiaohe@huawei.com
      Fixes: df1acc85 ("mm/page_alloc: avoid conflating IRQs disabled with zone->lock")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm,vmscan: fix divide by zero in get_scan_count · 32d4f4b7
      Rik van Riel committed
      Commit f56ce412 ("mm: memcontrol: fix occasional OOMs due to
      proportional memory.low reclaim") introduced a divide by zero corner
      case when oomd is being used in combination with cgroup memory.low
      protection.
      
      When oomd decides to kill a cgroup, it will force the cgroup memory to
      be reclaimed after killing the tasks, by writing to the memory.max file
      for that cgroup, forcing the remaining page cache and reclaimable slab
      to be reclaimed down to zero.
      
      Previously, on cgroups with some memory.low protection that would result
      in the memory being reclaimed down to the memory.low limit, or likely
      not at all, having the page cache reclaimed asynchronously later.
      
      With f56ce412 the oomd write to memory.max tries to reclaim all the
      way down to zero, which may race with another reclaimer, to the point of
      ending up with the divide by zero below.
      
      This patch implements the obvious fix.
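
      The message does not show the fix itself. As a minimal sketch of how a
      proportional calculation of this kind can be kept safe when the cgroup
      size races down to zero, consider the helper below. The name
      scan_target() and its parameters are hypothetical, and adding one to the
      divisor is only one common guard; whether the patch takes exactly this
      form is not stated in the text.

```c
#include <stdint.h>

/*
 * Scale 'lruvec_size' by the unprotected fraction of the cgroup:
 *   scan = lruvec_size * (1 - protection / cgroup_size)
 * computed in integers. Two guards keep the arithmetic well defined:
 *  - cgroup_size is raised to at least 'protection', so the subtracted
 *    term never exceeds lruvec_size;
 *  - the divisor is cgroup_size + 1, so it cannot be zero even when both
 *    values have dropped to zero.
 */
static uint64_t scan_target(uint64_t lruvec_size, uint64_t cgroup_size,
                            uint64_t protection)
{
    if (cgroup_size < protection)
        cgroup_size = protection;
    return lruvec_size - lruvec_size * protection / (cgroup_size + 1);
}
```

      With cgroup_size and protection both at zero, the helper simply returns
      lruvec_size instead of dividing by zero.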
      
      Link: https://lkml.kernel.org/r/20210826220149.058089c6@imladris.surriel.com
      Fixes: f56ce412 ("mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim")
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Chris Down <chris@chrisdown.name>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: initialize hugetlb_usage in mm_init · 13db8c50
      Liu Zixian committed
      After fork, the child process will get incorrect (2x) hugetlb_usage.  If
      a process uses 5 2MB hugetlb pages in an anonymous mapping,
      
      	HugetlbPages:	   10240 kB
      
      and then forks, the child will show,
      
      	HugetlbPages:	   20480 kB
      
      The amount is doubled because hugetlb_usage is copied from the parent
      and then increased again when the page tables are copied from parent to
      child, so the child reports 2x its actual usage.
      
      Fix this by adding hugetlb_count_init in mm_init.
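
      The arithmetic behind the doubled value can be sketched in user space.
      The struct mm_sketch type and the functions copy_page_tables() and
      fork_usage_kb() are hypothetical stand-ins for illustration; they only
      model a counter that is first copied and then incremented.

```c
/* Hypothetical model of the per-process counter, in units of huge pages. */
struct mm_sketch {
    long hugetlb_usage;
};

/* Copying page tables accounts every huge page mapped into the child. */
static void copy_page_tables(struct mm_sketch *child, long pages)
{
    child->hugetlb_usage += pages;
}

/*
 * Model a fork: the child structure starts as a copy of the parent,
 * then page-table copying adds the mapped pages. If 'reset_counter' is
 * nonzero the copied counter is zeroed first, which is the effect the
 * commit message describes for initializing the counter in mm_init.
 * Returns the child's reported usage in kB.
 */
static long fork_usage_kb(long parent_pages, long page_kb, int reset_counter)
{
    struct mm_sketch parent = { parent_pages };
    struct mm_sketch child = parent; /* struct copy carries the counter over */

    if (reset_counter)
        child.hugetlb_usage = 0;
    copy_page_tables(&child, parent_pages);
    return child.hugetlb_usage * page_kb;
}
```

      For the example in the message, fork_usage_kb(5, 2048, 0) gives 20480 kB
      and fork_usage_kb(5, 2048, 1) gives the expected 10240 kB.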
      
      Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com
      Fixes: 5d317b2b ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status")
      Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled · 4b42fb21
      Li Zhijian committed
      Previously, we noticed that one rpma example had been failing [1] since
      commit 36f30e48 ("IB/core: Improve ODP to use hmm_range_fault()"), where
      it uses the ODP feature to do RDMA WRITE between fsdax files.

      After digging into the code, we found that hmm_vma_handle_pte() still
      returns EFAULT even though all of its requested flags have been
      fulfilled.  That's because a DAX page will be marked as (_PAGE_SPECIAL |
      PAGE_DEVMAP) by pte_mkdevmap().
      
      Link: https://github.com/pmem/rpma/issues/1142 [1]
      Link: https://lkml.kernel.org/r/20210830094232.203029-1-lizhijian@cn.fujitsu.com
      Fixes: 40550627 ("mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling")
      Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'tag-chrome-platform-for-v5.15' of... · 730bf31b
      Linus Torvalds committed
      Merge tag 'tag-chrome-platform-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
      
      Pull chrome platform updates from Benson Leung:
       "cros_ec_typec:
      
         - make the cros_ec_typec driver to use the pre-existing
           cros_ec_check_features() function
      
        sensorhub:
      
         - add trace events for sample
      
        misc:
      
         - cros_ec_proto - re-send commands in the event of a timeout (for the
           FPMCU)
      
         - fix warnings in cros_ec_trace related to format output"
      
      * tag 'tag-chrome-platform-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
        platform/chrome: cros_ec_trace: Fix format warnings
        platform/chrome: cros_ec_typec: Use existing feature check
        platform/chrome: cros_ec_proto: Send command again when timeout occurs
        platform/chrome: sensorhub: Add trace events for sample
    • Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 30f34909
      Linus Torvalds committed
      Pull more power management updates from Rafael Wysocki:
       "These are mostly ARM cpufreq driver updates, including one new
        MediaTek driver that has just passed all of the reviews, with the
        addition of a revert of a recent intel_pstate commit, some core
        cpufreq changes and a DT-related update of the operating performance
        points (OPP) support code.
      
        Specifics:
      
         - Add new cpufreq driver for the MediaTek MT6779 platform called
           mediatek-hw along with corresponding DT bindings (Hector.Yuan).
      
         - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
           Gopinath).
      
         - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
           policy flag (Taniya Das).
      
         - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
           Andersson).
      
         - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
           flag (Viresh Kumar).
      
         - Add new cpufreq driver callback to allow drivers to register with
           the Energy Model in a consistent way and make several drivers use
           it (Viresh Kumar).
      
         - Change the remaining users of the .ready() cpufreq driver callback
           to move the code from it elsewhere and drop it from the cpufreq
           core (Viresh Kumar).
      
         - Revert recent intel_pstate change adding HWP guaranteed performance
           change notification support to it that led to problems, because the
           notification in question is triggered prematurely on some systems
           (Rafael Wysocki).
      
         - Convert the OPP DT bindings to DT schema and clean them up while at
           it (Rob Herring)"
      
      * tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
        Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
        cpufreq: mediatek-hw: Add support for CPUFREQ HW
        cpufreq: Add of_perf_domain_get_sharing_cpumask
        dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
        cpufreq: Remove ready() callback
        cpufreq: sh: Remove sh_cpufreq_cpu_ready()
        cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
        cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
        cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
        cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
        cpufreq: scmi: Use .register_em() to register with energy model
        cpufreq: vexpress: Use .register_em() to register with energy model
        cpufreq: scpi: Use .register_em() to register with energy model
        dt-bindings: opp: Convert to DT schema
        dt-bindings: Clean-up OPP binding node names in examples
        ARM: dts: omap: Drop references to opp.txt
        cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
        cpufreq: omap: Use .register_em() to register with energy model
        cpufreq: mediatek: Use .register_em() to register with energy model
        cpufreq: imx6q: Use .register_em() to register with energy model
        ...
    • Merge tag 'acpi-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 9c566611
      Linus Torvalds committed
      Pull more ACPI updates from Rafael Wysocki:
       "These add ACPI support to the PCI VMD driver, improve suspend-to-idle
        support for AMD platforms and update documentation.
      
        Specifics:
      
         - Add ACPI support to the PCI VMD driver (Rafael Wysocki)
      
         - Rearrange suspend-to-idle support code to reflect the platform
           firmware expectations on some AMD platforms (Mario Limonciello)
      
         - Make SSDT overlays documentation follow the code documented by it
           more closely (Andy Shevchenko)"
      
      * tag 'acpi-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: PM: s2idle: Run both AMD and Microsoft methods if both are supported
        Documentation: ACPI: Align the SSDT overlays file with the code
        PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus
    • Merge tag 'docs-5.15-2' of git://git.lwn.net/linux · 0f4b9289
      Linus Torvalds committed
      Pull more documentation updates from Jonathan Corbet:
       "Another collection of documentation patches, mostly fixes but also
        includes another set of traditional Chinese translations"
      
      * tag 'docs-5.15-2' of git://git.lwn.net/linux:
        docs: pdfdocs: Fix typo in CJK-language specific font settings
        docs: kernel-hacking: Remove inappropriate text
        docs/zh_TW: add translations for zh_TW/filesystems
        docs/zh_TW: add translations for zh_TW/cpu-freq
        docs/zh_TW: add translations for zh_TW/arm64
        docs/zh_CN: Modify the translator tag and fix the wrong word
        Documentation/features/vm: correct huge-vmap APIs
        Documentation: block: blk-mq: Fix small typo in multi-queue docs
        Documentation: in_irq() cleanup
        Documentation: arm: marvell: Add 88F6825 model into list
        Documentation/process/maintainer-pgp-guide: Replace broken link to PGP path finder
        Documentation: locking: fix references
        Documentation: Update details of The Linux Kernel Module Programming Guide
        docs: x86: Remove obsolete information about x86_64 vmalloc() faulting
        Documentation/process/applying-patches: Activate linux-next man hyperlink
    • Merge tag 'modules-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · 6dcaf9fb
      Linus Torvalds committed
      Pull module updates from Jessica Yu:
       "The only main change I have for this round of updates is the modules
        MAINTAINERS update.
      
        As I find myself with less time to devote to upstream these days, Luis
        has kindly agreed to help maintain the module loader, to eventually
        transition to being the primary maintainer. Since Luis is already very
        involved upstream with experience maintaining various areas of the
        kernel including the kmod usermode helper, I think he is a great fit
        for this area of the kernel.
      
        Summary:
      
         - Add Luis Chamberlain as modules maintainer
      
         - Fix for .ctors sections in module linker script"
      
      * tag 'modules-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        MAINTAINERS: Add Luis Chamberlain as modules maintainer
        module: combine constructors in module linker script
    • Merge tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblaze · 1511e5d6
      Linus Torvalds committed
      Pull microblaze update from Michal Simek:
      
       - Kbuild clean up
      
      * tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: move core-y in arch/microblaze/Makefile to arch/microblaze/Kbuild
    • Merge tag 'nfsd-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 14e2bc4e
      Linus Torvalds committed
      Pull nfsd fixes from Chuck Lever:
      
       - Restore performance on memory-starved servers
      
      * tag 'nfsd-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        SUNRPC: improve error response to over-size gss credential
        SUNRPC: don't pause on incomplete allocation
    • Merge tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client · 8a05abd0
      Linus Torvalds committed
      Pull ceph updates from Ilya Dryomov:
      
       - a set of patches to address fsync stalls caused by depending on
         periodic rather than triggered MDS journal flushes in some cases
         (Xiubo Li)
      
       - a fix for mtime effectively not getting updated in case of competing
         writers (Jeff Layton)
      
       - a couple of fixes for inode reference leaks and various WARNs after
         "umount -f" (Xiubo Li)
      
       - a new ceph.auth_mds extended attribute (Jeff Layton)
      
       - a smattering of fixups and cleanups from Jeff, Xiubo and Colin.
      
      * tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client:
        ceph: fix dereference of null pointer cf
        ceph: drop the mdsc_get_session/put_session dout messages
        ceph: lockdep annotations for try_nonblocking_invalidate
        ceph: don't WARN if we're forcibly removing the session caps
        ceph: don't WARN if we're force umounting
        ceph: remove the capsnaps when removing caps
        ceph: request Fw caps before updating the mtime in ceph_write_iter
        ceph: reconnect to the export targets on new mdsmaps
        ceph: print more information when we can't find snaprealm
        ceph: add ceph_change_snap_realm() helper
        ceph: remove redundant initializations from mdsc and session
        ceph: cancel delayed work instead of flushing on mdsc teardown
        ceph: add a new vxattr to return auth mds for an inode
        ceph: remove some defunct forward declarations
        ceph: flush the mdlog before waiting on unsafe reqs
        ceph: flush mdlog before umounting
        ceph: make iterate_sessions a global symbol
        ceph: make ceph_create_session_msg a global symbol
        ceph: fix comment about short copies in ceph_write_end
        ceph: fix memory leak on decode error in ceph_handle_caps
    • Merge tag '9p-for-5.15-rc1' of git://github.com/martinetd/linux · 34c59da4
      Linus Torvalds committed
      Pull 9p updates from Dominique Martinet:
       "A couple of harmless fixes, increase max tcp msize (64KB -> 1MB), and
        increase default msize (8KB -> 128KB)
      
        The default increase has been discussed with Christian for the qemu
        side of things but makes sense for all supported transports"
      
      * tag '9p-for-5.15-rc1' of git://github.com/martinetd/linux:
        net/9p: increase default msize to 128k
        net/9p: use macro to define default msize
        net/9p: increase tcp max msize to 1MB
        9p/xen: Fix end of loop tests for list_for_each_entry
        9p/trans_virtio: Remove sysfs file on probe failure
    • arch: remove compat_alloc_user_space · a7a08b27
      Arnd Bergmann committed
      All users of compat_alloc_user_space() and copy_in_user() have been
      removed from the kernel; only a few functions in sparc remain, and they
      can be changed to call arch_copy_in_user() instead.
      
      Link: https://lkml.kernel.org/r/20210727144859.4150043-7-arnd@kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • compat: remove some compat entry points · 59ab844e
      Arnd Bergmann committed
      These are all handled correctly when calling the native system call entry
      point, so remove the special cases.
      
      Link: https://lkml.kernel.org/r/20210727144859.4150043-6-arnd@kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: simplify compat numa syscalls · e130242d
      Arnd Bergmann committed
      The compat implementations for mbind, get_mempolicy, set_mempolicy and
      migrate_pages are just there to handle the subtly different layout of
      bitmaps on 32-bit hosts.
      
      The compat implementation however lacks some of the checks that are
      present in the native one, in particular for checking that the extra bits
      are all zero when user space has a larger mask size than the kernel.
      Worse, those extra bits do not get cleared when copying in or out of the
      kernel, which can lead to incorrect data as well.
      
      Unify the implementation to handle the compat bitmap layout directly in
      the get_nodes() and copy_nodes_to_user() helpers.  Splitting out the
      get_bitmap() helper from get_nodes() also helps readability of the native
      case.
      
      On x86, two additional problems are addressed by this: compat tasks can
      pass a bitmap at the end of a mapping, causing a fault when reading across
      the page boundary for a 64-bit word.  x32 tasks might also run into
      problems with get_mempolicy corrupting data when an odd number of 32-bit
      words gets passed.
      
      On parisc the migrate_pages() system call apparently had the wrong calling
      convention, as big-endian architectures expect the words inside of a
      bitmap to be swapped.  This is not a problem though since parisc has no
      NUMA support.
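
The layout difference described above can be illustrated with a small userspace sketch. It is not the kernel's helper: widen_bitmap32() and extra_bits_are_zero() are hypothetical names, and the code only assumes that a compat task passes its bitmap as 32-bit words while a 64-bit kernel works on 64-bit words. Shifting each word into place, rather than copying bytes, gives the same result on little-endian and big-endian hosts and leaves the upper half zeroed when an odd number of 32-bit words is passed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: widen a bitmap stored as 32-bit words (compat
 * layout) into 64-bit words (native layout on a 64-bit kernel).
 * Bit n lives in src[n / 32] before and in dst[n / 64] afterwards.
 */
static void widen_bitmap32(uint64_t *dst, size_t dst_words,
                           const uint32_t *src, size_t src_words)
{
    size_t i;

    /* Clear first so that a trailing odd 32-bit word leaves zeroed upper bits. */
    for (i = 0; i < dst_words; i++)
        dst[i] = 0;

    /* Even source words fill the low half, odd source words the high half. */
    for (i = 0; i < src_words && i / 2 < dst_words; i++)
        dst[i / 2] |= (uint64_t)src[i] << (32 * (i % 2));
}

/*
 * Return 1 if every source word that does not fit into the destination
 * is zero, i.e. the caller's larger mask carries no bits that would be
 * silently dropped.
 */
static int extra_bits_are_zero(const uint32_t *src, size_t src_words,
                               size_t dst_words)
{
    size_t i;

    for (i = dst_words * 2; i < src_words; i++)
        if (src[i] != 0)
            return 0;
    return 1;
}
```

A caller would first reject the request when extra_bits_are_zero() returns 0 and only then widen the words that fit.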
      
      [arnd@arndb.de: fix mempolicy crash]
        Link: https://lkml.kernel.org/r/20210730143417.3700653-1-arnd@kernel.org
        Link: https://lore.kernel.org/lkml/YQPLG20V3dmOfq3a@osiris/
      
      Link: https://lkml.kernel.org/r/20210727144859.4150043-5-arnd@kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e130242d
    • A
      mm: simplify compat_sys_move_pages · 5b1b561b
      Arnd Bergmann committed
      The compat move_pages() implementation uses compat_alloc_user_space() for
      converting the pointer array.  Moving the compat handling into the
      function itself is a bit simpler and lets us avoid the
      compat_alloc_user_space() call.
      
      Link: https://lkml.kernel.org/r/20210727144859.4150043-4-arnd@kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b1b561b
    • A
      kexec: avoid compat_alloc_user_space · 5d700a0f
      Arnd Bergmann committed
      kimage_alloc_init() expects a __user pointer, so compat_sys_kexec_load()
      uses compat_alloc_user_space() to convert the layout and put it back onto
      the user space caller stack.
      
      Moving the user space access into the syscall handler directly actually
      makes the code simpler, as the conversion for compat mode can now be done
      on kernel memory.
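
As a rough userspace illustration of converting the layout in ordinary memory rather than on the caller's stack, the sketch below widens an array of 32-bit segment descriptors into native ones. The struct and function names are hypothetical stand-ins, and the four fields are an assumption modelled on a buffer/size plus destination/size descriptor; the real kernel types and the user-copy step are not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit (compat) descriptor: every field is 32 bits wide. */
struct compat_segment {
    uint32_t buf;    /* 32-bit user pointer to the source buffer */
    uint32_t bufsz;  /* size of the source buffer */
    uint32_t mem;    /* destination address */
    uint32_t memsz;  /* size of the destination region */
};

/* Hypothetical native descriptor with pointer- and long-sized fields. */
struct native_segment {
    const void *buf;
    size_t bufsz;
    unsigned long mem;
    size_t memsz;
};

/*
 * Widen n compat descriptors into native ones.  Both arrays are plain
 * memory owned by the caller, so no scratch space on a user stack is
 * needed for the conversion.
 */
static void widen_segments(struct native_segment *out,
                           const struct compat_segment *in, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* Go through uintptr_t so the 32-bit value is zero-extended. */
        out[i].buf = (const void *)(uintptr_t)in[i].buf;
        out[i].bufsz = in[i].bufsz;
        out[i].mem = in[i].mem;
        out[i].memsz = in[i].memsz;
    }
}
```

After the conversion, the same code path can consume the native array regardless of whether the caller was a 32-bit or a 64-bit task.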
      
      Link: https://lkml.kernel.org/r/20210727144859.4150043-3-arnd@kernel.org
      Link: https://lore.kernel.org/lkml/YPbtsU4GX6PL7%2F42@infradead.org/
      Link: https://lore.kernel.org/lkml/m1y2cbzmnw.fsf@fess.ebiederm.org/
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Co-developed-by: Eric Biederman <ebiederm@xmission.com>
      Co-developed-by: Christoph Hellwig <hch@infradead.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d700a0f
    • A
      kexec: move locking into do_kexec_load · 4b692e86
      Arnd Bergmann committed
      Patch series "compat: remove compat_alloc_user_space", v5.
      
      Going through compat_alloc_user_space() to convert indirect system call
      arguments tends to add complexity compared to handling the native and
      compat logic in the same code.
      
      This patch (of 6):
      
      The locking is the same between the native and compat version of
      sys_kexec_load(), so it can be done in the common implementation to reduce
      duplication.
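
The pattern of taking the lock once in the shared implementation can be sketched in userspace as follows. This is only an illustration: do_load(), native_load() and compat_load() are hypothetical names, the C11 atomic_flag stands in for the kernel lock, and both the -EBUSY result on contention and the segment limit of 16 are assumptions of the sketch.

```c
#include <errno.h>
#include <stdatomic.h>

/* Stand-in for the lock shared by both entry points. */
static atomic_flag load_lock = ATOMIC_FLAG_INIT;
static int loads_done;

/*
 * Common implementation: the lock is taken and released here, so the
 * entry points below do not repeat the locking.
 */
static int do_load(unsigned long nr_segments)
{
    int ret = 0;

    /* Trylock: give up immediately if another load is in progress. */
    if (atomic_flag_test_and_set(&load_lock))
        return -EBUSY;

    if (nr_segments > 16)   /* arbitrary limit for the sketch */
        ret = -EINVAL;
    else
        loads_done++;

    atomic_flag_clear(&load_lock);
    return ret;
}

/* Native entry point: only forwards its arguments. */
static int native_load(unsigned long nr_segments)
{
    return do_load(nr_segments);
}

/* Compat entry point: widens its argument, then takes the same path. */
static int compat_load(unsigned int nr_segments)
{
    return do_load(nr_segments);
}
```

With the lock handled in one place, an entry point cannot forget to release it on an error path of its own.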
      
      Link: https://lkml.kernel.org/r/20210727144859.4150043-1-arnd@kernel.org
      Link: https://lkml.kernel.org/r/20210727144859.4150043-2-arnd@kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Co-developed-by: Eric Biederman <ebiederm@xmission.com>
      Co-developed-by: Christoph Hellwig <hch@infradead.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b692e86
    • B
      mm: migrate: change to use bool type for 'page_was_mapped' · 213ecb31
      Baolin Wang committed
      Use the bool type for the 'page_was_mapped' variable to make it more
      readable.
      
      Link: https://lkml.kernel.org/r/ce1279df18d2c163998c403e0b5ec6d3f6f90f7a.1629447552.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      213ecb31
    • B
      mm: migrate: fix the incorrect function name in comments · 68a9843f
      Baolin Wang committed
      Since commit a98a2f0c ("mm/rmap: split migration into its own
      function"), establishing the migration ptes has been split into a
      separate try_to_migrate() function, so update the related comments.
      
      Link: https://lkml.kernel.org/r/5b824bad6183259c916ae6cf42f81d14c6118b06.1629447552.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      68a9843f
    • B
      mm: migrate: introduce a local variable to get the number of pages · 2b9b624f
      Baolin Wang committed
      Use thp_nr_pages() instead of compound_nr() to get the number of pages
      for a THP, and introduce a local variable 'nr_pages' to avoid getting
      the number of pages repeatedly.
      
      Link: https://lkml.kernel.org/r/a8e331ac04392ee230c79186330fb05e86a2aa77.1629447552.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2b9b624f
    • I
      mm/vmstat: protect per cpu variables with preempt disable on RT · c68ed794
      Ingo Molnar committed
      Disable preemption on -RT for the vmstat code.  On vanilla kernels the
      code runs in IRQ-off regions, while on -RT it may not when stats are
      updated under a local_lock.  "preempt_disable" ensures that the same
      resource is not updated in parallel due to preemption.
      
      This patch differs from the preempt-rt version where __count_vm_event and
      __count_vm_events are also protected.  The counters are explicitly
      "allowed to be racy" so there is no need to protect them from
      preemption.  Only the accurate page stats that are updated by a
      read-modify-write need protection.  This patch also differs in that a
      preempt_[en|dis]able_rt helper is not used.  As vmstat is the only user of
      the helper, it was suggested that it be open-coded in vmstat.c instead of
      risking the helper being used in unnecessary contexts.
      
      Link: https://lkml.kernel.org/r/20210805160019.1137-2-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c68ed794
    • L
      Merge branch 'akpm' (patches from Andrew) · 2d338201
      Linus Torvalds committed
      Merge more updates from Andrew Morton:
       "147 patches, based on 7d2a07b7.
      
        Subsystems affected by this patch series: mm (memory-hotplug, rmap,
        ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
        alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
        checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
        selftests, ipc, and scripts"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
        scripts: check_extable: fix typo in user error message
        mm/workingset: correct kernel-doc notations
        ipc: replace costly bailout check in sysvipc_find_ipc()
        selftests/memfd: remove unused variable
        Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
        configs: remove the obsolete CONFIG_INPUT_POLLDEV
        prctl: allow to setup brk for et_dyn executables
        pid: cleanup the stale comment mentioning pidmap_init().
        kernel/fork.c: unexport get_{mm,task}_exe_file
        coredump: fix memleak in dump_vma_snapshot()
        fs/coredump.c: log if a core dump is aborted due to changed file permissions
        nilfs2: use refcount_dec_and_lock() to fix potential UAF
        nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
        nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
        nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
        nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
        nilfs2: fix NULL pointer in nilfs_##name##_attr_release
        nilfs2: fix memory leak in nilfs_sysfs_create_device_group
        trap: cleanup trap_init()
        init: move usermodehelper_enable() to populate_rootfs()
        ...
      2d338201
    • L
      Merge tag 'mm-slub-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux · cc09ee80
      Linus Torvalds committed
      Pull SLUB updates from Vlastimil Babka:
       "SLUB: reduce irq disabled scope and make it RT compatible
      
        This series was initially inspired by Mel's pcplist local_lock
        rewrite, and also by an interest in better understanding SLUB's
        locking, the new primitives, their RT variants and the implications.
        It makes SLUB compatible with PREEMPT_RT and generally more
        preemption-friendly, apparently without significant regressions, as
        the fast paths are not affected.
      
        The main changes to SLUB by this series:
      
         - irq disabling is now only done for minimum amount of time needed to
           protect the strict kmem_cache_cpu fields, and as part of spin lock,
           local lock and bit lock operations to make them irq-safe
      
         - SLUB is fully PREEMPT_RT compatible
      
        The series should now be sufficiently tested in both RT and !RT
        configs, mainly thanks to Mike.
      
        The RFC/v1 version also got basic performance screening by Mel that
        didn't show major regressions. Mike's testing with hackbench of v2 on
        !RT reported negligible differences [6]:
      
          virgin(ish) tip
          5.13.0.g60ab3ed-tip
                    7,320.67 msec task-clock                #    7.792 CPUs utilized            ( +-  0.31% )
                     221,215      context-switches          #    0.030 M/sec                    ( +-  3.97% )
                      16,234      cpu-migrations            #    0.002 M/sec                    ( +-  4.07% )
                      13,233      page-faults               #    0.002 M/sec                    ( +-  0.91% )
              27,592,205,252      cycles                    #    3.769 GHz                      ( +-  0.32% )
               8,309,495,040      instructions              #    0.30  insn per cycle           ( +-  0.37% )
               1,555,210,607      branches                  #  212.441 M/sec                    ( +-  0.42% )
                   5,484,209      branch-misses             #    0.35% of all branches          ( +-  2.13% )
      
                     0.93949 +- 0.00423 seconds time elapsed  ( +-  0.45% )
                     0.94608 +- 0.00384 seconds time elapsed  ( +-  0.41% ) (repeat)
                     0.94422 +- 0.00410 seconds time elapsed  ( +-  0.43% )
      
          5.13.0.g60ab3ed-tip +slub-local-lock-v2r3
                    7,343.57 msec task-clock                #    7.776 CPUs utilized            ( +-  0.44% )
                     223,044      context-switches          #    0.030 M/sec                    ( +-  3.02% )
                      16,057      cpu-migrations            #    0.002 M/sec                    ( +-  4.03% )
                      13,164      page-faults               #    0.002 M/sec                    ( +-  0.97% )
              27,684,906,017      cycles                    #    3.770 GHz                      ( +-  0.45% )
               8,323,273,871      instructions              #    0.30  insn per cycle           ( +-  0.28% )
               1,556,106,680      branches                  #  211.901 M/sec                    ( +-  0.31% )
                   5,463,468      branch-misses             #    0.35% of all branches          ( +-  1.33% )
      
                     0.94440 +- 0.00352 seconds time elapsed  ( +-  0.37% )
                     0.94830 +- 0.00228 seconds time elapsed  ( +-  0.24% ) (repeat)
                     0.93813 +- 0.00440 seconds time elapsed  ( +-  0.47% ) (repeat)
      
        RT configs showed some throughput regressions, but that's an expected
        tradeoff for the preemption improvements through the RT mutex. It
        didn't prevent v2 from being incorporated into the 5.13 RT tree [7],
        leading to testing exposure and bugfixes.
      
        Before the series, SLUB is lockless in both allocation and free fast
        paths, but elsewhere, it's disabling irqs for considerable periods of
        time - especially in allocation slowpath and the bulk allocation,
        where IRQs are re-enabled only when a new page from the page allocator
        is needed, and the context allows blocking. The irq disabled sections
        can then include deactivate_slab() which walks a full freelist and
        frees the slab back to page allocator or unfreeze_partials() going
        through a list of percpu partial slabs. The RT tree currently has some
        patches mitigating these, but we can do much better in mainline too.
      
        Patches 1-6 are straightforward improvements or cleanups that could
        exist outside of this series too, but are prerequisites.
      
        Patches 7-9 are also preparatory code changes without functional
        changes, but not so useful without the rest of the series.
      
        Patch 10 simplifies the fast paths on systems with preemption, based
        on (hopefully correct) observation that the current loops to verify
        tid are unnecessary.
      
        Patches 11-20 focus on reducing irq disabled scope in the allocation
        slowpath:
      
         - patch 11 moves disabling of irqs into ___slab_alloc() from its
           callers, which are the allocation slowpath, and bulk allocation.
           Instead these callers only disable preemption to stabilize the cpu.
      
         - The following patches then gradually reduce the scope of disabled
           irqs in ___slab_alloc() and the functions called from there. As of
           patch 14, the re-enabling of irqs based on gfp flags before calling
           the page allocator is removed from allocate_slab(). As of patch 17,
           it's possible to reach the page allocator (in case of existing
           slabs depleted) without disabling and re-enabling irqs a single
           time.
      
        Patches 21-26 reduce the scope of disabled irqs in functions related
        to unfreezing percpu partial slab.
      
        Patch 27 is preparatory. Patch 28 is adopted from the RT tree and
        converts the flushing of percpu slabs on all cpus from using IPI to
        workqueue, so that the processing isn't happening with irqs disabled
        in the IPI handler. The flushing is not performance critical so it
        should be acceptable.
      
        Patch 29 also comes from RT tree and makes object_map_lock RT
        compatible.
      
        Patch 30 makes slab_lock irq-safe on RT where we cannot rely on having
        irq disabled from the list_lock spin lock usage.
      
        Patch 31 changes kmem_cache_cpu->partial handling in put_cpu_partial()
        from cmpxchg loop to a short irq disabled section, which is used by
        all other code modifying the field. This addresses a theoretical race
        scenario pointed out by Jann, and makes the critical section safe with
        respect to RT local_lock semantics after the conversion in patch 33.
      
        Patch 32 changes preempt disable to migrate disable, so that the
        nested list_lock spinlock is safe to take on RT. Because
        migrate_disable() is a function call even on !RT, a small set of
        private wrappers is introduced to keep using the cheaper
        preempt_disable() on !PREEMPT_RT configurations. As of this patch,
        SLUB should be already compatible with RT's lock semantics.
      
        Finally, patch 33 replaces the irq disabled sections that protect
        kmem_cache_cpu fields in the slow paths with a local lock. However on
        PREEMPT_RT it means the lockless fast paths can now preempt slow paths
        which don't expect that, so the local lock has to be taken also in the
        fast paths and they are no longer lockless. RT folks seem to not mind
        this tradeoff. The patch also updates the locking documentation in the
        file's comment"
      
      Mike Galbraith and Mel Gorman verified that their earlier testing
      observations still hold for the final series:
      
      Link: https://lore.kernel.org/lkml/89ba4f783114520c167cc915ba949ad2c04d6790.camel@gmx.de/
      Link: https://lore.kernel.org/lkml/20210907082010.GB3959@techsingularity.net/
      
      * tag 'mm-slub-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux: (33 commits)
        mm, slub: convert kmem_cpu_slab protection to local_lock
        mm, slub: use migrate_disable() on PREEMPT_RT
        mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
        mm, slub: make slab_lock() disable irqs with PREEMPT_RT
        mm: slub: make object_map_lock a raw_spinlock_t
        mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
        mm, slab: split out the cpu offline variant of flush_slab()
        mm, slub: don't disable irqs in slub_cpu_dead()
        mm, slub: only disable irq with spin_lock in __unfreeze_partials()
        mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing
        mm, slub: detach whole partial list at once in unfreeze_partials()
        mm, slub: discard slabs in unfreeze_partials() without irqs disabled
        mm, slub: move irq control into unfreeze_partials()
        mm, slub: call deactivate_slab() without disabling irqs
        mm, slub: make locking in deactivate_slab() irq-safe
        mm, slub: move reset of c->page and freelist out of deactivate_slab()
        mm, slub: stop disabling irqs around get_partial()
        mm, slub: check new pages with restored irqs
        mm, slub: validate slab from partial list or page allocator before making it cpu slab
        mm, slub: restore irqs around calling new_slab()
        ...
      cc09ee80
    • R
      scripts: check_extable: fix typo in user error message · b285437d
      Randy Dunlap committed
      Fix typo ("and" should be "an") in an error message.
      
      Link: https://lkml.kernel.org/r/20210727002943.29774-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b285437d
    • R
      mm/workingset: correct kernel-doc notations · 560a8705
      Randy Dunlap committed
      Use the documented kernel-doc format to prevent kernel-doc warnings.
      
      mm/workingset.c:256: warning: No description found for return value of 'workingset_eviction'
      mm/workingset.c:285: warning: Function parameter or member 'folio' not described in 'workingset_refault'
      mm/workingset.c:285: warning: Excess function parameter 'page' description in 'workingset_refault'
      
      Link: https://lkml.kernel.org/r/20210808203153.10678-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      560a8705
    • R
      ipc: replace costly bailout check in sysvipc_find_ipc() · 20401d10
      Rafael Aquini committed
      sysvipc_find_ipc() was left with a costly way to check if the offset
      position fed to it is bigger than the total number of IPC IDs in use.  So
      much so that the time it takes to iterate over /proc/sysvipc/* files
      roughly quadruples each time the segment count doubles, for a custom
      benchmark that creates "N" SYSV shm segments and then times the read of
      /proc/sysvipc/shm (milliseconds):
      
          12 msecs to read   1024 segs from /proc/sysvipc/shm
          18 msecs to read   2048 segs from /proc/sysvipc/shm
          65 msecs to read   4096 segs from /proc/sysvipc/shm
         325 msecs to read   8192 segs from /proc/sysvipc/shm
        1303 msecs to read  16384 segs from /proc/sysvipc/shm
        5182 msecs to read  32768 segs from /proc/sysvipc/shm
      
      The root problem lies with the loop that computes the total number of ids
      in use to check if the "pos" fed to sysvipc_find_ipc() grew bigger than
      "ids->in_use".  That is a quite inefficient way to get to the maximum
      index in the id lookup table, especially when that value is already
      provided by struct ipc_ids.max_idx.
      
      This patch follows up on the optimization introduced via commit
      15df03c8 ("sysvipc: make get_maxid O(1) again") and gets rid of the
      aforementioned costly loop, replacing it with a simpler check based on
      the value returned by ipc_get_maxidx(), which allows for a smooth linear
      increase in time for the same custom benchmark:
      
           2 msecs to read   1024 segs from /proc/sysvipc/shm
           2 msecs to read   2048 segs from /proc/sysvipc/shm
           4 msecs to read   4096 segs from /proc/sysvipc/shm
           9 msecs to read   8192 segs from /proc/sysvipc/shm
          19 msecs to read  16384 segs from /proc/sysvipc/shm
          39 msecs to read  32768 segs from /proc/sysvipc/shm
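
The difference between the two bailout checks can be sketched on a plain array. This is a simplified userspace model, not the kernel code: struct id_table, find_counting() and find_bounded() are hypothetical names, and an int array stands in for the id lookup table. The first function recounts the ids in use below "pos" on every call, which costs O(pos) per call and therefore quadratic time over a full iteration; the second only compares "pos" against a cached maximum index.

```c
/* Hypothetical model of an id table; used[i] != 0 means index i is taken. */
struct id_table {
    const int *used;
    int size;     /* number of slots in the table */
    int in_use;   /* how many slots are taken */
    int max_idx;  /* highest taken index, -1 when the table is empty */
};

/*
 * Counting variant: walk all ids below pos to learn whether every id in
 * use has already been passed.  Returns the next taken index at or after
 * pos, or -1 when there is none.
 */
static int find_counting(const struct id_table *t, int pos)
{
    int total = 0;
    int id;

    for (id = 0; id < pos && total < t->in_use; id++)
        if (t->used[id])
            total++;

    if (total >= t->in_use)
        return -1;

    for (; pos < t->size; pos++)
        if (t->used[pos])
            return pos;
    return -1;
}

/*
 * Bounded variant: the cached maximum index answers the same question
 * with a single comparison.
 */
static int find_bounded(const struct id_table *t, int pos)
{
    if (t->max_idx == -1 || pos > t->max_idx)
        return -1;

    for (; pos <= t->max_idx; pos++)
        if (t->used[pos])
            return pos;
    return -1;
}
```

Both variants return the same index for every position, so only the cost of the bailout check changes.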
      
      Link: https://lkml.kernel.org/r/20210809203554.1562989-1-aquini@redhat.com
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Waiman Long <llong@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      20401d10
    • G
      selftests/memfd: remove unused variable · d42990f4
      Greg Thelen committed
      Commit 54402986 ("selftests/memfd: add tests for F_SEAL_FUTURE_WRITE
      seal") added an unused variable to mfd_assert_reopen_fd().
      
      Delete the unused variable.
      
      Link: https://lkml.kernel.org/r/20210702045509.1517643-1-gthelen@google.com
      Fixes: 54402986 ("selftests/memfd: add tests for F_SEAL_FUTURE_WRITE seal")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d42990f4
    • L
      Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH · 6fe26259
      Lukas Bulwahn committed
      Commit 05a4a952 ("kernel/watchdog: split up config options") adds a
      new config HARDLOCKUP_DETECTOR, which selects the non-existing config
      HARDLOCKUP_DETECTOR_ARCH.
      
      Hence, ./scripts/checkkconfigsymbols.py warns:
      
      HARDLOCKUP_DETECTOR_ARCH Referencing files: lib/Kconfig.debug
      
      Simply drop selecting the non-existing HARDLOCKUP_DETECTOR_ARCH.
      
      Link: https://lkml.kernel.org/r/20210806115618.22088-1-lukas.bulwahn@gmail.com
      Fixes: 05a4a952 ("kernel/watchdog: split up config options")
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Babu Moger <babu.moger@oracle.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6fe26259