1. 14 Sep, 2021 1 commit
  2. 25 Aug, 2021 1 commit
    • W
      Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID" · 3eb9cdff
      Will Deacon committed
      This partially reverts commit 16c9afc7.
      
      Alex Bee reports a regression in 5.14 on their RK3328 SoC when
      configuring the PL330 DMA controller:
      
       | ------------[ cut here ]------------
       | WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
       | Modules linked in: spi_rockchip(+) fuse
       | CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
       | Hardware name: Pine64 Rock64 (DT)
       | pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
       | pc : dma_map_resource+0x68/0xc0
       | lr : pl330_prep_slave_fifo+0x78/0xd0
      
      This appears to be because dma_map_resource() is being called for a
      physical address which does not correspond to a memory address yet does
      have a valid 'struct page' due to the way in which the vmemmap is
      constructed.
      
      Prior to 16c9afc7 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64
      implementation of pfn_valid() called memblock_is_memory() to return
      'false' for such regions and the DMA mapping request would proceed.
      However, now that we are using the generic implementation where only the
      presence of the memory map entry is considered, we return 'true' and
      erroneously fail with DMA_MAPPING_ERROR because we identify the region
      as DRAM.
      
      Although fixing this in the DMA mapping code is arguably the right fix,
      it is a risky, cross-architecture change at this stage in the cycle. So
      just revert arm64 back to its old pfn_valid() implementation for v5.14.
      The change to the generic pfn_valid() code is preserved from the original
      patch, so as to avoid impacting other architectures.
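
The difference between the two pfn_valid() behaviours described above can be illustrated with a small toy model. This is only a sketch: `struct toy_pfn` and all `toy_*` names are hypothetical and are not kernel APIs. It assumes, as the message describes, that dma_map_resource() refuses any physical address whose PFN is reported valid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one entry per page frame number (PFN). */
struct toy_pfn {
    bool has_memmap; /* a 'struct page' exists for this PFN */
    bool is_memory;  /* the PFN lies in a memblock "memory" region (DRAM) */
};

/* Generic-style check: only the presence of the memory map entry counts. */
bool toy_pfn_valid_generic(const struct toy_pfn *map, size_t n, size_t pfn)
{
    return pfn < n && map[pfn].has_memmap;
}

/* arm64-style check restored by the revert: the PFN must also be memory. */
bool toy_pfn_valid_arm64(const struct toy_pfn *map, size_t n, size_t pfn)
{
    return pfn < n && map[pfn].has_memmap && map[pfn].is_memory;
}

/* Models the guard in dma_map_resource(): anything that looks like RAM
 * is rejected (the real code returns DMA_MAPPING_ERROR). */
bool toy_dma_map_resource_ok(bool pfn_is_valid)
{
    return !pfn_is_valid;
}
```

For a device register page that happens to be covered by the vmemmap (has_memmap set, is_memory clear), the generic check reports the PFN as valid and the mapping is refused, while the arm64 check reports it as invalid and the mapping proceeds.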
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reported-by: Alex Bee <knaerzche@gmail.com>
      Link: https://lore.kernel.org/r/d3a3c828-b777-faf8-e901-904995688437@gmail.com
      Signed-off-by: Will Deacon <will@kernel.org>
  3. 16 Aug, 2021 1 commit
  4. 03 Aug, 2021 1 commit
  5. 30 Jul, 2021 1 commit
    • A
      asm-generic: reverse GENERIC_{STRNCPY_FROM,STRNLEN}_USER symbols · e6226997
      Arnd Bergmann committed
      Most architectures do not need a custom implementation, and in most
      cases the generic implementation is preferred, so change the polarity
      on these Kconfig symbols to require architectures to select them when
      they provide their own version.
      
      The new name is CONFIG_ARCH_HAS_{STRNCPY_FROM,STRNLEN}_USER.
      
      The remaining architectures at the moment are: ia64, mips, parisc,
      um and xtensa. We should probably convert these as well, but
      I was not sure how far to take this series. Thomas Bogendoerfer
      had some concerns about converting mips but may still do some
      more detailed measurements to see which version is better.
      
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: linux-xtensa@linux-xtensa.org
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Helge Deller <deller@gmx.de> # parisc
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  6. 13 Jul, 2021 1 commit
  7. 01 Jul, 2021 4 commits
  8. 30 Jun, 2021 1 commit
  9. 23 Jun, 2021 1 commit
  10. 15 Jun, 2021 2 commits
  11. 26 May, 2021 2 commits
  12. 06 May, 2021 7 commits
    • O
      arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE · ca6e51d5
      Oscar Salvador committed
      Enable arm64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
      
      Link: https://lkml.kernel.org/r/20210421102701.25051-9-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK · 66f24fa7
      Anshuman Khandual committed
      ARCH_ENABLE_SPLIT_PMD_PTLOCK has duplicate definitions on platforms
      that subscribe to it.  Drop these redundant definitions and instead just
      select it on applicable platforms.
      
      Link: https://lkml.kernel.org/r/1617259448-22529-6-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Heiko Carstens <hca@linux.ibm.com>		[s390]
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      mm: drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION · 1e866974
      Anshuman Khandual committed
      ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION configs have duplicate definitions on
      platforms that subscribe to them.  Drop these redundant definitions and
      instead just select them appropriately.
      
      [akpm@linux-foundation.org: s/x86_64/X86_64/, per Oscar]
      
      Link: https://lkml.kernel.org/r/1617259448-22529-5-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      mm: generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] · 91024b3c
      Anshuman Khandual committed
      ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] configs have duplicate
      definitions on platforms that subscribe to them.  Instead, just make them
      generic options which can be selected on applicable platforms.
      
      Link: https://lkml.kernel.org/r/1617259448-22529-4-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Heiko Carstens <hca@linux.ibm.com>		[s390]
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS) · 855f9a8e
      Anshuman Khandual committed
      SYS_SUPPORTS_HUGETLBFS config has duplicate definitions on platforms
      that subscribe to it.  Instead, just make it a generic option which can be
      selected on applicable platforms.
      
      Also rename it as ARCH_SUPPORTS_HUGETLBFS instead.  This reduces code
      duplication and makes it cleaner.
      
      Link: https://lkml.kernel.org/r/1617259448-22529-3-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>	[riscv]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      mm: generalize ARCH_HAS_CACHE_LINE_SIZE · c2280be8
      Anshuman Khandual committed
      Patch series "mm: some config cleanups", v2.
      
      This series contains config cleanup patches which reduce code
      duplication across platforms and also improve maintainability.  There
      is no functional change intended with this series.
      
      This patch (of 6):
      
      ARCH_HAS_CACHE_LINE_SIZE config has duplicate definitions on platforms
      that subscribe to it.  Instead, just make it a generic option which can be
      selected on applicable platforms.  This change reduces code duplication
      and makes it cleaner.
      
      Link: https://lkml.kernel.org/r/1617259448-22529-1-git-send-email-anshuman.khandual@arm.com
      Link: https://lkml.kernel.org/r/1617259448-22529-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arc]
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      userfaultfd: add minor fault registration mode · 7677f7fd
      Axel Rasmussen committed
      Patch series "userfaultfd: add minor fault handling", v9.
      
      Overview
      ========
      
      This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
      When enabled (via the UFFDIO_API ioctl), this feature means that any
      hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
      get events for "minor" faults.  By "minor" fault, I mean the following
      situation:
      
      Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
      memory).  One of the mappings is registered with userfaultfd (in minor
      mode), and the other is not.  Via the non-UFFD mapping, the underlying
      pages have already been allocated & filled with some contents.  The UFFD
      mapping has not yet been faulted in; when it is touched for the first
      time, this results in what I'm calling a "minor" fault.  As a concrete
      example, when working with hugetlbfs, we have huge_pte_none(), but
      find_lock_page() finds an existing page.
      
      We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE.  The idea
      is, userspace resolves the fault by either a) doing nothing if the
      contents are already correct, or b) updating the underlying contents using
      the second, non-UFFD mapping (via memcpy/memset or similar, or something
      fancier like RDMA, etc.).  In either case, userspace issues
      UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
      correct, carry on setting up the mapping".
      
      Use Case
      ========
      
      Consider the use case of VM live migration (e.g. under QEMU/KVM):
      
      1. While a VM is still running, we copy the contents of its memory to a
         target machine. The pages are populated on the target by writing to the
         non-UFFD mapping, using the setup described above. The VM is still running
         (and therefore its memory is likely changing), so this may be repeated
         several times, until we decide the target is "up to date enough".
      
      2. We pause the VM on the source, and start executing on the target machine.
         During this gap, the VM's user(s) will *see* a pause, so it is desirable to
         minimize this window.
      
      3. Between the last time any page was copied from the source to the target, and
         when the VM was paused, the contents of that page may have changed - and
         therefore the copy we have on the target machine is out of date. Although we
         can keep track of which pages are out of date, for VMs with large amounts of
         memory, it is "slow" to transfer this information to the target machine. We
         want to resume execution before such a transfer would complete.
      
      4. So, the guest begins executing on the target machine. The first time it
         touches its memory (via the UFFD-registered mapping), userspace wants to
         intercept this fault. Userspace checks whether or not the page is up to date,
         and if not, copies the updated page from the source machine, via the non-UFFD
         mapping. Finally, whether a copy was performed or not, userspace issues a
         UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
         are correct, carry on setting up the mapping".
      
      We don't have to do all of the final updates on-demand. The userfaultfd manager
      can, in the background, also copy over updated pages once it receives the map of
      which pages are up-to-date or not.
      
      Interaction with Existing APIs
      ==============================
      
      Because this is a feature, a registered VMA could potentially receive both
      missing and minor faults.  I spent some time thinking through how the
      existing API interacts with the new feature:
      
      UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
      allocate a new page.  If UFFDIO_CONTINUE is used on a non-minor fault:
      
      - For non-shared memory or shmem, -EINVAL is returned.
      - For hugetlb, -EFAULT is returned.
      
      UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
      Without modifications, the existing codepath assumes a new page needs to
      be allocated.  This is okay, since userspace must have a second
      non-UFFD-registered mapping anyway, thus there isn't much reason to want
      to use these in any case (just memcpy or memset or similar).
      
      - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
      - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
        in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
      - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
        -ENOENT in that case (regardless of the kind of fault).
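
The return codes listed above can be collected into a small lookup function. This is a sketch only: the `toy_*` enums and the function are hypothetical and merely restate the table from this message. They are not the kernel's implementation, and UFFDIO_WRITEPROTECT is left out for brevity.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ioctls discussed in the text. */
enum toy_uffd_op { TOY_UFFDIO_CONTINUE, TOY_UFFDIO_COPY, TOY_UFFDIO_ZEROPAGE };

/* Kind of memory backing the registered VMA. */
enum toy_backing { TOY_PRIVATE_OR_SHMEM, TOY_HUGETLB };

/* Returns 0 when the ioctl can resolve the fault, or the negative errno
 * that the commit message says is returned otherwise. */
int toy_uffd_ioctl_result(enum toy_uffd_op op, enum toy_backing mem,
                          bool minor_fault)
{
    switch (op) {
    case TOY_UFFDIO_CONTINUE:
        /* CONTINUE never allocates a page, so it only fits minor faults. */
        if (minor_fault)
            return 0;
        return mem == TOY_HUGETLB ? -EFAULT : -EINVAL;
    case TOY_UFFDIO_COPY:
        /* COPY assumes a new page must be allocated. */
        return minor_fault ? -EEXIST : 0;
    case TOY_UFFDIO_ZEROPAGE:
        /* ZEROPAGE is unsupported on hugetlb in any case. */
        if (mem == TOY_HUGETLB)
            return -EINVAL;
        return minor_fault ? -EEXIST : 0;
    }
    return -EINVAL;
}
```

Note that this series only generates minor faults for hugetlbfs, so the minor-fault rows for other memory types describe what the text states rather than a path that can be reached yet.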
      
      Future Work
      ===========
      
      This series only supports hugetlbfs.  I have a second series in flight to
      support shmem as well, extending the functionality.  This series is more
      mature than the shmem support at this point, and the functionality works
      fully on hugetlbfs, so this series can be merged first and then shmem
      support will follow.
      
      This patch (of 6):
      
      This feature allows userspace to intercept "minor" faults.  By "minor"
      faults, I mean the following situation:
      
      Let there exist two mappings (i.e., VMAs) to the same page(s).  One of the
      mappings is registered with userfaultfd (in minor mode), and the other is
      not.  Via the non-UFFD mapping, the underlying pages have already been
      allocated & filled with some contents.  The UFFD mapping has not yet been
      faulted in; when it is touched for the first time, this results in what
      I'm calling a "minor" fault.  As a concrete example, when working with
      hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
      page.
      
      This commit adds the new registration mode, and sets the relevant flag on
      the VMAs being registered.  In the hugetlb fault path, if we find that we
      have huge_pte_none(), but find_lock_page() does indeed find an existing
      page, then we have a "minor" fault, and if the VMA has the userfaultfd
      registration flag, we call into userfaultfd to handle it.
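
The decision made in the hugetlb fault path, as described in the paragraph above, can be sketched as a pure function. All names here are hypothetical. The two "page table entry is empty" and "page cache lookup succeeded" inputs stand in for huge_pte_none() and find_lock_page(), which the sketch does not call.

```c
#include <stdbool.h>

/* Hypothetical classification of a fault on a userfaultfd-aware VMA. */
enum toy_fault {
    TOY_FAULT_NOT_UFFD, /* handled by the normal fault path */
    TOY_FAULT_MISSING,  /* no page exists yet */
    TOY_FAULT_MINOR     /* page exists, but this mapping has no PTE for it */
};

enum toy_fault toy_classify_fault(bool pte_none, bool page_in_cache,
                                  bool vma_missing_mode, bool vma_minor_mode)
{
    /* A populated PTE means there is nothing for userfaultfd to resolve. */
    if (!pte_none)
        return TOY_FAULT_NOT_UFFD;

    /* Empty PTE and an existing page: the "minor" case from the text. */
    if (page_in_cache)
        return vma_minor_mode ? TOY_FAULT_MINOR : TOY_FAULT_NOT_UFFD;

    /* Empty PTE and no page at all: the pre-existing "missing" case. */
    return vma_missing_mode ? TOY_FAULT_MISSING : TOY_FAULT_NOT_UFFD;
}
```

Only the combination of an empty PTE, an existing page, and a VMA registered in minor mode is routed to userfaultfd as a minor fault.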
      
      This is implemented as a new registration mode, instead of an API feature.
      This is because the alternative implementation has significant drawbacks
      [1].
      
      However, doing it this way requires that we allocate a VM_* flag for the new
      registration mode.  On 32-bit systems, there are no unused bits, so this
      feature is only supported on architectures with
      CONFIG_ARCH_USES_HIGH_VMA_FLAGS.  When attempting to register a VMA in
      MINOR mode on 32-bit architectures, we return -EINVAL.
      
      [1] https://lore.kernel.org/patchwork/patch/1380226/
      
      [peterx@redhat.com: fix minor fault page leak]
        Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com
      
      Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
      Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michal Koutný" <mkoutny@suse.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shawn Anastasio <shawn@anastas.io>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Adam Ruprecht <ruprecht@google.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Cannon Matthews <cannonmatthews@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 01 May, 2021 1 commit
  14. 25 Apr, 2021 1 commit
  15. 23 Apr, 2021 2 commits
  16. 12 Apr, 2021 1 commit
  17. 09 Apr, 2021 2 commits
  18. 08 Apr, 2021 1 commit
  19. 29 Mar, 2021 2 commits
  20. 26 Mar, 2021 1 commit
  21. 25 Mar, 2021 2 commits
  22. 18 Mar, 2021 1 commit
  23. 16 Mar, 2021 1 commit
    • Y
      ARM64: enable GENERIC_FIND_FIRST_BIT · 98c5ec77
      Yury Norov committed
      ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't
      enable the generic version in its config. This leads to find_next_bit()
      being used instead, which is less efficient:
      
      0000000000000000 <find_first_bit>:
         0:	aa0003e4 	mov	x4, x0
         4:	aa0103e0 	mov	x0, x1
         8:	b4000181 	cbz	x1, 38 <find_first_bit+0x38>
         c:	f9400083 	ldr	x3, [x4]
        10:	d2800802 	mov	x2, #0x40                  	// #64
        14:	91002084 	add	x4, x4, #0x8
        18:	b40000c3 	cbz	x3, 30 <find_first_bit+0x30>
        1c:	14000008 	b	3c <find_first_bit+0x3c>
        20:	f8408483 	ldr	x3, [x4], #8
        24:	91010045 	add	x5, x2, #0x40
        28:	b50000c3 	cbnz	x3, 40 <find_first_bit+0x40>
        2c:	aa0503e2 	mov	x2, x5
        30:	eb02001f 	cmp	x0, x2
        34:	54ffff68 	b.hi	20 <find_first_bit+0x20>  // b.pmore
        38:	d65f03c0 	ret
        3c:	d2800002 	mov	x2, #0x0                   	// #0
        40:	dac00063 	rbit	x3, x3
        44:	dac01063 	clz	x3, x3
        48:	8b020062 	add	x2, x3, x2
        4c:	eb02001f 	cmp	x0, x2
        50:	9a829000 	csel	x0, x0, x2, ls  // ls = plast
        54:	d65f03c0 	ret
      
        ...
      
      0000000000000118 <_find_next_bit.constprop.1>:
       118:	eb02007f 	cmp	x3, x2
       11c:	540002e2 	b.cs	178 <_find_next_bit.constprop.1+0x60>  // b.hs, b.nlast
       120:	d346fc66 	lsr	x6, x3, #6
       124:	f8667805 	ldr	x5, [x0, x6, lsl #3]
       128:	b4000061 	cbz	x1, 134 <_find_next_bit.constprop.1+0x1c>
       12c:	f8667826 	ldr	x6, [x1, x6, lsl #3]
       130:	8a0600a5 	and	x5, x5, x6
       134:	ca0400a6 	eor	x6, x5, x4
       138:	92800005 	mov	x5, #0xffffffffffffffff    	// #-1
       13c:	9ac320a5 	lsl	x5, x5, x3
       140:	927ae463 	and	x3, x3, #0xffffffffffffffc0
       144:	ea0600a5 	ands	x5, x5, x6
       148:	54000120 	b.eq	16c <_find_next_bit.constprop.1+0x54>  // b.none
       14c:	1400000e 	b	184 <_find_next_bit.constprop.1+0x6c>
       150:	d346fc66 	lsr	x6, x3, #6
       154:	f8667805 	ldr	x5, [x0, x6, lsl #3]
       158:	b4000061 	cbz	x1, 164 <_find_next_bit.constprop.1+0x4c>
       15c:	f8667826 	ldr	x6, [x1, x6, lsl #3]
       160:	8a0600a5 	and	x5, x5, x6
       164:	eb05009f 	cmp	x4, x5
       168:	540000c1 	b.ne	180 <_find_next_bit.constprop.1+0x68>  // b.any
       16c:	91010063 	add	x3, x3, #0x40
       170:	eb03005f 	cmp	x2, x3
       174:	54fffee8 	b.hi	150 <_find_next_bit.constprop.1+0x38>  // b.pmore
       178:	aa0203e0 	mov	x0, x2
       17c:	d65f03c0 	ret
       180:	ca050085 	eor	x5, x4, x5
       184:	dac000a5 	rbit	x5, x5
       188:	dac010a5 	clz	x5, x5
       18c:	8b0300a3 	add	x3, x5, x3
       190:	eb03005f 	cmp	x2, x3
       194:	9a839042 	csel	x2, x2, x3, ls  // ls = plast
       198:	aa0203e0 	mov	x0, x2
       19c:	d65f03c0 	ret
      
       ...
      
      0000000000000238 <find_next_bit>:
       238:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
       23c:	aa0203e3 	mov	x3, x2
       240:	d2800004 	mov	x4, #0x0                   	// #0
       244:	aa0103e2 	mov	x2, x1
       248:	910003fd 	mov	x29, sp
       24c:	d2800001 	mov	x1, #0x0                   	// #0
       250:	97ffffb2 	bl	118 <_find_next_bit.constprop.1>
       254:	a8c17bfd 	ldp	x29, x30, [sp], #16
       258:	d65f03c0 	ret
      
      Enabling find_{first,next}_bit() would also benefit for_each_{set,clear}_bit().
      On A-53 find_first_bit() is almost twice as fast as find_next_bit(), according
      to lib/find_bit_benchmark (thanks to Alexey for testing):
      
      GENERIC_FIND_FIRST_BIT=n:
      [7126084.948181] find_first_bit:               47389224 ns,  16357 iterations
      [7126085.032315] find_first_bit:               19048193 ns,    655 iterations
      
      GENERIC_FIND_FIRST_BIT=y:
      [   84.158068] find_first_bit:               27193319 ns,  16406 iterations
      [   84.233005] find_first_bit:               11082437 ns,    656 iterations
      
      GENERIC_FIND_FIRST_BIT=n bloats the kernel even though it disables generation
      of find_{first,next}_bit():
      
              yury:linux$ scripts/bloat-o-meter vmlinux vmlinux.ffb
              add/remove: 4/1 grow/shrink: 19/251 up/down: 564/-1692 (-1128)
              ...
      
      Overall, GENERIC_FIND_FIRST_BIT=n is harmful both in terms of performance and
      code size, and it's better to have GENERIC_FIND_FIRST_BIT enabled.
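
To illustrate why a dedicated first-bit search can be cheaper than going through the next-bit path, here is a minimal sketch of both. It is not the kernel's lib/find_bit.c: the `toy_*` names are hypothetical, and `__builtin_ctzl` is assumed to be available (it is a GCC/Clang builtin).

```c
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Dedicated first-bit search: scan whole words from the start, with no
 * start offset to decode and no masking of the first word. */
unsigned long toy_find_first_bit(const unsigned long *addr, unsigned long size)
{
    for (unsigned long idx = 0; idx * TOY_BITS_PER_LONG < size; idx++) {
        if (addr[idx]) {
            unsigned long bit = idx * TOY_BITS_PER_LONG +
                                (unsigned long)__builtin_ctzl(addr[idx]);
            return bit < size ? bit : size;
        }
    }
    return size; /* no bit set */
}

/* Next-bit search: must split 'start' into word index and bit offset and
 * mask off the bits below 'start' before scanning. */
unsigned long toy_find_next_bit(const unsigned long *addr, unsigned long size,
                                unsigned long start)
{
    if (start >= size)
        return size;

    unsigned long idx = start / TOY_BITS_PER_LONG;
    unsigned long word = addr[idx] & (~0UL << (start % TOY_BITS_PER_LONG));

    while (!word) {
        idx++;
        if (idx * TOY_BITS_PER_LONG >= size)
            return size;
        word = addr[idx];
    }

    unsigned long bit = idx * TOY_BITS_PER_LONG +
                        (unsigned long)__builtin_ctzl(word);
    return bit < size ? bit : size;
}
```

Calling toy_find_next_bit() with a start of 0 gives the same answer as toy_find_first_bit(), but does the extra offset handling that the disassembly above shows for the find_next_bit() path.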
      
      Tested-by: Alexey Klimov <aklimov@redhat.com>
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20210225135700.1381396-2-yury.norov@gmail.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  24. 09 Mar, 2021 1 commit
  25. 08 Mar, 2021 1 commit
    • A
      arm64/mm: Drop THP conditionality from FORCE_MAX_ZONEORDER · 79cc2ed5
      Anshuman Khandual committed
      Currently, without THP enabled, MAX_ORDER (via FORCE_MAX_ZONEORDER) gets
      reduced to 11, which falls below HUGETLB_PAGE_ORDER for certain 16K and 64K
      page size configurations. This is problematic and triggers the following
      warning during boot, because pageblock_order (via HUGETLB_PAGE_ORDER)
      exceeds MAX_ORDER.
      
      WARNING: CPU: 7 PID: 127 at mm/vmstat.c:1092 __fragmentation_index+0x58/0x70
      Modules linked in:
      CPU: 7 PID: 127 Comm: kswapd0 Not tainted 5.12.0-rc1-00005-g0221e3101a1 #237
      Hardware name: linux,dummy-virt (DT)
      pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--)
      pc : __fragmentation_index+0x58/0x70
      lr : fragmentation_index+0x88/0xa8
      sp : ffff800016ccfc00
      x29: ffff800016ccfc00 x28: 0000000000000000
      x27: ffff800011fd4000 x26: 0000000000000002
      x25: ffff800016ccfda0 x24: 0000000000000002
      x23: 0000000000000640 x22: ffff0005ffcb5b18
      x21: 0000000000000002 x20: 000000000000000d
      x19: ffff0005ffcb3980 x18: 0000000000000004
      x17: 0000000000000001 x16: 0000000000000019
      x15: ffff800011ca7fb8 x14: 00000000000002b3
      x13: 0000000000000000 x12: 00000000000005e0
      x11: 0000000000000003 x10: 0000000000000080
      x9 : ffff800011c93948 x8 : 0000000000000000
      x7 : 0000000000000000 x6 : 0000000000007000
      x5 : 0000000000007944 x4 : 0000000000000032
      x3 : 000000000000001c x2 : 000000000000000b
      x1 : ffff800016ccfc10 x0 : 000000000000000d
      Call trace:
      __fragmentation_index+0x58/0x70
      compaction_suitable+0x58/0x78
      wakeup_kcompactd+0x8c/0xd8
      balance_pgdat+0x570/0x5d0
      kswapd+0x1e0/0x388
      kthread+0x154/0x158
      ret_from_fork+0x10/0x30
      
      This solves the problem by keeping FORCE_MAX_ZONEORDER unchanged with or
      without THP on 16K and 64K page size configurations, making sure that the
      HUGETLB_PAGE_ORDER (and pageblock_order) would never exceed MAX_ORDER.
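
The arithmetic behind the warning can be worked through with a short sketch. It rests on assumptions that are not stated in this message: that the huge page in question is PMD-sized, that arm64 computes PMD_SHIFT as (PAGE_SHIFT - 3) * 2 + 3, and that the check that fires is equivalent to `order >= MAX_ORDER`. The `toy_*` names are hypothetical.

```c
#include <stdbool.h>

/* Assumed arm64 layout: each table level resolves PAGE_SHIFT - 3 bits. */
int toy_pmd_shift(int page_shift)
{
    return (page_shift - 3) * 2 + 3;
}

/* Order of a PMD-sized huge page, in units of base pages. */
int toy_hugetlb_page_order(int page_shift)
{
    return toy_pmd_shift(page_shift) - page_shift;
}

/* Models the assumed check: valid orders are 0 .. MAX_ORDER - 1. */
bool toy_order_warns(int order, int max_order)
{
    return order >= max_order;
}
```

Under these assumptions the huge page order is 9 for 4K pages, 11 for 16K pages and 13 for 64K pages. With MAX_ORDER at 11, the 16K and 64K cases both trip the check, and the value 13 matches x0 = 0xd in the register dump above. Keeping MAX_ORDER at the larger values used with THP (assumed here to be 12 and 14 respectively) avoids the warning.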
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Link: https://lore.kernel.org/r/1614597914-28565-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>