1. 25 August 2021, 1 commit
    • Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID" · 3eb9cdff
      Authored by Will Deacon
      This partially reverts commit 16c9afc7.
      
      Alex Bee reports a regression in 5.14 on their RK3328 SoC when
      configuring the PL330 DMA controller:
      
       | ------------[ cut here ]------------
       | WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
       | Modules linked in: spi_rockchip(+) fuse
       | CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
       | Hardware name: Pine64 Rock64 (DT)
       | pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
       | pc : dma_map_resource+0x68/0xc0
       | lr : pl330_prep_slave_fifo+0x78/0xd0
      
      This appears to be because dma_map_resource() is being called for a
      physical address which does not correspond to a memory address yet does
      have a valid 'struct page' due to the way in which the vmemmap is
      constructed.
      
      Prior to 16c9afc7 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64
      implementation of pfn_valid() called memblock_is_memory() to return
      'false' for such regions and the DMA mapping request would proceed.
      However, now that we are using the generic implementation, where only the
      presence of the memory map entry is considered, we return 'true' and the
      mapping request erroneously fails with DMA_MAPPING_ERROR because we
      identify the region as DRAM.
      
      Although fixing this in the DMA mapping code is arguably the right fix,
      it is a risky, cross-architecture change at this stage in the cycle. So
      just revert arm64 back to its old pfn_valid() implementation for v5.14.
      The change to the generic pfn_valid() code is preserved from the original
      patch, so as to avoid impacting other architectures.
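      
      For reference, a condensed sketch of the behaviour this revert restores
      (not the verbatim kernel code; the sparsemem section checks are elided):
      
          static int pfn_valid_sketch(unsigned long pfn)
          {
                  phys_addr_t addr = PFN_PHYS(pfn);
      
                  /* Reject pfns whose upper bits fall outside the PA range. */
                  if (PHYS_PFN(addr) != pfn)
                          return 0;
      
                  /* ... sparsemem section validity checks elided ... */
      
                  /*
                   * Unlike the generic version, fall back to memblock so that
                   * addresses with a struct page but no backing RAM (such as
                   * the MMIO region handed to dma_map_resource() above) are
                   * reported as invalid again.
                   */
                  return memblock_is_memory(addr);
          }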
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reported-by: Alex Bee <knaerzche@gmail.com>
      Link: https://lore.kernel.org/r/d3a3c828-b777-faf8-e901-904995688437@gmail.com
      Signed-off-by: Will Deacon <will@kernel.org>
  2. 01 July 2021, 3 commits
  3. 26 May 2021, 1 commit
  4. 14 May 2021, 1 commit
  5. 01 May 2021, 2 commits
  6. 23 April 2021, 1 commit
  7. 19 March 2021, 1 commit
    • KVM: arm64: Prepare the creation of s1 mappings at EL2 · f320bc74
      Authored by Quentin Perret
      When memory protection is enabled, the EL2 code needs the ability to
      create and manage its own page-table. To do so, introduce a new set of
      hypercalls to bootstrap a memory management system at EL2.
      
      This leads to the following boot flow in nVHE Protected mode:
      
       1. the host allocates memory for the hypervisor very early on, using
          the memblock API;
      
        2. the host creates a set of stage 1 page-tables for EL2, installs the
           EL2 vectors, and issues the __pkvm_init hypercall;
      
       3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table
          and stores it in the memory pool provided by the host;
      
        4. the hypervisor then extends its stage 1 mappings to include a
           vmemmap in the EL2 VA space, hence allowing it to use the buddy
           allocator introduced in a previous patch;
      
        5. the hypervisor jumps back into the idmap page, switches from the
           host-provided page-table to the new one, and wraps up its
           initialization by enabling the new allocator, before returning to
           the host;
      
       6. the host can free the now unused page-table created for EL2, and
          will now need to issue hypercalls to make changes to the EL2 stage 1
          mappings instead of modifying them directly.
      
      Note that for the sake of simplifying the review, this patch focuses on
      the hypervisor side of things. In other words, this only implements the
      new hypercalls, but does not make use of them from the host yet. The
      host-side changes will follow in a subsequent patch.
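      
      As a rough illustration only (the host-side wiring lands in a later
      patch, and the exact parameter list of __pkvm_init shown here is an
      assumption), the handover boils down to a single one-way hypercall:
      
          /* Hand the EL2 memory pool and boot parameters to the hypervisor. */
          int ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
                                      num_possible_cpus(), hyp_va_bits);
      
          /*
           * From this point on EL2 owns its stage 1 page-table; the host frees
           * its temporary copy and must use hypercalls such as
           * __pkvm_create_mappings to change the EL2 mappings.
           */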
      
      Credits to Will for __pkvm_init_switch_pgd.
      Acked-by: Will Deacon <will@kernel.org>
      Co-authored-by: Will Deacon <will@kernel.org>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Quentin Perret <qperret@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20210319100146.1149909-18-qperret@google.com
  8. 09 March 2021, 2 commits
    • arm64/mm: Reorganize pfn_valid() · 093bbe21
      Authored by Anshuman Khandual
      There are multiple instances of pfn_to_section_nr() and __pfn_to_section()
      when CONFIG_SPARSEMEM is enabled. This can be optimized if the memory
      section is fetched earlier. This replaces the open coded PFN and ADDR
      conversions with the PFN_PHYS() and PHYS_PFN() helpers. While there, also
      add a comment. This does not cause any functional change.
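      
      For context, the generic helpers referred to above are simple shift
      wrappers (as defined in include/linux/pfn.h):
      
          #define PFN_PHYS(x)    ((phys_addr_t)(x) << PAGE_SHIFT)
          #define PHYS_PFN(x)    ((unsigned long)((x) >> PAGE_SHIFT))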
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Link: https://lore.kernel.org/r/1614921898-4099-3-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory · eeb0753b
      Authored by Anshuman Khandual
      pfn_valid() validates a pfn, but what it basically checks is that there is
      a valid struct page backing that pfn. It should always return true for
      memory ranges backed by a struct page mapping. But currently pfn_valid()
      fails for all ZONE_DEVICE based memory types even though they have a
      struct page mapping.
      
      pfn_valid() asserts that there is a memblock entry for a given pfn without
      MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
      that they do not have memblock entries. Hence memblock_is_map_memory() will
      invariably fail via memblock_search() for a ZONE_DEVICE based address. This
      eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
      to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
      into the system via memremap_pages() called from a driver, their respective
      memory sections will not have SECTION_IS_EARLY set.
      
      Normal hotplug memory will never have MEMBLOCK_NOMAP set in its memblock
      regions, because the MEMBLOCK_NOMAP flag was specifically designed for,
      and is only set on, firmware reserved memory regions.
      memblock_is_map_memory() can therefore be skipped, as it is always going
      to return true, which is also an optimization for normal hotplug memory.
      Like ZONE_DEVICE based memory, normal hotplugged memory will not have
      SECTION_IS_EARLY set for its sections.
      
      Skipping memblock_is_map_memory() for all non early memory sections would
      fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
      performance for normal hotplug memory as well.
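      
      A simplified sketch of the resulting check (not the exact hunk), with the
      memblock lookup reserved for boot-time sections only:
      
          ms = __pfn_to_section(pfn);
          if (!valid_section(ms))
                  return 0;
      
          /* Hotplugged and ZONE_DEVICE memory: no (useful) memblock entry. */
          if (!early_section(ms))
                  return pfn_section_valid(ms, pfn);
      
          /* Boot memory: honour MEMBLOCK_NOMAP via the memblock lookup. */
          return memblock_is_map_memory(addr);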
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Acked-by: David Hildenbrand <david@redhat.com>
      Fixes: 73b20c84 ("arm64: mm: implement pte_devmap support")
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Link: https://lore.kernel.org/r/1614921898-4099-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
  9. 15 January 2021, 1 commit
  10. 13 January 2021, 1 commit
    • arm64: Remove arm64_dma32_phys_limit and its uses · d78050ee
      Authored by Catalin Marinas
      With the introduction of a dynamic ZONE_DMA range based on DT or IORT
      information, there's no need for CMA allocations from the wider
      ZONE_DMA32 since on most platforms ZONE_DMA will cover the 32-bit
      addressable range. Remove the arm64_dma32_phys_limit and set
      arm64_dma_phys_limit to cover the smallest DMA range required on the
      platform. CMA allocation and crashkernel reservation now go in the
      dynamically sized ZONE_DMA, allowing correct functionality on RPi4.
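      
      A minimal sketch of the two consumers after the change, assuming
      arm64_dma_phys_limit now reflects the smallest required DMA range (not the
      verbatim kernel hunks):
      
          /* CMA comes from the dynamically sized ZONE_DMA ... */
          dma_contiguous_reserve(arm64_dma_phys_limit);
      
          /* ... and so does the crashkernel reservation. */
          crash_base = memblock_find_in_range(0, arm64_dma_phys_limit,
                                              crash_size, SZ_2M);
      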
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
      Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> # On RPi4B
  11. 04 January 2021, 1 commit
  12. 16 December 2020, 2 commits
    • arm, arm64: move free_unused_memmap() to generic mm · 4f5b0c17
      Authored by Mike Rapoport
      ARM and ARM64 free unused parts of the memory map just before the
      initialization of the page allocator. To allow holes in the memory map both
      architectures overload pfn_valid() and define HAVE_ARCH_PFN_VALID.
      
      Allowing holes in the memory map for FLATMEM may be useful for small
      machines, such as ARC and m68k and will enable those architectures to cease
      using DISCONTIGMEM and still support more than one memory bank.
      
      Move the functions that free unused memory map to generic mm and enable
      them in case HAVE_ARCH_PFN_VALID=y.
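      
      A sketch of how the generic version is expected to be gated (simplified,
      not the verbatim code):
      
          static void __init free_unused_memmap(void)
          {
                  /* Only useful when the arch can skip memmap holes ... */
                  if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
                      IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
                          return;
      
                  /* ... walk memblock.memory and free struct pages in the gaps. */
          }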
      
      Link: https://lkml.kernel.org/r/20201101170454.9567-10-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Meelis Roos <mroos@linux.ee>
      Cc: Michael Schmitz <schmitzmic@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arm64: Warn the user when a small VA_BITS value wastes memory · 31f80a4e
      Authored by Marc Zyngier
      The memblock code ignores any memory that doesn't fit in the
      linear mapping. In order to preserve the distance between two physical
      memory locations and their mappings in the linear map, any hole between
      two memory regions occupies the same space in the linear map.
      
      On most systems, this is hardly a problem (the memory banks are close
      together, and VA_BITS represents a large space compared to the available
      memory *and* the potential gaps).
      
      On NUMA systems, things are quite different: the gaps between the
      memory nodes can be pretty large compared to the memory size itself,
      and the range from memblock_start_of_DRAM() to memblock_end_of_DRAM()
      can exceed the space described by VA_BITS.
      
      Unfortunately, we're not very good at making this obvious to the user,
      and on a D05 system (two sockets and 4 nodes with 64GB each)
      accidentally configured with 39bit VA, we display something like this:
      
      [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe100-0x1ffbffffff]
      [    0.000000] NUMA: NODE_DATA [mem 0x2febfc1100-0x2febfc2fff]
      [    0.000000] NUMA: Initmem setup node 2 [<memory-less node>]
      [    0.000000] NUMA: NODE_DATA [mem 0x2febfbf200-0x2febfc10ff]
      [    0.000000] NUMA: NODE_DATA(2) on node 1
      [    0.000000] NUMA: Initmem setup node 3 [<memory-less node>]
      [    0.000000] NUMA: NODE_DATA [mem 0x2febfbd300-0x2febfbf1ff]
      [    0.000000] NUMA: NODE_DATA(3) on node 1
      
      which isn't very explicit, and doesn't tell the user why 128GB
      have suddenly disappeared.
      
      Let's add a warning message telling the user that memory has been
      truncated, and offer a potential solution (bumping VA_BITS up).
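      
      Conceptually, the change amounts to checking for the truncation before
      clipping the memory map (a sketch; the exact message wording is an
      assumption):
      
          if (memstart_addr + linear_region_size < memblock_end_of_DRAM())
                  pr_warn("Memory doesn't fit in the linear mapping, VA_BITS too small\n");
      
          /* Existing behaviour: clip anything beyond the linear region. */
          memblock_remove(max_t(u64, memstart_addr + linear_region_size,
                                __pa_symbol(_end)), ULLONG_MAX);
      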
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/20201215152918.1511108-1-maz@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  13. 20 November 2020, 5 commits
  14. 18 November 2020, 1 commit
    • arm64: omit [_text, _stext) from permanent kernel mapping · e2a073dd
      Authored by Ard Biesheuvel
      In a previous patch, we increased the size of the EFI PE/COFF header
      to 64 KB, which resulted in the _stext symbol appearing at a fixed
      offset of 64 KB into the image.
      
      Since 64 KB is also the largest page size we support, this completely
      removes the need to map the first 64 KB of the kernel image, given that
      it only contains the arm64 Image header and the EFI header, neither of
      which we ever access again after booting the kernel. More importantly,
      we should avoid an executable mapping of non-executable and not entirely
      predictable data, to deal with the unlikely event that we inadvertently
      emitted something that looks like an opcode that could be used as a
      gadget for speculative execution.
      
      So let's limit the kernel mapping of .text to the [_stext, _etext)
      region, which matches the view of generic code (such as kallsyms) when
      it reasons about the boundaries of the kernel's .text section.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20201117124729.12642-2-ardb@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  15. 12 November 2020, 1 commit
  16. 11 November 2020, 1 commit
    • arm64: mm: account for hotplug memory when randomizing the linear region · 97d6786e
      Authored by Ard Biesheuvel
      As a hardening measure, we currently randomize the placement of
      physical memory inside the linear region when KASLR is in effect.
      Since the random offset at which to place the available physical
      memory inside the linear region is chosen early at boot, it is
      based on the memblock description of memory, which does not cover
      hotplug memory. The consequence of this is that the randomization
      offset may be chosen such that any hotplugged memory located above
      memblock_end_of_DRAM() that appears later is pushed off the end of
      the linear region, where it cannot be accessed.
      
      So let's limit this randomization of the linear region to ensure
      that this can no longer happen, by using the CPU's addressable PA
      range instead. As it is guaranteed that no hotpluggable memory will
      appear that falls outside of that range, we can safely put this PA
      range sized window anywhere in the linear region.
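      
      A simplified sketch of the idea in arm64_memblock_init(): size the
      randomization window from the CPU's PA range rather than from the current
      memblock end (details abbreviated, not the exact hunk):
      
          u64 mmfr0 = read_cpuid(ID_AA64MMFR0_EL1);
          int parange = cpuid_feature_extract_unsigned_field(mmfr0,
                                          ID_AA64MMFR0_PARANGE_SHIFT);
          s64 range = linear_region_size -
                      BIT(id_aa64mmfr0_parange_to_phys_shift(parange));
      
          /* Any hotplugged memory is guaranteed to fall inside this PA window. */
          if (memstart_offset_seed > 0 && range >= (s64)ARM64_MEMSTART_ALIGN) {
                  range /= ARM64_MEMSTART_ALIGN;
                  memstart_addr -= ARM64_MEMSTART_ALIGN *
                                   ((range * memstart_offset_seed) >> 16);
          }
      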
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Link: https://lore.kernel.org/r/20201014081857.3288-1-ardb@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  17. 10 November 2020, 2 commits
    • arm64: mm: make vmemmap region a projection of the linear region · 8c96400d
      Authored by Ard Biesheuvel
      Now that we have reverted the introduction of the vmemmap struct page
      pointer and the separate physvirt_offset, we can simplify things further,
      and place the vmemmap region in the VA space in such a way that virtual
      to page translations and vice versa can be implemented using a single
      arithmetic shift.
      
      One happy coincidence resulting from this is that the 48-bit/4k and
      52-bit/64k configurations (which are assumed to be the two most
      prevalent) end up with the same placement of the vmemmap region. In
      a subsequent patch, we will take advantage of this, and unify the
      memory maps even more.
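      
      In effect both translations reduce to one scale-and-offset; a conceptual
      sketch (the macro names here are illustrative, not the kernel's):
      
          /* linear VA -> struct page */
          #define virt_to_page_sketch(va)                                 \
                  ((struct page *)VMEMMAP_START +                         \
                   (((u64)(va) - PAGE_OFFSET) >> PAGE_SHIFT))
      
          /* struct page -> linear VA */
          #define page_to_virt_sketch(pg)                                 \
                  ((void *)(PAGE_OFFSET +                                 \
                   ((u64)((pg) - (struct page *)VMEMMAP_START) << PAGE_SHIFT)))
      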
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Steve Capper <steve.capper@arm.com>
      Link: https://lore.kernel.org/r/20201008153602.9467-4-ardb@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: mm: extend linear region for 52-bit VA configurations · f4693c27
      Authored by Ard Biesheuvel
      For historical reasons, the arm64 kernel VA space is configured as two
      equally sized halves, i.e., on a 48-bit VA build, the VA space is split
      into a 47-bit vmalloc region and a 47-bit linear region.
      
      When support for 52-bit virtual addressing was added, this equal split
      was kept, resulting in a substantial waste of virtual address space in
      the linear region:
      
                                 48-bit VA                     52-bit VA
        0xffff_ffff_ffff_ffff +-------------+               +-------------+
                              |   vmalloc   |               |   vmalloc   |
        0xffff_8000_0000_0000 +-------------+ _PAGE_END(48) +-------------+
                              |   linear    |               :             :
        0xffff_0000_0000_0000 +-------------+               :             :
                              :             :               :             :
                              :             :               :             :
                              :             :               :             :
                              :             :               :  currently  :
                              :  unusable   :               :             :
                              :             :               :   unused    :
                              :     by      :               :             :
                              :             :               :             :
                              :  hardware   :               :             :
                              :             :               :             :
        0xfff8_0000_0000_0000 :             : _PAGE_END(52) +-------------+
                              :             :               |             |
                              :             :               |             |
                              :             :               |             |
                              :             :               |             |
                              :             :               |             |
                              :  unusable   :               |             |
                              :             :               |   linear    |
                              :     by      :               |             |
                              :             :               |   region    |
                              :  hardware   :               |             |
                              :             :               |             |
                              :             :               |             |
                              :             :               |             |
                              :             :               |             |
                              :             :               |             |
                              :             :               |             |
        0xfff0_0000_0000_0000 +-------------+  PAGE_OFFSET  +-------------+
      
      As illustrated above, the 52-bit VA kernel uses 47 bits for the vmalloc
      space (as before), to ensure that a single 64k granule kernel image can
      support any 64k granule capable system, regardless of whether it supports
      the 52-bit virtual addressing extension. However, because the VA space
      is still split into equal halves, the linear region is only 2^51 bytes
      in size, wasting almost half of the 52-bit VA space.
      
      Let's fix this, by abandoning the equal split, and simply assigning all
      VA space outside of the vmalloc region to the linear region.
      
      The KASAN shadow region is reconfigured so that it ends at the start of
      the vmalloc region, and grows downwards. That way, the arrangement of
      the vmalloc space (which contains kernel mappings, modules, BPF region,
      the vmemmap array etc) is identical between non-KASAN and KASAN builds,
      which aids debugging.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Steve Capper <steve.capper@arm.com>
      Link: https://lore.kernel.org/r/20201008153602.9467-3-ardb@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  18. 15 October 2020, 1 commit
    • arm64: mm: use single quantity to represent the PA to VA translation · 7bc1a0f9
      Authored by Ard Biesheuvel
      On arm64, the global variable memstart_addr represents the physical
      address of PAGE_OFFSET, and so physical to virtual translations or
      vice versa used to come down to simple additions or subtractions
      involving the values of PAGE_OFFSET and memstart_addr.
      
      When support for 52-bit virtual addressing was introduced, we had to
      deal with PAGE_OFFSET potentially being outside of the region that
      can be covered by the virtual range (as the 52-bit VA capable build
      needs to be able to run on systems that are only 48-bit VA capable),
      and for this reason, another translation was introduced, and recorded
      in the global variable physvirt_offset.
      
      However, if we go back to the original definition of memstart_addr,
      i.e., the physical address of PAGE_OFFSET, it turns out that there is
      no need for two separate translations: instead, we can simply subtract
      the size of the unaddressable VA space from memstart_addr to make the
      available physical memory appear in the 48-bit addressable VA region.
      
      This simplifies things, but also fixes a bug on KASLR builds, which
      may update memstart_addr later on in arm64_memblock_init() but fail
      to update vmemmap and physvirt_offset accordingly.
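      
      The resulting translation keeps a single offset; a sketch of the key
      pieces (close to, but not guaranteed to be, the exact hunks):
      
          /* PHYS_OFFSET is simply memstart_addr, so the linear map translation is: */
          #define __phys_to_virt(x)  ((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET)
      
          /*
           * On a 52-bit VA capable build running with 48-bit VAs, pull
           * memstart_addr down by the size of the unaddressable VA space so
           * memory still appears in the 48-bit addressable part of the
           * linear map.
           */
          if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
                  memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);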
      
      Fixes: 5383cc6e ("arm64: mm: Introduce vabits_actual")
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Steve Capper <steve.capper@arm.com>
      Link: https://lore.kernel.org/r/20201008153602.9467-2-ardb@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
  19. 14 October 2020, 1 commit
    • arch, mm: replace for_each_memblock() with for_each_mem_pfn_range() · c9118e6c
      Authored by Mike Rapoport
      There are several occurrences of the following pattern:
      
      	for_each_memblock(memory, reg) {
      		start_pfn = memblock_region_memory_base_pfn(reg);
      		end_pfn = memblock_region_memory_end_pfn(reg);
      
      		/* do something with start_pfn and end_pfn */
      	}
      
      Rather than iterate over all memblock.memory regions and each time query
      for their start and end PFNs, use for_each_mem_pfn_range() iterator to get
      simpler and clearer code.
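      
      The equivalent using the iterator looks roughly like this:
      
          for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
                  /* do something with start_pfn and end_pfn */
          }
      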
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>	[.clang-format]
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-12-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 06 October 2020, 1 commit
  21. 01 September 2020, 1 commit
  22. 08 August 2020, 1 commit
  23. 15 July 2020, 1 commit
    • arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs · abb7962a
      Authored by Anshuman Khandual
      Currently the 'hugetlb_cma=' command line argument does not create a CMA
      area on ARM64_16K_PAGES and ARM64_64K_PAGES based platforms. Instead, it
      just ends up with the following warning message, because
      hugetlb_cma_reserve() never gets called for these huge page sizes.
      
      [   64.255669] hugetlb_cma: the option isn't supported by current arch
      
      This enables CMA area reservation on ARM64_16K_PAGES and ARM64_64K_PAGES
      configs by defining a unified arm64_hugetlb_cma_reserve() that is wrapped
      in CONFIG_CMA. The call site for arm64_hugetlb_cma_reserve() is also
      protected, as <asm/hugetlb.h> is conditionally included and hence cannot
      contain a stub for the inverse config, i.e. !(CONFIG_HUGETLB_PAGE &&
      CONFIG_CMA).
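      
      A hedged sketch of the shape of the wrapper (the exact order computation
      is page-size dependent and abbreviated here):
      
          #ifdef CONFIG_CMA
          void __init arm64_hugetlb_cma_reserve(void)
          {
                  int order;
      
          #ifdef CONFIG_ARM64_4K_PAGES
                  order = PUD_SHIFT - PAGE_SHIFT;  /* 1GB gigantic pages */
          #else
                  /* 16K/64K pages: the gigantic size is the contiguous PMD size. */
                  order = CONT_PMD_SHIFT + PMD_SHIFT - PAGE_SHIFT;
          #endif
                  hugetlb_cma_reserve(order);
          }
          #endif /* CONFIG_CMA */
      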
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Barry Song <song.bao.hua@hisilicon.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Link: https://lore.kernel.org/r/1593578521-24672-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  24. 02 July 2020, 1 commit
  25. 18 June 2020, 1 commit
  26. 04 June 2020, 2 commits
    • arm64: simplify detection of memory zone boundaries for UMA configs · 584cb13d
      Authored by Mike Rapoport
      The free_area_init() function only requires the definition of the maximal
      PFN for each of the supported zones rather than the calculation of actual
      zone sizes and the sizes of the holes between the zones.
      
      After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init() is
      available to all architectures.
      
      Using this function instead of free_area_init_node() simplifies the zone
      detection.
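      
      The simplified zone detection then boils down to filling in the maximal
      PFN per zone (a sketch, assuming the arm64 DMA limit variables of that
      time):
      
          static void __init zone_sizes_init(unsigned long min, unsigned long max)
          {
                  unsigned long max_zone_pfns[MAX_NR_ZONES] = { 0 };
      
          #ifdef CONFIG_ZONE_DMA
                  max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
          #endif
          #ifdef CONFIG_ZONE_DMA32
                  max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma32_phys_limit);
          #endif
                  max_zone_pfns[ZONE_NORMAL] = max;
      
                  free_area_init(max_zone_pfns);
          }
      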
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-9-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use free_area_init() instead of free_area_init_nodes() · 9691a071
      Authored by Mike Rapoport
      free_area_init() has effectively become a wrapper for
      free_area_init_nodes() and there is no point in keeping it.  Still, the
      free_area_init() name is shorter and more general, as it does not imply
      the necessity to initialize multiple nodes.
      
      Rename free_area_init_nodes() to free_area_init(), update the callers and
      drop old version of free_area_init().
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-6-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 01 May 2020, 1 commit
  28. 11 April 2020, 1 commit
    • mm: hugetlb: optionally allocate gigantic hugepages using cma · cf11e85f
      Authored by Roman Gushchin
      Commit 944d9fec ("hugetlb: add support for gigantic page allocation
      at runtime") has added the run-time allocation of gigantic pages.
      
      However, it actually works only during the early stages of system boot,
      when the majority of memory is free.  After some time the memory gets
      fragmented by non-movable pages, so the chances of finding a contiguous
      1GB block get close to zero.  Even dropping caches manually doesn't
      help much.
      
      At large scale rebooting servers in order to allocate gigantic hugepages
      is quite expensive and complex.  At the same time keeping some constant
      percentage of memory in reserved hugepages even if the workload isn't
      using it is a big waste: not all workloads can benefit from using 1 GB
      pages.
      
      The following solution can solve the problem:
      1) On boot time a dedicated cma area* is reserved. The size is passed
         as a kernel argument.
      2) Run-time allocations of gigantic hugepages are performed using the
         cma allocator and the dedicated cma area
      
      In this case gigantic hugepages can be allocated successfully with a
      high probability, however the memory isn't completely wasted if nobody
      is using 1GB hugepages: it can be used for pagecache, anon memory, THPs,
      etc.
      
       * On a multi-node machine a per-node cma area is allocated on each node.
         Subsequent gigantic hugetlb allocations then use the first available
         NUMA node if no mask is specified by the user.
      
      Usage:
      1) configure the kernel to allocate a cma area for hugetlb allocations:
         pass hugetlb_cma=10G as a kernel argument
      
      2) allocate hugetlb pages as usual, e.g.
         echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
      
      If the option isn't enabled or the allocation of the cma area failed,
      the current behavior of the system is preserved.
      
      x86 and arm-64 are covered by this patch, other architectures can be
      trivially added later.
      
      The patch contains clean-ups and fixes proposed and implemented by Aslan
      Bakirov and Randy Dunlap.  It also contains ideas and suggestions
      proposed by Rik van Riel, Michal Hocko and Mike Kravetz.  Thanks!
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Andreas Schaufler <andreas.schaufler@gmx.de>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@kernel.org>
      Cc: Aslan Bakirov <aslan@fb.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  29. 04 December 2019, 1 commit
    • arm64: mm: Fix initialisation of DMA zones on non-NUMA systems · 93b90414
      Authored by Will Deacon
      John reports that the recently merged commit 1a8e1cef ("arm64: use
      both ZONE_DMA and ZONE_DMA32") breaks the boot on his DB845C board:
      
        | Booting Linux on physical CPU 0x0000000000 [0x517f803c]
        | Linux version 5.4.0-mainline-10675-g957a03b9e38f
        | Machine model: Thundercomm Dragonboard 845c
        | [...]
        | Built 1 zonelists, mobility grouping on.  Total pages: -188245
        | Kernel command line: earlycon
        | firmware_class.path=/vendor/firmware/ androidboot.hardware=db845c
        | init=/init androidboot.boot_devices=soc/1d84000.ufshc
        | printk.devkmsg=on buildvariant=userdebug root=/dev/sda2
        | androidboot.bootdevice=1d84000.ufshc androidboot.serialno=c4e1189c
        | androidboot.baseband=sda
        | msm_drm.dsi_display0=dsi_lt9611_1080_video_display:
        | androidboot.slot_suffix=_a skip_initramfs rootwait ro init=/init
        |
        | <hangs indefinitely here>
      
      This is because, when CONFIG_NUMA=n, zone_sizes_init() fails to handle
      memblocks that fall entirely within the ZONE_DMA region and erroneously ends up
      trying to add a negatively-sized region into the following ZONE_DMA32, which is
      later interpreted as a large unsigned region by the core MM code.
      
      Rework the non-NUMA implementation of zone_sizes_init() so that the start
      address of the memblock being processed is adjusted according to the end of the
      previous zone, which is then range-checked before updating the hole information
      of subsequent zones.
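      
      Conceptually (the names here are illustrative, not the exact kernel code),
      the rework clamps each memblock's start to the end of the previous zone
      before accounting for holes:
      
          /* for each memblock region and each zone boundary, in ascending order */
          start = max(block_start_pfn, prev_zone_end);    /* clamp to previous zone */
          if (start < zone_end_pfn) {
                  unsigned long end = min(block_end_pfn, zone_end_pfn);
      
                  zhole_size[zone] -= end - start;        /* shrink the zone's hole */
                  prev_zone_end = end;
          }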
      
      Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
      Link: https://lore.kernel.org/lkml/CALAqxLVVcsmFrDKLRGRq7GewcW405yTOxG=KR3csVzQ6bXutkA@mail.gmail.com
      Fixes: 1a8e1cef ("arm64: use both ZONE_DMA and ZONE_DMA32")
      Reported-by: John Stultz <john.stultz@linaro.org>
      Tested-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>