1. 12 12月, 2018 1 次提交
    • R
      arm64: Add memory hotplug support · 4ab21506
      Robin Murphy 提交于
      Wire up the basic support for hot-adding memory. Since memory hotplug
      is fairly tightly coupled to sparsemem, we tweak pfn_valid() to also
      cross-check the presence of a section in the manner of the generic
      implementation, before falling back to memblock to check for no-map
      regions within a present section as before. By having arch_add_memory(()
      create the linear mapping first, this then makes everything work in the
      way that __add_section() expects.
      
      We expect hotplug to be ACPI-driven, so the swapper_pg_dir updates
      should be safe from races by virtue of the global device hotplug lock.
      Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      4ab21506
  2. 11 12月, 2018 2 次提交
    • W
      arm64: mm: EXPORT vabits_user to modules · 4a1daf29
      Will Deacon 提交于
      TASK_SIZE is defined using the vabits_user variable for 64-bit tasks,
      so ensure that this variable is exported to modules to avoid the
      following build breakage with allmodconfig:
      
       | ERROR: "vabits_user" [lib/test_user_copy.ko] undefined!
       | ERROR: "vabits_user" [drivers/misc/lkdtm/lkdtm.ko] undefined!
       | ERROR: "vabits_user" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      4a1daf29
    • S
      arm64: mm: introduce 52-bit userspace support · 67e7fdfc
      Steve Capper 提交于
      On arm64 there is optional support for a 52-bit virtual address space.
      To exploit this one has to be running with a 64KB page size and be
      running on hardware that supports this.
      
      For an arm64 kernel supporting a 48 bit VA with a 64KB page size,
      some changes are needed to support a 52-bit userspace:
       * TCR_EL1.T0SZ needs to be 12 instead of 16,
       * TASK_SIZE needs to reflect the new size.
      
      This patch implements the above when the support for 52-bit VAs is
      detected at early boot time.
      
      On arm64 userspace addresses translation is controlled by TTBR0_EL1. As
      well as userspace, TTBR0_EL1 controls:
       * The identity mapping,
       * EFI runtime code.
      
      It is possible to run a kernel with an identity mapping that has a
      larger VA size than userspace (and for this case __cpu_set_tcr_t0sz()
      would set TCR_EL1.T0SZ as appropriate). However, when the conditions for
      52-bit userspace are met; it is possible to keep TCR_EL1.T0SZ fixed at
      12. Thus in this patch, the TCR_EL1.T0SZ size changing logic is
      disabled.
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NSteve Capper <steve.capper@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      67e7fdfc
  3. 20 11月, 2018 1 次提交
    • A
      arm64: mm: apply r/o permissions of VM areas to its linear alias as well · c55191e9
      Ard Biesheuvel 提交于
      On arm64, we use block mappings and contiguous hints to map the linear
      region, to minimize the TLB footprint. However, this means that the
      entire region is mapped using read/write permissions, which we cannot
      modify at page granularity without having to take intrusive measures to
      prevent TLB conflicts.
      
      This means the linear aliases of pages belonging to read-only mappings
      (executable or otherwise) in the vmalloc region are also mapped read/write,
      and could potentially be abused to modify things like module code, bpf JIT
      code or other read-only data.
      
      So let's fix this, by extending the set_memory_ro/rw routines to take
      the linear alias into account. The consequence of enabling this is
      that we can no longer use block mappings or contiguous hints, so in
      cases where the TLB footprint of the linear region is a bottleneck,
      performance may be affected.
      
      Therefore, allow this feature to be runtime en/disabled, by setting
      rodata=full (or 'on' to disable just this enhancement, or 'off' to
      disable read-only mappings for code and r/o data entirely) on the
      kernel command line. Also, allow the default value to be set via a
      Kconfig option.
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      c55191e9
  4. 09 11月, 2018 1 次提交
  5. 31 10月, 2018 1 次提交
    • M
      memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc* · 9a8dd708
      Mike Rapoport 提交于
      Make it explicit that the caller gets a physical address rather than a
      virtual one.
      
      This will also allow using meblock_alloc prefix for memblock allocations
      returning virtual address, which is done in the following patches.
      
      The conversion is done using the following semantic patch:
      
      @@
      expression e1, e2, e3;
      @@
      (
      - memblock_alloc(e1, e2)
      + memblock_phys_alloc(e1, e2)
      |
      - memblock_alloc_nid(e1, e2, e3)
      + memblock_phys_alloc_nid(e1, e2, e3)
      |
      - memblock_alloc_try_nid(e1, e2, e3)
      + memblock_phys_alloc_try_nid(e1, e2, e3)
      )
      
      Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a8dd708
  6. 11 10月, 2018 1 次提交
  7. 25 9月, 2018 2 次提交
    • J
      arm64/mm: use fixmap to modify swapper_pg_dir · 2330b7ca
      Jun Yao 提交于
      Once swapper_pg_dir is in the rodata section, it will not be possible to
      modify it directly, but we will need to modify it in some cases.
      
      To enable this, we can use the fixmap when deliberately modifying
      swapper_pg_dir. As the pgd is only transiently mapped, this provides
      some resilience against illicit modification of the pgd, e.g. for
      Kernel Space Mirror Attack (KSMA).
      Signed-off-by: NJun Yao <yaojun8558363@gmail.com>
      Reviewed-by: NJames Morse <james.morse@arm.com>
      [Mark: simplify ifdeffery, commit message]
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      2330b7ca
    • J
      arm64/mm: Separate boot-time page tables from swapper_pg_dir · 2b5548b6
      Jun Yao 提交于
      Since the address of swapper_pg_dir is fixed for a given kernel image,
      it is an attractive target for manipulation via an arbitrary write. To
      mitigate this we'd like to make it read-only by moving it into the
      rodata section.
      
      We require that swapper_pg_dir is at a fixed offset from tramp_pg_dir
      and reserved_ttbr0, so these will also need to move into rodata.
      However, swapper_pg_dir is allocated along with some transient page
      tables used for boot which we do not want to move into rodata.
      
      As a step towards this, this patch separates the boot-time page tables
      into a new init_pg_dir, and reduces swapper_pg_dir to the single page it
      needs to be. This allows us to retain the relationship between
      swapper_pg_dir, tramp_pg_dir, and swapper_pg_dir, while cleanly
      separating these from the boot-time page tables.
      
      The init_pg_dir holds all of the pgd/pud/pmd/pte levels needed during
      boot, and all of these levels will be freed when we switch to the
      swapper_pg_dir, which is initialized by the existing code in
      paging_init(). Since we start off on the init_pg_dir, we no longer need
      to allocate a transient page table in paging_init() in order to ensure
      that swapper_pg_dir isn't live while we initialize it.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: NJun Yao <yaojun8558363@gmail.com>
      Reviewed-by: NJames Morse <james.morse@arm.com>
      [Mark: place init_pg_dir after BSS, fold mm changes, commit message]
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      2b5548b6
  8. 07 9月, 2018 1 次提交
    • M
      arm64: fix erroneous warnings in page freeing functions · fac880c7
      Mark Rutland 提交于
      In pmd_free_pte_page() and pud_free_pmd_page() we try to warn if they
      hit a present non-table entry. In both cases we'll warn for non-present
      entries, as the VM_WARN_ON() only checks the entry is not a table entry.
      
      This has been observed to result in warnings when booting a v4.19-rc2
      kernel under qemu.
      
      Fix this by bailing out earlier for non-present entries.
      
      Fixes: ec28bb9c ("arm64: Implement page table free interfaces")
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      fac880c7
  9. 06 7月, 2018 1 次提交
    • C
      arm64: Implement page table free interfaces · ec28bb9c
      Chintan Pandya 提交于
      arm64 requires break-before-make. Originally, before
      setting up new pmd/pud entry for huge mapping, in few
      cases, the modifying pmd/pud entry was still valid
      and pointing to next level page table as we only
      clear off leaf PTE in unmap leg.
      
       a) This was resulting into stale entry in TLBs (as few
          TLBs also cache intermediate mapping for performance
          reasons)
       b) Also, modifying pmd/pud was the only reference to
          next level page table and it was getting lost without
          freeing it. So, page leaks were happening.
      
      Implement pud_free_pmd_page() and pmd_free_pte_page() to
      enforce BBM and also free the leaking page tables.
      
      Implementation requires,
       1) Clearing off the current pud/pmd entry
       2) Invalidation of TLB
       3) Freeing of the un-used next level page tables
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NChintan Pandya <cpandya@codeaurora.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      ec28bb9c
  10. 05 7月, 2018 1 次提交
    • C
      ioremap: Update pgtable free interfaces with addr · 785a19f9
      Chintan Pandya 提交于
      The following kernel panic was observed on ARM64 platform due to a stale
      TLB entry.
      
       1. ioremap with 4K size, a valid pte page table is set.
       2. iounmap it, its pte entry is set to 0.
       3. ioremap the same address with 2M size, update its pmd entry with
          a new value.
       4. CPU may hit an exception because the old pmd entry is still in TLB,
          which leads to a kernel panic.
      
      Commit b6bdb751 ("mm/vmalloc: add interfaces to free unmapped page
      table") has addressed this panic by falling to pte mappings in the above
      case on ARM64.
      
      To support pmd mappings in all cases, TLB purge needs to be performed
      in this case on ARM64.
      
      Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
      so that TLB purge can be added later in seprate patches.
      
      [toshi.kani@hpe.com: merge changes, rewrite patch description]
      Fixes: 28ee90fe ("x86/mm: implement free pmd/pte page interfaces")
      Signed-off-by: NChintan Pandya <cpandya@codeaurora.org>
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: mhocko@suse.com
      Cc: akpm@linux-foundation.org
      Cc: hpa@zytor.com
      Cc: linux-mm@kvack.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com
      785a19f9
  11. 24 5月, 2018 1 次提交
  12. 23 3月, 2018 1 次提交
    • T
      mm/vmalloc: add interfaces to free unmapped page table · b6bdb751
      Toshi Kani 提交于
      On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
      create pud/pmd mappings.  A kernel panic was observed on arm64 systems
      with Cortex-A75 in the following steps as described by Hanjun Guo.
      
       1. ioremap a 4K size, valid page table will build,
       2. iounmap it, pte0 will set to 0;
       3. ioremap the same address with 2M size, pgd/pmd is unchanged,
          then set the a new value for pmd;
       4. pte0 is leaked;
       5. CPU may meet exception because the old pmd is still in TLB,
          which will lead to kernel panic.
      
      This panic is not reproducible on x86.  INVLPG, called from iounmap,
      purges all levels of entries associated with purged address on x86.  x86
      still has memory leak.
      
      The patch changes the ioremap path to free unmapped page table(s) since
      doing so in the unmap path has the following issues:
      
       - The iounmap() path is shared with vunmap(). Since vmap() only
         supports pte mappings, making vunmap() to free a pte page is an
         overhead for regular vmap users as they do not need a pte page freed
         up.
      
       - Checking if all entries in a pte page are cleared in the unmap path
         is racy, and serializing this check is expensive.
      
       - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
         Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
         purge.
      
      Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
      clear a given pud/pmd entry and free up a page for the lower level
      entries.
      
      This patch implements their stub functions on x86 and arm64, which work
      as workaround.
      
      [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
      Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
      Fixes: e61ce6ad ("mm: change ioremap to set up huge I/O mappings")
      Reported-by: NLei Li <lious.lilei@hisilicon.com>
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6bdb751
  13. 26 2月, 2018 1 次提交
  14. 22 2月, 2018 1 次提交
  15. 17 2月, 2018 1 次提交
    • W
      arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables · 20a004e7
      Will Deacon 提交于
      In many cases, page tables can be accessed concurrently by either another
      CPU (due to things like fast gup) or by the hardware page table walker
      itself, which may set access/dirty bits. In such cases, it is important
      to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
      entries cannot be torn, merged or subject to apparent loss of coherence
      due to compiler transformations.
      
      Whilst there are some scenarios where this cannot happen (e.g. pinned
      kernel mappings for the linear region), the overhead of using READ_ONCE
      /WRITE_ONCE everywhere is minimal and makes the code an awful lot easier
      to reason about. This patch consistently uses these macros in the arch
      code, as well as explicitly namespacing pointers to page table entries
      from the entries themselves by using adopting a 'p' suffix for the former
      (as is sometimes used elsewhere in the kernel source).
      Tested-by: NYury Norov <ynorov@caviumnetworks.com>
      Tested-by: NRichard Ruigrok <rruigrok@codeaurora.org>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      20a004e7
  16. 07 2月, 2018 1 次提交
  17. 15 1月, 2018 2 次提交
  18. 09 1月, 2018 2 次提交
  19. 23 12月, 2017 2 次提交
  20. 11 12月, 2017 2 次提交
  21. 07 11月, 2017 1 次提交
  22. 28 7月, 2017 1 次提交
  23. 30 5月, 2017 1 次提交
  24. 06 4月, 2017 1 次提交
    • T
      arm64: kdump: protect crash dump kernel memory · 98d2e153
      Takahiro Akashi 提交于
      arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
      are meant to be called by kexec_load() in order to protect the memory
      allocated for crash dump kernel once the image is loaded.
      
      The protection is implemented by unmapping the relevant segments in crash
      dump kernel memory, rather than making it read-only as other archs do,
      to prevent coherency issues due to potential cache aliasing (with
      mismatched attributes).
      
      Page-level mappings are consistently used here so that we can change
      the attributes of segments in page granularity as well as shrink the region
      also in page granularity through /sys/kernel/kexec_crash_size, putting
      the freed memory back to buddy system.
      Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
      Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      98d2e153
  25. 23 3月, 2017 10 次提交