1. 04 September 2020 (2 commits)
    • arm64: mte: Add PROT_MTE support to mmap() and mprotect() · 9f341931
      Committed by Catalin Marinas
      To enable tagging on a memory range, the user must explicitly opt in via
      a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
      memory type in the AttrIndx field of a pte, simplify the or'ing of these
      bits over the protection_map[] attributes by making MT_NORMAL index 0.
      
      There are two conditions for arch_vm_get_page_prot() to return the
      MT_NORMAL_TAGGED memory type: (1) the user requested it via PROT_MTE,
      registered as VM_MTE in the vm_flags, and (2) the vma supports MTE,
      decided during the mmap() call (only) and registered as VM_MTE_ALLOWED.
      
      arch_calc_vm_prot_bits() is responsible for registering the user request
      as VM_MTE. The newly introduced arch_calc_vm_flag_bits() sets
      VM_MTE_ALLOWED if the mapping is MAP_ANONYMOUS. An MTE-capable
      filesystem (RAM-based) may be able to set VM_MTE_ALLOWED during its
      mmap() file ops call.
      
      In addition, update VM_DATA_DEFAULT_FLAGS to allow mprotect(PROT_MTE) on
      stack or brk area.
      
      The Linux mmap() syscall currently ignores unknown PROT_* flags. In the
      presence of MTE, an mmap(PROT_MTE) on a file which does not support MTE
      will not report an error and the memory will not be mapped as Normal
      Tagged. For consistency, mprotect(PROT_MTE) will not report an error
      either if the memory range does not support MTE. Two subsequent patches
      in the series will propose tightening of this behaviour.
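      
      As a rough userspace sketch of this opt-in flow (PROT_MTE and
      HWCAP2_MTE are hard-coded fallbacks matching the arm64 uapi headers;
      treat the values as assumptions if your headers differ):
      
      #include <sys/auxv.h>
      #include <sys/mman.h>
      
      #ifndef PROT_MTE
      #define PROT_MTE 0x20          /* arm64 <asm/mman.h> */
      #endif
      #ifndef HWCAP2_MTE
      #define HWCAP2_MTE (1 << 18)   /* arm64 <asm/hwcap.h> */
      #endif
      
      int main(void)
      {
          void *p;
      
          if (!(getauxval(AT_HWCAP2) & HWCAP2_MTE))
              return 1;              /* MTE not supported */
      
          /* Anonymous mappings get VM_MTE_ALLOWED, so PROT_MTE sticks. */
          p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_MTE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED)
              return 1;
      
          /* PROT_MTE can also be added after the fact via mprotect(). */
          mprotect(p, 4096, PROT_READ | PROT_WRITE | PROT_MTE);
          munmap(p, 4096);
          return 0;
      }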
      Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
    • arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE · 34bfeea4
      Committed by Catalin Marinas
      Pages allocated by the kernel are not guaranteed to have the tags
      zeroed, especially as the kernel does not (yet) use MTE itself. To
      ensure the user can still access such pages when mapped into its address
      space, clear the tags via set_pte_at(). A new page flag - PG_mte_tagged
      (PG_arch_2) - is used to track pages with valid allocation tags.
      
      Since the zero page is mapped as pte_special(), it won't be covered by
      the above set_pte_at() mechanism. Clear its tags during early MTE
      initialisation.
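      
      A simplified sketch of the mechanism (helper names follow the patch;
      swap and compound-page handling are elided):
      
      static void mte_sync_tags(pte_t *ptep, pte_t pte)
      {
          struct page *page = pte_page(pte);
      
          /* PG_mte_tagged records that this page's tags are now valid. */
          if (!test_and_set_bit(PG_mte_tagged, &page->flags))
              mte_clear_page_tags(page_address(page));
      }
      
      /*
       * Hooked from set_pte_at():
       *     if (system_supports_mte() && pte_present(pte) && pte_tagged(pte))
       *         mte_sync_tags(ptep, pte);
       */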
      Co-developed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
  2. 07 July 2020 (1 commit)
  3. 17 June 2020 (1 commit)
  4. 10 June 2020 (2 commits)
    • mm: consolidate pte_index() and pte_offset_*() definitions · 974b9b2c
      Committed by Mike Rapoport
      All architectures define pte_index() as
      
      	(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)
      
      and all architectures define pte_offset_kernel() as an entry in the array
      of PTEs indexed by the pte_index().
      
      For the most architectures the pte_offset_kernel() implementation relies
      on the availability of pmd_page_vaddr() that converts a PMD entry value to
      the virtual address of the page containing PTEs array.
      
      Let's move x86 definitions of the PTE accessors to the generic place in
      <linux/pgtable.h> and then simply drop the respective definitions from the
      other architectures.
      
      The architectures that didn't provide pmd_page_vaddr() are updated to have
      that defined.
      
      The generic implementation of pte_offset_kernel() can be overridden by an
      architecture and alpha makes use of this because it has special ordering
      requirements for its version of pte_offset_kernel().
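      
      The consolidated definitions that land in <linux/pgtable.h> look
      roughly like this:
      
      #ifndef pte_index
      static inline unsigned long pte_index(unsigned long address)
      {
          return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
      }
      #endif
      
      #ifndef pte_offset_kernel
      static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
      {
          /* pmd_page_vaddr(): VA of the page holding the PTE array */
          return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
      }
      #define pte_offset_kernel pte_offset_kernel
      #endif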
      
      [rppt@linux.ibm.com: v2]
        Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
      [rppt@linux.ibm.com: update]
        Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
      [rppt@linux.ibm.com: update]
        Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
      [akpm@linux-foundation.org: fix x86 warning]
      [sfr@canb.auug.org.au: fix powerpc build]
        Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce include/linux/pgtable.h · ca5999fd
      Committed by Mike Rapoport
      The include/linux/pgtable.h is going to be the home of generic page table
      manipulation functions.
      
      Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
      make the latter include asm/pgtable.h.
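      
      In sketch form, the resulting header layout:
      
      /* include/linux/pgtable.h */
      #ifndef _LINUX_PGTABLE_H
      #define _LINUX_PGTABLE_H
      
      #include <asm/pgtable.h>
      
      /* generic page table helpers accumulate here over the series */
      
      #endif /* _LINUX_PGTABLE_H */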
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 05 June 2020 (1 commit)
    • arm64: add support for folded p4d page tables · e9f63768
      Committed by Mike Rapoport
      Implement primitives necessary for the 4th level folding, add walks of p4d
      level where appropriate, replace 5level-fixup.h with pgtable-nop4d.h and
      remove __ARCH_USE_5LEVEL_HACK.
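      
      With <asm-generic/pgtable-nop4d.h>, the folded level becomes a
      pass-through; a sketch of the key helpers:
      
      typedef struct { pgd_t pgd; } p4d_t;
      
      static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
      {
          return (p4d_t *)pgd;    /* the walk passes straight through */
      }
      
      /* ...and the pgd-level predicates become trivial: */
      static inline int pgd_none(pgd_t pgd)    { return 0; }
      static inline int pgd_bad(pgd_t pgd)     { return 0; }
      static inline int pgd_present(pgd_t pgd) { return 1; }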
      
      [arnd@arndb.de: fix gcc-10 shift warning]
        Link: http://lkml.kernel.org/r/20200429185657.4085975-1-arnd@arndb.de
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200414153455.21744-4-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 04 June 2020 (1 commit)
  7. 03 June 2020 (1 commit)
    • mm: enforce that vmap can't map pages executable · cca98e9f
      Committed by Christoph Hellwig
      To help enforce the W^X protection, don't allow remapping existing
      pages as executable.
      
      x86 bits from Peter Zijlstra, arm64 bits from Mark Rutland.
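      
      The mechanism, sketched: architectures provide a pgprot_nx() helper
      that strips execute permission, with a no-op fallback, and vmap()
      applies it to the caller-supplied protection (the arm64 definition
      shown is an assumption of its shape at the time):
      
      #ifndef pgprot_nx
      #define pgprot_nx(prot) (prot)
      #endif
      
      /* arm64 (sketch): clear execute by setting the PXN bit */
      /* #define pgprot_nx(prot) __pgprot_modify(prot, 0, PTE_PXN) */
      
      /*
       * vmap() then maps with pgprot_nx(prot), so pages already in the
       * direct map can never be remapped executable.
       */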
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Gao Xiang <xiang@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Michael Kelley <mikelley@microsoft.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200414131348.444715-20-hch@lst.de
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 28 April 2020 (2 commits)
  9. 17 March 2020 (1 commit)
    • arm64: Basic Branch Target Identification support · 8ef8f360
      Committed by Dave Martin
      This patch adds the bare minimum required to expose the ARMv8.5
      Branch Target Identification feature to userspace.
      
      By itself, this does _not_ automatically enable BTI for any initial
      executable pages mapped by execve().  This will come later, but for
      now it should be possible to enable BTI manually on those pages by
      using mprotect() from within the target process.
      
      Other arches already using the generic mman.h are already using
      0x10 for arch-specific prot flags, so we use that for PROT_BTI
      here.
      
      For consistency, signal handler entry points in BTI guarded pages
      are required to be annotated as such, just like any other function.
      This blocks a relatively minor attack vector, but conforming
      userspace will have the annotations anyway, so we may as well
      enforce them.
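      
      A userspace sketch of opting a code region into BTI guarding (the
      PROT_BTI fallback below matches the 0x10 value mentioned above):
      
      #include <sys/mman.h>
      
      #ifndef PROT_BTI
      #define PROT_BTI 0x10    /* arm64 <asm/mman.h> */
      #endif
      
      /*
       * The region must already contain BTI landing pads, which compilers
       * emit when building with branch protection enabled.
       */
      int enable_bti(void *text, size_t len)
      {
          return mprotect(text, len, PROT_READ | PROT_EXEC | PROT_BTI);
      }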
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  10. 04 February 2020 (1 commit)
    • arm64: mm: add p?d_leaf() definitions · 8aa82df3
      Committed by Steven Price
      walk_page_range() is going to be allowed to walk page tables other than
      those of user space.  For this it needs to know when it has reached a
      'leaf' entry in the page tables.  This information will be provided by the
      p?d_leaf() functions/macros.
      
      For arm64, we already have p?d_sect() macros which we can reuse for
      p?d_leaf().
      
      pud_sect() is defined as a dummy function when CONFIG_PGTABLE_LEVELS < 3
      or CONFIG_ARM64_64K_PAGES is defined.  However when the kernel is
      configured this way then architecturally it isn't allowed to have a large
      page at this level, and any code using these page walking macros is
      implicitly relying on the page size/number of levels being the same as
      the kernel's. So it is safe to reuse this for p?d_leaf() as it is an
      architectural restriction.
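      
      The arm64 definitions therefore just alias the existing section tests
      (sketch):
      
      #define pmd_leaf(pmd)    pmd_sect(pmd)
      #define pud_leaf(pud)    pud_sect(pud)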
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-5-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 07 January 2020 (1 commit)
    • arm64: Revert support for execute-only user mappings · 24cecc37
      Committed by Catalin Marinas
      The ARMv8 64-bit architecture supports execute-only user permissions by
      clearing the PTE_USER and PTE_UXN bits, practically making it a mostly
      privileged mapping from which user code running at EL0 can still execute.
      
      The downside, however, is that the kernel at EL1 inadvertently reading
      such a mapping would not trip over the PAN (privileged access never)
      protection.
      
      Revert the relevant bits from commit cab15ce6 ("arm64: Introduce
      execute-only page access permissions") so that PROT_EXEC implies
      PROT_READ (and therefore PTE_USER) until the architecture gains proper
      support for execute-only user mappings.
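      
      The practical effect on arm64's protection map, sketched from the
      reverted hunks (treat the exact names as assumptions):
      
      #define __P100 PAGE_READONLY_EXEC    /* was PAGE_EXECONLY */
      #define __S100 PAGE_SHARED_EXEC      /* was PAGE_EXECONLY */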
      
      Fixes: cab15ce6 ("arm64: Introduce execute-only page access permissions")
      Cc: <stable@vger.kernel.org> # 4.9.x-
      Acked-by: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 07 November 2019 (1 commit)
  13. 25 October 2019 (1 commit)
  14. 18 October 2019 (1 commit)
  15. 12 October 2019 (1 commit)
    • arm64: Fix kcore macros after 52-bit virtual addressing fallout · 86109a69
      Committed by Chris von Recklinghausen
      We export the entire kernel address space (i.e. the whole of the TTBR1
      address range) via /proc/kcore. The kc_vaddr_to_offset() and
      kc_offset_to_vaddr() macros are intended to convert between a kernel
      virtual address and its offset relative to the start of the TTBR1
      address space.
      
      Prior to commit:
      
        14c127c9 ("arm64: mm: Flip kernel VA space")
      
      ... the offset was calculated relative to VA_START, which at the time
      was the start of the TTBR1 address space. At this time, PAGE_OFFSET
      pointed to the high half of the TTBR1 address space where arm64's
      linear map lived.
      
      That commit swapped the position of VA_START and PAGE_OFFSET, but
      failed to update kc_vaddr_to_offset() or kc_offset_to_vaddr(), so
      since then the two macros behave incorrectly.
      
      Note that VA_START was subsequently renamed to PAGE_END in commit:
      
        77ad4ce6 ("arm64: memory: rename VA_START to PAGE_END")
      
      As the generic implementations of the two macros calculate the offset
      relative to PAGE_OFFSET (which is now the start of the TTBR1 address
      space), we can delete the arm64 implementation and use those.
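      
      The generic fallbacks in fs/proc/kcore.c that arm64 now picks up
      (sketch):
      
      #ifndef kc_vaddr_to_offset
      #define kc_vaddr_to_offset(v) ((v) - PAGE_OFFSET)
      #endif
      #ifndef kc_offset_to_vaddr
      #define kc_offset_to_vaddr(o) ((o) + PAGE_OFFSET)
      #endif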
      
      Fixes: 14c127c9 ("arm64: mm: Flip kernel VA space")
      Reviewed-by: James Morse <james.morse@arm.com>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
      Signed-off-by: Will Deacon <will@kernel.org>
  16. 25 September 2019 (1 commit)
  17. 29 August 2019 (2 commits)
  18. 28 August 2019 (2 commits)
    • arm64: mm: Add ISB instruction to set_pgd() · eb6a4dcc
      Committed by Will Deacon
      Commit 6a4cbd63c25a ("Revert "arm64: Remove unnecessary ISBs from
      set_{pte,pmd,pud}"") reintroduced ISB instructions to some of our
      page table setter functions in light of a recent clarification to the
      Armv8 architecture. Although 'set_pgd()' isn't currently used to update
      a live page table, add the ISB instruction there too for consistency
      with the other macros and to provide some future-proofing if we use it
      on live tables in the future.
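      
      A sketch of the resulting setter, mirroring the other macros:
      
      static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
      {
          WRITE_ONCE(*pgdp, pgd);
          dsb(ishst);    /* publish the update to the table walker */
          isb();         /* resynchronize the local execution context */
      }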
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}" · d0b7a302
      Committed by Will Deacon
      This reverts commit 24fe1b0e.
      
      Commit 24fe1b0e ("arm64: Remove unnecessary ISBs from
      set_{pte,pmd,pud}") removed ISB instructions immediately following updates
      to the page table, on the grounds that they are not required by the
      architecture and a DSB alone is sufficient to ensure that subsequent data
      accesses use the new translation:
      
        DDI0487E_a, B2-128:
      
        | ... no instruction that appears in program order after the DSB
        | instruction can alter any state of the system or perform any part of
        | its functionality until the DSB completes other than:
        |
        | * Being fetched from memory and decoded
        | * Reading the general-purpose, SIMD and floating-point,
        |   Special-purpose, or System registers that are directly or indirectly
        |   read without causing side-effects.
      
      However, the same document also states the following:
      
        DDI0487E_a, B2-125:
      
        | DMB and DSB instructions affect reads and writes to the memory system
        | generated by Load/Store instructions and data or unified cache
        | maintenance instructions being executed by the PE. Instruction fetches
        | or accesses caused by a hardware translation table access are not
        | explicit accesses.
      
      which appears to claim that the DSB alone is insufficient.  Unfortunately,
      some CPU designers have followed the second clause above, whereas in Linux
      we've been relying on the first. This means that our mapping sequence:
      
      	MOV	X0, <valid pte>
      	STR	X0, [Xptep]	// Store new PTE to page table
      	DSB	ISHST
      	LDR	X1, [X2]	// Translates using the new PTE
      
      can actually raise a translation fault on the load instruction because the
      translation can be performed speculatively before the page table update and
      then marked as "faulting" by the CPU. For user PTEs, this is ok because we
      can handle the spurious fault, but for kernel PTEs and intermediate table
      entries this results in a panic().
      
      Revert the offending commit to reintroduce the missing barriers.
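      
      With the barriers restored, the same sequence becomes (sketch):
      
      	MOV	X0, <valid pte>
      	STR	X0, [Xptep]	// Store new PTE to page table
      	DSB	ISHST
      	ISB			// No speculation past this point
      	LDR	X1, [X2]	// Translates using the new PTE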
      
      Cc: <stable@vger.kernel.org>
      Fixes: 24fe1b0e ("arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}")
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
  19. 15 August 2019 (1 commit)
    • arm64: memory: rename VA_START to PAGE_END · 77ad4ce6
      Committed by Mark Rutland
      Prior to commit:
      
        14c127c9 ("arm64: mm: Flip kernel VA space")
      
      ... VA_START described the start of the TTBR1 address space for a given
      VA size described by VA_BITS, where all kernel mappings began.
      
      Since that commit, VA_START described a portion midway through the
      address space, where the linear map ends and other kernel mappings
      begin.
      
      To avoid confusion, let's rename VA_START to PAGE_END, making it clear
      that it's not the start of the TTBR1 address space and implying that
      it's related to PAGE_OFFSET. Comments and other mnemonics are updated
      accordingly, along with a typo fix in the description of VMEMMAP_SIZE.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Steve Capper <steve.capper@arm.com>
      Reviewed-by: Steve Capper <steve.capper@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
  20. 09 August 2019 (3 commits)
    • arm64: mm: Separate out vmemmap · c8b6d2cc
      Committed by Steve Capper
      vmemmap is a preprocessor definition that depends on a variable,
      memstart_addr. In a later patch we will need to expand the size of
      the VMEMMAP region and optionally modify vmemmap depending upon
      whether or not hardware support is available for 52-bit virtual
      addresses.
      
      This patch changes vmemmap to be a variable. As the old definition
      depended on a variable load, this should not affect performance
      noticeably.
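      
      In sketch form, the change is from a macro to a variable initialised
      once memstart_addr is known:
      
      /* before: */
      #define vmemmap \
          ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
      
      /* after: */
      struct page *vmemmap;
      /* ...assigned the same expression during memory init */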
      Signed-off-by: Steve Capper <steve.capper@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: mm: Flip kernel VA space · 14c127c9
      Committed by Steve Capper
      In order to allow for a KASAN shadow that changes size at boot time, one
      must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
      start address. Also, it is highly desirable to maintain the same
      function addresses in the kernel .text between VA sizes. Both of these
      requirements require us to flip the kernel address space halves such
      that the direct linear map occupies the lower addresses.
      
      This patch puts the direct linear map in the lower addresses of the
      kernel VA range and everything else in the higher ranges.
      
      We need to adjust:
       *) KASAN shadow region placement logic,
       *) KASAN_SHADOW_OFFSET computation logic,
       *) virt_to_phys, phys_to_virt checks,
       *) page table dumper.
      
      These are all small changes, that need to take place atomically, so they
      are bundled into this commit.
      
      As part of the re-arrangement, a guard region of 2MB (to preserve
      alignment for fixed map) is added after the vmemmap. Otherwise the
      vmemmap could intersect with IS_ERR pointers.
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Steve Capper <steve.capper@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: mm: add missing PTE_SPECIAL in pte_mkdevmap on arm64 · 30e23538
      Committed by Jia He
      Without this patch, the MAP_SYNC test case will cause a print_bad_pte
      warning on arm64 as follows:
      
      [   25.542693] BUG: Bad page map in process mapdax333 pte:2e8000448800f53 pmd:41ff5f003
      [   25.546360] page:ffff7e0010220000 refcount:1 mapcount:-1 mapping:ffff8003e29c7440 index:0x0
      [   25.550281] ext4_dax_aops
      [   25.550282] name:"__aaabbbcccddd__"
      [   25.551553] flags: 0x3ffff0000001002(referenced|reserved)
      [   25.555802] raw: 03ffff0000001002 ffff8003dfffa908 0000000000000000 ffff8003e29c7440
      [   25.559446] raw: 0000000000000000 0000000000000000 00000001fffffffe 0000000000000000
      [   25.563075] page dumped because: bad pte
      [   25.564938] addr:0000ffffbe05b000 vm_flags:208000fb anon_vma:0000000000000000 mapping:ffff8003e29c7440 index:0
      [   25.574272] file:__aaabbbcccddd__ fault:ext4_dax_fault mmap:ext4_file_mmap readpage:0x0
      [   25.578799] CPU: 1 PID: 1180 Comm: mapdax333 Not tainted 5.2.0+ #21
      [   25.581702] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      [   25.585624] Call trace:
      [   25.587008]  dump_backtrace+0x0/0x178
      [   25.588799]  show_stack+0x24/0x30
      [   25.590328]  dump_stack+0xa8/0xcc
      [   25.591901]  print_bad_pte+0x18c/0x218
      [   25.593628]  unmap_page_range+0x778/0xc00
      [   25.595506]  unmap_single_vma+0x94/0xe8
      [   25.597304]  unmap_vmas+0x90/0x108
      [   25.598901]  unmap_region+0xc0/0x128
      [   25.600566]  __do_munmap+0x284/0x3f0
      [   25.602245]  __vm_munmap+0x78/0xe0
      [   25.603820]  __arm64_sys_munmap+0x34/0x48
      [   25.605709]  el0_svc_common.constprop.0+0x78/0x168
      [   25.607956]  el0_svc_handler+0x34/0x90
      [   25.609698]  el0_svc+0x8/0xc
      [...]
      
      The root cause is in _vm_normal_page: without the PTE_SPECIAL bit,
      the return value will incorrectly be set to pfn_to_page(pfn) instead
      of NULL. In addition, this patch also rewrites pmd_mkdevmap() to
      avoid setting PTE_SPECIAL for the pmd; see the sketch below.
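      
      A sketch of the fix, per this patch:
      
      /* devmap ptes must carry PTE_SPECIAL so that vm_normal_page()
       * returns NULL for them */
      static inline pte_t pte_mkdevmap(pte_t pte)
      {
          return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
      }
      
      /* rewritten so the pmd does not inherit PTE_SPECIAL */
      static inline pmd_t pmd_mkdevmap(pmd_t pmd)
      {
          return pte_pmd(set_pte_bit(pmd_pte(pmd), __pgprot(PTE_DEVMAP)));
      }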
      
      The MAP_SYNC test case is as follows (provided by Yibo Cai):
      
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <fcntl.h>
      #include <sys/file.h>
      #include <sys/mman.h>
      
      #ifndef MAP_SYNC
      #define MAP_SYNC 0x80000
      #endif
      
      /* mount -o dax /dev/pmem0 /mnt */
      #define F "/mnt/__aaabbbcccddd__"
      
      int main(void)
      {
          int fd;
          char buf[4096];
          void *addr;
      
          if ((fd = open(F, O_CREAT|O_TRUNC|O_RDWR, 0644)) < 0) {
              perror("open1");
              return 1;
          }
      
          if (write(fd, buf, 4096) != 4096) {
              perror("write");
              return 1;
          }
      
          addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_SYNC, fd, 0);
          if (addr == MAP_FAILED) {
              perror("mmap");
              printf("did you mount with '-o dax'?\n");
              return 1;
          }
      
          memset(addr, 0x55, 4096);
      
          if (munmap(addr, 4096) == -1) {
              perror("munmap");
              return 1;
          }
      
          close(fd);
      
          return 0;
      }
      
      Fixes: 73b20c84 ("arm64: mm: implement pte_devmap support")
      Reported-by: Yibo Cai <Yibo.Cai@arm.com>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Robin Murphy <Robin.Murphy@arm.com>
      Signed-off-by: Jia He <justin.he@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  21. 01 August 2019 (1 commit)
    • arm64/mm: fix variable 'pud' set but not used · 7d4e2dcf
      Committed by Qian Cai
      GCC throws a warning,
      
      arch/arm64/mm/mmu.c: In function 'pud_free_pmd_page':
      arch/arm64/mm/mmu.c:1033:8: warning: variable 'pud' set but not used
      [-Wunused-but-set-variable]
        pud_t pud;
              ^~~
      
      because pud_table() is a macro whose argument is compiled away. Fix it
      by making pud_table() a static inline function, and do the same for
      pud_sect(); see the sketch below.
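      
      A sketch of the conversion (a function consumes its argument, so the
      warning disappears):
      
      static inline bool pud_table(pud_t pud)
      {
          return (pud_val(pud) & PUD_TYPE_MASK) == PUD_TYPE_TABLE;
      }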
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Will Deacon <will@kernel.org>
  22. 22 July 2019 (1 commit)
  23. 17 July 2019 (1 commit)
  24. 19 June 2019 (1 commit)
  25. 18 June 2019 (1 commit)
  26. 10 June 2019 (1 commit)
    • arm64: mm: avoid redundant READ_ONCE(*ptep) · 9b604722
      Committed by Mark Rutland
      In set_pte_at(), we read the old pte value so that it can be passed into
      checks for racy hw updates. These checks are only performed for
      CONFIG_DEBUG_VM, and the value is not used otherwise.
      
      Since we read the pte value with READ_ONCE(), the compiler cannot elide
      the redundant read for !CONFIG_DEBUG_VM kernels.
      
      Let's ameliorate matters by moving the read and the checks into a
      helper, __check_racy_pte_update(), which only performs the read when the
      value will be used. This also allows us to reformat the conditions for
      clarity.
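      
      A sketch of the helper, with the read performed only when the value
      is actually used:
      
      static inline void __check_racy_pte_update(struct mm_struct *mm,
                                                 pte_t *ptep, pte_t pte)
      {
          pte_t old_pte;
      
          if (!IS_ENABLED(CONFIG_DEBUG_VM))
              return;
      
          old_pte = READ_ONCE(*ptep);
          /* ...VM_WARN_ONCE() checks against racy hardware updates... */
      }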
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  27. 04 June 2019 (1 commit)
  28. 30 April 2019 (2 commits)
    • arm64: mm: Remove pte_unmap_nested() · 5fbbeedb
      Committed by Qian Cai
      As of commit ece0e2b6 ("mm: remove pte_*map_nested()"),
      pte_unmap_nested() is no longer used and can be removed from the arm64
      code.
      Signed-off-by: Qian Cai <cai@lca.pw>
      [will: also remove pte_offset_map_nested()]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable · 74dd022f
      Committed by Qian Cai
      When building with -Wunused-but-set-variable, the compiler shouts about
      a number of pte_unmap() users, since this expands to an empty macro on
      arm64:
      
        | mm/gup.c: In function 'gup_pte_range':
        | mm/gup.c:1727:16: warning: variable 'ptem' set but not used
        | [-Wunused-but-set-variable]
        | mm/gup.c: At top level:
        | mm/memory.c: In function 'copy_pte_range':
        | mm/memory.c:821:24: warning: variable 'orig_dst_pte' set but not used
        | [-Wunused-but-set-variable]
        | mm/memory.c:821:9: warning: variable 'orig_src_pte' set but not used
        | [-Wunused-but-set-variable]
        | mm/swap_state.c: In function 'swap_ra_info':
        | mm/swap_state.c:641:15: warning: variable 'orig_pte' set but not used
        | [-Wunused-but-set-variable]
        | mm/madvise.c: In function 'madvise_free_pte_range':
        | mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used
        | [-Wunused-but-set-variable]
      
      Rewrite pte_unmap() as a static inline function, which silences the
      warnings.
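      
      The fix, in sketch form:
      
      /* a function, unlike an empty macro, consumes its argument */
      static inline void pte_unmap(pte_t *pte) { }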
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  29. 18 December 2018 (3 commits)
  30. 27 November 2018 (1 commit)
    • arm64: mm: Don't wait for completion of TLB invalidation when page aging · 3403e56b
      Committed by Alex Van Brunt
      When transitioning a PTE from young to old as part of page aging, we
      can avoid waiting for the TLB invalidation to complete and therefore
      drop the subsequent DSB instruction. Whilst this opens up a race with
      page reclaim, where a PTE in active use via a stale, young TLB entry
      does not update the underlying descriptor, the worst thing that happens
      is that the page is reclaimed and then immediately faulted back in.
      
      Given that we have a DSB in our context-switch path, the window for a
      spurious reclaim is fairly limited and eliding the barrier is claimed to
      boost NVMe/SSD accesses by over 10% on some platforms.
      
      A similar optimisation was made for x86 in commit b13b1d2d ("x86/mm:
      In the PTE swapout page reclaim case clear the accessed bit instead of
      flushing the TLB").
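      
      A sketch of the arm64 override this adds:
      
      #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
      static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
                                               unsigned long address,
                                               pte_t *ptep)
      {
          int young = ptep_test_and_clear_young(vma, address, ptep);
      
          if (young) {
              /*
               * No trailing DSB: the worst case is a spurious reclaim,
               * and the next context switch supplies the DSB anyway.
               */
              flush_tlb_page_nosync(vma, address);
          }
      
          return young;
      }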
      Signed-off-by: Alex Van Brunt <avanbrunt@nvidia.com>
      Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
      [will: rewrote patch]
      Signed-off-by: Will Deacon <will.deacon@arm.com>