1. 18 March 2020 (4 commits)
  2. 13 November 2019 (1 commit)
  3. 05 October 2019 (1 commit)
    • Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}" · fc7d6bfd
      Committed by Will Deacon
      commit d0b7a302d58abe24ed0f32a0672dd4c356bb73db upstream.
      
      This reverts commit 24fe1b0e.
      
      Commit 24fe1b0e ("arm64: Remove unnecessary ISBs from
      set_{pte,pmd,pud}") removed ISB instructions immediately following updates
      to the page table, on the grounds that they are not required by the
      architecture and a DSB alone is sufficient to ensure that subsequent data
      accesses use the new translation:
      
        DDI0487E_a, B2-128:
      
        | ... no instruction that appears in program order after the DSB
        | instruction can alter any state of the system or perform any part of
        | its functionality until the DSB completes other than:
        |
        | * Being fetched from memory and decoded
        | * Reading the general-purpose, SIMD and floating-point,
        |   Special-purpose, or System registers that are directly or indirectly
        |   read without causing side-effects.
      
      However, the same document also states the following:
      
        DDI0487E_a, B2-125:
      
        | DMB and DSB instructions affect reads and writes to the memory system
        | generated by Load/Store instructions and data or unified cache
        | maintenance instructions being executed by the PE. Instruction fetches
        | or accesses caused by a hardware translation table access are not
        | explicit accesses.
      
      which appears to claim that the DSB alone is insufficient.  Unfortunately,
      some CPU designers have followed the second clause above, whereas in Linux
      we've been relying on the first. This means that our mapping sequence:
      
      	MOV	X0, <valid pte>
      	STR	X0, [Xptep]	// Store new PTE to page table
      	DSB	ISHST
      	LDR	X1, [X2]	// Translates using the new PTE
      
      can actually raise a translation fault on the load instruction because the
      translation can be performed speculatively before the page table update and
      then marked as "faulting" by the CPU. For user PTEs, this is ok because we
      can handle the spurious fault, but for kernel PTEs and intermediate table
      entries this results in a panic().
      
      Revert the offending commit to reintroduce the missing barriers.
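      As an illustration, here is a minimal sketch of the barrier sequence this
      revert reinstates (simplified from the arm64 set_pte() helper; the exact
      guard in the tree is the pte_valid_not_user() check):

        static inline void set_pte(pte_t *ptep, pte_t pte)
        {
        	WRITE_ONCE(*ptep, pte);

        	/*
        	 * Publish the new entry to the table walker (DSB) and prevent
        	 * later instructions from using a translation fetched before
        	 * the update (ISB). Only needed for valid kernel mappings;
        	 * user mappings are handled via the spurious-fault path.
        	 */
        	if (pte_valid_not_user(pte)) {
        		dsb(ishst);
        		isb();
        	}
        }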
      
      Cc: <stable@vger.kernel.org>
      Fixes: 24fe1b0e ("arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}")
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 25 August 2019 (1 commit)
  5. 31 May 2019 (1 commit)
    • arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable · efa336f7
      Committed by Qian Cai
      [ Upstream commit 74dd022f9e6260c3b5b8d15901d27ebcc5f21eda ]
      
      When building with -Wunused-but-set-variable, the compiler shouts about
      a number of pte_unmap() users, since this expands to an empty macro on
      arm64:
      
        | mm/gup.c: In function 'gup_pte_range':
        | mm/gup.c:1727:16: warning: variable 'ptem' set but not used
        | [-Wunused-but-set-variable]
        | mm/gup.c: At top level:
        | mm/memory.c: In function 'copy_pte_range':
        | mm/memory.c:821:24: warning: variable 'orig_dst_pte' set but not used
        | [-Wunused-but-set-variable]
        | mm/memory.c:821:9: warning: variable 'orig_src_pte' set but not used
        | [-Wunused-but-set-variable]
        | mm/swap_state.c: In function 'swap_ra_info':
        | mm/swap_state.c:641:15: warning: variable 'orig_pte' set but not used
        | [-Wunused-but-set-variable]
        | mm/madvise.c: In function 'madvise_free_pte_range':
        | mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used
        | [-Wunused-but-set-variable]
      
      Rewrite pte_unmap() as a static inline function, which silences the
      warnings.
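      As a rough sketch, the change has the following shape (simplified; the
      actual patch lives in the arm64 pgtable.h):

        /* Before: expands to nothing, so the argument looks unused. */
        #define pte_unmap(pte)	do { } while (0)

        /* After: an empty static inline; passing "pte" as a function
         * argument counts as a use, silencing -Wunused-but-set-variable. */
        static inline void pte_unmap(pte_t *pte) { }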
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  6. 28 June 2018 (1 commit)
  7. 08 June 2018 (1 commit)
    • mm: introduce ARCH_HAS_PTE_SPECIAL · 3010a5ea
      Committed by Laurent Dufour
      Currently, PTE special support is turned on in per-architecture
      header files.  Most of the time it is defined in
      arch/*/include/asm/pgtable.h, depending (or not) on some other
      per-architecture static definition.
      
      This patch introduces a new configuration variable to manage this
      directly in the Kconfig files.  It would later replace
      __HAVE_ARCH_PTE_SPECIAL.
      
      Here are notes for the architectures where the definition of
      __HAVE_ARCH_PTE_SPECIAL is not obvious:
      
      arm
      __HAVE_ARCH_PTE_SPECIAL is currently defined in
      arch/arm/include/asm/pgtable-3level.h, which is included by
      arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
      So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.
      
      powerpc
      __HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
       - arch/powerpc/include/asm/book3s/64/pgtable.h
       - arch/powerpc/include/asm/pte-common.h
      The first one is included if (PPC_BOOK3S & PPC64) while the second is
      included in all the other cases.
      So select ARCH_HAS_PTE_SPECIAL all the time.
      
      sparc:
      __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
      defined(__arch64__), which are defined through the compiler in
      sparc/Makefile if !SPARC32, which I take to mean SPARC64.
      So select ARCH_HAS_PTE_SPECIAL if SPARC64.
      
      There is no functional change introduced by this patch.
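      Schematically (illustrative only; the real patch edits each
      architecture's Kconfig and the generic mm headers):

        /* Before: capability advertised by a per-architecture header. */
        #ifdef __HAVE_ARCH_PTE_SPECIAL
        /* generic code may rely on pte_special()/pte_mkspecial() */
        #endif

        /* After: the architecture does "select ARCH_HAS_PTE_SPECIAL" in
         * Kconfig and generic code tests the generated config symbol. */
        #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
        /* generic code may rely on pte_special()/pte_mkspecial() */
        #endif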
      
      Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Suggested-by: Jerome Glisse <jglisse@redhat.com>
      Reviewed-by: Jerome Glisse <jglisse@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <albert@sifive.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Christophe LEROY <christophe.leroy@c-s.fr>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 24 April 2018 (1 commit)
  9. 17 February 2018 (1 commit)
    • arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables · 20a004e7
      Committed by Will Deacon
      In many cases, page tables can be accessed concurrently by either another
      CPU (due to things like fast gup) or by the hardware page table walker
      itself, which may set access/dirty bits. In such cases, it is important
      to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
      entries cannot be torn, merged or subject to apparent loss of coherence
      due to compiler transformations.
      
      Whilst there are some scenarios where this cannot happen (e.g. pinned
      kernel mappings for the linear region), the overhead of using READ_ONCE
      /WRITE_ONCE everywhere is minimal and makes the code an awful lot easier
      to reason about. This patch consistently uses these macros in the arch
      code, as well as explicitly namespacing pointers to page table entries
      from the entries themselves by adopting a 'p' suffix for the former
      (as is sometimes used elsewhere in the kernel source).
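      An illustrative sketch of the resulting access pattern (not the literal
      diff; helper names as in the arm64 headers):

        /* "pmdp" points at the entry; "pmd" is a local snapshot read exactly
         * once, so the compiler cannot tear or re-load it while the hardware
         * walker or another CPU updates the live entry. */
        static int pmd_entry_present(pmd_t *pmdp)
        {
        	pmd_t pmd = READ_ONCE(*pmdp);

        	if (pmd_none(pmd))
        		return 0;

        	return pmd_present(pmd);
        }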
      Tested-by: Yury Norov <ynorov@caviumnetworks.com>
      Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  10. 01 February 2018 (1 commit)
  11. 15 January 2018 (1 commit)
  12. 23 December 2017 (3 commits)
  13. 12 December 2017 (2 commits)
    • arm64: mm: Fix false positives in set_pte_at access/dirty race detection · 86c9e812
      Committed by Will Deacon
      Jiankang reports that our race detection in set_pte_at is firing when
      copying the page tables in dup_mmap as a result of a fork(). In this
      situation, the page table isn't actually live and so there is no way
      that we can race with a concurrent update from the hardware page table
      walker.
      
      This patch reworks the race detection so that we require either the
      mm to match the current active_mm (i.e. currently installed in our TTBR0)
      or the mm_users count to be greater than 1, implying that the page table
      could be live on another CPU. The mm_users check might still be racy,
      but we'll avoid false positives and it's not realistic to validate that
      all the necessary locks are held as part of this assertion.
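      A hedged sketch of the reworked condition (the helper name here is
      hypothetical; the CONFIG_DEBUG_VM plumbing around it is omitted):

        /* Only flag a racy access/dirty update if the page table can
         * actually be live: either it is installed in our TTBR0
         * (current->active_mm) or another user of the mm exists. */
        static bool pgtable_may_be_live(struct mm_struct *mm)
        {
        	return mm == current->active_mm ||
        	       atomic_read(&mm->mm_users) > 1;
        }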
      
      Cc: Yisheng Xie <xieyisheng1@huawei.com>
      Reported-by: Jiankang Chen <chenjiankang1@huawei.com>
      Tested-by: Jiankang Chen <chenjiankang1@huawei.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: mm: Fix pte_mkclean, pte_mkdirty semantics · 8781bcbc
      Committed by Steve Capper
      On systems with hardware dirty bit management, the ltp madvise09 unit
      test fails due to dirty bit information being lost and pages being
      incorrectly freed.
      
      This was bisected to:
      	arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()
      
      Reverting this commit leads to a separate problem, that the unit test
      retains pages that should have been dropped due to the function
      madvise_free_pte_range(.) not cleaning ptes properly.
      
      Currently pte_mkclean only clears the software dirty bit, thus the
      following code sequence can appear:
      
      	pte = pte_mkclean(pte);
      	if (pte_dirty(pte))
      		// this condition can return true with HW DBM!
      
      This patch also adjusts pte_mkclean to set PTE_RDONLY thus effectively
      clearing both the SW and HW dirty information.
      
      In order for this to function on systems without HW DBM, we need to
      also adjust pte_mkdirty to remove the read only bit from writable pte's
      to avoid infinite fault loops.
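      A hedged sketch of the adjusted helpers, assuming the arm64
      set_pte_bit()/clear_pte_bit() helpers (the in-tree versions may differ
      in detail):

        static inline pte_t pte_mkclean(pte_t pte)
        {
        	/* Drop the SW dirty bit and make the pte hardware-clean by
        	 * setting PTE_RDONLY, so pte_dirty() cannot return true. */
        	pte = clear_pte_bit(pte, __pgprot(PTE_DIRTY));
        	return set_pte_bit(pte, __pgprot(PTE_RDONLY));
        }

        static inline pte_t pte_mkdirty(pte_t pte)
        {
        	pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));

        	/* Without HW DBM, a writable pte must not stay read-only or
        	 * every write would fault again. */
        	if (pte_write(pte))
        		pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));

        	return pte;
        }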
      
      Cc: <stable@vger.kernel.org>
      Fixes: 64c26841 ("arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()")
      Reported-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
      Tested-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Steve Capper <steve.capper@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  14. 11 December 2017 (1 commit)
  15. 30 November 2017 (1 commit)
  16. 30 October 2017 (1 commit)
    • arm64: Implement arch-specific pte_access_permitted() · 6218f96c
      Committed by Catalin Marinas
      The generic pte_access_permitted() implementation only checks for
      pte_present() (together with the write permission where applicable).
      However, for both kernel ptes and PROT_NONE mappings pte_present() also
      returns true on arm64 even though such mappings are not user accessible.
      Additionally, arm64 now supports execute-only user permission
      (PROT_EXEC) which is implemented by clearing the PTE_USER bit.
      
      With this patch the arm64 implementation of pte_access_permitted()
      checks for the PTE_VALID and PTE_USER bits together with writable access
      if applicable.
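      A hedged, condensed sketch of the check (the in-tree macro may spell
      the bit tests differently):

        /* User-accessible means valid with PTE_USER set; kernel ptes and
         * PROT_NONE/execute-only mappings fail this test. */
        #define pte_access_permitted(pte, write)			\
        	(((pte_val(pte) & (PTE_VALID | PTE_USER)) ==		\
        	  (PTE_VALID | PTE_USER)) &&				\
        	 (!(write) || pte_write(pte)))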
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  17. 29 September 2017 (1 commit)
    • arm64: mm: Use READ_ONCE when dereferencing pointer to pte table · f069faba
      Committed by Will Deacon
      On kernels built with support for transparent huge pages, different CPUs
      can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
      and they must take care to use READ_ONCE to avoid value tearing or caching
      of stale values by the compiler. Unfortunately, these functions call into
      our pgtable macros, which don't use READ_ONCE, and compiler caching has
      been observed to cause the following crash during ext4 writeback:
      
      PC is at check_pte+0x20/0x170
      LR is at page_vma_mapped_walk+0x2e0/0x540
      [...]
      Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
      Call trace:
      [<ffff000008233328>] check_pte+0x20/0x170
      [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540
      [<ffff000008234adc>] page_mkclean_one+0xac/0x278
      [<ffff000008234d98>] rmap_walk_file+0xf0/0x238
      [<ffff000008236e74>] rmap_walk+0x64/0xa0
      [<ffff0000082370c8>] page_mkclean+0x90/0xa8
      [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8
      [<ffff00000832f984>] mpage_submit_page+0x34/0x98
      [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170
      [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8
      [<ffff00000833530c>] ext4_writepages+0x484/0xe30
      [<ffff0000081f6ab4>] do_writepages+0x44/0xe8
      [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110
      [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8
      [<ffff000008324310>] ext4_sync_file+0x80/0x4b8
      [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0
      [<ffff0000082332b4>] SyS_msync+0x194/0x1e8
      
      This is because page_vma_mapped_walk loads the PMD twice before calling
      pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
      due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
      (where it sees a valid table pointer due to a concurrent pmd_populate).
      However, the compiler inlines everything and caches the first value in
      a register, which is subsequently used in pte_offset_phys which returns
      a junk pointer that is later dereferenced when attempting to access the
      relevant pte.
      
      This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
      that a stale value is not used. Whilst this is a point fix for a known
      failure (and simple to backport), a full fix moving all of our page table
      accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
      page_vma_mapped_walk is in the works for a future kernel release.
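      The point fix, as a hedged sketch (assuming the existing
      pmd_page_paddr()/pte_index() helpers):

        /* Load the pmd entry exactly once when computing the pte table
         * address, so a previously cached (stale) value cannot be used. */
        #define pte_offset_phys(dir, addr)	\
        	(pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))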
      
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Timur Tabi <timur@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Fixes: f27176cf ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
      Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  18. 21 August 2017 (4 commits)
  19. 12 June 2017 (1 commit)
  20. 23 March 2017 (1 commit)
    • arm64: mm: set the contiguous bit for kernel mappings where appropriate · d27cfa1f
      Committed by Ard Biesheuvel
      This is the third attempt at enabling the use of contiguous hints for
      kernel mappings. The most recent attempt 0bfc445d was reverted after
      it turned out that updating permission attributes on live contiguous ranges
      may result in TLB conflicts. So this time, the contiguous hint is not set
      for .rodata or for the linear alias of .text/.rodata, both of which are
      mapped read-write initially, and remapped read-only at a later stage.
      (Note that the latter region could also be unmapped and remapped again
      with updated permission attributes, given that the region, while live, is
      only mapped for the convenience of the hibernation code, but that also
      means the TLB footprint is negligible anyway, so why bother)
      
      This enables the following contiguous range sizes for the virtual mapping
      of the kernel image, and for the linear mapping:
      
                granule size |  cont PTE  |  cont PMD  |
                -------------+------------+------------+
                     4 KB    |    64 KB   |   32 MB    |
                    16 KB    |     2 MB   |    1 GB*   |
                    64 KB    |     2 MB   |   16 GB*   |
      
      * Only when built for 3 or more levels of translation. This is due to the
        fact that a 2 level configuration only consists of PGDs and PTEs, and the
        added complexity of dealing with folded PMDs is not justified considering
        that 16 GB contiguous ranges are likely to be ignored by the hardware (and
        16k/2 levels is a niche configuration)
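      For orientation, a hedged sketch of how the hint is expressed on an
      individual entry (pte_mkcont() per the arm64 headers; the mapping code
      only applies it to whole, naturally aligned runs with identical
      attributes and contiguous output addresses):

        /* Mark a pte as a member of a contiguous run; the TLB may then cache
         * the whole run (e.g. 16 x 4 KB = 64 KB) with a single entry. */
        pte = pte_mkcont(pte);		/* sets PTE_CONT */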
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  21. 01 February 2017 (1 commit)
    • arm64: Improve detection of user/non-user mappings in set_pte(_at) · ec663d96
      Committed by Catalin Marinas
      Commit cab15ce6 ("arm64: Introduce execute-only page access
      permissions") allowed a valid user PTE to have the PTE_USER bit clear.
      As a consequence, the pte_valid_not_user() macro in set_pte() was
      replaced with pte_valid_global() under the assumption that only user
      pages have the nG bit set. EFI mappings, however, also have the nG bit
      set, so set_pte() wrongly skips issuing the DSB+ISB for them.
      
      This patch reinstates the pte_valid_not_user() macro and adds the
      PTE_UXN bit check since all kernel mappings have this bit set. For
      clarity, pte_exec() is renamed to pte_user_exec() as it only checks for
      the absence of PTE_UXN. Consequently, the user executable check in
      set_pte_at() drops the pte_ng() test since pte_user_exec() is
      sufficient.
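      A hedged sketch of the reinstated macros (bit names per the arm64
      headers; kernel mappings always carry PTE_UXN):

        /* Valid non-user (kernel) mapping: PTE_USER clear and PTE_UXN set.
         * Execute-only user mappings clear PTE_USER but also clear PTE_UXN,
         * so they are correctly excluded. */
        #define pte_valid_not_user(pte) \
        	((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == \
        	 (PTE_VALID | PTE_UXN))

        /* Renamed from pte_exec(): only checks the absence of PTE_UXN. */
        #define pte_user_exec(pte)	(!(pte_val(pte) & PTE_UXN))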
      
      Fixes: cab15ce6 ("arm64: Introduce execute-only page access permissions")
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  22. 12 January 2017 (1 commit)
  23. 26 August 2016 (2 commits)
    • arm64: hibernate: Support DEBUG_PAGEALLOC · 5ebe3a44
      Committed by James Morse
      DEBUG_PAGEALLOC removes the valid bit of page table entries to prevent
      any access to unallocated memory. Hibernate uses this as a hint that those
      pages don't need to be saved/restored. This patch adds the
      kernel_page_present() function it uses.
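      A hedged sketch of what kernel_page_present() has to report; the real
      version walks the page-table levels and handles section mappings, and
      find_linear_map_pte() below is a hypothetical stand-in for that walk:

        /* True if the linear-map entry for the page is still valid, i.e.
         * DEBUG_PAGEALLOC has not removed it. */
        bool kernel_page_present(struct page *page)
        {
        	unsigned long addr = (unsigned long)page_address(page);
        	pte_t *ptep = find_linear_map_pte(addr);	/* hypothetical */

        	return ptep && pte_valid(READ_ONCE(*ptep));
        }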
      
      hibernate.c copies the resume kernel's linear map for use during restore.
      Add _copy_pte() to fill in the holes made by DEBUG_PAGEALLOC in the resume
      kernel, so we can restore data the original kernel had at these addresses.
      
      Finally, DEBUG_PAGEALLOC means the linear-map alias of KERNEL_START to
      KERNEL_END may have holes in it, so we can't lazily clean this whole
      area to the PoC. Only clean the new mmuoff region, and the kernel/kvm
      idmaps.
      
      This reverts commit da24eb1f.
      Reported-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: Introduce execute-only page access permissions · cab15ce6
      Committed by Catalin Marinas
      The ARMv8 architecture allows execute-only user permissions by clearing
      the PTE_UXN and PTE_USER bits. However, the kernel running on a CPU
      implementation without User Access Override (ARMv8.2 onwards) can still
      access such a page, so execute-only page permission does not protect
      against read(2)/write(2) etc. accesses. Systems requiring such
      protection must enable features like SECCOMP.
      
      This patch changes the arm64 __P100 and __S100 protection_map[] macros
      to the new __PAGE_EXECONLY attributes. A side effect is that
      pte_user() no longer triggers for __PAGE_EXECONLY since PTE_USER isn't
      set. To work around this, the check is done on the PTE_NG bit via the
      pte_ng() macro. VM_READ is also checked now for page faults.
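      A hedged sketch of the resulting checks (macro spellings approximate;
      the key point is that "user" is now derived from the nG bit rather than
      PTE_USER):

        /* An execute-only user page is valid, has PTE_USER clear (no EL0
         * read/write), PTE_UXN clear (EL0 executable) and nG set. */
        #define pte_ng(pte)	(!!(pte_val(pte) & PTE_NG))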
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  24. 04 August 2016 (1 commit)
    • arm64: Fix copy-on-write referencing in HugeTLB · 747a70e6
      Committed by Steve Capper
      set_pte_at(.) will set or unset the PTE_RDONLY hardware bit before
      writing the entry to the table.
      
      This can cause problems with the copy-on-write logic in hugetlb_cow:
       *) hugetlb_cow(.) called to handle a write fault on read only pte,
       *) Before the copy-on-write updates the new page table a call is
          made to pte_same(huge_ptep_get(ptep), pte)), to check for a race,
       *) Because set_pte_at(.) changed the pte, *ptep != pte, and the
          hugetlb_cow(.) code erroneously assumes that it lost the race,
       *) The new page is subsequently freed without being used.
      
      On arm64 this problem only becomes apparent when we apply:
      67961f9d mm/hugetlb: fix huge page reserve accounting for private
      mappings
      
      When one runs the libhugetlbfs test suite, there are allocation errors
      and hugetlbfs pages become erroneously locked in memory as reserved.
      (There is a high HugePages_Rsvd: count).
      
      In this patch we introduce an arm64-specific pte_same() that ignores the
      PTE_RDONLY bit, allowing the libhugetlbfs test suite to pass as expected
      without leaking any reserved HugeTLB pages.
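      A hedged, simplified sketch of the comparison (the in-tree version
      additionally limits the masking to present ptes):

        /* Ignore PTE_RDONLY: set_pte_at() may legitimately flip it under
         * hardware dirty-bit management, and that must not make
         * hugetlb_cow() believe it lost the race. */
        static inline int pte_same(pte_t pte_a, pte_t pte_b)
        {
        	return (pte_val(pte_a) & ~PTE_RDONLY) ==
        	       (pte_val(pte_b) & ~PTE_RDONLY);
        }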
      Reported-by: Huang Shijie <shijie.huang@arm.com>
      Signed-off-by: Steve Capper <steve.capper@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  25. 20 May 2016 (1 commit)
    • arch: fix has_transparent_hugepage() · fd8cfd30
      Committed by Hugh Dickins
      I've just discovered that the useful-sounding has_transparent_hugepage()
      is actually an architecture-dependent minefield: on some arches it only
      builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
      not, but on some of those (arm and arm64) it then gives the wrong
      answer; and on mips alone it's marked __init, which would crash if
      called later (but so far it has not been called later).
      
      Straighten this out: make it available to all configs, with a sensible
      default in asm-generic/pgtable.h, removing its definitions from those
      arches (arc, arm, arm64, sparc, tile) which are served by the default,
      adding #define has_transparent_hugepage has_transparent_hugepage to
      those (mips, powerpc, s390, x86) which need to override the default at
      runtime, and removing the __init from mips (but maybe that kind of code
      should be avoided after init: set a static variable the first time it's
      called).
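      A hedged sketch of the asm-generic default described above;
      architectures that must decide at runtime override the macro:

        #ifndef has_transparent_hugepage
        #ifdef CONFIG_TRANSPARENT_HUGEPAGE
        #define has_transparent_hugepage() 1	/* built in: assume capable */
        #else
        #define has_transparent_hugepage() 0	/* THP compiled out */
        #endif
        #endif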
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 10 May 2016 (1 commit)
    • kvm: arm64: Enable hardware updates of the Access Flag for Stage 2 page tables · 06485053
      Committed by Catalin Marinas
      The ARMv8.1 architecture extensions introduce support for hardware
      updates of the access and dirty information in page table entries. With
      VTCR_EL2.HA enabled (bit 21), when the CPU accesses an IPA with the
      PTE_AF bit cleared in the stage 2 page table, instead of raising an
      Access Flag fault to EL2 the CPU sets the actual page table entry bit
      (10). To ensure that kernel modifications to the page table do not
      inadvertently revert a bit set by hardware updates, certain Stage 2
      software pte/pmd operations must be performed atomically.
      
      The main user of the AF bit is the kvm_age_hva() mechanism. The
      kvm_age_hva_handler() function performs a "test and clear young" action
      on the pte/pmd. This needs to be atomic with respect to automatic hardware
      updates of the AF bit. Since the AF bit is in the same position for both
      Stage 1 and Stage 2, the patch reuses the existing
      ptep_test_and_clear_young() functionality if
      __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG is defined. Otherwise, the
      existing pte_young/pte_mkold mechanism is preserved.
      
      The kvm_set_s2pte_readonly() (and the corresponding pmd equivalent) have
      to perform atomic modifications in order to avoid a race with updates of
      the AF bit. The arm64 implementation has been re-written using
      exclusives.
      
      Currently, kvm_set_s2pte_writable() (and pmd equivalent) take a pointer
      argument and modify the pte/pmd in place. However, these functions are
      only used on local variables rather than actual page table entries, so
      it makes more sense to follow the pte_mkwrite() approach for stage 1
      attributes. The change to kvm_s2pte_mkwrite() makes it clear that these
      functions do not modify the actual page table entries.
      
      The (pte|pmd)_mkyoung() uses on Stage 2 entries (setting the AF bit
      explicitly) do not need to be modified since hardware updates of the
      dirty status are not supported by KVM, so there is no possibility of
      losing such information.
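      A hedged sketch of the atomicity requirement for the read-only update
      (the in-tree arm64 code uses a load/store-exclusive loop in inline
      assembly; a compare-and-swap loop with a hypothetical helper name
      expresses the same idea):

        /* Clear Stage 2 write permission without losing a concurrent
         * hardware update of the Access Flag in the same entry. */
        static inline void s2pte_mkreadonly_atomic(pte_t *ptep)
        {
        	pteval_t old, new;

        	do {
        		old = READ_ONCE(pte_val(*ptep));
        		new = (old & ~PTE_S2_RDWR) | PTE_S2_RDONLY;
        	} while (cmpxchg_relaxed(&pte_val(*ptep), old, new) != old);
        }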
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
  27. 06 May 2016 (4 commits)