1. 04 8月, 2016 1 次提交
    • S
      arm64: Fix copy-on-write referencing in HugeTLB · 747a70e6
      Steve Capper 提交于
      set_pte_at(.) will set or unset the PTE_RDONLY hardware bit before
      writing the entry to the table.
      
      This can cause problems with the copy-on-write logic in hugetlb_cow:
       *) hugetlb_cow(.) called to handle a write fault on read only pte,
       *) Before the copy-on-write updates the new page table a call is
          made to pte_same(huge_ptep_get(ptep), pte)), to check for a race,
       *) Because set_pte_at(.) changed the pte, *ptep != pte, and the
          hugetlb_cow(.) code erroneously assumes that it lost the race,
       *) The new page is subsequently freed without being used.
      
      On arm64 this problem only becomes apparent when we apply:
      67961f9d mm/hugetlb: fix huge page reserve accounting for private
      mappings
      
      When one runs the libhugetlbfs test suite, there are allocation errors
      and hugetlbfs pages become erroneously locked in memory as reserved.
      (There is a high HugePages_Rsvd: count).
      
      In this patch we introduce pte_same which ignores the PTE_RDONLY bit,
      allowing for the libhugetlbfs test suite to pass as expected and
      without leaking any reserved HugeTLB pages.
      Reported-by: NHuang Shijie <shijie.huang@arm.com>
      Signed-off-by: NSteve Capper <steve.capper@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      747a70e6
  2. 20 5月, 2016 1 次提交
    • H
      arch: fix has_transparent_hugepage() · fd8cfd30
      Hugh Dickins 提交于
      I've just discovered that the useful-sounding has_transparent_hugepage()
      is actually an architecture-dependent minefield: on some arches it only
      builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
      not, but on some of those (arm and arm64) it then gives the wrong
      answer; and on mips alone it's marked __init, which would crash if
      called later (but so far it has not been called later).
      
      Straighten this out: make it available to all configs, with a sensible
      default in asm-generic/pgtable.h, removing its definitions from those
      arches (arc, arm, arm64, sparc, tile) which are served by the default,
      adding #define has_transparent_hugepage has_transparent_hugepage to
      those (mips, powerpc, s390, x86) which need to override the default at
      runtime, and removing the __init from mips (but maybe that kind of code
      should be avoided after init: set a static variable the first time it's
      called).
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fd8cfd30
  3. 10 5月, 2016 1 次提交
    • C
      kvm: arm64: Enable hardware updates of the Access Flag for Stage 2 page tables · 06485053
      Catalin Marinas 提交于
      The ARMv8.1 architecture extensions introduce support for hardware
      updates of the access and dirty information in page table entries. With
      VTCR_EL2.HA enabled (bit 21), when the CPU accesses an IPA with the
      PTE_AF bit cleared in the stage 2 page table, instead of raising an
      Access Flag fault to EL2 the CPU sets the actual page table entry bit
      (10). To ensure that kernel modifications to the page table do not
      inadvertently revert a bit set by hardware updates, certain Stage 2
      software pte/pmd operations must be performed atomically.
      
      The main user of the AF bit is the kvm_age_hva() mechanism. The
      kvm_age_hva_handler() function performs a "test and clear young" action
      on the pte/pmd. This needs to be atomic in respect of automatic hardware
      updates of the AF bit. Since the AF bit is in the same position for both
      Stage 1 and Stage 2, the patch reuses the existing
      ptep_test_and_clear_young() functionality if
      __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG is defined. Otherwise, the
      existing pte_young/pte_mkold mechanism is preserved.
      
      The kvm_set_s2pte_readonly() (and the corresponding pmd equivalent) have
      to perform atomic modifications in order to avoid a race with updates of
      the AF bit. The arm64 implementation has been re-written using
      exclusives.
      
      Currently, kvm_set_s2pte_writable() (and pmd equivalent) take a pointer
      argument and modify the pte/pmd in place. However, these functions are
      only used on local variables rather than actual page table entries, so
      it makes more sense to follow the pte_mkwrite() approach for stage 1
      attributes. The change to kvm_s2pte_mkwrite() makes it clear that these
      functions do not modify the actual page table entries.
      
      The (pte|pmd)_mkyoung() uses on Stage 2 entries (setting the AF bit
      explicitly) do not need to be modified since hardware updates of the
      dirty status are not supported by KVM, so there is no possibility of
      losing such information.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      06485053
  4. 06 5月, 2016 4 次提交
  5. 21 4月, 2016 1 次提交
  6. 16 4月, 2016 2 次提交
    • C
      arm64: Implement ptep_set_access_flags() for hardware AF/DBM · 66dbd6e6
      Catalin Marinas 提交于
      When hardware updates of the access and dirty states are enabled, the
      default ptep_set_access_flags() implementation based on calling
      set_pte_at() directly is potentially racy. This triggers the "racy dirty
      state clearing" warning in set_pte_at() because an existing writable PTE
      is overridden with a clean entry.
      
      There are two main scenarios for this situation:
      
      1. The CPU getting an access fault does not support hardware updates of
         the access/dirty flags. However, a different agent in the system
         (e.g. SMMU) can do this, therefore overriding a writable entry with a
         clean one could potentially lose the automatically updated dirty
         status
      
      2. A more complex situation is possible when all CPUs support hardware
         AF/DBM:
      
         a) Initial state: shareable + writable vma and pte_none(pte)
         b) Read fault taken by two threads of the same process on different
            CPUs
         c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
            eventually reaches do_set_pte() which sets a writable + clean pte.
            CPU0 releases the mmap_sem
         d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
            pte entry it reads is present, writable and clean and it continues
            to pte_mkyoung()
         e) CPU1 calls ptep_set_access_flags()
      
         If between (d) and (e) the hardware (another CPU) updates the dirty
         state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
         marking the entry clean again.
      
      This patch implements an arm64-specific ptep_set_access_flags() function
      to perform an atomic update of the PTE flags.
      
      Fixes: 2f4b829c ("arm64: Add support for hardware updates of the access and dirty pte bits")
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: NMing Lei <tom.leiming@gmail.com>
      Tested-by: NJulien Grall <julien.grall@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org> # 4.3+
      [will: reworded comment]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      66dbd6e6
    • G
      arm64, mm, numa: Add NUMA balancing support for arm64. · 56166230
      Ganapatrao Kulkarni 提交于
      Enable NUMA balancing for arm64 platforms.
      Add pte, pmd protnone helpers for use by automatic NUMA balancing.
      Reviewed-by: NSteve Capper <steve.capper@arm.com>
      Reviewed-by: NRobert Richter <rrichter@cavium.com>
      Signed-off-by: NGanapatrao Kulkarni <gkulkarni@caviumnetworks.com>
      Signed-off-by: NDavid Daney <david.daney@cavium.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      56166230
  7. 14 4月, 2016 3 次提交
  8. 11 3月, 2016 1 次提交
    • C
      arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission · fdc69e7d
      Catalin Marinas 提交于
      The set_pte_at() function must update the hardware PTE_RDONLY bit
      depending on the state of the PTE_WRITE and PTE_DIRTY bits of the given
      entry value. However, it currently only performs this for pte_valid()
      entries, ignoring PTE_PROT_NONE. The side-effect is that PROT_NONE
      mappings would not have the PTE_RDONLY bit set. Without
      CONFIG_ARM64_HW_AFDBM, this is not an issue since such PROT_NONE pages
      are not accessible anyway.
      
      With commit 2f4b829c ("arm64: Add support for hardware updates of
      the access and dirty pte bits"), the ptep_set_wrprotect() function was
      re-written to cope with automatic hardware updates of the dirty state.
      As an optimisation, only PTE_RDONLY is checked to assess the "dirty"
      status. Since set_pte_at() does not set this bit for PROT_NONE mappings,
      such pages may be considered "dirty" as a result of
      ptep_set_wrprotect().
      
      This patch updates the pte_valid() check to pte_present() in
      set_pte_at(). It also adds PTE_PROT_NONE to the swap entry bits comment.
      
      Fixes: 2f4b829c ("arm64: Add support for hardware updates of the access and dirty pte bits")
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: NGanapatrao Kulkarni <gkulkarni@caviumnetworks.com>
      Tested-by: NGanapatrao Kulkarni <gkulkarni@cavium.com>
      Cc: <stable@vger.kernel.org>
      fdc69e7d
  9. 09 3月, 2016 1 次提交
  10. 27 2月, 2016 1 次提交
    • A
      arm64: vmemmap: use virtual projection of linear region · dfd55ad8
      Ard Biesheuvel 提交于
      Commit dd006da2 ("arm64: mm: increase VA range of identity map") made
      some changes to the memory mapping code to allow physical memory to reside
      at an offset that exceeds the size of the virtual mapping.
      
      However, since the size of the vmemmap area is proportional to the size of
      the VA area, but it is populated relative to the physical space, we may
      end up with the struct page array being mapped outside of the vmemmap
      region. For instance, on my Seattle A0 box, I can see the following output
      in the dmesg log.
      
         vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000   (     8 GB maximum)
                   0xffffffbfc0000000 - 0xffffffbfd0000000   (   256 MB actual)
      
      We can fix this by deciding that the vmemmap region is not a projection of
      the physical space, but of the virtual space above PAGE_OFFSET, i.e., the
      linear region. This way, we are guaranteed that the vmemmap region is of
      sufficient size, and we can even reduce the size by half.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      dfd55ad8
  11. 26 2月, 2016 2 次提交
    • M
      arm64: Remove fixmap include fragility · 3eca86e7
      Mark Rutland 提交于
      The asm-generic fixmap.h depends on each architecture's fixmap.h to pull
      in the definition of PAGE_KERNEL_RO, if this exists. In the absence of
      this, FIXMAP_PAGE_RO will not be defined. In mm/early_ioremap.c the
      definition of early_memremap_ro is predicated on FIXMAP_PAGE_RO being
      defined.
      
      Currently, the arm64 fixmap.h doesn't include pgtable.h for the
      definition of PAGE_KERNEL_RO, and as a knock-on effect early_memremap_ro
      is not always defined, leading to link-time failures when it is used.
      This has been observed with defconfig on next-20160226.
      
      Unfortunately, as pgtable.h includes fixmap.h, adding the include
      introduces a circular dependency, which is just as fragile.
      
      Instead, this patch factors out PAGE_KERNEL_RO and other prot
      definitions into a new pgtable-prot header which can be included by poth
      pgtable.h and fixmap.h, avoiding the  circular dependency, and ensuring
      that early_memremap_ro is alwyas defined where it is used.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reported-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      3eca86e7
    • C
      arm64: Fix building error with 16KB pages and 36-bit VA · cac4b8cd
      Catalin Marinas 提交于
      In such configuration, Linux uses only two pages of page tables and
      __pud_populate() should not be used. However, the BUILD_BUG() triggers
      since pud_sect() is still defined and the compiler cannot eliminate such
      code, even though at run-time it should not be triggered. This patch
      extends the #ifdef ARM64_64K_PAGES condition for pud_sect to include
      PGTABLE_LEVELS < 3.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      cac4b8cd
  12. 19 2月, 2016 2 次提交
  13. 16 2月, 2016 4 次提交
  14. 25 1月, 2016 1 次提交
  15. 16 1月, 2016 2 次提交
    • M
      arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP · 05ee26d9
      Minchan Kim 提交于
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_mkclean for THP page MADV_FREE support.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05ee26d9
    • K
      arm64, thp: remove infrastructure for handling splitting PMDs · b7ed934a
      Kirill A. Shutemov 提交于
      With new refcounting we don't need to mark PMDs splitting.  Let's drop
      code to handle this.
      
      pmdp_splitting_flush() is not needed too: on splitting PMD we will do
      pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
      needed for fast_gup.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b7ed934a
  16. 05 1月, 2016 1 次提交
    • W
      arm64: mm: move pgd_cache initialisation to pgtable_cache_init · 39b5be9b
      Will Deacon 提交于
      Initialising the suppport for EFI runtime services requires us to
      allocate a pgd off the back of an early_initcall. On systems where the
      PGD_SIZE is smaller than PAGE_SIZE (e.g. 64k pages and 48-bit VA), the
      pgd_cache isn't initialised at this stage, and we panic with a NULL
      dereference during boot:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000000
      
        __create_mapping.isra.5+0x84/0x350
        create_pgd_mapping+0x20/0x28
        efi_create_mapping+0x5c/0x6c
        arm_enable_runtime_services+0x154/0x1e4
        do_one_initcall+0x8c/0x190
        kernel_init_freeable+0x84/0x1ec
        kernel_init+0x10/0xe0
        ret_from_fork+0x10/0x50
      
      This patch fixes the problem by initialising the pgd_cache earlier, in
      the pgtable_cache_init callback, which sounds suspiciously like what it
      was intended for.
      Reported-by: NDennis Chen <dennis.chen@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      39b5be9b
  17. 22 12月, 2015 1 次提交
    • D
      arm64: hugetlb: add support for PTE contiguous bit · 66b3923a
      David Woods 提交于
      The arm64 MMU supports a Contiguous bit which is a hint that the TTE
      is one of a set of contiguous entries which can be cached in a single
      TLB entry.  Supporting this bit adds new intermediate huge page sizes.
      
      The set of huge page sizes available depends on the base page size.
      Without using contiguous pages the huge page sizes are as follows.
      
       4KB:   2MB  1GB
      64KB: 512MB
      
      With a 4KB granule, the contiguous bit groups together sets of 16 pages
      and with a 64KB granule it groups sets of 32 pages.  This enables two new
      huge page sizes in each case, so that the full set of available sizes
      is as follows.
      
       4KB:  64KB   2MB  32MB  1GB
      64KB:   2MB 512MB  16GB
      
      If a 16KB granule is used then the contiguous bit groups 128 pages
      at the PTE level and 32 pages at the PMD level.
      
      If the base page size is set to 64KB then 2MB pages are enabled by
      default.  It is possible in the future to make 2MB the default huge
      page size for both 4KB and 64KB granules.
      Reviewed-by: NChris Metcalf <cmetcalf@ezchip.com>
      Reviewed-by: NSteve Capper <steve.capper@linaro.org>
      Signed-off-by: NDavid Woods <dwoods@ezchip.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      66b3923a
  18. 11 12月, 2015 1 次提交
  19. 01 12月, 2015 1 次提交
  20. 18 11月, 2015 1 次提交
  21. 09 11月, 2015 1 次提交
    • A
      arm64: fix R/O permissions of FDT mapping · fb226c3d
      Ard Biesheuvel 提交于
      The mapping permissions of the FDT are set to 'PAGE_KERNEL | PTE_RDONLY'
      in an attempt to map the FDT as read-only. However, not only does this
      break at build time under STRICT_MM_TYPECHECKS (since the two terms are
      of different types in that case), it also results in both the PTE_WRITE
      and PTE_RDONLY attributes to be set, which means the region is still
      writable under ARMv8.1 DBM (and an attempted write will simply clear the
      PT_RDONLY bit).
      
      So instead, define PAGE_KERNEL_RO (which already has an established
      meaning across architectures) and use that instead.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      fb226c3d
  22. 16 10月, 2015 1 次提交
  23. 13 10月, 2015 2 次提交
    • Y
      arm64: add kc_offset_to_vaddr and kc_vaddr_to_offset macro · 03875ad5
      yalin wang 提交于
      This patch add kc_offset_to_vaddr() and kc_vaddr_to_offset(),
      the default version doesn't work on arm64, because arm64 kernel address
      is below the PAGE_OFFSET, like module address and vmemmap address are
      all below PAGE_OFFSET address.
      Signed-off-by: Nyalin wang <yalin.wang2010@gmail.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      03875ad5
    • A
      arm64: add KASAN support · 39d114dd
      Andrey Ryabinin 提交于
      This patch adds arch specific code for kernel address sanitizer
      (see Documentation/kasan.txt).
      
      1/8 of kernel addresses reserved for shadow memory. There was no
      big enough hole for this, so virtual addresses for shadow were
      stolen from vmalloc area.
      
      At early boot stage the whole shadow region populated with just
      one physical page (kasan_zero_page). Later, this page reused
      as readonly zero shadow for some memory that KASan currently
      don't track (vmalloc).
      After mapping the physical memory, pages for shadow memory are
      allocated and mapped.
      
      Functions like memset/memmove/memcpy do a lot of memory accesses.
      If bad pointer passed to one of these function it is important
      to catch this. Compiler's instrumentation cannot do this since
      these functions are written in assembly.
      KASan replaces memory functions with manually instrumented variants.
      Original functions declared as weak symbols so strong definitions
      in mm/kasan/kasan.c could replace them. Original functions have aliases
      with '__' prefix in name, so we could call non-instrumented variant
      if needed.
      Some files built without kasan instrumentation (e.g. mm/slub.c).
      Original mem* function replaced (via #define) with prefixed variants
      to disable memory access checks for such files.
      Signed-off-by: NAndrey Ryabinin <ryabinin.a.a@gmail.com>
      Tested-by: NLinus Walleij <linus.walleij@linaro.org>
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      39d114dd
  24. 09 10月, 2015 2 次提交
  25. 07 10月, 2015 2 次提交