提交 · 73e86cb03cf2ec0aa3789dc8621c6d53619cac5e · openeuler / Kernel

21 8月, 2017 2 次提交

arm64: Move PTE_RDONLY bit handling out of set_pte_at() · 73e86cb0

由 Catalin Marinas 提交于 7月 04, 2017

Currently PTE_RDONLY is treated as a hardware only bit and not handled
by the pte_mkwrite(), pte_wrprotect() or the user PAGE_* definitions.
The set_pte_at() function is responsible for setting this bit based on
the write permission or dirty state. This patch moves the PTE_RDONLY
handling out of set_pte_at into the pte_mkwrite()/pte_wrprotect()
functions. The PAGE_* definitions to need to be updated to explicitly
include PTE_RDONLY when !PTE_WRITE.

The patch also removes the redundant PAGE_COPY(_EXEC) definitions as
they are identical to the corresponding PAGE_READONLY(_EXEC).
Reviewed-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

73e86cb0

arm64: Convert pte handling from inline asm to using (cmp)xchg · 3bbf7157

由 Catalin Marinas 提交于 6月 26, 2017

With the support for hardware updates of the access and dirty states,
the following pte handling functions had to be implemented using
exclusives: __ptep_test_and_clear_young(), ptep_get_and_clear(),
ptep_set_wrprotect() and ptep_set_access_flags(). To take advantage of
the LSE atomic instructions and also make the code cleaner, convert
these pte functions to use the more generic cmpxchg()/xchg().
Reviewed-by: NWill Deacon <will.deacon@arm.com>
Acked-by: NMark Rutland <mark.rutland@arm.com>
Acked-by: NSteve Capper <steve.capper@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

3bbf7157

12 6月, 2017 1 次提交

arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries · f02ab08a

由 Punit Agrawal 提交于 6月 08, 2017

When memory failure is enabled, a poisoned hugepage pte is marked as a
swap entry. huge_pte_offset() does not return the poisoned page table
entries when it encounters PUD/PMD hugepages.

This behaviour of huge_pte_offset() leads to error such as below when
munmap is called on poisoned hugepages.

[  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

Fix huge_pte_offset() to return the poisoned pte which is then
appropriately handled by the generic layer code.
Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com>
Acked-by: NSteve Capper <steve.capper@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Cc: David Woods <dwoods@mellanox.com>
Tested-by: NManoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

f02ab08a

23 3月, 2017 1 次提交

arm64: mm: set the contiguous bit for kernel mappings where appropriate · d27cfa1f

由 Ard Biesheuvel 提交于 3月 09, 2017

This is the third attempt at enabling the use of contiguous hints for
kernel mappings. The most recent attempt 0bfc445d was reverted after
it turned out that updating permission attributes on live contiguous ranges
may result in TLB conflicts. So this time, the contiguous hint is not set
for .rodata or for the linear alias of .text/.rodata, both of which are
mapped read-write initially, and remapped read-only at a later stage.
(Note that the latter region could also be unmapped and remapped again
with updated permission attributes, given that the region, while live, is
only mapped for the convenience of the hibernation code, but that also
means the TLB footprint is negligible anyway, so why bother)

This enables the following contiguous range sizes for the virtual mapping
of the kernel image, and for the linear mapping:

          granule size |  cont PTE  |  cont PMD  |
          -------------+------------+------------+
               4 KB    |    64 KB   |   32 MB    |
              16 KB    |     2 MB   |    1 GB*   |
              64 KB    |     2 MB   |   16 GB*   |

* Only when built for 3 or more levels of translation. This is due to the
  fact that a 2 level configuration only consists of PGDs and PTEs, and the
  added complexity of dealing with folded PMDs is not justified considering
  that 16 GB contiguous ranges are likely to be ignored by the hardware (and
  16k/2 levels is a niche configuration)
Reviewed-by: NMark Rutland <mark.rutland@arm.com>
Tested-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

d27cfa1f

01 2月, 2017 1 次提交

arm64: Improve detection of user/non-user mappings in set_pte(_at) · ec663d96

由 Catalin Marinas 提交于 1月 27, 2017

Commit cab15ce6 ("arm64: Introduce execute-only page access
permissions") allowed a valid user PTE to have the PTE_USER bit clear.
As a consequence, the pte_valid_not_user() macro in set_pte() was
replaced with pte_valid_global() under the assumption that only user
pages have the nG bit set. EFI mappings, however, also have the nG bit
set and set_pte() wrongly ignores issuing the DSB+ISB.

This patch reinstates the pte_valid_not_user() macro and adds the
PTE_UXN bit check since all kernel mappings have this bit set. For
clarity, pte_exec() is renamed to pte_user_exec() as it only checks for
the absence of PTE_UXN. Consequently, the user executable check in
set_pte_at() drops the pte_ng() test since pte_user_exec() is
sufficient.

Fixes: cab15ce6 ("arm64: Introduce execute-only page access permissions")
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

ec663d96

12 1月, 2017 1 次提交

arm64: Use __pa_symbol for kernel symbols · 2077be67

由 Laura Abbott 提交于 1月 10, 2017

__pa_symbol is technically the marcro that should be used for kernel
symbols. Switch to this as a pre-requisite for DEBUG_VIRTUAL which
will do bounds checking.
Reviewed-by: NMark Rutland <mark.rutland@arm.com>
Tested-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NLaura Abbott <labbott@redhat.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

2077be67

26 8月, 2016 2 次提交

arm64: hibernate: Support DEBUG_PAGEALLOC · 5ebe3a44

由 James Morse 提交于 8月 24, 2016

DEBUG_PAGEALLOC removes the valid bit of page table entries to prevent
any access to unallocated memory. Hibernate uses this as a hint that those
pages don't need to be saved/restored. This patch adds the
kernel_page_present() function it uses.

hibernate.c copies the resume kernel's linear map for use during restore.
Add _copy_pte() to fill-in the holes made by DEBUG_PAGEALLOC in the resume
kernel, so we can restore data the original kernel had at these addresses.

Finally, DEBUG_PAGEALLOC means the linear-map alias of KERNEL_START to
KERNEL_END may have holes in it, so we can't lazily clean this whole
area to the PoC. Only clean the new mmuoff region, and the kernel/kvm
idmaps.

This reverts commit da24eb1f.
Reported-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NJames Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

5ebe3a44

arm64: Introduce execute-only page access permissions · cab15ce6

由 Catalin Marinas 提交于 8月 11, 2016

The ARMv8 architecture allows execute-only user permissions by clearing
the PTE_UXN and PTE_USER bits. However, the kernel running on a CPU
implementation without User Access Override (ARMv8.2 onwards) can still
access such page, so execute-only page permission does not protect
against read(2)/write(2) etc. accesses. Systems requiring such
protection must enable features like SECCOMP.

This patch changes the arm64 __P100 and __S100 protection_map[] macros
to the new __PAGE_EXECONLY attributes. A side effect is that
pte_user() no longer triggers for __PAGE_EXECONLY since PTE_USER isn't
set. To work around this, the check is done on the PTE_NG bit via the
pte_ng() macro. VM_READ is also checked now for page faults.
Reviewed-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

cab15ce6

04 8月, 2016 1 次提交

arm64: Fix copy-on-write referencing in HugeTLB · 747a70e6

由 Steve Capper 提交于 8月 03, 2016

set_pte_at(.) will set or unset the PTE_RDONLY hardware bit before
writing the entry to the table.

This can cause problems with the copy-on-write logic in hugetlb_cow:
 *) hugetlb_cow(.) called to handle a write fault on read only pte,
 *) Before the copy-on-write updates the new page table a call is
    made to pte_same(huge_ptep_get(ptep), pte)), to check for a race,
 *) Because set_pte_at(.) changed the pte, *ptep != pte, and the
    hugetlb_cow(.) code erroneously assumes that it lost the race,
 *) The new page is subsequently freed without being used.

On arm64 this problem only becomes apparent when we apply:
67961f9d mm/hugetlb: fix huge page reserve accounting for private
mappings

When one runs the libhugetlbfs test suite, there are allocation errors
and hugetlbfs pages become erroneously locked in memory as reserved.
(There is a high HugePages_Rsvd: count).

In this patch we introduce pte_same which ignores the PTE_RDONLY bit,
allowing for the libhugetlbfs test suite to pass as expected and
without leaking any reserved HugeTLB pages.
Reported-by: NHuang Shijie <shijie.huang@arm.com>
Signed-off-by: NSteve Capper <steve.capper@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

747a70e6

20 5月, 2016 1 次提交

arch: fix has_transparent_hugepage() · fd8cfd30

由 Hugh Dickins 提交于 5月 19, 2016

I've just discovered that the useful-sounding has_transparent_hugepage()
is actually an architecture-dependent minefield: on some arches it only
builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
not, but on some of those (arm and arm64) it then gives the wrong
answer; and on mips alone it's marked __init, which would crash if
called later (but so far it has not been called later).

Straighten this out: make it available to all configs, with a sensible
default in asm-generic/pgtable.h, removing its definitions from those
arches (arc, arm, arm64, sparc, tile) which are served by the default,
adding #define has_transparent_hugepage has_transparent_hugepage to
those (mips, powerpc, s390, x86) which need to override the default at
runtime, and removing the __init from mips (but maybe that kind of code
should be avoided after init: set a static variable the first time it's
called).
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Ning Qu <quning@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fd8cfd30

10 5月, 2016 1 次提交

kvm: arm64: Enable hardware updates of the Access Flag for Stage 2 page tables · 06485053

由 Catalin Marinas 提交于 4月 13, 2016

The ARMv8.1 architecture extensions introduce support for hardware
updates of the access and dirty information in page table entries. With
VTCR_EL2.HA enabled (bit 21), when the CPU accesses an IPA with the
PTE_AF bit cleared in the stage 2 page table, instead of raising an
Access Flag fault to EL2 the CPU sets the actual page table entry bit
(10). To ensure that kernel modifications to the page table do not
inadvertently revert a bit set by hardware updates, certain Stage 2
software pte/pmd operations must be performed atomically.

The main user of the AF bit is the kvm_age_hva() mechanism. The
kvm_age_hva_handler() function performs a "test and clear young" action
on the pte/pmd. This needs to be atomic in respect of automatic hardware
updates of the AF bit. Since the AF bit is in the same position for both
Stage 1 and Stage 2, the patch reuses the existing
ptep_test_and_clear_young() functionality if
__HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG is defined. Otherwise, the
existing pte_young/pte_mkold mechanism is preserved.

The kvm_set_s2pte_readonly() (and the corresponding pmd equivalent) have
to perform atomic modifications in order to avoid a race with updates of
the AF bit. The arm64 implementation has been re-written using
exclusives.

Currently, kvm_set_s2pte_writable() (and pmd equivalent) take a pointer
argument and modify the pte/pmd in place. However, these functions are
only used on local variables rather than actual page table entries, so
it makes more sense to follow the pte_mkwrite() approach for stage 1
attributes. The change to kvm_s2pte_mkwrite() makes it clear that these
functions do not modify the actual page table entries.

The (pte|pmd)_mkyoung() uses on Stage 2 entries (setting the AF bit
explicitly) do not need to be modified since hardware updates of the
dirty status are not supported by KVM, so there is no possibility of
losing such information.
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

06485053

06 5月, 2016 4 次提交

arm64: Ensure pmd_present() returns false after pmd_mknotpresent() · 5bb1cc0f

由 Catalin Marinas 提交于 5月 05, 2016

Currently, pmd_present() only checks for a non-zero value, returning
true even after pmd_mknotpresent() (which only clears the type bits).
This patch converts pmd_present() to using pte_present(), similar to the
other pmd_*() checks. As a side effect, it will return true for
PROT_NONE mappings, though they are not yet used by the kernel with
transparent huge pages.

For consistency, also change pmd_mknotpresent() to only clear the
PMD_SECT_VALID bit, even though the PMD_TABLE_BIT is already 0 for block
mappings (no functional change). The unused PMD_SECT_PROT_NONE
definition is removed as transparent huge pages use the pte page prot
values.

Fixes: 9c7e535f ("arm64: mm: Route pmd thp functions through pte equivalents")
Cc: <stable@vger.kernel.org> # 3.15+
Reviewed-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

5bb1cc0f

arm64: Replace hard-coded values in the pmd/pud_bad() macros · ab4db1f2

由 Catalin Marinas 提交于 5月 05, 2016

This patch replaces the hard-coded value 2 with PMD_TABLE_BIT in the
pmd/pud_bad() macros. Note that using these macros on pmd_trans_huge()
entries is giving incorrect results
(pmd_none_or_trans_huge_or_clear_bad() correctly checks for
pmd_trans_huge before pmd_bad).

Additionally, white-space clean-up for pmd_mkclean().
Reviewed-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

ab4db1f2

arm64: Implement pmdp_set_access_flags() for hardware AF/DBM · 282aa705

由 Catalin Marinas 提交于 5月 05, 2016

The update to the accessed or dirty states for block mappings must be
done atomically on hardware with support for automatic AF/DBM. The
ptep_set_access_flags() function has been fixed as part of commit
66dbd6e6 ("arm64: Implement ptep_set_access_flags() for hardware
AF/DBM"). This patch brings pmdp_set_access_flags() in line with the pte
counterpart.

Fixes: 2f4b829c ("arm64: Add support for hardware updates of the access and dirty pte bits")
Cc: <stable@vger.kernel.org> # 4.4.x: 66dbd6e6: arm64: Implement ptep_set_access_flags() for hardware AF/DBM
Cc: <stable@vger.kernel.org> # 4.3+
Reviewed-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

282aa705

arm64: Fix typo in the pmdp_huge_get_and_clear() definition · 911f56ee

由 Catalin Marinas 提交于 5月 05, 2016

With hardware AF/DBM support, pmd modifications (transparent huge pages)
should be performed atomically using load/store exclusive. The initial
patches defined the get-and-clear function and __HAVE_ARCH_* macro
without the "huge" word, leaving the pmdp_huge_get_and_clear() to the
default, non-atomic implementation.

Fixes: 2f4b829c ("arm64: Add support for hardware updates of the access and dirty pte bits")
Cc: <stable@vger.kernel.org> # 4.3+
Reviewed-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

911f56ee

21 4月, 2016 1 次提交

arm64: Introduce pmd_thp_or_huge · 0dbd3b18

由 Suzuki K Poulose 提交于 3月 15, 2016

Add a helper to determine if a given pmd represents a huge page
either by hugetlb or thp, as we have for arm. This will be used
by KVM MMU code.
Suggested-by: NMark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Acked-by: NWill Deacon <will.deacon@arm.com>
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com>

0dbd3b18

16 4月, 2016 2 次提交

arm64: Implement ptep_set_access_flags() for hardware AF/DBM · 66dbd6e6

由 Catalin Marinas 提交于 4月 13, 2016

When hardware updates of the access and dirty states are enabled, the
default ptep_set_access_flags() implementation based on calling
set_pte_at() directly is potentially racy. This triggers the "racy dirty
state clearing" warning in set_pte_at() because an existing writable PTE
is overridden with a clean entry.

There are two main scenarios for this situation:

1. The CPU getting an access fault does not support hardware updates of
   the access/dirty flags. However, a different agent in the system
   (e.g. SMMU) can do this, therefore overriding a writable entry with a
   clean one could potentially lose the automatically updated dirty
   status

2. A more complex situation is possible when all CPUs support hardware
   AF/DBM:

   a) Initial state: shareable + writable vma and pte_none(pte)
   b) Read fault taken by two threads of the same process on different
      CPUs
   c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
      eventually reaches do_set_pte() which sets a writable + clean pte.
      CPU0 releases the mmap_sem
   d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
      pte entry it reads is present, writable and clean and it continues
      to pte_mkyoung()
   e) CPU1 calls ptep_set_access_flags()

   If between (d) and (e) the hardware (another CPU) updates the dirty
   state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
   marking the entry clean again.

This patch implements an arm64-specific ptep_set_access_flags() function
to perform an atomic update of the PTE flags.

Fixes: 2f4b829c ("arm64: Add support for hardware updates of the access and dirty pte bits")
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Reported-by: NMing Lei <tom.leiming@gmail.com>
Tested-by: NJulien Grall <julien.grall@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.3+
[will: reworded comment]
Signed-off-by: NWill Deacon <will.deacon@arm.com>

66dbd6e6

arm64, mm, numa: Add NUMA balancing support for arm64. · 56166230

由 Ganapatrao Kulkarni 提交于 4月 08, 2016

Enable NUMA balancing for arm64 platforms.
Add pte, pmd protnone helpers for use by automatic NUMA balancing.
Reviewed-by: NSteve Capper <steve.capper@arm.com>
Reviewed-by: NRobert Richter <rrichter@cavium.com>
Signed-off-by: NGanapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Signed-off-by: NDavid Daney <david.daney@cavium.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

56166230

14 4月, 2016 3 次提交

arm64: mm: move vmemmap region right below the linear region · 3e1907d5

由 Ard Biesheuvel 提交于 3月 30, 2016

This moves the vmemmap region right below PAGE_OFFSET, aka the start
of the linear region, and redefines its size to be a power of two.
Due to the placement of PAGE_OFFSET in the middle of the address space,
whose size is a power of two as well, this guarantees that virt to
page conversions and vice versa can be implemented efficiently, by
masking and shifting rather than ordinary arithmetic.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

3e1907d5

arm64: mm: avoid virt_to_page() translation for the zero page · 22b6f3b0

由 Ard Biesheuvel 提交于 3月 30, 2016

The zero page is statically allocated, so grab its struct page pointer
without using virt_to_page(), which will be restricted to the linear
mapping later.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

22b6f3b0

Revert "arm64: account for sparsemem section alignment when choosing vmemmap offset" · 3bab79ed

由 Ard Biesheuvel 提交于 3月 30, 2016

This reverts commit 36e5cd6b, since the
section alignment is now guaranteed by construction when choosing the
value of memstart_addr.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

3bab79ed

11 3月, 2016 1 次提交

arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission · fdc69e7d

由 Catalin Marinas 提交于 3月 09, 2016

The set_pte_at() function must update the hardware PTE_RDONLY bit
depending on the state of the PTE_WRITE and PTE_DIRTY bits of the given
entry value. However, it currently only performs this for pte_valid()
entries, ignoring PTE_PROT_NONE. The side-effect is that PROT_NONE
mappings would not have the PTE_RDONLY bit set. Without
CONFIG_ARM64_HW_AFDBM, this is not an issue since such PROT_NONE pages
are not accessible anyway.

With commit 2f4b829c ("arm64: Add support for hardware updates of
the access and dirty pte bits"), the ptep_set_wrprotect() function was
re-written to cope with automatic hardware updates of the dirty state.
As an optimisation, only PTE_RDONLY is checked to assess the "dirty"
status. Since set_pte_at() does not set this bit for PROT_NONE mappings,
such pages may be considered "dirty" as a result of
ptep_set_wrprotect().

This patch updates the pte_valid() check to pte_present() in
set_pte_at(). It also adds PTE_PROT_NONE to the swap entry bits comment.

Fixes: 2f4b829c ("arm64: Add support for hardware updates of the access and dirty pte bits")
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Reported-by: NGanapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Tested-by: NGanapatrao Kulkarni <gkulkarni@cavium.com>
Cc: <stable@vger.kernel.org>

fdc69e7d

09 3月, 2016 1 次提交

arm64: account for sparsemem section alignment when choosing vmemmap offset · 36e5cd6b

由 Ard Biesheuvel 提交于 3月 08, 2016

Commit dfd55ad8 ("arm64: vmemmap: use virtual projection of linear
region") fixed an issue where the struct page array would overflow into the
adjacent virtual memory region if system RAM was placed so high up in
physical memory that its addresses were not representable in the build time
configured virtual address size.

However, the fix failed to take into account that the vmemmap region needs
to be relatively aligned with respect to the sparsemem section size, so that
a sequence of page structs corresponding with a sparsemem section in the
linear region appears naturally aligned in the vmemmap region.

So round up vmemmap to sparsemem section size. Since this essentially moves
the projection of the linear region up in memory, also revert the reduction
of the size of the vmemmap region.

Cc: <stable@vger.kernel.org>
Fixes: dfd55ad8 ("arm64: vmemmap: use virtual projection of linear region")
Tested-by: NMark Langsdorf <mlangsdo@redhat.com>
Tested-by: NDavid Daney <david.daney@cavium.com>
Tested-by: NRobert Richter <rrichter@cavium.com>
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

36e5cd6b

27 2月, 2016 1 次提交

arm64: vmemmap: use virtual projection of linear region · dfd55ad8

由 Ard Biesheuvel 提交于 2月 26, 2016

Commit dd006da2 ("arm64: mm: increase VA range of identity map") made
some changes to the memory mapping code to allow physical memory to reside
at an offset that exceeds the size of the virtual mapping.

However, since the size of the vmemmap area is proportional to the size of
the VA area, but it is populated relative to the physical space, we may
end up with the struct page array being mapped outside of the vmemmap
region. For instance, on my Seattle A0 box, I can see the following output
in the dmesg log.

vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000 ( 8 GB maximum)
0xffffffbfc0000000 - 0xffffffbfd0000000 ( 256 MB actual)

We can fix this by deciding that the vmemmap region is not a projection of
the physical space, but of the virtual space above PAGE_OFFSET, i.e., the
linear region. This way, we are guaranteed that the vmemmap region is of
sufficient size, and we can even reduce the size by half.

Cc: <stable@vger.kernel.org>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

dfd55ad8

26 2月, 2016 2 次提交

arm64: Remove fixmap include fragility · 3eca86e7

由 Mark Rutland 提交于 2月 26, 2016

The asm-generic fixmap.h depends on each architecture's fixmap.h to pull
in the definition of PAGE_KERNEL_RO, if this exists. In the absence of
this, FIXMAP_PAGE_RO will not be defined. In mm/early_ioremap.c the
definition of early_memremap_ro is predicated on FIXMAP_PAGE_RO being
defined.

Currently, the arm64 fixmap.h doesn't include pgtable.h for the
definition of PAGE_KERNEL_RO, and as a knock-on effect early_memremap_ro
is not always defined, leading to link-time failures when it is used.
This has been observed with defconfig on next-20160226.

Unfortunately, as pgtable.h includes fixmap.h, adding the include
introduces a circular dependency, which is just as fragile.

Instead, this patch factors out PAGE_KERNEL_RO and other prot
definitions into a new pgtable-prot header which can be included by poth
pgtable.h and fixmap.h, avoiding the  circular dependency, and ensuring
that early_memremap_ro is alwyas defined where it is used.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Reported-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

3eca86e7

arm64: Fix building error with 16KB pages and 36-bit VA · cac4b8cd

由 Catalin Marinas 提交于 2月 25, 2016

In such configuration, Linux uses only two pages of page tables and
__pud_populate() should not be used. However, the BUILD_BUG() triggers
since pud_sect() is still defined and the compiler cannot eliminate such
code, even though at run-time it should not be triggered. This patch
extends the #ifdef ARM64_64K_PAGES condition for pud_sect to include
PGTABLE_LEVELS < 3.
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

cac4b8cd

19 2月, 2016 2 次提交

arm64: move kernel image to base of vmalloc area · f9040773

由 Ard Biesheuvel 提交于 2月 16, 2016

This moves the module area to right before the vmalloc area, and moves
the kernel image to the base of the vmalloc area. This is an intermediate
step towards implementing KASLR, which allows the kernel image to be
located anywhere in the vmalloc area.

Since other subsystems such as hibernate may still need to refer to the
kernel text or data segments via their linears addresses, both are mapped
in the linear region as well. The linear alias of the text region is
mapped read-only/non-executable to prevent inadvertent modification or
execution.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

f9040773

arm64: pgtable: implement static [pte|pmd|pud]_offset variants · 6533945a

由 Ard Biesheuvel 提交于 2月 16, 2016

The page table accessors pte_offset(), pud_offset() and pmd_offset()
rely on __va translations, so they can only be used after the linear
mapping has been installed. For the early fixmap and kasan init routines,
whose page tables are allocated statically in the kernel image, these
functions will return bogus values. So implement pte_offset_kimg(),
pmd_offset_kimg() and pud_offset_kimg(), which can be used instead
before any page tables have been allocated dynamically.
Reviewed-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

6533945a

16 2月, 2016 4 次提交

arm64: mm: add functions to walk tables in fixmap · 961faac1

由 Mark Rutland 提交于 1月 25, 2016

As a preparatory step to allow us to allocate early page tables from
unmapped memory using memblock_alloc, add new p??_{set,clear}_fixmap*
functions which can be used to walk page tables outside of the linear
mapping by using fixmap slots.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

961faac1

arm64: mm: add functions to walk page tables by PA · dca56dca

由 Mark Rutland 提交于 1月 25, 2016

To allow us to walk tables allocated into the fixmap, we need to acquire
the physical address of a page, rather than the virtual address in the
linear map.

This patch adds new p??_page_paddr and p??_offset_phys functions to
acquire the physical address of a next-level table, and changes
p??_offset* into macros which simply convert this to a linear map VA.
This renders p??_page_vaddr unused, and hence they are removed.

At the pgd level, a new pgd_offset_raw function is added to find the
relevant PGD entry given the base of a PGD and a virtual address.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

dca56dca

arm64: mm: move pte_* macros · 053520f7

由 Mark Rutland 提交于 1月 25, 2016

For pmd, pud, and pgd levels of table, functions including p?d_index and
p?d_offset are defined after the p?d_page_vaddr function for the
immediately higher level of table.

The pte functions however are defined much earlier, even though several
rely on the later definition of pmd_page_vaddr. While this isn't
currently a problem as these are macros, it prevents the logical
grouping of later C functions (which cannot rely on prototypes for
functions not yet defined).

Move these definitions after pmd_page_vaddr, for consistency with the
placement of these functions for other levels of table.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

053520f7

arm64: mm: place empty_zero_page in bss · 5227cfa7

由 Mark Rutland 提交于 1月 25, 2016

Currently the zero page is set up in paging_init, and thus we cannot use
the zero page earlier. We use the zero page as a reserved TTBR value
from which no TLB entries may be allocated (e.g. when uninstalling the
idmap). To enable such usage earlier (as may be required for invasive
changes to the kernel page tables), and to minimise the time that the
idmap is active, we need to be able to use the zero page before
paging_init.

This patch follows the example set by x86, by allocating the zero page
at compile time, in .bss. This means that the zero page itself is
available immediately upon entry to start_kernel (as we zero .bss before
this), and also means that the zero page takes up no space in the raw
Image binary. The associated struct page is allocated in bootmem_init,
and remains unavailable until this time.

Outside of arch code, the only users of empty_zero_page assume that the
empty_zero_page symbol refers to the zeroed memory itself, and that
ZERO_PAGE(x) must be used to acquire the associated struct page,
following the example of x86. This patch also brings arm64 inline with
these assumptions.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

5227cfa7

25 1月, 2016 1 次提交

arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings · ac15bd63

由 Catalin Marinas 提交于 1月 07, 2016

Currently, set_pte_at() only checks the software PTE_WRITE bit for user
mappings when it sets or clears the hardware PTE_RDONLY accordingly. The
kernel ptes are written directly without any modification, relying
solely on the protection bits in macros like PAGE_KERNEL. However,
modifying kernel pte attributes via pte_wrprotect() would be ignored by
set_pte_at(). Since pte_wrprotect() does not set PTE_RDONLY (it only
clears PTE_WRITE), the new permission is not taken into account.

This patch changes set_pte_at() to adjust the read-only permission for
kernel ptes as well. As a side effect, existing PROT_* definitions used
for kernel ioremap*() need to include PTE_DIRTY | PTE_WRITE.

(additionally, white space fix for PTE_KERNEL_ROX)
Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Reported-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

ac15bd63

16 1月, 2016 2 次提交

arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP · 05ee26d9

由 Minchan Kim 提交于 1月 15, 2016

MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
of the contents since MADV_FREE syscall is called for THP page.

This patch adds pmd_mkclean for THP page MADV_FREE support.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@kernel.org>
Cc: <yalin.wang2010@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jason Evans <je@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttil <mika.penttila@nextfour.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Shaohua Li <shli@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

05ee26d9

arm64, thp: remove infrastructure for handling splitting PMDs · b7ed934a

由 Kirill A. Shutemov 提交于 1月 15, 2016

With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

pmdp_splitting_flush() is not needed too: on splitting PMD we will do
pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
needed for fast_gup.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b7ed934a

05 1月, 2016 1 次提交

arm64: mm: move pgd_cache initialisation to pgtable_cache_init · 39b5be9b

由 Will Deacon 提交于 1月 05, 2016

Initialising the suppport for EFI runtime services requires us to
allocate a pgd off the back of an early_initcall. On systems where the
PGD_SIZE is smaller than PAGE_SIZE (e.g. 64k pages and 48-bit VA), the
pgd_cache isn't initialised at this stage, and we panic with a NULL
dereference during boot:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000

  __create_mapping.isra.5+0x84/0x350
  create_pgd_mapping+0x20/0x28
  efi_create_mapping+0x5c/0x6c
  arm_enable_runtime_services+0x154/0x1e4
  do_one_initcall+0x8c/0x190
  kernel_init_freeable+0x84/0x1ec
  kernel_init+0x10/0xe0
  ret_from_fork+0x10/0x50

This patch fixes the problem by initialising the pgd_cache earlier, in
the pgtable_cache_init callback, which sounds suspiciously like what it
was intended for.
Reported-by: NDennis Chen <dennis.chen@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

39b5be9b

22 12月, 2015 1 次提交

arm64: hugetlb: add support for PTE contiguous bit · 66b3923a

由 David Woods 提交于 12月 17, 2015

The arm64 MMU supports a Contiguous bit which is a hint that the TTE
is one of a set of contiguous entries which can be cached in a single
TLB entry.  Supporting this bit adds new intermediate huge page sizes.

The set of huge page sizes available depends on the base page size.
Without using contiguous pages the huge page sizes are as follows.

 4KB:   2MB  1GB
64KB: 512MB

With a 4KB granule, the contiguous bit groups together sets of 16 pages
and with a 64KB granule it groups sets of 32 pages.  This enables two new
huge page sizes in each case, so that the full set of available sizes
is as follows.

 4KB:  64KB   2MB  32MB  1GB
64KB:   2MB 512MB  16GB

If a 16KB granule is used then the contiguous bit groups 128 pages
at the PTE level and 32 pages at the PMD level.

If the base page size is set to 64KB then 2MB pages are enabled by
default.  It is possible in the future to make 2MB the default huge
page size for both 4KB and 64KB granules.
Reviewed-by: NChris Metcalf <cmetcalf@ezchip.com>
Reviewed-by: NSteve Capper <steve.capper@linaro.org>
Signed-off-by: NDavid Woods <dwoods@ezchip.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

66b3923a

11 12月, 2015 1 次提交

arm64: Improve error reporting on set_pte_at() checks · 82d34008

由 Catalin Marinas 提交于 12月 08, 2015

Currently the BUG_ON() checks do not give enough information about the
PTEs being set. This patch changes BUG_ON to WARN_ONCE and dumps the
values of the old and new PTEs. In addition, the checks are only made if
the new PTE entry is valid.
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Reported-by: NMing Lei <tom.leiming@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>

82d34008

01 12月, 2015 1 次提交

arm64: pgtable: implement pte_accessible() · 76c714be

由 Will Deacon 提交于 10月 30, 2015

This patch implements the pte_accessible() macro, which can be used to
test whether or not a given pte is a candidate for allocation in the
TLB.
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

76c714be

18 11月, 2015 1 次提交

arm64: Fix R/O permissions in mark_rodata_ro · 0b2aa5b8

由 Laura Abbott 提交于 11月 12, 2015

The permissions in mark_rodata_ro trigger a build error
with STRICT_MM_TYPECHECKS. Fix this by introducing
PAGE_KERNEL_ROX for the same reasons as PAGE_KERNEL_RO.
From Ard:

"PAGE_KERNEL_EXEC has PTE_WRITE set as well, making the range
writeable under the ARMv8.1 DBM feature, that manages the
dirty bit in hardware (writing to a page with the PTE_RDONLY
and PTE_WRITE bits both set will clear the PTE_RDONLY bit in that case)"
Signed-off-by: NLaura Abbott <labbott@fedoraproject.org>
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

0b2aa5b8

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功