- 29 September 2017: 1 commit
-
-
By Will Deacon

We currently route pte translation faults via do_page_fault, which elides the address check against TASK_SIZE before invoking the mm fault handling code. However, this can cause issues with the path walking code in conjunction with our word-at-a-time implementation, because load_unaligned_zeropad can end up faulting in kernel space if it reads across a page boundary and runs into a page fault (e.g. by attempting to read from a guard region).

In the case of such a fault, load_unaligned_zeropad has registered a fixup to shift the valid data and pad with zeroes, however the abort is reported as a level 3 translation fault and we dispatch it straight to do_page_fault, despite it being a kernel address. This results in calling a sleeping function from atomic context:

  BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313
  in_atomic(): 0, irqs_disabled(): 0, pid: 10290
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  [...]
  [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144
  [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c
  [<ffffff8e016977f0>] do_page_fault+0x140/0x330
  [<ffffff8e01681328>] do_mem_abort+0x54/0xb0
  Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0)
  [...]
  [<ffffff8e016844fc>] el1_da+0x18/0x78
  [<ffffff8e017f399c>] path_parentat+0x44/0x88
  [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8
  [<ffffff8e017f5044>] filename_create+0x4c/0x128
  [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8
  [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28
  Code: 36380080 d5384100 f9400800 9402566d (d4210000)
  ---[ end trace 2d01889f2bca9b9f ]---

Fix this by dispatching all translation faults to do_translation_fault, which avoids invoking the page fault logic for faults on kernel addresses.

Cc: <stable@vger.kernel.org>
Reported-by: Ankit Jain <ankijain@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
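A minimal sketch of the dispatch described above, assuming the usual arm64 fault.c helpers do_page_fault() and do_bad_area() with their conventional signatures; this is an illustration, not the verbatim patch:

```c
/*
 * Sketch: route translation faults through a handler that checks the
 * faulting address against TASK_SIZE, so kernel-address faults (e.g. from
 * load_unaligned_zeropad) never reach the sleeping mm fault path.
 */
static int do_translation_fault(unsigned long addr, unsigned int esr,
				struct pt_regs *regs)
{
	if (addr < TASK_SIZE)
		return do_page_fault(addr, esr, regs);	/* user address: normal path */

	do_bad_area(addr, esr, regs);			/* kernel address: fixup or die */
	return 0;
}
```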
-
- 23 August 2017: 5 commits
-
-
By Mark Rutland

When there's a fatal signal pending, arm64's do_page_fault() implementation returns 0. The intent is that we'll return to the faulting userspace instruction, delivering the signal on the way. However, if we take a fatal signal while fixing up a uaccess, this results in a return to the faulting kernel instruction, which will be instantly retried, resulting in the same fault being taken forever. As the task never reaches userspace, the signal is not delivered, and the task is left unkillable. While the task is stuck in this state, it can inhibit the forward progress of the system.

To avoid this, we must ensure that when a fatal signal is pending, we apply any necessary fixup for a faulting kernel instruction. Thus we will return to an error path, and it is up to that code to make forward progress towards delivering the fatal signal.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Steve Capper <steve.capper@arm.com>
Tested-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
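A sketch of the idea, assuming the common do_page_fault() layout where a no_context label applies the exception-table fixup for kernel accesses; the helper names and surrounding structure are assumptions, not the exact diff:

```c
	fault = __do_page_fault(mm, addr, mm_flags, vm_flags, tsk);

	/*
	 * If we need to retry but a fatal signal is pending, handle the
	 * signal first. For a kernel-mode access we must still run the
	 * exception fixup, otherwise the faulting instruction would be
	 * retried forever and the task would remain unkillable.
	 */
	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
		if (!user_mode(regs))
			goto no_context;	/* apply the uaccess fixup */
		return 0;			/* deliver the signal on return to user */
	}
```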
-
By Steve Capper

Replace a lot of if statements with switch and case labels to make it much clearer which huge page sizes are supported. Also, prevent PUD_SIZE from being used on systems not running with 4KB PAGE_SIZE. Previously, if one supplied PUD_SIZE in those circumstances, an unusable huge page size would be put into use.

Fixes: 084bd298 ("ARM64: mm: HugeTLB support.")
Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
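Illustrative only: a switch of the kind described, validating a requested huge page size and rejecting PUD_SIZE unless the kernel runs with 4K pages. The function name and exact set of case labels are assumptions rather than a copy of the patch:

```c
/* Hypothetical validity check; the real code lives in arm64 hugetlbpage.c. */
static bool arm64_hugetlb_size_supported(unsigned long sz)
{
	switch (sz) {
#ifdef CONFIG_ARM64_4K_PAGES
	case PUD_SIZE:		/* 1G block mappings need the 4K granule */
#endif
	case CONT_PMD_SIZE:	/* contiguous PMD mappings */
	case PMD_SIZE:		/* block mappings at the PMD level */
	case CONT_PTE_SIZE:	/* contiguous PTE mappings */
		return true;
	default:
		return false;
	}
}
```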
-
By Punit Agrawal

Also known as: Revert "Revert "Revert "commit 66b3923a ("arm64: hugetlb: add support for PTE contiguous bit")"""

Now that our hugetlb implementation is compliant with the break-before-make requirements of the architecture, and we have addressed some of the issues in core code required for properly dealing with hardware poisoning of contiguous hugepages, let's re-enable support for contiguous hugepages.

This reverts commit 6ae979ab.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
By Punit Agrawal

The default implementation of set_huge_swap_pte_at() does not support hugepages consisting of contiguous ptes. Override it to add support for contiguous hugepages.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
By Punit Agrawal

The default huge_pte_clear() implementation does not clear contiguous page table entries when it encounters the contiguous hugepages that are supported on arm64. Fix this by overriding the default implementation to clear all the entries associated with contiguous hugepages.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
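A minimal sketch of such an override, assuming a helper (here called num_contig_ptes()) that maps a hugepage size to the number of contiguous entries and their individual page size; the helper name and signature are illustrative:

```c
void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
		    pte_t *ptep, unsigned long sz)
{
	int i, ncontig;
	size_t pgsize;

	/* e.g. a 2M contiguous hugepage on 64K pages -> 32 entries of 64K */
	ncontig = num_contig_ptes(sz, &pgsize);

	/* clear every pte backing the contiguous hugepage, not just the head */
	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
		pte_clear(mm, addr, ptep);
}
```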
-
- 22 August 2017: 5 commits
-
-
By Punit Agrawal

huge_pte_offset() was updated to correctly handle swap entries for hugepages. With the addition of the size parameter, it is now possible to disambiguate whether the request is for a regular hugepage or a contiguous hugepage. Fix huge_pte_offset() for contiguous hugepages by using the size to find the correct page table entry.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
By Steve Capper

It has become apparent that one has to take special care when modifying attributes of memory mappings that employ the contiguous bit. Both the requirement and the architecturally correct "Break-Before-Make" technique for updating contiguous entries are described in: ARM DDI 0487A.k_iss10775, "Misprogramming of the Contiguous bit", page D4-1762.

The huge pte accessors currently replace the attributes of contiguous pte entries in place, and thus can, on certain platforms, lead to TLB conflict aborts or even erroneous results returned from TLB lookups.

This patch adds two helper functions:
* get_clear_flush() - clears a contiguous entry and returns the head pte (whilst taking care to retain dirty bit information that could have been modified by DBM).
* clear_flush() - clears a contiguous entry.
A TLB invalidation is then performed to ensure that there is no possibility of multiple TLB entries being present for the same region.

Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
(Added helper clear_flush(), updated commit log, and some cleanup)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
[catalin.marinas@arm.com: remove CONFIG_ARM64_HW_AFDBM check]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
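A simplified sketch of a get_clear_flush()-style helper following the break-before-make description above; the dirty-bit accumulation and the TLB flush shown here are illustrative rather than the exact committed code:

```c
static pte_t get_clear_flush(struct mm_struct *mm, unsigned long addr,
			     pte_t *ptep, unsigned long pgsize,
			     unsigned long ncontig)
{
	struct vm_area_struct vma = { .vm_mm = mm };
	pte_t orig_pte = huge_ptep_get(ptep);
	unsigned long i, saddr = addr;

	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
		pte_t pte = ptep_get_and_clear(mm, addr, ptep);

		/* a DBM-capable agent may have dirtied any of the entries */
		if (pte_dirty(pte))
			orig_pte = pte_mkdirty(orig_pte);
	}

	/* "break": entries are now clear; flush before "make" writes new ones */
	flush_tlb_range(&vma, saddr, addr);
	return orig_pte;
}
```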
-
By Steve Capper

This patch aims to re-structure the huge pte accessors without affecting their functionality. Control flow is changed to reduce indentation, and greater use is made of the for-loop post-expression to modify the loop variables. It is then much easier to add break-before-make semantics in a subsequent patch.

Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
By Steve Capper

Rather than XORing pte bits in various places, use this helper function.

Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
By Steve Capper

This patch adds a WARN_ON to set_huge_pte_at, as the accessor assumes that entries to be written down are all present. (There are separate accessors to clear huge ptes.)

We will need to handle the !pte_present case where memory offlining is used on hugetlb pages: swap and migration entries will be supplied to set_huge_pte_at in that case.

Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
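The shape of the check described, as a hedged sketch (the contiguous-pte handling that surrounds it is elided):

```c
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
		     pte_t *ptep, pte_t pte)
{
	/*
	 * This accessor currently expects present entries only; swap and
	 * migration entries (e.g. from memory offlining of hugetlb pages)
	 * will need dedicated handling later.
	 */
	WARN_ON(!pte_present(pte));

	/* ... write out the (possibly contiguous) entries as before ... */
	set_pte_at(mm, addr, ptep, pte);
}
```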
-
- 21 August 2017: 5 commits
-
-
By Vladimir Murzin

atomic_pool is set up once during the init stage and never changed after that, so it is a good candidate for __ro_after_init.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
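For illustration, the annotation looks like this (the variable's type is assumed to match arm64's dma-mapping code, which uses a gen_pool for atomic allocations):

```c
#include <linux/cache.h>
#include <linux/genalloc.h>

/* written once by the init-time pool setup, read-only after init completes */
static struct gen_pool *atomic_pool __ro_after_init;
```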
-
By Vladimir Murzin

gen_pool_first_fit_order_align() does not make use of the additional data argument, so pass a plain NULL there.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
By Catalin Marinas

Since the pte handling for hardware AF/DBM works even when the hardware feature is not present, make the pte accessors implementation permanent and remove the corresponding #ifdefs. The Kconfig option is kept as it can still be used to disable the feature at the hardware level.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
By Catalin Marinas

Currently PTE_RDONLY is treated as a hardware-only bit and not handled by pte_mkwrite(), pte_wrprotect() or the user PAGE_* definitions. The set_pte_at() function is responsible for setting this bit based on the write permission or dirty state. This patch moves the PTE_RDONLY handling out of set_pte_at() and into the pte_mkwrite()/pte_wrprotect() functions. The PAGE_* definitions need to be updated to explicitly include PTE_RDONLY when !PTE_WRITE.

The patch also removes the redundant PAGE_COPY(_EXEC) definitions, as they are identical to the corresponding PAGE_READONLY(_EXEC).

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
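A sketch of what moving the PTE_RDONLY handling into the accessors looks like, assuming arm64's set_pte_bit()/clear_pte_bit() helpers; a simplified illustration rather than the exact diff:

```c
static inline pte_t pte_mkwrite(pte_t pte)
{
	pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
	pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));	/* writable => not read-only */
	return pte;
}

static inline pte_t pte_wrprotect(pte_t pte)
{
	pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
	pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));	/* read-only in hardware */
	return pte;
}
```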
-
By Catalin Marinas

With the support for hardware updates of the access and dirty states, the following pte handling functions had to be implemented using exclusives: __ptep_test_and_clear_young(), ptep_get_and_clear(), ptep_set_wrprotect() and ptep_set_access_flags(). To take advantage of the LSE atomic instructions and also make the code cleaner, convert these pte functions to use the more generic cmpxchg()/xchg().

Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
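As an example of the pattern, here is a cmpxchg()-based ptep_set_wrprotect() of the kind described; a hedged, simplified sketch (the committed version also transfers the hardware dirty state to the software dirty bit):

```c
static inline void ptep_set_wrprotect(struct mm_struct *mm,
				      unsigned long address, pte_t *ptep)
{
	pte_t old_pte, pte;

	pte = READ_ONCE(*ptep);
	do {
		old_pte = pte;
		pte = pte_wrprotect(pte);
		/* retry if hardware updated the AF/DBM bits under our feet */
		pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
					       pte_val(old_pte), pte_val(pte));
	} while (pte_val(pte) != pte_val(old_pte));
}
```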
-
- 11 August 2017: 1 commit
-
-
By Arnd Bergmann

Defining the two functions as 'static inline' and exporting them leads to the interesting case where we can use the interface from loadable modules, but not from built-in drivers, as shown in this link failure:

  drivers/nvdimm/claim.o: In function `nsio_rw_bytes':
  claim.c:(.text+0x1b8): undefined reference to `arch_invalidate_pmem'
  drivers/nvdimm/pmem.o: In function `pmem_dax_flush':
  pmem.c:(.text+0x11c): undefined reference to `arch_wb_cache_pmem'
  drivers/nvdimm/pmem.o: In function `pmem_make_request':
  pmem.c:(.text+0x5a4): undefined reference to `arch_invalidate_pmem'
  pmem.c:(.text+0x650): undefined reference to `arch_invalidate_pmem'
  pmem.c:(.text+0x6d4): undefined reference to `arch_invalidate_pmem'

This removes the bogus 'static inline'.

Fixes: d50e071f ("arm64: Implement pmem API support")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 09 August 2017: 2 commits
-
-
By Robin Murphy

Add a clean-to-point-of-persistence cache maintenance helper, and wire up the basic architectural support for the pmem driver based on it.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[catalin.marinas@arm.com: move arch_*_pmem() functions to arch/arm64/mm/flush.c]
[catalin.marinas@arm.com: change dmb(sy) to dmb(osh)]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
By Robin Murphy

__inval_cache_range() is already the odd one out among our data cache maintenance routines as the only remaining range-based one. As we're going to want an invalidation routine to call from C code for the pmem API, let's tweak the prototype and name to bring it in line with the clean operations, and to make its relationship with __dma_inv_area() neatly mirror that of __clean_dcache_area_poc() and __dma_clean_area(). The loop clearing the early page tables gets mildly massaged in the process for the sake of consistency.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 07 August 2017: 1 commit
-
-
By Julien Thierry

When receiving unhandled faults from the CPU, the printed description is very sparse. Add information about the fault decoded from the ESR, and add defines to esr.h for the corresponding ESR fields.

Values are based on the ARM Architecture Reference Manual (DDI 0487B.a), section D7.2.28 ESR_ELx, Exception Syndrome Register (ELx) (pages D7-2275 to D7-2280).

New output is of the form:

  [   77.818059] Mem abort info:
  [   77.820826]   Exception class = DABT (current EL), IL = 32 bits
  [   77.826706]   SET = 0, FnV = 0
  [   77.829742]   EA = 0, S1PTW = 0
  [   77.832849] Data abort info:
  [   77.835713]   ISV = 0, ISS = 0x00000070
  [   77.839522]   CM = 0, WnR = 1

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: fix "%lu" in a pr_alert() call]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
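A sketch of how such decoding is typically done, using ESR_ELx_* style masks and shifts; the exact field macro names here are assumptions based on the description above rather than a copy of the patch:

```c
static void mem_abort_decode(unsigned int esr)
{
	pr_alert("Mem abort info:\n");
	pr_alert("  Exception class = %s, IL = %u bits\n",
		 esr_get_class_string(esr),
		 (esr & ESR_ELx_IL) ? 32 : 16);
	pr_alert("  SET = %lu, FnV = %lu\n",
		 (esr & ESR_ELx_SET_MASK) >> ESR_ELx_SET_SHIFT,
		 (esr & ESR_ELx_FnV) >> ESR_ELx_FnV_SHIFT);
	pr_alert("  EA = %lu, S1PTW = %lu\n",
		 (esr & ESR_ELx_EA) >> ESR_ELx_EA_SHIFT,
		 (esr & ESR_ELx_S1PTW) >> ESR_ELx_S1PTW_SHIFT);
}
```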
-
- 04 August 2017: 1 commit
-
-
By Catalin Marinas

In a system with DBM (dirty bit management) capable agents there is a possible race between a CPU executing ptep_set_access_flags() (maybe non-DBM capable) and a hardware update of the dirty state (clearing of PTE_RDONLY). The scenario:

a) the pte is writable (PTE_WRITE set), clean (PTE_RDONLY set) and old (PTE_AF clear)
b) ptep_set_access_flags() is called as a result of a read access and it needs to set the pte to writable, clean and young (PTE_AF set)
c) a DBM-capable agent, as a result of a different write access, is marking the entry as young (setting PTE_AF) and dirty (clearing PTE_RDONLY)

The current ptep_set_access_flags() implementation would set the PTE_RDONLY bit in the resulting value, overriding the DBM update and losing the dirty state. This patch fixes such a race by setting PTE_RDONLY to the most permissive (lowest value) of the current entry and the new one.

Fixes: 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
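A simplified sketch of the "most permissive PTE_RDONLY wins" update inside ptep_set_access_flags(), written against a cmpxchg retry loop with the bit manipulation spelled out; the committed code may use a more compact form:

```c
	pte_t pte = READ_ONCE(*ptep);
	pteval_t old_pteval, pteval = pte_val(pte);

	do {
		old_pteval = pteval;

		/* merge in the new access flags (e.g. PTE_AF, PTE_WRITE) */
		pteval |= pte_val(entry);

		/*
		 * PTE_RDONLY is "most permissive wins": keep it set only if
		 * both the current entry and the new value have it, so a
		 * concurrent hardware dirty update (which clears PTE_RDONLY)
		 * is never undone.
		 */
		if (!(old_pteval & PTE_RDONLY) || !(pte_val(entry) & PTE_RDONLY))
			pteval &= ~PTE_RDONLY;

		pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
	} while (pteval != old_pteval);
```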
-
- 28 July 2017: 1 commit
-
-
By Will Deacon

The vast majority of virtual allocations in the vmalloc region are followed by a guard page, which can help to avoid overrunning one vma into another, which may map a read-sensitive device. This patch adds a guard page to the end of the kernel image mapping (i.e. following the data/bss segments).

Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 21 July 2017: 1 commit
-
-
By Punit Agrawal

When booting Linux with CONFIG_NUMA enabled on a system that provides no NUMA information, the following messages are printed during boot:

  NUMA: Faking a node at [mem 0x0000000000000000-0x00000083ffffffff]
  NUMA: Adding memblock [0x8000000000 - 0x8000e7ffff] on node 0
  NUMA: Adding memblock [0x8000e80000 - 0x83f65cffff] on node 0
  NUMA: Adding memblock [0x83f65d0000 - 0x83f665ffff] on node 0
  NUMA: Adding memblock [0x83f6660000 - 0x83f676ffff] on node 0
  NUMA: Adding memblock [0x83f6770000 - 0x83f678ffff] on node 0
  NUMA: Adding memblock [0x83f6790000 - 0x83fb82ffff] on node 0
  NUMA: Adding memblock [0x83fb830000 - 0x83fbc0ffff] on node 0
  NUMA: Adding memblock [0x83fbc10000 - 0x83fbdfffff] on node 0
  NUMA: Adding memblock [0x83fbe00000 - 0x83fbffffff] on node 0
  NUMA: Adding memblock [0x83fc000000 - 0x83fffbffff] on node 0
  NUMA: Adding memblock [0x83fffc0000 - 0x83fffdffff] on node 0
  NUMA: Adding memblock [0x83fffe0000 - 0x83ffffffff] on node 0
  NUMA: Initmem setup node 0 [mem 0x8000000000-0x83ffffffff]
  NUMA: NODE_DATA [mem 0x83fffec500-0x83fffedfff]

The information is then duplicated by core kernel messages right after the above output:

  Early memory node ranges
  node 0: [mem 0x0000008000000000-0x0000008000e7ffff]
  node 0: [mem 0x0000008000e80000-0x00000083f65cffff]
  node 0: [mem 0x00000083f65d0000-0x00000083f665ffff]
  node 0: [mem 0x00000083f6660000-0x00000083f676ffff]
  node 0: [mem 0x00000083f6770000-0x00000083f678ffff]
  node 0: [mem 0x00000083f6790000-0x00000083fb82ffff]
  node 0: [mem 0x00000083fb830000-0x00000083fbc0ffff]
  node 0: [mem 0x00000083fbc10000-0x00000083fbdfffff]
  node 0: [mem 0x00000083fbe00000-0x00000083fbffffff]
  node 0: [mem 0x00000083fc000000-0x00000083fffbffff]
  node 0: [mem 0x00000083fffc0000-0x00000083fffdffff]
  node 0: [mem 0x00000083fffe0000-0x00000083ffffffff]
  Initmem setup node 0 [mem 0x0000008000000000-0x00000083ffffffff]

Remove the duplication of memblock layout information printed during boot by dropping these messages from the arm64 NUMA initialisation.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 20 July 2017: 1 commit
-
-
By Vladimir Murzin

Christoph noticed [1] that the default DMA pool in its current form overloads the DMA coherent infrastructure. In reply, Robin suggested [2] splitting the per-device vs. global pool interfaces, so that allocation/release from the default DMA pool is driven by the dma ops implementation.

This patch implements Robin's idea and provides an interface to allocate/release/mmap the default (aka global) DMA pool. To make it clear that the existing *_from_coherent routines work on the per-device pool, rename them to *_from_dev_coherent.

[1] https://lkml.org/lkml/2017/7/7/370
[2] https://lkml.org/lkml/2017/7/7/431

Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Andras Szemzo <sza@esh.hu>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 13 July 2017: 1 commit
-
-
By Rik van Riel

When RLIMIT_STACK is, for example, 256MB, the current code results in a gap between the top of the task and mmap_base of 256MB, failing to take into account the amount by which the stack address was randomized. In other words, the stack gets less than RLIMIT_STACK space.

Ensure that the gap between the stack and mmap_base always takes stack randomization and the stack guard gap into account.

Obtained from Daniel Micay's linux-hardened tree.

Link: http://lkml.kernel.org/r/20170622200033.25714-3-riel@redhat.com
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Florian Weimer <fweimer@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
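A sketch of an mmap_base() calculation that accounts for both the guard gap and stack randomization, in the style of arm64's mm/mmap.c; treat the constants and helper names (MIN_GAP, MAX_GAP, STACK_RND_MASK, stack_guard_gap) as assumptions of that layout:

```c
static unsigned long mmap_base(unsigned long rnd)
{
	unsigned long gap = rlimit(RLIMIT_STACK);
	/* room consumed by stack randomization plus the guard gap */
	unsigned long pad = (STACK_RND_MASK << PAGE_SHIFT) + stack_guard_gap;

	/* values close to RLIM_INFINITY can overflow */
	if (gap + pad > gap)
		gap += pad;

	if (gap < MIN_GAP)
		gap = MIN_GAP;
	else if (gap > MAX_GAP)
		gap = MAX_GAP;

	return PAGE_ALIGN(STACK_TOP - gap - rnd);
}
```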
-
- 11 July 2017: 1 commit
-
-
By Andrey Ryabinin

We used to read several bytes of the shadow memory in advance; therefore additional shadow memory was mapped to prevent a crash if a speculative load happened near the end of the mapped shadow memory. Now we don't have such speculative loads, so we no longer need to map additional shadow memory.

Link: http://lkml.kernel.org/r/20170601162338.23540-3-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 07 July 2017: 3 commits
-
-
By Punit Agrawal

A poisoned or migrated hugepage is stored as a swap entry in the page tables. On architectures that support hugepages consisting of contiguous page table entries (such as arm64) this leads to ambiguity in determining the page table entry to return in huge_pte_offset() when a poisoned entry is encountered.

Let's remove the ambiguity by adding a size parameter to convey additional information about the requested address. Also fix up the definition/usage of huge_pte_offset() throughout the tree.

Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.com
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE)
Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Steve Capper

We don't need to call huge_pte_offset() as our accessors are already supplied with the pte_t *. This patch removes those spurious calls.

[punit.agrawal@arm.com: resolve rebase conflicts due to patch re-ordering]
Link: http://lkml.kernel.org/r/20170524115409.31309-3-punit.agrawal@arm.com
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Steve Capper

Patch series "Support for contiguous pte hugepages", v4.

This patchset updates the hugetlb code to fix issues arising from contiguous pte hugepages (such as on arm64). Compared to v3, this version addresses a build failure on arm64 by including two cleanup patches. Other than the arm64 cleanups, the rest are generic code changes. The remaining arm64 support based on these patches will be posted separately. The patches are based on v4.12-rc2.

Previous related postings can be found at [0], [1], [2], and [3].

The patches fall into three categories -
* Patches 1-2 - arm64 cleanups required to greatly simplify changing the huge_pte_offset() prototype in Patch 5. Catalin, Will - are you happy for these patches to go via mm?
* Patches 3-4 address issues with gup
* Patches 5-8 relate to passing a size argument to hugepage helpers to disambiguate the size of the referred page. These changes are required to enable arch code to properly handle swap entries for contiguous pte hugepages.

The changes to huge_pte_offset() (patch 5) touch multiple architectures but I've managed to minimise these changes for the other affected functions - huge_pte_clear() and set_huge_pte_at().

These patches gate the enabling of contiguous hugepages support on arm64 which has been requested for systems using !4k page granule.

The ARM64 architecture supports two flavours of hugepages -

* Block mappings at the pud/pmd level

These are regular hugepages where a pmd or a pud page table entry points to a block of memory. Depending on the PAGE_SIZE in use the following sizes of block mappings are supported -

          PMD    PUD
          ---    ---
  4K:      2M     1G
  16K:    32M
  64K:   512M

For certain applications/usecases such as HPC and large enterprise workloads, folks are using 64k page size but the minimum hugepage size of 512MB isn't very practical. To overcome this ...

* Using the Contiguous bit

The architecture provides a contiguous bit in the translation table entry which acts as a hint to the mmu to indicate that it is one of a contiguous set of entries that can be cached in a single TLB entry.

We use the contiguous bit in Linux to increase the mapping size at the pmd and pte (last) level. The number of supported contiguous entries varies by page size and level of the page table.

Using the contiguous bit allows additional hugepage sizes -

         CONT PTE    PMD    CONT PMD    PUD
         --------    ---    --------    ---
  4K:         64K     2M         32M     1G
  16K:         2M    32M          1G
  64K:         2M   512M         16G

Of these, 64K with 4K and 2M with 64K pages have been explicitly requested by a few different users.

Entries with the contiguous bit set are required to be modified all together - which makes things like memory poisoning and migration impossible to do correctly without knowing the size of hugepage being dealt with - the reason for adding a size parameter to a few of the hugepage helpers in this series.

This patch (of 8):

As we regularly check for contiguous pte's in the huge accessors, remove this extra check from find_num_contig.

[punit.agrawal@arm.com: resolve rebase conflicts due to patch re-ordering]
Link: http://lkml.kernel.org/r/20170524115409.31309-2-punit.agrawal@arm.com
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 June 2017: 3 commits
-
-
By Tyler Baicar

Currently external aborts are unsupported by the guest abort handling. Add handling for SEAs so that the host kernel reports SEAs which occur in the guest kernel.

When an SEA occurs in the guest kernel, the guest exits and is routed to kvm_handle_guest_abort(). Prior to this patch, a print message of an unsupported FSC would be printed and nothing else would happen. With this patch, the code gets routed to the APEI handling of SEAs in the host kernel to report the SEA information.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Tyler Baicar

The ARM APEI extension proposal added the SEA (Synchronous External Abort) notification type for ARMv8. Add a new GHES error source handling function for SEA. If an error source's notification type is SEA, then this function can be registered into the SEA exception handler. That way GHES will parse and report SEA exceptions when they occur.

An SEA can interrupt code that had interrupts masked and is treated as an NMI. To aid this, the page of address space used for mapping APEI buffers while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is changed to use the helper methods to find the prot_t to map with, in the same way as ghes_ioremap_pfn_irq().

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Tyler Baicar

SEA exceptions are often caused by an uncorrected hardware error, and are handled when the data abort and instruction abort exception classes have specific values for their Fault Status Code. When an SEA occurs, report the error in the kernel logs before killing the process.

Update fault_info[] with the specific SEA faults so that the new SEA handler is used.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
[will: use NULL instead of 0 when assigning si_addr]
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 20 June 2017: 1 commit
-
-
By Christoph Hellwig

The dma alloc interface reports an error by returning NULL, and the mapping interfaces rely on the mapping_error method, which the dummy ops already implement correctly. Thus remove the DMA_ERROR_CODE define.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
-
- 15 June 2017: 1 commit
-
-
By Olav Haugan

The current null-pointer check in __dma_alloc_coherent and __dma_free_coherent is not needed anymore, since the __dma_alloc/__dma_free functions won't be called if !dev (the dummy ops will be called instead).

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 12 June 2017: 6 commits
-
-
By Punit Agrawal

Re-organise the perf accounting for fault handling in preparation for enabling handling of hardware poison faults in subsequent commits. The change updates the perf accounting to be in line with the behaviour on x86.

With this update, the perf fault accounting:
* Always reports PERF_COUNT_SW_PAGE_FAULTS
* Doesn't report anything else for VM_FAULT_ERROR (which includes hwpoison faults)
* Reports PERF_COUNT_SW_PAGE_FAULTS_MAJ if it's a major fault (indicated by VM_FAULT_MAJOR)
* Otherwise, reports PERF_COUNT_SW_PAGE_FAULTS_MIN

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
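The bullet points above map onto a fragment of roughly this shape inside the fault handler (a hedged sketch; the local variable names, helper signature and the handle_error label are assumptions):

```c
	/* always count the fault, whatever its outcome */
	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);

	fault = __do_page_fault(mm, addr, mm_flags, vm_flags, tsk);

	if (fault & VM_FAULT_ERROR)
		goto handle_error;	/* OOM, SIGBUS, hwpoison: no MAJ/MIN event */

	if (fault & VM_FAULT_MAJOR)
		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, addr);
	else
		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, addr);
```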
-
Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar to VM_FAULT_OOM; the only difference is that a different si_code (BUS_MCEERR_AR) is passed to user space and the si_addr_lsb field is initialized.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
(fix new __do_user_fault call-site)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
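A sketch of the delta relative to the OOM path: on hwpoison the signal is SIGBUS with BUS_MCEERR_AR, and si_addr_lsb conveys the granule size of the poisoned mapping. Names follow the generic fault-handling conventions and are not the verbatim patch:

```c
	unsigned int lsb = 0;

	if (fault & VM_FAULT_HWPOISON_LARGE)
		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
	else if (fault & VM_FAULT_HWPOISON)
		lsb = PAGE_SHIFT;

	si.si_signo = SIGBUS;
	si.si_code = BUS_MCEERR_AR;		/* action-required machine check */
	si.si_addr = (void __user *)addr;
	si.si_addr_lsb = lsb;			/* log2 of the poisoned mapping size */
	force_sig_info(SIGBUS, &si, tsk);
```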
-
By Punit Agrawal

When memory failure is enabled, a poisoned hugepage pte is marked as a swap entry. huge_pte_offset() does not return the poisoned page table entries when it encounters PUD/PMD hugepages. This behaviour of huge_pte_offset() leads to errors such as the following when munmap is called on poisoned hugepages:

  [ 344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

Fix huge_pte_offset() to return the poisoned pte, which is then appropriately handled by the generic layer code.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Woods <dwoods@mellanox.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Will Deacon

Whilst debugging a remote crash, I noticed that show_pte is unhelpful when it comes to describing the structure of the page table being walked. This is easily fixed by printing out the page table (swapper vs user), page size and virtual address size when displaying the PGD address.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Kristina Martsenko

Print out the name of the file associated with the vma that faulted. This is usually the executable or shared library name. We already print out the task name, but also printing the library name is useful for pinpointing bugs to libraries.

Also print the base address and size of the vma, which together with the PC (printed by __show_regs) gives the offset into the library.

Fault prints now look like:

  test[2361]: unhandled level 2 translation fault (11) at 0x00000012, esr 0x92000006, in libfoo.so[ffffa0145000+1000]

This is already done on x86; for more details see commit 03252919 ("x86: print which shared library/executable faulted in segfault etc. messages v3").

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Kristina Martsenko

When we take a fault from EL0 that can't be handled, we print out the page table entries associated with the faulting address. This allows userspace to print out any current page table entries, including kernel (TTBR1) entries. Exposing kernel mappings like this could pose a security risk, so don't print out page table information on EL0 faults. (But still print it out for EL1 faults.)

This also follows the same behaviour as x86, printing out page table entries on kernel mode faults but not user mode faults.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
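The effect can be summarised by a guard of this kind around the page-table dump (a sketch of the behaviour described, not the literal diff):

```c
	/*
	 * Only dump page table entries for faults taken from EL1; EL0
	 * (user) faults must not leak kernel (TTBR1) mappings to dmesg.
	 */
	if (!user_mode(regs))
		show_pte(addr);
```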
-