1. 03 Mar 2014, 1 commit
  2. 08 Feb 2014, 2 commits
    • arm64: asm: remove redundant "cc" clobbers · 95c41896
      Will Deacon authored
      cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
      from inline asm blocks that only use these instructions to implement
      conditional branches.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
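      By way of illustration, here is a minimal sketch of the kind of inline
      asm block this commit touches; it is not taken from the kernel sources
      (the function name and the plain int pointer are illustrative). The only
      branch is a cbnz, which never writes NZCV, so no "cc" clobber is listed:

      	static inline void sketch_atomic_add(int i, int *counter)
      	{
      		unsigned long tmp;
      		int result;

      		asm volatile(
      		"1:	ldxr	%w0, %2\n"
      		"	add	%w0, %w0, %w3\n"
      		"	stxr	%w1, %w0, %2\n"
      		"	cbnz	%w1, 1b"	/* cbnz leaves the flags alone */
      		: "=&r" (result), "=&r" (tmp), "+Q" (*counter)
      		: "Ir" (i));		/* note: no "cc" clobber needed */
      	}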
    • arm64: atomics: fix use of acquire + release for full barrier semantics · 8e86f0b4
      Will Deacon authored
      Linux requires a number of atomic operations to provide full barrier
      semantics, that is, no memory accesses after the operation can be
      observed before any accesses up to and including the operation in
      program order.
      
      On arm64, these operations have been incorrectly implemented as follows:
      
      	// A, B, C are independent memory locations
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldaxr	x0, [B]		// Exclusive load with acquire
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      
      	<Access [C]>
      
      The assumption here being that two half barriers are equivalent to a
      full barrier, so the only permitted ordering would be A -> B -> C
      (where B is the atomic operation involving both a load and a store).
      
      Unfortunately, this is not the case by the letter of the architecture
      and, in fact, the accesses to A and C are permitted to pass their
      nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
      or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
      store-release on B). This is a clear violation of the full barrier
      requirement.
      
      The simple way to fix this is to implement the same algorithm as ARMv7
      using explicit barriers:
      
      	<Access [A]>
      
      	// atomic_op (B)
      	dmb	ish		// Full barrier
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stxr	w1, x0, [B]	// Exclusive store
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      but this has the undesirable effect of introducing *two* full barrier
      instructions. A better approach is actually the following, non-intuitive
      sequence:
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      The simple observations here are:
      
        - The dmb ensures that no subsequent accesses (e.g. the access to C)
          can enter or pass the atomic sequence.
      
        - The dmb also ensures that no prior accesses (e.g. the access to A)
          can pass the atomic sequence.
      
        - Therefore, no prior access can pass a subsequent access, or
          vice-versa (i.e. A is strictly ordered before C).
      
        - The stlxr ensures that no prior access can pass the store component
          of the atomic operation.
      
      The only tricky part remaining is the ordering between the ldxr and the
      access to A, since the absence of the first dmb means that we're now
      permitting re-ordering between the ldxr and any prior accesses.
      
      From an (arbitrary) observer's point of view, there are two scenarios:
      
        1. We have observed the ldxr. This means that if we perform a store to
           [B], the ldxr will still return older data. If we can observe the
           ldxr, then we can potentially observe the permitted re-ordering
           with the access to A, which is clearly an issue when compared to
           the dmb variant of the code. Thankfully, the exclusive monitor will
           save us here since it will be cleared as a result of the store and
           the ldxr will retry. Notice that any use of a later memory
           observation to imply observation of the ldxr will also imply
           observation of the access to A, since the stlxr/dmb ensure strict
           ordering.
      
        2. We have not observed the ldxr. This means we can perform a store
           and influence the later ldxr. However, that doesn't actually tell
           us anything about the access to [A], so we've not lost anything
           here either when compared to the dmb variant.
      
      This patch implements this solution for our barriered atomic operations,
      ensuring that we satisfy the full barrier requirements where they are
      needed.
      
      Cc: <stable@vger.kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
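      A hedged sketch of the resulting sequence for a fully ordered
      read-modify-write (illustrative code, not the literal kernel diff):
      a plain ldxr, an stlxr for the release half, and a trailing dmb ish,
      exactly the shape argued for above.

      	static inline int sketch_atomic_add_return(int i, int *counter)
      	{
      		unsigned long tmp;
      		int result;

      		asm volatile(
      		"1:	ldxr	%w0, %2\n"	/* exclusive load, no acquire */
      		"	add	%w0, %w0, %w3\n"
      		"	stlxr	%w1, %w0, %2\n"	/* exclusive store with release */
      		"	cbnz	%w1, 1b\n"
      		"	dmb	ish"		/* full barrier after the store */
      		: "=&r" (result), "=&r" (tmp), "+Q" (*counter)
      		: "Ir" (i)
      		: "memory");

      		return result;
      	}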
  3. 06 Feb 2014, 1 commit
  4. 05 Feb 2014, 3 commits
  5. 31 Jan 2014, 2 commits
  6. 27 Jan 2014, 1 commit
  7. 17 Jan 2014, 1 commit
  8. 12 Jan 2014, 1 commit
    • arch: Introduce smp_load_acquire(), smp_store_release() · 47933ad4
      Peter Zijlstra authored
      A number of situations currently require the heavyweight smp_mb(),
      even though there is no need to order prior stores against later
      loads.  Many architectures have much cheaper ways to handle these
      situations, but the Linux kernel currently has no portable way
      to make use of them.
      
      This commit therefore supplies smp_load_acquire() and
      smp_store_release() to remedy this situation.  The new
      smp_load_acquire() primitive orders the specified load against
      any subsequent reads or writes, while the new smp_store_release()
      primitive orders the specified store against any prior reads or
      writes.  These primitives allow array-based circular FIFOs to be
      implemented without an smp_mb(), and also allow a theoretical
      hole in rcu_assign_pointer() to be closed at no additional
      expense on most architectures.
      
      In addition, the RCU experience transitioning from explicit
      smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
      and rcu_assign_pointer(), respectively, resulted in substantial
      improvements in readability.  It therefore seems likely that
      replacing other explicit barriers with smp_load_acquire() and
      smp_store_release() will provide similar benefits.  It appears
      that roughly half of the explicit barriers in core kernel code
      might be so replaced.
      
      [Changelog by PaulMck]
      Reviewed-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
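      A hedged usage sketch of the new primitives in the classic
      message-passing pattern; the data/flag variables and function names are
      illustrative, only smp_store_release()/smp_load_acquire() come from the
      patch.

      	static int data;
      	static int flag;

      	/* producer */
      	void publish(int val)
      	{
      		data = val;
      		smp_store_release(&flag, 1);	/* data store ordered before flag store */
      	}

      	/* consumer */
      	int try_consume(int *out)
      	{
      		if (!smp_load_acquire(&flag))	/* flag load ordered before data load */
      			return 0;
      		*out = data;
      		return 1;
      	}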
  9. 08 Jan 2014, 5 commits
  10. 28 Dec 2013, 1 commit
  11. 20 Dec 2013, 9 commits
  12. 18 Dec 2013, 1 commit
  13. 17 Dec 2013, 5 commits
    • arm64: enable generic clockevent broadcast · 1f85008e
      Lorenzo Pieralisi authored
      On platforms with power management capabilities, timers that are shut
      down when a CPU enters deep C-states must be emulated using an always-on
      timer and a timer IPI to relay the timer IRQ to target CPUs on an SMP
      system.
      
      This patch enables the generic clockevents broadcast infrastructure for
      arm64, by providing the required Kconfig entries and adding the timer
      IPI infrastructure.
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
    • arm64: kernel: cpu_{suspend/resume} implementation · 95322526
      Lorenzo Pieralisi authored
      Kernel subsystems like CPU idle and suspend to RAM require a generic
      mechanism to suspend a processor, save its context and put it into
      a quiescent state. The cpu_{suspend}/{resume} implementation provides
      such a framework through a kernel interface that allows saving/restoring
      registers, flushing the context to DRAM and suspending/resuming to/from
      low-power states where processor context may be lost.
      
      The CPU suspend implementation relies on the suspend protocol registered
      in CPU operations to carry out a suspend request after context is
      saved and flushed to DRAM. The cpu_suspend interface:
      
      int cpu_suspend(unsigned long arg);
      
      allows callers to pass an opaque parameter that is handed over to the
      suspend CPU operations back-end so that it can take action according to
      the semantics attached to it. The arg parameter allows suspend to RAM and
      CPU idle drivers to communicate with suspend protocol back-ends; it
      requires standardization so that the interface can be reused seamlessly
      across systems, paving the way for generic drivers.
      
      Context memory is allocated on the stack, whose address is stashed in a
      per-cpu variable to keep track of it and passed to core functions that
      save/restore the registers required by the architecture.
      
      Even though, upon successful execution, the cpu_suspend function shuts
      down the suspending processor, the warm boot resume mechanism, based
      on the cpu_resume function, makes the resume path operate as a
      cpu_suspend function return, so that cpu_suspend can be treated as a C
      function by the caller, which simplifies coding the PM drivers that rely
      on the cpu_suspend API.
      
      Upon context save, the minimal amount of memory is flushed to DRAM so
      that it can be retrieved when the MMU is off and caches are not searched.
      
      The suspend CPU operation, depending on the required operations (e.g. CPU
      vs cluster shutdown), is in charge of flushing the cache hierarchy either
      implicitly (by calling firmware implementations like PSCI) or explicitly
      by executing the required cache maintenance functions.
      
      Debug exceptions are disabled during cpu_{suspend}/{resume} operations
      so that debug registers can be saved and restored properly, preventing
      preemption from debug agents enabled in the kernel.
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
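      A hedged sketch of how a PM back-end might sit on top of the interface
      quoted above; only the int cpu_suspend(unsigned long arg) prototype comes
      from the commit message, the surrounding driver code is hypothetical.

      	static int sketch_enter_idle_state(unsigned long state_arg)
      	{
      		int ret;

      		/*
      		 * On a successful suspend/resume cycle control returns here
      		 * through the warm-boot cpu_resume path, so cpu_suspend() can
      		 * be treated as an ordinary C call by the driver.
      		 */
      		ret = cpu_suspend(state_arg);	/* arg is handed to the suspend back-end */

      		return ret;
      	}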
    • arm64: kernel: suspend/resume registers save/restore · 6732bc65
      Lorenzo Pieralisi authored
      Power management software requires the kernel to save and restore
      CPU registers while going through suspend and resume operations
      triggered by kernel subsystems like CPU idle and suspend to RAM.
      
      This patch implements code that provides a save and restore mechanism
      for the ARMv8 implementation. Memory for the context is passed as a
      parameter to both the cpu_do_suspend and cpu_do_resume functions, which
      allows the callers to implement context allocation as they deem fit.
      
      The registers that are saved and restored correspond to the register set
      actually required by the kernel to be up and running, which represents a
      subset of the ARMv8 ISA.
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
    • arm64: kernel: build MPIDR_EL1 hash function data structure · 976d7d3f
      Lorenzo Pieralisi authored
      On ARM64 SMP systems, cores are identified by their MPIDR_EL1 register.
      The MPIDR_EL1 guidelines in the ARM ARM do not strictly enforce an
      MPIDR_EL1 layout, only recommendations that, if followed, split the
      MPIDR_EL1 on ARM 64-bit platforms into four affinity levels. In
      multi-cluster systems like big.LITTLE, if the affinity guidelines are
      followed, the MPIDR_EL1 cannot be considered a linear index. This means
      that the association between a logical CPU in the kernel and the HW CPU
      identifier becomes somewhat more complicated, requiring methods like
      hashing to associate a given MPIDR_EL1 with a CPU logical index so that
      the look-up can be carried out in an efficient and scalable way.
      
      This patch provides a kernel function that, starting from the
      cpu_logical_map, implements collision-free hashing of MPIDR_EL1 values by
      checking all significant bits of the MPIDR_EL1 affinity level bitfields.
      The hashing can then be carried out through bit shifting and ORing; the
      resulting hash algorithm is a collision-free, though not minimal, hash
      that can be executed with a few assembly instructions. The MPIDR_EL1 is
      filtered through an mpidr mask that is built by checking all bits that
      toggle in the set of MPIDR_EL1s corresponding to possible CPUs. Bits that
      do not toggle do not carry information, so they do not contribute to the
      resulting hash.
      
      Pseudo code:
      
      /* check all bits that toggle, so they are required */
      for (i = 1, mpidr_el1_mask = 0; i < num_possible_cpus(); i++)
      	mpidr_el1_mask |= (cpu_logical_map(i) ^ cpu_logical_map(0));
      
      /*
       * Build shifts to be applied to aff0, aff1, aff2, aff3 values to hash the
       * mpidr_el1
       * fls() returns the last bit set in a word, 0 if none
       * ffs() returns the first bit set in a word, 0 if none
       */
      fs0 = mpidr_el1_mask[7:0] ? ffs(mpidr_el1_mask[7:0]) - 1 : 0;
      fs1 = mpidr_el1_mask[15:8] ? ffs(mpidr_el1_mask[15:8]) - 1 : 0;
      fs2 = mpidr_el1_mask[23:16] ? ffs(mpidr_el1_mask[23:16]) - 1 : 0;
      fs3 = mpidr_el1_mask[39:32] ? ffs(mpidr_el1_mask[39:32]) - 1 : 0;
      ls0 = fls(mpidr_el1_mask[7:0]);
      ls1 = fls(mpidr_el1_mask[15:8]);
      ls2 = fls(mpidr_el1_mask[23:16]);
      ls3 = fls(mpidr_el1_mask[39:32]);
      bits0 = ls0 - fs0;
      bits1 = ls1 - fs1;
      bits2 = ls2 - fs2;
      bits3 = ls3 - fs3;
      aff0_shift = fs0;
      aff1_shift = 8 + fs1 - bits0;
      aff2_shift = 16 + fs2 - (bits0 + bits1);
      aff3_shift = 32 + fs3 - (bits0 + bits1 + bits2);
      u32 hash(u64 mpidr_el1) {
      	u32 l[4];
      	u64 mpidr_el1_masked = mpidr_el1 & mpidr_el1_mask;
      	l[0] = mpidr_el1_masked & 0xff;
      	l[1] = mpidr_el1_masked & 0xff00;
      	l[2] = mpidr_el1_masked & 0xff0000;
      	l[3] = mpidr_el1_masked & 0xff00000000;
      	return (l[0] >> aff0_shift | l[1] >> aff1_shift | l[2] >> aff2_shift |
      		l[3] >> aff3_shift);
      }
      
      The hashing algorithm relies on the inherent properties set in the ARM ARM
      recommendations for the MPIDR_EL1. Exotic configurations, where for instance
      the MPIDR_EL1 values at a given affinity level have large holes, can end up
      requiring big hash tables since the compression of values that can be achieved
      through shifting is somewhat crippled when holes are present. The kernel warns if
      the number of buckets of the resulting hash table exceeds the number of
      possible CPUs by a factor of 4, which is a symptom of a very sparse HW
      MPIDR_EL1 configuration.
      
      The hash algorithm is quite simple and can easily be implemented in assembly
      code, to be used in code paths where the kernel virtual address space is
      not set up (i.e. cpu_resume) and instruction and data fetches are strongly
      ordered, so the code must be compact and must carry out few data accesses.
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
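      A self-contained, compilable sketch of the pseudo code above, applied to
      a hypothetical two-cluster layout (MPIDR_EL1 values 0x0, 0x1, 0x100 and
      0x101); it is illustrative only and does not reuse kernel code.

      	#include <stdint.h>
      	#include <stdio.h>

      	/* ffs/fls as used in the pseudo code: 1-based bit index, 0 if no bit set */
      	static int bit_ffs(uint64_t v) { return v ? __builtin_ctzll(v) + 1 : 0; }
      	static int bit_fls(uint64_t v) { return v ? 64 - __builtin_clzll(v) : 0; }

      	int main(void)
      	{
      		/* hypothetical two-cluster system, two CPUs per cluster */
      		const uint64_t mpidr[] = { 0x000, 0x001, 0x100, 0x101 };
      		const int ncpus = sizeof(mpidr) / sizeof(mpidr[0]);
      		uint64_t mask = 0;

      		/* collect the bits that toggle across all possible CPUs */
      		for (int i = 1; i < ncpus; i++)
      			mask |= mpidr[i] ^ mpidr[0];

      		/* affinity fields of the mask: aff0/1/2 at bits 0/8/16, aff3 at bit 32 */
      		const uint64_t field[4] = { mask & 0xff, (mask >> 8) & 0xff,
      					    (mask >> 16) & 0xff, (mask >> 32) & 0xff };
      		const int base[4] = { 0, 8, 16, 32 };
      		int shift[4], carried = 0;

      		for (int l = 0; l < 4; l++) {
      			int fs = field[l] ? bit_ffs(field[l]) - 1 : 0;
      			int bits = bit_fls(field[l]) - fs;

      			shift[l] = base[l] + fs - carried;	/* pack the used bits together */
      			carried += bits;
      		}

      		for (int i = 0; i < ncpus; i++) {
      			uint64_t m = mpidr[i] & mask;
      			uint32_t hash = (uint32_t)(((m & 0xffULL) >> shift[0]) |
      						   ((m & 0xff00ULL) >> shift[1]) |
      						   ((m & 0xff0000ULL) >> shift[2]) |
      						   ((m & 0xff00000000ULL) >> shift[3]));

      			printf("MPIDR_EL1 0x%03llx -> hash %u\n",
      			       (unsigned long long)mpidr[i], hash);
      		}
      		return 0;
      	}

      For this layout the mask comes out as 0x101, the shifts as 0/7/14/30, and
      the four MPIDR_EL1 values hash collision-free to 0, 1, 2 and 3, packed
      into two bits.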
    • arm64: kernel: add MPIDR_EL1 accessors macros · b058450f
      Lorenzo Pieralisi authored
      In order to simplify access to different affinity levels within the
      MPIDR_EL1 register values, this patch implements some preprocessor
      macros that allow retrieving the MPIDR_EL1 affinity level value according
      to the level passed as an input parameter.
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
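      A hedged sketch of the kind of accessor the commit describes, written
      from the commit text rather than copied from the patch, so the exact
      kernel macro names and definitions may differ. Affinity levels 0..2 sit
      at byte offsets 0/8/16 and level 3 at bit 32:

      	#define MPIDR_LEVEL_BITS		8
      	#define MPIDR_LEVEL_MASK		((1UL << MPIDR_LEVEL_BITS) - 1)

      	/* byte offset of each affinity level: 0, 8, 16 for levels 0-2, 32 for level 3 */
      	#define MPIDR_LEVEL_SHIFT(level)	(((1 << (level)) >> 1) << 3)

      	/* extract the requested affinity level from an MPIDR_EL1 value */
      	#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
      		(((mpidr) >> MPIDR_LEVEL_SHIFT(level)) & MPIDR_LEVEL_MASK)

      Under this sketch, MPIDR_AFFINITY_LEVEL(0x80000102, 1) evaluates to 0x01.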
  14. 12 Dec 2013, 2 commits
    • arm/arm64: kvm: Use virt_to_idmap instead of virt_to_phys for idmap mappings · 4fda342c
      Santosh Shilimkar authored
      KVM initialisation fails on architectures implementing virt_to_idmap()
      because virt_to_phys() on such architectures does not return the correct
      idmap page.
      
      So update the KVM ARM code to use the virt_to_idmap() to fix the issue.
      Since the KVM code is shared between arm and arm64, we create
      kvm_virt_to_phys() and handle the redirection in respective headers.
      
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
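      A hedged sketch of the redirection the commit describes; the two
      definitions below paraphrase what each architecture's kvm_mmu.h would
      provide and are not quoted from the patch.

      	/* 32-bit ARM header: the idmap may be offset from the linear map,
      	 * so go through virt_to_idmap() */
      	#define kvm_virt_to_phys(x)	virt_to_idmap((unsigned long)(x))

      	/* arm64 header: the idmap is a plain 1:1 mapping of the linear map,
      	 * so an ordinary virtual-to-physical conversion already fits */
      	#define kvm_virt_to_phys(x)	__virt_to_phys((unsigned long)(x))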
    • xen/arm64: do not call the swiotlb functions twice · 02ab71cd
      Stefano Stabellini authored
      On arm64 the dma_map_ops implementation is based on the swiotlb.
      swiotlb-xen, used by default in dom0 on Xen, is also based on the
      swiotlb.
      
      Avoid calling into the default arm64 dma_map_ops functions from
      xen_dma_map_page, xen_dma_unmap_page, xen_dma_sync_single_for_cpu, and
      xen_dma_sync_single_for_device; otherwise we end up calling into the
      swiotlb twice.
      
      When arm64 gets a non-swiotlb based implementation of dma_map_ops, we'll
      probably have to reintroduce dma_map_ops calls in page-coherent.h.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      CC: catalin.marinas@arm.com
      CC: Will.Deacon@arm.com
      CC: Ian.Campbell@citrix.com
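      A hedged sketch of what the arm64 hooks look like after this change;
      the prototypes are paraphrased from that era's asm/xen/page-coherent.h
      and may differ in detail. The point is that the bodies do nothing, so the
      swiotlb is entered only once, via swiotlb-xen.

      	static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
      					    unsigned long offset, size_t size,
      					    enum dma_data_direction dir,
      					    struct dma_attrs *attrs)
      	{
      		/* intentionally empty: swiotlb-xen has already done the work */
      	}

      	static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
      						       dma_addr_t handle, size_t size,
      						       enum dma_data_direction dir)
      	{
      		/* likewise a no-op on arm64 */
      	}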
  15. 07 Dec 2013, 2 commits
  16. 29 Nov 2013, 2 commits
  17. 26 Nov 2013, 1 commit