- 12 August 2016 (1 commit)

By Kees Cook
Guided by grsecurity's analogous __read_only markings in arch/arm, this applies several uses of __ro_after_init to structures that are only updated during __init.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
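
A minimal sketch of the annotation in question (the variable here is a hypothetical example; the patch itself annotates existing arch/arm structures): data marked __ro_after_init stays writable while __init code runs and is write-protected afterwards.

    #include <linux/cache.h>
    #include <linux/init.h>

    /* hypothetical boot-time tunable, writable only during __init */
    static unsigned long boot_tuning_value __ro_after_init;

    static int __init tuning_setup(void)
    {
            boot_tuning_value = 42;  /* last legal write */
            return 0;
    }
    early_initcall(tuning_setup);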

- 08 August 2016 (1 commit)

By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

- 23 June 2016 (1 commit)

By Nicolas Pitre
Now that we don't support ARMv3 anymore, the loop-based delay code can convert microseconds into a number of loops using a 64-bit multiplication and more precision. This allows us to lift the hard limit of 3355 on the bogomips value, as loops_per_jiffy may now safely span the full 32-bit range.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
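
A hedged sketch of the conversion this enables, with an illustrative helper name (the kernel's real routine is in assembly): keeping the intermediate product in 64 bits is what removes the old overflow cap.

    #include <linux/math64.h>
    #include <linux/types.h>

    /* loops = usecs * lpj * HZ / 1e6, intermediate kept in 64 bits */
    static inline u32 usecs_to_loops(u32 usecs, u32 lpj, u32 hz)
    {
            return (u32)div_u64((u64)usecs * lpj * hz, 1000000);
    }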

- 16 January 2016 (1 commit)

By Kirill A. Shutemov
With the new refcounting we don't need to mark PMDs splitting, so let's drop the code to handle this. pmdp_splitting_flush() is not needed either: on splitting a PMD we will do pmdp_clear_flush() + set_pte_at(), and pmdp_clear_flush() will do an IPI as needed for fast_gup.

[arnd@arndb.de: fix unterminated ifdef in header file]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 17 December 2015 (1 commit)

By Nicolas Pitre
The ARM compiler inserts calls to __aeabi_idiv() and __aeabi_uidiv() when it needs to perform division on signed and unsigned integers. If a processor has support for the sdiv and udiv instructions, the kernel may overwrite the beginning of those functions with those instructions and a "bx lr" to get better performance.

To ensure that those functions are aligned to a 32-bit word for easier patching (which might not always be the case in Thumb mode) and that the two patched instructions end up in the same cache line, an 8-byte alignment is enforced when ARM_PATCH_IDIV is selected.

This was heavily inspired by a previous patch from Stephen Boyd.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
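
A hedged sketch of the run-time patch this alignment makes easy; the function flow is illustrative, and the instruction encodings are assumptions noted in the comments, not copied from the kernel.

    #include <linux/init.h>
    #include <linux/types.h>
    #include <asm/cacheflush.h>
    #include <asm/hwcap.h>

    static void __init patch_uidiv_sketch(void)
    {
            extern char __aeabi_uidiv[];
            u32 *fn = (u32 *)__aeabi_uidiv;  /* 8-byte aligned per this patch */

            if (!(elf_hwcap & HWCAP_IDIVA))
                    return;

            fn[0] = 0xe730f110;  /* assumed encoding of "udiv r0, r0, r1" */
            fn[1] = 0xe12fff1e;  /* assumed encoding of "bx lr" */
            flush_icache_range((unsigned long)fn, (unsigned long)(fn + 2));
    }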

- 15 December 2015 (1 commit)

By Russell King
The uaccess_with_memcpy() code is currently incompatible with the SW PAN code: it takes locks within the region where we've changed the DACR, potentially sleeping as a result. As we do not save and restore the DACR across co-operative sleep events, this can lead to an incorrect DACR value later in this code path.

Reported-by: Peter Rosin <peda@axentia.se>
Tested-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 03 October 2015 (1 commit)

By Stephen Boyd
Add unwinding annotations so that unwinding from this function works properly.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 27 August 2015 (1 commit)

By Russell King
Provide a software-based implementation of the privileged no-access support found in ARMv8.1.

Userspace pages are mapped using a different domain number from the kernel and IO mappings. If we switch the user domain to "no access" when we enter the kernel, we can prevent the kernel from touching userspace. However, the kernel needs to be able to access userspace via the various user accessor functions. With the wrapping in the previous patch, we can temporarily enable access when the kernel needs user access, and re-disable it afterwards.

This allows us to trap non-intended accesses to userspace, e.g. those caused by an inadvertent dereference of the LIST_POISON* values, which, with appropriate user mappings set up, can be made to succeed. This in turn can allow use-after-free bugs to be further exploited than would otherwise be possible.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
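
A hedged sketch of the mechanism using the arm domain accessors; the wrapper names are illustrative rather than the commit's exact code.

    #include <asm/domain.h>

    /* temporarily grant kernel access to user pages */
    static inline unsigned int sw_pan_enable_uaccess(void)
    {
            unsigned int old = get_domain();

            set_domain((old & ~domain_mask(DOMAIN_USER)) |
                       domain_val(DOMAIN_USER, DOMAIN_CLIENT));
            return old;
    }

    static inline void sw_pan_restore(unsigned int old)
    {
            set_domain(old);  /* user domain back to "no access" */
    }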

- 25 August 2015 (1 commit)

By Russell King
Provide uaccess_save_and_enable() and uaccess_restore() to permit control of userspace visibility to the kernel, and hook these into the appropriate places in the kernel where we need to access userspace.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
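
The usage pattern, as a minimal sketch (the wrapper name is illustrative; arm_copy_to_user() is the raw assembly copier being bracketed):

    static inline unsigned long
    wrapped_copy_to_user(void __user *to, const void *from, unsigned long n)
    {
            unsigned int __ua_flags = uaccess_save_and_enable();

            n = arm_copy_to_user(to, from, n);
            uaccess_restore(__ua_flags);
            return n;
    }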

- 18 August 2015 (1 commit)

By Nicolas Pitre
The mmap semaphore should not be taken when page faults are disabled. Since pagefault_disable() no longer disables preemption, we now need to use faulthandler_disabled() in place of in_atomic().

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
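
A sketch of the check as it looks after the change, wrapped in a hypothetical helper:

    #include <linux/uaccess.h>
    #include <linux/mm_types.h>

    static bool may_take_mmap_sem(struct mm_struct *mm)
    {
            /* was: in_atomic() || !mm  ->  bail out to no_context */
            return !faulthandler_disabled() && mm;
    }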

- 04 July 2015 (1 commit)

By Russell King
We don't want GCC optimising our memset_io(), memcpy_fromio() or memcpy_toio() variants, so we must not call one of the standard functions. Provide a separate name for our assembly memcpy() and memset() functions, and use that instead, thereby bypassing GCC's ability to optimise these operations. GCC's optimisation may introduce unaligned accesses which are invalid for device mappings.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
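
A sketch of the resulting shape, assuming the separate assembly entry point is named mmiocpy as in the mainline version of this change:

    #include <linux/types.h>

    extern void mmiocpy(void *, const void *, size_t);  /* assembly memcpy alias */

    void _memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
    {
            /* a call GCC cannot replace with its own inline expansion */
            mmiocpy((void __force *)to, from, count);
    }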

- 26 May 2015 (1 commit)

By Antonio Ospite
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

- 09 May 2015 (1 commit)

By Russell King
BSYM() was invented to allow us to work around a problem with the assembler, where local symbols resolved by the assembler for the 'adr' instruction did not take account of their ISA.

Since we don't want BSYM() used elsewhere, replace BSYM() with a new macro 'badr', which is like the 'adr' pseudo-op, but with the BSYM() mechanics integrated into it. This ensures that the BSYM()-ification is only used in conjunction with 'adr'.

Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 15 April 2015 (1 commit)

By Russell King
We have recently had an example of someone wanting to use a 90kHz timer for the software delay loop. udelay() needs to have at least microsecond resolution to allow drivers access to a delay mechanism with a reasonable chance of delaying the period they requested within at least a 50% margin of error, especially for small delays.

Discussion about the udelay() accuracy can be found at: https://lkml.org/lkml/2011/1/9/37

Reject timers which are unable to supply this level of resolution.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
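
A hedged sketch of the rejection test, with an illustrative helper name: a timer tick longer than one microsecond cannot honour udelay()'s contract.

    #include <linux/kernel.h>
    #include <linux/time64.h>

    static bool delay_timer_resolution_ok(unsigned long freq)
    {
            u64 ns_per_tick = DIV_ROUND_UP_ULL((u64)NSEC_PER_SEC, freq);

            return ns_per_tick <= NSEC_PER_USEC;  /* effectively >= 1 MHz */
    }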

- 30 March 2015 (1 commit)

By Ard Biesheuvel
This moves all fixup snippets to the .text.fixup section, which is a special section that gets emitted along with the .text section for each input object file, i.e., the snippets are kept much closer to the code they refer to, which helps prevent linker failure on large kernels.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 16 January 2015 (1 commit)

By Nicolas Pitre
This code was restored with commit 080fc66f ("ARM: Bring back ARMv3 IO and user access code") because the RiscPC memory bus does not understand half-word load/stores. However, only the IO code needed restoring, since the alternative user access code contains no half-word accesses, is already used when CONFIG_PREEMPT is set, and runs faster on a StrongARM.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 28 November 2014 (3 commits)

By Lin Yongting
The memory copy functions (memcpy, __copy_from_user, __copy_to_user) never had unwinding annotations added. Currently, when one of these functions accesses an invalid pointer, the backtrace shown will stop at that function or at some completely unrelated function. Add unwinding annotations in the hope of getting a more useful backtrace in the following cases:

1. die on accessing an invalid pointer in one of these functions
2. kprobe trapped at any instruction within these functions
3. interrupted at any instruction within these functions

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

By Lin Yongting
The memmove function never had unwinding annotations added. Currently, when memmove accesses an invalid pointer, the backtrace shown will stop at memmove or at some completely unrelated function. Add unwinding annotations in the hope of getting a more useful backtrace in the following cases:

1. die on accessing an invalid pointer in memmove
2. kprobe trapped at any instruction within memmove
3. interrupted at any instruction within memmove

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

By Lin Yongting
The __memzero function never had unwinding annotations added. Currently, when __memzero accesses an invalid pointer, the backtrace shown will stop at __memzero or at some completely unrelated function. Add unwinding annotations in the hope of getting a more useful backtrace in the following cases:

1. die on accessing an invalid pointer in __memzero
2. kprobe trapped at any instruction within __memzero
3. interrupted at any instruction within __memzero

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 21 November 2014 (1 commit)

By Lin Yongting
The memset function never had unwinding annotations added. Currently, when memset accesses a NULL pointer, the backtrace shown will stop at memset or at some completely unrelated function. Add unwinding annotations in the hope of getting a more useful backtrace when memset dies on a NULL pointer, is trapped by a kprobe, or is interrupted.

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 13 September 2014 (1 commit)

By Victor Kamensky
Commit e38361d0 ("ARM: 8091/2: add get_user() support for 8 byte types") broke the V7 BE get_user() call when the target variable size is 64 bits but '*ptr' is 32 bits or smaller.

e38361d0 changed the type of __r2 from 'register unsigned long' to 'register typeof(x) __r2 asm("r2")', i.e. before the change, even when the target variable size was 64 bits, __r2 was still 32 bits. After that commit, for a 64-bit target variable, __r2 became 64 bits and now occupies two registers, r2 and r3. The issue in the BE case is that the r3 register holds the least significant word of __r2 and r2 holds the most significant word. But __get_user_4 still copies its result into r2 (the most significant word of __r2). Subsequent code copies from __r2 into x, but in the situation described it will pick up only garbage from the r3 register.

Special __get_user_64t_(124) functions are introduced. They are similar to the corresponding __get_user_(124) functions but store the result in the r3 register (the lsw in the case of a 64-bit __r2 in a BE image). Those functions are used by the get_user macro when the image is BE and the target variable size is 64 bits. Also, __get_user_lo8 is renamed to __get_user_32t_8 to get consistent naming across all cases.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
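
The regression case, as a minimal sketch: a 64-bit destination read through a 32-bit user pointer, where the 32-bit result must land in the low word of the register pair on a big-endian kernel.

    #include <linux/uaccess.h>
    #include <linux/types.h>

    static int read_u32_into_u64(u32 __user *uptr, u64 *out)
    {
            u64 v;

            if (get_user(v, uptr))  /* 32-bit load, 64-bit destination */
                    return -EFAULT;
            *out = v;  /* BE kernels picked up garbage here before the fix */
            return 0;
    }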

- 18 July 2014 (2 commits)

By Daniel Thompson
Recent contributions, including to DRM and binder, introduce 64-bit values in their interfaces. A common motivation for this is to allow the same ABI for 32- and 64-bit userspaces (and therefore also a shared ABI for 32/64 hybrid userspaces). Anyhow, the developers would like to avoid gotchas like having to use copy_from_user().

This feature is already implemented on x86-32 and the majority of other 32-bit architectures. The current list of get_user_8 holdout architectures is: arm, avr32, blackfin, m32r, metag, microblaze, mn10300, sh.

Credit: My name sits rather uneasily at the top of this patch. The v1 and v2 versions of the patch were written by Rob Clark, and to produce v4 I mostly copied code from Russell King and H. Peter Anvin. However I have mangled the patch sufficiently that *blame* is rightfully mine even if credit should be more widely shared.

Changelog:
v5: updated to use the ret macro (requested by Russell King)
v4: remove an inlined add on big endian systems (spotted by Russell King), used __ARMEB__ rather than BIG_ENDIAN (to match rest of file), cleared r3 on EFAULT during __get_user_8
v3: fix a couple of checkpatch issues
v2: pass correct size to check_uaccess, and better handling of narrowing double word read with __get_user_xb() (Russell King's suggestion)
v1: original

Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
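
What the change buys a driver, as a sketch (the helper is a hypothetical example): a 64-bit value can now be fetched with get_user() instead of copy_from_user().

    #include <linux/uaccess.h>
    #include <linux/types.h>

    static long get_u64_arg(unsigned long arg, u64 *val)
    {
            if (get_user(*val, (u64 __user *)arg))
                    return -EFAULT;
            return 0;
    }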

By Russell King
ARMv6 and greater introduced a new instruction ("bx") which can be used to return from function calls. Recent CPUs perform better when the "bx lr" instruction is used rather than the "mov pc, lr" instruction, and this sequence is strongly recommended by the ARM architecture manual (section A.4.1.1).

We provide a new macro "ret", with all its variants for the condition code, which will resolve to the appropriate instruction. Rather than doing this piecemeal and missing some instances, change all the "mov pc" instances to use the new macro, with the exception of the "movs" instruction and the kprobes code. This allows us to detect the "mov pc, lr" case and fix it up - and also gives us the possibility of deploying this for other registers depending on the CPU selection.

Reported-by: Will Deacon <will.deacon@arm.com>
Tested-by: Stephen Warren <swarren@nvidia.com> # Tegra Jetson TK1
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> # mioa701_bootresume.S
Tested-by: Andrew Lunn <andrew@lunn.ch> # Kirkwood
Tested-by: Shawn Guo <shawn.guo@freescale.com>
Tested-by: Tony Lindgren <tony@atomide.com> # OMAPs
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> # Armada XP, 375, 385
Acked-by: Sekhar Nori <nsekhar@ti.com> # DaVinci
Acked-by: Christoffer Dall <christoffer.dall@linaro.org> # kvm/hyp
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> # PXA3xx
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> # Xen
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> # ARMv7M
Tested-by: Simon Horman <horms+renesas@verge.net.au> # Shmobile
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 17 June 2014 (1 commit)

By Peter De Schrijver
In case there are several possible delay timers, choose the one with the highest resolution. This code relies on the fact that secondary CPUs have not yet been brought online when register_current_timer_delay() is called. This is ensured by implementing calibration_delay_done().

Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
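
The selection policy, as a hedged sketch with illustrative names: a newly registered timer replaces the current one only if its tick is finer.

    #include <asm/delay.h>

    static const struct delay_timer *cur_timer;

    static void register_candidate(const struct delay_timer *t)
    {
            /* higher frequency == finer resolution */
            if (!cur_timer || t->freq > cur_timer->freq)
                    cur_timer = t;
    }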

- 25 February 2014 (2 commits)

By Victor Kamensky
Rename the logical shift macros 'push' and 'pull', defined in arch/arm/include/asm/assembler.h, to 'lspush' and 'lspull'. That eliminates the name conflict between the 'push' logical shift macro and the 'push' instruction mnemonic, and allows assembler.h to be included in .S files that use the 'push' instruction.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

By Will Deacon
After a bunch of benchmarking on the interaction between dmb and pldw, it turns out that issuing the pldw *after* the dmb instruction can give modest performance gains (~3% atomic_add_return improvement on a dual A15). This patch adds prefetchw invocations to our barriered atomic operations, including cmpxchg, test_and_xxx and futexes.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
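
A hedged sketch of the ordering the benchmarks favoured, with compiler builtins standing in for the kernel's smp_mb()/prefetchw() and the ldrex/strex loop:

    static inline int barriered_add_return(int i, int *counter)
    {
            int result;

            __asm__ volatile("dmb ish" ::: "memory");  /* smp_mb() */
            __builtin_prefetch(counter, 1);            /* pldw, after the dmb */
            result = __atomic_add_fetch(counter, i, __ATOMIC_RELAXED);
            __asm__ volatile("dmb ish" ::: "memory");
            return result;
    }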

- 29 December 2013 (2 commits)

By Kim Phillips
Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, and has a very modest benefit on vmlinux size (Linaro gcc 4.8):

   text    data     bss      dec    hex filename
2840310  123932   61960  3026202 2e2d1a vmlinux-lart       #orig
2840152  123932   61960  3026044 2e2c7c vmlinux-lart       #builtin-bswap
6473120  314840 5616016 12403976 bd4508 vmlinux-mxs        #orig
6472586  314848 5616016 12403450 bd42fa vmlinux-mxs        #builtin-bswap
7419872  318372  379556  8117800 7bde28 vmlinux-imx_v6_v7  #orig
7419170  318364  379556  8117090 7bdb62 vmlinux-imx_v6_v7  #builtin-bswap

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
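
What the intrinsic enables, as a small sketch: the compiler can now recognise and fold byte swaps, typically emitting a single rev instruction on ARMv6+.

    static inline unsigned int swap32(unsigned int v)
    {
            return __builtin_bswap32(v);  /* one 'rev' on ARMv6+ */
    }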

By Russell King
We don't need the offset for the first function name in each backtrace entry: it needlessly consumes screen space, and the address is virtually always the first or second instruction in the called function. Also, recognise stmfd instructions which include r10 as a valid stack-saving instruction, and when dumping the registers, dump six registers per line rather than five, and fix the wrapping.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 01 December 2013 (1 commit)

By Fabio Estevam
Currently mx53 (Cortex-A8) running at 1GHz reports:

Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice (1.5 clocks/loop).

The original object code looks like this:

00000010 <__loop_const_udelay>:
  10: e3e01000  mvn   r1, #0
  14: e51f201c  ldr   r2, [pc, #-28]  ; 0 <__loop_udelay-0x8>
  18: e5922000  ldr   r2, [r2]
  1c: e0800921  add   r0, r0, r1, lsr #18
  20: e1a00720  lsr   r0, r0, #14
  24: e0822b21  add   r2, r2, r1, lsr #22
  28: e1a02522  lsr   r2, r2, #10
  2c: e0000092  mul   r0, r2, r0
  30: e0800d21  add   r0, r0, r1, lsr #26
  34: e1b00320  lsrs  r0, r0, #6
  38: 01a0f00e  moveq pc, lr

0000003c <__loop_delay>:
  3c: e2500001  subs  r0, r0, #1
  40: 8afffffe  bhi   3c <__loop_delay>
  44: e1a0f00e  mov   pc, lr

After adding the '.align 3' directive to __loop_delay (align to 8 bytes):

00000010 <__loop_const_udelay>:
  10: e3e01000  mvn   r1, #0
  14: e51f201c  ldr   r2, [pc, #-28]  ; 0 <__loop_udelay-0x8>
  18: e5922000  ldr   r2, [r2]
  1c: e0800921  add   r0, r0, r1, lsr #18
  20: e1a00720  lsr   r0, r0, #14
  24: e0822b21  add   r2, r2, r1, lsr #22
  28: e1a02522  lsr   r2, r2, #10
  2c: e0000092  mul   r0, r2, r0
  30: e0800d21  add   r0, r0, r1, lsr #26
  34: e1b00320  lsrs  r0, r0, #6
  38: 01a0f00e  moveq pc, lr
  3c: e320f000  nop   {0}

00000040 <__loop_delay>:
  40: e2500001  subs  r0, r0, #1
  44: 8afffffe  bhi   40 <__loop_delay>
  48: e1a0f00e  mov   pc, lr
  4c: e320f000  nop   {0}

This now reports:

Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz, after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (Cortex-A9) and on mx27 (ARM926), which show the same BogoMIPS value before and after this patch.

Reported-by: Tom Evans <tom_usenet@optusnet.com.au>
Suggested-by: Tom Evans <tom_usenet@optusnet.com.au>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 21 November 2013 (1 commit)

By Will Deacon
Uwe reported a build failure when targeting a NOMMU platform with my recent prefetch changes:

arch/arm/lib/changebit.S: Assembler messages:
arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is not allowed for the current base architecture

This is due to use of the .arch_extension mp directive immediately prior to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to nothing if !CONFIG_SMP, gas will still choke on the directive. This patch fixes the issue by only emitting the sequence (including the directive) if CONFIG_SMP=y.

Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 29 October 2013 (1 commit)

By Steven Capper
The memory pinning code in uaccess_with_memcpy.c does not check for HugeTLB or THP pmds, and will enter an infinite loop should a __copy_to_user or __clear_user occur against a huge page. This patch adds detection code for huge pages to pin_page_for_write. As this code can be executed in a fast path, it refers to the actual pmds rather than the vma. If a HugeTLB or THP page is found (they have the same pmd representation on ARM), the page table spinlock is taken to prevent modification whilst the page is pinned.

On ARM, huge pages are only represented as pmds, thus no huge pud checks are performed. (For huge puds one would lock the page table in a similar manner as in the pmd case.)

Two helper functions are introduced: pmd_thp_or_huge will check whether or not a page is huge or transparent huge (which have the same pmd layout on ARM), and pmd_hugewillfault will detect whether or not a page fault will occur on write to the page. A sketch of both follows below.

Running the following test (with the chunking from read_zero removed):
$ dd if=/dev/zero of=/dev/null bs=10M count=1024
gave:
2.3 GB/s backed by normal pages,
2.9 GB/s backed by huge pages,
5.1 GB/s backed by huge pages, with page mask=HPAGE_MASK.

After some discussion, it was decided not to adopt the HPAGE_MASK, as this would have a significant detrimental effect on the overall system latency due to page_table_lock being held for too long. This could be revisited if split huge page locks are adopted.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
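
A hedged sketch of the two helpers as described above; treat the exact predicates as assumptions rather than the verbatim kernel macros.

    /* on ARM, HugeTLB and THP share one pmd layout, so one test covers both */
    #define pmd_thp_or_huge(pmd)    (pmd_huge(pmd) || pmd_trans_huge(pmd))

    /* will a write to this huge page fault (page not young or not writable)? */
    #define pmd_hugewillfault(pmd)  (!pmd_young(pmd) || !pmd_write(pmd))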

- 30 September 2013 (1 commit)

By Will Deacon
The cost of changing a cacheline from shared to exclusive state can be significant, especially when this is triggered by an exclusive store, since it may result in having to retry the transaction. This patch prefixes our atomic bitops implementation with prefetchw, to try and grab the line in exclusive state from the start. The testop macro is left alone, since the barrier semantics limit the usefulness of prefetching data.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
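
A sketch of the resulting bitop shape, with a builtin atomic standing in for the exclusive load/store loop (names illustrative):

    static inline void prefetching_set_bit(int nr, unsigned long *p)
    {
            unsigned long mask = 1UL << (nr % (8 * sizeof(unsigned long)));

            p += nr / (8 * sizeof(unsigned long));
            __builtin_prefetch(p, 1);  /* pldw: claim the line exclusive */
            __atomic_fetch_or(p, mask, __ATOMIC_RELAXED);
    }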

- 17 September 2013 (1 commit)

By Linus Walleij
The Shark machine sub-architecture (also known as DNARD, the DIGITAL Network Appliance Reference Design) lacks a maintainer able to apply and test patches to modernize the architecture. It is suspected that the current kernel, while it compiles, does not even boot on this machine. The listed maintainer has expressed that he will not be able to spend any time on the maintenance for the coming year. So let's delete it from the kernel for now. It can always be resurrected with git revert if maintenance is resumed.

As the VIA82c505 PCI adapter was only used by this architecture, that gets deleted too.

Cc: arm@kernel.org
Cc: Alexander Schulz <alex@shark-linux.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

- 09 September 2013 (1 commit)

By Ard Biesheuvel
Commit 01956597 introduced a NEON accelerated version of the xor_blocks() function, but it needs the changes in this patch to allow it to be built as a module rather than statically into the kernel. This patch creates a separate module xor-neon.ko which exports the NEON inner xor_blocks() functions depended upon by the regular xor.ko if it is built with CONFIG_KERNEL_MODE_NEON=y.

Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 15 July 2013 (1 commit)

By Paul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) and are flagged as __cpuinit -- so if we remove the __cpuinit from the arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit related content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless.

This removes all the ARM uses of the __cpuinit macros from C code, and all __CPUINIT from assembly code. It also had two ".previous" section statements that were paired off against __CPUINIT (aka .section ".cpuinit.text") that also get removed here.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 09 July 2013 (1 commit)

By Ard Biesheuvel
Add a source file xor-neon.c (which is really just the reference C implementation passed through the GCC vectorizer) and hook it up to the XOR framework.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
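
A sketch of the kind of reference loop handed to the vectorizer (function name illustrative); compiled with -ftree-vectorize and NEON enabled, GCC turns the scalar XORs into NEON vector operations.

    static void xor_blocks_ref(unsigned long bytes, unsigned long *p1,
                               const unsigned long *p2)
    {
            long lines = bytes / sizeof(unsigned long);

            while (lines--)
                    *p1++ ^= *p2++;
    }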

- 03 April 2013 (1 commit)

By Will Deacon
Commit 70264367 ("ARM: 7653/2: do not scale loops_per_jiffy when using a constant delay clock") fixed a problem with our timer-based delay loop, where loops_per_jiffy is scaled by cpufreq yet used directly by the timer delay ops. This patch fixes the problem in a more elegant way by keeping a private ticks_per_jiffy field in the delay ops, independent of loops_per_jiffy and therefore not subject to scaling. The loop-based delay continues to use loops_per_jiffy directly, as it should.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
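
The resulting separation, sketched (field order is an assumption): the timer path reads its own tick rate, so cpufreq rescaling of loops_per_jiffy no longer skews timer-based udelay().

    struct arm_delay_ops {
            void (*delay)(unsigned long);
            void (*const_udelay)(unsigned long);
            void (*udelay)(unsigned long);
            unsigned long ticks_per_jiffy;  /* private; never cpufreq-scaled */
    };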

- 12 March 2013 (1 commit)

By Nicolas Pitre
Commit 455bd4c4 ("ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations") attempted to fix a compliance issue with the memset return value. However, the memset itself became broken by that patch for misaligned pointers.

This fixes the above by branching over the entry code from the misaligned fixup code to avoid reloading the original pointer. Also, because the function entry alignment is wrong in the Thumb mode compilation, that fixup code is moved to the end. While at it, the entry instructions are slightly reworked to help dual-issue pipelines.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 08 March 2013 (1 commit)

By Ivan Djelic
Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on assumptions about the implementation of memset and similar functions. The current ARM optimized memset code does not return the value of its first argument, as is usually expected from standard implementations.

For instance, in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0: e92d4008  push  {r3, lr}
800554d4: e1a00001  mov   r0, r1
800554d8: e3a02010  mov   r2, #16  ; 0x10
800554dc: e3a01011  mov   r1, #17  ; 0x11
800554e0: eb04426e  bl    80165ea0 <memset>
800554e4: e1a03000  mov   r3, r0
800554e8: e583000c  str   r0, [r3, #12]
800554ec: e5830000  str   r0, [r3]
800554f0: e5830004  str   r0, [r3, #4]
800554f4: e8bd8008  pop   {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0, causing register/memory corruptions.

This patch fixes the return value of the assembly version of memset. It adds a 'mov' instruction and merges an additional load+store into existing load/store instructions. For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions: ip -> r8, then r0 -> ip, and insert 'mov ip, r0' as the first statement of the function. At this point, we have a memset() implementation returning the proper result, but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-	str	lr, [sp, #-4]!
+	stmfd	sp!, {r8, lr}

and restore r8 on both exit paths:
-	ldmeqfd	sp!, {pc}		@ Now <64 bytes to go.
+	ldmeqfd	sp!, {r8, pc}		@ Now <64 bytes to go.
(...)
	tst	r2, #16
	stmneia	ip!, {r1, r3, r8, lr}
-	ldr	lr, [sp], #4
+	ldmfd	sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-	stmfd	sp!, {r4-r7, lr}
+	stmfd	sp!, {r4-r8, lr}

and restore r8 on both exit paths:
	bgt	3b
-	ldmeqfd	sp!, {r4-r7, pc}
+	ldmeqfd	sp!, {r4-r8, pc}
(...)
	tst	r2, #16
	stmneia	ip!, {r4-r7}
-	ldmfd	sp!, {r4-r7, lr}
+	ldmfd	sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 21 February 2013 (1 commit)

By Nicolas Pitre
When udelay() is implemented using an architected timer, it is wrong to scale loops_per_jiffy when changing the CPU clock frequency, since the timer clock remains constant. The lpj should probably become an implementation detail relevant to the CPU loop-based delay routine only, and more confined to it. In the meantime, this is the minimal fix needed to have expected delays with the timer-based implementation when cpufreq is also in use.

Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>