1. 11 February 2016 (6 commits)
    • ARM: 8513/1: xip: Move XIP linking to a separate file · 538bf469
      Chris Brandt committed
      When building an XIP kernel, the linker script needs to be quite
      different from a conventional kernel's script. Over time, it has been
      difficult to maintain both the XIP and non-XIP layouts in one linker
      script. Therefore, this patch separates the two layouts into two
      completely different files.
      
      The new linker script is essentially a straight copy of the current script
      with all the non-CONFIG_XIP_KERNEL portions removed.
      
      Additionally, all CONFIG_XIP_KERNEL portions have been removed from the
      existing linker script...never to return again.
      
      It should be noted that this does not fix any current XIP issues, but
      rather is the first move in fixing them properly with subsequent patches.
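      
      As a sketch of how such a split can be wired up (illustrative, not
      necessarily the exact mechanism of this patch), the conventional script
      can defer to the XIP variant at preprocessing time so the build glue
      stays unchanged:
      
      /* arch/arm/kernel/vmlinux.lds.S -- sketch of the dispatch; the XIP
       * layout lives in its own file (assumed here to be vmlinux-xip.lds.S)
       * and everything below the #else is the conventional layout. */
      #ifdef CONFIG_XIP_KERNEL
      #include "vmlinux-xip.lds.S"
      #else
      /* ... conventional (non-XIP) linker script, all XIP ifdefs gone ... */
      #endif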
      Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8511/1: ARM64: kernel: PSCI: move PSCI idle management code to drivers/firmware · 8b6f2499
      Lorenzo Pieralisi committed
      ARM64 PSCI kernel interfaces that initialize idle states and implement
      the suspend API to enter them are generic and can be shared with the
      ARM architecture.
      
      To achieve that goal, this patch moves the ARM64 PSCI idle management
      code to drivers/firmware, so that the interfaces to initialize and
      enter idle states can be shared by the ARM and ARM64 back-ends.
      
      The ARM generic CPUidle implementation also requires the definition of
      a cpuidle_ops section entry for the kernel to initialize the CPUidle
      operations at boot based on the enable-method (i.e. ARM64 has
      statically initialized cpu_ops counterparts for that purpose); this
      patch therefore adds the required section entry on CONFIG_ARM for PSCI,
      so that the kernel can initialize the PSCI CPUidle back-end when PSCI
      is the probed enable-method.
      
      On ARM64 this patch provides no functional change.
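      
      For illustration, a minimal sketch of what such a CONFIG_ARM section
      entry can look like (the CPUIDLE_METHOD_OF_DECLARE helper and the
      callback names and signatures below are assumptions for the sketch,
      not a quote of the patch):
      
      /* cpuidle_ops bundles an init hook (parse the idle states from the
       * device tree) and a suspend hook (enter the chosen idle state). */
      static struct cpuidle_ops psci_cpuidle_ops __initdata = {
      	.init    = psci_cpuidle_init,    /* hypothetical name */
      	.suspend = psci_cpuidle_suspend, /* hypothetical name */
      };
      
      /* Emits a section entry keyed by the "psci" enable-method string; at
       * boot the kernel matches it against the DT enable-method and installs
       * the back-end, mirroring ARM64's statically initialized cpu_ops. */
      CPUIDLE_METHOD_OF_DECLARE(psci, "psci", &psci_cpuidle_ops);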
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arch/arm64]
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Jisheng Zhang <jszhang@marvell.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Jisheng Zhang <jszhang@marvell.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8510/1: rework ARM_CPU_SUSPEND dependencies · 1b9bdf5c
      Lorenzo Pieralisi committed
      The code enabled by the ARM_CPU_SUSPEND config option is used by
      kernel subsystems for purposes that go beyond system suspend, so its
      config entry should be augmented to take more default options into
      account and to avoid forced selection that overrides its dependencies.
      
      To achieve this goal, this patch reworks the ARM_CPU_SUSPEND config
      entry, updating its default value (by adding the BL_SWITCHER option)
      and its dependencies (ARCH_SUSPEND_POSSIBLE), so that the symbol is
      still selected by default by the subsystems that require it while its
      dependencies are enforced correctly.
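      
      Based on the description above, the reworked entry ends up shaped
      roughly like this (a sketch, not the verbatim Kconfig hunk):
      
      config ARM_CPU_SUSPEND
      	def_bool PM_SLEEP || BL_SWITCHER
      	depends on ARCH_SUSPEND_POSSIBLE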
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc · 14d3ae2e
      Doug Anderson committed
      If we know that TLB efficiency will not be an issue when memory is
      accessed then it's not terribly important to allocate big chunks of
      memory.  The whole point of allocating the big chunks was that it would
      make TLB usage efficient.
      
      As Marek Szyprowski indicated:
          Please note that mapping memory with larger pages significantly
          improves performance, especially when IOMMU has a little TLB
          cache. This can be easily observed when multimedia devices do
          processing of RGB data with 90/270 degree rotation.
      Image rotation is distinctly an operation that needs to bounce around
      through memory, so it makes sense that TLB efficiency is important
      there.
      
      Video decoding, on the other hand, is a fairly sequential operation.
      During video decoding it's not expected that we'll be jumping all over
      memory.  Decoding video is also pretty heavy and the TLB misses aren't a
      huge deal.  Presumably most HW video acceleration users of dma-mapping
      will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.
      
      Allocating big chunks of memory is quite expensive, especially if we're
      doing it repeatedly and memory is full.  In one (out of tree) usage model
      it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
      each one trying to allocate 4 MB of memory.  This is called whenever the
      system encounters a new video, which could easily happen while the
      memory system is stressed out.  In fact, on certain social media
      websites that auto-play video and have infinite scrolling, it's quite
      common to see not just one of these 16x4MB allocations but 2 or 3 right
      after another.  Asking the system even to do a small amount of extra
      work to give us big chunks in this case is just not a good use of time.
      
      Allocating big chunks of memory is also expensive indirectly.  Even if
      we ask the system not to do ANY extra work to allocate _our_ memory,
      we're still potentially eating up all big chunks in the system.
      Presumably there are other users in the system that aren't quite as
      flexible and that actually need these big chunks.  By eating all the big
      chunks we're causing extra work for the rest of the system.  We also may
      start making other memory allocations fail.  While the system may be
      robust to such failures (as is the case with dwc2 USB trying to allocate
      buffers for Ethernet data and with WiFi trying to allocate buffers for
      WiFi data), it is yet another big performance hit.
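      
      For reference, a caller that knows its access pattern is sequential
      would opt in roughly as follows (a sketch against the struct dma_attrs
      API of this era; dev and size are placeholders):
      
      struct dma_attrs attrs;
      dma_addr_t dma_handle;
      void *cpu_addr;
      
      init_dma_attrs(&attrs);
      /* Plain 4K pages are fine here: the buffer is streamed through
       * sequentially, so IOMMU TLB efficiency is not worth hunting for
       * big contiguous chunks under memory pressure. */
      dma_set_attr(DMA_ATTR_ALLOC_SINGLE_PAGES, &attrs);
      cpu_addr = dma_alloc_attrs(dev, size, &dma_handle, GFP_KERNEL, &attrs);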
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8505/1: dma-mapping: Optimize allocation · 33298ef6
      Doug Anderson committed
      __iommu_alloc_buffer() is expected to be called to allocate pretty
      sizeable buffers.  In simple tests of video playback I saw it trying to
      allocate 4,194,304 bytes.  The function tries to allocate large chunks
      in order to optimize IOMMU TLB usage.
      
      The current function is very, very slow.
      
      One problem is the way it keeps trying and trying to allocate big
      chunks.  Imagine a very fragmented memory that has 4M free but no
      contiguous pages at all.  Further imagine allocating 4M (1024 pages).
      We'll do the following memory allocations:
      - For page 1:
        - Try to allocate order 10 (no retry)
        - Try to allocate order 9 (no retry)
        - ...
        - Try to allocate order 0 (with retry, but not needed)
      - For page 2:
        - Try to allocate order 9 (no retry)
        - Try to allocate order 8 (no retry)
        - ...
        - Try to allocate order 0 (with retry, but not needed)
      - ...
      - ...
      
      The total number of alloc() calls for this case is:
        sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
        => 9228
      
      The above is obviously the worst case, but given how slow alloc can be
      we really want to try to avoid even somewhat bad cases.  I timed the old
      code with a device under memory pressure and it wasn't hard to see it
      take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
      was done on kernel 3.14, so possibly mainline would behave
      differently).
      
      A second problem is that allocating big chunks under memory pressure is
      just not a great idea unless we really need them.  We can make do pretty
      well with smaller chunks, so it's probably wise to leave the bigger
      chunks for other users once memory pressure is on.
      
      Let's adjust the allocation like this:
      
      1. If a big chunk fails, stop trying so hard and bump down to lower
         order allocations.
      2. Don't try useless orders.  The whole point of big chunks is to
         optimize the TLB and it can really only make use of 2M, 1M, 64K and
         4K sizes.
      
      We'll still tend to eat up a bunch of big chunks, but that might be the
      right answer for some users.  A future patch could possibly add a new
      DMA_ATTR that would let the caller decide that TLB optimization isn't
      important and that we should use smaller chunks.  Presumably this would
      be a sane strategy for some callers.
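      
      A simplified sketch of the resulting loop (bookkeeping and error paths
      trimmed; the order list encodes the 2M/1M/64K/4K sizes as orders
      9/8/4/0 with 4K pages):
      
      static const int iommu_order_array[] = { 9, 8, 4, 0 };
      
      static int alloc_sketch(struct page **pages, int count, gfp_t gfp)
      {
      	int i = 0, order_idx = 0;
      
      	while (count) {
      		int order = iommu_order_array[order_idx];
      		struct page *page;
      
      		/* Never request more pages than are still needed. */
      		while (order && (1 << order) > count)
      			order = iommu_order_array[++order_idx];
      
      		/* Big chunks are opportunistic: no reclaim, no retry. */
      		page = alloc_pages(gfp | (order ? __GFP_NORETRY : 0), order);
      		if (!page) {
      			if (!order)
      				return -ENOMEM;	/* even 4K failed */
      			order_idx++;	/* give up on this order for good */
      			continue;
      		}
      		pages[i++] = page;	/* the real code also splits
      					 * compound pages into singles */
      		count -= 1 << order;
      	}
      	return 0;
      }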
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Reviewed-by: Tomasz Figa <tfiga@chromium.org>
      Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 8504/1: __arch_xprod_64(): small optimization · 73e592f3
      Nicolas Pitre committed
      The tmp variable is used twice: first to pose as a register containing
      a value of zero, and then to provide a temporary register that starts at
      zero and has a value added to it. But somehow gcc decides to split those
      two usages across different registers.
      
      Example code:
      
      u64 div64const1000(u64 x)
      {
      	u32 y = 1000;
      	do_div(x, y);
      	return x;
      }
      
      Result:
      
      div64const1000:
      	push	{r4, r5, r6, r7, lr}
      	mov	lr, #0
      	mov	r6, r0
      	mov	r7, r1
      	adr	r5, .L8
      	ldrd	r4, [r5]
      	mov	r1, lr
      	umull	r2, r3, r4, r6
      	cmn	r2, r4
      	adcs	r3, r3, r5
      	adc	r2, lr, #0
      	umlal	r3, r2, r5, r6
      	umlal	r3, r1, r4, r7
      	mov	r3, #0
      	adds	r2, r1, r2
      	adc	r3, r3, #0
      	umlal	r2, r3, r5, r7
      	lsr	r0, r2, #9
      	lsr	r1, r3, #9
      	orr	r0, r0, r3, lsl #23
      	pop	{r4, r5, r6, r7, pc}
      	.align	3
      .L8:
      	.word	-1924145349
      	.word	-2095944041
      
      Full kernel build size:
      
         text	   data	    bss	    dec	    hex	filename
      13663814	1553940	 351368	15569122	 ed90e2	vmlinux
      
      Here the two instances of 'tmp' are assigned to r1 and lr.
      
      To avoid that, let's mark the first 'tmp' usage in __arch_xprod_64()
      with a "+r" constraint, even though the register is not written to, so
      as to create a dependency for the second usage, with the effect of
      enforcing a single temporary register throughout.
      
      Result:
      
      div64const1000:
      	push	{r4, r5, r6, r7}
      	movs	r3, #0
      	adr	r5, .L8
      	ldrd	r4, [r5]
      	umull	r6, r7, r4, r0
      	cmn	r6, r4
      	adcs	r7, r7, r5
      	adc	r6, r3, #0
      	umlal	r7, r6, r5, r0
      	umlal	r7, r3, r4, r1
      	mov	r7, #0
      	adds	r6, r3, r6
      	adc	r7, r7, #0
      	umlal	r6, r7, r5, r1
      	lsr	r0, r6, #9
      	lsr	r1, r7, #9
      	orr	r0, r0, r7, lsl #23
      	pop	{r4, r5, r6, r7}
      	bx	lr
      	.align	3
      .L8:
      	.word	-1924145349
      	.word	-2095944041
      
         text	   data	    bss	    dec	    hex	filename
      13663438	1553940	 351368	15568746	 ed8f6a	vmlinux
      
      This time 'tmp' is assigned to r3 and used throughout. However, keeping
      'tmp' in r3 blocks use of the r2-r3 pair for 64-bit values, forcing more
      registers to be spilled onto the stack. Let's try to help by forcing
      'tmp' into the caller-saved ip register.
      
      Result:
      
      div64const1000:
      	stmfd	sp!, {r4, r5}
      	mov	ip, #0
      	adr	r5, .L8
      	ldrd	r4, [r5]
      	umull	r2, r3, r4, r0
      	cmn	r2, r4
      	adcs	r3, r3, r5
      	adc	r2, ip, #0
      	umlal	r3, r2, r5, r0
      	umlal	r3, ip, r4, r1
      	mov	r3, #0
      	adds	r2, ip, r2
      	adc	r3, r3, #0
      	umlal	r2, r3, r5, r1
      	mov	r0, r2, lsr #9
      	mov	r1, r3, lsr #9
      	orr	r0, r0, r3, asl #23
      	ldmfd	sp!, {r4, r5}
      	bx	lr
      	.align	3
      .L8:
      	.word	-1924145349
      	.word	-2095944041
      
         text	   data	    bss	    dec	    hex	filename
      13662838	1553940	 351368	15568146	 ed8d12	vmlinux
      
      We could make the code marginally smaller yet by forcing 'tmp' to lr
      instead, but that would have a negative impact on branch prediction, for
      which "bx lr" is optimal.
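      
      For illustration, here is the constraint trick in isolation (a
      contrived sketch, not the actual div64.h code): naming the same
      variable in both asm statements, with "+r" claiming a write in the
      first, forces gcc to keep it in one register, and the register
      variable pins that register to ip:
      
      static inline unsigned int xprod_sketch(unsigned int x, unsigned int y)
      {
      	/* Pin the temporary to the caller-saved ip register so the
      	 * r2-r3 pair stays free to hold 64-bit values. */
      	register unsigned int tmp asm("ip") = 0;
      	unsigned int res;
      
      	/* tmp only supplies a zero here, but "+r" claims it is also
      	 * written, creating a dependency for the second statement. */
      	asm ("add %0, %2, %1" : "=r" (res), "+r" (tmp) : "r" (x));
      
      	/* The dependency makes gcc reuse the same register (ip) for
      	 * tmp instead of splitting the two usages. */
      	asm ("umlal %0, %1, %2, %3"
      	     : "+r" (res), "+r" (tmp)
      	     : "r" (x), "r" (y));
      
      	return res + tmp;
      }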
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  2. 08 February 2016 (3 commits)
    • ARM: 8501/1: mm: flip priority of CONFIG_DEBUG_RODATA · 25362dc4
      Kees Cook committed
      The use of CONFIG_DEBUG_RODATA is generally seen as an essential part of
      kernel self-protection:
      http://www.openwall.com/lists/kernel-hardening/2015/11/30/13
      Additionally, its name has grown to mean things beyond just rodata. To
      get ARM closer to this, we ought to rearrange the names of the configs
      that control how the kernel protects its memory. What was called
      CONFIG_ARM_KERNMEM_PERMS is really doing the work that other architectures
      call CONFIG_DEBUG_RODATA.
      
      This redefines CONFIG_DEBUG_RODATA to actually do the bulk of the
      ROing (and NXing). In place of the old CONFIG_DEBUG_RODATA, use
      CONFIG_DEBUG_ALIGN_RODATA, since that's what the option does: it adds
      section alignment for making rodata explicitly NX, since arm, unlike
      arm64, does not split the page tables without _ALIGN_RODATA.
      
      Also adds human-readable names to the sections so I could more easily
      debug my typos, and makes CONFIG_DEBUG_RODATA default to "y" for CPU_V7.
      
      Results in /sys/kernel/debug/kernel_page_tables for each config state:
      
       # CONFIG_DEBUG_RODATA is not set
       # CONFIG_DEBUG_ALIGN_RODATA is not set
      
      ---[ Kernel Mapping ]---
      0x80000000-0x80900000           9M     RW x  SHD
      0x80900000-0xa0000000         503M     RW NX SHD
      
       CONFIG_DEBUG_RODATA=y
       CONFIG_DEBUG_ALIGN_RODATA=y
      
      ---[ Kernel Mapping ]---
      0x80000000-0x80100000           1M     RW NX SHD
      0x80100000-0x80700000           6M     ro x  SHD
      0x80700000-0x80a00000           3M     ro NX SHD
      0x80a00000-0xa0000000         502M     RW NX SHD
      
       CONFIG_DEBUG_RODATA=y
       # CONFIG_DEBUG_ALIGN_RODATA is not set
      
      ---[ Kernel Mapping ]---
      0x80000000-0x80100000           1M     RW NX SHD
      0x80100000-0x80a00000           9M     ro x  SHD
      0x80a00000-0xa0000000         502M     RW NX SHD
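      
      The resulting option relationship is roughly (a sketch, not the
      verbatim Kconfig entries):
      
      config DEBUG_RODATA
      	bool "Make kernel text and rodata read-only"
      	depends on MMU && !XIP_KERNEL
      	default y if CPU_V7
      
      config DEBUG_ALIGN_RODATA
      	bool "Make rodata strictly non-executable"
      	depends on DEBUG_RODATA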
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: use virt_to_idmap() for soft_restart() · 4138323e
      Russell King committed
      Code run via soft_restart() executes with the MMU disabled, so we need to
      pass the identity map physical address rather than the address obtained
      from virt_to_phys().  Therefore, replace virt_to_phys() with
      virt_to_idmap() for all callers of soft_restart().
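      
      The call-site pattern then becomes (a sketch; cpu_reset is the routine
      that actually runs with the MMU off, addr its argument):
      
      typedef void (*phys_reset_t)(unsigned long);
      phys_reset_t phys_reset;
      
      /* cpu_reset executes with the MMU disabled, so it must be entered
       * through its identity-mapped address; on platforms where the
       * identity map is offset from the linear map, virt_to_phys() would
       * give the wrong entry point. */
      phys_reset = (phys_reset_t)virt_to_idmap(cpu_reset);
      phys_reset(addr);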
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: make virt_to_idmap() return unsigned long · 28410293
      Russell King committed
      Make virt_to_idmap() return an unsigned long rather than phys_addr_t.
      
      Returning phys_addr_t here makes no sense, because the definition of
      virt_to_idmap() is that it shall return a physical address which maps
      identically with the virtual address.  Since virtual addresses are
      limited to 32-bit, identity mapped physical addresses are as well.
      
      Almost all users already had an implicit narrowing cast to unsigned long
      so let's make this official and part of this interface.
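      
      After the change, the helper is shaped roughly like this (a sketch of
      the interface; arch_virt_to_idmap is assumed here as the hook through
      which platforms with an offset identity map override the default):
      
      extern unsigned long (*arch_virt_to_idmap)(unsigned long x);
      
      /* Identity-mapped addresses fit in 32 bits by definition, so the
       * previously implicit narrowing is now part of the return type. */
      static inline unsigned long __virt_to_idmap(unsigned long x)
      {
      	if (IS_ENABLED(CONFIG_MMU) && arch_virt_to_idmap)
      		return arch_virt_to_idmap(x);
      	return __virt_to_phys(x);	/* narrows phys_addr_t */
      }
      
      #define virt_to_idmap(x)	__virt_to_idmap((unsigned long)(x))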
      Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  3. 27 January 2016 (3 commits)
  4. 24 January 2016 (28 commits)