提交 · b2d8b0cb6ca9cb81dd71626642f764ac70d10813 · openanolis / cloud-kernel

26 8月, 2016 9 次提交

Revert "arm64: hibernate: Refuse to hibernate if the boot cpu is offline" · b2d8b0cb

由 James Morse 提交于 8月 17, 2016

Now that we use the MPIDR to resume on the same CPU that we hibernated on,
we no longer need to refuse to hibernate if the boot cpu is offline. (Which
we can't possibly know if kexec causes logical CPUs to be renumbered).

This reverts commit 1fe492ce.
Signed-off-by: NJames Morse <james.morse@arm.com>
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

b2d8b0cb

arm64: hibernate: Resume when hibernate image created on non-boot CPU · 8ec058fd

由 James Morse 提交于 8月 17, 2016

disable_nonboot_cpus() assumes that the lowest numbered online CPU is
the boot CPU, and that this is the correct CPU to run any power
management code on.

On arm64 CPU0 can be taken offline. For hibernate/resume this means we
may hibernate on a CPU other than CPU0. If the system is rebooted with
kexec 'CPU0' will be assigned to a different CPU. This complicates
hibernate/resume as now we can't trust the CPU numbers.

We currently forbid hibernate if CPU0 has been hotplugged out to avoid
this situation without kexec.

Save the MPIDR of the CPU we hibernated on in the hibernate arch-header,
use hibernate_resume_nonboot_cpu_disable() to direct which CPU we should
resume on based on the MPIDR of the CPU we hibernated on. This allows us to
hibernate/resume on any CPU, even if the logical numbers have been
shuffled by kexec.
Signed-off-by: NJames Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

8ec058fd

arm64: always enable DEBUG_RODATA and remove the Kconfig option · 40982fd6

由 Mark Rutland 提交于 8月 25, 2016

Follow the example set by x86 in commit 9ccaf77c ("x86/mm:
Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option"), and
make these protections a fundamental security feature rather than an
opt-in. This also results in a minor code simplification.

For those rare cases when users wish to disable this protection (e.g.
for debugging), this can be done by passing 'rodata=off' on the command
line.

As DEBUG_RODATA_ALIGN is only intended to address a performance/memory
tradeoff, and does not affect correctness, this is left user-selectable.
DEBUG_MODULE_RONX is also left user-selectable until the core code
provides a boot-time option to disable the protection for debugging
use-cases.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: NKees Cook <keescook@chromium.org>
Acked-by: NLaura Abbott <labbott@redhat.com>
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

40982fd6

arm64: mark reserved memblock regions explicitly in iomem · e7cd1903

由 AKASHI Takahiro 提交于 8月 22, 2016

Kdump(kexec-tools) parses /proc/iomem to identify all the memory regions
on the system. Since the current kernel names "nomap" regions, like UEFI
runtime services code/data, as "System RAM," kexec-tools sets up elf core
header to include them in a crash dump file (/proc/vmcore).

Then crash dump kernel parses UEFI memory map again, re-marks those regions
as "nomap" and does not create a memory mapping for them unlike the other
areas of System RAM. In this case, copying /proc/vmcore through
copy_oldmem_page() on crash dump kernel will end up with a kernel abort,
as reported in [1].

This patch names all the "nomap" regions explicitly as "reserved" so that
we can exclude them from a crash dump file. acpi_os_ioremap() must also
be modified because those regions have WB attributes [2].

Apart from kdump, this change also matches x86's use of acpi (and
/proc/iomem).

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/448186.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450089.htmlReviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Tested-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

e7cd1903

arm64: hibernate: Support DEBUG_PAGEALLOC · 5ebe3a44

由 James Morse 提交于 8月 24, 2016

DEBUG_PAGEALLOC removes the valid bit of page table entries to prevent
any access to unallocated memory. Hibernate uses this as a hint that those
pages don't need to be saved/restored. This patch adds the
kernel_page_present() function it uses.

hibernate.c copies the resume kernel's linear map for use during restore.
Add _copy_pte() to fill-in the holes made by DEBUG_PAGEALLOC in the resume
kernel, so we can restore data the original kernel had at these addresses.

Finally, DEBUG_PAGEALLOC means the linear-map alias of KERNEL_START to
KERNEL_END may have holes in it, so we can't lazily clean this whole
area to the PoC. Only clean the new mmuoff region, and the kernel/kvm
idmaps.

This reverts commit da24eb1f.
Reported-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NJames Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

5ebe3a44

arm64: vmlinux.ld: Add mmuoff data sections and move mmuoff text into idmap · b6113038

由 James Morse 提交于 8月 24, 2016

Resume from hibernate needs to clean any text executed by the kernel with
the MMU off to the PoC. Collect these functions together into the
.idmap.text section as all this code is tightly coupled and also needs
the same cleaning after resume.

Data is more complicated, secondary_holding_pen_release is written with
the MMU on, clean and invalidated, then read with the MMU off. In contrast
__boot_cpu_mode is written with the MMU off, the corresponding cache line
is invalidated, so when we read it with the MMU on we don't get stale data.
These cache maintenance operations conflict with each other if the values
are within a Cache Writeback Granule (CWG) of each other.
Collect the data into two sections .mmuoff.data.read and .mmuoff.data.write,
the linker script ensures mmuoff.data.write section is aligned to the
architectural maximum CWG of 2KB.
Signed-off-by: NJames Morse <james.morse@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

b6113038

arm64: Create sections.h · ee78fdc7

由 James Morse 提交于 8月 24, 2016

Each time new section markers are added, kernel/vmlinux.ld.S is updated,
and new extern char __start_foo[] definitions are scattered through the
tree.

Create asm/include/sections.h to collect these definitions (and include
the existing asm-generic version).
Signed-off-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NMark Rutland <mark.rutland@arm.com>
Tested-by: NMark Rutland <mark.rutland@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

ee78fdc7

arm64: Introduce execute-only page access permissions · cab15ce6

由 Catalin Marinas 提交于 8月 11, 2016

The ARMv8 architecture allows execute-only user permissions by clearing
the PTE_UXN and PTE_USER bits. However, the kernel running on a CPU
implementation without User Access Override (ARMv8.2 onwards) can still
access such page, so execute-only page permission does not protect
against read(2)/write(2) etc. accesses. Systems requiring such
protection must enable features like SECCOMP.

This patch changes the arm64 __P100 and __S100 protection_map[] macros
to the new __PAGE_EXECONLY attributes. A side effect is that
pte_user() no longer triggers for __PAGE_EXECONLY since PTE_USER isn't
set. To work around this, the check is done on the PTE_NG bit via the
pte_ng() macro. VM_READ is also checked now for page faults.
Reviewed-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

cab15ce6

arm64: kprobe: Always clear pstate.D in breakpoint exception handler · 7419333f

由 Pratyush Anand 提交于 8月 22, 2016

Whenever we are hitting a kprobe from a none-kprobe debug exception handler,
we hit an infinite occurrences of "Unexpected kernel single-step exception
at EL1"

PSTATE.D is debug exception mask bit. It is set whenever we enter into an
exception mode. When it is set then Watchpoint, Breakpoint, and Software
Step exceptions are masked. However, software Breakpoint Instruction
exceptions can never be masked. Therefore, if we ever execute a BRK
instruction, irrespective of D-bit setting, we will be receiving a
corresponding breakpoint exception.

For example:

- We are executing kprobe pre/post handler, and kprobe has been inserted in
  one of the instruction of a function called by handler. So, it executes
  BRK instruction and we land into the case of KPROBE_REENTER. (This case is
  already handled by current code)

- We are executing uprobe handler or any other BRK handler such as in
  WARN_ON (BRK BUG_BRK_IMM), and we trace that path using kprobe.So, we
  enter into kprobe breakpoint handler,from another BRK handler.(This case
  is not being handled currently)

In all such cases kprobe breakpoint exception will be raised when we were
already in debug exception mode. SPSR's D bit (bit 9) shows the value of
PSTATE.D immediately before the exception was taken. So, in above example
cases we would find it set in kprobe breakpoint handler.  Single step
exception will always be followed by a kprobe breakpoint exception.However,
it will only be raised gracefully if we clear D bit while returning from
breakpoint exception.  If D bit is set then, it results into undefined
exception and when it's handler enables dbg then single step exception is
generated, however it will never be handled(because address does not match
and therefore treated as unexpected).

This patch clears D-flag unconditionally in setup_singlestep, so that we can
always get single step exception correctly after returning from breakpoint
exception. Additionally, it also removes D-flag set statement for
KPROBE_REENTER return path, because debug exception for KPROBE_REENTER will
always take place in a debug exception state. So, D-flag will already be set
in this case.
Acked-by: NSandeepa Prabhu <sandeepa.s.prabhu@gmail.com>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: NPratyush Anand <panand@redhat.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

7419333f

22 8月, 2016 10 次提交

arm64: head.S: get rid of x25 and x26 with 'global' scope · aea73abb

由 Ard Biesheuvel 提交于 8月 16, 2016

Currently, x25 and x26 hold the physical addresses of idmap_pg_dir
and swapper_pg_dir, respectively, when running early boot code. But
having registers with 'global' scope in files that contain different
sections with different lifetimes, and that are called by different
CPUs at different times is a bit messy, especially since stashing the
values does not buy us anything in terms of code size or clarity.

So simply replace each reference to x25 or x26 with an adrp instruction
referring to idmap_pg_dir or swapper_pg_dir directly.
Acked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

aea73abb

arm64: apply __ro_after_init to some objects · 5a9e3e15

由 Jisheng Zhang 提交于 8月 15, 2016

These objects are set during initialization, thereafter are read only.

Previously I only want to mark vdso_pages, vdso_spec, vectors_page and
cpu_ops as __read_mostly from performance point of view. Then inspired
by Kees's patch[1] to apply more __ro_after_init for arm, I think it's
better to mark them as __ro_after_init. What's more, I find some more
objects are also read only after init. So apply __ro_after_init to all
of them.

This patch also removes global vdso_pagelist and tries to clean up
vdso_spec[] assignment code.

[1] http://www.spinics.net/lists/arm-kernel/msg523188.htmlAcked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NJisheng Zhang <jszhang@marvell.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

5a9e3e15

arm64: vdso: constify vm_special_mapping used for aarch32 vectors page · b6d081bd

由 Jisheng Zhang 提交于 8月 15, 2016

The vm_special_mapping spec which is used for aarch32 vectors page is
never modified, so mark it as const.
Acked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NJisheng Zhang <jszhang@marvell.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

b6d081bd

arm64: vdso: add __init section marker to alloc_vectors_page · 1aed28f9

由 Jisheng Zhang 提交于 8月 15, 2016

It is not needed after booting, this patch moves the alloc_vectors_page
function to the __init section.
Acked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NJisheng Zhang <jszhang@marvell.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

1aed28f9

arm64: remove redundant "select HAVE_CLK" · 67060ed1

由 Masahiro Yamada 提交于 8月 16, 2016

HAVE_CLK is select'ed by CLKDEV_LOOKUP, which is select'ed by
COMMON_CLK, which is select'ed by ARM64.  No sub-architecture
needs to select HAVE_CLK explicitly.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

67060ed1

arm64: remove traces of perf_ops_bp · da752563

由 Mark Rutland 提交于 8月 11, 2016

Even though perf_ops_bp was removed/renamed back in commit
b0a873eb ("perf: Register PMU implementations"), as part of
v2.6.37, its definition still lives on in some arch headers.

This patch removes the vestigal definition from arm64.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

da752563

arm64: perf: Use the builtin_platform_driver · 826d0562

由 Kefeng Wang 提交于 8月 10, 2016

Use the builtin_platform_driver() to simplify code.
Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

826d0562

arm64: mm: convert __dma_* routines to use start, size · d34fdb70

由 Kwangwoo Lee 提交于 8月 02, 2016

__dma_* routines have been converted to use start and size instread of
start and end addresses. The patch was origianlly for adding
__clean_dcache_area_poc() which will be used in pmem driver to clean
dcache to the PoC(Point of Coherency) in arch_wb_cache_pmem().

The functionality of __clean_dcache_area_poc()  was equivalent to
__dma_clean_range(). The difference was __dma_clean_range() uses the end
address, but __clean_dcache_area_poc() uses the size to clean.

Thus, __clean_dcache_area_poc() has been revised with a fallthrough
function of __dma_clean_range() after the change that __dma_* routines
use start and size instead of using start and end.

As a consequence of using start and size, the name of __dma_* routines
has also been altered following the terminology below:
    area: takes a start and size
    range: takes a start and end
Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NKwangwoo Lee <kwangwoo.lee@sk.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

d34fdb70

arm64: factor work_pending state machine to C · 421dd6fa

由 Chris Metcalf 提交于 7月 14, 2016

Currently ret_fast_syscall, work_pending, and ret_to_user form an ad-hoc
state machine that can be difficult to reason about due to duplicated
code and a large number of branch targets.

This patch factors the common logic out into the existing
do_notify_resume function, converting the code to C in the process,
making the code more legible.

This patch tries to closely mirror the existing behaviour while using
the usual C control flow primitives. As local_irq_{disable,enable} may
be instrumented, we balance exception entry (where we will almost most
likely enable IRQs) with a call to trace_hardirqs_on just before the
return to userspace.
Signed-off-by: NChris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

421dd6fa

arm64: hibernate: reduce TLB maintenance scope · 0a7d87a7

由 Mark Rutland 提交于 8月 08, 2016

In break_before_make_ttbr_switch we perform broadcast TLB maintenance
for the inner shareable domain, and use a DSB ISH to complete this.
However, at the point we execute this, secondary CPUs are either
physically offline, or executing code outside of the kernel. Upon
entering the kernel, secondary CPUs will invalidate their TLBs before
enabling their MMUs.

Thus we do not need to invalidate TLBs of other CPUs, and as with
idmap_cpu_replace_ttbr1 we can reduce the scope of maintenance to the
TLBs of the local CPU. This keeps our TLB maintenance code consistent,
and is a minor optimisation.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

0a7d87a7

20 8月, 2016 2 次提交

parisc: Fix order of EREFUSED define in errno.h · 3eb53b20

由 Helge Deller 提交于 8月 20, 2016

When building gccgo in userspace, errno.h gets parsed and the go include file
sysinfo.go is generated.

Since EREFUSED is defined to the same value as ECONNREFUSED, and ECONNREFUSED
is defined later on in errno.h, this leads to go complaining that EREFUSED
isn't defined yet.

Fix this trivial problem by moving the define of EREFUSED down after
ECONNREFUSED in errno.h (and clean up the indenting while touching this line).
Signed-off-by: NHelge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org

3eb53b20

parisc: Fix automatic selection of cr16 clocksource · ae141830

由 Helge Deller 提交于 8月 19, 2016

Commit 54b66800 (parisc: Add native high-resolution sched_clock()
implementation) added support to use the CPU-internal cr16 counters as reliable
clocksource with the help of HAVE_UNSTABLE_SCHED_CLOCK.

Sadly the commit missed to remove the hack which prevented cr16 to become the
default clocksource even on SMP systems.
Signed-off-by: NHelge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.7+

ae141830

18 8月, 2016 5 次提交

arm64: Fix shift warning in arch/arm64/mm/dump.c · a93a4d62

由 Catalin Marinas 提交于 12月 05, 2014

When building with 48-bit VAs and 16K page configuration, it's possible
to get the following warning when building the arm64 page table dumping
code:

arch/arm64/mm/dump.c: In function ‘walk_pud’:
arch/arm64/mm/dump.c:274:102: warning: right shift count >= width of type [-Wshift-count-overflow]

This is because pud_offset(pgd, 0) performs a shift to the right by 36
while the value 0 has the type 'int' by default, therefore 32-bit.

This patch modifies all the p*_offset() uses in arch/arm64/mm/dump.c to
use 0UL for the address argument.
Acked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

a93a4d62

x86/smp: Fix __max_logical_packages value setup · 7b0501b1

由 Jiri Olsa 提交于 8月 15, 2016

Frank reported kernel panic when he disabled several cores in BIOS
via following option:

  Core Disable Bitmap(Hex)   [0]

with number 0xFFE, which leaves 16 CPUs in system (out of 48).

The kernel panic below goes along with following messages:

 smpboot: Max logical packages: 2^M
 smpboot: APIC(0) Converting physical 0 to logical package 0^M
 smpboot: APIC(20) Converting physical 1 to logical package 1^M
 smpboot: APIC(40) Package 2 exceeds logical package map^M
 smpboot: CPU 8 APICId 40 disabled^M
 smpboot: APIC(60) Package 3 exceeds logical package map^M
 smpboot: CPU 12 APICId 60 disabled^M
 ...
 general protection fault: 0000 [#1] SMP^M
 Modules linked in:^M
 CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1^M
 Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016^M
 task: ffff8801673e0000 ti: ffff8801673ac000 task.ti: ffff8801673ac000^M
 RIP: 0010:[<ffffffff81014d54>]  [<ffffffff81014d54>] uncore_change_context+0xd4/0x180^M
 ...
  [<ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70^M
  [<ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd^M
  [<ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17^M
  [<ffffffff81002190>] do_one_initcall+0x50/0x190^M
  [<ffffffff810ab193>] ? parse_args+0x293/0x480^M
  [<ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249^M
  [<ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12^M
  [<ffffffff816dc19e>] kernel_init+0xe/0x110^M
  [<ffffffff816e93bf>] ret_from_fork+0x1f/0x40^M
  [<ffffffff816dc190>] ? rest_init+0x80/0x80^M

The reason for the panic is wrong value of __max_logical_packages,
which lets logical_package_map uninitialized and the uncore code
relying on this map being properly initialized (maybe we should
add some safety checks there as well).

The __max_logical_packages is computed as:

  DIV_ROUND_UP(total_cpus, ncpus);
  - ncpus being number of cores

With above BIOS setup we get total_cpus == 16 which set
__max_logical_packages to 2 (ncpus is 12).

Once topology_update_package_map processes CPU with logical
pkg over 2 we display above messages and fail to initialize
the physical_to_logical_pkg map, which makes the uncore code
crash.

The fix is to remove logical_package_map bitmap completely
and keep and update the logical_packages number instead.

After we enumerate all the present CPUs, we check if the
enumerated logical packages count is within its computed
maximum from BIOS data.

If it's not the case, we set this maximum to the new enumerated
value and freeze any new addition of logical packages.

The freeze is because lot of init code like uncore/rapl/cqm
depends on having maximum logical package value set to allocate
their data, so we can't change it later on.

Prarit Bhargava tested the patch and confirms that it solves
the problem:

  From dmidecode:
          Core Count: 24
          Core Enabled: 24
          Thread Count: 48

Orig kernel boot log:

 [    0.464981] smpboot: Max logical packages: 19
 [    0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3

1.  nr_cpus=8, should stop enumerating in package 0:

 [    0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.539596] smpboot: Max logical packages: 19

2.  max_cpus=8, should still enumerate all packages:

 [    0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3
 [    0.550524] smpboot: Max logical packages: 19

3.  nr_cpus=49 ( 2 socket + 1 core on 3rd socket), should stop enumerating in
    package 2:

 [    0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.539368] smpboot: Max logical packages: 19

4.  maxcpus=49, should still enumerate all packages:

 [    0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3
 [    0.549624] smpboot: Max logical packages: 19

5.  kdump (nr_cpus=1) works as well.
Reported-by: NFrank Ramsay <framsay@redhat.com>
Tested-by: NPrarit Bhargava <prarit@redhat.com>
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Reviewed-by: NPrarit Bhargava <prarit@redhat.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160815101700.GA30090@kravaSigned-off-by: NIngo Molnar <mingo@kernel.org>

7b0501b1

x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y · 88b2f634

由 Borislav Petkov 提交于 8月 17, 2016

Similar to:

  efaad554 ("x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y")

... fix microcode loading from the initrd on AMD by adding the
randomization offset to the microcode patch container within the initrd.
Reported-and-tested-by: NBrian Gerst <brgerst@gmail.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-tip-commits@vger.kernel.org
Link: http://lkml.kernel.org/r/20160817113314.GA19221@nazgul.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>

88b2f634

arm64: kernel: avoid literal load of virtual address with MMU off · bc9f3d77

由 Ard Biesheuvel 提交于 8月 17, 2016

Literal loads of virtual addresses are subject to runtime relocation when
CONFIG_RELOCATABLE=y, and given that the relocation routines run with the
MMU and caches enabled, literal loads of relocated values performed with
the MMU off are not guaranteed to return the latest value unless the
memory covering the literal is cleaned to the PoC explicitly.

So defer the literal load until after the MMU has been enabled, just like
we do for primary_switch() and secondary_switch() in head.S.

Fixes: 1e48ef7f ("arm64: add support for building vmlinux as a relocatable PIE binary")
Cc: <stable@vger.kernel.org> # 4.6+
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

bc9f3d77

arm64: Fix NUMA build error when !CONFIG_ACPI · bfe6c8a8

由 Catalin Marinas 提交于 8月 15, 2016

Since asm/acpi.h is only included by linux/acpi.h when CONFIG_ACPI is
enabled, disabling the latter leads to the following build error on
arm64:

arch/arm64/mm/numa.c: In function ‘arm64_numa_init’:
arch/arm64/mm/numa.c:395:24: error: ‘arm64_acpi_numa_init’ undeclared (first use in this function)
   if (!acpi_disabled && !numa_init(arm64_acpi_numa_init))

This patch include the asm/acpi.h explicitly in arch/arm64/mm/numa.c for
the arm64_acpi_numa_init() definition.

Fixes: d8b47fca ("arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT")
Reviewed-by: NHanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

bfe6c8a8

16 8月, 2016 1 次提交

x86/power/64: Use __pa() for physical address computation · 5d87f493

由 Rafael J. Wysocki 提交于 8月 14, 2016

The value of temp_level4_pgt is the physical address of the
top-level page directory, so use __pa() to compute it.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NIngo Molnar <mingo@kernel.org>

5d87f493

13 8月, 2016 7 次提交

h8300: Add missing include file to asm/io.h · 2b05980d

由 Guenter Roeck 提交于 6月 08, 2016

h8300 builds fail with

arch/h8300/include/asm/io.h:9:15: error: unknown type name ‘u8’
arch/h8300/include/asm/io.h:15:15: error: unknown type name ‘u16’
arch/h8300/include/asm/io.h:21:15: error: unknown type name ‘u32’

and many related errors.

Fixes: 23c82d41bdf4 ("kexec-allow-architectures-to-override-boot-mapping-fix")
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

2b05980d

unicore32: mm: Add missing parameter to arch_vma_access_permitted · 783011b1

由 Guenter Roeck 提交于 3月 21, 2016

unicore32 fails to compile with the following errors.

mm/memory.c: In function ‘__handle_mm_fault’:
mm/memory.c:3381: error:
	too many arguments to function ‘arch_vma_access_permitted’
mm/gup.c: In function ‘check_vma_flags’:
mm/gup.c:456: error:
	too many arguments to function ‘arch_vma_access_permitted’
mm/gup.c: In function ‘vma_permits_fault’:
mm/gup.c:640: error:
	too many arguments to function ‘arch_vma_access_permitted’

Fixes: d61172b4 ("mm/core, x86/mm/pkeys: Differentiate instruction fetches")
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Acked-by: NGuan Xuetao <gxt@mprc.pku.edu.cn>

783011b1

arm64: defconfig: enable CONFIG_LOCALVERSION_AUTO · 53fb45d3

由 Masahiro Yamada 提交于 6月 08, 2016

When CONFIG_LOCALVERSION_AUTO is disabled, the version string is
just a tag name (or with a '+' appended if HEAD is not a tagged
commit).

During the development (and especially when git-bisecting), longer
version string would be helpful to identify the commit we are running.

This is a default y option, so drop the unset to enable it.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

53fb45d3

arm64: defconfig: add options for virtualization and containers · 2323439f

由 Riku Voipio 提交于 6月 15, 2016

Enable options commonly needed by popular virtualization
and container applications. Use modules when possible to
avoid too much overhead for users not interested.

- add namespace and cgroup options needed
- add seccomp - optional, but enhances Qemu etc
- bridge, nat, veth, macvtap and multicast for routing
  guests and containers
- btfrs and overlayfs modules for container COW backends
- while near it, make fuse a module instead of built-in.

Generated with make saveconfig and dropping unrelated spurious
change hunks while commiting. bloat-o-meter old-vmlinux vmlinux:

add/remove: 905/390 grow/shrink: 767/229 up/down: 183513/-94861 (88652)
....
Total: Before=10515408, After=10604060, chg +0.84%
Signed-off-by: NRiku Voipio <riku.voipio@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

2323439f

arm64: hibernate: handle allocation failures · dfbca61a

由 Mark Rutland 提交于 8月 11, 2016

In create_safe_exec_page(), we create a copy of the hibernate exit text,
along with some page tables to map this via TTBR0. We then install the
new tables in TTBR0.

In swsusp_arch_resume() we call create_safe_exec_page() before trying a
number of operations which may fail (e.g. copying the linear map page
tables). If these fail, we bail out of swsusp_arch_resume() and return
an error code, but leave TTBR0 as-is. Subsequently, the core hibernate
code will call free_basic_memory_bitmaps(), which will free all of the
memory allocations we made, including the page tables installed in
TTBR0.

Thus, we may have TTBR0 pointing at dangling freed memory for some
period of time. If the hibernate attempt was triggered by a user
requesting a hibernate test via the reboot syscall, we may return to
userspace with the clobbered TTBR0 value.

Avoid these issues by reorganising swsusp_arch_resume() such that we
have no failure paths after create_safe_exec_page(). We also add a check
that the zero page allocation succeeded, matching what we have for other
allocations.

Fixes: 82869ac5 ("arm64: kernel: Add support for hibernate/suspend-to-disk")
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Acked-by: NJames Morse <james.morse@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.7+
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

dfbca61a

arm64: hibernate: avoid potential TLB conflict · 0194e760

由 Mark Rutland 提交于 8月 11, 2016

In create_safe_exec_page we install a set of global mappings in TTBR0,
then subsequently invalidate TLBs. While TTBR0 points at the zero page,
and the TLBs should be free of stale global entries, we may have stale
ASID-tagged entries (e.g. from the EFI runtime services mappings) for
the same VAs. Per the ARM ARM these ASID-tagged entries may conflict
with newly-allocated global entries, and we must follow a
Break-Before-Make approach to avoid issues resulting from this.

This patch reworks create_safe_exec_page to invalidate TLBs while the
zero page is still in place, ensuring that there are no potential
conflicts when the new TTBR0 value is installed. As a single CPU is
online while this code executes, we do not need to perform broadcast TLB
maintenance, and can call local_flush_tlb_all(), which also subsumes
some barriers. The remaining assembly is converted to use write_sysreg()
and isb().

Other than this, we safely manipulate TTBRs in the hibernate dance. The
code we install as part of the new TTBR0 mapping (the hibernated
kernel's swsusp_arch_suspend_exit) installs a zero page into TTBR1,
invalidates TLBs, then installs its preferred value. Upon being restored
to the middle of swsusp_arch_suspend, the new image will call
__cpu_suspend_exit, which will call cpu_uninstall_idmap, installing the
zero page in TTBR0 and invalidating all TLB entries.

Fixes: 82869ac5 ("arm64: kernel: Add support for hibernate/suspend-to-disk")
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Acked-by: NJames Morse <james.morse@arm.com>
Tested-by: NJames Morse <james.morse@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.7+
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

0194e760

arm64: Handle el1 synchronous instruction aborts cleanly · 9adeb8e7

由 Laura Abbott 提交于 8月 09, 2016

Executing from a non-executable area gives an ugly message:

lkdtm: Performing direct entry EXEC_RODATA
lkdtm: attempting ok execution at ffff0000084c0e08
lkdtm: attempting bad execution at ffff000008880700
Bad mode in Synchronous Abort handler detected on CPU2, code 0x8400000e -- IABT (current EL)
CPU: 2 PID: 998 Comm: sh Not tainted 4.7.0-rc2+ #13
Hardware name: linux,dummy-virt (DT)
task: ffff800077e35780 ti: ffff800077970000 task.ti: ffff800077970000
PC is at lkdtm_rodata_do_nothing+0x0/0x8
LR is at execute_location+0x74/0x88

The 'IABT (current EL)' indicates the error but it's a bit cryptic
without knowledge of the ARM ARM. There is also no indication of the
specific address which triggered the fault. The increase in kernel
page permissions makes hitting this case more likely as well.
Handling the case in the vectors gives a much more familiar looking
error message:

lkdtm: Performing direct entry EXEC_RODATA
lkdtm: attempting ok execution at ffff0000084c0840
lkdtm: attempting bad execution at ffff000008880680
Unable to handle kernel paging request at virtual address ffff000008880680
pgd = ffff8000089b2000
[ffff000008880680] *pgd=00000000489b4003, *pud=0000000048904003, *pmd=0000000000000000
Internal error: Oops: 8400000e [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 997 Comm: sh Not tainted 4.7.0-rc1+ #24
Hardware name: linux,dummy-virt (DT)
task: ffff800077f9f080 ti: ffff800008a1c000 task.ti: ffff800008a1c000
PC is at lkdtm_rodata_do_nothing+0x0/0x8
LR is at execute_location+0x74/0x88
Acked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NLaura Abbott <labbott@redhat.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

9adeb8e7

12 8月, 2016 6 次提交

MIPS: KVM: Propagate kseg0/mapped tlb fault errors · 9b731bcf

由 James Hogan 提交于 8月 11, 2016

Propagate errors from kvm_mips_handle_kseg0_tlb_fault() and
kvm_mips_handle_mapped_seg_tlb_fault(), usually triggering an internal
error since they normally indicate the guest accessed bad physical
memory or the commpage in an unexpected way.

Fixes: 858dd5d4 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Fixes: e685c689 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

9b731bcf

MIPS: KVM: Fix gfn range check in kseg0 tlb faults · 0741f52d

由 James Hogan 提交于 8月 11, 2016

Two consecutive gfns are loaded into host TLB, so ensure the range check
isn't off by one if guest_pmap_npages is odd.

Fixes: 858dd5d4 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

0741f52d

MIPS: KVM: Add missing gfn range check · 8985d503

由 James Hogan 提交于 8月 11, 2016

kvm_mips_handle_mapped_seg_tlb_fault() calculates the guest frame number
based on the guest TLB EntryLo values, however it is not range checked
to ensure it lies within the guest_pmap. If the physical memory the
guest refers to is out of range then dump the guest TLB and emit an
internal error.

Fixes: 858dd5d4 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

8985d503

MIPS: KVM: Fix mapped fault broken commpage handling · c604cffa

由 James Hogan 提交于 8月 11, 2016

kvm_mips_handle_mapped_seg_tlb_fault() appears to map the guest page at
virtual address 0 to PFN 0 if the guest has created its own mapping
there. The intention is unclear, but it may have been an attempt to
protect the zero page from being mapped to anything but the comm page in
code paths you wouldn't expect from genuine commpage accesses (guest
kernel mode cache instructions on that address, hitting trapping
instructions when executing from that address with a coincidental TLB
eviction during the KVM handling, and guest user mode accesses to that
address).

Fix this to check for mappings exactly at KVM_GUEST_COMMPAGE_ADDR (it
may not be at address 0 since commit 42aa12e7 ("MIPS: KVM: Move
commpage so 0x0 is unmapped")), and set the corresponding EntryLo to be
interpreted as 0 (invalid).

Fixes: 858dd5d4 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

c604cffa

KVM: Protect device ops->create and list_add with kvm->lock · a28ebea2

由 Christoffer Dall 提交于 8月 09, 2016

KVM devices were manipulating list data structures without any form of
synchronization, and some implementations of the create operations also
suffered from a lack of synchronization.

Now when we've split the xics create operation into create and init, we
can hold the kvm->lock mutex while calling the create operation and when
manipulating the devices list.

The error path in the generic code gets slightly ugly because we have to
take the mutex again and delete the device from the list, but holding
the mutex during anon_inode_getfd or releasing/locking the mutex in the
common non-error path seemed wrong.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

a28ebea2

KVM: PPC: Move xics_debugfs_init out of create · 023e9fdd

由 Christoffer Dall 提交于 8月 09, 2016

As we are about to hold the kvm->lock during the create operation on KVM
devices, we should move the call to xics_debugfs_init into its own
function, since holding a mutex over extended amounts of time might not
be a good idea.

Introduce an init operation on the kvm_device_ops struct which cannot
fail and call this, if configured, after the device has been created.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

023e9fdd

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功