Commits · de78a9c42a790011f179bc94a7da3f5d8721f4cc · openeuler / Kernel

21 Apr, 2019 5 commits

powerpc: Add a framework for Kernel Userspace Access Protection · de78a9c4

Committed by Christophe Leroy on Apr 18, 2019

This patch implements a framework for Kernel Userspace Access
Protection.

Then subarches will have the possibility to provide their own
implementation by providing setup_kuap() and
allow/prevent_user_access().

Some platforms will need to know the area accessed and whether it is
accessed for read, write or both. Therefore source, destination and
size are handed over to the two functions.

mpe: Rename to allow/prevent rather than unlock/lock, and add
read/write wrappers. Drop the 32-bit code for now until we have an
implementation for it. Add kuap to pt_regs for 64-bit as well as
32-bit. Don't split strings, use pr_crit_ratelimited().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

de78a9c4
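
Illustrative note (not part of the commit): the interface described in de78a9c4 can be modelled as a small user-space toy. This is only a sketch under stated assumptions. The `kuap_model` structure and its two flags are hypothetical stand-ins for whatever a subarch would program in hardware, and the read/write wrapper names are assumed from the "read/write wrappers" remark in the mpe note.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the protection state. A real subarch would
 * program hardware in setup_kuap() and in the two hooks below. */
struct kuap_model {
	bool read_open;   /* kernel may currently read user memory  */
	bool write_open;  /* kernel may currently write user memory */
};

struct kuap_model kuap_model;

/* A non-NULL 'from' means user memory is read; a non-NULL 'to' means
 * user memory is written. */
void allow_user_access(void *to, const void *from, size_t size)
{
	(void)size; /* a platform could use this to open only one area */
	if (from)
		kuap_model.read_open = true;
	if (to)
		kuap_model.write_open = true;
}

void prevent_user_access(void *to, const void *from, size_t size)
{
	(void)size;
	if (from)
		kuap_model.read_open = false;
	if (to)
		kuap_model.write_open = false;
}

/* Wrappers (names assumed): callers state only the direction they need. */
void allow_read_from_user(const void *from, size_t size)
{
	allow_user_access(NULL, from, size);
}

void prevent_read_from_user(const void *from, size_t size)
{
	prevent_user_access(NULL, from, size);
}

void allow_write_to_user(void *to, size_t size)
{
	allow_user_access(to, NULL, size);
}

void prevent_write_to_user(void *to, size_t size)
{
	prevent_user_access(to, NULL, size);
}
```

Passing source, destination and size through one pair of hooks lets a platform that cares about the direction or the area use them, while a platform that does not can ignore them.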

powerpc: Add skeleton for Kernel Userspace Execution Prevention · 0fb1c25a

Committed by Christophe Leroy on Apr 18, 2019

This patch adds a skeleton for Kernel Userspace Execution Prevention.

Then subarches implementing it have to define CONFIG_PPC_HAVE_KUEP
and provide setup_kuep() function.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Don't split strings, use pr_crit_ratelimited()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

0fb1c25a

powerpc: Add framework for Kernel Userspace Protection · 69795cab

Committed by Christophe Leroy on Apr 18, 2019

This patch adds a skeleton for Kernel Userspace Protection
functionalities like Kernel Userspace Access Protection and Kernel
Userspace Execution Prevention.

The subsequent implementation of KUAP for radix makes use of a MMU
feature in order to patch out assembly when KUAP is disabled or
unsupported. This won't work unless there's an entry point for KUP
support before the feature magic happens, so for PPC64 setup_kup() is
called early in setup.

On PPC32, feature_fixup() is done too early to allow the same.
Suggested-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

69795cab

powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle · 53a712ba

Committed by Michael Ellerman on Apr 18, 2019

In order to implement KUAP (Kernel Userspace Access Protection) on
Power9 we will be using the AMR, and therefore indirectly the
UAMOR/AMOR.

So save/restore these regs in the idle code.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

53a712ba

powerpc/powernv/idle: Restore IAMR after idle · a3f3072d

Committed by Russell Currey on Apr 18, 2019

Without restoring the IAMR after idle, execution prevention on POWER9
with Radix MMU is overwritten and the kernel can freely execute
userspace without faulting.

This is necessary when returning from any stop state that modifies
user state, as well as hypervisor state.

To test how this fails without this patch, load the lkdtm driver and
do the following:

  $ echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT

which won't fault, then boot the kernel with powersave=off, where it
will fault. Applying this patch will fix this.

Fixes: 3b10d009 ("powerpc/mm/radix: Prevent kernel execution of user space")
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a3f3072d

20 Apr, 2019 23 commits

powerpc/numa: document topology_updates_enabled, disable by default · 558f8649

Committed by Nathan Lynch on Apr 18, 2019

Changing the NUMA associations for CPUs and memory at runtime is
basically unsupported by the core mm, scheduler etc. We see all manner
of crashes, warnings and instability when the pseries code tries to do
this. Disable this behavior by default, and document the switch a bit.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

558f8649

powerpc/numa: improve control of topology updates · 2d4d9b30

Committed by Nathan Lynch on Apr 18, 2019

When booted with "topology_updates=no", or when "off" is written to
/proc/powerpc/topology_updates, NUMA reassignments are inhibited for
PRRN and VPHN events. However, migration and suspend unconditionally
re-enable reassignments via start_topology_update(). This is
incoherent.

Check the topology_updates_enabled flag in
start/stop_topology_update() so that callers of those APIs need not be
aware of whether reassignments are enabled. This allows the
administrative decision on reassignments to remain in force across
migrations and suspensions.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2d4d9b30
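
Illustrative note (not part of the commit): the idea in 2d4d9b30 is that the enable check lives inside the start/stop functions, so callers such as the migration path do not need to know the administrative setting. A minimal model, assuming a hypothetical `topology_update_running` flag and a hypothetical `resume_after_migration()` caller:

```c
#include <stdbool.h>

/* Administrative switch; off by default in this model. */
bool topology_updates_enabled;

/* Hypothetical flag: whether the update machinery is currently active. */
bool topology_update_running;

int start_topology_update(void)
{
	if (!topology_updates_enabled)
		return 0; /* silently do nothing; the caller need not care */
	topology_update_running = true;
	return 0;
}

int stop_topology_update(void)
{
	if (!topology_updates_enabled)
		return 0;
	topology_update_running = false;
	return 0;
}

/* Stand-in for a migration or suspend path that restarts updates. */
void resume_after_migration(void)
{
	start_topology_update();
}
```

With the check in one place, a caller that unconditionally restarts updates can no longer override the administrator's choice.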

powerpc/powernv: Squash sparse warnings in opal-call.c · 2f9196b6

Committed by Andrew Donnellan on Mar 14, 2019

sparse complains a lot about opal-call.c:

arch/powerpc/platforms/powernv/opal-call.c:128:1: warning: symbol 'opal_invalid_call' was not declared. Should it be static?
arch/powerpc/platforms/powernv/opal-call.c:129:1: warning: symbol 'opal_console_write' was not declared. Should it be static?
arch/powerpc/platforms/powernv/opal-call.c:130:1: warning: symbol 'opal_console_read' was not declared. Should it be static?

Those symbols are forward declared in opal.h, but we can't include that
because the function signatures in opal.h are different. So instead, just
add an extra forward declaration to the OPAL_CALL macro to shut sparse up.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2f9196b6

powerpc/crypto: Use cheaper random numbers for crc-vpmsum self-test · 80d04b7f

Committed by George Spelvin on Mar 21, 2019

This code was filling a 64K buffer from /dev/urandom in order to
compute a CRC over (on average half of) it by two different methods,
comparing the CRCs, and repeating.

This is not a remotely security-critical application, so use the far
faster and cheaper prandom_u32() generator.

And, while we're at it, only fill as much of the buffer as we plan to use.
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

80d04b7f
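
Illustrative note (not part of the commit): the two changes in 80d04b7f, a cheap generator and filling only the bytes that will be used, can be sketched in user space. The xorshift32 generator below is a hypothetical stand-in for prandom_u32(), and the function names are invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cheap PRNG (xorshift32) standing in for prandom_u32(). */
static uint32_t model_state = 0x12345678u;

uint32_t cheap_random_u32(void)
{
	model_state ^= model_state << 13;
	model_state ^= model_state >> 17;
	model_state ^= model_state << 5;
	return model_state;
}

/* Pick a test length in [0, max]; max is assumed to be small (e.g. 64K). */
size_t pick_test_len(size_t max)
{
	return cheap_random_u32() % (max + 1);
}

/* Fill only the first 'len' bytes; the rest of the buffer is untouched. */
void fill_test_data(unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		uint32_t r = cheap_random_u32();

		/* Use up to four bytes from each 32-bit random word. */
		for (int b = 0; b < 4 && i < len; b++, i++)
			buf[i] = (unsigned char)(r >> (8 * b));
	}
}
```

A self-test would call pick_test_len() first and then fill exactly that many bytes before computing the CRC by both methods.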

powerpc: Remove duplicate headers · 6917735e

Committed by Jagadeesh Pagadala on Mar 23, 2019

Remove duplicate header inclusions.
Signed-off-by: Jagadeesh Pagadala <jagdsh.linux@gmail.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6917735e

powerpc/8xx: Fix possible device node reference leak · cc76404f

Committed by Wen Yang on Mar 26, 2019

The call to of_find_compatible_node() returns a node pointer with
refcount incremented thus it must be explicitly decremented after the
last usage.

irq_domain_add_linear() also calls of_node_get() to increase refcount,
so irq_domain() will not be affected when it is released.

Detected by coccinelle.

Fixes: a8db8cf0 ("irq_domain: Replace irq_alloc_host() with revmap-specific initializers")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Peng Hao <peng.hao2@zte.com.cn>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cc76404f

powerpc/pseries: hwpoison the pages upon hitting UE · 7f177f98

Committed by Ganesh Goudar on Apr 15, 2019

Add support to hwpoison the pages upon hitting machine check
exception.

This patch queues the address where UE is hit to percpu array
and schedules work to plumb it into memory poison infrastructure.
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
[mpe: Combine #ifdefs, drop PPC_BIT8(), and empty inline stub]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

7f177f98

powerpc/83xx: Add missing of_node_put() after of_device_is_available() · 4df2cb63

Committed by Julia Lawall on Feb 23, 2019

Add an of_node_put() when a tested device node is not available.

Fixes: c026c987 ("powerpc/83xx: Do not configure or probe disabled FSL DR USB controllers")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

4df2cb63

powerpc/pseries/pmem: Fix a set but not used value · e663e1e0

Committed by Qian Cai on Apr 06, 2019

Commit 4c5d87db ("powerpc/pseries: PAPR persistent memory
support") set a local variable "count" in dlpar_hp_pmem() but never
uses it.

  arch/powerpc/platforms/pseries/pmem.c: In function 'dlpar_hp_pmem':
  arch/powerpc/platforms/pseries/pmem.c:109:6: warning: variable 'count' set but not used
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e663e1e0

powerpc/pseries/iommu: Fix set but not used values · c05f57fd

Committed by Qian Cai on Apr 06, 2019

The commit b7d6bf4f ("powerpc/pseries/pci: Remove obsolete SW
invalidate") left 2 variables unused.

  arch/powerpc/platforms/pseries/iommu.c:108:17: warning: variable 'tces' set but not used
    __be64 *tcep, *tces;
                   ^~~~

  arch/powerpc/platforms/pseries/iommu.c:132:17: warning: variable 'tces' set but not used
    __be64 *tcep, *tces;
                   ^~~~

Also, commit 68c0449e ("powerpc/pseries/iommu: Use memory@
nodes in max RAM address calculation") set "ranges" in
ddw_memory_hotplug_max() but never uses it.

  arch/powerpc/platforms/pseries/iommu.c: In function 'ddw_memory_hotplug_max':
  arch/powerpc/platforms/pseries/iommu.c:948:7: warning: variable 'ranges' set but not used
     int ranges, n_mem_addr_cells, n_mem_size_cells, len;
         ^~~~~~
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c05f57fd

powerpc/mm: Silence unused-but-set-variable warnings · bff25143

Committed by Qian Cai on Mar 07, 2019

pte_unmap() compiles away on some powerpc platforms, so silence the
warnings below by making it a static inline function.

  mm/memory.c: In function 'copy_pte_range':
  mm/memory.c:820:24: warning: variable 'orig_dst_pte' set but not used
  mm/memory.c:820:9: warning: variable 'orig_src_pte' set but not used
  mm/madvise.c: In function 'madvise_free_pte_range':
  mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used
  mm/swap_state.c: In function 'swap_ra_info':
  mm/swap_state.c:634:15: warning: variable 'orig_pte' set but not used
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bff25143
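
Illustrative note (not part of the commit): the reason a static inline function silences the warning in bff25143 is that a function call counts as a use of its argument, whereas an empty macro discards the argument during preprocessing. A small sketch, assuming a hypothetical `pte_model_t` type and a hypothetical caller:

```c
/* Hypothetical stand-in type; the kernel's pte_t is arch-specific. */
typedef struct {
	unsigned long val;
} pte_model_t;

/*
 * Macro form: the argument vanishes during preprocessing, so a variable
 * that exists only to be passed here looks "set but not used":
 *
 *   #define pte_unmap_model(pte) do { } while (0)
 *
 * Function form: the call survives preprocessing and counts as a use,
 * yet it still compiles to nothing once inlined.
 */
static inline void pte_unmap_model(pte_model_t *pte)
{
	(void)pte;
}

/* Shape of a caller that keeps the original pointer only for the unmap. */
int count_present(pte_model_t *table, int n)
{
	pte_model_t *orig = table;
	int present = 0;

	for (int i = 0; i < n; i++, table++)
		if (table->val)
			present++;

	pte_unmap_model(orig);
	return present;
}
```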

powerpc/mm: move warning from resize_hpt_for_hotplug() · f172acbf

Committed by Laurent Vivier on Mar 13, 2019

resize_hpt_for_hotplug() reports a warning when it cannot
resize the hash page table ("Unable to resize hash page
table to target order"), but in some cases this is not a problem
and can make the user think something has not worked properly.

This patch moves the warning to arch_remove_memory() to
only report the problem when it is needed.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

f172acbf

powerpc/mm/radix: Don't do SLB preload when using the radix MMU · f89bd8ba

Committed by Aneesh Kumar K.V on Apr 09, 2019

Add radix_enabled() check to avoid SLB preload with radix translation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

f89bd8ba

powerpc/configs: Enable CONFIG_USB_XHCI_HCD by default · 24c174bb

Committed by Thomas Huth on Feb 11, 2019

Recent versions of QEMU provide a XHCI device by default these
days instead of an old-fashioned OHCI device:

 https://git.qemu.org/?p=qemu.git;a=commitdiff;h=57040d451315320b7d27

So to get the keyboard working in the graphical console there again,
we should now include XHCI support in the kernel by default, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

24c174bb

powerpc/pseries/mce: Improve array initialization. · c9d8dda4

Committed by Mahesh Salgaonkar on Mar 28, 2019

This is a follow-up to the patch that fixed the misleading print for TLB
multihit due to the wrongly populated mc_err_types[] array. Convert all the
static array initialization to '[x] = val' style for better
readability of array indexing and to avoid any further confusion.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c9d8dda4

powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWX · 56c46bba

Committed by Russell Currey on Mar 27, 2019

With STRICT_KERNEL_RWX enabled anything marked __init is placed at a 16M
boundary.  This is necessary so that it can be repurposed later with
different permissions.  However, in kernels with text larger than 16M,
this pushes early_setup past 32M, incapable of being reached by the
branch instruction.

Fix this by setting the CTR and branching there instead.

Fixes: 1e0fc9d1 ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Fix it to work on BE by using DOTSYM()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

56c46bba
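
Illustrative note (not part of the commit): the 32M limit mentioned in 56c46bba follows from the encoding of the PowerPC relative branch, which carries a 24-bit word offset, i.e. a signed 26-bit byte displacement. A small checker, with a hypothetical function name, makes the arithmetic concrete:

```c
#include <stdbool.h>
#include <stdint.h>

/* Reach of a relative branch with a 24-bit word-offset field. */
#define BRANCH_MIN (-(int64_t)0x2000000) /* -32 MiB     */
#define BRANCH_MAX ((int64_t)0x1fffffc)  /* +32 MiB - 4 */

/* Can a relative branch located at 'from' reach the instruction at 'to'? */
bool branch_reaches(uint64_t from, uint64_t to)
{
	int64_t offset = (int64_t)(to - from);

	if (offset & 3)
		return false; /* instructions are 4-byte aligned */
	return offset >= BRANCH_MIN && offset <= BRANCH_MAX;
}
```

A target exactly 32 MiB ahead is already out of range, which is why loading the address into CTR and branching through it is used instead; an indirect branch has no such distance limit.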

powerpc/embedded6xx: Remove unused functions holly_power_off and holly_halt · 62611c1e

Committed by Mathieu Malaterre on Mar 26, 2019

Silence the following warnings triggered using W=1:

arch/powerpc/platforms/embedded6xx/holly.c:236:6: error: no previous prototype for 'holly_power_off'
arch/powerpc/platforms/embedded6xx/holly.c:243:6: error: no previous prototype for 'holly_halt'
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

62611c1e

powerpc/embedded6xx: Make some functions static · 308be6c7

Committed by Mathieu Malaterre on Mar 26, 2019

In commit cb9e4d10 ("[POWERPC] Add support for 750CL Holly board")
new functions were added. Since most of these functions can be made
static, make it so.

Both holly_power_off and holly_halt functions were not changed since
they are unused, making them static would have triggered the following
warning (treated as error):

arch/powerpc/platforms/embedded6xx/holly.c:244:13: error: 'holly_halt' defined but not used

Silence the following warnings triggered using W=1:

arch/powerpc/platforms/embedded6xx/holly.c:47:5: error: no previous prototype for 'holly_exclude_device'
arch/powerpc/platforms/embedded6xx/holly.c:190:6: error: no previous prototype for 'holly_show_cpuinfo'
arch/powerpc/platforms/embedded6xx/holly.c:196:17: error: no previous prototype for 'holly_restart'
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

308be6c7

powerpc: vdso: Make vdso32 installation conditional in vdso_install · ff6d2782

Committed by Ben Hutchings on Mar 22, 2019

The 32-bit vDSO is not needed and not normally built for 64-bit
little-endian configurations.  However, the vdso_install target still
builds and installs it.  Add the same config condition as is normally
used for the build.

Fixes: e0d00591 ("powerpc/vdso: Disable building the 32-bit VDSO ...")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ff6d2782

powerpc/mm/64: Document the sizes of/sizes mapped by Pxx_INDEX_SIZE · eea86aa4

Committed by Michael Ellerman on Mar 14, 2019

Add comments describing the size in bytes of the various levels of the
page table tree, and the size of the virtual address space mapped by
each level, to make it clear what the sizes are without having to also
look up other definitions.

The code that calculates the sizes actually uses sizeof(pgd_t) etc.,
so in theory these comments could skew vs the code, but the size of
pgd_t etc. is unlikely to change very often.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

eea86aa4

powerpc/highmem: Change BUG_ON() to WARN_ON() · 6c84f8c5

Committed by Christophe Leroy on Mar 07, 2019

In arch/powerpc/mm/highmem.c, BUG_ON() is called only when
CONFIG_DEBUG_HIGHMEM is selected. This means the BUG_ON() is not vital
and can be replaced by a WARN_ON().

At the same time, use IS_ENABLED() instead of #ifdef to clean up a bit.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6c84f8c5
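
Illustrative note (not part of the commit): the two ideas in 6c84f8c5, warning instead of halting and testing the option in ordinary C rather than with #ifdef, can be sketched in user space. The macros below are a simplified imitation of the preprocessor technique behind IS_ENABLED(), with helper names local to this sketch; `warn_on_model()` and `check_fixmap_slot()` are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Simplified imitation of the IS_ENABLED() preprocessor technique.
 * An option defined as 1 pastes into ARG_PLACEHOLDER_1, which expands
 * to "0," and shifts the literal 1 into the second argument slot. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_ENABLED_MODEL(option) IS_DEFINED_2(option)

#define CONFIG_DEBUG_HIGHMEM 1 /* pretend the option is selected */

int warnings_seen;

/* Stand-in for WARN_ON(): record the problem and keep running. */
bool warn_on_model(bool cond)
{
	if (cond)
		warnings_seen++;
	return cond;
}

/* The check is ordinary C: it is always parsed and type-checked, and
 * the compiler can drop it entirely when the option is off. */
void check_fixmap_slot(bool slot_in_use)
{
	if (IS_ENABLED_MODEL(CONFIG_DEBUG_HIGHMEM))
		warn_on_model(slot_in_use);
}
```

Compared with #ifdef, this form keeps the debug-only code visible to the compiler in every configuration, so it cannot silently rot.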

powerpc: Fix defconfig choice logic when cross compiling · af5cd05d

Committed by Michael Ellerman on Feb 07, 2019

Our logic for choosing defconfig doesn't work well in some situations.

For example if you're on a ppc64le machine but you specify a non-empty
CROSS_COMPILE, in order to use a non-default toolchain, then defconfig
will give you ppc64_defconfig (big endian):

  $ make CROSS_COMPILE=~/toolchains/gcc-8/bin/powerpc-linux- defconfig
  *** Default configuration is based on 'ppc64_defconfig'

This is because we assume that CROSS_COMPILE being set means we
can't be on a ppc machine and rather than checking we just default to
ppc64_defconfig.

We should just ignore CROSS_COMPILE, instead check the machine with
uname and if it's one of ppc, ppc64 or ppc64le then use that
defconfig. If it's none of those then we fall back to ppc64_defconfig.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

af5cd05d
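
Illustrative note (not part of the commit): the selection rule described in af5cd05d is implemented in the powerpc Makefile, but the decision itself can be restated as a small C function for clarity. The function name is hypothetical, and the "ppc" case relies on the ppc_defconfig added by a273fa38.

```c
#include <string.h>

/* Map a "uname -m" string to a defconfig name. CROSS_COMPILE plays no
 * part in the decision. */
const char *choose_defconfig(const char *machine)
{
	if (strcmp(machine, "ppc") == 0)
		return "ppc_defconfig";
	if (strcmp(machine, "ppc64") == 0)
		return "ppc64_defconfig";
	if (strcmp(machine, "ppc64le") == 0)
		return "ppc64le_defconfig";

	/* Not building on a powerpc machine: fall back to the default. */
	return "ppc64_defconfig";
}
```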

powerpc/32: Add ppc_defconfig · a273fa38

Committed by Michael Ellerman on Feb 07, 2019

Add a generic 32-bit defconfig called ppc_defconfig. This means we'll
have a defconfig matching "uname -m" for all cases.

This config is mostly intended for build testing but if someone wants
to tweak it to get it booting on something that would be fine too.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a273fa38

10 Apr, 2019 1 commit

powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs · cf7cf697

Committed by Michael Ellerman on Apr 09, 2019

The recent commit 8bc08689 ("powerpc/mm: Only define
MAX_PHYSMEM_BITS in SPARSEMEM configurations") removed our definition
of MAX_PHYSMEM_BITS when SPARSEMEM is disabled.

This inadvertently broke some 64-bit configs using FLATMEM, e.g.:

  arch/powerpc/include/asm/book3s/64/mmu-hash.h:584:6: error: "MAX_PHYSMEM_BITS" is not defined, evaluates to 0
   #if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT)
        ^~~~~~~~~~~~~~~~

Fix it by making sure we define MAX_PHYSMEM_BITS for all 64-bit
configs regardless of SPARSEMEM.

Fixes: 8bc08689 ("powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cf7cf697

08 Apr, 2019 2 commits

powerpc/64s/radix: Fix radix segment exception handling · 7100e870

Committed by Nicholas Piggin on Mar 29, 2019

Commit 48e7b769 ("powerpc/64s/hash: Convert SLB miss handlers to C")
broke the radix-mode segment exception handler. In radix mode, this
exception is not an SLB miss, rather it signals that the EA is outside
the range translated by any page table.

The commit lost the radix feature alternate code patch, which can
cause faults to some EAs to kernel BUG at arch/powerpc/mm/slb.c:639!

The original radix code would send faults to slb_miss_large_addr,
which would end up faulting due to slb_addr_limit being 0. This patch
sends radix directly to do_bad_slb_fault, which is a bit clearer.

Fixes: 48e7b769 ("powerpc/64s/hash: Convert SLB miss handlers to C")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

7100e870

powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64 · dd9a994f

Committed by Christophe Leroy on Apr 04, 2019

Commit b5b4453e ("powerpc/vdso64: Fix CLOCK_MONOTONIC
inconsistencies across Y2038") changed the type of wtom_clock_sec
to s64 on PPC64. Therefore, VDSO32 needs to read it with a 4 bytes
shift in order to retrieve the lower part of it.

Fixes: b5b4453e ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

dd9a994f
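
Illustrative note (not part of the commit): the "4 bytes shift" in dd9a994f can be seen by laying out a 64-bit value in big-endian byte order, which this sketch assumes is the layout the 32-bit vDSO sees. The helper names are hypothetical; the real code is assembly that adds an offset to a load.

```c
#include <stdint.h>

/* Lay out a signed 64-bit value the way a big-endian CPU stores it. */
void store_be64(unsigned char out[8], int64_t value)
{
	uint64_t v = (uint64_t)value;

	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Load a big-endian 32-bit word starting at byte 'offset'. */
uint32_t load_be32(const unsigned char *base, unsigned int offset)
{
	const unsigned char *p = base + offset;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

A 32-bit load at offset 0 of the field returns the upper half of the value, so code that wants the lower part has to read at offset 4.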

01 Apr, 2019 1 commit

powerpc/32: Fix early boot failure with RTAS built-in · fd427103

Committed by Christophe Leroy on Mar 25, 2019

Commit 0df977ea ("powerpc/6xx: Don't use SPRN_SPRG2 for storing
stack pointer while in RTAS") changes the code to use a field in
thread struct to store the stack pointer while in RTAS instead of
using SPRN_SPRG2. It therefore converts all places which were
manipulating SPRN_SPRG2 to use that field. During early startup, the
zeroing of SPRN_SPRG2 has been replaced by a zeroing of that field in
thread struct. But at least in start_here, that's done wrongly because
it used the physical address of the fields while MMU is on at that
time.

So the virtual address of the field should be used instead, but in
the meantime, thread struct has already been zeroed and initialised
so we can just drop this initialisation.
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Fixes: 0df977ea ("powerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

fd427103

29 Mar, 2019 8 commits

x86/realmode: Make set_real_mode_mem() static inline · f560bd19

Committed by Matteo Croce on Mar 28, 2019

Remove the unused @size argument and move it into a header file, so it
can be inlined.

 [ bp: Massage. ]
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: platform-driver-x86@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190328114233.27835-1-mcroce@redhat.com

f560bd19

powerpc/pseries/mce: Fix misleading print for TLB mutlihit · 6f845ebe

Committed by Mahesh Salgaonkar on Mar 26, 2019

On pseries, TLB multihits are reported as D-Cache Multihit. This is because
of the wrongly populated mc_err_types[] array. Per PAPR, the TLB error type is 0x04
and mc_err_types[4] points to the "D-Cache" string instead of "TLB". Fix up the
mc_err_types[] array.

Machine check error type per PAPR:
  0x00 = Uncorrectable Memory Error (UE)
  0x01 = SLB error
  0x02 = ERAT Error
  0x04 = TLB error
  0x05 = D-Cache error
  0x07 = I-Cache error

Fixes: 8f0b8056 ("powerpc/pseries: Display machine check error details.")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6f845ebe
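
Illustrative note (not part of the commit): the PAPR error types listed in 6f845ebe are sparse (0x03 and 0x06 are not listed), which is how a positional initializer can drift out of step with the index. Writing the table with designated initializers ties each string to its code. The strings and the "Unknown" fallback below are assumptions of this sketch, not the kernel's exact table.

```c
#include <stddef.h>

/* Each string is bound to its PAPR error type; unlisted slots stay NULL. */
static const char * const mc_err_types[] = {
	[0x00] = "UE",
	[0x01] = "SLB",
	[0x02] = "ERAT",
	[0x04] = "TLB",
	[0x05] = "D-Cache",
	[0x07] = "I-Cache",
};

/* Bounds-checked lookup with a fallback for gaps and unknown codes. */
const char *mc_err_type_name(unsigned int type)
{
	size_t n = sizeof(mc_err_types) / sizeof(mc_err_types[0]);

	if (type >= n || !mc_err_types[type])
		return "Unknown";
	return mc_err_types[type];
}
```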

KVM: x86: update %rip after emulating IO · 45def77e

Committed by Sean Christopherson on Mar 11, 2019

Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
OUT 92h or CF9h.  Userspace may emulate said mechanism, i.e. reset a
vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.

To avoid corrupting %rip after such a reset, commit 0967b7bf ("KVM:
Skip pio instruction when it is emulated, not executed") changed the
behavior of PIO handlers, i.e. today's "fast" PIO handling to skip the
instruction prior to exiting to userspace.  Full emulation doesn't need
such tricks because re-emulating the instruction will naturally handle
%rip being changed to point at the reset vector.

Updating %rip prior to exiting to userspace has several drawbacks:

  - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
    fails it will likely yell about the wrong address.
  - Single step exits to userspace are effectively dropped as
    KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
  - Behavior of PIO emulation is different depending on whether it
    goes down the fast path or the slow path.

Rather than skip the PIO instruction before exiting to userspace,
snapshot the linear %rip and cancel PIO completion if the current
value does not match the snapshot.  For a 64-bit vCPU, i.e. the most
common scenario, the snapshot and comparison has negligible overhead
as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
VMREAD in this case.

All other alternatives to snapshotting the linear %rip that don't
rely on an explicit reset announcement suffer from one corner case
or another.  For example, canceling PIO completion on any write to
%rip fails if userspace does a save/restore of %rip, and attempting to
avoid that issue by canceling PIO only if %rip changed then fails if PIO
collides with the reset %rip.  Attempting to zero in on the exact reset
vector won't work for APs, which means adding more hooks such as the
vCPU's MP_STATE, and so on and so forth.

Checking for a linear %rip match technically suffers from corner cases,
e.g. userspace could theoretically rewrite the underlying code page and
expect a different instruction to execute, or the guest hardcodes a PIO
reset at 0xfffffff0, but those are far, far outside of what can be
considered normal operation.

Fixes: 432baf60 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O")
Cc: <stable@vger.kernel.org>
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

45def77e
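
Illustrative note (not part of the commit): the snapshot-and-compare approach described in 45def77e can be modelled with a tiny state machine. The structure, field names and function names below are hypothetical and ignore everything except %rip handling.

```c
#include <stdbool.h>
#include <stdint.h>

struct vcpu_model {
	uint64_t linear_rip;     /* current guest linear %rip            */
	uint64_t pio_linear_rip; /* snapshot taken when exiting for PIO  */
	bool pio_pending;
};

/* Exit to userspace for a port I/O instruction WITHOUT advancing %rip,
 * so userspace sees the address of the PIO instruction itself. */
void exit_for_pio(struct vcpu_model *vcpu)
{
	vcpu->pio_linear_rip = vcpu->linear_rip;
	vcpu->pio_pending = true;
}

/* On re-entry, skip the instruction only if %rip still matches the
 * snapshot. Returns true if the PIO was completed. */
bool complete_pio(struct vcpu_model *vcpu, unsigned int insn_len)
{
	if (!vcpu->pio_pending)
		return false;
	vcpu->pio_pending = false;

	/* Userspace moved %rip (for example a reset): cancel completion. */
	if (vcpu->linear_rip != vcpu->pio_linear_rip)
		return false;

	vcpu->linear_rip += insn_len;
	return true;
}
```

If userspace resets the vCPU while handling the exit, the mismatch cancels the skip and the new %rip is left intact.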

x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init · 013cc6eb

Committed by Vitaly Kuznetsov on Mar 13, 2019

When userspace initializes guest vCPUs it may want to zero all supported
MSRs including Hyper-V related ones including HV_X64_MSR_STIMERn_CONFIG/
HV_X64_MSR_STIMERn_COUNT. With commit f3b138c5 ("kvm/x86: Update SynIC
timers on guest entry only") we began doing stimer_mark_pending()
unconditionally on every config change.

The issue I'm observing manifests itself as follows:
- Qemu writes 0 to STIMERn_{CONFIG,COUNT} MSRs and marks all stimers as
  pending in stimer_pending_bitmap, arms KVM_REQ_HV_STIMER;
- kvm_hv_has_stimer_pending() starts returning true;
- kvm_vcpu_has_events() starts returning true;
- kvm_arch_vcpu_runnable() starts returning true;
- when kvm_arch_vcpu_ioctl_run() gets into
  (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) case:
  - kvm_vcpu_block() gets in 'kvm_vcpu_check_block(vcpu) < 0' and returns
    immediately, avoiding normal wait path;
  - -EAGAIN is returned from kvm_arch_vcpu_ioctl_run() immediately forcing
    userspace to retry.

So instead of normal wait path we get a busy loop on all secondary vCPUs
before they get INIT signal. This seems to be undesirable, especially given
that this happens even when Hyper-V extensions are not used.

Generally, it seems to be pointless to mark an stimer as pending in
stimer_pending_bitmap and arm KVM_REQ_HV_STIMER as the only thing
kvm_hv_process_stimers() will do is clear the corresponding bit. We may
just not mark disabled timers as pending instead.

Fixes: f3b138c5 ("kvm/x86: Update SynIC timers on guest entry only")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

013cc6eb
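
Illustrative note (not part of the commit): the fix in 013cc6eb amounts to marking a synthetic timer pending only when its configuration enables it. A minimal model, assuming the enable flag is bit 0 of the config value and using hypothetical structure and function names:

```c
#include <stdbool.h>
#include <stdint.h>

#define STIMER_ENABLE_BIT 0x1ull /* assumed position of the enable flag */

struct stimer_model {
	uint64_t config;
	bool pending;
};

/* Model of a config MSR write: a disabled timer is never marked
 * pending, because processing it would only clear the bit again. */
int stimer_set_config(struct stimer_model *timer, uint64_t config)
{
	timer->config = config;
	if (timer->config & STIMER_ENABLE_BIT)
		timer->pending = true;
	return 0;
}

/* A vCPU counts as having pending events if any timer is pending. */
bool vcpu_has_stimer_pending(const struct stimer_model *timers, int n)
{
	for (int i = 0; i < n; i++)
		if (timers[i].pending)
			return true;
	return false;
}
```

With this rule, userspace zeroing every timer MSR during vCPU initialization leaves nothing pending, so an uninitialized vCPU can block normally instead of spinning.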

kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs · 2bdb76c0

Committed by Xiaoyao Li on Mar 08, 2019

MSR_IA32_ARCH_CAPABILITIES is emulated unconditionally even if the
host doesn't support it, so we should move it from array msrs_to_save
to array emulated_msrs, to report to userspace that the guest supports this MSR.
Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2bdb76c0

KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts · 0cf9135b

Committed by Sean Christopherson on Mar 07, 2019

The CPUID flag ARCH_CAPABILITIES is unconditionally exposed to host
userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
regardless of hardware support under the pretense that KVM fully
emulates MSR_IA32_ARCH_CAPABILITIES.  Unfortunately, only VMX hosts
handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).

Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
that it's emulated on AMD hosts.

Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
Cc: stable@vger.kernel.org
Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0cf9135b

kvm: mmu: Used range based flushing in slot_handle_level_range · f285c633

Committed by Ben Gardon on Mar 12, 2019

Replace kvm_flush_remote_tlbs with kvm_flush_remote_tlbs_with_address
in slot_handle_level_range. When range based flushes are not enabled
kvm_flush_remote_tlbs_with_address falls back to kvm_flush_remote_tlbs.

This changes the behavior of many functions that indirectly use
slot_handle_level_range, iff the range based flushes are enabled. The
only potential problem I see with this is that kvm->tlbs_dirty will be
cleared less often, however the only caller of slot_handle_level_range that
checks tlbs_dirty is kvm_mmu_notifier_invalidate_range_start which
checks it and does a kvm_flush_remote_tlbs after calling
kvm_unmap_hva_range anyway.

Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and
without this patch. The patch introduced no new failures.
Signed-off-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f285c633

KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported · 3d9683cf

Committed by Masahiro Yamada on Mar 18, 2019

I do not see any consistency about headers_install of <linux/kvm_para.h>
and <asm/kvm_para.h>.

According to my analysis of Linux 5.1-rc1, there are 3 groups:

 [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported

    alpha, arm, hexagon, mips, powerpc, s390, sparc, x86

 [2] <asm/kvm_para.h> is exported, but <linux/kvm_para.h> is not

    arc, arm64, c6x, h8300, ia64, m68k, microblaze, nios2, openrisc,
    parisc, sh, unicore32, xtensa

 [3] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported

    csky, nds32, riscv

This does not match the actual KVM support. At least, [2] is
half-baked.

Nor do arch maintainers look like they care about this. For example,
commit 0add5371 ("microblaze: Add missing kvm_para.h to Kbuild")
exported <asm/kvm_para.h> to user-space in order to fix an in-kernel
build error.

We have two ways to make this consistent:

 [A] export both <linux/kvm_para.h> and <asm/kvm_para.h> for all
     architectures, irrespective of the KVM support

 [B] Match the header export of <linux/kvm_para.h> and <asm/kvm_para.h>
     to the KVM support

My first attempt was [A] because the code looks cleaner, but Paolo
suggested [B].

So, this commit goes with [B].

For most architectures, <asm/kvm_para.h> was moved to the kernel-space.
I changed include/uapi/linux/Kbuild so that it checks generated
asm/kvm_para.h as well as checked-in ones.

After this commit, there will be two groups:

 [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported

    arm, arm64, mips, powerpc, s390, x86

 [2] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported

    alpha, arc, c6x, csky, h8300, hexagon, ia64, m68k, microblaze,
    nds32, nios2, openrisc, parisc, riscv, sh, sparc, unicore32, xtensa
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3d9683cf
